Journal of the ACM

Vol. 45, num. 2, p. 288-323

Date of publication: 1998-03

International Colloquium on Automata, Languages, and Programming

p. 327-338

Date: 1998-01

Abstract:

It is well-known that the performance of quicksort may substantially be improved by selecting the median of a sample of three elements as the pivot of each partitioning stage. This variant makes more unlikely the worst-case (by decreasing the probability of uneven partitions), and improves the average number of comparisons as well. It is easily generalizable to samples of size s = 2k + 1. The larger the samples, the better the partitions, but more additional comparison will be needed to find the median of the sample. From a practical standpoint, many authors recommend using samples of 3 or 5 elements We show in this report that the optimal choice is to use samples of size s = a n^(1/2) + o(n^(1/2)), and analytically find the value of the constant a, which depends on the median-finding algorithm. The result holds not only for the average number of comparisons, but in general for the average total cost, which includes comparisons and exchanges. Only if the exchanges were exceptionally costly, say making a swap takes 10 times the time to compare two keys, then the optimal sample size is not Theta(n^(1/2)). In such a situation, we show that it would be better not to select the median of the samples, but the p-th quartile of the sample (p < 1/2) and quantify precisely this phenomenon. Last but not least, we show how to apply the same ideas and techniques to the analysis of Quickselect (a selection algorithm closely related to quicksort) and get similar results to those that we discuss for quicksort.]]>