In this work we show how to augment general-purpose multidimensional data structures, such as K-d trees, to efficiently support search by rank (that is, to locate the i-th smallest element along the j-th coordinate, for given i and j) and to find the rank of a given item along a given coordinate. To do so, we introduce two simple, practical and very flexible algorithms, Select-by-Rank and Find-Rank, which add very little overhead. Both algorithms are easy to implement and can be adapted to several spatial indexes, although their analysis is far from trivial. We show that for random K-d trees of size $n$ the expected number of nodes visited by Find-Rank is $P_{n,i} = \Theta(n^{1-1/K})$ for $i = o(n)$ or $i = n - o(n)$, and $P_{n,i} = f_K(i/n) \cdot n + o(n)$ for $i = xn + o(n)$ (with 0
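A minimal Python sketch of the Find-Rank idea follows, assuming a plain K-d tree with cyclic discriminants (coordinate = depth mod K), distinct coordinate values, and a `size` field in every node. The names `Node`, `insert`, `count_less` and `find_rank` are illustrative and not taken from the paper. A node that discriminates on coordinate j lets the search discard one subtree and count the other wholesale through its size field; any other node forces a visit to both subtrees.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    key: Point
    disc: int                       # coordinate this node discriminates on
    size: int = 1                   # number of nodes in the subtree rooted here
    left: Optional["Node"] = None   # keys with key[disc] <  self.key[disc]
    right: Optional["Node"] = None  # keys with key[disc] >= self.key[disc]


def size(t: Optional[Node]) -> int:
    return t.size if t is not None else 0


def insert(t: Optional[Node], x: Point, depth: int = 0) -> Node:
    """Plain K-d tree insertion with cyclic discriminants; updates subtree sizes."""
    if t is None:
        return Node(key=x, disc=depth % len(x))
    t.size += 1
    if x[t.disc] < t.key[t.disc]:
        t.left = insert(t.left, x, depth + 1)
    else:
        t.right = insert(t.right, x, depth + 1)
    return t


def count_less(t: Optional[Node], v: float, j: int) -> int:
    """Number of stored points whose j-th coordinate is strictly smaller than v."""
    if t is None:
        return 0
    if t.disc == j:
        if v <= t.key[j]:
            # This node and its right subtree all have coordinate j >= v: prune them.
            return count_less(t.left, v, j)
        # This node and its whole left subtree are < v: count them via the size field.
        return size(t.left) + 1 + count_less(t.right, v, j)
    # The node splits on another coordinate: both subtrees may hold points below v.
    own = 1 if t.key[j] < v else 0
    return own + count_less(t.left, v, j) + count_less(t.right, v, j)


def find_rank(root: Optional[Node], x: Point, j: int) -> int:
    """1-based rank of item x along coordinate j (assumes distinct coordinate values)."""
    return 1 + count_less(root, x[j], j)


if __name__ == "__main__":
    root = None
    for p in [(2, 5), (1, 3), (4, 1), (3, 4)]:
        root = insert(root, p)
    print(find_rank(root, (4, 1), 0))  # 4: x-coordinates 1, 2 and 3 are smaller
    print(find_rank(root, (4, 1), 1))  # 1: no y-coordinate is smaller
```

The only extra state is the `size` field, maintained on the way down during insertion; counting the left subtree through it is what lets a j-discriminating node avoid visiting that subtree at all.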
Deletions in open addressing tables have often been seen as problematic. The usual solution is to use a special 'deleted' mark so that probe sequences continue past deleted slots, as if an element were still sitting there. Such a solution, notwithstanding its wide applicability, may involve serious performance degradation. In the first part of this paper we review a practical implementation of the often-overlooked deletion algorithm for linear probing hash tables, analyze its properties and performance, and provide several strong arguments in favor of the Robin Hood variant. In particular, we show how a small variation can yield substantial improvements for unsuccessful search. In the second part we propose an algorithm for true deletion in open addressing hashing with secondary clustering, such as quadratic hashing. As far as we know, this is the first such algorithm to appear in the literature. Although it requires some extra memory for bookkeeping, the algorithm is comparatively simple and efficient, and might be of practical value besides its theoretical interest.
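To illustrate true deletion in a linear probing table without any 'deleted' mark, here is a minimal Python sketch in the style of the classical backward-shifting deletion (Knuth's Algorithm R, adapted here to forward probing). It assumes a fixed capacity, Python's built-in `hash` as the hash function, probe step 1, and at least one empty slot at all times. The class and method names are illustrative; this is neither the Robin Hood variant nor the paper's own implementation.

```python
from typing import Any, Hashable, List, Optional, Tuple

Slot = Optional[Tuple[Hashable, Any]]


class LinearProbingTable:
    """Open addressing with linear probing and true deletion (no 'deleted' marks)."""

    def __init__(self, capacity: int = 16) -> None:
        self.m = capacity
        self.slots: List[Slot] = [None] * capacity
        self.n = 0  # number of stored keys

    def _home(self, key: Hashable) -> int:
        return hash(key) % self.m  # assumed hash function: Python's built-in hash

    def _find(self, key: Hashable) -> Optional[int]:
        i = self._home(key)
        while self.slots[i] is not None:  # an empty slot ends every probe sequence
            if self.slots[i][0] == key:
                return i
            i = (i + 1) % self.m
        return None

    def get(self, key: Hashable, default: Any = None) -> Any:
        i = self._find(key)
        return default if i is None else self.slots[i][1]

    def insert(self, key: Hashable, value: Any) -> None:
        i = self._home(key)
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                self.slots[i] = (key, value)  # update in place
                return
            i = (i + 1) % self.m
        if self.n + 1 >= self.m:
            raise OverflowError("keep at least one empty slot so probes terminate")
        self.slots[i] = (key, value)
        self.n += 1

    def delete(self, key: Hashable) -> bool:
        i = self._find(key)
        if i is None:
            return False
        self.slots[i] = None  # open a hole at i
        self.n -= 1
        j = i
        while True:
            j = (j + 1) % self.m
            if self.slots[j] is None:
                return True  # end of the cluster: no later element depends on the hole
            k = self._home(self.slots[j][0])
            # If k lies cyclically in (i, j], the element is still reachable; leave it.
            if (i < j and i < k <= j) or (i > j and (k > i or k <= j)):
                continue
            # Otherwise its probe sequence passes through the hole: move it back.
            self.slots[i], self.slots[j] = self.slots[j], None
            i = j
```

After a deletion no slot is marked: an element is moved back into the hole only when its home slot does not lie cyclically between the hole and its current position, so every remaining key stays reachable and unsuccessful searches stop at the first truly empty slot.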
Suppose we have a set of K-dimensional records stored in a general-purpose spatial index such as a K-d tree. The index efficiently supports insertions, ordinary exact searches, orthogonal range searches, nearest neighbor searches, etc. Here we consider whether it can also efficiently support search by rank, that is, locating the i-th smallest element along the j-th coordinate. We answer this question in the affirmative by developing a simple algorithm with expected cost $O(n^{\alpha(1/K)} \log n)$, where $n$ is the size of the K-d tree and $\alpha(1/K) < 1$ for any $K \geq 2$. The only requirement to support search by rank is that each node in the K-d tree stores the size of the subtree rooted at that node (or some equivalent information). This requires only modest extra space. Furthermore, the subtree sizes can be used to randomize the update algorithms, providing guarantees on the expected performance of the various operations on K-d trees. Although selection in multidimensional data can be solved more efficiently than with our algorithm, those solutions rely on ad hoc data structures or superlinear space. Our solution adds to an existing data structure (K-d trees) the capability of search by rank with very little overhead. The simplicity of the algorithm makes it easy to implement, practical and very flexible; however, its correctness and efficiency are far from self-evident. Moreover, it can easily be adapted to other spatial indexes.
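One simple way to turn stored subtree sizes into search by rank is sketched below in Python. It is an illustration under stated assumptions, not necessarily the algorithm analyzed in the paper, and no cost bound is claimed for it. The sketch binary-searches over coordinate values: it ranks a candidate node with a Find-Rank-style counting routine and shrinks an open interval (lo, hi) that always contains the j-th coordinate of the target. It assumes cyclic discriminants, distinct coordinate values and 1-based ranks; `select_by_rank`, `find_candidate` and the other names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    key: Point
    disc: int                       # discriminating coordinate
    size: int = 1                   # subtree size: the only extra field required
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def size(t: Optional[Node]) -> int:
    return t.size if t is not None else 0


def insert(t: Optional[Node], x: Point, depth: int = 0) -> Node:
    if t is None:
        return Node(key=x, disc=depth % len(x))
    t.size += 1                     # keep the size field consistent on the way down
    if x[t.disc] < t.key[t.disc]:
        t.left = insert(t.left, x, depth + 1)
    else:
        t.right = insert(t.right, x, depth + 1)
    return t


def count_less(t: Optional[Node], v: float, j: int) -> int:
    """Points with coordinate j strictly below v (prunes at nodes that split on j)."""
    if t is None:
        return 0
    if t.disc == j:
        if v <= t.key[j]:
            return count_less(t.left, v, j)
        return size(t.left) + 1 + count_less(t.right, v, j)
    own = 1 if t.key[j] < v else 0
    return own + count_less(t.left, v, j) + count_less(t.right, v, j)


def find_candidate(t: Optional[Node], lo: float, hi: float, j: int) -> Optional[Node]:
    """Some node whose j-th coordinate lies strictly inside (lo, hi), or None."""
    if t is None:
        return None
    if lo < t.key[j] < hi:
        return t
    if t.disc == j:
        # Only one side of a j-discriminating node can reach the interval.
        return find_candidate(t.right if t.key[j] <= lo else t.left, lo, hi, j)
    found = find_candidate(t.left, lo, hi, j)
    return found if found is not None else find_candidate(t.right, lo, hi, j)


def select_by_rank(root: Optional[Node], i: int, j: int) -> Point:
    """Point whose j-th coordinate is the i-th smallest (1-based)."""
    if not 1 <= i <= size(root):
        raise IndexError("rank out of range")
    lo, hi = -math.inf, math.inf    # invariant: lo < target[j] < hi
    while True:
        cand = find_candidate(root, lo, hi, j)
        assert cand is not None     # the target itself always lies in (lo, hi)
        r = 1 + count_less(root, cand.key[j], j)
        if r == i:
            return cand.key
        if r < i:
            lo = cand.key[j]
        else:
            hi = cand.key[j]
```

Each iteration removes at least the candidate from the interval, so the loop terminates. Choosing the candidate at random inside (lo, hi) would be a natural refinement; this sketch simply takes the first one found in preorder.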