L'objectiu del grup és la producció de contribucions rellevants en les àrees d'expertesa dels components del grup i la seva disseminació en revistes i conferències internacionals de prestigi reconegut. És voluntat del grup que les contribucions tinguin un impacte significatiu a llarg termini. La transferència de tecnologia és considerada com una conseqüència de l'excel·lència en la recerca i s'ha de portar a terme com un mitjà per incrementar l'impacte dels resultats, obtenir recursos per al grup i explorar nous temes per a la recerca en el futur.
Partial match queries constitute the most basic type of associative queries in multidimensional data structures such as K-d trees or quadtrees. Given a query q=(q0,…,qK-1) where s of the coordinates are specified and K-s are left unspecified (qi=*), a partial match search returns the subset of data points x=(x0,…,xK-1) in the data structure that match the given query, that is, the data points such that xi=qi whenever qi¿*. There exists a wealth of results about the cost of partial match searches in many different multidimensional data structures, but most of these results deal with random queries. Only recently a few papers have begun to investigate the cost of partial match queries with a fixed query q. This paper represents a new contribution in this direction, giving a detailed asymptotic estimate of the expected cost Pn,q for a given fixed query q. From previous results on the cost of partial matches with a fixed query and the ones presented here, a deeper understanding is emerging, uncovering the following functional shape for Pn,q
Pn,q=¿·(¿i:qi is specifiedqi(1-qi))a/2·na+l.o.t.
(l.o.t. lower order terms, throughout this work) in many multidimensional data structures, which differ only in the exponent a and the constant ¿, both dependent on s and K, and, for some data structures, on the whole pattern of specified and unspecified coordinates in q as well. Although it is tempting to conjecture that this functional shape is “universal”, we have shown experimentally that it seems not to be true for a variant of K-d trees called squarish K-d trees.
The final publication is available at Springer via http://dx.doi.org/10.1007/s00453-015-0097-4
The new dual-pivot Quicksort by Vladimir Yaroslavskiy-used in Oracle's Java runtime library since version 7-features intriguing asymmetries. They make a basic variant of this algorithm use less comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and give also the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly less I/Os, both with and without pivot sampling.
The final publication is available at Springer via http://dx.doi.org/10.1007/s00453-015-0041-7
We present multikey quickselect: an efficient, in-place, easy-to-implement algorithm for the selection problem for strings. We analyze its expected cost under a uniform model, measured as the number of character comparisons and pointer swaps, depending on the alphabet cardinality. From the analysis, we derive a binary variant of the algorithm, which is more efficient when the range of values for the alphabet is known. This variant can also be applied to multikey quicksort.
The hiring problem is a simple model for on-line decision-making, inspired by the well-known and time-honored secretary problem (see, for instance, the surveys of Freeman Int Stat 51:189-206, 1983 or Samuels Handbook of sequential analysis, 1991). In the combinatorial model of the hiring problem (Archibald and Martinez Proc. of the 21st Int. Col. on Formal Power Series and Algebraic Combinatorics (FPSAC), 2009), there is a sequence of candidates of unknown length, and we can rank all candidates from best to worst without ties; all orders are equally likely. Thus any finite prefix of the sequence of candidates can be modeled as a random permutation. Candidates come one after another, and a decision must be taken immediately either to hire or to discard the current candidate, based on her relative rank among all candidates seen so far. In this paper we analyze in depth the strategy hiring above the -th best candidate (or hiring above the -th best, for short), formally introduced by Archibald and Martinez (Proc. of the 21st Int. Col. on Formal Power Series and Algebraic Combinatorics (FPSAC), 2009). This hiring strategy hires the first candidates in the sequence whatever their relative ranks, then any further candidate is hired if her relative rank is better than the -th best candidate seen so far. The close connection of this hiring strategy to the notion of records in sequences and permutations is quite evident; records and their generalizations (namely, -records) have been throughly studied in the literature (see, for instance, the book of Arnold et al. Records, Wiley Series in Probability and Mathematical Statistics, 1998), and we explore here that relationship. We also discuss the relationship between this hiring strategy and the seating plan for the well-known Chinese restaurant process (Pitman Combinatorial stochastic processes, 2006). We analyze in this paper several random variables (we also call them hiring parameters) that give us an accurate description of the main probabilistic features of "hiring above the -th best", from the point of view of the hiring rate (number of hired candidates, waiting time to hire a certain number of candidates, time between consecutive hirings,...) and also of the quality of the hired staff (rank of the last hired candidate, best discarded candidate,...). We are able to obtain the exact and asymptotic probability distributions for the most fundamental parameter, namely, the number of hired candidates, and also for most of the other parameters. Another novel quantity that we analyze here is the number of replacements, a quantity naturally arising when we consider "hiring above the -th best" endowed with a replacement mechanism: that is, an incoming candidate can be: (1) hired anew, because the candidate is among the best seen so far, (2) hired to replace some previously hired worse candidate, or (3) discarded.
We consider the Moran process, as generalized by Lieberman et al. (Nature 433:312-316, 2005). A population resides on the vertices of a finite, connected, undirected graph and, at each time step, an individual is chosen at random with probability proportional to its assigned "fitness" value. It reproduces, placing a copy of itself on a neighbouring vertex chosen uniformly at random, replacing the individual that was there. The initial population consists of a single mutant of fitness r > 0 placed uniformly at random, with every other vertex occupied by an individual of fitness 1. The main quantities of interest are the probabilities that the descendants of the initial mutant come to occupy the whole graph (fixation) and that they die out (extinction); almost surely, these are the only possibilities. In general, exact computation of these quantities by standard Markov chain techniques requires solving a system of linear equations of size exponential in the order of the graph so is not feasible. We show that, with high probability, the number of steps needed to reach fixation or extinction is bounded by a polynomial in the number of vertices in the graph. This bound allows us to construct fully polynomial randomized approximation schemes (FPRAS) for the probability of fixation (when ra parts per thousand yen1) and of extinction (for all r > 0).