We propose the total bootstrap median (TBM) as a robust and efficient estimator of location and scale for small samples. We demonstrate its performance by estimating the mean and variance of a variety of distributions. We also show that, if the underlying distribution is unknown and there is either no contamination or low to moderate contamination, the TBM provides a better estimate of the mean, in mean square terms, than the sample mean or the sample median. In addition, the TBM is a better estimator of the variance of the underlying distribution than the sample variance or the square of the bias-corrected median absolute deviation from the median estimator. We also show that the TBM is an explicit L-estimator, which allows a direct study of its properties.
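The L-estimator remark can be illustrated with a minimal sketch. The abstract does not define the TBM, so the code below assumes the location version is the average of the sample median over all n^n equally likely bootstrap resamples, and it handles odd sample sizes only. Under that assumption the estimator is a fixed weighted sum of the order statistics, with weights given by differences of binomial probabilities. The function names are hypothetical and are not taken from the paper.

```python
from math import comb


def binom_cdf(n, p, k):
    """P(Bin(n, p) <= k), computed by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def bootstrap_median_weights(n):
    """Probability that the median of a bootstrap resample equals the
    k-th order statistic, for k = 1..n (odd n, no ties assumed)."""
    if n % 2 == 0:
        raise ValueError("this sketch only handles odd sample sizes")
    m = (n + 1) // 2  # position of the median in a sorted resample
    return [
        binom_cdf(n, (k - 1) / n, m - 1) - binom_cdf(n, k / n, m - 1)
        for k in range(1, n + 1)
    ]


def bootstrap_mean_of_median(sample):
    """Average of the sample median over all bootstrap resamples,
    written as a weighted sum of order statistics (an L-estimator).
    Assumption: this is what the abstract calls the TBM for location."""
    xs = sorted(sample)
    weights = bootstrap_median_weights(len(xs))
    return sum(w * x for w, x in zip(weights, xs))
```

For n = 3 the weights are 7/27, 13/27 and 7/27, which can be checked by enumerating the 27 possible resamples. Because the weights depend only on n, they can be tabulated once and reused for any sample of that size.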
Serrat, C.; Rué, M.; Armero, C.; Piulachs, X.; Perpiñán, H.; Forte, A.; Páez, Á.; Gomez, G. Journal of Applied Statistics, Vol. 42, No. 6, pp. 1223-1239. DOI: 10.1080/02664763.2014.999032. Publication date: 2015-06-03. Journal article.
The paper describes the use of frequentist and Bayesian shared-parameter joint models of longitudinal measurements of prostate-specific antigen (PSA) and the risk of prostate cancer (PCa). The motivating dataset corresponds to the screening arm of the Spanish branch of the European Randomized Screening for Prostate Cancer study. The results show that PSA is highly associated with the risk of being diagnosed with PCa and that there is an age-varying effect of PSA on PCa risk. Both the frequentist and Bayesian paradigms produced very close parameter estimates and subsequent 95% confidence and credibility intervals. Dynamic estimations of disease-free probabilities obtained using Bayesian inference highlight the potential of joint models to guide personalized risk-based screening strategies.
A Bayesian cluster analysis for the results of an election based on multinomial mixture models is proposed. The number of clusters is chosen by carefully comparing the results with predictive simulations from the models, and by checking whether the models capture most of the spatial dependence in the results. By implementing the analysis on five recent elections in Barcelona, the reader is walked through the choice of the best statistics and graphical displays to help choose a model and present the results. Even though the models do not use any information about the location of the areas into which the results are broken down, in the example they uncover a four-cluster structure with a strong spatial dependence that is very stable over time and relates to the demographic composition.
The analysis of word frequency count data can be very useful in authorship attribution problems. Zero-truncated generalized inverse Gaussian–Poisson mixture models are very helpful in the analysis of these kinds of data because their model-mixing density estimates can be used as estimates of the density of the word frequencies of the vocabulary. It is found that this model provides excellent fits for the word frequency counts of very long texts, where the truncated inverse Gaussian–Poisson special case fails because it does not allow for the large degree of over-dispersion in the data. The role played by the three parameters of this truncated GIG-Poisson model is also explored. Our second goal is to compare the fit of the truncated GIG-Poisson mixture model with the fit of the model that results from switching the order of the mixing and truncation stages. A heuristic interpretation of the mixing distribution estimates obtained under this alternative GIG-truncated Poisson mixture model is also provided.
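The distinction between the two orderings of mixing and truncation can be made concrete with a small sketch. To keep it self-contained, the GIG mixing density is replaced here by a hypothetical two-point mixing distribution over the Poisson rate. This is an illustrative simplification, not the model fitted in the paper, and all function names are assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing the count k with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def truncated_mixture_pmf(k, lams, weights):
    """Mix the Poisson rates first, then truncate the zero class
    (the analogue of the truncated GIG-Poisson model), for k >= 1."""
    def mix(j):
        return sum(w * poisson_pmf(j, lam) for lam, w in zip(lams, weights))
    return mix(k) / (1.0 - mix(0))


def mixture_of_truncated_pmf(k, lams, weights):
    """Truncate each Poisson at zero first, then mix
    (the analogue of the GIG-truncated Poisson model), for k >= 1."""
    return sum(
        w * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
        for lam, w in zip(lams, weights)
    )


# Hypothetical mixing distribution: two rates with equal weight.
lams, weights = [0.5, 5.0], [0.5, 0.5]
p_mix_then_truncate = truncated_mixture_pmf(1, lams, weights)
p_truncate_then_mix = mixture_of_truncated_pmf(1, lams, weights)
```

With the same mixing weights, the two constructions assign different probabilities to a count of one, because truncating after mixing implicitly down-weights the small rates that mostly generate zeros. This is one reason the mixing distribution estimates need a separate interpretation under each model.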