
Sampling Samples: Relevant Applications of Statistics in Digital Economy and Society

Total activity: 5
Type of activity
Competitive project
Sampling Samples
Funding entity
Funding entity code
47.190,00 €
Start date
End date
big data, dimensionality reduction, discrete distributions, ecological inference, electoral data, functional data, graph, node degree distribution, personal income, textual data
In data analysis it is increasingly common for the sample under study to consist of sampling units that do not correspond to the classical statistical notion of a physical individual on which one or more variables of interest have been measured. Sampling units now often correspond to subsets of lower-level individual data, whose information has been aggregated to produce the units that make up the sample under study.
Over the last eight years, the members of the research team have gained experience in the statistical analysis of situations that fit this hierarchical scheme of samples of samples. In particular, we have worked on the analysis of functional data, where functions such as probability densities summarize the information in the lower-level samples; on the analysis of electoral results at the small-area level; and on the analysis of textual data. In these last two cases the higher-level data are discrete distributions, while the lower-level sampling units are, respectively, the individuals voting in each area and the words used in each chapter. The analysis of social network or web page graphs also fits within the framework of samples of samples.
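As an illustration of the two-level scheme, the following minimal sketch aggregates lower-level records into one discrete distribution per higher-level unit. It assumes the lower-level data arrive as (unit, category) pairs; the function name `area_distributions` and the ballot data are hypothetical toy values, not data from the project. The same code applies unchanged to textual data if the pairs are (chapter, word).

```python
from collections import Counter, defaultdict


def area_distributions(records):
    """Aggregate (unit, category) records into one relative-frequency
    distribution per higher-level unit (hypothetical helper)."""
    counts = defaultdict(Counter)
    for unit, category in records:
        counts[unit][category] += 1  # lower-level individual -> count in its unit
    distributions = {}
    for unit, counter in counts.items():
        total = sum(counter.values())
        # Each higher-level datum is a discrete distribution over categories.
        distributions[unit] = {cat: c / total for cat, c in counter.items()}
    return distributions


# Toy ballots (hypothetical): each pair is one lower-level voter in an area.
ballots = [
    ("A", "party1"), ("A", "party2"), ("A", "party1"), ("A", "party1"),
    ("B", "party2"), ("B", "party2"), ("B", "party3"), ("B", "party1"),
]
dists = area_distributions(ballots)
print(dists["A"])  # {'party1': 0.75, 'party2': 0.25}
```

The resulting dictionary is a higher-level sample with one observation per area, each of which is itself a distribution.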
The statistical objects that summarize the lower-level samples, and that form the higher-level sample, can be very complex and difficult to handle. In particular, it will not always be possible to define a scalar product on them, although in most cases it should be possible to define a distance measure. We also find that datasets of very different origins, with completely different lower-level individuals, give rise to higher-level samples of the same kind.
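To show that a distance can be computed where no scalar product is assumed, here is a minimal sketch using the total variation distance between discrete distributions, one possible choice among many (it is an L1-type metric, not one induced by an inner product). The distributions are assumed to be dictionaries mapping categories to probabilities; the helper names and the three toy areas are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions given as
    dicts; categories missing from one distribution count as probability 0."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def distance_matrix(units, dist=total_variation):
    """Pairwise distances between all higher-level units (hypothetical helper)."""
    names = sorted(units)
    return names, [[dist(units[a], units[b]) for b in names] for a in names]


# Toy higher-level sample (hypothetical): vote shares in three areas.
units = {
    "area_A": {"party1": 0.75, "party2": 0.25},
    "area_B": {"party1": 0.25, "party2": 0.5, "party3": 0.25},
    "area_C": {"party1": 0.5, "party2": 0.5},
}
names, D = distance_matrix(units)
for name, row in zip(names, D):
    print(name, [round(x, 3) for x in row])
```

A distance matrix like `D` is all that distance-based methods (for example classical multidimensional scaling for dimensionality reduction) require, regardless of the origin of the lower-level individuals.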
We consider the analysis of samples of samples a good starting point for framing the analysis of Big Data. Treating Big Data often requires aggregating individual data, either because the database is not stored on a single server or because its size (say N) makes it advisable to partition it into K subsamples of smaller size n, with N = K*n. Analyzing each of these K subsamples separately and condensing the information it contains into a hyper-datum turns a sample of N individuals into a sample of K higher-level statistical units.
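The partition-and-summarize step can be sketched as follows, under stated assumptions: the N observations are split into K consecutive blocks of size n, and each block is summarized by a relative-frequency histogram on fixed bin edges. The histogram is only one hypothetical choice of hyper-datum (a density estimate would give a functional datum instead), and the Gaussian toy data and helper names are illustrative. In a distributed setting each block would be summarized where it is stored; here everything runs sequentially.

```python
import bisect
import random


def partition(data, k):
    """Split data of length N = k * n into k consecutive subsamples of size n."""
    if len(data) % k != 0:
        raise ValueError("N must be a multiple of K")
    n = len(data) // k
    return [data[i * n:(i + 1) * n] for i in range(k)]


def histogram(sample, edges):
    """Relative frequencies over bins [edges[j], edges[j+1]); values outside
    the range are assigned to the nearest end bin so frequencies sum to 1."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in sample:
        j = bisect.bisect_right(edges, x) - 1
        counts[min(max(j, 0), n_bins - 1)] += 1
    return [c / len(sample) for c in counts]


def hyper_data(data, k, edges):
    """Turn N individual values into K higher-level units (one histogram each)."""
    return [histogram(sub, edges) for sub in partition(data, k)]


random.seed(0)
N, K = 10_000, 20
data = [random.gauss(0.0, 1.0) for _ in range(N)]  # toy individual-level data
edges = [-3.0, -1.0, 0.0, 1.0, 3.0]
units = hyper_data(data, K, edges)
print(len(units), len(units[0]))  # 20 4
```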
Our goal is to build on our work on samples of samples and move toward the statistical analysis of Big Data. To make this transition smooth, we plan to widen the range of applications on which we test our proposals (adding data on income distribution and medium-sized graphs), to work with new databases (demographic data, income microdata, species counts in ecology and graph repositories), and to tackle problems with a more ambitious structure than those dealt with so far (functional data that depend on two arguments, the joint observation of r discrete distributions in each higher-level sampling unit, and temporal and spatial dependence among the higher-level data).
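For graph data, one way a graph can become a higher-level datum is through its node degree distribution, a discrete distribution that can then be compared with any distance between discrete distributions. The minimal sketch below assumes an undirected simple graph given as an edge list, with an optional node list so that isolated nodes are counted; the function name and the star-graph example are hypothetical.

```python
from collections import Counter


def degree_distribution(edges, nodes=()):
    """Node degree distribution of an undirected simple graph: maps each
    degree d to the fraction of nodes with degree d (hypothetical helper)."""
    degree = Counter()
    for node in nodes:
        degree[node] += 0  # register isolated nodes with degree 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    freq = Counter(degree.values())
    return {d: c / n_nodes for d, c in sorted(freq.items())}


# Toy graph (hypothetical): star with centre 0 and leaves 1-3, plus isolated node 4.
dd = degree_distribution([(0, 1), (0, 2), (0, 3)], nodes=range(5))
print(dd)  # {0: 0.2, 1: 0.6, 3: 0.2}
```

A collection of graphs, for instance taken from a graph repository, then becomes a higher-level sample of degree distributions.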
State Administration (Adm. Estat)
Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016
Call year
Funding program
Programa Estatal de I+D+i Orientada a los Retos de la Sociedad
Funding call
Retos de Investigación: Proyectos de I+D+i
Grant institution
Gobierno de España. Ministerio de Economía y Competitividad (MINECO)


Scientific and technological production
