Ortells, R.; Egozcue, J. J.; Ortego, M. I.; Garola, A.
International Workshop on Compositional Data Analysis
p. 215-228
Presentation date: 2015-06-04
Presentation of work at congresses
The purpose of this contribution is to evaluate whether there is sufficient statistical evidence to establish a relationship between the popularity of certain terms on the Google, Inc. search engine and the evolution of several worldwide economic indexes in the following week.
We propose a linear model that predicts the evolution of 19 financial indexes from around the world using the number of times each of a selected group of 200 keywords was searched online during the previous week.
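The abstract states only that searches from one week are used to predict the indexes of the following week. A minimal sketch of that one-week alignment, assuming both data sets are stored as week-by-variable arrays whose rows refer to the same weeks in the same order, could look as follows. The function name `lagged_pairs`, the array layout and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def lagged_pairs(searches, indexes, lag=1):
    """Pair the search counts of week t with the index values of week t + lag.

    searches: (T, n_words) array, one row per week (hypothetical layout).
    indexes:  (T, n_indexes) array covering the same T weeks, in the same order.
    Returns X with rows for weeks 0..T-lag-1 and Y with rows for weeks lag..T-1.
    """
    searches = np.asarray(searches, dtype=float)
    indexes = np.asarray(indexes, dtype=float)
    if searches.shape[0] != indexes.shape[0]:
        raise ValueError("searches and indexes must cover the same weeks")
    T = searches.shape[0]
    if not 1 <= lag < T:
        raise ValueError("lag must satisfy 1 <= lag < number of weeks")
    # Drop the last `lag` weeks of predictors and the first `lag` weeks of responses.
    return searches[:-lag], indexes[lag:]


if __name__ == "__main__":
    T = 52  # one hypothetical year of weekly observations (synthetic data)
    rng = np.random.default_rng(0)
    searches = rng.integers(1, 1000, size=(T, 200))  # 200 keywords
    indexes = rng.uniform(1e3, 2e4, size=(T, 19))    # 19 financial indexes
    X, Y = lagged_pairs(searches, indexes)
    print(X.shape, Y.shape)  # (51, 200) (51, 19)
```

Each row of `X` is then the explanatory vector for the corresponding row of `Y`, so one week of data is lost at each end of the series.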
The proposed linear model takes a compositional approach for two reasons. First, the information contained in the values of the financial indexes is compositional in nature. The strongest argument for this is that if all index values for a given week were multiplied by the same factor, the information would remain unchanged. In fact, the value of a single index is irrelevant by itself: it is its evolution relative to the other indexes that indicates whether it is performing well. This suggests that the values of the 19 indexes in a given week can be treated as a vector in the simplex and analyzed accordingly.
Second, the explanatory variable must also be treated as a vector in the simplex, for a similar reason. If the number of times each word was searched online in a given week were multiplied by a common factor, the information contained in that vector would be exactly the same. Likewise, the absolute number of searches is irrelevant by itself, since the interest lies in the relationships among the variables.
For these reasons, a compositional approach seems necessary to address the problem successfully, since both the explanatory and the response variables are compositional in nature. In other words, although their components do not add up to a constant, the components of both vectors carry information about parts of a whole, so tackling the problem from a compositional perspective seems appropriate. The analysis consists of an exploratory analysis of both the response (indexes) and the explanatory (searches) variables, followed by a compositional multiple linear regression between the two sets of variables.
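The scale-invariance argument and the compositional regression can be illustrated together. The abstract does not say which log-ratio coordinates or estimator the authors use, so the following is only a minimal sketch under stated assumptions: isometric log-ratio (ilr) coordinates built from a Helmert-type orthonormal basis, ordinary least squares, strictly positive parts (zero search counts would have to be replaced before taking logarithms), and synthetic data. All function names are hypothetical. Multiplying a week's vector by a constant leaves its centered log-ratio (clr) values unchanged, which is the compositional property used in the text, and predictions are mapped back to closed compositions.

```python
import numpy as np


def closure(x):
    """Rescale each row so its parts sum to 1 (a representative of the composition)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio transform; requires strictly positive parts."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)


def ilr_basis(D):
    """Orthonormal D x (D - 1) basis of the clr plane (Helmert-type contrasts)."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        # Contrast the first j parts (equal weights) against part j ...
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        # ... and scale the column to unit length.
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V


def ilr(x, V):
    """Isometric log-ratio coordinates: clr values projected onto the basis V."""
    return clr(x) @ V


def ilr_inv(z, V):
    """Map ilr coordinates back to a closed composition."""
    c = np.asarray(z, dtype=float) @ V.T
    # Subtracting the row maximum avoids overflow; closure removes the shift.
    c = c - c.max(axis=-1, keepdims=True)
    return closure(np.exp(c))


def fit_compositional_regression(X, Y):
    """Least squares of ilr(Y) on an intercept plus ilr(X).

    X: (n, Dx) explanatory compositions; Y: (n, Dy) response compositions.
    Returns a (Dx, Dy - 1) coefficient matrix (intercept row first) and both bases.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Vx, Vy = ilr_basis(X.shape[1]), ilr_basis(Y.shape[1])
    A = np.column_stack([np.ones(X.shape[0]), ilr(X, Vx)])
    B, *_ = np.linalg.lstsq(A, ilr(Y, Vy), rcond=None)
    return B, Vx, Vy


def predict(B, Vx, Vy, X_new):
    """Predicted response compositions (rows sum to 1) for new explanatory rows."""
    X_new = np.asarray(X_new, dtype=float)
    A = np.column_stack([np.ones(X_new.shape[0]), ilr(X_new, Vx)])
    return ilr_inv(A @ B, Vy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(1, 100, size=(30, 5))  # synthetic "search counts", 5 words
    Y = rng.uniform(1, 100, size=(30, 3))  # synthetic "index values", 3 indexes
    # Scale invariance: rescaling every row leaves the clr values unchanged.
    print(np.allclose(clr(X), clr(10 * X)))  # True
    B, Vx, Vy = fit_compositional_regression(X, Y)
    print(predict(B, Vx, Vy, X[:2]).sum(axis=1))  # each row sums to 1
```

Because the regression works on log-ratio coordinates, rescaling the search counts or the index values of any week does not change the fitted coefficients or the predicted compositions. With 200 keywords the design matrix would have 200 columns (intercept plus 199 coordinates), so plain least squares of this kind would need more than 200 weekly observations to be identifiable.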