Vocalizations, and less often gestures, have been the object of linguistic research for decades. However, the development of a general theory of communication with human language as a particular case requires a clear understanding of the organization of communication through other means. Infochemicals are chemical compounds that carry information and are employed by small organisms that cannot emit acoustic signals of an optimal frequency to achieve successful communication. Here, we investigate the distribution of infochemicals across species when they are ranked by their degree, i.e. the number of species with which they are associated (because those species produce them or are sensitive to them). We evaluate the quality of the fit of different functions to the dependency between degree and rank by means of a penalty for the number of parameters of the function. Surprisingly, a double Zipf (a Zipf distribution with two regimes, each with a different exponent) is the model yielding the best fit, even though it is the function with the largest number of parameters. This suggests that the worldwide repertoire of infochemicals contains a core which is shared by many species and is reminiscent of the core vocabularies found for human language in dictionaries or large corpora.
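The comparison of a single Zipf against a double Zipf under a parameter penalty can be sketched as follows. This is only an illustration under stated assumptions: the abstract names neither the penalty nor the fitting procedure, so the use of AIC, of least squares on log-log axes, and of the function names `fit_line`, `aic` and `compare_zipf_models` are all choices made here, and the degree sequence is synthetic.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss

def aic(rss, n, k):
    """AIC of a least-squares fit with k parameters (assumed penalty)."""
    # The floor avoids log(0) when noiseless synthetic data fit exactly.
    return n * math.log(max(rss, 1e-12) / n) + 2 * k

def compare_zipf_models(degrees):
    """degrees: positive values sorted in decreasing order (rank 1 first)."""
    xs = [math.log(r) for r in range(1, len(degrees) + 1)]
    ys = [math.log(d) for d in degrees]
    n = len(xs)
    _, _, rss_single = fit_line(xs, ys)
    # Double Zipf: try every breakpoint that leaves >= 3 points per regime.
    best = None
    for cut in range(3, n - 2):
        _, b1, r1 = fit_line(xs[:cut], ys[:cut])
        _, b2, r2 = fit_line(xs[cut:], ys[cut:])
        if best is None or r1 + r2 < best[0]:
            best = (r1 + r2, cut, b1, b2)
    rss_double, cut, b1, b2 = best
    return {
        "single": aic(rss_single, n, 2),  # intercept + slope
        "double": aic(rss_double, n, 5),  # 2 intercepts + 2 slopes + breakpoint
        "breakpoint_rank": cut,
        "exponents": (-b1, -b2),
    }

# Synthetic two-regime data: exponent 0.5 up to rank 10, exponent 2 afterwards.
degrees = [100 * r ** -0.5 if r <= 10 else 100 * 10 ** -0.5 * (r / 10) ** -2
           for r in range(1, 41)]
result = compare_zipf_models(degrees)
print(result)
```

A lower AIC means a better trade-off between fit and number of parameters, so the double Zipf is preferred only when its gain in fit outweighs the three extra parameters. Least squares on log-transformed data is a crude estimator for rank distributions; it is used here only to keep the sketch short.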
The statistical analysis of the heterogeneity of the style of a text often leads to the analysis of contingency tables of ordered rows. When multiple authorship is suspected, one can explore that heterogeneity through either a change-point analysis of these rows, consistent with sudden changes of author, or a cluster analysis of them, consistent with authors contributing exchangeably, without taking order into consideration. Here an analysis is proposed that strikes a compromise between change-point and cluster analysis by incorporating the fact that parts close together are more likely to belong to the same author than parts far apart. The approach is illustrated by revisiting the authorship attribution of Tirant lo Blanc.
Semple, S.; Hsu, M. J.; Agoramoorthy, G.; Ferrer-i-Cancho, R. Journal of Quantitative Linguistics Vol. 20, no. 3, p. 209-217 DOI: 10.1080/09296174.2013.799917 Publication date: 2013-07-04 Journal article
Words follow the law of brevity, i.e. more frequent words tend to be shorter. From a statistical point of view, this qualitative definition of the law states that word length and word frequency are negatively correlated. Here the recent finding of patterning consistent with the law of brevity in Formosan macaque vocal communication (Semple, Hsu, & Agoramoorthy, 2010) is revisited. It is shown that the negative correlation between mean duration and frequency of use in the vocalizations of Formosan macaques is not an artefact of the use of a mean duration for each call type instead of the customary ‘word’ length of studies of the law in human language. The key point demonstrated is that the total duration of calls of a particular type increases with the number of calls of that type. The finding of the law of brevity in the vocalizations of these macaques therefore defies a trivial explanation.
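The two relationships discussed in this abstract (frequency versus mean duration, and number of calls versus total duration) can be checked with a rank correlation. The sketch below is an assumption-laden illustration: the abstract does not say which correlation statistic was used, so Spearman's coefficient is a choice made here, the helper names are hypothetical, and the five call types are invented numbers, not the macaque data.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Invented data: number of calls and total duration (seconds) per call type.
counts = [120, 60, 30, 10, 5]
total_durations = [12.0, 12.0, 9.0, 5.0, 4.0]
mean_durations = [t / c for t, c in zip(total_durations, counts)]

print(spearman(counts, mean_durations))   # law of brevity: expected negative
print(spearman(counts, total_durations))  # key point: expected positive
```

Because the mean duration is the total duration divided by the number of calls, the second correlation is the one that addresses the artefact concern: it asks whether total duration grows with the number of calls.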
Baixeries, J.; Hernandez Fernandez, A.; Forns, N.; Ferrer-i-Cancho, R. Journal of Quantitative Linguistics Vol. 20, no. 2, p. 94-104 DOI: 10.1080/09296174.2013.773141 Publication date: 2013 Journal article
The relationship between the size of the whole and the size of the parts in language and music is known to follow the Menzerath-Altmann law at many levels of description (morphemes, words, sentences, …). Qualitatively, the law states that the larger the whole, the smaller its parts, e.g. the longer a word (in syllables), the shorter its syllables (in letters or phonemes). This patterning has also been found in genomes: the longer a genome (in chromosomes), the shorter its chromosomes (in base pairs). However, it has been argued recently that mean chromosome length is trivially a pure power function of chromosome number with an exponent of -1. The functional dependency between mean chromosome size and chromosome number in groups of organisms from three different kingdoms is studied. The fit of a pure power function yields exponents between -1.6 and 0.1. It is shown that an exponent of -1 is unlikely for fungi, gymnosperm plants, insects, reptiles, ray-finned fishes and amphibians. Even when the exponent is very close to -1, adding an exponential component yields a better fit than a pure power law in plants, mammals, ray-finned fishes and amphibians. The parameters of the Menzerath-Altmann law in genomes deviate significantly from a power law with a -1 exponent, with the exception of birds and cartilaginous fishes.
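The contrast between a pure power function and a power function with an exponential component can be sketched as below. If total genome size G did not vary with chromosome number n, the mean chromosome length G/n would be a power function of n with exponent exactly -1; the fits test how far data depart from that. The fitting method is an assumption made here (least squares on log-transformed values, via normal equations), as are the function names and the synthetic data; the abstract does not state how the original fits were obtained.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def least_squares(rows, ys):
    """Linear least squares via normal equations; returns (coefficients, rss)."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(ata, aty)
    rss = sum((y - sum(c * f for c, f in zip(coef, r))) ** 2
              for r, y in zip(rows, ys))
    return coef, rss

def fit_menzerath(ns, mean_lengths, exponential=False):
    """Fit log y = log a + b log n (+ c n); returns ([log a, b(, c)], rss)."""
    ys = [math.log(y) for y in mean_lengths]
    rows = [[1.0, math.log(n)] + ([float(n)] if exponential else [])
            for n in ns]
    return least_squares(rows, ys)

# Synthetic example: constant genome size gives an exponent of -1.
ns = list(range(2, 30))
coef, rss = fit_menzerath(ns, [1000.0 / n for n in ns])
print(coef[1])
```

Calling `fit_menzerath(ns, ys, exponential=True)` fits y = a n^b e^(cn); comparing its residual sum of squares with that of the pure power fit shows whether the extra parameter c improves the fit, although a fair comparison would also penalize the additional parameter.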