Journal of Quantitative Linguistics

Vol. 25, no. 3, pp. 207-237

DOI: 10.1080/09296174.2017.1366095

Date of publication: 2018-07-03

Abstract:

A family of information theoretic models of communication was introduced more than a decade ago to explain the origins of Zipf’s law for word frequencies. The family is based on a combination of two information theoretic principles: maximization of mutual information between forms and meanings and minimization of form entropy. The family also sheds light on the origins of three other patterns: the principle of contrast, a related vocabulary learning bias, and the meaning-frequency law. Here two important components of the family, namely the information theoretic principles and the energy function that combines them linearly, are reviewed from the perspective of psycholinguistics, language learning, information theory and synergetic linguistics. The minimization of this linear function is linked to the problem of compression in standard information theory and might be tuned by self-organization.
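The linear combination of the two principles can be sketched numerically. The following is a minimal illustration, not the paper's code; it assumes the energy function takes the commonly cited form Ω(λ) = −λ·I(S, R) + (1 − λ)·H(S), with S the forms and R the meanings, and the toy form-meaning matrix is invented:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector, with 0 log 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def energy(joint, lam):
    """Assumed form Omega(lam) = -lam * I(S, R) + (1 - lam) * H(S),
    where forms S index the rows and meanings R the columns of `joint`."""
    p_s = joint.sum(axis=1)  # marginal distribution over forms
    p_r = joint.sum(axis=0)  # marginal distribution over meanings
    mi = entropy(p_s) + entropy(p_r) - entropy(joint.ravel())
    return -lam * mi + (1 - lam) * entropy(p_s)

# Toy joint matrix: two forms, two meanings, a perfect one-to-one mapping
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(energy(joint, 0.5))  # I = 1 bit, H(S) = 1 bit -> -0.5*1 + 0.5*1 = 0.0
```

At λ = 1 only mutual information matters; lowering λ weights the minimization of form entropy more heavily.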

Journal of Quantitative Linguistics

Vol. 23, no. 2, pp. 133-153

DOI: 10.1080/09296174.2016.1142323

Date of publication: 2016-06-07

Abstract:

Vocalizations, and less often gestures, have been the object of linguistic research for decades. However, the development of a general theory of communication with human language as a particular case requires a clear understanding of the organization of communication through other means. Infochemicals are chemical compounds that carry information and are employed by small organisms that cannot emit acoustic signals of an optimal frequency to achieve successful communication. Here, we investigate the distribution of infochemicals across species when they are ranked by their degree, i.e. the number of species with which they are associated (because those species produce them or are sensitive to them). We evaluate the quality of the fit of different functions to the dependency between degree and rank by means of a penalty for the number of parameters of the function. Surprisingly, a double Zipf (a Zipf distribution with two regimes, each with a different exponent) is the model yielding the best fit, even though it is the function with the largest number of parameters. This suggests that the worldwide repertoire of infochemicals contains a core which is shared by many species and is reminiscent of the core vocabularies found for human language in dictionaries or large corpora.

Journal of Quantitative Linguistics

Vol. 22, no. 3, pp. 177-201

DOI: 10.1080/09296174.2015.1037159

Date of publication: 2015-07-09

Abstract:

The statistical analysis of the heterogeneity of the style of a text often leads to the analysis of contingency tables of ordered rows. When multiple authorship is suspected, one can explore that heterogeneity through either a change-point analysis of these rows, consistent with sudden changes of author, or a cluster analysis of them, consistent with authors contributing exchangeably, without taking order into consideration. Here an analysis is proposed that strikes a compromise between change-point and cluster analysis by incorporating the fact that parts close together are more likely to belong to the same author than parts far apart. The approach is illustrated by revisiting the authorship attribution of Tirant lo Blanc.

Journal of Quantitative Linguistics

Vol. 20, no. 3, pp. 209-217

DOI: 10.1080/09296174.2013.799917

Date of publication: 2013-07-04

Abstract:

Words follow the law of brevity, i.e. more frequent words tend to be shorter. From a statistical point of view, this qualitative definition of the law states that word length and word frequency are negatively correlated. Here the recent finding of patterning consistent with the law of brevity in Formosan macaque vocal communication (Semple, Hsu, & Agoramoorthy, 2010) is revisited. It is shown that the negative correlation between mean duration and frequency of use in the vocalizations of Formosan macaques is not an artefact of the use of a mean duration for each call type instead of the customary ‘word’ length of studies of the law in human language. The key point demonstrated is that the total duration of calls of a particular type increases with the number of calls of that type. The finding of the law of brevity in the vocalizations of these macaques therefore defies a trivial explanation.
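The two statistical checks described above can be sketched on invented numbers (this toy repertoire is not the macaque data of Semple et al., 2010, and the paper may use a rank correlation rather than the Pearson correlation shown here):

```python
import numpy as np

# Toy call repertoire: frequency of use and mean duration (seconds) per call type
freq = np.array([120, 80, 45, 20, 9, 4])
mean_dur = np.array([0.3, 0.4, 0.6, 0.7, 1.1, 1.4])

# Law of brevity: frequency of use and mean duration are negatively correlated
r = np.corrcoef(freq, mean_dur)[0, 1]
print(r < 0)  # True

# The non-trivial check: the TOTAL duration of calls of a type still grows with
# the number of calls of that type, so the negative correlation above is not an
# artefact of dividing a total by the frequency
total_dur = freq * mean_dur
rho = np.corrcoef(freq, total_dur)[0, 1]
print(rho > 0)  # True
```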

Journal of Quantitative Linguistics

Vol. 20, no. 2, pp. 94-104

DOI: 10.1080/09296174.2013.773141

Date of publication: 2013

Abstract:

The relationship between the size of the whole and the size of the parts in language and music is known to follow the Menzerath-Altmann law at many levels of description (morphemes, words, sentences, …). Qualitatively, the law states that the larger the whole, the smaller its parts, e.g. the longer a word (in syllables) the shorter its syllables (in letters or phonemes). This patterning has also been found in genomes: the longer a genome (in chromosomes), the shorter its chromosomes (in base pairs). However, it has been argued recently that mean chromosome length is trivially a pure power function of chromosome number with an exponent of -1. Here the functional dependency between mean chromosome size and chromosome number is studied in groups of organisms from three different kingdoms. The fit of a pure power function yields exponents between -1.6 and 0.1. It is shown that an exponent of -1 is unlikely for fungi, gymnosperm plants, insects, reptiles, ray-finned fishes and amphibians. Even when the exponent is very close to -1, adding an exponential component yields a better fit than a pure power law in plants, mammals, ray-finned fishes and amphibians. The parameters of the Menzerath-Altmann law in genomes thus deviate significantly from a power law with a -1 exponent, except in birds and cartilaginous fishes.
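The comparison between a pure power law and a power law with an exponential component can be sketched as two nested linear regressions in log space. The functional form L = a·n^b·e^(c·n), the parameter values and the noisy toy data below are assumptions for illustration, not the genome data of the paper:

```python
import numpy as np

# Toy data: mean chromosome length L versus chromosome number n, generated
# from L = a * n^b * exp(c*n) with b != -1 (illustrative values only)
n = np.arange(2, 31)
a, b, c = 1000.0, -0.7, -0.02
rng = np.random.default_rng(1)
L = a * n ** b * np.exp(c * n) * np.exp(rng.normal(0, 0.03, n.size))

y = np.log(L)

# Pure power law: log L = log a + b * log n  (2 parameters)
X1 = np.column_stack([np.ones_like(n, dtype=float), np.log(n)])
coef1, rss1 = np.linalg.lstsq(X1, y, rcond=None)[:2]

# Power law with an exponential component: log L = log a + b * log n + c * n
X2 = np.column_stack([X1, n.astype(float)])
coef2, rss2 = np.linalg.lstsq(X2, y, rcond=None)[:2]

print(coef1[1])           # exponent recovered by the pure power-law fit
print(rss2[0] < rss1[0])  # True: the extended model fits better
```

Because the pure power law is nested in the extended model, the extended fit can never be worse; the paper's point is that the improvement is large even when the fitted exponent sits close to -1.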

Journal of Quantitative Linguistics

Vol. 9, no. 1, pp. 35-47

Date of publication: 2002-04

Journal of Quantitative Linguistics

Vol. 8, no. 3, pp. 165-173

Date of publication: 2001-12