This work proposes an original textual statistical method to uncover the relationships between opinions, expressed as free-text answers, and respondents’ characteristics. The method also identifies the specific links between each characteristic and certain words used in these answers. An application to real data, collected to find out what health means to non-experts (essential knowledge for effective public health interventions), gives promising results.
Kostov, Belchin Adriyanov; Becue Bertaut, Monica Maria; Husson, François Food quality and preference Vol. 32, num. Part A, p. 35-40 DOI: 10.1016/j.foodqual.2013.06.009 Date of publication: 2014-03 Journal article
Rhetorical strategy is relevant in the law domain, where language is a vital instrument. Textual statistics have much to offer for uncovering such a strategy. We propose a methodology that starts from a non-structured text; first, the breakpoints are automatically detected and lexically homogeneous parts are identified; then, the shape of the text is uncovered through the trajectory of these parts and their hierarchical structure; finally, the flow of the argument is tracked. Several methods are combined. Chronological clustering of multidimensional count series detects the breakpoints; the shape of the text is revealed by applying correspondence analysis to the parts×words table, while the progression of the argument is described by labelled time-constrained hierarchical clustering. This methodology is illustrated on a forensic rhetoric application, namely a closing speech delivered by a prosecutor at Barcelona Criminal Court. This approach could also be useful in politics, communication and professional writing.
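As a rough illustration of the correspondence analysis step mentioned in this abstract, the following Python sketch computes the standardized residuals of a small parts×words table and its total inertia, the quantity that correspondence analysis decomposes along its axes. The function name and the toy counts are hypothetical and are not taken from the paper; the sketch assumes every row and column has a non-zero total, and it stops before the singular value decomposition that a full analysis would apply to the residual matrix.

```python
def ca_standardized_residuals(table):
    """Return (residuals, total_inertia) for a contingency table.

    `table` is a list of rows of counts (parts in rows, words in columns).
    Assumes every row and every column has a non-zero total.
    """
    n = sum(sum(row) for row in table)
    n_cols = len(table[0])
    # Row and column masses: marginal proportions of the table.
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(n_cols)]
    # Standardized residuals: departure from independence, scaled by the masses.
    residuals = [
        [
            (table[i][j] / n - row_mass[i] * col_mass[j])
            / (row_mass[i] * col_mass[j]) ** 0.5
            for j in range(n_cols)
        ]
        for i in range(len(table))
    ]
    # Total inertia equals the chi-square statistic divided by n.
    total_inertia = sum(x * x for row in residuals for x in row)
    return residuals, total_inertia


# Hypothetical toy table: 3 parts of a text described by 4 words.
toy_table = [
    [8, 1, 0, 3],
    [2, 7, 1, 2],
    [0, 2, 9, 1],
]
residuals, inertia = ca_standardized_residuals(toy_table)
print(round(inertia, 4))
```

A total inertia close to zero would indicate that the parts share nearly the same vocabulary profile, in which case there is little structure for the factorial axes to display.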
We present a statistical methodology for tracking the discursive strategy in a persuasive speech, such as a forensic closing speech. First, homogeneous parts are automatically uncovered in this non-structured corpus by taking advantage of the distribution of the words; then, the temporal trajectory of these blocks and their hierarchical nesting are identified; finally, the flow of arguments is tracked through the flow of words. Different multidimensional methods are combined. The starting point is to divide the speech into arbitrary short sequences, count the occurrences of each word in each sequence and encode the speech into a sequences × words table. A specific algorithm, called chronological clustering, groups these sequences under the constraints of being both contiguous and lexically homogeneous; to ensure the latter, a test is performed to decide whether or not two nodes may be merged. As a result, the breakpoints are detected and the speech is segmented into homogeneous blocks that are long enough. The shape of the text is uncovered and visualized through correspondence analysis applied to the blocks × words table. A time-constrained clustering reveals the hierarchical structure of the arrangement of the blocks. The nodes are labeled with their lexical characteristics; the flow of arguments is traced along the labeled hierarchy. This methodology is illustrated by its application to a forensic closing speech delivered by a prosecutor at Barcelona Criminal Court. The first part of the speech is thus identified as well organized, while the second shows a more difficult progression of the argumentation. This corresponds to, first, an exposition of indisputable facts, followed by reasoning based on circumstantial evidence, which is difficult to carry out in this kind of speech, composed as it is delivered from only a prior outline and notes. This approach would also be useful as a tool to analyze any kind of written or oral persuasive text.
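The first two steps described in this abstract (encoding the speech as a sequences × words table, then grouping contiguous sequences) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the function names are hypothetical, the dissimilarity is a squared chi-square distance between count profiles, and the statistical test that authorizes each fusion is replaced by a fixed target number of blocks (`n_blocks`, assumed to be at least 1).

```python
from collections import Counter


def sequences_by_words(tokens, seq_len):
    """Cut the token stream into consecutive sequences and count words in each."""
    vocab = sorted(set(tokens))
    table = []
    for start in range(0, len(tokens), seq_len):
        counts = Counter(tokens[start:start + seq_len])
        table.append([counts[w] for w in vocab])
    return vocab, table


def chi2_distance_sq(a, b, col_totals):
    """Squared chi-square distance between the profiles of two count vectors."""
    total = sum(col_totals)
    sum_a, sum_b = sum(a), sum(b)
    return sum(
        (x / sum_a - y / sum_b) ** 2 / (c / total)
        for x, y, c in zip(a, b, col_totals)
    )


def chronological_blocks(table, n_blocks):
    """Merge adjacent sequences until n_blocks remain.

    Simplification: the source uses a test to allow or forbid each fusion;
    here the merging simply stops at a fixed number of blocks.
    """
    col_totals = [sum(col) for col in zip(*table)]
    # Each block is (first sequence index, last sequence index, word counts).
    blocks = [(i, i, row[:]) for i, row in enumerate(table)]
    while len(blocks) > n_blocks:
        # Contiguity constraint: only neighbouring blocks may be merged.
        k = min(
            range(len(blocks) - 1),
            key=lambda i: chi2_distance_sq(blocks[i][2], blocks[i + 1][2], col_totals),
        )
        first, _, left = blocks[k]
        _, last, right = blocks[k + 1]
        blocks[k:k + 2] = [(first, last, [x + y for x, y in zip(left, right)])]
    return [(first, last) for first, last, _ in blocks]


# Hypothetical toy speech: the vocabulary changes halfway through.
tokens = ["tax", "law"] * 4 + ["knife", "night"] * 4
vocab, table = sequences_by_words(tokens, seq_len=4)
print(chronological_blocks(table, n_blocks=2))
```

On this toy input the only breakpoint falls where the vocabulary changes. With real speech, the sequence length and the stopping rule both influence which breakpoints are found, which is why the source relies on a test rather than a preset number of blocks.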
We present Intra-Table Correspondence Analysis using two approaches: Correspondence Analysis with respect to a model and Weighted Principal Component Analysis. In addition, we use the relationship between Correspondence Analysis and log-linear models to provide deeper insight into the interactions that each Correspondence Analysis describes. We develop in detail Internal Correspondence Analysis as an Intra-Table Correspondence Analysis in two dimensions and introduce Intra-blocks Correspondence Analysis. Moreover, we summarize the superimposed representations and give some aids for interpreting the graphics associated with the subpartition structures of the table. Finally, the methods presented in this work are illustrated by their application to standardized public test data collected from Colombian secondary education students in 2008.
Kostov, Belchin Adriyanov; Becue Bertaut, Monica Maria; Husson, François International Conference on Applied Stochastic Models and Data Analysis p. 121-122 Presentation's date: 2013-06 Presentation of work at congresses
We present multiple factor analysis for contingency tables (MFACT) and its implementation in the FactoMineR package. This method, through an option of the MFA function, allows us to deal with multiple contingency or frequency tables, in addition to the categorical and quantitative multiple tables already considered in previous versions of the package. Thanks to this revised function, either a multiple contingency table or a mixed multiple table integrating quantitative, categorical and frequency data can be tackled.
The FactoMineR package (Lê et al., 2008; Husson et al., 2011) offers the most commonly used principal component methods: principal component analysis (PCA), correspondence analysis (CA; Benzécri, 1973), multiple correspondence analysis (MCA; Lebart et al., 2006) and multiple factor analysis (MFA; Escofier and Pagès, 2008). Detailed presentations of these methods enriched by numerous examples can be consulted at the website http://factominer.free.fr/.
An extension of the MFA function that considers contingency or frequency tables as proposed by Bécue-Bertaut and Pagès (2004, 2008) is detailed in this article.
First, an example is presented in order to motivate the approach. Next, the mortality data used to illustrate the method are introduced. Then we briefly describe multiple factor analysis (MFA) and present the principles of its extension to contingency tables. A real example on mortality data illustrates the handling of the MFA function to analyse these multiple tables and, finally, conclusions
Kostov, Belchin Adriyanov; Becue Bertaut, Monica Maria; Husson, François Congreso Nacional de Estadística e Investigación Operativa. Jornadas de Estadística Pública p. 38 Presentation's date: 2013 Presentation of work at congresses
Becue Bertaut, Monica Maria; Cadoret, Marine; Kostov, Belchin Adriyanov; Torrens, Jordi; Urpí, Pilar; Pagès, Jérôme New Techniques and Technologies for Statistics p. 1-21 DOI: 10.2901/Eurostat.C2011.001 Presentation's date: 2011-02-22 Presentation of work at congresses
In questionnaire surveys, it is common to include open-ended questions. In consumer studies, experts and/or consumers evaluate products by means of a score together with a free comment. The statistical analysis of this type of data, called mixed data, requires specific methods. We propose here a methodology that combines classical multiple factor analysis and multiple factor analysis for contingency tables. This extension of MFA makes it possible to deal globally with lexical tables, built from the textual answers, together with quantitative and categorical tables.
We present the methodology using data collected in the evaluation of two sets of wines. Sensory analysis is indeed a privileged area of application, since experts and consumers often wish to complement the classical sensory profiles with free descriptions that convey their perceptions more faithfully.
The methodology can also be used to process survey data, when an issue is addressed through a battery of closed and open-ended questions.
The characterization of wines by experts requires hall test procedures that take little time. Thus, holistic methods have been developed, such as napping, which collects from each expert a two-dimensional configuration of the products in a single session. Ultra-flash profiling, a spontaneous descriptive technique, offers a complement that helps to interpret the results.
Hierarchical multiple factor analysis is a suitable tool for accounting for the evaluation of the same set of items by hierarchically structured sets of individuals. This method is applied to compare trained and non-trained panels in wine hall tests. Every panellist has to group the wines into clusters, describe them with free descriptive words and also give a hedonic score. Data coding leads to a wine × individual evaluation table in which the columns present a hierarchical structure. Hierarchical multiple factor analysis allows for exploring and visualizing the observed variability among both wines and panellists. Visualization tools are also offered to evaluate the similarity between panels and sets of panels.
Becue Bertaut, Monica Maria; Fernández-Aguirre, Karmele; Modroño-Herrán, Juan I; Pagès, Jérôme 7th International Conference on Social Science Methodology. RC33 - Logic and Methodology in Sociology p. 34-35 Presentation of work at congresses