Automatic normalization of short texts by combining statistical and rule-based techniques

Authors
Ruiz, M.; Banchs, R.
Activity type
Journal article
Journal
Language resources and evaluation
Publication date
2013-03-01
Volume
47
First page
179
Last page
193
193
DOI
https://doi.org/10.1007/s10579-012-9187-y
Repository
http://hdl.handle.net/2117/102182
URL
http://link.springer.com/article/10.1007%2Fs10579-012-9187-y#page-1
Abstract
Short texts are typically composed of a small number of words, most of which are abbreviations, typos and other kinds of noise. This makes the noise-to-signal ratio relatively high for this specific category of text. A high proportion of noise in the data is undesirable for analysis procedures as well as machine learning applications. Text normalization techniques are used to reduce the noise and improve the quality of text for processing and analysis purposes. In this work, we propose a combinati...
Citation
Ruiz, M., Banchs, R. Automatic normalization of short texts by combining statistical and rule-based techniques. "Language resources and evaluation", 1 March 2013, vol. 47, p. 179-193.
Keywords
Automatic extraction of rules, Normalization chats, Normalization of short texts, Perplexity, Statistical machine translation
Research groups
IDEAI-UPC Intelligent Data Science and Artificial Intelligence
TALP - Centre de Tecnologies i Aplicacions del Llenguatge i la Parla
VEU - Grup de Tractament de la Parla