TweetNorm_es: an annotated corpus for Spanish microtext normalization

Author
Alegria, I.; Aranberri, N.; Comas, P.R.; Fresno, V.; Gamallo, P.; Padro, L.; San Vicente, I.; Turmo, J.; Zubiaga, A.
Type of activity
Presentation of work at congresses
Name of edition
LREC 2014 - 9th International Conference on Language Resources and Evaluation
Date of publication
2014
Presentation's date
2014-05-29
Book of congress proceedings
LREC 2014: Ninth International Conference on Language Resources and Evaluation: Reykjavik, Iceland: May 26-31, 2014: proceedings
First page
2274
Last page
2278
Publisher
European Language Resources Association (ELRA)
Project funding
Adquisición de escenarios de conocimiento a través de la lectura de textos: inferencia de relaciones entre eventos (SKATeR)
Cross-lingual Knowledge Extraction
Repository
http://hdl.handle.net/2117/23411
URL
http://www.lrec-conf.org/proceedings/lrec2014/pdf/442_Paper.pdf
Abstract
In this paper we introduce TweetNorm_es, an annotated corpus of tweets in Spanish, which we make publicly available under the terms of the CC-BY license. This corpus is intended for the development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort from different research groups. In this paper we describe the methodology defined to build the corpus as well as the guidel...
Citation
Alegria, I. [et al.]. TweetNorm_es: an annotated corpus for Spanish microtext normalization. In: International Conference on Language Resources and Evaluation. "Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)". Reykjavik: European Language Resources Association (ELRA), 2014, p. 2274-2278.
Keywords
Microtext normalization, Twitter, phonology
Group of research
GPLN - Natural Language Processing Group
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center
TALP - Centre for Language and Speech Technologies and Applications

Participants

  • Alegria, Iñaki (author and speaker)
  • Aranberri, Nora (author and speaker)
  • Comas Umbert, Pere Ramon (author and speaker)
  • Fresno, Víctor (author and speaker)
  • Gamallo Otero, Pablo (author and speaker)
  • Padró Cirera, Lluís (author and speaker)
  • San Vicente Roncal, Iñaki (author and speaker)
  • Turmo Borras, Jorge (author and speaker)
  • Zubiaga, Arkaitz (author and speaker)
