Word-sense disambiguated multilingual Wikipedia corpus

Author
Reese, S.; Boleda, G.; Cuadros, M.; Padro, L.; Rigau, G.
Type of activity
Presentation of work at congresses
Name of edition
7th International Conference on Language Resources and Evaluation (LREC 2010)
Date of publication
2010
Presentation's date
2010-05
Book of congress proceedings
Proceedings of the 7th Language Resources and Evaluation Conference
First page
1418
Last page
1421
Project funding
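KNOW. Desarrollo de tecnologías multilíngües a gran escala para la comprensión del lenguaje. Análisis semántico (Development of large-scale multilingual technologies for language understanding. Semantic analysis)
TECNOLOGIA ROBUSTA PARA LA MINERIA DE TEXTO ADAPTATIVA (KNOW 2) (Robust technology for adaptive text mining)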
Repository
http://hdl.handle.net/2117/7551
URL
http://www.lrec-conf.org/proceedings/lrec2010/pdf/222_Paper.pdf
Abstract
This article presents a new, freely available trilingual corpus (Catalan, Spanish, English) that contains large portions of Wikipedia and has been automatically enriched with linguistic information. To our knowledge, this is the largest such corpus freely available to the community: in its present version, it contains over 750 million words. The corpora have been annotated with lemma and part-of-speech information using the open-source library FreeLing. Also, they have been sense anno...
Citation
Reese, S. [et al.]. Word-sense disambiguated multilingual Wikipedia corpus. In: International Conference on Language Resources and Evaluation. "7th International Conference on Language Resources and Evaluation". Valletta: 2010.
Group of research
GPLN - Natural Language Processing Group
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center
TALP - Centre for Language and Speech Technologies and Applications

Participants

  • Reese, Samuel (author and speaker)
  • Boleda Torrent, Gemma (author and speaker)
  • Cuadros Oller, Montserrat (author and speaker)
  • Padró Cirera, Lluís (author and speaker)
  • Rigau Claramunt, German (author and speaker)