Scientific and technological production

1 to 50 of 226 results
  • Towards automatic construction of domain ontologies: Application to ISA88  Open access

     Farreres De La Morena, Javier; Graells Sobre, Moises; Rodriguez Hontoria, Horacio; Espuña Camarasa, Antonio
    European Symposium on Computer Aided Process Engineering
    p. 871-876
    DOI: 10.1016/B978-0-444-63456-6.50146-0
    Presentation's date: 2014-06-17
    Presentation of work at congresses

    Process Systems Engineering has shown a growing interest in ontologies to develop knowledge models, organize information, and produce software accordingly. Although software tools supporting the structure of ontologies exist, developing a PSE ontology is a creative procedure to be performed by human experts from each specific domain. This work explores the opportunities for automatic construction of domain ontologies. Specialised documentation can be selected and automatically parsed; next, pattern recognition methods can be used to extract concepts and relations; finally, supervision is required to validate the automatic outcome, as well as to complete the task. The bulk of the development of an ontology is expected to result from the application of systematic procedures, so the development time will be significantly reduced. Automatic methods were prepared and applied to the development of an ontology for batch processing based on the ISA88 standard. The methods are described and commented on, and the results are discussed by comparison with a previous, manually developed ontology for the same domain.
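
    The pattern-recognition step can be pictured with a minimal, hypothetical sketch (Python; the pattern and the input sentence are illustrative, not taken from the paper): extract is-a candidates with a lexico-syntactic pattern and queue them for expert validation.

        # Hypothetical sketch of pattern-based concept/relation extraction;
        # NOT the authors' pipeline. One Hearst-style "such as" pattern only.
        import re

        HEARST = re.compile(
            r"(?P<hyper>\w+) such as (?P<hypos>\w[\w ,]*?(?: and \w[\w ]*)?)[.;]")

        def extract_isa(text):
            """Return (hyponym, hypernym) candidates for later human validation."""
            pairs = []
            for m in HEARST.finditer(text):
                hyper = m.group("hyper").lower()
                for hypo in re.split(r",| and ", m.group("hypos")):
                    if hypo.strip():
                        pairs.append((hypo.strip().lower(), hyper))
            return pairs

        print(extract_isa("A cell contains units such as reactors, mixers and tanks."))
        # -> [('reactors', 'units'), ('mixers', 'units'), ('tanks', 'units')]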

  • Relational paraphrase acquisition from Wikipedia: The WRPA method and corpus

     Vila Rigat, Marta; Rodriguez Hontoria, Horacio; Martí Antonin, Maria Antònia
    Natural language engineering (Print)
    p. 1-35
    DOI: 10.1017/S1351324913000235
    Date of publication: 2013-09
    Journal article

    Paraphrase corpora are an essential but scarce resource in Natural Language Processing. In this paper, we present the Wikipedia-based Relational Paraphrase Acquisition (WRPA) method, which extracts relational paraphrases from Wikipedia, and the derived WRPA paraphrase corpus. The WRPA corpus currently covers person-related and authorship relations in English and Spanish, respectively, suggesting that, given adequate Wikipedia coverage, our method is independent of the language and the relation addressed. WRPA extracts entity pairs from structured information in Wikipedia applying distant learning and, based on the distributional hypothesis, uses them as anchor points for candidate paraphrase extraction from the free text in the body of Wikipedia articles. Focussing on relational paraphrasing and taking advantage of Wikipedia-structured information allows for an automatic and consistent evaluation of the results. The WRPA corpus characteristics distinguish it from other types of corpora that rely on string similarity or transformation operations. WRPA relies on distributional similarity and is the result of the free use of language outside any reformulation framework. Validation results show a high precision for the corpus.
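
    The anchoring idea lends itself to a compact illustration (a hypothetical Python sketch, not the released WRPA code): entity pairs obtained from structured data are searched for in free text, and the wording connecting them is kept as a paraphrase candidate.

        # Toy version of anchor-based candidate extraction; all data is invented.
        import re

        def candidate_patterns(entity_pairs, sentences):
            """entity_pairs: (x, y) tuples from e.g. infoboxes (distant learning).
            Returns connecting patterns with the entities slot-normalised."""
            patterns = []
            for x, y in entity_pairs:
                for s in sentences:
                    if x in s and y in s:
                        slotted = s.replace(x, "<X>").replace(y, "<Y>")
                        m = re.search(r"<X>.*<Y>|<Y>.*<X>", slotted)
                        if m:
                            patterns.append(m.group(0))
            return patterns

        pairs = [("Cervantes", "Don Quixote")]
        sents = ["Cervantes wrote Don Quixote in two parts.",
                 "Don Quixote, the masterpiece by Cervantes, appeared in 1605."]
        print(candidate_patterns(pairs, sents))
        # -> ['<X> wrote <Y>', '<Y>, the masterpiece by <X>']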

  • TIN2012-38584-C06-01 - Adquisición de escenarios de conocimiento a través de la lectura de textos: inferencia de relaciones entre eventos (SKATeR) [Acquisition of knowledge scenarios through text reading: inference of relations between events]

     Rodriguez Hontoria, Horacio; Abad Soriano, Maria Teresa; Ageno Pulido, Alicia; Catala Roig, Neus; Comas Umbert, Pere Ramon; Farreres De La Morena, Javier; Fuentes Fort, Maria; Gatius Vila, Marta; Mehdizadeh Naderi, Ali; Padró Cirera, Lluís; Turmo Borras, Jorge
    Competitive project

  • LIARc: Labeling Implicit ARguments in Spanish deverbal nominalizations

     Peris, Aina; Taulé, Mariona; Rodriguez Hontoria, Horacio; Bertran Ibarz, Manuel
    International Conference on Intelligent Text Processing and Computational Linguistics
    p. 423-434
    DOI: 10.1007/978-3-642-37247-6_34
    Presentation of work at congresses

    This paper deals with the automatic identification and annotation of the implicit arguments of deverbal nominalizations in Spanish. We present the first version of the LIAR system, focusing on its classifier component. We built a feature-based supervised Machine Learning model that uses a subset of AnCora-Es as training corpus. We built four different models; the overall F-Measure is 89.9%, an increase of approximately 35 points over the baseline (55%). However, a detailed analysis of feature performance is still needed. Future work will focus on using LIAR to automatically annotate the implicit arguments in the whole AnCora-Es.

  • UPC-CORE: What can machine translation evaluation metrics and Wikipedia do for estimating semantic textual similarity?  Open access

     Barron Cedeño, Luis Alberto; Màrquez Villodre, Lluís; Fuentes Fort, Maria; Rodriguez Hontoria, Horacio; Turmo Borras, Jorge
    Joint Conference on Lexical and Computational Semantics
    p. 1-5
    Presentation's date: 2013-06-13
    Presentation of work at congresses

    In this paper we discuss our participation in the 2013 SemEval Semantic Textual Similarity task. Our core features include (i) a set of metrics borrowed from automatic machine translation, originally intended to evaluate automatic against reference translations, and (ii) an instance of explicit semantic analysis, built upon opening paragraphs of Wikipedia 2010 articles. Our similarity estimator relies on a support vector regressor with an RBF kernel. Our best approach required 13 machine translation metrics plus explicit semantic analysis and ranked 65th in the competition. Our post-competition analysis shows that the features have a good expression level, but overfitting and (mainly) normalization issues caused our correlation values to decrease.
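
    The estimator itself is small; a minimal sketch with scikit-learn (toy feature values standing in for the MT metrics and the ESA cosine) could look like:

        # Support vector regression with an RBF kernel, as named in the abstract.
        # Feature values below are invented; real ones come from MT-evaluation
        # tools and explicit semantic analysis over Wikipedia.
        import numpy as np
        from sklearn.svm import SVR

        X_train = np.array([[0.9, 0.8, 0.7],   # one row per sentence pair
                            [0.2, 0.1, 0.3],
                            [0.5, 0.6, 0.4]])
        y_train = np.array([4.8, 1.0, 2.9])    # gold similarity in [0, 5]

        model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_train, y_train)
        print(model.predict([[0.7, 0.7, 0.6]]))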

  • Paraphrase Scope and Typology. A Data-Driven Approach from Computational Linguistics.

     Vila Rigat, Marta
    Universitat de Barcelona (UB)
    Theses

    Paraphrasing is generally understood as approximate sameness of meaning between snippets of text with different wording. Paraphrases are omnipresent in natural languages, demonstrating all the aspects of their multifaceted nature. The pervasiveness of paraphrasing has made it a focus of several tasks in computational linguistics; its complexity has in turn resulted in paraphrasing remaining a still unresolved challenge. Two basic issues, directly linked to the complex nature of paraphrasing, make its computational treatment particularly difficult, namely the absence of a precise and commonly accepted definition and the lack of reference corpora for paraphrasing. Based on the assumption that linguistic knowledge should underlie computational-linguistics research, this thesis aims to take a step forward on these two questions: paraphrase characterization and paraphrase-corpus building and annotation.

  • The TALP participation at TAC-KBP 2013  Open access

     Ageno Pulido, Alicia; Comas Umbert, Pere Ramon; Mehdizadeh Naderi, Ali; Rodriguez Hontoria, Horacio; Turmo Borras, Jorge
    Text Analysis Conference
    Presentation of work at congresses

    This document describes the work performed by the Universitat Politècnica de Catalunya (UPC) in its second participation at TAC-KBP 2013 in both the Entity Linking and the Slot Filling tasks.

  • Evaluación del trabajo de Fin de Grado [Assessment of the Bachelor Degree Thesis]  Open access

     Sanchez Carracedo, Fermin; Climent Vilaro, Juan; Corbalan Gonzalez, Julita; Fonseca Casas, Pau; Garcia Almiñana, Jordi; Herrero Zaragoza, José Ramón; Llinas Audet, Francisco Javier; Rodriguez Hontoria, Horacio; Sancho Samsó, Maria Ribera
    Jornadas de Enseñanza Universitaria de la Informática
    p. 303-310
    DOI: 10.6035/e-TIiT.2013.13
    Presentation's date: 2013-07
    Presentation of work at congresses

    Final Degree Projects (FDP) have traditionally been evaluated on the basis of a project report and a public presentation. This assessment is generally performed by a panel of several teachers, who judge the project comprehensively from the documentation provided and the public presentation. In general, schools do not have clear and precise criteria for setting the final grade, so each panel uses its own previous experience to decide the mark of each project. The Bachelor Degree Thesis (BDT) replaces the former FDP in the new engineering curricula. Evaluation of the BDT should explicitly consider both specific and generic competencies, and clear criteria on how to assess them are required. To advance in this issue, the Ministry of Science and Innovation and the Quality Agency for the University System in Catalonia funded in 2008 and 2009 the project "Guía para la evaluación de competencias en los Trabajos de Fin de Grado y de Máster en las Ingenierías". This guide is actually a guide to help each school/degree define its own procedure for assessing the BDT. This paper presents an implementation of the suggestions contained in the guide and defines a methodology for assessing the BDT on the basis of the competencies trained in the Computer Engineering Degree of the Barcelona School of Informatics. The methodology can be easily replicated or adapted for other schools and degrees, which can facilitate the creation of their own guides for BDT evaluation.

  • IARG-AnCora: annotating AnCora corpus with implicit arguments

     Taulé, Mariona; Martí Antonin, Maria Antònia; Peris, Aina; Rodriguez Hontoria, Horacio; Moreno Boronat, Lidia; Moreda Pozo, Paloma
    Procesamiento del lenguaje natural
    Vol. 49, p. 181-184
    Date of publication: 2012-09
    Journal article

    IARG-AnCora aims to annotate the implicit arguments of deverbal nominalizations in the AnCora corpus with thematic roles. This corpus will be the basis for automatic semantic role labeling systems based on machine learning techniques. Semantic analyzers are essential components in current language-technology applications, in which a deeper understanding of the text is needed to make higher-level inferences and thus obtain qualitative improvements in the results.

  • The TALP participation at TAC-KBP 2012  Open access

     González Pellicer, Edgar; Rodriguez Hontoria, Horacio; Turmo Borras, Jorge; Comas Umbert, Pere Ramon; Mehdizadeh Naderi, Ali; Ageno Pulido, Alicia; Sapena Masip, Emili; Vila Rigat, Marta; Martí Antonin, Maria Antònia
    Text Analysis Conference
    Presentation of work at congresses

    This document describes the work performed by the Universitat Politècnica de Catalunya (UPC) in its first participation at TAC-KBP 2012 in both the Entity Linking and the Slot Filling tasks.

  • Empirical methods for the study of denotation in nominalizations in Spanish

     Peris, Aina; Taulé, Mariona; Rodriguez Hontoria, Horacio
    Computational linguistics
    Vol. 38, num. 4, p. 827-865
    DOI: 10.1162/COLI_a_00112
    Date of publication: 2012
    Journal article

  • Using Wikipedia for domain terms extraction

     Vivaldi, Jorge; Rodriguez Hontoria, Horacio
    Workshop on the Creation, Harmonization and Application of Terminology Resources
    p. 3-10
    Presentation of work at congresses

    Domain terms are a useful resource for tuning both resources and NLP processors to domain-specific tasks. This paper proposes a method for obtaining terms from potentially any domain using Wikipedia.

  • Summarizing a multimodal set of documents in a smart room  Open access

     Fuentes Fort, Maria; Rodriguez Hontoria, Horacio; Turmo Borras, Jorge
    International Conference on Language Resources and Evaluation
    p. 1-6
    Presentation's date: 2012-05-23
    Presentation of work at congresses

    This article reports an intrinsic automatic summarization evaluation in the scientific lecture domain. The lecture takes place in a Smart Room that has access to different types of documents produced from different media. An evaluation framework is presented to analyze the performance of systems producing summaries answering a user need. Several ROUGE metrics are used and a manual content responsiveness evaluation was carried out in order to analyze the performance of the evaluated approaches. Various multilingual summarization approaches are analyzed showing that the use of different types of documents outperforms the use of transcripts. In fact, not using any part of the spontaneous speech transcription in the summary improves the performance of automatic summaries. Moreover, the use of semantic information represented in the different textual documents coming from different media helps to improve summary quality.
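
    As a reminder of what the ROUGE scores measure, a bare-bones ROUGE-1 recall (word-overlap recall against a human summary, without stemming or stopword handling) is just:

        # Minimal ROUGE-1 recall; real ROUGE adds stemming, multiple references
        # and n-gram variants.
        from collections import Counter

        def rouge1_recall(candidate, reference):
            cand = Counter(candidate.lower().split())
            ref = Counter(reference.lower().split())
            overlap = sum(min(cand[w], ref[w]) for w in ref)
            return overlap / max(sum(ref.values()), 1)

        print(rouge1_recall("the lecture covers speech summarization",
                            "the lecture is about summarization of speech"))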

  • IARG-AnCora: Annotating the AnCora corpus with implicit arguments

     Peris, Aina; Taulé, Mariona; Rodriguez Hontoria, Horacio
    ACL SIGSEM Workshop on Interoperable Semantic Annotation
    p. 56-60
    Presentation of work at congresses

    IARG-AnCora is an ongoing project whose aim is to annotate the implicit arguments of deverbal nominalizations in the AnCora corpus. This corpus will be the basis for automatic semantic role labeling systems based on machine learning techniques. Semantic analyzers are essential components in current language-technology applications, in which a deeper understanding of the text is needed to make higher-level inferences and thus obtain qualitative improvements in the results.

  • Araknion: inducción de modelos lingüísticos a partir de corpora [Araknion: induction of linguistic models from corpora]

     Martí Antonin, Maria Antònia; Taulé, Mariona; Rodriguez Hontoria, Horacio; Martínez Barco, Patricio Manuel; Carreras Perez, Xavier
    Procesamiento del lenguaje natural
    Vol. 47, p. 337-338
    Date of publication: 2011
    Journal article

  • Extracting terminology from Wikipedia

     Vivaldi, Jorge; Rodriguez Hontoria, Horacio
    Procesamiento del lenguaje natural
    Vol. 47, p. 65-73
    Date of publication: 2011
    Journal article

  • Paraphrase concept and typology: a linguistically based and computationally oriented approach

     Vila, Marta; Martí Antonin, Maria Antònia; Rodriguez Hontoria, Horacio
    Procesamiento del lenguaje natural
    Vol. 46, p. 83-90
    Date of publication: 2011
    Journal article

  • Georeferencing textual annotations and tagsets with geographical knowledge and language models  Open access

     Ferrés Domènech, Daniel; Rodriguez Hontoria, Horacio
    Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural
    Presentation's date: 2011
    Presentation of work at congresses

    This paper describes generic approaches for georeferencing multilingual textual annotations and sets of tags from metadata associated with textual or multimedia content with high precision. We present four approaches based on: 1) geographical knowledge, 2) language modelling (LM), 3) language modelling with re-ranking of predictions, and 4) fusion of the geographical-knowledge predictions with the other approaches. The resources employed were the Geonames geographical gazetteer, the TFIDF and BM25 information retrieval algorithms, the Hiemstra Language Modelling (HLM) algorithm, stopword lists for several languages, and an electronic English dictionary. The best georeferencing accuracy is achieved with the HLM re-ranking approach and its fusion with geographical knowledge. These strategies outperformed the best accuracy results reported by the state-of-the-art systems that participated in the MediaEval 2010 official Placing task. Our best result achieves an accuracy of 68.53% when georeferencing up to a distance of 100 km.
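
    The language-modelling route can be schematised as follows (a hypothetical unigram formulation with simple smoothing, not the exact Hiemstra model; all data is invented): each candidate location gets a tag model, and a new annotation goes to the best-scoring location.

        import math
        from collections import Counter

        def train(tagged):
            """tagged: {location: [training tags...]} -> per-location counts."""
            return {loc: Counter(tags) for loc, tags in tagged.items()}

        def score(model, tags, vocab_size, lam=0.9):
            total = sum(model.values())
            # interpolate with a uniform background model for unseen tags
            return sum(math.log(lam * model[t] / total + (1 - lam) / vocab_size)
                       for t in tags)

        models = train({"paris": ["eiffel", "louvre", "seine"],
                        "london": ["thames", "bigben", "eye"]})
        vocab = len({t for m in models.values() for t in m})
        print(max(models, key=lambda loc: score(models[loc], ["louvre", "seine"],
                                                vocab)))   # -> paris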

  • Cultural configuration of Wikipedia: measuring autoreferentiality in different languages  Open access

     Miquel Ribé, Marc; Rodriguez Hontoria, Horacio
    Recent Advances in Natural Language Processing
    p. 316-322
    Presentation's date: 2011
    Presentation of work at congresses

    Among the motivations to write in Wikipedia reported in the current literature there is considerable agreement, but none of the studies considers the hypothesis of contributing in order to give visibility to one's own national or language-related content. Similarly to topical coverage studies, we outline a method for collecting the articles with this content and then analysing them along several dimensions. To prove its universality, the tests are repeated for up to twenty language editions of Wikipedia. Then, from the best indicators of each dimension, we obtain an index representing the degree of autoreferentiality of each edition of the encyclopedia. Last, we point out the impact of this phenomenon and the risk of ignoring it in the design of applications based on user-generated content.

  • TALP at MediaEval 2011 Placing Task: georeferencing Flickr videos with geographical knowledge and information retrieval  Open access

     Ferrés Domènech, Daniel; Rodriguez Hontoria, Horacio
    MediaEval Workshop
    Presentation's date: 2011
    Presentation of work at congresses

    This paper describes our georeferencing approaches, experiments, and results at the MediaEval 2011 Placing Task evaluation. The task consists of predicting the most probable geographical coordinates of Flickr videos. Our approaches used only the textual annotations and tagsets of Flickr users for prediction. We used three approaches for this task: 1) a geographical knowledge approach, 2) an information retrieval based approach with re-ranking, and 3) a combination of both (GeoFusion). The GeoFusion approach achieved the best results for the margins of error from 10 km to 10,000 km.

  • WRPA: a system for relational paraphrase acquisition from Wikipedia

     Vila, Marta; Rodriguez Hontoria, Horacio; Martí Antonin, Maria Antònia
    Procesamiento del lenguaje natural
    Vol. 45, p. 11-19
    Date of publication: 2010
    Journal article

  • Inference of lexical ontologies. The LeOnI methodology

     Farreres De La Morena, Javier; Gibert Oliveras, Karina; Rodriguez Hontoria, Horacio; Pluempitiwiriyawej, Charnyote
    Artificial intelligence
    Vol. 174, num. 1, p. 1-19
    DOI: 10.1016/j.artint.2009.09.004
    Date of publication: 2010-01
    Journal article

    In this article we present a method for semi-automatically deriving lexico-conceptual ontologies in other languages, given a lexico-conceptual ontology for one language and bilingual mapping resources. Our method uses a logistic regression model to combine mappings proposed by a set of classifiers (up to 17 in our implementation). The method is formally described and evaluated by means of two implementations for semi-automatically building Spanish and Thai WordNets using Princeton's WordNet for English and conventional English-Spanish and English-Thai bilingual dictionaries.
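
    The combination step can be sketched in a few lines (toy data; the real system combines up to 17 mapping classifiers and was tuned on validated mappings):

        # Each candidate (word, synset) mapping is a vector of classifier votes;
        # a logistic regression learns which vote combinations to accept.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        votes = np.array([[1, 1, 1, 0],    # rows: candidate mappings
                          [0, 0, 1, 0],    # columns: one vote per classifier
                          [1, 0, 1, 1],
                          [0, 0, 0, 0]])
        accepted = np.array([1, 0, 1, 0])  # human-validated labels

        combiner = LogisticRegression().fit(votes, accepted)
        print(combiner.predict_proba([[1, 1, 0, 0]])[:, 1])  # P(accept)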

  • Multilingual On-Line Translation

     Rodriguez Hontoria, Horacio; Gonzalez Bermudez, Meritxell; España Bonet, Cristina; Farwell, David Loring; Carreras Perez, Xavier; Xambó Descamps, Sebastian; Màrquez Villodre, Lluís; Padró Cirera, Lluís; Saludes Closa, Jordi
    Competitive project

  • Automatically extending named entities coverage of Arabic WordNet using Wikipedia

     Alkhalifa, Musa; Rodriguez Hontoria, Horacio
    International journal on information & communication technologies
    Vol. 3, num. 3, p. 20-36
    Date of publication: 2010
    Journal article

  • Using Wikipedia for term extraction in the biomedical domain: first experiences

     Vivaldi, Jorge; Rodriguez Hontoria, Horacio
    Procesamiento del lenguaje natural
    Vol. 45, p. 251-254
    Date of publication: 2010
    Journal article

  • TALP at MediaEval 2010 placing task: geographical focus detection of Flickr textual annotations  Open access

     Ferrés, Dani; Rodriguez Hontoria, Horacio
    MediaEval Workshop
    Presentation's date: 2010
    Presentation of work at congresses

    This paper describes our geographical text analysis and geotagging experiments in the context of the Multimedia Placing Task at the MediaEval 2010 evaluation. The task consists of predicting the most probable coordinates of Flickr videos. We used a Natural Language Processing approach that tries to match geographical place names in the textual annotations of Flickr users. The resources employed for this task were the Geonames geographical gazetteer, stopword lists for several languages, and an electronic English dictionary. We used two geographical focus disambiguation strategies, one based on population heuristics and another that combines geographical knowledge and population heuristics. The second strategy achieves the best results. Using the stopword lists and the English dictionary as a filter for ambiguous place names also improves the results.
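
    The population heuristic reduces to choosing, among the gazetteer entries matching an ambiguous toponym, the most populated one (the toy data below stands in for Geonames):

        GAZETTEER = {
            "paris": [
                {"country": "FR", "lat": 48.85, "lon": 2.35, "pop": 2140526},
                {"country": "US", "lat": 33.66, "lon": -95.55, "pop": 25171},
            ],
        }

        def resolve(toponym):
            """Return the most populated gazetteer entry for a place name."""
            entries = GAZETTEER.get(toponym.lower(), [])
            return max(entries, key=lambda e: e["pop"], default=None)

        print(resolve("Paris"))  # -> the French capital, by population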

  • TALP at WePS-3 2010  Open access

     Ferrés Domènech, Daniel; Rodriguez Hontoria, Horacio
    Conference on Multilingual and Multimodal Information Access Evaluation
    Presentation's date: 2010
    Presentation of work at congresses

    In this paper we present our system and experiments at the Third Web People Search Workshop (WePS-3) task for clustering web people search documents in English. In our experiments we used a simple approach with three algorithms: Lingo, Hierarchical Agglomerative Clustering (HAC), and a 2-step HAC algorithm. We also present the results and initial conclusions in the context of the WePS-3 Task 1 for clustering. We obtained the best results with the HAC and 2-step HAC algorithms.
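
    A compact stand-in for the HAC setup (scikit-learn >= 1.2; the TF-IDF vectorisation and the distance threshold are assumptions, not the submitted configuration):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import AgglomerativeClustering

        docs = ["John Smith the physicist at MIT",
                "the physicist John Smith works at MIT",
                "John Smith, a painter from Leeds"]
        X = TfidfVectorizer().fit_transform(docs).toarray()

        # cut the dendrogram at a cosine-distance threshold instead of
        # fixing the number of people in advance
        hac = AgglomerativeClustering(n_clusters=None, distance_threshold=0.7,
                                      metric="cosine", linkage="average")
        print(hac.fit_predict(X))  # two clusters: the physicist vs. the painter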

  • Semantic annotation of deverbal nominalizations in the Spanish corpus AnCora  Open access

     Peris, Aina; Taulé, Mariona; Rodriguez Hontoria, Horacio
    International Workshop on Treebanks and Linguistic Theories
    p. 187-198
    Presentation's date: 2010
    Presentation of work at congresses

    This paper presents the methodology and the linguistic criteria followed to enrich the AnCora-Es corpus with the semantic annotation of deverbal nominalizations. The first step was to run two independent automated processes: one for the annotation of denotation types and another one for the annotation of argument structure. Secondly, we manually checked both types of information and measured inter-annotator agreement. The result is the Spanish AnCora-Es corpus enriched with the semantic annotation of deverbal nominalizations. As far as we know, this is the first Spanish corpus annotated with this type of information.

  • Finding domain terms using Wikipedia

     Vivaldi, Jorge; Rodriguez Hontoria, Horacio
    International Conference on Language Resources and Evaluation
    p. 386-393
    Presentation's date: 2010
    Presentation of work at congresses

  • ADN-classifier: automatically assigning denotation types to nominalizations  Open access

     Peris, Aina; Taulé, Mariona; Boleda Torrent, Gemma; Rodriguez Hontoria, Horacio
    International Conference on Language Resources and Evaluation
    p. 1-7
    Presentation's date: 2010-05
    Presentation of work at congresses

    This paper presents the ADN-Classifier, an automatic classification system of Spanish deverbal nominalizations aimed at identifying their semantic denotation (i.e. event, result, underspecified, or lexicalized). The classifier can be used for NLP tasks such as coreference resolution or paraphrase detection. To our knowledge, the ADN-Classifier is the first effort in the acquisition of denotations for nominalizations using Machine Learning. We compare the results of the classifier when using a decreasing number of knowledge sources, namely (1) the complete nominal lexicon (AnCora-Nom), which includes sense distinctions, (2) the nominal lexicon (AnCora-Nom) with the sense-specific information removed, (3) nominalizations' context information obtained from a treebank corpus (AnCora-Es), and (4) the combination of the previous linguistic resources. In a realistic scenario, that is, without sense distinctions, the best results achieved are those taking into account the information declared in the lexicon (89.40% accuracy). This shows that the lexicon contains crucial information (such as argument structure) that corpus-derived features cannot substitute for.

  • GeoTextMESS: result fusion with fuzzy Borda ranking in geographical information retrieval  Open access

     Buscaldi, Davide; Perea Ortega, Jose Manuel; Rosso, Paolo; Ureña López, L. Alfonso; Ferrés Domènech, Daniel; Rodriguez Hontoria, Horacio
    Lecture notes in computer science
    Vol. 5706, p. 867-874
    DOI: 10.1007/978-3-642-04447-2_114
    Date of publication: 2009
    Journal article

    In this paper we discuss the integration of different GIR systems by means of a fuzzy Borda method for result fusion. Two of the systems, the one from the Universidad Politécnica de Valencia and the one from the Universidad de Jaén, participated in the GeoCLEF task under the name TextMess. The proposed result fusion method takes as input the document lists returned by the different systems and returns a document list where the documents are ranked according to the fuzzy Borda voting scheme. The obtained results show that the fusion method improves the results of the component systems, although the fusion is not optimal: it is effective only if the components return similar sets of relevant documents.
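
    One common formulation of the fuzzy Borda count is easy to sketch (toy scores; details such as score normalisation differ per system):

        def fuzzy_borda(runs):
            """runs: one {doc: score} dict per system; returns a fused ranking."""
            docs = sorted({d for run in runs for d in run})
            totals = dict.fromkeys(docs, 0.0)
            for run in runs:
                for i in docs:
                    for j in docs:
                        if i == j:
                            continue
                        si, sj = run.get(i, 0.0), run.get(j, 0.0)
                        if si + sj > 0:
                            r = si / (si + sj)    # preference of i over j
                            if r > 0.5:           # only strong preferences count
                                totals[i] += r
            return sorted(totals.items(), key=lambda kv: -kv[1])

        print(fuzzy_borda([{"d1": 0.9, "d2": 0.4, "d3": 0.1},
                           {"d1": 0.5, "d2": 0.6, "d3": 0.2}]))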

  • Hacia un sistema de clasificación automática de sustantivos deverbales [Towards a system for the automatic classification of deverbal nouns]

     Peris, Aina; Taulé Delor, Mariona; Rodriguez Hontoria, Horacio
    Procesamiento del lenguaje natural
    num. 43, p. 23-31
    Date of publication: 2009
    Journal article

  • Anotación morfo-sintáctica y semántica de corpus: adquisición y uso [Morphosyntactic and semantic annotation of corpora: acquisition and use]

     Rodriguez Hontoria, Horacio
    Date of publication: 2009
    Book chapter

  • CoCo, a web interface for corpora compilation  Open access

     España Bonet, Cristina; Vila Rigat, Marta; Rodriguez Hontoria, Horacio; Martí Antonin, Maria Antònia
    Procesamiento del lenguaje natural
    Vol. 43, p. 367-368
    DOI: 10.1.1.149.2790
    Date of publication: 2009
    Journal article

    CoCo is a collaborative web interface for the compilation of linguistic resources. In this demo we present one of its possible applications: paraphrase acquisition.

  • Automatically extending NE coverage of Arabic WordNet using Wikipedia

     Alkhalifa, Musa; Rodriguez Hontoria, Horacio
    International Conference on Arabic Language Processing
    p. 23-30
    Presentation's date: 2009
    Presentation of work at congresses

  • TALP at GikiCLEF 2009  Open access

     Ferrés Domènech, Daniel; Rodriguez Hontoria, Horacio
    Conference on Multilingual and Multimodal Information Access Evaluation
    p. 1-4
    Presentation of work at congresses

    This paper describes our experiments in Geographical Information Retrieval with the Wikipedia collection in the context of our participation in the GikiCLEF 2009 Multilingual task in English and Spanish. Our system, called gikiTALP, follows a very simple approach that uses standard Information Retrieval with the Sphinx full-text search engine and some Natural Language Processing techniques, without geographical knowledge.

  • TALP at GeoCLEF 2007: results of a geographical knowledge filtering approach with Terrier

     Ferrés, Dani; Rodriguez Hontoria, Horacio
    Lecture notes in computer science
    num. 5152, p. 830-833
    DOI: 10.1007/978-3-540-85760-0_105
    Date of publication: 2008
    Journal article

  • Kornai, András: Mathematical Linguistics (book review)

     Rodriguez Hontoria, Horacio
    Machine translation
    Vol. 21, num. 4, p. 253-256
    DOI: 10.1007/s10590-008-9043-4
    Date of publication: 2008
    Journal article

  • A Flexible Multitask Summarizer for Documents from Different Media, Domain and Language  Open access

     Fuentes Fort, Maria
    Department of Computer Science, Universitat Politècnica de Catalunya
    Theses

    Automatic summarization has probably become crucial with the growth in document generation, particularly now that retrieving, managing and processing information have become decisive tasks. However, one should not expect perfect systems able to substitute human summaries. The automatic summarization process strongly depends not only on the characteristics of the documents, but also on the different needs of users. Thus, several aspects have to be taken into account when designing an information system for summarizing because, depending on the characteristics of the input documents and the desired results, several techniques can be applied. In order to support this process, the final goal of the thesis is to provide a flexible multitask summarizer architecture. This goal is decomposed into three main research purposes. First, to study the process of porting systems to different summarization tasks, processing documents in different languages, domains or media, with the aim of designing a generic architecture that permits the easy addition of new tasks by reusing existing tools. Second, to develop prototypes for some tasks, involving aspects related to the language, the media and the domain of the document or documents to be summarized, as well as aspects related to the summary content: generic summaries, novelty summaries, or summaries that answer a specific user need. Third, to create an evaluation framework to analyze the performance of several approaches in the written news and scientific oral presentation domains, focusing mainly on intrinsic evaluation.

  • TALP at GeoQuery 2007: linguistic and geographical analysis for query parsing

     Ferrés, Dani; Rodriguez Hontoria, Horacio
    Lecture notes in computer science
    num. 5152, p. 834-837
    DOI: 10.1007/978-3-540-85760-0_106
    Date of publication: 2008
    Journal article

  • Arabic WordNet: current state and future extensions

     Rodriguez Hontoria, Horacio; Farwell, David Loring; Farreres De La Morena, Javier; Bertran, Manuel; Alkhalifa, Musa; Martí Antonin, Maria Antònia; Elkateb, Sabri; Black, William; Kirk, James; Pease, Adam; Vossen, Piek; Fellbaum, Christiane
    International WordNet Conference
    p. 387-405
    Presentation's date: 2008
    Presentation of work at congresses

  • Arabic WordNet: semi-automatic extensions using Bayesian inference

     Rodriguez Hontoria, Horacio; Farwell, David Loring; Farreres De La Morena, Javier; Bertran, Manuel; Alkhalifa, Musa; Martí Antonin, Maria Antònia
    International Conference on Language Resources and Evaluation
    p. 1702-1706
    Presentation's date: 2008
    Presentation of work at congresses

  • TALP at TAC 2008: a semantic approach to recognizing textual entailment

     Ageno Pulido, Alicia; Cruz, Fermín; Farwell, David Loring; Ferrés, Daniel; Rodriguez Hontoria, Horacio; Turmo Borras, Jorge
    Text Analysis Conference
    Presentation of work at congresses

  • Evaluation of terms and term extraction systems: a practical approach

     Vivaldi, Jorge; Rodriguez Hontoria, Horacio
    Terminology (Amsterdam)
    Vol. 13, num. 2, p. 225-248
    DOI: 10.1075/term.13.2.06viv
    Date of publication: 2007
    Journal article

  • TALP at GeoCLEF 2006: experiments using JIRS and Lucene with the ADL feature type thesaurus

     Ferrés, Dani; Rodriguez Hontoria, Horacio
    Lecture notes in computer science
    num. 4730, p. 962-969
    DOI: 10.1007/978-3-540-74999-8_124
    Date of publication: 2007
    Journal article

  • The UPC System for Arabic-to-English Entity Translation

     Farwell, David Loring; Gimenez Linares, Jesús Ángel; González Pellicer, Edgar; Halkoum, Reda; Rodriguez Hontoria, Horacio; Surdeanu, Mihai
    Automatic Content Extraction (ACE) Entity Translation (ET) 2007 Pilot Evaluation
    p. 1-12
    Presentation of work at congresses

  • Support vector machines for query-focused summarization trained and evaluated on pyramid data  Open access

     Fuentes Fort, Maria; Alfonseca, Enrique; Rodriguez Hontoria, Horacio
    Annual Meeting of the Association for Computational Linguistics
    p. 57-60
    Presentation's date: 2007-06-25
    Presentation of work at congresses

    This paper presents the use of Support Vector Machines (SVM) to detect relevant information to be included in a query-focused summary. Several SVMs are trained using information from pyramids of summary content units. Their performance is compared with the best performing systems in DUC-2005, using both ROUGE and autoPan, an automatic scoring method for pyramid evaluation.
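
    Stripped to its core, the selection step is a binary classifier over sentences, with pyramid content units providing the labels (the TF-IDF features below are an assumption for the sketch, not the paper's feature set):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import SVC

        sents = ["the storm caused major flooding in the region",
                 "officials announced a press conference",
                 "flood damage estimates reached ten million"]
        labels = [1, 0, 1]   # 1 = sentence matches a summary content unit

        vec = TfidfVectorizer().fit(sents)
        svm = SVC(kernel="rbf").fit(vec.transform(sents), labels)

        new = ["heavy rain flooded the region again"]
        print(svm.decision_function(vec.transform(new)))  # higher = more relevant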

  • FEMsum at DUC 2007  Open access

     Fuentes Fort, Maria; Rodriguez Hontoria, Horacio; Ferrés Domènech, Daniel
    Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics
    p. 1-7
    Presentation's date: 2007-06-26
    Presentation of work at congresses

    This paper describes and analyzes how the FEMsum system deals with the DUC 2007 tasks of providing summary-length answers to complex questions, both background and just-the-news summaries. We participated in producing background summaries for the main task with the FEMsum approach that obtained better results in our participation last year. The FEMsum semantic-based approach was adapted to deal with the update pilot task, with the aim of producing just-the-news summaries.
