Scientific and technological production

1 to 22 of 22 results
  • MT techniques in a retrieval system of semantically enriched patents  Open access

     Gonzalez Bermudez, Meritxell; Mateva, Maria; Enache, Ramona; España Bonet, Cristina; Màrquez Villodre, Lluís; Popov, Borislav; Ranta, Aarne
    Machine Translation Summit
    Presentation's date: 2013-09-04
    Presentation of work at congresses

    This paper focuses on how automatic translation techniques integrated in a patent retrieval system increase its capabilities and enable extended features and functionalities. We describe 1) a novel methodology for natural language to SPARQL translation based on automated grammar-ontology interoperability and a query grammar for the patents domain; 2) a strategy for statistical-based translation of patents that allows semantic annotations to be transferred to the target language; 3) a built-in knowledge representation infrastructure that uses multilingual semantic annotations; and 4) an online application that offers a multilingual search interface over structural knowledge databases (domain ontologies) and multilingual documents (biomedical patents) that have been automatically translated.

  • Deep evaluation of hybrid architectures: use of different metrics in MERT weight optimization  Open access

     España Bonet, Cristina; Labaka, Gorka; Díaz de Ilarraza Sánchez, Arantza; Màrquez Villodre, Lluís; Sarasola, Kepa
    Free/Open-Source Rule-Based Machine Translation
    Presentation's date: 2012-06-14
    Presentation of work at congresses

    The process of developing hybrid MT systems is usually guided by an evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment of a hybrid architecture, which combines rule-based and statistical translation approaches. Differences between the results obtained from automatic and human evaluations confirm that purely lexical automatic evaluation metrics are inappropriate for comparing the outputs of systems that use very different translation approaches. An examination of sentences with controversial results suggested that linguistic well-formedness should be considered in the evaluation of output translations. Following this idea, we have experimented with a new simple automatic evaluation metric, which combines lexical and PoS information. This measure showed higher agreement with human assessments than BLEU in a previous study (Labaka et al., 2011). In this paper we have extended its usage throughout the system development cycle, focusing on its ability to improve parameter optimization. Results are not totally conclusive. Manual evaluation reflects a slight improvement, compared to BLEU, when using the proposed measure in system optimization. However, the improvement is too small to draw any clear conclusion. We believe that we should first focus on integrating more linguistically representative features in the development of the hybrid system, and then go deeper into the development of automatic evaluation metrics.

  • Full machine translation for factoid question answering  Open access

     España Bonet, Cristina; Comas Umbert, Pere Ramon
    Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation and Hybrid Approaches to Machine Translation
    Presentation's date: 2012-04-23
    Presentation of work at congresses

    In this paper we present an SMT-based approach to Question Answering (QA). QA is the task of extracting exact answers in response to natural language questions. In our approach, the answer is a translation of the question obtained with an SMT system. We use the n-best translations of a given question to find similar sentences in the document collection that contain the real answer. Although it is not the first time that SMT inspires a QA system, it is the first approach that uses a full Machine Translation system for generating answers. Our approach is validated with the datasets of the TREC QA evaluation.

  • A hybrid system for patent translation  Open access

     Enache, Ramona; España Bonet, Cristina; Ranta, Aarne; Màrquez Villodre, Lluís
    Conference of the European Association for Machine Translation
    Presentation's date: 2012-05-30
    Presentation of work at congresses

    This work presents an HMT system for patent translation. The system exploits the high coverage of SMT and the high precision of an RBMT system based on GF to deal with specific issues of the language. The translator is specifically developed to translate patents and it is evaluated on the English-French language pair. Although the number of issues tackled by the grammar is not yet large, both manual and automatic evaluations consistently prefer the hybrid system over the two individual translators.

  • Context-aware machine translation for software localization  Open access

     Muntés Mulero, Víctor; Paladini Adell, Patricia; España Bonet, Cristina; Màrquez Villodre, Lluís
    Conference of the European Association for Machine Translation
    Presentation's date: 2012-05-28
    Presentation of work at congresses

    Software localization requires translating short text strings appearing in user interfaces (UIs) into several languages. These strings are usually unrelated to the other strings in the UI. Due to the lack of semantic context, many ambiguity problems cannot be solved during translation. However, UIs are composed of several visual components with which text strings are associated. Although this association might be very valuable for word disambiguation, it has not been exploited. In this paper, we present the problem of lack of context awareness for UI localization, providing real examples and identifying the main research challenges.

  • The patents retrieval prototype in the MOLTO project

     Chechev, Milen; Gonzalez Bermudez, Meritxell; Màrquez Villodre, Lluís; España Bonet, Cristina
    International World Wide Web Conference
    Presentation's date: 2012-04-16
    Presentation of work at congresses

    This paper describes the patents retrieval prototype developed within the MOLTO project. The prototype aims to provide a multilingual natural language interface for querying the content of patent documents. The developed system is focused on the biomedical and pharmaceutical domain and includes the translation of the patent claims and abstracts into English, French and German. To obtain the best retrieval results for patent information and text content, patent documents are preprocessed and semantically annotated. The annotations are then stored and indexed in an OWLIM semantic repository, which contains a patent-specific ontology and others from different domains. The prototype, accessible online at http://molto-patents.ontotext.com, presents a multilingual natural language interface to query the retrieval system. In MOLTO, the multilingualism of the queries is addressed by means of the GF Tool, which provides an easy way to build and maintain controlled language grammars for interlingual translation in limited domains. The abstract representation obtained from GF is used to retrieve both the matched RDF instances and the list of patents semantically related to the user's search criteria. The online interface lets users browse the retrieved patents and shows in the text the semantic annotations that explain why any particular patent has matched the user's criteria.

  • Patent translation within the MOLTO project  Open access

     España Bonet, Cristina; Enache, Ramona; Slaski, Adam; Ranta, Aarne; Màrquez Villodre, Lluís; Gonzalez Bermudez, Meritxell
    Workshop on Patent Translation
    Presentation's date: 2011-09-23
    Presentation of work at congresses

    MOLTO is an FP7 European project whose goal is to translate texts between multiple languages in real time with high quality. Patent translation is a case study where research is focused on obtaining large coverage without losing quality in the translation. This is achieved by hybridising a grammar-based multilingual translation system, GF, with a specialised statistical machine translation system. Moreover, each individual system by itself already represents a step forward in the translation of patents in the biomedical domain, for which the systems have been trained.

  • Hybrid machine translation guided by a rule-based system  Open access

     España Bonet, Cristina; Màrquez Villodre, Lluís; Labaka, Gorka; Sarasola, Kepa; Díaz de Ilarraza Sánchez, Arantza
    Machine Translation Summit
    Presentation's date: 2011-09-22
    Presentation of work at congresses

    This paper presents a machine translation architecture which hybridizes Matxin, a rule-based system, with regular phrase-based Statistical Machine Translation. In short, the hybrid translation process is guided by the rule-based engine and, before transference, a set of partial candidate translations provided by SMT subsystems is used to enrich the tree-based representation. The final hybrid translation is created by choosing the most probable combination among the available fragments with a statistical decoder in a monotonic way. We have applied the hybrid model to a pair of distant languages, Spanish and Basque, and according to our evaluation (both automatic and manual) the hybrid approach significantly outperforms the best SMT system on out-of-domain data.

    Postprint (author’s final draft)

  • Deep evaluation of hybrid architectures: simple metrics correlated with human judgments  Open access

     Labaka, Gorka; Sarasola, Kepa; Díaz de Ilarraza Sánchez, Arantza; España Bonet, Cristina; Màrquez Villodre, Lluís
    International Workshop on Using Linguistic Information for Hybrid Machine Translation
    Presentation's date: 2011-11-18
    Presentation of work at congresses

    The process of developing hybrid MT systems is guided by the evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment of a hybrid architecture that tries to get the best of both worlds, rule-based and statistical. In a first evaluation, human assessments were used to compare only the single statistical system and the hybrid one; the rule-based system was not compared by hand because the results of automatic evaluation showed a clear disadvantage. However, a second and wider evaluation experiment surprisingly showed that, according to human evaluation, the best system was the rule-based one, which had achieved the worst results in automatic evaluation. An examination of sentences with controversial results suggested that linguistic well-formedness in the output should be considered in evaluation. After experimenting with 6 possible metrics, we conclude that a simple arithmetic mean of BLEU and BLEU calculated on the parts of speech of words is clearly a more human-conformant metric than lexical metrics alone.

    Postprint (author’s final draft)
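
The combined metric named in the abstract above (an arithmetic mean of word-level BLEU and BLEU computed over part-of-speech tags) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `combined_bleu`, its parameter names, and the assumption that both scores are already computed and lie on a 0 to 1 scale are ours, not the paper's.

```python
def combined_bleu(lexical_bleu: float, pos_bleu: float) -> float:
    """Return the arithmetic mean of two precomputed BLEU scores.

    lexical_bleu: BLEU computed on the word forms of the output.
    pos_bleu:     BLEU computed on the PoS-tag sequences of the same output.
    Both are assumed (our assumption) to be on a 0-1 scale.
    """
    for name, score in (("lexical_bleu", lexical_bleu), ("pos_bleu", pos_bleu)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")
    return (lexical_bleu + pos_bleu) / 2.0


if __name__ == "__main__":
    # Hypothetical scores for one system output, for illustration only.
    print(combined_bleu(0.25, 0.75))
```

Computing the two input scores (tokenisation, PoS tagging, the BLEU variant used) is left out here, since the listing does not specify those details.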

  • Language technology challenges of a small language (Catalan)

     Melero, Maite; Boleda Torrent, Gemma; Cuadros Oller, Montserrat; España Bonet, Cristina; Padró Cirera, Lluís; Quixal, Martí; Rodríguez, Carlos; Saurí, Roser
    International Conference on Language Resources and Evaluation
    Presentation's date: 2010-05
    Presentation of work at congresses

    In this paper, we present a brief snapshot of the state of affairs in computational processing of Catalan and the initiatives that are starting to take place in an effort to move the field a step forward, by making better and more efficient use of existing resources and tools, by bridging the gap between research and market, and by establishing regular meeting points for the community. In particular, we present the results of the First Workshop on the Computational Processing of Catalan, which succeeded in bringing together a fair representation of the research in the area, and received attention from both industry and the administration. Aside from facilitating communication among researchers and between developers and users, the Workshop provided the organizers with valuable information about existing resources, tools, developers and providers. This information has allowed us to go a step further by setting up a “harvesting” procedure which will hopefully become the seed of a portal-catalogue-observatory of language resources and technologies in Catalan.

  • Robust estimation of feature weights in statistical machine translation  Open access

     España Bonet, Cristina; Màrquez Villodre, Lluís
    Annual Conference of the European Association for Machine Translation
    Presentation's date: 2010
    Presentation of work at congresses

    Weights of the various components in a standard Statistical Machine Translation model are usually estimated via Minimum Error Rate Training. With this, one finds their optimum value on a development set with the expectation that these optimal weights generalise well to other test sets. However, this is not always the case when domains differ. This work uses a perceptron algorithm to learn more robust weights to be used on out-of-domain corpora without the need for specialised data. For an Arabic-to-English translation system, the generalisation of weights represents an improvement of more than 2 points of BLEU with respect to the MERT baseline using the same information.

  • Multilingual On-Line Translation

     Rodriguez Hontoria, Horacio; Gonzalez Bermudez, Meritxell; España Bonet, Cristina; Farwell, David Loring; Carreras Perez, Xavier; Xambó Descamps, Sebastian; Màrquez Villodre, Lluís; Padró Cirera, Lluís; Saludes Closa, Jordi
    Participation in a competitive project

  • Discriminative Phrase-Based Models for Arabic Machine Translation

     España Bonet, Cristina; Gimenez Linares, Jesús Ángel; Màrquez Villodre, Lluís
    ACM transactions on Asian language information processing
    Date of publication: 2009
    Journal article

  • Type Ia SNe along redshift: the R(SiII) ratio and the expansion velocities in intermediate z supernovae

     Altavilla, Giuseppe; Ruiz Lapuente, Pilar; Balastegui Manso, Andreu; Mendez, Javier; Irwin, M; España Bonet, Cristina; Ellis, R. S.; Folatelli, G.; Goobar, Ariel; Hillebrandt, Wolfgang; McMahon, R. M.; Nobili, Serena; Stanishev, V.; Walton, N. A.
    Astrophysical journal
    Date of publication: 2009
    Journal article

  • CoCo, a web interface for corpora compilation  Open access

     España Bonet, Cristina; Vila Rigat, Marta; Rodriguez Hontoria, Horacio; Martí, Maria Antònia
    Procesamiento del lenguaje natural
    Date of publication: 2009
    Journal article

    CoCo is a collaborative web interface for the compilation of linguistic resources. In this demo we present one of its possible applications: paraphrase acquisition.

  • Sobre la I Jornada del Processament Computacional del Català

     Boleda Torrent, Gemma; Cuadros Oller, Montserrat; España Bonet, Cristina; Melero, Maite; Padró Cirera, Lluís; Quixal, Martí; Rodríguez, Carlos
    Llengua i ús
    Date of publication: 2009
    Journal article

    Computational language processing covers any activity related to the creation, management and use of language technology and resources. On the scientific side, this activity is central to disciplines such as corpus linguistics, language engineering, and the processing of written or spoken natural language. In everyday life, language processing is part of a wide range of increasingly common applications: automated telephone helpdesk systems, machine translation, etc. The vast majority of these applications require tools and language resources specific to each language. For languages with a large market, such as English or Spanish, products and services based on language technology are varied and commonplace. For languages such as Catalan, it is harder to find products and services that come with this technology built in. In order to reflect the current state of language technologies applied to Catalan, to put the members of this community in contact, and to promote initiatives that strengthen them, the first Jornada del Processament Computacional del Català (JPCC) was held at the Palau Robert in Barcelona in March 2009. The Jornada aimed to become a meeting point and, at the same time, a showcase for the research groups in the area, and to open the debate on how to organise the community so as to strengthen the use and development of Catalan both in language technology and in the products and services that depend on it. This article presents a summary of the content and conclusions of the Jornada.

  • Primera Jornada del Processament Computacional del Català

     Boleda Torrent, Gemma; Cuadros Oller, Montserrat; España Bonet, Cristina; Melero, Maite; Padró Cirera, Lluís; Quixal, Martí; Rodríguez, Carlos
    Procesamiento del lenguaje natural
    Date of publication: 2009-09
    Journal article

    We present the conclusions of the first Jornada del Processament Computacional del Català, held in Barcelona in March 2009.

  • Premi ACIA a tesis de màster curs 2007/2008

     España Bonet, Cristina
    Award or recognition

  • El català i les tecnologies de la llengua

     Boleda Torrent, Gemma; Cuadros Oller, Montserrat; España Bonet, Cristina; Melero, Maite; Padró Cirera, Lluís; Quixal, Martí; Rodríguez, Carlos
    Llengua, Societat i Comunicació
    Date of publication: 2009
    Journal article

    Computational language processing covers any activity related to the creation, management and use of language technology and resources. On the scientific side, this activity is central to disciplines such as corpus linguistics, language engineering, and the processing of written or spoken natural language. In everyday life, it is part of a wide range of increasingly common applications: automated telephone helpdesk systems, machine translation, etc. The vast majority of these applications require tools and language resources specific to each language. For languages with a large market, such as English or Spanish, products and services based on language technology are varied and commonplace. For languages such as Catalan, it is harder to find products and services that come with this technology built in. This article presents an overview of the current state of language technologies for Catalan, as well as several issues currently under debate within the scientific community working on spoken and written natural language processing.

  • Tracing the equation of state and the density of cosmological constant along z

     España Bonet, Cristina; Ruiz Lapuente, Pilar
    Journal of cosmology and astroparticle physics
    Date of publication: 2008
    Journal article

  • Testing the running of the cosmological constant with Type Ia Supernovae at high z

     España Bonet, Cristina; Ruiz Lapuente, Pilar; Shapiro, Ilya; Solà Peracaula, Joan
    Journal of cosmology and astroparticle physics
    Date of publication: 2004
    Journal article

  • Variable cosmological constant as a Planck scale effect

     Shapiro, Ilya; Solà Peracaula, Joan; España Bonet, Cristina; Ruiz Lapuente, Pilar
    Physics letters B
    Date of publication: 2003
    Journal article
