Scientific and technological production
1 to 50 of 224 results
  • Improving Statistical Machine Translation Through Adaptation and Learning  Open access

     Henriquez Quintana, Carlos Alberto
    Defense's date: 2014-03-07
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    With the arrival of free online machine translation (MT) systems came the possibility of improving automatic translations with the help of everyday users. One way to achieve such improvements is to ask the users themselves for a better translation. The system may have made a mistake, and if the user is able to detect it, it would be valuable to let the user teach the system where the mistake was made so that it is not repeated in a similar situation. Most online translation systems provide a text area for users to suggest a better translation (like Google's translator) or a rating system for scoring a translation (like Microsoft's).

    In 2009, as part of the Seventh Framework Programme of the European Commission, the FAUST project started with the goal of developing "machine translation (MT) systems which respond rapidly and intelligently to user feedback". Specifically, one of the project's objectives was to "develop mechanisms for instantaneously incorporating user feedback into the MT engines that are used in production environments, ...". As part of the FAUST project, this thesis focused on developing one such mechanism.

    Formally, the general objective of this work was to design and implement a strategy to improve the translation quality of an already trained Statistical Machine Translation (SMT) system, using translations of input sentences that are corrections of the system's own attempts to translate them. To address this problem we divided it into three specific objectives:

    1. Define a relation between the words of a corrected sentence and the words of the system's translation, in order to detect the errors that the correction aims to fix.
    2. Include the error corrections in the original system, so that it learns how to solve them if a similar situation occurs.
    3. Test the strategy in different scenarios and with different data, in order to validate the applications of the proposed methodology.

    The main contributions to the SMT field that can be found in this Ph.D. thesis are:

    - We defined a similarity function that compares an MT system output with a translation reference for that output and aligns the errors made by the system with the correct translations found in the reference. This information is then used to compute an alignment between the original input sentence and the reference.
    - We defined a method to perform domain adaptation based on the alignment mentioned above. Using this alignment with an in-domain parallel corpus, we extract translation units of two kinds: units already present in the system that were correctly chosen during translation, and new units that include the correct translations found in the reference. These units are then scored and combined with the units in the original system in order to improve its quality in terms of both human and automatic metrics.
    - We successfully applied the method to a new task: improving the translation quality of an SMT system using post-edits provided by real users of the system. In this case, the alignment was computed over a parallel corpus built from post-edits, extracting both units already present in the system that were correctly chosen during translation and new units that include the corrections found in the user feedback.
    - The method proposed in this dissertation achieves significant improvements in translation quality with a small amount of learning material, corresponding to 0.5% of the training material used to build the original system. Results from our evaluations also indicate that the improvement achieved with the domain adaptation strategy is measurable by both automatic and human-based evaluation metrics.

    This thesis proposes a new method for improving a Statistical Machine Translation (SMT) system using post-edits of its automatic translations. The strategy can be viewed as domain adaptation, treating the post-edits obtained from real users of the translation system as the material of the domain to adapt to. The method compares the post-edits with the automatic translations in order to automatically detect where the translator made an error, so that it can learn from it. Once the errors have been detected, a word-level alignment is computed between the original sentences and the post-edits in order to extract translation units, which are then incorporated into the baseline system so that the errors are corrected in future translations. Our results show statistically significant improvements from a data set whose size is 0.5% of the material used during training. Along with automatic quality measures, we also present a qualitative analysis of the system to validate the results. The translation improvements are observed mostly in lexical choice and word reordering, followed by morphological corrections. The strategy, which introduces the concepts of augmented corpus, similarity function and derived translation units, is tested with two SMT paradigms (N-gram-based and phrase-based translation), with two language pairs (Catalan-Spanish and English-Spanish) and in different domain adaptation scenarios, including an open domain in which the system was adapted through requests collected from real users over the Internet, obtaining similar results in all tests. The results of this research are part of the FAUST project (Feedback Analysis for User adaptive Statistical Translation), a project of the Seventh Framework Programme of the European Commission.

  • The TALP-UPC phrase-based translation systems for WMT13: system combination with morphology generation, domain adaptation and corpus filtering

     Formiga Fanals, Lluis; Ruiz Costa-Jussà, Marta; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián; Barron Cedeño, Luis Alberto; Màrquez Villodre, Lluís
    Workshop on Statistical Machine Translation
    Presentation's date: 2013-08-08
    Presentation of work at congresses

    This paper describes the TALP participation in the WMT13 evaluation campaign. Our participation is based on the combination of several statistical machine translation systems, all built on standard phrase-based Moses systems. Variations include techniques such as morphology generation, training sentence filtering, and domain adaptation through unit derivation. The results show a consistent improvement in TER, METEOR, NIST, and BLEU scores compared to our baseline system.

  • Study and comparison of rule-based and statistical Catalan-Spanish machine translation systems

     Ruiz Costa-Jussà, Marta; Farrús, Mireia; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    Computing and informatics
    Date of publication: 2012
    Journal article

  • Pivot strategies as an alternative for statistical machine translation tasks involving Iberian languages

     Henriquez Quintana, Carlos Alberto; Ruiz Costa-Jussà, Marta; Banchs, Rafael E.; Formiga Fanals, Lluis; Mariño Acebal, Jose Bernardo
    CEUR Workshop proceedings
    Date of publication: 2012-01-10
    Journal article

  • Improving English to Spanish out-of-domain translations by morphology generalization and generation  Open access

     Formiga Fanals, Lluis; Hernandez Huerta, Adolfo; Mariño Acebal, Jose Bernardo; Monte Moreno, Enrique
    Monolingual Machine Translation Workshop
    Presentation's date: 2012-11-01
    Presentation of work at congresses

    This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translations in English-to-Spanish phrase-based MT. The paper studies whether the morphological richness of the target language causes poor quality translation when translating out-of-domain. In detail, this approach first translates into Spanish simplified forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source and target-language sentences. Obtained results highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data.

  • The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation  Open access

     Formiga Fanals, Lluis; Henriquez Quintana, Carlos Alberto; Hernandez Huerta, Adolfo; Mariño Acebal, Jose Bernardo; Monte Moreno, Enrique; Rodríguez Fonollosa, José Adrián
    Workshop on Statistical Machine Translation
    Presentation's date: 2012-06-08
    Presentation of work at congresses

    This paper describes the UPC participation in the WMT 12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Variations adopted several improvement techniques such as morphology simplification and generation and domain adaptation. The morphology simplification overcomes the data sparsity problem when translating into morphologically rich languages such as Spanish by translating first into a morphology-simplified language and then leaving the morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from MT-output and reference alignment. Results show an improvement in TER, METEOR, NIST and BLEU scores compared to our baseline system, with the domain adaptation approach yielding more benefit on the official test set than the morphological generalization method.

  • Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan-Spanish language pair  Open access

     Mariño Acebal, Jose Bernardo; Ruiz Costa-Jussà, Marta; Poch, Marc; Hernandez Huerta, Adolfo; Henríquez, Carlos; Rodríguez Fonollosa, José Adrián; Farrús Cabecerán, Mireia
    Language resources and evaluation
    Date of publication: 2011-02-20
    Journal article

    This work aims to improve an N-gram-based statistical machine translation system between the Catalan and Spanish languages, trained with an aligned Spanish-Catalan parallel corpus consisting of 1.7 million sentences taken from El Periódico newspaper. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are approached using a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on the use of grammatical categories, as well as lexical categorization. The improved system clearly performs better, as shown in both human and automatic evaluations, with a gain of about 1.1 BLEU points observed in the Spanish-to-Catalan direction of translation and a gain of about 0.5 points in the reverse direction. The final system is freely available online as a linguistic resource.

  • Ncode: an Open Source Bilingual N-gram SMT Toolkit

     Crego, Josep Maria; Yvon, François; Mariño Acebal, Jose Bernardo
    The Prague Bulletin of Mathematical Linguistics
    Date of publication: 2011-10
    Journal article

  • Feedback Analysis for User adaptive Statistical Translation

     Màrquez Villodre, Lluís; Formiga Fanals, Lluis; Mariño Acebal, Jose Bernardo; Gonzalez Bermudez, Meritxell; Rodríguez Fonollosa, José Adrián; Monte Moreno, Enrique; Barron Cedeño, Luis Alberto
    Participation in a competitive project

  • Feedback Analysis for User adaptive Statistical Translation

     Mariño Acebal, Jose Bernardo
    Participation in a competitive project

  • BUSQUEDA DE INFORMACIÓN EN CONTENIDOS AUDIOVISUALES PLURILINGUES

     Esquerra Llucià, Ignasi; Monte Moreno, Enrique; Rodríguez Fonollosa, José Adrián; Bonafonte Cavez, Antonio Jesus; Polyakova, Tatyana; Mariño Acebal, Jose Bernardo; Ruiz Costa-jussa, Marta; Adell Mercado, Jordi; Moreno Bilbao, M. Asuncion
    Participation in a competitive project

  • GILABVIR: Virtual laboratories and remote laboratories in engineering. A teaching innovation group of interest  Open access

     Cabrera Bean, Margarita Asuncion; Bragos Bardia, Ramon; Pérez, Marimar; Mariño Acebal, Jose Bernardo; Rius Casals, Juan-manuel; Gomis Bellmunt, Oriol; Casañ Guerrero, Maria Jose; Gironella i Cobos, Francesc Xavier
    IEEE Annual Engineering Education Conference
    Presentation of work at congresses

    GILABVIR (Grup d’Interès en Laboratoris Virtuals i Remots) is a recently created Virtual and Remote Laboratory Group of Interest of the UPC (Universitat Politècnica de Catalunya), integrated in a more general teaching innovation project, RIMA [1], [2]. RIMA was developed to promote research on the use of innovative learning methodologies applied to engineering education, and it was specifically created to assist in the process of adapting to the new European higher education system.

  • Automatic and human evaluation study of a rule-based and a statistical Catalan-Spanish machine translation systems  Open access

     Ruiz Costa-Jussà, Marta; Farrús Cabecerán, Mireia; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    International Conference on Language Resources and Evaluation
    Presentation's date: 2010-05-20
    Presentation of work at congresses

    Machine translation systems can be classified into rule-based and corpus-based approaches, in terms of their core technology. Since both paradigms have been widely used in recent years, one of the aims of the research community is to know how these systems differ in terms of translation quality. To this end, this paper reports a study and comparison of a rule-based and a corpus-based (specifically, statistical) Catalan-Spanish machine translation system, both of them freely available on the web. The translation quality analysis is performed in two different domains: journalistic and medical. The systems are evaluated using standard automatic measures, as well as by native human evaluators. Automatic results show that the statistical system performs better than the rule-based system. Human judgements show that in the Spanish-to-Catalan direction the statistical system also performs better than the rule-based system, while in the Catalan-to-Spanish direction it is the other way round. Although the statistical system obtains the best automatic scores, its errors tend to be penalized more by human judges than the errors of the rule-based system. This can be explained by the fact that statistical errors are usually unexpected and do not follow any pattern.

  • Linguistic-based evaluation criteria to identify statistical machine translation errors

     Farrús, Mireia; Ruiz Costa-Jussà, Marta; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    Annual Conference of the European Association for Machine Translation
    Presentation's date: 2010-05-28
    Presentation of work at congresses

  • The TALP-UPC Ngram-Based Statistical Machine Translation for ACL-WMT 2008

     Khalilov, M; Hernández, A; Costa-Jussà, M R; Crego, J M; Henríquez, C A Q; Lambert, P; Rodríguez Fonollosa, José Adrián; Mariño Acebal, Jose Bernardo; Banchs, R
    46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
    Presentation's date: 2009-06
    Presentation of work at congresses

  • ORGANIZACIÓN DEL 13º CONGRESO ANUAL DE LA ASOCIACIÓN EUROPEA PARA LA TRADUCCIÓN AUTOMATICA, EAMT-2009

     Farwell, David Loring; Mariño Acebal, Jose Bernardo; Màrquez Villodre, Lluís; Rodríguez Fonollosa, José Adrián
    Participation in a competitive project

  • VEU: GRUP DE TRACTAMENT DE LA PARLA

     Bonafonte Cavez, Antonio Jesus; Casar Lopez, Marta; Ruiz Costa-jussa, Marta; Nogueiras Rodriguez, Albino; Esquerra Llucià, Ignasi; Salavedra Moli, Josep; Farrús Cabecerán, Mireia; Hernando Pericas, Francisco Javier; Rodríguez Fonollosa, José Adrián; Monte Moreno, Enrique; Mariño Acebal, Jose Bernardo; Nadeu Camprubí, Climent; Moreno Bilbao, M. Asuncion; Vallverdu Bayes, Francisco
    Participation in a competitive project

  • The TALP on-line Spanish-Catalan machine-translation system  Open access

     Poch, Manel; Farrús, Mireia; Ruiz Costa-Jussà, Marta; Mariño Acebal, Jose Bernardo; Hernández, Adolfo; Henríquez, Carlos; Rodríguez Fonollosa, José Adrián
    Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages
    Presentation's date: 2009-09-04
    Presentation of work at congresses

    In this paper the statistical machine translator (SMT) between Catalan and Spanish developed at the TALP research center (UPC) and its web demonstration are described.

  • 4.1.1 Descripción de las Técnicas Desarrolladas

     Bonafonte Cavez, Antonio Jesus; Hernando Pericas, Francisco Javier; Mariño Acebal, Jose Bernardo; Moreno Bilbao, M. Asuncion; Nadeu Camprubí, Climent
    Date: 2008-09
    Report

  • Architecture and Modeling for N-gram-based Statistical Machine Translation

     Crego Clemente, Jose Maria
    Defense's date: 2008-04-18
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

  • The TALP & I2R SMT Systems for IWSLT 2008  Open access

     Khalilov, M; Costa-Jussà, M R; Henríquez, C A Q; Rodríguez Fonollosa, José Adrián; Hernández, A; Mariño Acebal, Jose Bernardo; Banchs, R; Chen, B; Zhang, M; Aw, A; Li, H
    International Workshop on Spoken Language Translation
    Presentation's date: 2008-10
    Presentation of work at congresses

    This paper describes the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the architecture of the 2008 systems and outlines the translation schemes we have used, focusing mainly on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks.

  • Learning engineering ethics by debate

     Nadeu Camprubí, Climent; Mariño Acebal, Jose Bernardo; Farrús Cabecerán, Mireia
    International Conference on Ethics and Human Values in Engineering
    Presentation of work at congresses

  • INTRODUCING LINGUISTIC KNOWLEDGE INTO STATISTICAL MACHINE TRANSLATION  Open access

     de Gispert Ramis, Adrià
    Defense's date: 2007-01-26
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

    This Ph.D. thesis addresses the use of morphosyntactic information to improve the performance of Statistical Machine Translation (SMT) systems, providing them with additional linguistic information beyond the surface level of words from parallel corpora.

    The statistical machine translation system used in this work follows a tuple-based approach, modelling joint-probability translation models via a log-linear combination of bilingual n-grams with additional feature functions. A detailed study of the approach is conducted. This includes its initial development from a speech-oriented Finite-State Transducer architecture implementing X-grams towards a large-vocabulary, text-oriented n-gram implementation, the particularities of training and decoding, portability across language pairs and tasks, and the main difficulties revealed in error analyses.

    The use of linguistic knowledge to improve word alignment quality is also studied. A co-occurrence-based one-to-one word alignment algorithm is extended with verb form classification, with successful results. Additionally, we empirically evaluate the impact on word alignment and translation quality of Part-Of-Speech tagging, base forms (lemmatization), verb form classification and stemming, using state-of-the-art word alignment tools.

    Furthermore, the thesis proposes a translation model that tackles verb form generation through an additional verb instance model, reporting experiments on English-to-Spanish tasks. Agreement errors are addressed by incorporating a target Part-Of-Speech language model. Finally, we study the impact of morphology derivation on the Ngram-based SMT formulation, empirically evaluating the quality gain that can be obtained via morphology reduction.

  • Engineering and social responsibility: a case-based source

     Nadeu Camprubí, Climent; Mariño Acebal, Jose Bernardo; Farrús Cabecerán, Mireia
    III International Conference on Technoethics = Congreso Internacional de Tecnoética
    Presentation of work at congresses

  • Ngram-Based Statistical Machine Translation Enhanced with Multiple Weighted Reordering Hypotheses

     Costa-Jussà, M R; Crego, J; Lambert, P; Khalilov, M; Rodríguez Fonollosa, José Adrián; Mariño Acebal, Jose Bernardo; Banchs, R
    45th Annual Meeting of the Association of Computational Linguistics
    Presentation of work at congresses

  • The TALP N-gram based SMT system for the IWSLT 2007

     Lambert, P; Costa-Jussà, M R; Crego, J M; Khalilov, M; Mariño Acebal, Jose Bernardo; Banchs, R; Rodríguez Fonollosa, José Adrián; Schwenk, H
    International Workshop on Spoken Language Translation
    Presentation of work at congresses

  • Analysis and System Combination of Phrase- and N-gram-based Statistical Machine Translation Systems

     Costa-Jussà, M R; Crego, J M; Vilar, D; Rodríguez Fonollosa, José Adrián; Mariño Acebal, Jose Bernardo; Ney, H
    Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics
    Presentation of work at congresses

  • GeoVAQA: A Voice Activated geographical Question Answering system

     Luque, J; Ferrés, D; Hernando Pericas, Francisco Javier; Mariño Acebal, Jose Bernardo; Rodriguez Hontoria, Horacio
    Jornadas en Tecnología del Habla
    Presentation of work at congresses

  • UPC's bilingual N-gram translation system

     Mariño Acebal, Jose Bernardo
    TC-STAR workshop on Speech-to-Speech Translation
    Presentation's date: 2006-06-19
    Presentation of work at congresses

  • Integration of POStag-based source reordering into SMT decoding by an extended search graph

     Crego, J M; Mariño Acebal, Jose Bernardo
    7th conference of the Association for Machine Translation in the Americas
    Presentation of work at congresses

  • Reordering experiments for n-gram-based SMT

     Mariño Acebal, Jose Bernardo
    2006 IEEE/ACL Workshop on Spoken Language Technology
    Presentation's date: 2006-12-10
    Presentation of work at congresses

  • TALP Phrase-Based System and TALP System Combination for the IWSLT 2006

     Mariño Acebal, Jose Bernardo
    International Workshop on Spoken Language Translation
    Presentation's date: 2006-11-27
    Presentation of work at congresses

  • Linguistic tuple segmentation in n-gram based statistical machine translation

     Gispert, A De; Mariño Acebal, Jose Bernardo
    International Conference on Spoken Language Processing - Interspeech 2006
    Presentation of work at congresses

  • UPC's Bilingual N-gram Translation System

     Mariño Acebal, Jose Bernardo; Banchs, R; Crego, J M; Gispert, A; Lambert, P; Rodríguez Fonollosa, José Adrián; Khalilov, M
    TC-Star Speech to Speech Translation Workshop
    Presentation of work at congresses

  • Joint training of codebooks and acoustic models in automatic speech recognition using semi-continuous HMMs

     Nogueiras Rodriguez, Albino; Caballero Galeote, Monica; Mariño Acebal, Jose Bernardo
    Jornadas en Tecnologías del Habla
    Presentation's date: 2006-11
    Presentation of work at congresses

  • Linguistic tuple segmentation in n-gram based statistical machine translation

     Mariño Acebal, Jose Bernardo
    International Conference on Spoken Language Processing - Interspeech 2006
    Presentation's date: 2006-09-17
    Presentation of work at congresses

  • Reordering experiments for n-gram-based SMT

     Crego, J M; Mariño Acebal, Jose Bernardo
    2006 IEEE/ACL Workshop on Spoken Language Technology
    Presentation of work at congresses

  • TALP Phrase-based statistical translation system for European language pairs

     Crego, J M; Gispert, A; Lambert, P; Khalilov, M; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián; Banchs, R
    Human Language Technology conference - North American chapter of the Association for Computational Linguistics annual meeting
    Presentation of work at congresses

  • N-gram-based SMT System Enhanced with Reordering Patterns

     Crego, J M; Gispert, A; Lambert, P; Khalilov, M; Banchs, R; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    Human Language Technology conference - North American chapter of the Association for Computational Linguistics annual meeting
    Presentation of work at congresses

  • Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output

     Popovic, M; Gispert, A De; Gupta, D; Lambert, P; Ney, H; Mariño Acebal, Jose Bernardo; Federico, M; Banchs, R
    Human Language Technology conference - North American chapter of the Association for Computational Linguistics annual meeting
    Presentation of work at congresses

  • N-gram Based Machine Translation

     Mariño Acebal, Jose Bernardo; Banchs, R; Crego, J; Gispert, A; Lambert, P; Rodríguez Fonollosa, José Adrián; Costa-Jussà, M
    Computational linguistics
    Date of publication: 2006-12
    Journal article

  • Linguistic knowledge in statistical phrase-based word alignment

     Gispert, A De; Mariño Acebal, Jose Bernardo; Crego, Josep M
    Natural language engineering (Print)
    Date of publication: 2006-03
    Journal article

  • Segmentación lingüística de tuplas para el modelado de la traducción estocástica mediante n-gramas

     Gispert, A De; Mariño Acebal, Jose Bernardo
    Procesamiento del lenguaje natural
    Date of publication: 2006-09
    Journal article

  • Integración de reordenamientos en el algoritmo de decodificación en traducción automática estocástica

     Crego, J M; Mariño Acebal, Jose Bernardo
    Procesamiento del lenguaje natural
    Date of publication: 2006-09
    Journal article

  • Improving statistical MT by coupling reordering and decoding

     Crego Clemente, Jose Maria; Mariño Acebal, Jose Bernardo
    Machine translation
    Date of publication: 2006-07
    Journal article

  • AVIVAVOZ: Tecnologías para la Traducción de Voz: Reconocimiento, Traducción Estadística Basada en Corpus y Síntesis

     Mariño Acebal, Jose Bernardo; Nogueiras Rodriguez, Albino
    Participation in a competitive project

  • The TALP Ngram-based SMT System for IWSLT 2006

     Crego, Josep M; Gispert, Adrià de; Lambert, Patrik; Khalilov, Maxim; Costa-Jussà, Marta R; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    International Workshop on Spoken Language Translation
    Presentation of work at congresses

  • TALP Phrase-Based System and TALP System Combination for the IWSLT 2006

     Costa-Jussà, Marta R; Crego, Josep M; Gispert, Adrià de; Lambert, Patrik; Khalilov, Maxim; Rodríguez Fonollosa, José Adrián; Mariño Acebal, Jose Bernardo
    International Workshop on Spoken Language Translation
    Presentation's date: 2006-11-27
    Presentation of work at congresses
