Scientific and technological production
  • PROGRAMA SEGUNDA VOZ

     Rodríguez Fonollosa, José Adrián
    Competitive project

  • Flight Quest Phase 2

     Rodríguez Fonollosa, José Adrián
    Award or recognition

  • Crowdsourcing in Data Science

     Rodríguez Fonollosa, José Adrián
    Workshop on Data Science in Aviation
    Presentation's date: 2014-05-21
    Presentation of work at congresses

    Example of open innovation actions in Data Science

  • PROGRAMA SEGUNDA VOZ

     Henriquez Quintana, Aaron; Nogueiras Rodriguez, Albino; Comas Umbert, Pere Ramon; Rodríguez Fonollosa, José Adrián
    Competitive project

  • Grup de tractament de la parla

     Mariño Acebal, Jose Bernardo; Nadeu Camprubí, Climent; Moreno Bilbao, M. Asuncion; Rodríguez Fonollosa, José Adrián; Hernando Pericas, Francisco Javier; Vallverdu Bayes, Francisco; Monte Moreno, Enrique; Salavedra Moli, Josep; Nogueiras Rodriguez, Albino; Esquerra Llucià, Ignasi; Formiga Fanals, Lluis; Jauk, Igor; Raboshchuk, Ganna; Zewoudie, Abraham Woubie; Bonafonte Cavez, Antonio Jesus
    Competitive project

  • The TALP-UPC phrase-based translation systems for WMT13: system combination with morphology generation, domain adaptation and corpus filtering

     Formiga Fanals, Lluis; Ruiz Costa-Jussà, Marta; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián; Barron Cedeño, Luis Alberto; Màrquez Villodre, Lluís
    Workshop on Statistical Machine Translation
    p. 134-140
    Presentation's date: 2013-08-08
    Presentation of work at congresses

    This paper describes the TALP participation in the WMT13 evaluation campaign. Our participation is based on the combination of several statistical machine translation systems, all built on standard phrase-based Moses systems. Variations include techniques such as morphology generation, training sentence filtering, and domain adaptation through unit derivation. The results show a consistent improvement in TER, METEOR, NIST, and BLEU scores compared to our baseline system.

  • The TALP-UPC approach to system selection: ASIYA features and pairwise classification using random forests

     Formiga Fanals, Lluis; Gonzalez Bermudez, Meritxell; Barron Cedeño, Luis Alberto; Rodríguez Fonollosa, José Adrián; Màrquez Villodre, Lluís
    Workshop on Statistical Machine Translation
    p. 359-364
    Presentation's date: 2013-08-08
    Presentation of work at congresses

    This paper describes the TALP-UPC participation in the WMT’13 Shared Task on Quality Estimation (QE). Our participation is limited to task 1.2 on System Selection. We used a broad set of features (86 for German-to-English and 97 for English-to-Spanish) ranging from standard QE features to features based on pseudo-references and semantic similarity. We approached system selection by means of pairwise ranking decisions. For that, we learned Random Forest classifiers especially tailored for the problem. Evaluation at development time showed fairly good results in a cross-validation experiment, with Kendall’s τ values around 0.30. The results on the test set dropped significantly, raising several points for discussion.

  • Approaches to Machine Translation: Rule-based, Statistical and Hybrid

     Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
    Competitive project

  • Modelling the effects of spontaneous speech in speech recognition

     Schulz, Henrik; Rodríguez Fonollosa, José Adrián
    Afeka Speech Processing Conference
    Presentation's date: 2013-07-01
    Presentation of work at congresses

    Intrinsic variability of the speaker in spontaneous speech remains a challenge to state-of-the-art automatic speech recognition (ASR). While planned speech exhibits moderate variability, the significant variability of spontaneous speech is caused by situation, context, intention, emotion and listeners. This conditioning of speech is observable in terms of speaking rate and in feature space. We analysed broadcast news (BN) and broadcast conversational (BC) speech in terms of phoneme rate (PR) and feature space reduction (FSR), and contrasted both with the planned speech data. Strong statistically significant differences were revealed. We cluster the speech segments with respect to their degree of PR and FSR, forming a set of variability classes, and introduce the variability classes into the Hidden Markov Model (HMM) based acoustic model (AM). In recognition we follow two approaches: the first considers the variability class as a context variable, the second relies on prior estimation of the variability class after the first pass of a multi-pass recognition system. Besides explicit modelling of the intrinsic speech variability of the speaker, we furthermore segregate the general speaker-specific characteristics by means of speaker adaptive training (SAT) into feature space transforms using Constrained Maximum Likelihood Linear Regression (CMLLR), and apply the adaptive approach in third-pass recognition. By modelling both within-speaker variation and between-speaker variation in spontaneous speech, we address two fundamental sources of speech variability that determine the performance of ASR systems.

  • PROGRAMA SEGUNDA VOZ

     Formiga Fanals, Lluis; Nogueiras Rodriguez, Albino; Rodríguez Fonollosa, José Adrián
    Competitive project

  • Dealing with input noise in statistical machine translation (Open access)

     Formiga Fanals, Lluis; Rodríguez Fonollosa, José Adrián
    International Conference on Computational Linguistics
    p. 319-328
    Presentation's date: 2012-12-13
    Presentation of work at congresses

    Misspelled words have a direct impact on the final quality obtained by Statistical Machine Translation (SMT) systems, as the input becomes noisy and unpredictable. This paper presents some improvement strategies for translating real-life noisy input. The proposed strategies are based on a preprocessing step consisting of a character-based translator (MT) from noisy into cleaned text. The use of a character-level translator allows us to provide various spelling alternatives in a lattice format to the final bilingual translator. Therefore, the final MT is the one that decides the best path to be translated. The different hypotheses are obtained under the assumption of a noisy channel model for this task. This paper shows the experiments done with real-life noisy input and a standard phrase-based SMT system from English into Spanish.

  • Search engine for multilingual audiovisual contents

     Pérez, José David; Bonafonte Cavez, Antonio Jesus; Cardenal, Antonio; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián; Moreno Bilbao, M. Asuncion; Navas, Eva; Rodríguez Banga, Eduardo
    Jornadas en Tecnología del Habla and III Iberian SLTech Workshop
    p. 422-430
    Presentation's date: 2012-11
    Presentation of work at congresses

  • Measuring acoustic reduction in feature space

     Rodríguez Fonollosa, José Adrián; Schulz, Henrik
    IberSPEECH
    p. 113-122
    Presentation's date: 2012-11-21
    Presentation of work at congresses

    Modelling varying speaking style remains a challenge to state-of-the-art speech recognition and synthesis systems. Vowel and consonant reduction have been identified as correlative to speaking style variation, but still lack a common measurement. The reduction phenomena are often observed without consideration of coarticulation and assimilation effects, and as a result of speaking rate variability. We present an analysis of acoustic reduction in Mel Frequency cepstral coefficients (MFCC) feature space of phonemes, estimate duration and determine the degree of correlation between duration reduction and feature space reduction for two different speaking styles present in broadcast news and conversational recordings. We analyse the feature space reduction of consonants and vowels in context in a syllable environment.

  • Correcting input noise in SMT as a char-based translation problem (Open access)

     Formiga Fanals, Lluis; Rodríguez Fonollosa, José Adrián
    Date: 2012-10-31
    Report

    Misspelled words have a direct impact on the final quality obtained by Statistical Machine Translation (SMT) systems as the input becomes noisy and unpredictable. This paper presents some improvement strategies for translating real-life noisy input. The proposed strategies are based on a preprocessing step consisting in a character-based translator.

  • Integration of Machine Translation Paradigms

     Ruiz Costa-jussa, Marta; Rodríguez Fonollosa, José Adrián
    Competitive project

  • The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation (Open access)

     Formiga Fanals, Lluis; Henriquez Quintana, Carlos Alberto; Hernandez Huerta, Adolfo; Mariño Acebal, Jose Bernardo; Monte Moreno, Enrique; Rodríguez Fonollosa, José Adrián
    Workshop on Statistical Machine Translation
    p. 275-282
    Presentation's date: 2012-06-08
    Presentation of work at congresses

    This paper describes the UPC participation in the WMT 12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Variations adopted several improvement techniques such as morphology simplification and generation and domain adaptation. The morphology simplification overcomes the data sparsity problem when translating into morphologically-rich languages such as Spanish by translating first to a morphology-simplified language and then leaving the morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from MT-output and reference alignment. Results show an improvement on TER, METEOR, NIST and BLEU scores compared to our baseline system, obtaining on the official test set more benefits from the domain adaptation approach than from the morphological generalization method.

  • The BUCEADOR multi-language search engine for digital libraries (Open access)

     Adell, Jordi; Bonafonte Cavez, Antonio Jesus; Cardenal, Antonio; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián; Moreno Bilbao, M. Asuncion; Navas, Eva; Rodríguez Banga, Eduardo
    International Conference on Language Resources and Evaluation
    p. 1705-1709
    Presentation's date: 2012-05-24
    Presentation of work at congresses

    This paper presents a web-based multimedia search engine built within the Buceador (www.buceador.org) research project. A proof-of-concept tool has been implemented which is able to retrieve information from a digital library made of multimedia documents in the 4 official languages in Spain (Spanish, Basque, Catalan and Galician). The retrieved documents are presented in the user language after translation and dubbing (the four previous languages + English). The paper presents the tool functionality, the architecture and the digital library, and provides some information about the technology involved in the fields of automatic speech recognition, statistical machine translation, text-to-speech synthesis and information retrieval. Each technology has been adapted to the purposes of the presented tool as well as to interact with the rest of the technologies involved.

  • Enhancing the European Linguistic Infrastructure

     Vallverdu Bayes, Francisco; Butko, Taras; Nadeu Camprubí, Climent; Bonafonte Cavez, Antonio Jesus; Rodríguez Fonollosa, José Adrián; Wolf, Martin; Moreno Bilbao, M. Asuncion
    Competitive project

  • Reconocimiento de voz y audio para inteligencia ambiental

     Butko, Taras; Zelenak, Martin; Wolf, Martin; Vallverdu Bayes, Francisco; Nadeu Camprubí, Climent; Rodríguez Fonollosa, José Adrián; Casar Lopez, Marta; Nogueiras Rodriguez, Albino; Salavedra Moli, Josep; Hernando Pericas, Francisco Javier
    Competitive project

  • English-Latvian SMT: the challenge of translating into a free word order language (Open access)

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián; Skadina, Inguna; Bralitis, Edgars; Pretkalnina, Lauma
    International Workshop on Spoken Languages Technologies for Under-resourced Languages
    p. 87-94
    Presentation's date: 2010-05-03
    Presentation of work at congresses

    This paper presents a comparative study of two approaches to statistical machine translation (SMT) and their application to a task of English-to-Latvian translation, which is still an open research line in the field of automatic translation. We consider a state-of-the-art phrase-based SMT system and an alternative N-gram-based SMT system. The major differences between these two approaches lie in the distinct representations of bilingual units, which are the components of the bilingual model driving the translation process, and in the statistical modeling of the translation context. Latvian being a rather free word order language implies additional difficulties for the translation process. We contrast different reordering models and investigate how well they deal with the word ordering issue. Moving beyond the automatic scores of translation quality that are classically presented in MT research papers, we contribute a manual error analysis of MT system output that helps to shed light on the advantages and disadvantages of the SMT systems under consideration and to identify the most prominent sources of errors typical of both SMT systems.

  • Using linear interpolation and weighted reordering hypotheses in the moses system (Open access)

     Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
    International Conference on Language Resources and Evaluation
    p. 1712-1718
    Presentation's date: 2010-05-20
    Presentation of work at congresses

    This paper proposes to introduce a novel reordering model in the open-source Moses toolkit. The main idea is to provide weighted reordering hypotheses to the SMT decoder. These hypotheses are built using a first-step Ngram-based SMT translation from a source language into a third representation that is called reordered source language. Each hypothesis has its own weight provided by the Ngram-based decoder. This proposed reordering technique offers a better and more efficient translation when compared to both the distance-based and the lexicalized reordering. In addition to this reordering approach, this paper describes a domain adaptation technique which is based on a linear combination of a specific in-domain translation model and an extra out-of-domain translation model. Results for both approaches are reported on the Arabic-to-English 2008 IWSLT task. When implementing the weighted reordering hypotheses and the domain adaptation technique in the final translation system, translation results reach improvements of up to 2.5 BLEU compared to a standard state-of-the-art Moses baseline system.

  • Automatic and human evaluation study of a rule-based and a statistical Catalan-Spanish machine translation systems (Open access)

     Ruiz Costa-Jussà, Marta; Farrús Cabecerán, Mireia; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    International Conference on Language Resources and Evaluation
    p. 1707-1711
    Presentation's date: 2010-05-20
    Presentation of work at congresses

    Machine translation systems can be classified into rule-based and corpus-based approaches, in terms of their core technology. Since both paradigms have been widely used in recent years, one of the aims of the research community is to know how these systems differ in terms of translation quality. To this end, this paper reports a study and comparison of a rule-based and a corpus-based (specifically, statistical) Catalan-Spanish machine translation system, both of them freely available on the web. The translation quality analysis is performed in two different domains: journalistic and medical. The systems are evaluated using standard automatic measures, as well as by native human evaluators. Automatic results show that the statistical system performs better than the rule-based system. Human judgements show that in the Spanish-to-Catalan direction the statistical system also performs better than the rule-based system, while in the Catalan-to-Spanish direction it is the other way round. Although the statistical system obtains the best automatic scores, its errors tend to be more penalized by human judgements than the errors of the rule-based system. This can be explained by the fact that statistical errors are usually unexpected and do not follow any pattern.

  • Linguistic-based evaluation criteria to identify statistical machine translation errors

     Farrús, Mireia; Ruiz Costa-Jussà, Marta; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    Annual Conference of the European Association for Machine Translation
    p. 167-173
    Presentation's date: 2010-05-28
    Presentation of work at congresses

  • Towards improving English-Latvian translation: a system comparison and a new rescoring feature (Open access)

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián; Skadina, Inguna; Bralitis, Edgars; Pretkalnina, Lauma
    International Conference on Language Resources and Evaluation
    p. 1719-1725
    Presentation's date: 2010-05-20
    Presentation of work at congresses

    This paper presents a comparative study of two alternative approaches to statistical machine translation (SMT) and their application to a task of English-to-Latvian translation. Furthermore, a novel feature intended to reflect the relatively free word order scheme of the Latvian language is proposed and successfully applied in the n-best list rescoring step. Moving beyond the automatic scores of translation quality that are classically presented in MT research papers, we contribute a manual error analysis of MT system output that helps to shed light on the advantages and disadvantages of the SMT systems under consideration.

  • Feedback Analysis for User adaptive Statistical Translation

     Formiga Fanals, Lluis; Màrquez Villodre, Lluís; Gonzalez Bermudez, Meritxell; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián; Monte Moreno, Enrique; Barron Cedeño, Luis Alberto
    Competitive project

  • BUSQUEDA DE INFORMACIÓN EN CONTENIDOS AUDIOVISUALES PLURILINGUES

     Mariño Acebal, Jose Bernardo; Monte Moreno, Enrique; Bonafonte Cavez, Antonio Jesus; Polyakova, Tatyana; Esquerra Llucià, Ignasi; Rodríguez Fonollosa, José Adrián; Ruiz Costa-jussa, Marta; Adell Mercado, Jordi; Moreno Bilbao, M. Asuncion
    Competitive project

  • New statistical and syntactic models for machine translation

     Khalilov, Maxim
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

  • VEU: GRUP DE TRACTAMENT DE LA PARLA

     Bonafonte Cavez, Antonio Jesus; Casar Lopez, Marta; Ruiz Costa-jussa, Marta; Nogueiras Rodriguez, Albino; Esquerra Llucià, Ignasi; Salavedra Moli, Josep; Farrús Cabecerán, Mireia; Hernando Pericas, Francisco Javier; Rodríguez Fonollosa, José Adrián; Monte Moreno, Enrique; Mariño Acebal, Jose Bernardo; Nadeu Camprubí, Climent; Moreno Bilbao, M. Asuncion; Vallverdu Bayes, Francisco
    Competitive project

  • A baseline system for the transcription of catalan broadcast conversation

     Schulz, Henrik; Rodríguez Fonollosa, José Adrián; Rybach, David
    Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages
    p. 49-52
    Presentation's date: 2009-09-04
    Presentation of work at congresses

  • The TALP on-line Spanish-Catalan machine-translation system (Open access)

     Poch, Manel; Farrús, Mireia; Ruiz Costa-Jussà, Marta; Mariño Acebal, Jose Bernardo; Hernández, Adolfo; Henríquez, Carlos; Rodríguez Fonollosa, José Adrián
    Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages
    p. 105
    Presentation's date: 2009-09-04
    Presentation of work at congresses

    In this paper the statistical machine translator (SMT) between Catalan and Spanish developed at the TALP research center (UPC) and its web demonstration are described.

  • A Catalan broadcast conversational speech database

     Schulz, Henrik; Rodríguez Fonollosa, José Adrián
    Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages
    p. 27-30
    Presentation's date: 2009-09-04
    Presentation of work at congresses

    Data-driven methods in speech and linguistic research and in system development require appropriate speech databases. A new Catalan speech database has been developed with a particular emphasis on broadcast conversational speech. The article describes the origin and nature of the broadcasts and their acoustic environment. Annotation and transcription provide statistics on specific phenomena of the exhibited speech, speaker characteristics and acoustic events. It concludes with prospective uses and limitations.

  • The TALP-UPC Ngram-Based Statistical Machine Translation for ACL-WMT 2008

     Khalilov, M; Hernández, A; Costa-Jussà, M R; Crego, J M; Henríquez, C A Q; Lambert, P; Rodríguez Fonollosa, José Adrián; Mariño Acebal, Jose Bernardo; Banchs, R
    46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
    p. 127-130
    Presentation's date: 2009-06
    Presentation of work at congresses

  • Coupling hierarchical word reordering and decoding in phrase-based statistical machine translation (Open access)

     Dras, Mark; Khalilov, Maxim; Rodríguez Fonollosa, José Adrián
    Association for Computational Linguistics. North American Chapter. Conference
    p. 78-86
    Presentation's date: 2009-05
    Presentation of work at congresses

    In this paper, we start with the existing idea of taking reordering rules automatically derived from syntactic representations, and applying them in a preprocessing step before translation to make the source sentence structurally more like the target; and we propose a new approach to hierarchically extracting these rules. We evaluate this, combined with a lattice-based decoding, and show improvements over state-of-the-art distortion models.

  • A new subtree-transfer approach to syntax-based reordering for statistical machine translation (Open access)

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián; Dras, Mark
    Annual Conference of the European Association for Machine Translation
    Presentation's date: 2009-05-15
    Presentation of work at congresses

    In this paper we address the problem of translating between languages with word order disparity. The idea of augmenting statistical machine translation (SMT) by using a syntax-based reordering step prior to translation, proposed in recent years, has been quite successful in improving translation quality. We present a new technique for extracting syntax-based reordering rules, which are derived through a syntactically augmented alignment of source and target texts. The parallel corpus with reordered source side is then passed to an N-gram-based machine translation system and the obtained results are contrasted with a monotone system performance. In experiments, we show significant improvement for the Chinese-to-English translation task.

  • The TALP-UPC phrase-based translation system for EACL-WMT 2009 (Open access)

     Rodríguez Fonollosa, José Adrián; Khalilov, Maxim; Ruiz Costa-Jussà, Marta; Henríquez, Carlos; Hernández, Adolfo; Banchs, Rafael E.
    Association for Computational Linguistics. European Chapter. Conference
    p. 85-89
    Presentation's date: 2009-04-01
    Presentation of work at congresses

    This study presents the TALP-UPC submission to the EACL Fourth Workshop on Statistical Machine Translation 2009 evaluation campaign. It outlines the architecture and configuration of the 2009 phrase-based statistical machine translation (SMT) system, putting emphasis on the major novelty of this year: combination of SMT systems implementing different word reordering algorithms. Traditionally, we have concentrated on the Spanish-to-English and English-to-Spanish News Commentary translation tasks.

  • N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination (Open access)

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián
    Association for Computational Linguistics. European Chapter. Conference
    p. 424-432
    Presentation's date: 2009-04
    Presentation of work at congresses

    In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual units related to word-to-word alignment and statistical modeling of the bilingual context following a maximum entropy framework. We provide a step-by-step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (1.5M tokens in the training corpus). Human error analysis clarifies advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.

  • ORGANIZACIÓN DEL 13º CONGRESO ANUAL DE LA ASOCIACIÓN EUROPEA PARA LA TRADUCCIÓN AUTOMATICA, EAMT-2009

     Màrquez Villodre, Lluís; Mariño Acebal, Jose Bernardo; Farwell, David Loring; Rodríguez Fonollosa, José Adrián
    Competitive project

  • On the use of augmented HMM models for overcoming time and parameter independence assumptions in ASR

     Casar Lopez, Marta; Rodríguez Fonollosa, José Adrián
    Jornadas en Tecnología del Habla
    p. 1-4
    Presentation's date: 2008-11
    Presentation of work at congresses

  • Técnicas estadísticas para el filtrado de un corpus bilingüe en traducción automática

     Montolar, Enrique; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
    Jornadas en Tecnología del Habla
    p. 285-288
    Presentation's date: 2008-11
    Presentation of work at congresses

  • Informe anual del proyecto TECNOPARLA

     Rodríguez Fonollosa, José Adrián
    Date: 2008-11
    Report

  • Deriving benefit from a generalized syntax-based reordering

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián
    Jornadas en Tecnología del Habla
    p. 269-272
    Presentation's date: 2008-11
    Presentation of work at congresses

  • Overcoming HMM time and parameter independence assumptions for ASR

     Casar Lopez, Marta; Rodríguez Fonollosa, José Adrián
    Date of publication: 2008-11-01
    Book chapter

  • Neural Network Language Models for Translation with Limited Data

     Khalilov, M; Rodríguez Fonollosa, José Adrián; Zamora-Martínez, F; Castro-Bleda, M J; España, S
    IEEE International Conference on Tools with Artificial Intelligence
    p. 445-451
    DOI: 10.1109/ICTAI.2008.35
    Presentation's date: 2008-11
    Presentation of work at congresses

  • The TALP & I2R SMT Systems for IWSLT 2008 (Open access)

     Khalilov, M; Costa-Jussà, M R; Henríquez, C A Q; Rodríguez Fonollosa, José Adrián; Hernández, A; Mariño Acebal, Jose Bernardo; Banchs, R; Chen, B; Zhang, M; Aw, A; Li, H
    International Workshop on Spoken Language Translation
    p. 116-123
    Presentation's date: 2008-10
    Presentation of work at congresses

    This paper gives a description of the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the 2008 systems’ architecture and outlines the translation schemes we have used, mainly focusing on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are: an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks.

  • Computing multiple weighted reordering hypotheses for a statistical machine translation phrase-based system

     Costa-Jussà, Marta R; Rodríguez Fonollosa, José Adrián
    Conference of the Association for Machine Translation in the Americas
    p. 82-88
    Presentation's date: 2008-10
    Presentation of work at congresses

  • NEW REORDERING AND MODELING APPROACHES FOR STATISTICAL MACHINE TRANSLATION

     Ruiz Costa-jussa, Marta
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

  • Overcoming HMM time independence assumption using N-gram based modelling for continuous speech recognition

     Casar Lopez, Marta; Rodríguez Fonollosa, José Adrián
    European Signal Processing Conference
    p. 1-5
    Presentation's date: 2008-08
    Presentation of work at congresses

  • Arabic-English translation improvement by target-side neural network language modeling

     Khalilov, M; Rodríguez Fonollosa, José Adrián; Zamora-Martínez, F; Castro-Bleda, M J; España, S
    Language Resources and Evaluation Conference
    p. 83-88
    Presentation's date: 2008-05
    Presentation of work at congresses

  • Using Reordering in Statistical Machine Translation based on Alignment Block Classification

     Costa-Jussà, Marta R; Rodríguez Fonollosa, José Adrián; Monte Moreno, Enrique
    Language Resources and Evaluation Conference
    p. 1749-1754
    Presentation's date: 2008-05
    Presentation of work at congresses

  • Informe del año 2007 del proyecto WORDFINDER

     Rodríguez Fonollosa, José Adrián
    Date: 2007-12
    Report
