Scientific and technological production

  • PROGRAMA SEGUNDA VOZ

     Rodríguez Fonollosa, José Adrián
    Participation in a competitive project

  • Flight Quest Phase 2

     Rodríguez Fonollosa, José Adrián
    Award or recognition

  • Neural network language models to select the best translation (Open access)

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián; Zamora Martínez, Francisco; Castro Bleda, María José; España Boquera, Salvador
    Computational Linguistics in the Netherlands Journal
    Date of publication: 2013-12-20
    Journal article

    The quality of translations produced by statistical machine translation (SMT) systems crucially depends on the generalization ability provided by the statistical models involved in the process. While most modern SMT systems use n-gram models to predict the next element in a sequence of tokens, our system uses a continuous space language model (LM) based on neural networks (NN). In contrast to works in which the NN LM is only used to estimate the probabilities of shortlist words (Schwenk 2010), we calculate the posterior probabilities of out-of-shortlist words using an additional neuron and unigram probabilities. Experimental results on a small Italian-to-English and a large Arabic-to-English translation task, which take into account different word history lengths (n-gram order), show that the NN LMs are scalable to small and large data and can improve an n-gram-based SMT system. For the most part, this approach aims to improve translation quality for tasks that lack translation data, but we also demonstrate its scalability to large-vocabulary tasks.

  • PROGRAMA SEGUNDA VOZ

     Rodríguez Fonollosa, José Adrián
    Participation in a competitive project

  • Approaches to Machine Translation: Rule-based, Statistical and Hybrid

     Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
    Participation in a competitive project

  • Modelling the effects of spontaneous speech in speech recognition

     Schulz, Henrik; Rodríguez Fonollosa, José Adrián
    Afeka Speech Processing Conference
    Presentation's date: 2013-07-01
    Presentation of work at congresses

    Intrinsic variability of the speaker in spontaneous speech remains a challenge to state-of-the-art automatic speech recognition (ASR). While planned speech exhibits moderate variability, the significant variability of spontaneous speech is caused by situation, context, intention, emotion and listeners. This conditioning of speech is observable in terms of speaking rate and in feature space. We analysed broadcast news (BN) and broadcast conversational (BC) speech in terms of phoneme rate (PR) and feature space reduction (FSR), and contrasted both with the planned speech data. Strong statistically significant differences were revealed. We cluster the speech segments with respect to their degree of PR and FSR, forming a set of variability classes, and introduce the variability classes into the Hidden Markov Model (HMM) based acoustic model (AM). In recognition we follow two approaches: the first considers the variability class as a context variable, the second relies on prior estimation of the variability class after the first pass of a multi-pass recognition system. Besides explicit modelling of the intrinsic speech variability of the speaker, we furthermore segregate the general speaker-specific characteristics by means of speaker adaptive training (SAT) into feature space transforms using Constrained Maximum Likelihood Linear Regression (CMLLR), and apply the adaptive approach in third-pass recognition. By modelling both within-speaker and between-speaker variation in spontaneous speech, we address two fundamental sources of speech variability that determine the performance of ASR systems.

  • The TALP-UPC phrase-based translation systems for WMT13: system combination with morphology generation, domain adaptation and corpus filtering

     Formiga Fanals, Lluis; Ruiz Costa-Jussà, Marta; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián; Barron Cedeño, Luis Alberto; Màrquez Villodre, Lluís
    Workshop on Statistical Machine Translation
    Presentation's date: 2013-08-08
    Presentation of work at congresses

    This paper describes the TALP participation in the WMT13 evaluation campaign. Our participation combines several statistical machine translation systems built on standard phrase-based Moses systems. Variations include techniques such as morphology generation, training sentence filtering, and domain adaptation through unit derivation. The results show a consistent improvement in TER, METEOR, NIST, and BLEU scores compared to our baseline system.

  • The TALP-UPC approach to system selection: ASIYA features and pairwise classification using random forests

     Formiga Fanals, Lluis; Gonzalez Bermudez, Meritxell; Barron Cedeño, Luis Alberto; Rodríguez Fonollosa, José Adrián; Màrquez Villodre, Lluís
    Workshop on Statistical Machine Translation
    Presentation's date: 2013-08-08
    Presentation of work at congresses

    This paper describes the TALP-UPC participation in the WMT’13 Shared Task on Quality Estimation (QE). Our participation is reduced to task 1.2 on System Selection. We used a broad set of features (86 for German-to-English and 97 for English-to-Spanish) ranging from standard QE features to features based on pseudo-references and semantic similarity. We approached system selection by means of pairwise ranking decisions. For that, we learned Random Forest classifiers especially tailored for the problem. Evaluation at development time showed considerably good results in a cross-validation experiment, with Kendall’s τ values around 0.30. The results on the test set dropped significantly, raising different discussions to be taken into account.

  • Study and comparison of rule-based and statistical Catalan-Spanish machine translation systems

     Ruiz Costa-Jussà, Marta; Farrús, Mireia; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    Computing and informatics
    Date of publication: 2012
    Journal article

  • Correcting input noise in SMT as a char-based translation problem (Open access)

     Formiga Fanals, Lluis; Rodríguez Fonollosa, José Adrián
    Date: 2012-10-31
    Report

    Misspelled words have a direct impact on the final quality obtained by Statistical Machine Translation (SMT) systems as the input becomes noisy and unpredictable. This paper presents some improvement strategies for translating real-life noisy input. The proposed strategies are based on a preprocessing step consisting in a character-based translator.

  • Integration of Machine Translation Paradigms

     Ruiz Costa-jussa, Marta; Rodríguez Fonollosa, José Adrián
    Participation in a competitive project

  • Measuring acoustic reduction in feature space

     Rodríguez Fonollosa, José Adrián; Schulz, Henrik
    IberSPEECH
    Presentation's date: 2012-11-21
    Presentation of work at congresses

    Modelling varying speaking style remains a challenge to state-of-the-art speech recognition and synthesis systems. Vowel and consonant reduction have been identified as correlative to speaking style variation, but still lack a common measurement. The reduction phenomena are often observed without consideration of coarticulation and assimilation effects, and as a result of speaking rate variability. We present an analysis of acoustic reduction in Mel frequency cepstral coefficients (MFCC) feature space of phonemes, estimate duration and determine the degree of correlation between duration reduction and feature space reduction for two different speaking styles present in broadcast news and conversational recordings. We analyse the feature space reduction of consonants and vowels in context in a syllable environment.

  • The BUCEADOR multi-language search engine for digital libraries (Open access)

     Adell, Jordi; Bonafonte Cavez, Antonio Jesus; Cardenal, Antonio; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián; Moreno Bilbao, M. Asuncion; Navas, Eva; Rodríguez Banga, Eduardo
    International Conference on Language Resources and Evaluation
    Presentation's date: 2012-05-24
    Presentation of work at congresses

    This paper presents a web-based multimedia search engine built within the Buceador (www.buceador.org) research project. A proof-of-concept tool has been implemented which is able to retrieve information from a digital library made of multimedia documents in the four official languages of Spain (Spanish, Basque, Catalan and Galician). The retrieved documents are presented in the user's language after translation and dubbing (the four previous languages plus English). The paper presents the tool's functionality, the architecture and the digital library, and provides some information about the technology involved in the fields of automatic speech recognition, statistical machine translation, text-to-speech synthesis and information retrieval. Each technology has been adapted to the purposes of the presented tool as well as to interact with the rest of the technologies involved.

  • Dealing with input noise in statistical machine translation (Open access)

     Formiga Fanals, Lluis; Rodríguez Fonollosa, José Adrián
    International Conference on Computational Linguistics
    Presentation's date: 2012-12-13
    Presentation of work at congresses

    Misspelled words have a direct impact on the final quality obtained by Statistical Machine Translation (SMT) systems as the input becomes noisy and unpredictable. This paper presents some improvement strategies for translating real-life noisy input. The proposed strategies are based on a preprocessing step consisting in a character-based translator (MT) from noisy into cleaned text. The use of a character-level translator allows us to provide various spelling alternatives in a lattice format to the final bilingual translator. Therefore, the final MT is the one that decides the best path to be translated. The different hypotheses are obtained under the assumption of a noisy channel model for this task. This paper shows the experiments done with real-life noisy input and a standard phrase-based SMT system from English into Spanish.

  • The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation (Open access)

     Formiga Fanals, Lluis; Henriquez Quintana, Carlos Alberto; Hernandez Huerta, Adolfo; Mariño Acebal, Jose Bernardo; Monte Moreno, Enrique; Rodríguez Fonollosa, José Adrián
    Workshop on Statistical Machine Translation
    Presentation's date: 2012-06-08
    Presentation of work at congresses

    This paper describes the UPC participation in the WMT 12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Variations adopted several improvement techniques such as morphology simplification and generation and domain adaptation. The morphology simplification overcomes the data sparsity problem when translating into morphologically-rich languages such as Spanish by translating first to a morphology-simplified language and then leaving the morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from MT-output and reference alignment. Results depict an improvement on TER, METEOR, NIST and BLEU scores compared to our baseline system, obtaining on the official test set more benefits from the domain adaptation approach than from the morphological generalization method.

  • Search engine for multilingual audiovisual contents

     Pérez, José David; Bonafonte Cavez, Antonio Jesus; Cardenal, Antonio; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián; Moreno Bilbao, M. Asuncion; Navas, Eva; Rodríguez Banga, Eduardo
    Jornadas en Tecnología del Habla and III Iberian SLTech Workshop
    Presentation's date: 2012-11
    Presentation of work at congresses

  • Recursive alignment block classification technique for word reordering in statistical machine translation

     Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián; Monte Moreno, Enrique
    Language resources and evaluation
    Date of publication: 2011-05
    Journal article

  • Syntax-based reordering for statistical machine translation

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián
    Computer speech and language
    Date of publication: 2011-10
    Journal article

    In this paper, we develop an approach called syntax-based reordering (SBR) to handle the fundamental problem of word ordering in statistical machine translation (SMT). We propose to alleviate the word order challenge by including morpho-syntactical and statistical information in a pre-translation reordering framework aimed at capturing short- and long-distance word distortion dependencies. We examine the proposed approach from the theoretical and experimental points of view, discussing and analyzing its advantages and limitations in comparison with some of the state-of-the-art reordering methods. In the final part of the paper, we describe the results of applying the syntax-based model to translation tasks with a great need for reordering (Chinese-to-English and Arabic-to-English). The experiments are carried out on standard phrase-based and alternative N-gram-based SMT systems. We first investigate sparse training data scenarios, in which the translation and reordering models are trained on sparse bilingual data, then scale the method to a large training set and demonstrate that the improvement in translation quality is maintained.

  • Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair (Open access)

     Mariño Acebal, Jose Bernardo; Ruiz Costa-Jussà, Marta; Poch, Marc; Hernandez Huerta, Adolfo; Henríquez, Carlos; Rodríguez Fonollosa, José Adrián; Farrús Cabecerán, Mireia
    Language resources and evaluation
    Date of publication: 2011-02-20
    Journal article

    This work aims to improve an N-gram-based statistical machine translation system between the Catalan and Spanish languages, trained with an aligned Spanish–Catalan parallel corpus consisting of 1.7 million sentences taken from El Periódico newspaper. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are approached using a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on the use of grammatical categories, as well as lexical categorization. The performance of the improved system is clearly increased, as is shown in both human and automatic evaluations of the system, with a gain of about 1.1 BLEU points observed in the Spanish-to-Catalan direction of translation, and a gain of about 0.5 points in the reverse direction. The final system is freely available online as a linguistic resource.

  • Reconocimiento de voz y audio para inteligencia ambiental

     Butko, Taras; Rodríguez Fonollosa, José Adrián; Nadeu Camprubí, Climent; Vallverdu Bayes, Francisco; Salavedra Moli, Josep; Nogueiras Rodriguez, Albino; Casar Lopez, Marta; Wolf, Martin; Zelenak, Martin; Hernando Pericas, Francisco Javier
    Participation in a competitive project

  • Enhancing the European Linguistic Infrastructure

     Bonafonte Cavez, Antonio Jesus; Nadeu Camprubí, Climent; Vallverdu Bayes, Francisco; Butko, Taras; Rodríguez Fonollosa, José Adrián; Wolf, Martin; Moreno Bilbao, M. Asuncion
    Participation in a competitive project

  • La tecnologia de la parla en català: avenços i reptes (Open access)

     Rodríguez Fonollosa, José Adrián
    Llengua i ús
    Date of publication: 2010-07
    Journal article

    The article offers a brief but thorough overview of the state of speech technologies in Catalan. An introductory first part leads into a list of applications and resources. Finally, it discusses the Tecnoparla project and the modules that make it up, and outlines some future prospects.

  • Feedback Analysis for User adaptive Statistical Translation

     Màrquez Villodre, Lluís; Formiga Fanals, Lluis; Mariño Acebal, Jose Bernardo; Gonzalez Bermudez, Meritxell; Rodríguez Fonollosa, José Adrián; Monte Moreno, Enrique; Barron Cedeño, Luis Alberto
    Participation in a competitive project

  • BUSQUEDA DE INFORMACIÓN EN CONTENIDOS AUDIOVISUALES PLURILINGUES

     Esquerra Llucià, Ignasi; Monte Moreno, Enrique; Rodríguez Fonollosa, José Adrián; Bonafonte Cavez, Antonio Jesus; Polyakova, Tatyana; Mariño Acebal, Jose Bernardo; Ruiz Costa-jussa, Marta; Adell Mercado, Jordi; Moreno Bilbao, M. Asuncion
    Participation in a competitive project

  • Linguistic-based evaluation criteria to identify statistical machine translation errors

     Farrús, Mireia; Ruiz Costa-Jussà, Marta; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    Annual Conference of the European Association for Machine Translation
    Presentation's date: 2010-05-28
    Presentation of work at congresses

  • Towards improving English-Latvian translation: a system comparison and a new rescoring feature (Open access)

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián; Skadina, Inguna; Bralitis, Edgars; Pretkalnina, Lauma
    International Conference on Language Resources and Evaluation
    Presentation's date: 2010-05-20
    Presentation of work at congresses

    This paper presents a comparative study of two alternative approaches to statistical machine translation (SMT) and their application to a task of English-to-Latvian translation. Furthermore, a novel feature intended to reflect the relatively free word order of the Latvian language is proposed and successfully applied in the n-best list rescoring step. Moving beyond the automatic scores of translation quality that are classically presented in MT research papers, we contribute a manual error analysis of MT system output that helps to shed light on the advantages and disadvantages of the SMT systems under consideration.

  • Using linear interpolation and weighted reordering hypotheses in the Moses system (Open access)

     Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
    International Conference on Language Resources and Evaluation
    Presentation's date: 2010-05-20
    Presentation of work at congresses

    This paper proposes a novel reordering model for the open-source Moses toolkit. The main idea is to provide weighted reordering hypotheses to the SMT decoder. These hypotheses are built using a first-step Ngram-based SMT translation from a source language into a third representation that is called reordered source language. Each hypothesis has its own weight provided by the Ngram-based decoder. This proposed reordering technique offers a better and more efficient translation when compared to both distance-based and lexicalized reordering. In addition to this reordering approach, this paper describes a domain adaptation technique which is based on a linear combination of a specific in-domain translation model and an extra out-of-domain translation model. Results for both approaches are reported on the Arabic-to-English 2008 IWSLT task. When implementing the weighted reordering hypotheses and the domain adaptation technique in the final translation system, translation results improve by up to 2.5 BLEU compared to a standard state-of-the-art Moses baseline system.

  • Automatic and human evaluation study of a rule-based and a statistical Catalan-Spanish machine translation systems (Open access)

     Ruiz Costa-Jussà, Marta; Farrús Cabecerán, Mireia; Mariño Acebal, Jose Bernardo; Rodríguez Fonollosa, José Adrián
    International Conference on Language Resources and Evaluation
    Presentation's date: 2010-05-20
    Presentation of work at congresses

    Machine translation systems can be classified into rule-based and corpus-based approaches, in terms of their core technology. Since both paradigms have been widely used in recent years, one of the aims of the research community is to know how these systems differ in terms of translation quality. To this end, this paper reports a study and comparison of a rule-based and a corpus-based (specifically, statistical) Catalan-Spanish machine translation system, both of them freely available on the web. The translation quality analysis is performed in two different domains: journalistic and medical. The systems are evaluated using standard automatic measures, as well as by native human evaluators. Automatic results show that the statistical system performs better than the rule-based system. Human judgements show that in the Spanish-to-Catalan direction the statistical system also performs better than the rule-based system, while in the Catalan-to-Spanish direction it is the other way round. Although the statistical system obtains the best automatic scores, its errors tend to be penalized more by human judgements than the errors of the rule-based system. This can be explained by the fact that statistical errors are usually unexpected and do not follow any pattern.

  • English-Latvian SMT: the challenge of translating into a free word order language (Open access)

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián; Skadina, Inguna; Bralitis, Edgars; Pretkalnina, Lauma
    International Workshop on Spoken Languages Technologies for Under-resourced Languages
    Presentation's date: 2010-05-03
    Presentation of work at congresses

    This paper presents a comparative study of two approaches to statistical machine translation (SMT) and their application to a task of English-to-Latvian translation, which is still an open research line in the field of automatic translation. We consider a state-of-the-art phrase-based SMT system and an alternative N-gram-based SMT system. The major differences between these two approaches lie in the distinct representations of bilingual units, which are the components of the bilingual model driving the translation process, and in the statistical modeling of the translation context. Latvian, being a rather free word order language, poses additional difficulties for the translation process. We contrast different reordering models and investigate how well they deal with the word ordering issue. Moving beyond the automatic scores of translation quality that are classically presented in MT research papers, we contribute a manual error analysis of MT system output that helps to shed light on the advantages and disadvantages of the SMT systems under consideration and to identify the most prominent sources of errors typical of both SMT systems.

  • State-of-the-art word reordering approaches in statistical machine translation: a survey (Open access)

     Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
    IEICE transactions on information and systems
    Date of publication: 2009-11-01
    Journal article

    This paper surveys several state-of-the-art reordering techniques employed in Statistical Machine Translation systems. Reordering is understood as the word-order redistribution of the translated words. In original SMT systems, this different order is only modeled within the limits of translation units. Relying only on the reordering provided by translation units may not be good enough for most language pairs, which might require longer reorderings. Therefore, additional techniques may be deployed to face the reordering challenge. The Statistical Machine Translation community has been very active recently in developing reordering techniques. This paper gives a brief survey and classification of several well-known reordering approaches.

  • ORGANIZACIÓN DEL 13º CONGRESO ANUAL DE LA ASOCIACIÓN EUROPEA PARA LA TRADUCCIÓN AUTOMATICA, EAMT-2009

     Farwell, David Loring; Mariño Acebal, Jose Bernardo; Màrquez Villodre, Lluís; Rodríguez Fonollosa, José Adrián
    Participation in a competitive project

  • VEU: GRUP DE TRACTAMENT DE LA PARLA

     Bonafonte Cavez, Antonio Jesus; Casar Lopez, Marta; Ruiz Costa-jussa, Marta; Nogueiras Rodriguez, Albino; Esquerra Llucià, Ignasi; Salavedra Moli, Josep; Farrús Cabecerán, Mireia; Hernando Pericas, Francisco Javier; Rodríguez Fonollosa, José Adrián; Monte Moreno, Enrique; Mariño Acebal, Jose Bernardo; Nadeu Camprubí, Climent; Moreno Bilbao, M. Asuncion; Vallverdu Bayes, Francisco
    Participation in a competitive project

  • The TALP-UPC Ngram-Based Statistical Machine Translation for ACL-WMT 2008

     Khalilov, M; Hernández, A; Costa-Jussà, M R; Crego, J M; Henríquez, C A Q; Lambert, P; Rodríguez Fonollosa, José Adrián; Mariño Acebal, Jose Bernardo; Banchs, R
    46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
    Presentation's date: 2009-06
    Presentation of work at congresses

  • An Ngram-based reordering model (awarded activity)

     Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
    Computer speech and language
    Date of publication: 2009-07
    Journal article

    This paper describes in detail a novel approach to the reordering challenge in statistical machine translation (SMT). This Ngram-based reordering (NbR) approach uses the powerful techniques of SMT systems to generate a weighted reordering graph. Thus, statistical criteria reordering constraints are supplied to an SMT system, and this allows an extension to the SMT decoding search. The NbR approach is capable of generalizing reorderings that have been learned during training, through the use of word classes instead of words themselves. Improvement in translation performance is demonstrated with the EPPS task (Spanish and German to English) and the BTEC task (Arabic to English).

    Best 2009 article published in an international journal with a young researcher from a Spanish university as first author; awarded by the Red Temática de Tecnologías del Habla

  • Phrase and Ngram-based Statistical Machine Translation System Combination

     Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
    Applied artificial intelligence
    Date of publication: 2009-08
    Journal article

  • Transcription of Catalan Broadcast Conversation

     Schulz, Henrik; Rodríguez Fonollosa, José Adrián; Rybach, D
    Lecture notes in computer science
    Date of publication: 2009-01
    Journal article

  • New statistical and syntactic models for machine translation

     Khalilov, Maxim
    Defense's date: 2009-10-15
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

  • Coupling hierarchical word reordering and decoding in phrase-based statistical machine translation  Open access

     Dras, Mark; Khalilov, Maxim; Rodríguez Fonollosa, José Adrián
    Association for Computational Linguistics. North American Chapter. Conference
    Presentation's date: 2009-05
    Presentation of work at congresses

    In this paper, we start with the existing idea of taking reordering rules automatically derived from syntactic representations, and applying them in a preprocessing step before translation to make the source sentence structurally more like the target; and we propose a new approach to hierarchically extracting these rules. We evaluate this, combined with a lattice-based decoding, and show improvements over state-of-the-art distortion models.

  • A baseline system for the transcription of Catalan broadcast conversation

     Schulz, Henrik; Rodríguez Fonollosa, José Adrián; Rybach, David
    Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages
    Presentation's date: 2009-09-04
    Presentation of work at congresses

  • N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination  Open access

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián
    Association for Computational Linguistics. European Chapter. Conference
    Presentation's date: 2009-04
    Presentation of work at congresses

    In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual units related to word-to-word alignment and statistical modeling of the bilingual context following a maximum-entropy framework. We provide a step-by-step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (1.5M tokens in the training corpus). Human error analysis clarifies advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.

  • The TALP on-line Spanish-Catalan machine-translation system  Open access

     Poch, Manel; Farrús, Mireia; Ruiz Costa-Jussà, Marta; Mariño Acebal, Jose Bernardo; Hernández, Adolfo; Henríquez, Carlos; Rodríguez Fonollosa, José Adrián
    Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages
    Presentation's date: 2009-09-04
    Presentation of work at congresses

    In this paper the statistical machine translator (SMT) between Catalan and Spanish developed at the TALP research center (UPC) and its web demonstration are described.

  • A new subtree-transfer approach to syntax-based reordering for statistical machine translation  Open access

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián; Dras, Mark
    Annual Conference of the European Association for Machine Translation
    Presentation's date: 2009-05-15
    Presentation of work at congresses

    In this paper we address the problem of translating between languages with word order disparity. The idea of augmenting statistical machine translation (SMT) by using a syntax-based reordering step prior to translation, proposed in recent years, has been quite successful in improving translation quality. We present a new technique for extracting syntax-based reordering rules, which are derived through a syntactically augmented alignment of source and target texts. The parallel corpus with reordered source side is then passed to an N-gram-based machine translation system and the obtained results are contrasted with a monotone system performance. In experiments, we show significant improvement for the Chinese-to-English translation task.

  • The TALP-UPC phrase-based translation system for EACL-WMT 2009  Open access

     Rodríguez Fonollosa, José Adrián; Khalilov, Maxim; Ruiz Costa-Jussà, Marta; Henríquez, Carlos; Hernández, Adolfo; Banchs, Rafael E.
    Association for Computational Linguistics. European Chapter. Conference
    Presentation's date: 2009-04-01
    Presentation of work at congresses

    This study presents the TALP-UPC submission to the EACL Fourth Workshop on Statistical Machine Translation 2009 evaluation campaign. It outlines the architecture and configuration of the 2009 phrase-based statistical machine translation (SMT) system, putting emphasis on the major novelty of this year: combination of SMT systems implementing different word reordering algorithms. Traditionally, we have concentrated on the Spanish-to-English and English-to-Spanish News Commentary translation tasks.

  • A Catalan broadcast conversational speech database

     Schulz, Henrik; Rodríguez Fonollosa, José Adrián
    Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages
    Presentation's date: 2009-09-04
    Presentation of work at congresses

    Data-driven methods in speech and linguistic research and in system development require appropriate speech databases. A new Catalan speech database has been developed with a particular emphasis on broadcast conversational speech. The article describes the origin and nature of the broadcasts and their acoustic environment. Annotation and transcription provide statistics on specific phenomena of the exhibited speech, speaker characteristics and acoustic events. It concludes with prospective uses and limitations.

  • Informe anual del proyecto TECNOPARLA

     Rodríguez Fonollosa, José Adrián
    Date: 2008-11
    Report

  • NEW REORDERING AND MODELING APPROACHES FOR STATISTICAL MACHINE TRANSLATION

     Ruiz Costa-Jussà, Marta
    Defense's date: 2008-09-17
    Department of Signal Theory and Communications, Universitat Politècnica de Catalunya
    Theses

  • Deriving benefit from a generalized syntax-based reordering

     Khalilov, Maxim; Rodríguez Fonollosa, José Adrián
    Jornadas en Tecnología del Habla
    Presentation's date: 2008-11
    Presentation of work at congresses

  • On the use of augmented HMM models for overcoming time and parameter independence assumptions in ASR

     Casar Lopez, Marta; Rodríguez Fonollosa, José Adrián
    Jornadas en Tecnología del Habla
    Presentation's date: 2008-11
    Presentation of work at congresses

  • The TALP & I2R SMT Systems for IWSLT 2008  Open access

     Khalilov, M; Costa-Jussà, M R; Henríquez, C A Q; Rodríguez Fonollosa, José Adrián; Hernández, A; Mariño Acebal, Jose Bernardo; Banchs, R; Chen, B; Zhang, M; Aw, A; Li, H
    International Workshop on Spoken Language Translation
    Presentation's date: 2008-10
    Presentation of work at congresses

    This paper gives a description of the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the 2008 systems’ architecture and outlines the translation schemes we have used, mainly focusing on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are: an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks.

  • Técnicas estadísticas para el filtrado de un corpus bilingüe en traducción automática

     Montolar, Enrique; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián
    Jornadas en Tecnología del Habla
    Presentation's date: 2008-11
    Presentation of work at congresses
