Scientific and technological production

1 to 50 of 195 results
  • Learning probabilistic automata : a study in state distinguishability

     De Balle Pigem, Borja; Castro Rabal, Jorge; Gavaldà Mestre, Ricard
    Theoretical computer science
    Vol. 473, p. 46-60
    DOI: 10.1016/j.tcs.2012.10.009
    Date of publication: 2013-02-18
    Journal article

    Known algorithms for learning PDFA can only be shown to run in time polynomial in the so-called distinguishability μ of the target machine, besides the number of states and the usual accuracy and confidence parameters. We show that the dependence on μ is necessary in the worst case for every algorithm whose structure resembles existing ones. As a technical tool, a new variant of Statistical Queries termed L∞-queries is defined. We show how to simulate L∞-queries using classical Statistical Queries and show that known PAC algorithms for learning PDFA are in fact statistical query algorithms. Our results include a lower bound: every algorithm to learn PDFA with queries using a reasonable tolerance must make Ω(1/μ^{1−c}) queries for every c>0. Finally, an adaptive algorithm that PAC-learns w.r.t. another measure of complexity is described. This yields better efficiency in many cases, while retaining the same inevitable worst-case behavior. Our algorithm requires fewer input parameters than previously existing ones, and has a better sample bound.
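
    As a rough illustration of the kind of state-distinguishing test the abstract refers to, the sketch below compares two samples of strings by the largest gap between their empirical probabilities (a supremum-type distance). The function names and the fixed threshold are hypothetical, and the assumption that distinguishability is measured with this distance is ours; this is not the authors' algorithm.

```python
from collections import Counter


def empirical_linf_distance(sample_a, sample_b):
    """Largest absolute difference between the empirical probabilities
    that the two samples assign to any single string."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    freq_a = Counter(sample_a)
    freq_b = Counter(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Only strings seen in at least one sample can contribute a nonzero gap.
    support = set(freq_a) | set(freq_b)
    return max(abs(freq_a[s] / n_a - freq_b[s] / n_b) for s in support)


def looks_distinct(sample_a, sample_b, threshold):
    """Hypothetical decision rule: treat two candidate states as different
    when the empirical distance exceeds a caller-chosen threshold."""
    return empirical_linf_distance(sample_a, sample_b) > threshold
```

    A practical test would derive the threshold from the sample sizes and a confidence parameter instead of fixing it by hand.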

  • Adaptively learning probabilistic deterministic automata from data streams

     De Balle Pigem, Borja; Castro Rabal, Jorge; Gavaldà Mestre, Ricard
    Machine learning
    p. 1-29
    DOI: 10.1007/s10994-013-5408-x
    Date of publication: 2013-10
    Journal article

    Markovian models with hidden state are widely used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for inferring models in this class in the restrictive data stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We also present extensions of the algorithm that (1) reduce to a minimum the need for guessing parameters of the target distribution and (2) are able to adapt to changes in the input distribution, relearning new models when needed. We provide rigorous PAC-like bounds for all of the above. Our algorithm makes key use of stream sketching techniques for reducing memory and processing time, and is modular in that it can use different tests for state equivalence and for change detection in the stream.

  • Learning Finite-State Machines: Statistical and Algorithmic Aspects  Open access

     De Balle Pigem, Borja
    Department of Computer Science, Universitat Politècnica de Catalunya
    Theses

    This thesis deals with machine learning problems for generative and predictive models over sequential data. All the models considered in this thesis have in common that they can be defined in terms of finite-state machines. One of the lines of research we focus on is the design of learning algorithms for the probabilistic analog of classical Deterministic Finite Automata (DFA). These state machines give rise to a class of generative models over sequences that enjoys good algorithmic properties and a good level of expressiveness. State-merging algorithms for learning these models can be interpreted as a divisive clustering scheme where the "dependency graph" between clusters is not necessarily a tree. In this thesis we have characterized this kind of algorithm using the terminology of statistical oracles, and we have used this characterization to prove a lower bound on the complexity of these algorithms that depends explicitly on a separability criterion for the states of the target machine. In the more realistic setting where the algorithm receives the elements of the training sample one by one in an infinite sequence, we have designed a state-merging algorithm with very strict algorithmic requirements on the processing time per element and the total memory used by the algorithm. For all these algorithms we have proved PAC (probably approximately correct) learning guarantees. At the heart of state-merging algorithms there is always a statistical test for similarity of probability distributions. In the online version of the algorithm we have used a test based on the bootstrap technique that speeds up the convergence of the algorithm in many cases. Continuing along this line, we have also studied more general classes of models that can be learned by state-merging algorithms. Using these techniques we have designed PAC learning algorithms for continuous-time Markov models and probabilistic transductions between pairs of aligned sequences. The basic tools for obtaining all these results include a variety of concentration inequalities in probability and efficient algorithms for summarizing large volumes of data. Following another line of work, this thesis presents contributions in the field of learning algorithms based on spectral methods. The main virtues of these algorithms are the possibility of formally bounding the error in terms of the size of the training sample, and the gain in speed they offer over iterative alternatives such as the Baum-Welch algorithm. In this thesis we present the first algorithm of this type capable of learning conditional distributions over pairs of aligned strings. We also prove that these algorithms can learn any probabilistic automaton, thereby extending the set of machines that these algorithms are able to learn. The last two chapters present new learning algorithms that combine spectral methods with convex optimization. In one case, this allows us to give an alternative formulation for many existing spectral algorithms. In the second case, we present the first analysis of a method for learning weighted automata under an agnostic formulation; that is, without assuming that the training data were generated by any automaton. The results of this second part draw on the classical literature in automata theory and on the methods of statistical learning.

    The present thesis addresses several machine learning problems on generative and predictive models on sequential data. All the models considered have in common that they can be defined in terms of finite-state machines. In one line of work we study algorithms for learning the probabilistic analog of Deterministic Finite Automata (DFA). This provides a fairly expressive generative model for sequences with very interesting algorithmic properties. State-merging algorithms for learning these models can be interpreted as a divisive clustering scheme where the "dependency graph" between clusters is not necessarily a tree. We characterize these algorithms in terms of statistical queries and use this characterization to prove a lower bound with an explicit dependency on the distinguishability of the target machine. In a more realistic setting, we give an adaptive state-merging algorithm satisfying the stringent algorithmic constraints of the data streams computing paradigm. Our algorithms come with strict PAC learning guarantees. At the heart of state-merging algorithms lies a statistical test for distribution similarity. In the streaming version this is replaced with a bootstrap-based test which yields faster convergence in many situations. We also study a wider class of models for which the state-merging paradigm also yields PAC learning algorithms. Applications of this method are given to continuous-time Markovian models and stochastic transducers on pairs of aligned sequences. The main tools used for obtaining these results include a variety of concentration inequalities and sketching algorithms. In another line of work we contribute to the rapidly growing body of spectral learning algorithms. The main virtues of this type of algorithm include the possibility of proving finite-sample error bounds in the realizable case and enormous savings in computing time over iterative methods like Expectation-Maximization. In this thesis we give the first application of this method for learning conditional distributions over pairs of aligned sequences defined by probabilistic finite-state transducers. We also prove that the method can learn the whole class of probabilistic automata, thus extending the class of models previously known to be learnable with this approach. In the last two chapters we present works combining spectral learning with methods from convex optimization and matrix completion. Respectively, these yield an alternative interpretation of spectral learning and an extension to cases with missing data. In the latter case we used a novel joint stability analysis of matrix completion and spectral learning to prove the first generalization bound for this type of algorithm that holds in the non-realizable case. Work in this area has been motivated by connections between spectral learning, classic automata theory, and statistical learning; tools from these three areas have been used.

  • Improved Self-management of DataCenter Systems Applying Machine Learning  Open access

     Berral Garcia, Josep Lluis
    Universitat Politècnica de Catalunya
    Theses

    Autonomic Computing is a research area of Computer Science and Technology that originated during the 2000s. It focuses on optimizing complex distributed computing systems through self-management. As these systems grow in complexity, as with distributed data centers for cloud computing, system operators and architects need support to understand, design and optimize them, even more so when they are distributed around the world and belong to different organizations. Self-management allows these systems to improve resource and energy management, which matters especially when they have execution and usage costs. In this thesis we propose improving autonomic computing techniques for resource management by applying modeling and prediction methods, using Machine Learning and Artificial Intelligence. Machine learning methods can find accurate models from system behavior, as well as predict states and values. These models have the advantage that they can be updated when the system changes, by observing new examples and retraining the models. Therefore, through these techniques we can find new methods for making "intelligent" decisions and discovering new information and knowledge about the observed systems. This thesis departs from the state of the art, where management is based on the knowledge of an expert administrator, where the data are always known and the models are built ad hoc by experts, focusing on computing components such as CPU/memory/IO resources, and moves to a new state of the art where management is driven by automatically learned models, providing information and prediction for incomplete, missing or uncertain data, in a scenario of data center networks of global scope.
    - First of all we address the scenario where the decision-making components know all the information and state of the system: how much each job consumes, what quality of service is provided, how much time each process requires, etc., focusing on each component and policy of each element involved in executing these jobs.
    - Next we focus on the scenario where, instead of fixed oracles that provide information from a formula written by an expert, we use machine learning to create these oracles. Here we look at specific components and details where some data may not be known and must therefore be predicted by a model.
    - We also reduce the optimization problem of assigning resources to jobs (in this case virtualized web services) to a mathematical problem, indicating all the factors, variables and elements that define it, as well as the conditions that constrain it. The assignment problem can be modeled as a Mixed Integer Linear Program. Here the scenario already covers the management of a complete data center, introducing data predicted by the learned models.
    - We complement the model by expanding the number of elements to predict, studying the most important ones (CPU, memory and IO), which can suffer from "noise" when monitored and estimated. Once the learned predictors help improve decision making, the system can manage itself without depending so much on expert knowledge, and research can focus on a scenario where all elements are difficult to estimate. Here we introduce new elements that are important for management, given a context where data centers are spread around the world, with different energy costs and conditions for quality-of-service levels.
    - Finally, we give a brief introduction to the costs of placing data centers in this network, steering consumption towards renewable energy in order to lower energy costs.

    Autonomic Computing is a Computer Science and Technologies research area that originated during the mid-2000s. It focuses on the optimization and improvement of complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, like multi-datacenter systems in cloud computing, system operators and architects need more help to understand, design and optimize these systems manually, even more so when these systems are distributed around the world and belong to different entities and authorities. Self-management lets these distributed computing systems improve their resource and energy management, a very important issue when obtaining, running or maintaining resources has a cost. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine Learning methods can find accurate models of system behavior, often with intelligible explanations, and can also predict and infer system states and values. These automatically learned models have the advantage of being easily updated after workload or configuration changes by taking new examples and retraining the predictors. By employing automatic modeling and predictive abilities, we can find new methods for making "intelligent" decisions and discovering new information and knowledge from systems. This thesis departs from the state of the art, where management is based on administrators' expertise, well-known data, ad hoc algorithms and models, and elements studied from the point of view of a single computing machine, and moves to a novel state of the art where management is driven by models learned from the system itself, providing useful feedback and making up for incomplete, missing or uncertain data, from the point of view of a global network of datacenters.
    - First of all, we cover the scenario where the decision maker works knowing all pieces of information from the system: how much each job will consume, what the desired quality of service is and will be, what the deadlines for the workload are, etc. All of this focuses on each component and policy of each element involved in executing these jobs.
    - Then we focus on the scenario where, instead of fixed oracles that provide us with information from an expert formula or set of conditions, machine learning is used to create these oracles. Here we look at components and specific details where some part of the information is not known and must be learned and predicted.
    - We reduce the problem of optimizing resource allocations and requirements for virtualized web-services to a mathematical problem, indicating each factor, variable and element involved, as well as all the constraints the scheduling process must attend to. The scheduling problem can be modeled as a Mixed Integer Linear Program. Here we face the scenario of a full datacenter, and we further introduce some information prediction.
    - We complement the model by expanding the predicted elements, studying the main resources (that is, CPU, memory and IO) that can suffer from noise, inaccuracy or unavailability. Once learned predictors for certain components let the decision making improve, the system can become more "expert-knowledge independent" and research can focus on a scenario where all the elements provide noisy, uncertain or private information. We also introduce new factors into the management optimization, since the context and costs of each datacenter may change, turning the model into a "multi-datacenter" one.
    - Finally, we review the cost of placing datacenters depending on green energy sources, and distribute the load according to green energy availability.

  • The Architecture of a Churn Prediction System Based on Stream Mining

     De Balle Pigem, Borja; Casas Fernandez, Bernardino; Catarineu Morales, Alex; Gavaldà Mestre, Ricard; Manzano Macho, David
    Congrés Internacional de l'Associació Catalana d'Intel·ligència Artificial
    p. 157-166
    DOI: 10.3233/978-1-61499-320-9-157
    Presentation's date: 2013-10-24
    Presentation of work at congresses

    Churning is the movement of customers from one company to another. For any company, being able to predict in advance which of its customers will churn is essential for taking action to retain them. For this reason most sectors invest substantial effort in techniques for (semi-)automatically predicting churn, and data mining and machine learning are among the techniques successfully used to this effect. In this paper we describe a prototype for churn prediction using stream mining methods, which offer the additional promise of detecting new patterns of churn in real-time streams of high-speed data, and adapting quickly to a changing reality. The prototype is implemented on top of the MOA (Massive Online Analysis) framework for stream mining. The application implicit in the prototype is the telecommunication operator (mobile phone) sector.

  • An efficient closed frequent itemset miner for the MOA stream mining system

     Quadrana, Massimo; Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
    International Conference of the Catalan Association for Artificial Intelligence
    p. 203-212
    DOI: 10.3233/978-1-61499-320-9-203
    Presentation's date: 2013-10-24
    Presentation of work at congresses

    We describe and evaluate an implementation of the IncMine algorithm due to Cheng, Ke, and Ng (2008) for mining frequent closed itemsets from data streams, working on the MOA platform. The goal was to produce a robust, efficient, and usable tool for that task that can both be used by practitioners and used for evaluation of research in the area. We experimentally confirm the excellent performance of the algorithm and its ability to handle concept drift.
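
    To make the term "frequent closed itemset" concrete, the sketch below enumerates frequent itemsets by brute force on a small batch of transactions and keeps those with no proper superset of equal support. It only illustrates the definition under our own naming (`frequent_itemsets`, `closed_itemsets` are hypothetical helpers); it is not IncMine, it does not use MOA, and it would not scale to streams.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Map every itemset with support >= min_support to its support count.

    `transactions` is a list of sets; enumeration is exhaustive, so this
    is only suitable for tiny examples.
    """
    items = sorted(set().union(*transactions)) if transactions else []
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
    }
```

    Checking closure only against other frequent itemsets is sufficient here, because any superset with the same support as a frequent itemset is itself frequent.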

  • Power-aware multi-data center management using machine learning  Open access

     Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    International Workshop on Power-aware Algorithms, Systems, and Architectures
    p. 858-867
    DOI: 10.1109/ICPP.2013.102
    Presentation's date: 2013-10-01
    Presentation of work at congresses

    The cloud relies upon multi-datacenter (multi-DC) infrastructures distributed along the world, where people and enterprises pay for resources to offer their web-services to worldwide clients. Intelligent management is required to automate and manage these infrastructures, as the amount of resources and data to manage exceeds the capacities of human operators. Also, it must take into account the cost of running the resources (energy) and the quality of service towards web-services and clients. (De-)consolidation and priming proximity to clients become two main strategies to allocate resources and properly place these web-services in the multi-DC network. Here we present a mathematical model to describe the scheduling problem given web-services and hosts across a multi-DC system, enhancing the decision makers with models for the system behavior obtained using machine learning. After running the system on real DC infrastructures we see that the model drives web-services to the best locations given quality of service, energy consumption, and client proximity, also (de-)consolidating according to the resources required for each web-service given its load.

    Postprint (author’s final draft)

  • Empowering automatic data-center management with machine learning

     Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    ACM Symposium on Applied Computing
    p. 170-172
    DOI: 10.1145/2480362.2480397
    Presentation's date: 2013-03-21
    Presentation of work at congresses

    The Cloud as a computing paradigm has become crucial for most Internet business models. Managing and optimizing its performance on a moment-by-moment basis is not easy given the amount and diversity of elements involved (hardware, applications, workloads, customer needs...). Here we show how a combination of scheduling algorithms and data mining techniques helps improve the performance and profitability of a data-center running virtualized web-services. We model the data-center's main resources (CPU, memory, IO), quality of service (viewed as response time), and workloads (incoming streams of requests) from past executions. We show how these models help scheduling algorithms make better decisions about job and resource allocation, aiming for a balance between throughput, quality of service, and power consumption.

  • Energy-efficient and multifaceted resource management for profit-driven virtualized data centers

     Goiri, Iñigo; Berral Garcia, Josep Lluis; Fitó Comellas, Josep Oriol; Julià Massó, Ferran; Nou Castell, Ramon; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    Future generation computer systems
    Vol. 28, num. 5, p. 718-731
    DOI: 10.1016/j.future.2011.12.002
    Date of publication: 2012-05
    Journal article

    Since virtualization was introduced in data centers, it has been opening new opportunities for resource management. Nowadays, it is not just used as a tool for consolidating underused nodes and saving power; it also allows new solutions to well-known challenges, such as heterogeneity management. Virtualization helps to encapsulate Web-based applications or HPC jobs in virtual machines (VMs) and treat them as a single entity which can be managed in an easier and more efficient way. We propose a new scheduling policy that models and manages a virtualized data center. It focuses on the allocation of VMs in data center nodes according to multiple facets to optimize the provider's profit. In particular, it considers energy efficiency, virtualization overheads, and SLA violation penalties, and supports outsourcing to external providers. The proposed approach is compared to other common scheduling policies, demonstrating that a provider can improve its benefit by 30% and save power while handling other challenges, such as resource outsourcing, in a better and more intuitive way than other typical approaches do.

    Postprint (author’s final draft)

  • A methodology for the evaluation of high response time on E-commerce users and sales

     Poggi, Nicolas; Carrera Perez, David; Gavaldà Mestre, Ricard; Ayguade Parra, Eduard; Torres Viñals, Jordi
    Information systems frontiers
    p. 1-19
    DOI: 10.1007/s10796-012-9387-4
    Date of publication: 2012-10-06
    Journal article

    The widespread adoption of high-speed Internet access and its usage for everyday tasks are causing profound changes in users' expectations in terms of Web site performance and reliability. At the same time, server management is undergoing a period of change with the emergence of the cloud computing paradigm, which enables scaling server infrastructures within minutes. To help set performance objectives for maximizing user satisfaction and sales, while minimizing the number of servers and their cost, we present a methodology to determine how user sales are affected as response time increases. We begin with the characterization of more than 6 months of Web performance measurements, followed by the study of how the fraction of buyers in the workload is higher at peak traffic times, and then build a model of sales through a learning process using a 5-year sales dataset. Finally, we present our evaluation of high response time on users for popular applications found on the Web.

  • Toward energy-aware scheduling using machine learning

     Berral Garcia, Josep Lluis; Goiri Presa, Iñigo; Nou Castell, Ramon; Julià Massó, Ferran; Fitó Comellas, Josep Oriol; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    DOI: 10.1002/9781118342015.ch8
    Date of publication: 2012-07-30
    Book chapter

  • Best Student Paper ICGI 2012

     De Balle Pigem, Borja; Castro Rabal, Jorge; Gavaldà Mestre, Ricard
    Award or recognition

  • MINERIA EN DATOS BIOLOGICOS Y SOCIALES: ALGORITMOS, TEORIA E IMPLEMENTACION

     Morrill, Glyn Verden; Quattoni, Ariadna Julieta; Arratia Quesada, Argimiro Alejandro; De Balle Pigem, Borja; Arias Vicente, Marta; Casas Fernandez, Bernardino; Bifet Figuerol, Albert Carles; Berral Garcia, Josep Lluis; Lopez Herrera, Josefina; Baixeries i Juvillà, Jaume; Delgado Pin, Jordi; Belanche Muñoz, Luis Antonio; Castro Rabal, Jorge; Lozano Bojados, Antoni; Ferrer Cancho, Ramon; Sierra Santibañez, Maria Josefina; Gavaldà Mestre, Ricard
    Competitive project

  • Learning probability distributions generated by finite-state machines

     Castro Rabal, Jorge; Gavaldà Mestre, Ricard
    International Conference on Grammatical Inference
    Presentation's date: 2012-09-05
    Presentation of work at congresses

  • Online techniques for dealing with concept drift in process mining

     Carmona Vargas, Jose; Gavaldà Mestre, Ricard
    International Symposium on Intelligent Data Analysis
    p. 90-102
    DOI: 10.1007/978-3-642-34156-4_10
    Presentation's date: 2012-12-27
    Presentation of work at congresses

  • Bootstrapping and learning PDFA in data streams  Open access

     De Balle Pigem, Borja; Castro Rabal, Jorge; Gavaldà Mestre, Ricard
    International Conference on Grammatical Inference
    p. 34-48
    Presentation's date: 2012-09-07
    Presentation of work at congresses

    Markovian models with hidden state are widely used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for inferring models in this class under the stringent data stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We provide rigorous PAC-like bounds for all of the above, as well as an evaluation on synthetic data showing that the algorithm performs well in practice. Our algorithm makes key use of several old and new sketching techniques. In particular, we develop a new sketch for implementing bootstrapping in a streaming setting which may be of independent interest. In experiments we have observed that this sketch yields important reductions in the number of examples required for performing some crucial statistical tests in our algorithm.

  • Applying trust metrics based on user interactions to recommendation in social networks

     Lumbreras, Alberto; Gavaldà Mestre, Ricard
    IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
    p. 1159-1164
    DOI: 10.1109/ASONAM.2012.200
    Presentation's date: 2012-08
    Presentation of work at congresses

  • Mining frequent closed trees in evolving data streams

     Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
    Intelligent data analysis
    Vol. 15, num. 1, p. 29-48
    DOI: 10.3233/IDA-2010-0454
    Date of publication: 2011
    Journal article

  • XML Tree classification on evolving data streams

     Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
    DOI: 10.4018/978-1-61350-356-0
    Date of publication: 2011-11
    Book chapter

  • Organization of the International Workshop on Social Web Mining

     Gavaldà Mestre, Ricard
    Competitive project

  • Mining frequent closed graphs on evolving data streams

     Bifet Figuerol, Albert Carles; Holmes, Geoff; Pfahringer, Bernhard; Gavaldà Mestre, Ricard
    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    p. 591-599
    DOI: 2020408.2020501
    Presentation's date: 2011-08
    Presentation of work at congresses

    Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real-time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this paper we present a framework for studying graph pattern mining on time-varying streams. Three new methods for mining frequent closed subgraphs are presented. All methods work on coresets of closed subgraphs, compressed representations of graph sets, and maintain these sets in a batch-incremental manner, but use different approaches to address potential concept drift. An evaluation study on datasets comprising up to four million graphs explores the strength and limitations of the proposed methods. To the best of our knowledge this is the first work on mining frequent closed subgraphs in non-stationary data streams.

  • Proactive software rejuvenation solution for web environments on virtualized platforms

     Alonso López, Javier
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    The availability of Information Technologies for everything, from everywhere, at all times is a growing requirement. We use Information Technologies for everything from common and social tasks to critical tasks like managing nuclear power plants or even the International Space Station (ISS). However, the availability of IT infrastructures is still a huge challenge nowadays. A quick look at the news turns up reports of corporate outages affecting millions of users and impacting the revenue and image of the companies. It is well known that, currently, computer system outages are more often due to software faults than to hardware faults. Several studies have reported that one of the causes of unplanned software outages is the software aging phenomenon. This term refers to the accumulation of errors, usually causing resource contention, during long-running application executions, like web applications, which normally cause applications/systems to hang or crash. Gradual performance degradation could also accompany software aging phenomena. The software aging phenomena are often related to memory bloating/leaks, unterminated threads, data corruption, unreleased file-locks or overruns. We can find several examples of software aging in the industry. The work presented in this thesis aims to offer a proactive and predictive software rejuvenation solution for Internet Services against software aging caused by resource exhaustion. To this end, we first present a threshold-based proactive rejuvenation to avoid the consequences of software aging. This first approach has some limitations, the most important of which is the need to know a priori the resource or resources involved in the crash and the critical condition values. Moreover, we need some expertise to fix the threshold value that triggers the rejuvenation action.
    Due to these limitations, we have evaluated the use of Machine Learning to overcome the weaknesses of our first approach and obtain a proactive and predictive solution. Finally, the current and increasing tendency to use virtualization technologies to improve resource utilization has turned traditional data centers into virtualized data centers or platforms. We have used a Mathematical Programming approach to virtual machine allocation and migration to optimize the resources, accepting as many services as possible on the platform while at the same time guaranteeing the availability (via our software rejuvenation proposal) of the deployed services against the software aging phenomena. The thesis is supported by an exhaustive experimental evaluation that demonstrates the effectiveness and feasibility of our proposals for current systems.

  • Adaptive scheduling on power-aware managed data-centers using machine learning

     Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    ACM/IEEE International Conference on Grid Computing
    p. 66-73
    DOI: 10.1109/Grid.2011.18
    Presentation's date: 2011-09-22
    Presentation of work at congresses

    Energy-related costs have become one of the major economic factors in IT data-centers, and companies and the research community are currently working on new efficient power-aware resource management strategies, also known as “Green IT”. Here we propose an autonomic scheduling of tasks and web-services over cloud environments, focusing on optimizing profit, that is, the revenue from executing a set of tasks according to service-level agreements minus costs such as power consumption. The principal contribution is the use of machine learning techniques to predict resource usage a priori, such as CPU consumption, and to estimate task response times based on the monitored data traffic characteristics. Further, in order to optimize the scheduling, an exact solver based on mixed integer linear programming is used as a proof of concept, and is also compared to some approximate algorithm solvers to find valid alternatives for the NP-hard problem of exact schedule solving. Experiments show that machine learning algorithms can predict system behaviors with acceptable accuracy, and that the ILP solver obtains the optimal solution, managing to adjust the schedule appropriately according to profits and increases in the cost of power, and also reducing migrations when their cost is taken into consideration. Finally, it is demonstrated that one of the approximate algorithm solvers is much faster than the exact solver while remaining close to it in terms of the optimization goal.

  • Learning read-constant polynomials of constant degree modulo composites

     Chattopadhyay, Arkadev; Gavaldà Mestre, Ricard; Hansen, Kristoffer Arnsfelt; Thérien, Denis
    International Computer Science Symposium in Russia
    p. 29-42
    DOI: 10.1007/978-3-642-20712-9_3
    Presentation's date: 2011-06-14
    Presentation of work at congresses

  • SalaboMiner: a biomedical literature mining tool for inferring the genetics of complex diseases

     Rib, Leonor; Gavaldà Mestre, Ricard; Soria, José Manuel; Buil, Alfonso
    International Conference on Bioinformatics Models, Methods and Algorithms
    p. 143-148
    Presentation's date: 2011-01-26
    Presentation of work at congresses

  • Detecting sentiment change in Twitter streaming data (open access)

     Bifet Figuerol, Albert Carles; Holmes, Geoffrey; Pfahringer, Bernhard; Gavaldà Mestre, Ricard
    Workshop on Applications of Pattern Analysis
    p. 5-11
    Presentation's date: 2011-10-19
    Presentation of work at congresses

    MOA-TweetReader is a system to read tweets in real time, to detect changes, and to find the terms whose frequency changed. Twitter is a micro-blogging service built to discover what is happening at any moment in time, anywhere in the world. Twitter messages are short, generated constantly, and well suited for knowledge discovery using data stream mining. MOA-TweetReader is a software extension to the MOA framework. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams.

  • Non-intrusive estimation of QoS degradation impact on E-commerce user satisfaction

     Poggi, Nicolas; Carrera Perez, David; Gavaldà Mestre, Ricard; Ayguade Parra, Eduard
    IEEE International Symposium on Network Computing and Applications
    p. 179-186
    DOI: 10.1109/NCA.2011.31
    Presentation's date: 2011-08-26
    Presentation of work at congresses

  • Optimal resource allocation in a virtualized software aging platform with software rejuvenation

     Alonso López, Javier; Goiri Presa, Iñigo; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    IEEE International Symposium on Software Reliability Engineering
    p. 250-259
    DOI: 10.1109/ISSRE.2011.30
    Presentation's date: 2011-11-29
    Presentation of work at congresses

    Nowadays, virtualized platforms have become the most popular option for deploying sufficiently complex services. The reason is that virtualization allows resource providers to increase resource utilization. Deployed services are expected to be always available, but these long-running services are especially prone to the software aging phenomenon. This term refers to an accumulation of errors, which usually causes resource exhaustion, and eventually makes the service hang or crash. To counteract this phenomenon, a preventive approach to fault management, called software rejuvenation, has been proposed. In this paper, we propose a framework which provides transparent and predictive software rejuvenation to web services that suffer software aging on virtualized platforms, achieving high levels of availability. To exploit the provider resources, the framework also seeks to maximize the number of services running simultaneously on the platform, while guaranteeing the resources needed by each service.

  • Introduction to programming

     Cortadella Fortuny, Jordi; Gavaldà Mestre, Ricard; Orejas Valdes, Fernando
    Date: 2010-09-01
    Report

  • An integer linear programming representation for data-center power-aware management (open access)

     Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    Date: 2010-11-12
    Report

    This work shows how to represent a grid data-center scheduling problem, taking advantage of virtualization and consolidation techniques, as an integer linear programming problem that includes all three of these factors. Although integer linear programming (ILP) is a computationally hard problem, correctly specifying its constraints and optimization function can help to find optimal integer solutions in relatively short time. ILP solutions can thus help designers and system managers not only to apply them in schedulers but also to create new heuristics and holistic functions that approximate the optimal solutions well and more quickly.

    Postprint (author’s final draft)

  • European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2010

     Arias Vicente, Marta; Berral Garcia, Josep Lluis; Quattoni, Ariadna Julieta; De Balle Pigem, Borja; Casas Fernandez, Bernardino; Bifet Figuerol, Albert Carles; Balcazar Navarro, Jose Luis; Gavaldà Mestre, Ricard
    Competitive project

  • Organization of the ECML/PKDD'10 conferences

     Bifet Figuerol, Albert Carles; Quattoni, Ariadna Julieta; De Balle Pigem, Borja; Berral Garcia, Josep Lluis; Casas Fernandez, Bernardino; Balcazar Navarro, Jose Luis; Gavaldà Mestre, Ricard; Carreras Perez, Xavier; Arias Vicente, Marta
    Competitive project

  • Towards energy-aware scheduling in data centers using machine learning (open access)

     Berral Garcia, Josep Lluis; Goiri Presa, Iñigo; Nou Castell, Ramon; Julià, Ferran; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    International Conference on Energy-Efficient Computing and Networking
    p. 215-224
    Presentation's date: 2010-04-15
    Presentation of work at congresses

    As energy-related costs have become a major economic factor for IT infrastructures and data-centers, companies and the research community are being challenged to find better and more efficient power-aware resource management strategies. There is a growing interest in "Green" IT and there is still a big gap in this area to be covered. In order to obtain an energy-efficient data center, we propose a framework that provides an intelligent consolidation methodology using different techniques such as turning on/off machines, power-aware consolidation algorithms, and machine learning techniques to deal with uncertain information while maximizing performance. For the machine learning approach, we use models learned from previous system behaviors in order to predict power consumption levels, CPU loads, and SLA timings, and improve scheduling decisions. Our framework is vertical, because it considers everything from watt consumption to workload features, and cross-disciplinary, as it uses a wide variety of techniques. We evaluate these techniques with a framework that covers the whole control cycle of a real scenario, using a simulation with representative heterogeneous workloads, and we measure the quality of the results according to a set of metrics focused toward our goals, besides traditional policies. The results obtained indicate that our approach is close to the optimal placement and behaves better when the level of uncertainty increases.

    Postprint (author’s final draft)

  • J2EE instrumentation for software aging root cause application component determination with AspectJ (open access)

     Alonso López, Javier; Torres Viñals, Jordi; Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard
    IEEE Workshop on Dependable Parallel, Distributed and Network-Centric System
    p. 1-8
    DOI: 10.1109/IPDPSW.2010.5470857
    Presentation's date: 2010-04-23
    Presentation of work at congresses

    Unplanned system outages have a negative impact on company revenues and image. While the last decades have seen a lot of effort from industry and academia to avoid them, they still happen and their impact is increasing. According to many studies, one of the most important causes of these outages is software aging. The software aging phenomenon refers to the accumulation of errors, usually provoking resource contention, during long-running application executions, like web applications, which normally cause applications/systems to hang or crash. Determining the root cause of a software aging failure, and not just the resource or resources involved, is a huge task due to the ever-growing complexity of the systems. In this paper we present a monitoring framework based on aspect-oriented programming to monitor the resources used by every application component at runtime. Knowing the resources used by every component of the application, we can determine which components are related to the software aging. Furthermore, we present a case study in which we evaluate our approach for determining, in a web application scenario, which components are involved in the software aging, with promising results.

  • Characterization of workload and resource consumption for an online travel and booking site

     Poggi Mastrokalo, Nicolas; Carrera Perez, David; Gavaldà Mestre, Ricard; Torres Viñals, Jordi; Ayguade Parra, Eduard
    IEEE International Symposium on Workload Characterization
    p. 1-10
    DOI: 10.1109/IISWC.2010.5649408
    Presentation's date: 2010-12-02
    Presentation of work at congresses

    Online travel and ticket booking is one of the top E-Commerce industries. As such sites present a mix of products (flights, hotels, tickets, restaurants, activities and vacation packages), they rely on a wide range of technologies to support them: Javascript, AJAX, XML, B2B Web services, Caching, Search Algorithms and Affiliation, resulting in a very rich and heterogeneous workload. Moreover, visits to travel sites vary greatly depending on time of the day, season, promotions, events, and linking, creating bursty traffic and making capacity planning a challenge. It is therefore of great importance to understand how users and crawlers interact on travel sites and their effect on server resources, for devising cost-effective infrastructures and improving the Quality of Service for users. In this paper we present a detailed workload and resource consumption characterization of the web site of a top national Online Travel Agency. Characterization is performed on server logs, including both HTTP data and resource consumption of the requests, as well as the server load status during the execution. From the dataset we characterize user sessions, their patterns and how response time is affected as load on Web servers increases. We provide a fine-grained analysis by performing experiments differentiating types of request, time of the day, products, and resource requirements for each. Results show that the workload is bursty, as expected, that it exhibits different properties between day and night traffic in terms of request type mix, that user session lengths cover a wide range of durations, that response time grows proportionally to server load, and that the response time of external data providers also increases at peak hours, amongst other results. Such results can be useful for optimizing infrastructure costs, improving QoS for users, and developing realistic workload generators for similar applications.

  • Adaptive on-line software aging prediction based on machine learning (open access)

     Alonso López, Javier; Torres Viñals, Jordi; Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard
    IEEE/IFIP International Conference on Dependable Systems and Networks
    p. 507-516
    DOI: 10.1109/DSN.2010.5544275
    Presentation's date: 2010-07-28
    Presentation of work at congresses

    The growing complexity of software systems is resulting in an increasing number of software faults. According to the literature, software faults are becoming one of the main sources of unplanned system outages, and have an important impact on company benefits and image. For this reason, many techniques (such as clustering, fail-over techniques, or server redundancy) have been proposed to avoid software failures, and yet they still happen. Many software failures are due to the software aging phenomenon. In this work, we present a detailed evaluation of our chosen machine learning prediction algorithm (M5P) against dynamic and non-deterministic software aging. We have tested our prediction model on a three-tier J2EE web application, achieving acceptable prediction accuracy against complex scenarios with small training data sets. Furthermore, we have found an interesting approach to help determine the root cause of failure: the model generated by machine learning algorithms.

  • Learning PDFA with asynchronous transitions (open access)

     De Balle Pigem, Borja; Castro Rabal, Jorge; Gavaldà Mestre, Ricard
    International Conference on Grammatical Inference
    p. 271-275
    DOI: 10.1007/978-3-642-15488-1_24
    Presentation's date: 2010-09-14
    Presentation of work at congresses

    In this paper we extend the PAC learning algorithm due to Clark and Thollard for learning distributions generated by PDFA to automata whose transitions may take varying time lengths, governed by exponential distributions.

  • A lower bound for learning distributions generated by probabilistic automata (open access)

     De Balle Pigem, Borja; Castro Rabal, Jorge; Gavaldà Mestre, Ricard
    International Conference on Algorithmic Learning Theory
    p. 179-193
    DOI: 10.1007/978-3-642-16108-7_17
    Presentation's date: 2010-10-07
    Presentation of work at congresses

    Known algorithms for learning PDFA can only be shown to run in time polynomial in the so-called distinguishability μ of the target machine, besides the number of states and the usual accuracy and confidence parameters. We show that the dependence on μ is necessary for every algorithm whose structure resembles existing ones. As a technical tool, a new variant of Statistical Queries termed L∞-queries is defined. We show how these queries can be simulated from samples and observe that known PAC algorithms for learning PDFA can be rewritten to access their target using L∞-queries and standard Statistical Queries. Finally, we show a lower bound: every algorithm to learn PDFA using queries with a reasonable tolerance needs a number of queries larger than (1/μ)^c for every c < 1.

    Postprint (author’s final draft)

  • SalamboMiner: a literature database mining tool based on bayesian networks

     Buil Calvo, José Antonio; Rib, Leonor; Soria, José Manuel; Gavaldà Mestre, Ricard
    Genetic epidemiology
    Vol. 33, num. 8, p. 806-807
    DOI: 10.1002/gepi.20463
    Date of publication: 2009-12
    Journal article

  • Adaptive XML Tree Classification on Evolving Data Streams

     Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
    Lecture notes in artificial intelligence
    Vol. 5781, p. 147-162
    Date of publication: 2009
    Journal article

  • Adaptive Learning from Evolving Data Streams

     Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
    Lecture notes in computer science
    Vol. 5772, p. 249-260
    Date of publication: 2009
    Journal article

  • The frequency spectrum of finite samples from the intermittent silence process

     Ferrer Cancho, Ramon; Gavaldà Mestre, Ricard
    Journal of the American Society for Information Science and Technology
    Vol. 60, num. 4, p. 837-843
    Date of publication: 2009-03
    Journal article

  • Algorithmic learning theory

     Gavaldà Mestre, Ricard
    Vol. 5809 LNAI
    Collaboration in journals

  • LARCA

     Sierra Santibañez, Maria Josefina; Delgado Pin, Jordi; Castro Rabal, Jorge; Baixeries i Juvillà, Jaume; Morrill, Glyn Verden; Balcazar Navarro, Jose Luis; Bifet Figuerol, Albert Carles; Lopez Herrera, Josefina; Arias Vicente, Marta; Berral Garcia, Josep Lluis; Quattoni, Ariadna Julieta; Arratia Quesada, Argimiro Alejandro; De Balle Pigem, Borja; Gavaldà Mestre, Ricard
    Competitive project

  • Symbolic sequences: analysis, learning, mining and evolution - Barcelona

     Baixeries i Juvillà, Jaume; Bifet Figuerol, Albert Carles; Lopez Herrera, Josefina; Arias Vicente, Marta; Delgado Pin, Jordi; Arratia Quesada, Argimiro Alejandro; Berral Garcia, Josep Lluis; Morrill, Glyn Verden; Lozano Bojados, Antoni; Sierra Santibañez, Maria Josefina; Ferrer Cancho, Ramon; Quattoni, Ariadna Julieta; Gavaldà Mestre, Ricard
    Competitive project

  • Conference chair

     Gavaldà Mestre, Ricard
    20th International Conference on Algorithmic Learning Theory (ALT'09)
    Presentation's date: 2009-10-01
    Presentation of work at congresses

  • Adaptive XML Tree Mining on Evolving Data Systems

     Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
    7th International Workshop On Mining and Learning with Graphs
    Presentation of work at congresses

  • New ensemble methods for evolving data streams

     Bifet Figuerol, Albert Carles; Holmes, G; Pfahringer, B; Kirkby, R; Gavaldà Mestre, Ricard
    ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining
    p. 139-148
    Presentation of work at congresses

  • Predicting web server crashes: a case study in comparing prediction algorithms

     Gavaldà Mestre, Ricard
    International Conference on Autonomic and Autonomous Systems
    Presentation's date: 2009-04-20
    Presentation of work at congresses

  • Adaptive XML Tree Classification on Evolving Data Streams

     Bifet Figuerol, Albert Carles; Gavaldà Mestre, Ricard
    European Conference on Machine Learning and Knowledge Discovery in Databases
    p. 147-162
    Presentation of work at congresses
