Torres Viñals, Jordi
Total activity: 217
Expertise
Big Data, Cloud Computing, Distributed Systems, Green Computing, High Performance Computing
Professional category
University professor
Doctoral courses
Doctor en Informàtica
University degree
Llicenciat en Informàtica
Research group
CAP - High Performance Computing Group
Department
Department of Computer Architecture
School
Barcelona School of Informatics (FIB)
E-mail
torres@ac.upc.edu
Contact details
UPC directory
Orcid
0000-0003-1963-7418
Links of interest
home page

Scientific and technological production

1 to 50 of 217 results
  • Deadline-based MapReduce workload management

     Polo, Jorda; Becerra Fontal, Yolanda; Carrera Perez, David; Steinder, Malgorzata; Whalley, Ian; Torres Viñals, Jordi; Ayguade Parra, Eduard
    IEEE transactions on network and service management
    Date of publication: 2013
    Journal article

    This paper presents a scheduling technique for multi-job MapReduce workloads that is able to dynamically build performance models of the executing workloads, and then use these models for scheduling purposes. This ability is leveraged to adaptively manage workload performance while observing and taking advantage of the particulars of the execution environment of modern data analytics applications, such as hardware heterogeneity and distributed storage. The technique targets a highly dynamic environment in which new jobs can be submitted at any time, and in which MapReduce workloads share physical resources with other workloads. Thus the actual amount of resources available for applications can vary over time. Beyond the formulation of the problem and the description of the algorithm and technique, a working prototype (called Adaptive Scheduler) has been implemented. Using the prototype and medium-sized clusters (of the order of tens of nodes), the following aspects have been studied separately: the scheduler's ability to meet high-level performance goals guided only by user-defined completion time goals; the scheduler's ability to favor data-locality in the scheduling algorithm; and the scheduler's ability to deal with hardware heterogeneity, which introduces hardware affinity and relative performance characterization for those applications that can benefit from executing on specialized processors.
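
    A minimal sketch of the core idea, in Python (hypothetical code, not the actual Adaptive Scheduler): use each job's observed average task duration to estimate how many concurrent slots it needs to meet its user-defined completion time goal, then share the slot pool in proportion to need.

        import math
        import time

        def slots_needed(pending_tasks, avg_task_seconds, deadline_ts):
            # Work left (in task-seconds) spread over the wall-clock time
            # remaining until the job's completion time goal.
            remaining = deadline_ts - time.time()
            if remaining <= 0:
                return pending_tasks  # already late: ask for maximum parallelism
            return max(1, math.ceil(pending_tasks * avg_task_seconds / remaining))

        def allocate(jobs, total_slots):
            # jobs: list of dicts with 'name', 'pending', 'avg_secs', 'deadline'.
            needs = {j["name"]: slots_needed(j["pending"], j["avg_secs"], j["deadline"])
                     for j in jobs}
            total_need = sum(needs.values())
            # Grant slots proportionally to estimated need (at least one each).
            return {name: max(1, total_slots * n // total_need)
                    for name, n in needs.items()}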

  • Aeneas: A tool to enable applications to effectively use non-relational databases (Open access)

     Cugnasco, Cesare; Hernández, Roger; Becerra Fontal, Yolanda; Torres Viñals, Jordi; Ayguade Parra, Eduard
    International Conference on Computational Science
    Presentation's date: 2013-06
    Presentation of work at congresses

    Non-relational databases arise as a solution to the scalability problems of relational databases when dealing with big data applications. However, they are highly configurable and prone to user decisions that can heavily affect their performance. In order to maximize performance, different data models and queries should be analyzed to choose the best fit. This may involve a wide range of tests and may result in productivity issues. We present Aeneas, a tool to support the design of data management code for applications using non-relational databases. Aeneas provides an easy and fast methodology to support decisions about how to organize and retrieve data in order to improve performance.

    Postprint (author’s final draft)

  • Enabling distributed key-value stores with low latency-impact snapshot support (Open access)

     Polo, Jorda; Becerra Fontal, Yolanda; Carrera Perez, David; Torres Viñals, Jordi; Ayguade Parra, Eduard; Spreitzer, Mike; Steinder, Malgorzata
    IEEE International Symposium on Network Computing and Applications
    Presentation's date: 2013-08
    Presentation of work at congresses

    Current distributed key-value stores generally provide greater scalability at the expense of weaker consistency and isolation. However, additional isolation support is becoming increasingly important in the environments in which these stores are deployed, where different kinds of applications with different needs are executed, from transactional workloads to data analytics. While fully-fledged ACID support may not be feasible, it is still possible to take advantage of the design of these data stores, which often includes the notion of multiversion concurrency control, to provide additional features at a much lower performance cost while maintaining their scalability and availability. In this paper we explore the effects that additional consistency guarantees and isolation capabilities may have on a state-of-the-art key-value store: Apache Cassandra. We propose and implement a new multiversioned isolation level that provides stronger guarantees without compromising Cassandra's scalability and availability. As shown in our experiments, our version of Cassandra allows Snapshot Isolation-like transactions while preserving the overall performance and scalability of the system. (A generic sketch of the multiversion read idea follows this entry.)

    Postprint (author’s final draft)
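
    The generic sketch referenced above (illustrative Python, not the paper's Cassandra implementation): writes keep their timestamped versions, and a snapshot reader sees only versions written at or before its snapshot timestamp, so concurrent writers never disturb it.

        class MultiVersionStore:
            """Toy multiversioned key-value store."""

            def __init__(self):
                self.versions = {}  # key -> list of (ts, value), ts increasing

            def put(self, key, value, ts):
                self.versions.setdefault(key, []).append((ts, value))

            def get(self, key, snapshot_ts):
                # Newest version written at or before the reader's snapshot.
                for ts, value in reversed(self.versions.get(key, [])):
                    if ts <= snapshot_ts:
                        return value
                return None

        store = MultiVersionStore()
        store.put("k", "old", ts=5)
        store.put("k", "new", ts=9)           # a later, concurrent writer
        print(store.get("k", snapshot_ts=6))  # -> "old": the snapshot stays consistent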

  • Power-aware multi-data center management using machine learning (Open access)

     Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    International Workshop on Power-aware Algorithms, Systems, and Architectures
    Presentation's date: 2013-10-01
    Presentation of work at congresses

    The cloud relies upon multi-datacenter (multi-DC) infrastructures distributed around the world, where people and enterprises pay for resources to offer their web-services to worldwide clients. Intelligent management is required to automate and operate these infrastructures, as the amount of resources and data to manage exceeds the capacities of human operators. It must also take into account the cost of running the resources (energy) and the quality of service delivered to web-services and clients. (De-)consolidation and prioritizing proximity to clients become the two main strategies to allocate resources and properly place these web-services in the multi-DC network. Here we present a mathematical model that describes the scheduling problem for web-services and hosts across a multi-DC system, enhancing the decision makers with models of system behavior obtained using machine learning. After running the system on real DC infrastructures we see that the model drives web-services to the best locations given quality of service, energy consumption, and client proximity, also (de-)consolidating according to the resources required by each web-service given its load.

    Postprint (author’s final draft)

  • Empowering automatic data-center management with machine learning

     Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    ACM Symposium on Applied Computing
    Presentation's date: 2013-03-21
    Presentation of work at congresses

    The Cloud as a computing paradigm has become crucial for most Internet business models. Managing and optimizing its performance on a moment-by-moment basis is not easy given the amount and diversity of elements involved (hardware, applications, workloads, customer needs...). Here we show how a combination of scheduling algorithms and data mining techniques helps improve the performance and profitability of a data-center running virtualized web-services. We model the data-center's main resources (CPU, memory, IO), quality of service (viewed as response time), and workloads (incoming streams of requests) from past executions. We show how these models help scheduling algorithms make better decisions about job and resource allocation, aiming for a balance between throughput, quality of service, and power consumption.
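
    As an illustration of the approach (a hypothetical scikit-learn sketch, not the authors' code), a response-time model can be learned from past executions and consulted by the scheduler before allocating resources:

        from sklearn.linear_model import LinearRegression

        # Features from past executions: [cpu_share, mem_gb, requests_per_sec];
        # targets are the observed response times in milliseconds (made-up data).
        X = [[0.5, 2, 100], [1.0, 4, 100], [0.5, 2, 300], [1.0, 4, 300]]
        y = [180, 90, 600, 250]
        model = LinearRegression().fit(X, y)

        def meets_sla(cpu_share, mem_gb, load, sla_ms=200.0):
            # The scheduler asks: would this allocation keep the predicted
            # response time under the SLA for the expected incoming load?
            return model.predict([[cpu_share, mem_gb, load]])[0] <= sla_ms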

  • Improved Self-management of DataCenter Systems Applying Machine Learning (Open access)

     Berral Garcia, Josep Lluis
    Defense's date: 2013-11-22
    Universitat Politècnica de Catalunya
    Theses

    Autonomic Computing is a Computer Science research area, originated in the mid-2000s, that focuses on optimizing and improving complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, like the multi-datacenter systems behind cloud computing, operators and architects need more help to understand, design, and optimize them manually, even more so when these systems are distributed around the world and belong to different entities and authorities. Self-management lets these systems improve their resource and energy management, a very important issue when resources have a cost to obtain, run, or maintain. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine learning methods can find accurate, and often intelligible, models of system behavior, and can predict and infer system states and values. Models obtained from automatic learning have the advantage of being easily updated after workload or configuration changes by re-taking examples and re-training the predictors. By employing automatic modeling and prediction, we can find new methods for making "intelligent" decisions and for discovering new information and knowledge about systems. This thesis moves from the existing state of the art, where management is based on administrator expertise, well-known data, ad-hoc algorithms and models, and elements studied from the point of view of a single computing machine, to a novel one where management is driven by models learned from the system itself, providing useful feedback and making up for incomplete, missing, or uncertain data, from the point of view of a global network of datacenters. First, we cover the scenario where the decision maker knows every piece of information about the system: how much each job will consume, what the desired quality of service is, what the deadlines for the workload are, and so on, focusing on each component and policy of each element involved in executing these jobs. Then we focus on the scenario where, instead of fixed oracles that provide information from an expert formula or set of conditions, machine learning is used to create these oracles; here we look at components and specific details while some of the information is unknown and must be learned and predicted. Next, we reduce the problem of optimizing resource allocations and requirements for virtualized web-services to a mathematical problem, indicating each factor, variable, and element involved, as well as all the constraints the scheduling process must respect; the scheduling problem can be modeled as a Mixed Integer Linear Program. Here we face the scenario of a full datacenter and introduce predicted information into the model. We then complement the model by expanding the set of predicted elements, studying the main resources (CPU, memory, and I/O), which can suffer from noise, inaccuracy, or unavailability. Once learned predictors improve decision making, the system becomes less dependent on expert knowledge, and research can focus on a scenario where all the elements provide noisy, uncertain, or private information. We also introduce new factors into the management optimization, since context and costs may change for each datacenter, turning the model into a multi-datacenter one. Finally, we review the cost of placing datacenters depending on green energy sources, and distribute the load according to green energy availability.
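
    A toy version of such a MILP formulation (hypothetical names and numbers, using the PuLP solver): binary variables place each service on exactly one host, capacity constraints link placements to powered-on hosts, and the objective minimizes the power of the hosts left on.

        from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

        services, hosts = ["s1", "s2", "s3"], ["h1", "h2"]
        cpu_need = {"s1": 2, "s2": 4, "s3": 3}   # would come from learned predictors
        cpu_cap = {"h1": 8, "h2": 4}
        watts = {"h1": 300, "h2": 150}           # cost of keeping each host on

        prob = LpProblem("placement", LpMinimize)
        x = LpVariable.dicts("x", [(s, h) for s in services for h in hosts], cat=LpBinary)
        on = LpVariable.dicts("on", hosts, cat=LpBinary)

        prob += lpSum(watts[h] * on[h] for h in hosts)        # minimize power drawn
        for s in services:
            prob += lpSum(x[(s, h)] for h in hosts) == 1      # place each service once
        for h in hosts:
            # respect CPU capacity, and only on hosts that are powered on
            prob += lpSum(cpu_need[s] * x[(s, h)] for s in services) <= cpu_cap[h] * on[h]

        prob.solve()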

  • Advanced concepts and tools for renewable energy supply of IT Data Centres (RenewIT)

     Guitart Fernández, Jordi; Salom, Jaume; Torres Viñals, Jordi; Macias Lloret, Mario
    Participation in a competitive project

  • Adapting Service lifeCycle towards EfficienT Clouds (ASCETiC)

     Juan Ferrer, Ana; Guitart Fernández, Jordi; Torres Viñals, Jordi; Macias Lloret, Mario
    Participation in a competitive project

  • Green Computing Node for European Micro-servers (EuroServer)

     Guitart Fernández, Jordi; Durand, Yves; Torres Viñals, Jordi; Subirats, Josep
    Participation in a competitive project

  • Economic model of a cloud provider operating in a federated cloud

     Goiri Presa, Iñigo; Guitart Fernández, Jordi; Torres Viñals, Jordi
    Information systems frontiers
    Date of publication: 2012-09
    Journal article

    Resource provisioning in Cloud providers is a challenge because of the high variability of load over time. On the one hand, providers can serve most of the requests owning only a restricted amount of resources, but this forces them to reject customers during peak hours. On the other hand, valley hours incur under-utilization of the resources, which forces providers to increase their prices to be profitable. Federation overcomes these limitations and allows providers to dynamically outsource resources to others in response to demand variations. Furthermore, it allows providers with underused resources to rent them to other providers. Both techniques increase the provider's profit when used adequately. Federation of Cloud providers requires a clear understanding of the consequences of each decision. In this paper, we present a characterization of providers operating in a federated Cloud which helps to choose the most convenient decision depending on the environment conditions: when to outsource to other providers, when to rent free resources to other providers (i.e., insourcing), or when to turn off unused nodes to save power. We characterize these decisions as a function of several parameters and implement a federated provider that uses this characterization to exploit federation. Finally, we evaluate the profitability of these techniques using data from a real provider.
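
    The characterization amounts to comparing the expected profit of each action under current conditions. A simplified, hypothetical decision rule in that spirit (one VM per node, hour-long periods):

        def federation_decision(free_nodes, demand, revenue_per_vm,
                                outsource_price, insource_price,
                                node_watts, kwh_price):
            power_cost = node_watts / 1000.0 * kwh_price  # one node-hour of energy
            if demand > free_nodes:
                # Peak hours: outsource the overflow if a margin remains.
                if revenue_per_vm > outsource_price:
                    return "outsource overflow"
                return "reject overflow"
            spare = free_nodes - demand
            if spare == 0:
                return "serve locally"
            # Valley hours: rent spare nodes if that beats just powering off.
            if insource_price > power_cost:
                return "insource: rent %d spare nodes" % spare
            return "turn off %d unused nodes" % spare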

  • Energy-efficient and multifaceted resource management for profit-driven virtualized data centers

     Goiri, Iñigo; Berral Garcia, Josep Lluis; Fitó Comellas, Josep Oriol; Julià Massó, Ferran; Nou Castell, Ramon; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    Future generation computer systems
    Date of publication: 2012-05
    Journal article

    Since virtualization was introduced in data centers, it has been opening new opportunities for resource management. Nowadays, it is not just used as a tool for consolidating underused nodes and saving power; it also allows new solutions to well-known challenges, such as heterogeneity management. Virtualization helps to encapsulate Web-based applications or HPC jobs in virtual machines (VMs) and see them as a single entity which can be managed in an easier and more efficient way. We propose a new scheduling policy that models and manages a virtualized data center. It focuses on the allocation of VMs in data center nodes according to multiple facets to optimize the provider's profit. In particular, it considers energy efficiency, virtualization overheads, and SLA violation penalties, and supports outsourcing to external providers. The proposed approach is compared to other common scheduling policies, demonstrating that a provider can improve its benefit by 30% and save power while handling other challenges, such as resource outsourcing, in a better and more intuitive way than typical approaches do.

    Postprint (author’s final draft)

  • Autonomic placement of mixed batch and transactional workloads

     Carrera Perez, David; Steinder, Malgorzata; Whalley, Ian; Torres Viñals, Jordi; Ayguade Parra, Eduard
    IEEE transactions on parallel and distributed systems
    Date of publication: 2012-02-01
    Journal article

    To reduce the cost of infrastructure and electrical energy, enterprise datacenters consolidate workloads on the same physical hardware. Often, these workloads comprise both transactional and long-running analytic computations. Such consolidation brings new performance management challenges due to the intrinsically different nature of a heterogeneous set of mixed workloads, ranging from scientific simulations to multitier transactional applications. The fact that such different workloads have different natures imposes the need for new scheduling mechanisms to manage collocated heterogeneous sets of applications, such as running a web application and a batch job on the same physical server, with differentiated performance goals. In this paper, we present a technique that enables existing middleware to fairly manage mixed workloads: long running jobs and transactional applications. Our technique permits collocation of the workload types on the same physical hardware, and leverages virtualization control mechanisms to perform online system reconfiguration. In our experiments, including simulations as well as a prototype system built on top of state-of-the-art commercial middleware, we demonstrate that our technique maximizes mixed workload performance while providing service differentiation based on high-level performance goals.

  • A methodology for the evaluation of high response time on E-commerce users and sales

     Poggi, Nicolas; Carrera Perez, David; Gavaldà Mestre, Ricard; Ayguade Parra, Eduard; Torres Viñals, Jordi
    Information systems frontiers
    Date of publication: 2012-10-06
    Journal article

    The widespread adoption of high-speed Internet access and its usage for everyday tasks are causing profound changes in users' expectations in terms of Web site performance and reliability. At the same time, server management is living a period of changes with the emergence of the cloud computing paradigm, which enables scaling server infrastructures within minutes. To help set performance objectives that maximize user satisfaction and sales while minimizing the number of servers and their cost, we present a methodology to determine how user sales are affected as response time increases. We begin with the characterization of more than 6 months of Web performance measurements, followed by the study of how the fraction of buyers in the workload is higher at peak traffic times, to then build a model of sales through a learning process using a 5-year sales dataset. Finally, we present our evaluation of high response time on users for popular applications found on the Web.

  • Energy accounting for shared virtualized environments under DVFS using PMC-based power models

     Bertran Monfort, Ramon; Becerra Fontal, Yolanda; Carrera Perez, David; Beltran Querol, Vicenç; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Torres Viñals, Jordi; Ayguade Parra, Eduard
    Future generation computer systems
    Date of publication: 2012-02
    Journal article

    Virtualized infrastructure providers demand new methods to increase the accuracy of the accounting models used to charge their customers. Future data centers will be composed of many-core systems that will host a large number of virtual machines (VMs) each. While resource utilization accounting can be achieved with existing system tools, energy accounting is a complex task when per-VM granularity is the goal. In this paper, we propose a methodology that brings new opportunities to energy accounting by adding an unprecedented degree of accuracy to the per-VM measurements. We present a system, which leverages CPU and memory power models based on performance monitoring counters (PMCs), to perform energy accounting in virtualized systems. The contribution of this paper is threefold. First, we show that PMC-based power modeling methods are still valid in virtualized environments. Second, we show that the Dynamic Voltage and Frequency Scaling (DVFS) mechanism, which infrastructure providers commonly use to avoid power and thermal emergencies, does not affect the accuracy of the models. And third, we introduce a novel methodology for accounting the energy consumption of virtualized systems. Accounting is done on a per-VM basis, even when multiple VMs are deployed on top of the same physical hardware, bypassing the limitations of per-server aggregated power metering. Overall, the results for an Intel Core 2 Duo show errors in energy estimations below 5%. Such an approach brings flexibility to the chargeback models used by service and infrastructure providers. For instance, we are able to detect cases where VMs that executed for the same amount of time present more than 20% difference in energy consumption, even taking into account only the consumption of the CPU and the memory.
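
    The flavor of a PMC-based power model and of per-VM energy accounting can be sketched as follows (a toy linear model with invented coefficients; the paper's models are calibrated per platform and DVFS level):

        # Toy model: P = P_idle + c_ipc * IPC + c_mem * memory_accesses_per_cycle
        IDLE_W, C_IPC, C_MEM = 20.0, 8.0, 15.0

        def cpu_mem_power(ipc, mem_apc):
            return IDLE_W + C_IPC * ipc + C_MEM * mem_apc

        def account_energy(samples):
            # samples: per-VM PMC readings, one (vm, seconds, ipc, mem_apc)
            # tuple per scheduling interval; returns joules attributed per VM.
            energy = {}
            for vm, seconds, ipc, mem_apc in samples:
                energy[vm] = energy.get(vm, 0.0) + cpu_mem_power(ipc, mem_apc) * seconds
            return energy

        # Two VMs running for the same wall-clock time can differ widely in energy:
        print(account_energy([("vm1", 60, 1.8, 0.02), ("vm2", 60, 0.6, 0.30)]))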

  • EMOTIVE cloud: the BSC's IaaS open source solution for cloud computing

     Vaqué, Alex; Goiri Presa, Iñigo; Guitart Fernández, Jordi; Torres Viñals, Jordi
    Date of publication: 2012-01
    Book chapter

  • Toward energy-aware scheduling using machine learning

     Berral Garcia, Josep Lluis; Goiri Presa, Iñigo; Nou Castell, Ramon; Julià Massó, Ferran; Fitó Comellas, Josep Oriol; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    Date of publication: 2012-07-30
    Book chapter

  • Green data center infrastructures in the cloud computing era

     Ricciardi, Sergio; Palmieri, Francesco; Torres Viñals, Jordi; Di Martino, Beniamino; Santos Boada, German; Sole Pareta, Josep
    Date of publication: 2012-11-29
    Book chapter

  • GreenHadoop: leveraging green energy in data-processing frameworks

     Goiri, Iñigo; Le, Kien; Nguyen, Thu D.; Guitart Fernández, Jordi; Torres Viñals, Jordi; Bianchini, Ricardo
    ACM European Conference on Computer Systems
    Presentation's date: 2012-04-10
    Presentation of work at congresses

    Interest has been growing in powering datacenters (at least partially) with renewable or "green" sources of energy, such as solar or wind. However, it is challenging to use these sources because, unlike the "brown" (carbon-intensive) energy drawn from the electrical grid, they are not always available. This means that energy demand and supply must be matched, if we are to take full advantage of the green energy to minimize brown energy consumption. In this paper, we investigate how to manage a datacenter's computational workload to match the green energy supply. In particular, we consider data-processing frameworks, in which many background computations can be delayed by a bounded amount of time. We propose GreenHadoop, a MapReduce framework for a datacenter powered by a photovoltaic solar array and the electrical grid (as a backup). GreenHadoop predicts the amount of solar energy that will be available in the near future, and schedules the MapReduce jobs to maximize the green energy consumption within the jobs' time bounds. If brown energy must be used to avoid time bound violations, GreenHadoop selects times when brown energy is cheap, while also managing the cost of peak brown power consumption. Our experimental results demonstrate that GreenHadoop can significantly increase green energy consumption and decrease electricity cost, compared to Hadoop.
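
    A greedy toy rendition of the scheduling idea (hypothetical and far simpler than GreenHadoop): given a per-hour forecast of solar energy, delay deferrable jobs into green hours as long as their time bounds allow, falling back to the cheapest brown-energy hour otherwise.

        def schedule(jobs, green_wh, brown_price):
            # jobs: (name, energy_wh, deadline_hour); green_wh[h]: forecast solar
            # energy for hour h; brown_price[h]: grid price for hour h.
            plan, green = [], list(green_wh)
            for name, energy, deadline in sorted(jobs, key=lambda j: j[2]):
                hour = max(range(deadline + 1), key=lambda h: green[h])
                if green[hour] >= energy:          # enough predicted solar: go green
                    green[hour] -= energy
                    plan.append((name, hour, "green"))
                else:                              # otherwise use cheap brown energy
                    hour = min(range(deadline + 1), key=lambda h: brown_price[h])
                    plan.append((name, hour, "brown"))
            return plan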

  • Towards sustainable solutions for European cloud computing

     Le, Kien; Nguyen, Thu D.; Goiri, Iñigo; Bianchini, Ricardo; Guitart Fernández, Jordi; Torres Viñals, Jordi
    Upgrade
    Date of publication: 2011-10
    Journal article

    Public Cloud Computing means that we are outsourcing our data to places where we cannot keep track of it. This creates a problem in terms of the privacy of our data and its availability. Unfortunately, the risk generated by unallocated computation and storage is not the only problem. In addition, the high energy consumption of the Cloud also contributes to climate change, since most of the electricity produced around the world comes from burning coal and natural gas, which are carbon-intensive approaches to energy production. This article reflects on these problems that arise with Cloud Computing and proposes sustainable solutions to mitigate them in countries like Spain.

  • A path to achieving a self-managed Grid middleware

     Nou Castell, Ramon; Julià, Ferran; Hogan, Kevin; Torres Viñals, Jordi
    Future generation computer systems
    Date of publication: 2011-01
    Journal article

    Paramount to the overall performance delivered by a Grid environment is the quality of the middleware on which distributed Grid applications run. Due to its complex nature, this middleware can be difficult to investigate in full detail and can also be problematic to tune efficiently, especially when running in a production environment. Thanks to the BSC Monitoring Framework (BSC-MF), a set of tools that can instrument and analyze Java applications as well as the entire system, we were able to undertake both global and fine-grained investigation into one of the most popular Grid middlewares of the moment, Globus Toolkit 4. The steps taken revealed some interesting findings and resulted in the detection of some job management problems in this middleware. Primarily, the main issue was that it was possible to reach a situation which caused jobs to be lost on the node due to an overloading amount of jobs being processed by the system. Again, the BSC-MF was used to investigate this issue further and helped extract a possible solution to prevent the node becoming a point of contention in the architecture. A simple but effective policy was formulated, which prioritized the finishing and acceptance of jobs over response time and throughput, and was evaluated as a solution to the problem. It was determined that, due to the dynamic nature of the problem, it could be best resolved by adding self-managing capabilities to the middleware. Using the new policy, a prototype of an autonomous system was built and succeeded in allowing more jobs to be accepted and finished correctly. The improvement over the original GT4 middleware was significant, with performance gains of 30%. The path from investigation to development, as described in this paper, might serve as a guide to others in the field who are interested in extracting knowledge about a Grid node, extending the Grid middleware, or adding self-managing behaviour to their applications.

  • GreenSlot: scheduling energy consumption in green datacenters

     Goiri Presa, Iñigo; Le, Kien; Haque, Md. E.; Beauchea, Ryan; Nguyen, Thu D.; Guitart Fernández, Jordi; Torres Viñals, Jordi; Bianchini, Ricardo
    International Conference for High Performance Computing, Networking, Storage and Analysis
    Presentation's date: 2011-11-16
    Presentation of work at congresses

    In this paper, we propose GreenSlot, a parallel batch job scheduler for a datacenter powered by a photovoltaic solar array and the electrical grid (as a backup). GreenSlot predicts the amount of solar energy that will be available in the near future, and schedules the workload to maximize the green energy consumption while meeting the jobs' deadlines. If grid energy must be used to avoid deadline violations, the scheduler selects times when it is cheap. Our results for production scientific workloads demonstrate that GreenSlot can increase green energy consumption by up to 117% and decrease energy cost by up to 39%, compared to a conventional scheduler. Based on these positive results, we conclude that green datacenters and green-energy-aware scheduling can have a significant role in building a more sustainable IT ecosystem.

  • Optimal resource allocation in a virtualized software aging platform with software rejuvenation

     Alonso López, Javier; Goiri Presa, Iñigo; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    IEEE International Symposium on Software Reliability Engineering
    Presentation's date: 2011-11-29
    Presentation of work at congresses

    Nowadays, virtualized platforms have become the most popular option to deploy complex enough services. The reason is that virtualization allows resource providers to increase resource utilization. Deployed services are expected to be always available, but these long-running services are especially sensitive to suffer from software aging phenomenon. This term refers to an accumulation of errors, which usually causes resource exhaustion, and eventually makes the service hang/crash. To counteract this phenomenon, a preventive approach to fault management, called software rejuvenation has been proposed. In this paper, we propose a framework which provides transparent and predictive software rejuvenation to web services that suffer software aging on virtualized platforms, achieving high levels of availability. To exploit the provider resources, the framework also seeks to maximize the number of services running simultaneously on the platform, while guaranteeing the resources needed by each service.
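
    A minimal sketch of the predictive idea (hypothetical Python, not the framework itself): fit a trend to the monitored usage of an aging resource, estimate the time left until exhaustion, and trigger rejuvenation while a clean hand-off is still possible.

        def time_to_exhaustion(samples, capacity):
            # samples: (t_seconds, used) measurements of an aging resource,
            # e.g. memory; fit a least-squares line used = a*t + b.
            n = len(samples)
            mean_t = sum(t for t, _ in samples) / n
            mean_u = sum(u for _, u in samples) / n
            slope = (sum((t - mean_t) * (u - mean_u) for t, u in samples)
                     / sum((t - mean_t) ** 2 for t, u in samples))
            if slope <= 0:
                return float("inf")             # no aging trend detected
            last_t, last_u = samples[-1]
            return (capacity - last_u) / slope  # seconds until exhaustion

        def should_rejuvenate(samples, capacity, safety_window=600):
            # Rejuvenate (e.g. restart the service's VM) before the predicted crash.
            return time_to_exhaustion(samples, capacity) < safety_window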

  • Intelligent placement of datacenters for Internet services

     Goiri Presa, Iñigo; Le, Kien; Guitart Fernández, Jordi; Torres Viñals, Jordi; Bianchini, Ricardo
    International Conference on Distributed Computing Systems
    Presentation's date: 2011-06-20
    Presentation of work at congresses

    Popular Internet services are hosted by multiple geographically distributed data centers. The location of the data centers has a direct impact on the services' response times, capital and operational costs, and (indirect) carbon dioxide emissions. Selecting a location involves many important considerations, including its proximity to population centers, power plants, and network backbones, the source of the electricity in the region, the electricity, land, and water prices at the location, and the average temperatures at the location. As there can be many potential locations and many issues to consider for each of them, the selection process can be extremely involved and time-consuming. In this paper, we focus on the selection process and its automation. Specifically, we propose a framework that formalizes the process as a non-linear cost optimization problem, and approaches for solving the problem. Based on the framework, we characterize areas across the United States as potential locations for data centers, and delve deeper into seven interesting locations. Using the framework and our solution approaches, we illustrate the selection trade-offs by quantifying the minimum cost of (1) achieving different response times, availability levels, and consistency times, and (2) restricting services to green energy and chiller-less data centers. Among other interesting results, we demonstrate that the intelligent placement of data centers can save millions of dollars under a variety of conditions. We also demonstrate that the selection process is most efficient and accurate when it uses a novel combination of linear programming and simulated annealing.
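
    The simulated-annealing half of that combination can be sketched generically (hypothetical cost function and sites; the LP half would evaluate each candidate placement):

        import math
        import random

        def anneal(sites, cost, k=3, steps=10000, t0=1.0):
            # Search for k datacenter locations minimizing cost(), which would
            # fold in electricity/land/water prices, latency to users, etc.
            current = random.sample(sites, k)
            best = current[:]
            for step in range(steps):
                temp = t0 * (1 - step / steps) + 1e-9
                candidate = current[:]
                candidate[random.randrange(k)] = random.choice(sites)  # move one site
                delta = cost(candidate) - cost(current)
                # Accept improvements always; worse moves with decaying probability.
                if delta < 0 or random.random() < math.exp(-delta / temp):
                    current = candidate
                    if cost(current) < cost(best):
                        best = current[:]
            return best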

  • Adaptive scheduling on power-aware managed data-centers using machine learning

     Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    ACM/IEEE International Conference on Grid Computing
    Presentation's date: 2011-09-22
    Presentation of work at congresses

    Energy-related costs have become one of the major economic factors in IT data-centers, and companies and the research community are currently working on new efficient power-aware resource management strategies, also known as "Green IT". Here we propose an autonomic scheduler of tasks and web-services over cloud environments, focusing on profit optimization: the revenue from executing a set of tasks according to service-level agreements minus costs such as power consumption. The principal contribution is the use of machine learning techniques to predict resource usage, such as CPU consumption, a priori, and to estimate task response times based on the monitored data traffic characteristics. Further, in order to optimize the scheduling, an exact solver based on mixed integer linear programming is used as a proof of concept and compared to approximate algorithm solvers to find valid alternatives for the NP-hard problem of exact schedule solving. Experiments show that the machine learning algorithms can predict system behavior with acceptable accuracy, and that the ILP solver obtains the optimal solution, adjusting the schedule appropriately as profits and power costs change and reducing migrations when their cost is taken into consideration. Finally, it is demonstrated that one of the approximate solvers is much faster than the exact solver while remaining close to it in terms of the optimization goal.

  • Resource-aware adaptive scheduling for MapReduce clusters

     Polo, Jordà; Castillo, Claris; Carrera Perez, David; Becerra Fontal, Yolanda; Whalley, Ian; Steinder, Malgorzata; Torres Viñals, Jordi; Ayguade Parra, Eduard
    ACM/IFIP/USENIX International Middleware Conference
    Presentation's date: 2011-12-16
    Presentation of work at congresses

    We present a resource-aware scheduling technique for MapReduce multi-job workloads that aims at improving resource utilization across machines while observing completion time goals. Existing MapReduce schedulers define a static number of slots to represent the capacity of a cluster, creating a fixed number of execution slots per machine. This abstraction works for homogeneous workloads, but fails to capture the different resource requirements of individual jobs in multi-user environments. Our technique leverages job profiling information to dynamically adjust the number of slots on each machine, as well as workload placement across them, to maximize the resource utilization of the cluster. In addition, our technique is guided by user-provided completion time goals for each job.
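
    The slot-adjustment idea in miniature (hypothetical numbers): rather than a fixed slot count per machine, derive it from the profiled per-task demands of the jobs placed there, so the scarcest resource sets the degree of concurrency.

        def slots_for_machine(capacity, task_profiles):
            # capacity: machine resources; task_profiles: per-task demands of the
            # assigned jobs, obtained from job profiling.
            avg = {r: sum(p[r] for p in task_profiles) / len(task_profiles)
                   for r in capacity}
            return int(min(capacity[r] / avg[r] for r in capacity))

        machine = {"cpu": 8.0, "mem_gb": 32.0, "disk_mbps": 400.0}
        profiles = [{"cpu": 1.0, "mem_gb": 2.0, "disk_mbps": 120.0},  # IO-heavy job
                    {"cpu": 2.0, "mem_gb": 1.0, "disk_mbps": 10.0}]   # CPU-heavy job
        print(slots_for_machine(machine, profiles))  # bottleneck decides: 5 slots here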

  • Empreses en el núvol: claus per entendre la internet global

     Torres Viñals, Jordi
    Date of publication: 2011-06-01
    Book

  • Empresas en la nube: ventajas y retos del cloud computing

     Torres Viñals, Jordi
    Date of publication: 2011-06-01
    Book

  • Proactive software rejuvenation solution for web environments on virtualized platforms

     Alonso López, Javier
    Defense's date: 2011-02-21
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    The availability of Information Technologies for everything, from everywhere, at all times is a growing requirement. We use Information Technologies for everything from common and social tasks to critical tasks such as managing nuclear power plants or even the International Space Station (ISS). However, the availability of IT infrastructures is still a huge challenge nowadays. A quick look at the news reveals reports of corporate outages affecting millions of users and impacting the revenue and image of the companies involved. It is well known that, currently, computer system outages are more often due to software faults than hardware faults. Several studies have reported that one of the causes of unplanned software outages is the software aging phenomenon. This term refers to the accumulation of errors, usually causing resource contention, during long-running application executions, like web applications, which normally causes applications/systems to hang or crash. Gradual performance degradation can also accompany software aging. The phenomenon is often related to memory bloating/leaks, unterminated threads, data corruption, unreleased file-locks, or overruns. We can find several examples of software aging in industry. The work presented in this thesis aims to offer a proactive and predictive software rejuvenation solution for Internet Services against software aging caused by resource exhaustion. To this end, we first present a threshold-based proactive rejuvenation to avoid the consequences of software aging. This first approach has some limitations, the most important being the need to know a priori the resource or resources involved in the crash and the critical condition values. Moreover, some expertise is needed to fix the threshold value that triggers the rejuvenation action. Due to these limitations, we have evaluated the use of Machine Learning to overcome the weaknesses of our first approach and obtain a proactive and predictive solution. Finally, the current and increasing tendency to use virtualization technologies to improve resource utilization has turned traditional data centers into virtualized data centers or platforms. We have used a Mathematical Programming approach to virtual machine allocation and migration to optimize the resources, accepting as many services as possible on the platform while guaranteeing the availability (via our software rejuvenation proposal) of the services deployed against software aging. The thesis is supported by an exhaustive experimental evaluation that proves the effectiveness and feasibility of our proposals for current systems.

  • Multifaceted Resource Management on Virtualized Providers (Open access)

     Goiri Presa, Iñigo
    Defense's date: 2011-06-14
    Universitat Politècnica de Catalunya
    Theses

    In the last decade, providers started using Virtual Machines (VMs) in their datacenters to pack users and their applications. This was a good way to consolidate multiple users in fewer physical nodes while isolating them from each other. Later, in 2006, Amazon started offering its Infrastructure as a Service, where users rent computing resources as VMs in a pay-as-you-go manner. However, virtualized providers cannot be managed like traditional ones, as they are confronted with a set of new challenges. First of all, providers must deal efficiently with new management operations such as the dynamic creation of VMs. These operations enable new capabilities that were not there before, such as moving VMs across nodes or checkpointing VMs. We propose a decentralized virtualization management infrastructure to create VMs on demand, migrate them between nodes, and provide checkpointing mechanisms. With the introduction of this infrastructure, virtualized providers become decentralized and are able to scale. Secondly, these providers consolidate multiple VMs in a single machine to utilize resources more efficiently. Nevertheless, this is not straightforward and implies the use of more complex resource management techniques. In addition, it requires that both customers and providers can be confident that signed Service Level Agreements (SLAs) support their respective business activities to the best extent. Providers typically offer very simple metrics that hinder an efficient exploitation of their resources. To solve this, we propose mechanisms to dynamically distribute resources among VMs and a resource-level metric, which together allow increasing provider utilization while maintaining Quality of Service. Thirdly, the provider must allocate the VMs evaluating multiple facets, such as power consumption and customers' requirements. In addition, it must exploit the new capabilities introduced by virtualization and manage their overhead. Ultimately, this VM placement must minimize the costs associated with the execution of a VM in order to maximize the provider's profit. We propose a new scheduling policy that places VMs on provider nodes according to multiple facets and is able to understand and manage the overheads of virtualization. And fourthly, resource provisioning in these providers is a challenge because of the high load variability over time. Providers can serve most of the requests owning only a restricted amount of resources, but this under-provisioning may cause customers to be rejected during peak hours. In the opposite situation, valley hours incur under-utilization of the resources. As this new paradigm makes access to resources easier, providers can share resources to serve their loads. We leverage a federated scenario where multiple providers share their resources to overcome this load variability, and we exploit federation capabilities to create policies that take the most convenient decision depending on the environment conditions. All these challenges mean that providers must manage their virtualized resources differently than they have done traditionally. This dissertation identifies and studies the challenges faced by virtualized providers that offer IaaS, and designs and evaluates a solution to manage the provider's resources in the most cost-effective way by exploiting virtualization capabilities.

  • A survey on performance management for Internet applications

     Guitart Fernández, Jordi; Torres Viñals, Jordi; Ayguade Parra, Eduard
    Concurrency and computation. Practice and experience
    Date of publication: 2010-01-01
    Journal article

    Internet applications have become indispensable for many business and personal processes, turning the performance of these applications into a key issue. For this reason, recent research has comprehensively explored mechanisms for managing the performance of these applications, with special focus on dealing with overload situations and providing QoS guarantees to clients. This paper surveys the different proposals in the literature for managing Internet applications' performance. We present a complete taxonomy that characterizes and classifies these proposals into several categories, including request scheduling, admission control, service differentiation, dynamic resource management, service degradation, control-theoretic approaches, works using queuing models, observation-based approaches that use runtime measurements, and overall approaches combining several mechanisms. For each work, we provide a brief description in order to give the reader a global understanding of the research progress in this area.

  • Exploiting semantics and virtualization for SLA-driven resource allocation in service providers

     Ejarque, Jorge; de Palol, Marc; Goiri Presa, Iñigo; Julià, Ferran; Guitart Fernández, Jordi; Badia Sala, Rosa Maria; Torres Viñals, Jordi
    Concurrency and computation. Practice and experience
    Date of publication: 2010-04-01
    Journal article

    Resource management is a key challenge that service providers must adequately face in order to accomplish their business goals. This paper introduces a framework, the semantically enhanced resource allocator (SERA), aimed at facilitating service provider management, reducing costs and at the same time fulfilling the QoS agreed with the customers. The SERA assigns resources depending on the information given by the service providers according to their business goals and on the resource requirements of the tasks. Tasks and resources are semantically described and these descriptions are used to infer the resource assignments. Virtualization is used to provide an application-specific and isolated virtual environment for each task. In addition, the system supports fine-grain dynamic resource distribution among these virtual environments based on Service-Level Agreements. The required adaptation is implemented using agents, guaranteeing enough resources to each task in order to meet the agreed performance goals.

  • Maximizing revenue in grid markets using an economically enhanced resource manager

     Macias Lloret, Mario; Rana, Omer; Smith, Garry; Guitart Fernández, Jordi; Torres Viñals, Jordi
    Concurrency and Computation: Practice and Experience
    Date of publication: 2010-09
    Journal article

    Traditional resource management has had as its main objective the optimization of throughput, based on parameters such as CPU, memory, and network bandwidth. With the appearance of Grid markets, new variables that determine economic expenditure, benefit and opportunity must be taken into account. The Self-organizing ICT Resource Management (SORMA) project aims at allowing resource owners and consumers to exploit market mechanisms to sell and buy resources across the Grid. SORMA's motivation is to achieve efficient resource utilization by maximizing revenue for resource providers and minimizing the cost of resource consumption within a market environment. An overriding factor in Grid markets is the need to ensure that the desired quality of service levels meet the expectations of market participants. This paper explains the proposed use of an economically enhanced resource manager (EERM) for resource provisioning based on economic models. In particular, this paper describes techniques used by the EERM to support revenue maximization across multiple service level agreements and provides an application scenario to demonstrate its usefulness and effectiveness.

  • Enforcing service level agreements using an economically enhanced resource manager

     Macias Lloret, Mario; Smith, Garry; Rana, Omer; Guitart Fernández, Jordi; Torres Viñals, Jordi
    Date of publication: 2010-01
    Book chapter

  • Extended resource management using client classification and economic enhancements

     Püschel, Tim; Borissov, Nikolay; Neumann, Dirk; Macias Lloret, Mario; Guitart Fernández, Jordi; Torres Viñals, Jordi
    Date of publication: 2010-01
    Book chapter

    Commercialization of computing resources will become more and more important as the transition from Grid computing in academic environments to commercial services based on concepts such as utility or Cloud computing progresses. This results in the necessity to base components not only on technical aspects, but also to include economic aspects in their design. This paper presents a framework that links technical and economic aspects to the management of computational resources. Economic enhancements like dynamic pricing and client classification are introduced based on a technical resource management environment and positioned within it, resulting in a proposed architecture for an Economically Enhanced Resource Manager (EERM). The introduced approach is evaluated considering various economic design criteria and example scenarios.
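
    As a rough illustration of combining client classification with dynamic pricing (the coefficients and class discounts are invented for the example):

        BASE_RATE = 1.0                  # price per CPU-hour at low load
        DISCOUNT = {"gold": 0.8, "silver": 0.9, "standard": 1.0}

        def quote(cpu_hours, utilization, client_class):
            # Surcharge kicks in above 70% utilization; preferred clients
            # keep their class discount regardless of load.
            surge = 1.0 + 2.0 * max(0.0, utilization - 0.7)
            return cpu_hours * BASE_RATE * surge * DISCOUNT[client_class]

        print(quote(10, utilization=0.9, client_class="gold"))   # 11.2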

  • Energy-aware scheduling in virtualized datacenters  Open access

     Goiri Presa, Iñigo; Julià Massó, Ferran; Nou Castell, Ramon; Berral Garcia, Josep Lluis; Guitart Fernández, Jordi; Torres Viñals, Jordi
    IEEE International Conference on Cluster Computing
    Presentation's date: 2010-09-20
    Presentation of work at congresses

    The reduction of energy consumption in large-scale datacenters is being accomplished through an extensive use of virtualization, which enables the consolidation of multiple workloads in a smaller number of machines. Nevertheless, virtualization also incurs some additional overheads (e.g. virtual machine creation and migration) that can influence which consolidated configuration is best, and thus they must be taken into account. In this paper, we present a dynamic job scheduling policy for power-aware resource allocation in a virtualized datacenter. Our policy tries to consolidate workloads from separate machines into a smaller number of nodes, while fulfilling the amount of hardware resources needed to preserve the quality of service of each job. This allows turning off the spare servers, thus reducing the overall datacenter power consumption. As a novelty, this policy incorporates all the virtualization overheads in the decision process. In addition, our policy is prepared to consider other important parameters for a datacenter, such as reliability or dynamic SLA enforcement, in a synergistic way with power consumption. The introduced policy is evaluated against common policies in a simulated environment that accurately models HPC job execution in a virtualized datacenter, including power consumption modeling, and obtains a power consumption reduction of 15% with respect to typical policies.
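
    The consolidation step can be sketched as a bin-packing pass that frees whole nodes (a simplified first-fit-decreasing sketch; the actual policy also weighs virtualization overheads such as VM creation and migration):

        def consolidate(jobs, node_capacity, num_nodes):
            """jobs: list of CPU demands. Returns per-node assignments."""
            nodes = [[] for _ in range(num_nodes)]
            free = [node_capacity] * num_nodes
            for demand in sorted(jobs, reverse=True):
                for i in range(num_nodes):
                    if free[i] >= demand:       # first node that still fits
                        nodes[i].append(demand)
                        free[i] -= demand
                        break
            powered_off = sum(1 for n in nodes if not n)
            return nodes, powered_off

        nodes, off = consolidate([0.5, 0.3, 0.6, 0.2],
                                 node_capacity=1.0, num_nodes=3)
        print(nodes, off)   # [[0.6, 0.3], [0.5, 0.2], []] 1 -> one node off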

  • Multifaceted resource management for dealing with heterogeneous workloads in virtualized data centers  Open access

     Goiri Presa, Iñigo; Fitó Comellas, Josep Oriol; Julià Masso, Ferran; Nou Castell, Ramon; Berral Garcia, Josep Lluis; Guitart Fernández, Jordi; Torres Viñals, Jordi
    ACM/IEEE International Conference on Grid Computing
    Presentation's date: 2010-10-25
    Presentation of work at congresses

    Since virtualization was introduced in data centers, it has opened new opportunities for resource management. It is no longer used just as a tool for consolidating underused nodes and saving power; it also enables new solutions to well-known challenges, such as fault tolerance or heterogeneity management. Virtualization helps to encapsulate Web-based applications or HPC jobs in virtual machines and see them as a single entity which can be managed in an easier way. This paper proposes a new scheduling policy to model and manage a virtualized data center which mainly focuses on the allocation of VMs in data center nodes according to multiple facets while optimizing the provider's profit. In particular, it considers energy efficiency, virtualization overheads, fault tolerance, and SLA violation penalties, while adding the ability to outsource resources to external providers. Using our approach, a data center can improve the provider's benefit by 15% and reduce power consumption while solving well-known challenges, such as fault tolerance and outsourcing, in a better and more intuitive way than typical approaches do.
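
    A minimal sketch of such multi-facet scoring, with invented weights and cost terms rather than the paper's actual model:

        def score(candidate):
            # Profit-oriented score: revenue minus energy, expected SLA
            # penalties, and (if outsourced) the external provider's fee.
            return (candidate["revenue"]
                    - candidate["energy_cost"]
                    - candidate["sla_penalty_risk"]
                    - candidate.get("outsourcing_cost", 0.0))

        candidates = [
            {"name": "local node", "revenue": 10, "energy_cost": 2,
             "sla_penalty_risk": 1},
            {"name": "outsourced", "revenue": 10, "energy_cost": 0,
             "sla_penalty_risk": 0, "outsourcing_cost": 4},
        ]
        best = max(candidates, key=score)
        print(best["name"])   # local node (score 7 vs 6)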

  • BSC contributions in energy-aware resource management for large scale distributed systems  Open access

     Torres Viñals, Jordi; Ayguade Parra, Eduard; Carrera Perez, David; Guitart Fernández, Jordi; Beltran Querol, Vicenç; Becerra Fontal, Yolanda; Badia Sala, Rosa Maria; Labarta Mancho, Jesus Jose; Valero Cortes, Mateo
    Workshop of the COST Action IC0804 on Energy Efficiency in Large Scale Distributed Systems
    Presentation's date: 2010-04-15
    Presentation of work at congresses

    This paper introduces the work being carried out at Barcelona Supercomputing Center in the area of Green Computing. We have been working in resource management for a long time and recently we included the energy parameter in the decision process, considering that for a more sustainable science, the paradigm will shift from “time to solution” to “kWh to the solution”. We present our proposals organized in four points that follow the cloud computing stack. For each point we enumerate the latest achievements, to be published during 2010, that form the basis of our future research. To conclude the paper we review our ongoing and future research work and give an overview of the projects in which BSC participates.

  • Characterization of workload and resource consumption for an online travel and booking site

     Poggi Mastrokalo, Nicolas; Carrera Perez, David; Gavaldà Mestre, Ricard; Torres Viñals, Jordi; Ayguade Parra, Eduard
    IEEE International Symposium on Workload Characterization
    Presentation's date: 2010-12-02
    Presentation of work at congresses

    Online travel and ticket booking is one of the top E-Commerce industries. As they present a mix of products: flights, hotels, tickets, restaurants, activities and vacation packages, they rely on a wide range of technologies to support them: Javascript, AJAX, XML, B2B Web services, Caching, Search Algorithms and Affiliation; resulting in a very rich and heterogeneous workload. Moreover, visits to travel sites present great variability depending on time of day, season, promotions, events, and linking; this creates bursty traffic and makes capacity planning a challenge. It is therefore of great importance to understand how users and crawlers interact on travel sites and their effect on server resources, in order to devise cost-effective infrastructures and improve the Quality of Service for users. In this paper we present a detailed workload and resource consumption characterization of the web site of a top national Online Travel Agency. Characterization is performed on server logs, including both HTTP data and resource consumption of the requests, as well as the server load status during the execution. From the dataset we characterize user sessions, their patterns, and how response time is affected as load on the Web servers increases. We provide a fine-grain analysis by performing experiments that differentiate types of request, time of day, products, and the resource requirements of each. Results show that the workload is bursty, as expected; that day and night traffic exhibit different properties in terms of request-type mix; that user session lengths cover a wide range of durations; that response time grows proportionally to server load; and that the response time of external data providers also increases at peak hours, among other results. Such results can be useful for optimizing infrastructure costs, improving QoS for users, and developing realistic workload generators for similar applications.
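
    The kind of log analysis behind such a characterization can be sketched as follows (the log format and fields are invented for the example):

        from collections import Counter, defaultdict

        log = [("2010-06-01 10:15", "s1", "search"),
               ("2010-06-01 10:17", "s1", "book"),
               ("2010-06-01 22:40", "s2", "crawl")]

        hourly_mix = defaultdict(Counter)   # request-type mix per hour
        session_len = Counter()             # clicks per session
        for timestamp, session, rtype in log:
            hour = timestamp.split()[1].split(":")[0]
            hourly_mix[hour][rtype] += 1
            session_len[session] += 1

        print(dict(hourly_mix["10"]))   # {'search': 1, 'book': 1}
        print(session_len["s1"])        # 2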

  • Checkpoint-based fault-tolerant infrastructure for virtualized service providers  Open access

     Goiri Presa, Iñigo; Julià, Ferran; Guitart Fernández, Jordi; Torres Viñals, Jordi
    IEEE/ IFIP Network Operations and management Symposium
    Presentation's date: 2010-04-19
    Presentation of work at congresses

    Crash and omission failures are common in service providers: a disk can break down or a link can fail anytime. In addition, the probability of a node failure increases with the number of nodes. Apart from reducing the provider's computation power and jeopardizing the fulfillment of its contracts, this can also lead to wasted computation time when the crash occurs before the task execution finishes. In order to avoid this problem, efficient checkpoint infrastructures are required, especially in virtualized environments where these infrastructures must deal with huge virtual machine images. This paper proposes a smart checkpoint infrastructure for virtualized service providers. It uses Another Union File System to differentiate read-only from read-write parts in the virtual machine image. In this way, read-only parts can be checkpointed only once, while the rest of checkpoints must only save the modifications in read-write parts, thus reducing the time needed to make a checkpoint. The checkpoints are stored in a Hadoop Distributed File System. This allows resuming a task execution faster after a node crash and increasing the fault tolerance of the system, since checkpoints are distributed and replicated in all the nodes of the provider. This paper presents a running implementation of this infrastructure and its evaluation, demonstrating that it is an effective way to make faster checkpoints with low interference on task execution and efficient task recovery after a node failure.
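
    The layered-checkpoint idea can be sketched as follows (paths and helper names are examples; the real infrastructure relies on AUFS for the layering and HDFS for storage):

        import hashlib, os, shutil

        def checkpoint(rw_layer, store, saved):
            """Copy to `store` only files in the read-write layer whose
            content changed since the previous checkpoint; the read-only
            base image is assumed saved once, elsewhere."""
            os.makedirs(store, exist_ok=True)
            for root, _, files in os.walk(rw_layer):
                for name in files:
                    path = os.path.join(root, name)
                    with open(path, "rb") as f:
                        digest = hashlib.sha1(f.read()).hexdigest()
                    if saved.get(path) != digest:   # new or modified file
                        shutil.copy2(path, store)
                        saved[path] = digest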

  • Characterizing cloud federation for enhancing providers' profit  Open access

     Goiri Presa, Iñigo; Guitart Fernández, Jordi; Torres Viñals, Jordi
    IEEE International Conference on Cloud Computing Technology and Science
    Presentation's date: 2010-07-05
    Presentation of work at congresses

    Cloud federation has been proposed as a new paradigm that allows providers to avoid the limitation of owning only a restricted amount of resources, which forces them to reject new customers when they do not have enough local resources to fulfill their customers' requirements. Federation allows a provider to dynamically outsource resources to other providers in response to demand variations. It also allows a provider that has underused resources to rent part of them to other providers. Both can increase the provider's profit when used adequately. This requires that the provider has a clear understanding of the potential of each federation decision, in order to choose the most convenient depending on the environment conditions. In this paper, we present a complete characterization of providers' federation in the Cloud, including decision equations to outsource resources to other providers, rent free resources to other providers (i.e. insourcing), or shut down unused nodes to save power, and we characterize these decisions as a function of several parameters. Then, we demonstrate in the evaluation section how a provider can enhance its profit by using these equations to exploit federation, and how the different parameters influence which decision is best in each situation.
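
    A toy version of the three federation decisions, with invented prices standing in for the paper's decision equations:

        def decide(free_nodes, pending_revenue, rent_out_price, rent_in_price,
                   power_cost):
            if free_nodes == 0 and pending_revenue > rent_in_price:
                return "outsource"   # pay another provider, keep the customer
            if free_nodes > 0 and rent_out_price > power_cost:
                return "insource"    # rent spare nodes out at a profit
            if free_nodes > 0:
                return "shutdown"    # idle nodes only burn power
            return "reject"

        print(decide(free_nodes=2, pending_revenue=0,
                     rent_out_price=0.3, rent_in_price=0.6,
                     power_cost=0.5))   # shutdown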

  • Performance-driven task co-scheduling for MapReduce environments

     Polo, Jordà; Carrera Perez, David; Becerra Fontal, Yolanda; Torres Viñals, Jordi; Steinder, Malgorzata; Ayguade Parra, Eduard; Whalley, Ian
    IEEE Network Operations and Management Symposium (NOMS)
    Presentation's date: 2010-04-21
    Presentation of work at congresses

    MapReduce is a data-driven programming model proposed by Google in 2004 which is especially well suited for distributed data analytics applications. We consider the management of MapReduce applications in an environment where multiple applications share the same physical resources. Such sharing is in line with recent trends in data center management which aim to consolidate workloads in order to achieve cost and energy savings. In a shared environment, it is necessary to predict and manage the performance of workloads given a set of performance goals defined for them. In this paper, we address this problem by introducing a new task scheduler for a MapReduce framework that allows performance-driven management of MapReduce tasks. The proposed task scheduler dynamically predicts the performance of concurrent MapReduce jobs and adjusts the resource allocation for the jobs. It allows applications to meet their performance objectives without over-provisioning of physical resources.
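
    The core estimate behind such a scheduler can be sketched as follows (hypothetical names; the actual scheduler's prediction and adjustment logic is richer):

        import math, time

        def slots_needed(pending_tasks, mean_task_secs, deadline_ts, now=None):
            now = time.time() if now is None else now
            remaining = max(deadline_ts - now, 1.0)   # seconds to the goal
            waves = remaining / mean_task_secs        # sequential waves left
            return math.ceil(pending_tasks / max(waves, 1.0))

        # 120 pending tasks of ~30 s each, 600 s to go -> 20 waves -> 6 slots
        print(slots_needed(120, 30.0, deadline_ts=600.0, now=0.0))   # 6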

  • Adaptive on-line software aging prediction based on machine learning  Open access

     Alonso López, Javier; Torres Viñals, Jordi; Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard
    IEEE/IFIP International Conference on Dependable Systems and Networks
    Presentation's date: 2010-07-28
    Presentation of work at congresses

    The growing complexity of software systems is resulting in an increasing number of software faults. According to the literature, software faults are becoming one of the main sources of unplanned system outages, and have an important impact on company benefits and image. For this reason, many techniques (such as clustering, fail-over techniques, or server redundancy) have been proposed to avoid software failures, and yet they still happen. Many software failures are due to the software aging phenomena. In this work, we present a detailed evaluation of our chosen machine learning prediction algorithm (M5P) against dynamic and non-deterministic software aging. We have tested our prediction model on a three-tier J2EE web application, achieving acceptable prediction accuracy against complex scenarios with small training data sets. Furthermore, we have found an interesting approach to help determine the root cause of the failure: the model generated by the machine learning algorithms.
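
    The prediction setup can be sketched with any regression learner; the paper uses Weka's M5P model tree, for which a scikit-learn DecisionTreeRegressor stands in below (training rows invented):

        from sklearn.tree import DecisionTreeRegressor

        # [used_memory_MB, threads, throughput] -> minutes until failure
        X = [[200, 50, 900], [400, 80, 700], [600, 120, 400], [800, 160, 150]]
        y = [120, 80, 35, 10]

        model = DecisionTreeRegressor(min_samples_leaf=1).fit(X, y)
        print(model.predict([[650, 130, 350]]))   # predicted minutes left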

  • Performance management of accelerated MapReduce workloads in heterogeneous clusters

     Polo, Jordà; Carrera Perez, David; Becerra Fontal, Yolanda; Beltran Querol, Vicenç; Torres Viñals, Jordi; Ayguade Parra, Eduard
    International Conference on Parallel Processing
    Presentation's date: 2010-09-16
    Presentation of work at congresses

    Next generation data centers will be composed of thousands of hybrid systems in an attempt to increase overall cluster performance and to minimize energy consumption. New programming models, such as MapReduce, specifically designed to make the most of very large infrastructures will be leveraged to develop massively distributed services. At the same time, data centers will bring an unprecedented degree of workload consolidation, hosting in the same infrastructure distributed services from many different users. In this paper we present our advancements in leveraging the Adaptive MapReduce Scheduler to meet user-defined high-level performance goals while transparently and efficiently exploiting the capabilities of hybrid systems. While the Adaptive Scheduler was already able to dynamically allocate resources to co-located MapReduce jobs based on their completion time goals, it was completely unaware of specific hardware capabilities. In our work we describe the changes introduced in the Adaptive Scheduler to enable it with hardware awareness and with the ability to co-schedule accelerable and non-accelerable jobs on the same heterogeneous MapReduce cluster, making the most of the underlying hybrid systems. The developed prototype is tested in a cluster of Cell/BE blades and relies on the use of accelerated and non-accelerated versions of the MapReduce tasks of different deployed applications to dynamically select the best version to run on each node. Decisions are made based on workload composition and jobs' completion time goals. Results show that the augmented Adaptive Scheduler provides dynamic resource allocation across jobs, hardware affinity when possible, and is even able to spread jobs' tasks across accelerated and non-accelerated nodes in order to meet performance goals in extreme conditions. To our knowledge this is the first MapReduce scheduler and prototype that is able to manage high-level performance goals even in the presence of hybrid systems and accelerable jobs.
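
    The version-selection step can be pictured as follows (names and speedup figures invented):

        def pick_version(job, node):
            # Run the accelerated binary only where an accelerator exists
            # and the job ships an accelerated version of its tasks.
            if node["has_accelerator"] and "accelerated" in job["versions"]:
                return "accelerated"
            return "plain"

        job = {"versions": {"plain": 1.0, "accelerated": 4.0}}  # rel. speedups
        print(pick_version(job, {"has_accelerator": True}))     # accelerated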

  • Accurate energy accounting for shared virtualized environments using PMC-based power modeling techniques

     Bertran Monfort, Ramon; Becerra Fontal, Yolanda; Carrera Perez, David; Beltran Querol, Vicenç; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Torres Viñals, Jordi; Ayguade Parra, Eduard
    ACM/IEEE International Conference on Grid Computing
    Presentation's date: 2010-10-27
    Presentation of work at congresses

    Virtualized infrastructure providers demand new methods to increase the accuracy of the accounting models used to charge their customers. Future data centers will be composed of many-core systems that will each host a large number of virtual machines (VMs). While resource utilization accounting can be achieved with existing system tools, energy accounting is a complex task when per-VM granularity is the goal. In this paper, we propose a methodology that brings new opportunities to energy accounting by adding an unprecedented degree of accuracy to the per-VM measurements. We present a system, which leverages CPU and memory power models based on performance monitoring counters (PMCs), to perform energy accounting in virtualized systems. The contribution of this paper is twofold. First, we show that PMC-based power modeling methods are still valid in virtualized environments. Second, we introduce a novel methodology for accounting of energy consumption in virtualized systems. Overall, the results for an Intel® Core™ 2 Duo show errors in energy estimations below 5%. Such an approach brings flexibility to the chargeback models used by service and infrastructure providers. For instance, we show that VMs executed for the same amount of time can present more than 20% difference in energy consumption, even taking into account only the consumption of the CPU and the memory.
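
    A PMC-based power model of the kind described reduces, in its simplest form, to a linear combination of counter rates (the coefficients below are invented; real ones are trained against a power meter):

        COEFF = {"instructions": 2.0e-9,   # watts per event/s
                 "llc_misses": 1.5e-7}
        IDLE_WATTS = 40.0

        def power(counter_rates):
            return IDLE_WATTS + sum(COEFF[c] * r
                                    for c, r in counter_rates.items())

        def vm_energy(samples, interval_s=1.0):
            """samples: per-interval counter rates attributed to one VM.
            Returns the VM's estimated energy in joules."""
            return sum(power(s) * interval_s for s in samples)

        # One minute at a steady rate: (40 + 4 + 1.5) W * 60 s = 2730 J
        print(vm_energy([{"instructions": 2e9, "llc_misses": 1e7}] * 60))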

  • An integer linear programming representation for data-center power-aware management  Open access

     Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    Date: 2010-11-12
    Report

    This work shows how to represent a grid data-center scheduling problem, taking advantage of virtualization and consolidation techniques, as an integer linear programming problem that includes all three mentioned factors. Although integer linear programming (ILP) is a computationally hard problem, specifying its constraints and optimization function correctly can help find optimal integer solutions in relatively short time. ILP solutions can thus help designers and system managers not only to apply them in schedulers but also to create new heuristics and holistic functions that approximate the optimal solutions well, more quickly.
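
    The report's exact formulation is not reproduced here, but the consolidation core of such an ILP can be written as follows, where x_ij places job i on node j, y_j marks node j as powered on, d_i is job i's demand, C_j is node j's capacity, and P_j its power draw:

        \min \sum_{j} P_j \, y_j
        \quad \text{s.t.} \quad
        \sum_{j} x_{ij} = 1 \;\; \forall i, \qquad
        \sum_{i} d_i \, x_{ij} \le C_j \, y_j \;\; \forall j, \qquad
        x_{ij}, \, y_j \in \{0, 1\}

    The first constraint places every job exactly once, and the second both enforces node capacity and forces a node to be counted as powered on whenever any job lands on it, so the objective minimizes total power.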

  • Towards energy-aware scheduling in data centers using machine learning  Open access

     Berral Garcia, Josep Lluis; Goiri Presa, Iñigo; Nou Castell, Ramon; Julià, Ferran; Guitart Fernández, Jordi; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    International Conference on Energy-Efficient Computing and Networking
    Presentation's date: 2010-04-15
    Presentation of work at congresses

    As energy-related costs have become a major economic factor for IT infrastructures and data centers, companies and the research community are being challenged to find better and more efficient power-aware resource management strategies. There is a growing interest in "Green" IT and there is still a big gap in this area to be covered. In order to obtain an energy-efficient data center, we propose a framework that provides an intelligent consolidation methodology using different techniques such as turning machines on/off, power-aware consolidation algorithms, and machine learning techniques to deal with uncertain information while maximizing performance. For the machine learning approach, we use models learned from previous system behaviors in order to predict power consumption levels, CPU loads, and SLA timings, and improve scheduling decisions. Our framework is vertical, because it considers everything from watt consumption to workload features, and cross-disciplinary, as it uses a wide variety of techniques. We evaluate these techniques with a framework that covers the whole control cycle of a real scenario, using a simulation with representative heterogeneous workloads, and we measure the quality of the results according to a set of metrics focused toward our goals, besides traditional policies. The results obtained indicate that our approach is close to the optimal placement and behaves better when the level of uncertainty increases.
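
    A minimal sketch of how learned models can drive the consolidation decision (the stand-in power and SLA models below are invented):

        def choose(assignments, predict_power, predict_sla_ok):
            # Keep the cheapest candidate predicted to meet its SLAs.
            feasible = [a for a in assignments if predict_sla_ok(a)]
            return min(feasible, key=predict_power) if feasible else None

        # Stand-ins: power grows with active machines; SLA holds while no
        # machine runs more than two jobs.
        predict_power = lambda a: 50.0 * len(set(a.values()))
        predict_sla_ok = lambda a: max(
            sum(1 for j in a if a[j] == m) for m in set(a.values())) <= 2

        plans = [{"j1": "m1", "j2": "m1"}, {"j1": "m1", "j2": "m2"}]
        print(choose(plans, predict_power, predict_sla_ok))
        # {'j1': 'm1', 'j2': 'm1'}: consolidation wins while SLAs hold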

  • J2EE instrumentation for software aging root cause application component determination with AspectJ  Open access

     Alonso López, Javier; Torres Viñals, Jordi; Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard
    IEEE Workshop on Dependable Parallel, Distributed and Network-Centric System
    Presentation's date: 2010-04-23
    Presentation of work at congresses

    Unplanned system outages have a negative impact on company revenues and image. While the last decades have seen many efforts from industry and academia to avoid them, they still happen and their impact is increasing. According to many studies, one of the most important causes of these outages is software aging. The software aging phenomenon refers to the accumulation of errors, usually provoking resource contention, during long-running application executions, like web applications, which normally causes applications/systems to hang or crash. Determining the root cause of a software aging failure, not just the resource or resources involved, is a huge task due to the ever-growing complexity of systems. In this paper we present a monitoring framework based on aspect-oriented programming to monitor, at runtime, the resources used by every application component. Knowing the resources used by every component of the application, we can determine which components are related to the software aging. Furthermore, we present a case study where we evaluate our approach to determine, in a web application scenario, which components are involved in the software aging, with promising results.
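
    The paper instruments J2EE components with AspectJ; a rough Python analogue of per-component resource accounting with a decorator (names invented):

        import functools, tracemalloc

        usage = {}                          # component name -> peak bytes seen

        def monitored(component):
            def wrap(fn):
                @functools.wraps(fn)
                def inner(*args, **kwargs):
                    tracemalloc.start()
                    try:
                        return fn(*args, **kwargs)
                    finally:
                        _, peak = tracemalloc.get_traced_memory()
                        usage[component] = max(usage.get(component, 0), peak)
                        tracemalloc.stop()
                return inner
            return wrap

        @monitored("cart")
        def add_item(cart, item):
            cart.append(item)

        add_item([], "book")
        print(usage)   # {'cart': <peak bytes allocated during the call>}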

  • Autonomic QoS control in enterprise Grid environments using online simulation

     Nou Castell, Ramon; Kounev, Samuel; Julià, Ferran; Torres Viñals, Jordi
    Journal of systems and software
    Date of publication: 2009-03
    Journal article

    As Grid Computing increasingly enters the commercial domain, performance and quality of service (QoS) issues are becoming a major concern. The inherent complexity, heterogeneity and dynamics of Grid computing environments pose some challenges in managing their capacity to ensure that QoS requirements are continuously met. In this paper, a comprehensive framework for autonomic QoS control in enterprise Grid environments using online simulation is proposed. This paper presents a novel methodology for designing autonomic QoS-aware resource managers that have the capability to predict the performance of the Grid components they manage and allocate resources in such a way that service level agreements are honored. Support for advanced features such as autonomic workload characterization on-the-fly, dynamic deployment of Grid servers on demand, as well as dynamic system reconfiguration after a server failure is provided. The goal is to make the Grid middleware self-configurable and adaptable to changes in the system environment and workload. The approach is subjected to an extensive experimental evaluation in the context of a real-world Grid environment and its effectiveness, practicality and performance are demonstrated.
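
    The prediction step of such a resource manager can be sketched with a simple analytic stand-in for the online simulation (an M/M/c-style approximation; all figures invented):

        def predicted_response(arrival_rate, service_rate, servers):
            rho = arrival_rate / (servers * service_rate)
            if rho >= 1.0:
                return float("inf")               # unstable: SLA surely broken
            return (1.0 / service_rate) / (1.0 - rho)   # crude approximation

        def servers_for_sla(arrival_rate, service_rate, sla_secs,
                            max_servers=32):
            # Smallest server count whose predicted response honors the SLA.
            for c in range(1, max_servers + 1):
                if predicted_response(arrival_rate, service_rate, c) <= sla_secs:
                    return c
            return max_servers

        print(servers_for_sla(arrival_rate=80.0, service_rate=10.0,
                              sla_secs=0.35))   # 12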

  • Self-adaptive utility-based web session management

     Poggi, Nicolas; Moreno, Toni; Berral Garcia, Josep Lluis; Gavaldà Mestre, Ricard; Torres Viñals, Jordi
    Computer networks
    Date of publication: 2009-07
    Journal article

    In the Internet, where millions of users are a click away from your site, being able to dynamically classify the workload in real time, and predict its short-term behavior, is crucial for proper self-management and business efficiency. As workloads vary significantly according to current time of day, season, promotions and linking, it becomes impractical for some ecommerce sites to keep over-dimensioned infrastructures to accommodate the whole load. When server resources are exceeded, session-based admission control systems allow maintaining a high throughput in terms of properly finished sessions and QoS for a limited number of sessions; however, by denying access to excess users, the website loses potential customers. In the present study we describe the architecture of AUGURES, a system that learns to predict Web users' intentions when visiting the site, as well as their resource usage. Predictions are made from information known at the time of their first request and later from navigational clicks. For this purpose we use machine learning techniques and Markov-chain models. The system uses these predictions to automatically shape QoS for the most profitable sessions, predict short-term resource needs, and dynamically provision servers according to the expected revenue and the cost to serve it. We test the AUGURES prototype on access logs from a high-traffic, online travel agency, obtaining promising results.
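
    The Markov-chain component can be sketched as a maximum-likelihood transition model over page types (training sessions invented):

        from collections import Counter, defaultdict

        sessions = [["home", "search", "buy"],
                    ["home", "search", "exit"],
                    ["home", "exit"]]

        counts = defaultdict(Counter)
        for s in sessions:
            for a, b in zip(s, s[1:]):   # count observed transitions
                counts[a][b] += 1

        def p(a, b):
            # Maximum-likelihood probability of moving from page a to page b.
            total = sum(counts[a].values())
            return counts[a][b] / total if total else 0.0

        print(p("search", "buy"))   # 0.5: half the searches led to a purchase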

  • Using virtualization to improve software rejuvenation

     Moura Silva, Luis; Alonso López, Javier; Torres Viñals, Jordi
    IEEE transactions on computers
    Date of publication: 2009-10-29
    Journal article

    In this paper, we present an approach for software rejuvenation based on automated self-healing techniques that can be easily applied to off-the-shelf application servers. Software aging and transient failures are detected through continuous monitoring of system data and performability metrics of the application server. If some anomalous behavior is identified, the system triggers an automatic rejuvenation action. This self-healing scheme is meant to disrupt the running service for a minimal amount of time, achieving zero downtime in most cases. In our scheme, we exploit the usage of virtualization to optimize the self-recovery actions. The techniques described in this paper have been tested with a set of open-source Linux tools and the XEN virtualization middleware. We conducted an experimental study with two application benchmarks (Tomcat/Axis and TPC-W). Our results demonstrate that virtualization can be extremely helpful for failover and software rejuvenation in the occurrence of transient failures and software aging.
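
    The trigger logic can be sketched as a sustained-degradation test on a performability metric (threshold and samples invented; the detection described in the paper is more sophisticated):

        def should_rejuvenate(history, threshold=0.25, window=5):
            """history: recent throughput samples; fire on a sustained drop."""
            if len(history) < 2 * window:
                return False
            baseline = sum(history[:window]) / window
            recent = sum(history[-window:]) / window
            return recent < (1.0 - threshold) * baseline

        samples = [100, 101, 99, 100, 100, 85, 78, 72, 70, 65]
        print(should_rejuvenate(samples))   # True: ~25% sustained degradation

    On a positive test, the rejuvenation action would start a fresh VM, switch traffic to it, and retire the degraded one, which is what keeps service disruption near zero.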