Scientific and technological production

1 to 50 of 88 results
  • A systematic methodology to generate decomposable and responsive power models for CMPs

     Bertran Monfort, Ramon; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Ayguade Parra, Eduard
    IEEE transactions on computers
    Date of publication: 2013-07
    Journal article

    Power modeling based on performance monitoring counters (PMCs) has attracted the interest of researchers because it offers a quick way to understand the power behavior of real systems. Consequently, several power-aware policies use models to guide their decisions. Hence, power models that are informative, accurate, and capable of detecting power phases are critical to the success of power-saving techniques. Additionally, the design of current processors has changed considerably with the appearance of CMPs (multiple cores sharing resources). Thus, PMC-based power models warrant further investigation on current energy-efficient multicore processors. In this paper, we present a systematic methodology to produce decomposable PMC-based power models on current multicore architectures. Apart from estimating power consumption accurately, the models provide per-component power consumption, supplying extra insight into power behavior. Moreover, we study their responsiveness, that is, their capacity to detect power phases. Specifically, we produce power models for an Intel Core 2 Duo with one and two cores enabled, for all DVFS configurations. The models are empirically validated using the SPECcpu2006, NAS, and LMBENCH benchmarks. Finally, we compare the models against existing approaches and conclude that the proposed methodology produces more accurate, responsive, and informative models.
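
    The abstract describes decomposable models, in which the estimated total power is split into per-component contributions. A minimal sketch of that idea follows, assuming a linear model where each component's power is a fixed weight scaled by an activity ratio computed from PMC event counts, plus a constant static term. The component names, weights, and counter values are hypothetical placeholders rather than the paper's coefficients, and the paper's actual model formulation may differ.

```python
from dataclasses import dataclass


@dataclass
class ComponentModel:
    """One component of a hypothetical decomposable power model."""
    name: str
    weight_watts: float  # assumed power of the component at full activity


def activity_ratio(events: int, cycles: int) -> float:
    """Activity ratio of a component: PMC events per cycle, clamped to [0, 1]."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return min(events / cycles, 1.0)


def estimate_power(components, counters, cycles, static_watts):
    """Return (total_watts, per-component breakdown) for one sampling interval.

    counters maps a component name to the PMC event count seen in the interval.
    """
    breakdown = {"static": static_watts}
    for comp in components:
        ratio = activity_ratio(counters.get(comp.name, 0), cycles)
        breakdown[comp.name] = comp.weight_watts * ratio
    return sum(breakdown.values()), breakdown


# Hypothetical components and weights, for illustration only.
model = [
    ComponentModel("int_alu", 6.0),
    ComponentModel("fp_unit", 8.0),
    ComponentModel("l2_cache", 4.0),
]
total, parts = estimate_power(
    model,
    counters={"int_alu": 500, "fp_unit": 250, "l2_cache": 100},
    cycles=1000,
    static_watts=20.0,
)
print(f"total = {total:.2f} W")  # total = 25.40 W
for name, watts in parts.items():
    print(f"  {name}: {watts:.2f} W")
```

    Because each component keeps its own term, the breakdown is available alongside the total. In a real model the weights would have to be fitted against measured power (for example, by regression over training workloads); that fitting step is not shown and is an assumption here.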

  • Energy accounting for shared virtualized environments under DVFS using PMC-based power models

     Bertran Monfort, Ramon; Becerra Fontal, Yolanda; Carrera Perez, David; Beltran Querol, Vicenç; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Torres Viñals, Jordi; Ayguade Parra, Eduard
    Future generation computer systems
    Date of publication: 2012-02
    Journal article

    Virtualized infrastructure providers demand new methods to increase the accuracy of the accounting models used to charge their customers. Future data centers will be composed of many-core systems that will each host a large number of virtual machines (VMs). While resource utilization accounting can be achieved with existing system tools, energy accounting is a complex task when per-VM granularity is the goal. In this paper, we propose a methodology that brings new opportunities to energy accounting by adding an unprecedented degree of accuracy to per-VM measurements. We present a system, which leverages CPU and memory power models based on performance monitoring counters (PMCs), to perform energy accounting in virtualized systems. The contribution of this paper is threefold. First, we show that PMC-based power modeling methods are still valid in virtualized environments. Second, we show that the Dynamic Voltage and Frequency Scaling (DVFS) mechanism, which is commonly used by infrastructure providers to avoid power and thermal emergencies, does not affect the accuracy of the models. Third, we introduce a novel methodology for accounting of energy consumption in virtualized systems. Accounting is done on a per-VM basis, even when multiple VMs are deployed on top of the same physical hardware, bypassing the limitations of per-server aggregated power metering. Overall, the results for an Intel® Core™ 2 Duo show errors in energy estimations below 5%. Such an approach brings flexibility to the chargeback models used by service and infrastructure providers. For instance, we are able to detect cases where VMs that execute for the same amount of time differ by more than 20% in energy consumption, even when only the consumption of the CPU and the memory is taken into account.
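
    Per-VM energy accounting can be illustrated as integrating estimated power over the time slices each VM occupies. The sketch below assumes a hypothetical trace format in which every scheduling slice records the VM that ran, the slice duration, and a power estimate for that slice; the estimate is assumed to come from a PMC-based CPU and memory model that is not shown here. It is not the system presented in the paper.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Slice(NamedTuple):
    """One scheduling slice on one core (hypothetical trace format)."""
    vm_id: str
    duration_s: float   # length of the slice in seconds
    est_power_w: float  # power estimated from PMCs for this core during the slice


def energy_per_vm(slices: Iterable[Slice]) -> dict:
    """Attribute energy (joules) to each VM by summing power * duration over its slices."""
    energy = defaultdict(float)
    for s in slices:
        if s.duration_s < 0:
            raise ValueError("negative slice duration")
        energy[s.vm_id] += s.est_power_w * s.duration_s
    return dict(energy)


# Made-up trace: two VMs sharing one host, each running for 3 s in total.
trace = [
    Slice("vm-a", 1.0, 10.0),
    Slice("vm-b", 1.0, 14.0),
    Slice("vm-a", 2.0, 10.0),
    Slice("vm-b", 2.0, 12.5),
]
print(energy_per_vm(trace))  # {'vm-a': 30.0, 'vm-b': 39.0}
```

    In this made-up trace both VMs run for the same 3 s, yet their attributed energy differs (30 J vs. 39 J), which is the kind of difference that purely time-based chargeback cannot see.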

  • DMA++: on the fly data realignment for on-chip memories

     Vujic, Nikola; Cabarcas Jaramillo, Felipe; Gonzalez Tallada, Marc; Ramirez Bellido, Alejandro; Martorell Bofill, Xavier; Ayguade Parra, Eduard
    IEEE transactions on computers
    Date of publication: 2012-02
    Journal article

  • POTRA: a framework for building power models for next generation multicore architectures

     Bertran Monfort, Ramon; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Ayguade Parra, Eduard
    ACM SIGMETRICS performance evaluation review
    Date of publication: 2012-06
    Journal article

  • Software Caching Techniques and Hardware Optimizations for On-Chip Local Memories  Open access

     Vujic, Nikola
    Defense's date: 2012-06-05
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    Although caches remain the most viable L1 memories in processors, on-chip local memories have lately received considerable attention. Local memories are an interesting design option due to their many benefits: smaller area, reduced energy consumption, and fast, constant access time. These benefits are especially interesting for the design of modern multicore processors, since power and latency are important concerns in computer architecture today. Also, local memories do not generate coherence traffic, which is important for the scalability of multicore systems. Unfortunately, local memories have not yet been widely adopted in modern processors, mainly due to their poor programmability. Systems with on-chip local memories lack hardware support for transparent data transfers between local and global memories, so ease of programming is one of the main impediments to the broad acceptance of those systems. This thesis addresses software and hardware optimizations regarding the programmability and the usage of on-chip local memories in the context of both single-core and multicore systems. The software optimizations relate to software caching techniques. A software cache is a robust approach to provide the user with a transparent view of the memory architecture, but this software approach can suffer from poor performance. In this thesis, we start by optimizing a traditional software cache, proposing a hierarchical, hybrid software-cache architecture. Afterwards, we develop several optimizations to speed up our hybrid software cache as much as possible. As a result of the software optimizations, our hybrid software cache performs 4 to 10 times faster than a traditional software cache on a set of NAS parallel benchmarks. We do not stop at software caching.
We cover other aspects of architectures with on-chip local memories, such as the quality of the generated code and its correspondence with the quality of buffer management in local memories, in order to improve the performance of these architectures. We pursue software optimizations until we reach their limit, and then propose optimizations at the hardware level. Two hardware proposals are presented in this thesis. One relaxes the alignment constraints imposed by architectures with on-chip local memories, and the other accelerates the management of local memories by providing hardware support for the majority of actions performed in our software cache.

    Although caches are still the basic component in the design of the memory subsystem, local memories have become an alternative because of their characteristics in terms of area occupancy, energy consumption, and performance, with fast and constant access time. These characteristics are of special interest now that upcoming multicore architectures are limited by power consumption and by the latency of the memory subsystem. Local memories suffer from limitations related to the complexity of programming them, which hinders their introduction into multicore architectures despite the advantages mentioned above. This thesis presents a series of software and hardware solutions specifically designed to address these limitations. The software optimizations are based on caching techniques supported by specific libraries. A software cache is a robust method for providing the user with a transparent view of the architecture, but this approach can suffer from poor performance. In this thesis, a hierarchical and hybrid structure is proposed. We then develop optimizations to accelerate the execution of the software that supports the cache design. As a result of these optimizations, our hybrid design performs 4 to 10 times faster than a traditional software cache implementation on a set of benchmark applications such as the NAS Parallel Benchmarks. The thesis also covers other aspects of architectures with local memories, such as the quality of the generated code and its correspondence with the quality of buffer management in the local memories, in order to improve the performance of these architectures.
The thesis develops proposals based strictly on the design of new hardware to improve the performance of local memories once no further software optimizations are possible. In particular, the thesis presents two hardware proposals: one relaxes the data-alignment restrictions imposed by local memories, and the other introduces specific hardware to accelerate the most common operations on local memories.

  • Counter-based power modeling methods: top-down vs. bottom-up

     Bertran Monfort, Ramon; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Ayguade Parra, Eduard
    The computer journal (Kalispell, Mont.)
    Date of publication: 2012-08-24
    Journal article

  • Systematic energy characterization of CMP/SMT processor systems via automated micro-benchmarks  Open access

     Bertran Monfort, Ramon; Buyuktosunoglu, Alper; Gupta, Meeta S.; Gonzalez Tallada, Marc; Bose, Pradip
    IEEE/ACM International Symposium on Microarchitecture
    Presentation's date: 2012-12-01
    Presentation of work at congresses

    Microprocessor-based systems today are composed of multi-core, multi-threaded processors with complex cache hierarchies and gigabytes of main memory. Accurate characterization of such a system, through predictive pre-silicon modeling and/or diagnostic post-silicon measurement-based analysis, is increasingly cumbersome and error-prone. This is especially true of energy-related characterization studies. In this paper, we take the position that automated micro-benchmarks generated with particular objectives in mind hold the key to obtaining accurate energy-related characterization. As such, we first present a flexible micro-benchmark generation framework (MicroProbe) that is used to probe complex multi-core/multi-threaded systems with a variety and range of energy-related queries in mind. We then present experimental results centered around an IBM POWER7 CMP/SMT system to demonstrate how the systematically generated micro-benchmarks can be used to answer three specific queries: (a) How can application-specific (and, if needed, phase-specific) power consumption be projected with component-wise breakdowns? (b) How can energy-per-instruction (EPI) values be measured for the target machine? (c) How can the worst-case (maximum) power consumption be bounded in order to determine safe but practical (i.e., affordable) packaging or cooling solutions? The solution approaches to the above problems are all new. Hardware-measurement-based analysis shows superior power projection accuracy (with error margins of less than 2.3% across SPEC CPU2006) as well as max-power stressing capability (with a 10.7% increase in processor power over the very worst-case power seen during the execution of SPEC CPU2006 applications).
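
    One common way to turn a micro-benchmark measurement into an energy-per-instruction (EPI) figure is to attribute the power above an idle baseline to the instructions executed per second. The sketch below assumes that formulation and uses invented measurement values; MicroProbe's actual EPI methodology may differ (for example, in how the baseline is defined).

```python
def energy_per_instruction(run_power_w, idle_power_w, instructions, seconds):
    """Return EPI in nanojoules, attributing only the power above the idle baseline."""
    if instructions <= 0 or seconds <= 0:
        raise ValueError("instructions and seconds must be positive")
    dynamic_power_w = run_power_w - idle_power_w      # watts = joules per second
    instructions_per_second = instructions / seconds
    return dynamic_power_w / instructions_per_second * 1e9  # J -> nJ


# Hypothetical measurement of a micro-benchmark loop repeating one instruction type.
epi_nj = energy_per_instruction(run_power_w=130.0, idle_power_w=100.0,
                                instructions=6e10, seconds=10.0)
print(f"EPI = {epi_nj:.2f} nJ/instruction")  # EPI = 5.00 nJ/instruction
```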

    Postprint (author’s final draft)

  • POTRA: a framework for building power models for next generation multicore architectures

     Bertran Monfort, Ramon; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Ayguade Parra, Eduard
    ACM SIGMETRICS/PERFORMANCE joint International Conference on Measurement and Modeling of Computer Systems
    Presentation's date: 2012
    Presentation of work at congresses

  • DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories

     Vujic, Nikola; Alvarez, Lluc; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard
    ACM International Conference on Computing Frontiers
    Presentation's date: 2012-05-15
    Presentation of work at congresses

  • Hardware-software coherence protocol for the coexistence of caches and local memories

     Alvarez, Lluc; Vilanova, Lluis; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Ayguade Parra, Eduard
    International Conference for High Performance Computing, Networking, Storage and Analysis
    Presentation's date: 2012-11-07
    Presentation of work at congresses

  • Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL

     Ferrer, Roger; Planas Carbonell, Judit; Bellens, Pieter; Duran Gonzalez, Alejandro; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Badia Sala, Rosa Maria; Ayguade Parra, Eduard; Labarta Mancho, Jesus Jose
    Lecture notes in computer science
    Date of publication: 2011
    Journal article

    In this paper, we present OMPSs, a programming model based on OpenMP and StarSs that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E., and GPUs, showing the wide applicability of the approach. The evaluation is done with four different benchmarks: Matrix Multiply, BlackScholes, Perlin Noise, and Julia Set. We compare the results with those obtained by executing the same benchmarks written in OpenCL on the same architectures. The results show that OMPSs greatly outperforms the OpenCL environment. It is also more flexible for exploiting multiple accelerators, and, due to the simplicity of its annotations, it increases programmer productivity.

  • Design space exploration for aggressive core replication schemes in CMPs

     Álvarez Martí, Lluc; Bertran Monfort, Ramon; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Ayguade Parra, Eduard
    International Symposium on High Performance Distributed Computing
    Presentation's date: 2011-06-08
    Presentation of work at congresses

  • Extending OpenMP to survive the heterogeneous multi-core era

     Ayguade Parra, Eduard; Badia Sala, Rosa Maria; Bellens, Pieter; Cabrera, Daniel; Duran González, Alejandro; Ferrer, Roger; Gonzalez Tallada, Marc; Igual, Francisco D.; Jimenez Gonzalez, Daniel; Labarta Mancho, Jesus Jose; Martinell, Lluis; Martorell Bofill, Xavier; Mayo, Rafael; Pérez Cáncer, Josep Maria; Planas, Judit; Quintana Ortí, Enrique Salvador
    International journal of parallel programming
    Date of publication: 2010-10
    Journal article

    This paper advances the state of the art in programming models for exploiting task-level parallelism on heterogeneous many-core systems, presenting a number of extensions to the OpenMP language inspired by the StarSs programming model. The proposed extensions allow the programmer to easily write portable code for a number of different platforms, relieving him/her from developing the specific code to offload tasks to the accelerators and to synchronize tasks. Our results, obtained from the StarSs instantiations for SMPs, the Cell, and GPUs, report reasonable parallel performance. However, the real impact of our approach is the productivity gain it yields for the programmer.
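
    In StarSs-style models the programmer annotates each task with the data it reads and writes, and the runtime derives the ordering between tasks instead of the programmer writing explicit synchronization. The Python sketch below assumes a simplified model in which each task names whole data objects; it computes the read-after-write, write-after-read, and write-after-write edges that a runtime would have to respect. The `Task` class and `build_dependencies` function are hypothetical; this is neither the OpenMP/StarSs syntax nor the runtime's actual algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    ins: set = field(default_factory=set)   # data the task reads
    outs: set = field(default_factory=set)  # data the task writes (inout = in both)


def build_dependencies(tasks):
    """Return the set of (predecessor, successor) edges for tasks given in program order."""
    last_writer = {}  # datum -> name of the last task that wrote it
    readers = {}      # datum -> names of tasks that read it since the last write
    edges = set()
    for t in tasks:
        for d in t.ins:
            if d in last_writer:                # read-after-write (true dependence)
                edges.add((last_writer[d], t.name))
        for d in t.outs:
            if d in last_writer:                # write-after-write
                edges.add((last_writer[d], t.name))
            for r in readers.get(d, set()):     # write-after-read
                if r != t.name:
                    edges.add((r, t.name))
        for d in t.ins:
            readers.setdefault(d, set()).add(t.name)
        for d in t.outs:                        # this task becomes the new producer
            last_writer[d] = t.name
            readers[d] = set()
    return edges


tasks = [
    Task("init_a", outs={"A"}),
    Task("init_b", outs={"B"}),
    Task("matmul", ins={"A", "B"}, outs={"C"}),
    Task("scale_a", ins={"A"}, outs={"A"}),  # inout on A
    Task("print_c", ins={"C"}),
]
for edge in sorted(build_dependencies(tasks)):
    print(edge)
```

    Tasks with no path between them in this graph (here `init_a` and `init_b`, or `scale_a` and `print_c`) could run concurrently. The write-after-read edge from `matmul` to `scale_a` is what stops `scale_a` from overwriting `A` before `matmul` has read it; a runtime might instead remove such edges by renaming data, which this sketch does not do.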

  • Local memory design space exploration for high-performance computing

     Bertran Monfort, Ramon; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Ayguade Parra, Eduard
    The Computer journal (paper)
    Date of publication: 2010-03-23
    Journal article

  • Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture

     Vujic, Nikola; Gonzalez Tallada, Marc; Martorell, Xavier; Ayguade Parra, Eduard
    IEEE transactions on parallel and distributed systems
    Date of publication: 2010-04
    Journal article

  • Parallel programming models for heterogeneous multicore architectures

     Ferrer, Roger; Bellens, Pieter; Beltran Querol, Vicenç; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Badia Sala, Rosa Maria; Yeom, Jae-Seung; Schneider, Scott; Koukos, Konstantinos; Alvanos, Michail; Nikolopoulos, Dimitrios S.; Bilas, Angelos
    IEEE micro
    Date of publication: 2010-09-01
    Journal article

  • HiPEAC Paper Award

     Vujic, Nikola; Gonzalez Tallada, Marc; Ramirez Bellido, Alejandro; Cabarcas Jaramillo, Felipe; Martorell Bofill, Xavier; Ayguade Parra, Eduard
    Award or recognition

  • Accurate energy accounting for shared virtualized environments using PMC-based power modeling techniques

     Bertran Monfort, Ramon; Becerra Fontal, Yolanda; Carrera Perez, David; Beltran Querol, Vicenç; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Torres Viñals, Jordi; Ayguade Parra, Eduard
    ACM/IEEE International Conference on Grid Computing
    Presentation's date: 2010-10-27
    Presentation of work at congresses

    Virtualized infrastructure providers demand new methods to increase the accuracy of the accounting models used to charge their customers. Future data centers will be composed of many-core systems that will each host a large number of virtual machines (VMs). While resource utilization accounting can be achieved with existing system tools, energy accounting is a complex task when per-VM granularity is the goal. In this paper, we propose a methodology that brings new opportunities to energy accounting by adding an unprecedented degree of accuracy to per-VM measurements. We present a system, which leverages CPU and memory power models based on performance monitoring counters (PMCs), to perform energy accounting in virtualized systems. The contribution of this paper is twofold. First, we show that PMC-based power modeling methods are still valid in virtualized environments. Second, we introduce a novel methodology for accounting of energy consumption in virtualized systems. Overall, the results for an Intel® Core™ 2 Duo show errors in energy estimations below 5%. Such an approach brings flexibility to the chargeback models used by service and infrastructure providers. For instance, we show VMs that execute for the same amount of time yet differ by more than 20% in energy consumption, even when only the consumption of the CPU and the memory is taken into account.

  • Decomposable and responsive power models for multicore processors using performance counters

     Bertran Monfort, Ramon; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Navarro Mas, Nacho; Ayguade Parra, Eduard
    International Conference for High Performance Computing, Networking, Storage and Analysis
    Presentation's date: 2010-06-04
    Presentation of work at congresses

  • DMA++: on the fly data realignment for on-chip memories

     Vujic, Nikola; Gonzalez Tallada, Marc; Cabarcas Jaramillo, Felipe; Ramirez Bellido, Alejandro; Martorell Bofill, Xavier; Ayguade Parra, Eduard
    International Symposium on High-Performance Computer Architecture (HPCA)
    Presentation's date: 2010
    Presentation of work at congresses

  • MPEXPAR: MODELS DE PROGRAMACIO I ENTORNS D'EXECUCIO PARAL·LELS (Parallel Programming Models and Execution Environments)

     Cortes Rossello, Antonio; Gil Gómez, Maria Luisa; Navarro Mas, Nacho; Corbalan Gonzalez, Julita; Costa Prats, Juan Jose; Farreras Esclusa, Montserrat; Herrero Zaragoza, José Ramón; Tejedor Saavedra, Enric; Gonzalez Tallada, Marc; Becerra Fontal, Yolanda; Nou Castell, Ramon; Sirvent Pardell, Raül; Guitart Fernández, Jordi; Carrera Perez, David; Alonso López, Javier; Labarta Mancho, Jesus Jose; Martorell Bofill, Xavier; Torres Viñals, Jordi; Badia Sala, Rosa Maria; Ayguade Parra, Eduard
    Participation in a competitive project

  • Achieving high memory performance from heterogeneous architectures with the SARC programming model

     Ferrer, Roger; Beltran Querol, Vicenç; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard
    Workshop on Memory Performance: dealing with Applications, Systems and Architecture
    Presentation's date: 2009-09
    Presentation of work at congresses

  • Laboratorio de Introducción a los Computadores: funcionamiento y dificultades docentes (Introduction to Computers Laboratory: Operation and Teaching Difficulties)

     Navarro Guerrero, Juan Jose; Cruz Diaz, Josep-llorenç; Faúndez Zanuy, Marcos; Gonzalez Tallada, Marc; Manso Cortes, Oscar; Muntés Mulero, Víctor; Palomar Perez, Oscar; Rodero Castro, Ivan; Sanchez Castaño, Friman; Solé Simó, Marc
    Jornades de Docència del Departament d'Arquitectura de Computadors (Teaching Workshop of the Department of Computer Architecture)
    Presentation's date: 2009-02
    Presentation of work at congresses

  • A proposal to extend the OpenMP tasking model for heterogeneous architectures

     Ayguade Parra, Eduard; Badia Sala, Rosa Maria; Cabrera, Daniel; Duran Gonzalez, Alejandro; Igual, Francisco D.; Jimenez Gonzalez, Daniel; Labarta Mancho, Jesus Jose; Mayo, Rafael; Pérez, Josep M.; Quintana Ortí, Enrique Salvador; Martorell Bofill, Xavier; Gonzalez Tallada, Marc
    International Workshop on OpenMP
    Presentation's date: 2009-06-03
    Presentation of work at congresses

  • Adaptive and speculative memory consistency support for multi-core architectures with on-chip local memories

     Vujic, Nikola; Álvarez, Lluc; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard
    International Workshop on Languages and Compilers for Parallel Computing
    Presentation's date: 2009-10
    Presentation of work at congresses

  • Speeding up distributed MapReduce applications using hardware accelerators  Open access

     Becerra Fontal, Yolanda; Beltran Querol, Vicenç; Carrera Perez, David; Gonzalez Tallada, Marc; Torres Viñals, Jordi; Ayguade Parra, Eduard
    International Conference on Parallel Processing
    Presentation's date: 2009-09-22
    Presentation of work at congresses

    In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogeneous at multiple levels: from asymmetric processors to different system architectures, operating systems, and networks. Exploiting the intrinsic multi-level parallelism present in such a complex execution environment has become a challenging task using traditional parallel and distributed programming models. As a result, an increasing need for novel approaches to exploiting parallelism has arisen in these environments. MapReduce is a data-driven programming model originally proposed by Google in 2004 as a flexible alternative to the existing models, especially devoted to hiding the complexity of both developing and running massively distributed applications in large compute clusters. In some recent works, the MapReduce model has also been used to exploit parallelism in other non-distributed environments, such as multi-cores, heterogeneous processors, and GPUs. In this paper, we introduce a novel approach for exploiting the heterogeneity of a Cell BE cluster by linking an existing MapReduce runtime implementation for distributed clusters with a runtime that exploits the parallelism of the Cell BE nodes. The novel contribution of this work is the design and evaluation of a MapReduce execution environment that effectively exploits the parallelism existing at both the Cell BE cluster level and the heterogeneous processor level.
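
    The MapReduce model splits a job into a map phase that emits key/value pairs and a reduce phase that combines all values sharing a key. The single-process Python sketch below shows only the programming model, using word counting as the usual example; it does not model the distributed runtime or the Cell BE offloading described in the paper, and the function names are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to every record, group values by key (the "shuffle"), then reduce each group."""
    groups = defaultdict(list)
    for record in inputs:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_words_map(line):
    """Map phase: emit (word, 1) for every word in a line of text."""
    return [(word, 1) for word in line.lower().split()]


def count_words_reduce(word, counts):
    """Reduce phase: combine all partial counts emitted for one word."""
    return sum(counts)


lines = ["the Cell processor", "the cluster and the Cell"]
result = run_mapreduce(lines, count_words_map, count_words_reduce)
print(result)  # {'the': 3, 'cell': 2, 'processor': 1, 'cluster': 1, 'and': 1}
```

    In a distributed runtime, the individual map calls and per-key reduce calls are the units of work that can be spread across nodes and, in a heterogeneous cluster, offloaded to accelerator cores; the grouping step is where data is exchanged between them.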

  • A novel asynchronous Software Cache implementation for the Cell-BE processor

     Balart, J; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Sura, Z; Chen, T; Zhang, T; O'Brien, K
    Lecture notes in computer science
    Date of publication: 2008-10
    Journal article

  • Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture

     Vujic, N; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard
    Lecture notes in computer science
    Date of publication: 2008-01
    Journal article

  • Prefetching Irregular References for Software Cache on Cell

     Chen, Tong; Zhang, Tao; Sura, Zehra; Gonzalez Tallada, Marc; O'Brien, Kathryn; O'Brien, Kevin
    International Symposium on Code Generation and Optimization
    Presentation of work at congresses

  • Hybrid access-specific software cache techniques for the cell BE architecture

     Gonzalez Tallada, Marc; Vujic, Nikola; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Eichenberger, Alexandre E.; Chen, Tong; Sura, Zehra; Zhang, Tao; O'Brien, Kevin; O'Brien, Kathryn
    International Conference on Parallel Architectures and Compilation Techniques
    Presentation's date: 2008-10
    Presentation of work at congresses

    Ease of programming is one of the main impediments to the broad acceptance of multi-core systems with no hardware support for transparent data transfer between local and global memories. A software cache is a robust approach to provide the user with a transparent view of the memory architecture, but this software approach can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture that classifies memory accesses at compile time into two classes, high-locality and irregular. Our approach then steers the memory references toward one of two specific cache structures optimized for their respective access patterns. The specific cache structures are optimized to enable high-level compiler optimizations to aggressively unroll loops, reorder cache references, and/or transform surrounding loops so as to practically eliminate the software cache overhead in the innermost loop. Performance evaluation indicates that the improvements due to the optimized software-cache structures, combined with the proposed code optimizations, translate into speedup factors of 3.5 to 8.4 compared to a traditional software cache approach. As a result, we demonstrate that the Cell BE processor can be a competitive alternative to a modern server-class multi-core such as the IBM Power5 processor for a set of parallel NAS applications.
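
    A minimal sketch of a traditional software cache follows, assumed here to be direct-mapped: every access to global memory first checks a tag held in local memory, and a miss writes back the evicted line if it is dirty and then copies ("DMAs") the new block in. A Python list stands in for global memory and list slicing stands in for DMA transfers; the class name, line size, and number of lines are hypothetical. It does not implement the hybrid high-locality/irregular structures proposed in the paper.

```python
class DirectMappedSoftwareCache:
    """Toy software cache: local-memory lines that mirror blocks of global memory."""

    def __init__(self, global_mem, line_words=4, num_lines=8):
        if len(global_mem) % line_words:
            raise ValueError("global memory size must be a multiple of the line size")
        self.global_mem = global_mem           # stands in for main (global) memory
        self.line_words = line_words
        self.num_lines = num_lines
        self.tags = [None] * num_lines         # which block each local line holds
        self.lines = [[0] * line_words for _ in range(num_lines)]
        self.dirty = [False] * num_lines
        self.hits = 0
        self.misses = 0

    def _evict(self, index):
        """Write a dirty line back to global memory (the outgoing "DMA")."""
        if self.dirty[index] and self.tags[index] is not None:
            start = self.tags[index] * self.line_words
            self.global_mem[start:start + self.line_words] = self.lines[index]
        self.dirty[index] = False

    def _lookup(self, addr):
        """Tag check performed on every access; on a miss, fetch the block."""
        if not 0 <= addr < len(self.global_mem):
            raise IndexError("address out of range")
        block, offset = divmod(addr, self.line_words)
        index = block % self.num_lines
        if self.tags[index] != block:
            self.misses += 1
            self._evict(index)
            start = block * self.line_words
            self.lines[index] = list(self.global_mem[start:start + self.line_words])
            self.tags[index] = block
        else:
            self.hits += 1
        return index, offset

    def read(self, addr):
        index, offset = self._lookup(addr)
        return self.lines[index][offset]

    def write(self, addr, value):
        index, offset = self._lookup(addr)
        self.lines[index][offset] = value
        self.dirty[index] = True

    def flush(self):
        for i in range(self.num_lines):
            self._evict(i)


memory = list(range(64))
cache = DirectMappedSoftwareCache(memory)
total = sum(cache.read(i) for i in range(32))  # sequential: one miss per 4-word line
cache.write(5, 100)                            # hits the line already holding block 1
cache.flush()
print(total, cache.hits, cache.misses, memory[5])  # 496 25 8 100
```

    Every `read` pays for the tag check even on a hit; that per-access overhead inside inner loops is what the paper's compiler optimizations aim to practically eliminate for high-locality references.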

  • Evaluation of memory performance on the cell BE with the SARC programming model

     Ferrer, Roger; Gonzalez Tallada, Marc; Silla, Federico; Martorell Bofill, Xavier; Ayguade Parra, Eduard
    Workshop on Memory Performance: dealing with Applications, Systems and Architecture
    Presentation's date: 2008-10
    Presentation of work at congresses

    With the advent of multicore architectures, especially heterogeneous ones, top computational and memory performance are both difficult to obtain using traditional programming models. Programmers usually have to fully reorganize the code and data of their applications to maximize resource usage, and must work with the low-level interfaces offered by the vendor-provided SDKs to obtain high computational and memory performance. In this paper, we evaluate the SARC programming model on the Cell BE architecture with respect to memory performance. We show how we annotated the HPL STREAM and RandomAccess applications and report the memory bandwidth obtained. Results indicate that the programming model provides good productivity and competitive performance on this kind of architecture.

  • Hybrid Access-Specific Software Cache Techniques for the Cell BE Architecture

     Gonzalez Tallada, Marc
    Parallel Architectures and Compilation Techniques
    Presentation's date: 2008-10-29
    Presentation of work at congresses

  • Hybrid Access-Specific Software Cache Techniques for the Cell BE Architecture

     Gonzalez Tallada, Marc
    International Conference on Parallel Architectures and Compilation Techniques
    Presentation's date: 2008-10-25
    Presentation of work at congresses

  • Prefetching Irregular References for Software Cache on Cell

     Gonzalez Tallada, Marc
    International Symposium on Code Generation and Optimization
    Presentation's date: 2008-04-06
    Presentation of work at congresses

  • DATA TRANSFER OPTIMIZED SOFTWARE CACHE FOR REGULAR MEMORY REFERENCES

     Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Chen, Tong; Eichenberger, Alex; Sura, Zehra; O'Brien, Kathryn; O'Brien, Kevin; Zhang, Tao
    Date of request: 2008-03-28
    Invention patent

  • DATA TRANSFER OPTIMIZED SOFTWARE CACHE FOR IRREGULAR MEMORY REFERENCES

     Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Eichenberger, Alex; Chen, Tong; Sura, Zehra; Zhang, Tao; O'Brien, Kathryn; O'Brien, Kevin
    Date of request: 2008-03-28
    Invention patent

  • PREFETCHING IRREGULAR DATA REFERENCES FOR SOFTWARE CONTROLLED CACHE

     Gonzalez Tallada, Marc; Chen, Tong; Zhang, Tao; Sura, Zehra
    Date of request: 2008-04-02
    Invention patent

  • REDUCING CACHE POLLUTION OF A SOFTWARE CONTROLLED CACHE

     Gonzalez Tallada, Marc
    Date of request: 2008-04-02
    Invention patent

  • EFFICIENT SOFTWARE CACHE ACCESSING WITH HANDLING REUSE

     Gonzalez Tallada, Marc
    Date of request: 2008-04-02
    Invention patent

  • OPTIMIZED CODE GENERATION TARGETING A HIGH LOCALITY SOFTWARE CACHE

     Gonzalez Tallada, Marc; Chen, Tong; Eichenberger, Alex; Sura, Zehra; O'Brien, Kathryn; O'Brien, Kevin; Zhang, Tao
    Date of request: 2008-10-02
    Invention patent

  • DYNAMICALLY CONTROLLING A PREFETCHING RANGE OF A SOFTWARE CONTROLLED CACHE

     Gonzalez Tallada, Marc; Chen, Tong; Zhang, Tao; Sura, Zehra
    Date of request: 2008-04-02
    Invention patent

  • A proposal for error handling in OpenMP

     Duran González, Alejandro; Ferrer, Roger; Costa Prats, Juan Jose; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Labarta Mancho, Jesus Jose
    International journal of parallel programming
    Date of publication: 2007-08
    Journal article

  • Improving Data Locality in NAS BT Benchmark

     Vaquero, Jordi; Gonzalez Tallada, Marc; Costa Prats, Juan Jose; Bueno, Javier; Martorell Bofill, Xavier; Cortes Rossello, Antonio; Ayguade Parra, Eduard
    Third International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2007)
    Presentation's date: 2007-07
    Presentation of work at congresses

  • A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor

     Balart, Jairo; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Sura, Zehra; Chen, Tong; Zhang, Tao; O'Brien, Kevin; O'Brien, Kathryn
    Workshop on Languages and Compilers for Parallel Computing
    Presentation's date: 2007-10
    Presentation of work at congresses

  • Mercurium C/C++ source-to-source compiler

     Ferrer, Roger; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard
    Third International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2007)
    Presentation of work at congresses

  • A Proposal for Error Handling in OpenMP

     Duran González, Alejandro; Ferrer, Roger; Costa Prats, Juan Jose; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Labarta Mancho, Jesus Jose
    Lecture notes in computer science
    Date of publication: 2006-06
    Journal article

  • Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

     Ayguade Parra, Eduard; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Jost, G
    Journal of parallel and distributed computing
    Date of publication: 2006-05
    Journal article

  • Runtime Address Space Computation for SDSM Systems

     Balart, J; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Labarta Mancho, Jesus Jose
    Lecture notes in computer science
    Date of publication: 2006-11
    Journal article

  • Experiences Parallelizing a Web Server with OpenMP

     Balart Tarzan, Jairo; Duran González, Alejandro; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Labarta Mancho, Jesus Jose
    Lecture notes in computer science
    Date of publication: 2006-06
    Journal article

  • Experiences Parallelizing a Web Server with OpenMP

     Balart, Jairo; Duran González, Alejandro; Gonzalez Tallada, Marc; Martorell Bofill, Xavier; Ayguade Parra, Eduard; Labarta Mancho, Jesus Jose
    International Workshops, IWOMP 2005 and IWOMP 2006. OpenMP Shared Memory Parallel Programming
    Presentation of work at congresses