Scientific and technological production

  • Exploiting reuse locality on inclusive shared last-level caches

     Albericio, Jorge; Ibáñez Marín, Pablo Enrique; Viñals Yufera, Víctor; Llaberia Griño, Jose M.
    ACM transactions on architecture and code optimization
    Vol. 9, num. 4, p. 38-1-38-19
    DOI: 10.1145/2400682.2400697
    Date of publication: 2013-01
    Journal article

  • Vectorized register tiling

     Berna Juan, Alejandro; Jimenez Castells, Marta; Llaberia Griño, Jose M.
    Date: 2012-01
    Report

  • Source code transformations for efficient SIMD code generation

     Berna Juan, Alejandro; Jimenez Castells, Marta; Llaberia Griño, Jose M.
    Date: 2012-01
    Report

  • HIPEAC 3 - European Network of Excellence on High Performance Embedded Architecture and Compilers

     Gil Gómez, Maria Luisa; Navarro Mas, Nacho; Martorell Bofill, Xavier; Valero Cortes, Mateo; Ayguade Parra, Eduard; Ramirez Bellido, Alejandro; Badia Sala, Rosa Maria; Labarta Mancho, Jesus Jose; Llaberia Griño, Jose M.
    Competitive project

  • ABS: a low-cost adaptive controller for prefetching in a banked shared last-level cache

     Albericio, Jorge; Gran, Rubén; Ibañez, Pablo; Viñals Yúfera, Víctor; Llaberia Griño, Jose M.
    ACM transactions on architecture and code optimization
    Vol. 8, num. 4, p. 19:1-19:20
    DOI: 10.1145/2086696.2086698
    Date of publication: 2012-01
    Journal article

  • Efficient handling of lock hand-off in DSM multiprocessors with buffering coherence controllers

     Sahelices, Benjamin; de Dios, Agustín; Ibañez, Pablo; Viñals Yufera, Victor; Llaberia Griño, Jose M.
    Journal of computer science and technology
    Vol. 27, num. 1, p. 75-91
    DOI: 10.1007/s11390-012-1207-2
    Date of publication: 2012-01
    Journal article

  • Filtering directory lookups in CMPs with write-through caches

     Bosque Arbiol, Ana; Viñals, Victor; Ibañez, Pablo; Llaberia Griño, Jose M.
    International European Conference on Parallel and Distributed Computing
    p. 269-281
    DOI: 10.1007/978-3-642-23400-2_26
    Presentation's date: 2011-09-02
    Presentation of work at congresses

  • Filtering directory lookups in CMPs

     Bosque Arbiol, Ana
    Universidad de Zaragoza
    Theses

  • Filtering directory lookups in CMPs

     Bosque Arbiol, Ana; Viñals Yufera, Victor; Ibáñez Marín, Pablo; Llaberia Griño, Jose M.
    Microprocessors and microsystems
    Vol. 35, num. 8, p. 695-707
    DOI: 10.1016/j.micpro.2011.08.006
    Date of publication: 2011-11
    Journal article

  • Source-to-Source transformations for efficient SIMD code generation (open access)

     Berna Juan, Alejandro; Jimenez Castells, Marta; Llaberia Griño, Jose M.
    Jornadas de Paralelismo
    p. 719-726
    Presentation's date: 2011-09
    Presentation of work at congresses

    In recent years, there has been much effort in commercial compilers to generate efficient SIMD instruction-based code sequences from conventional sequential programs. However, the small number of compilers that can automatically use these instructions achieve unsatisfactory results in most cases. Therefore, the code often has to be written manually in assembly language or using compiler built-in functions to achieve high performance. In this work, we present source-to-source transformations that help commercial vectorizing compilers to generate efficient SIMD code. Experimental results show that excellent performance can be achieved. In particular, for the problem of matrix product (SGEMM) we achieve almost the same performance as hand-optimized numerical libraries. Our source-to-source transformations are based on the scalar replacement and unroll and jam transformations presented by Callahan et al. In particular, we extend the use of scalar replacement to vectorial replacement and combine this transformation with unroll and jam and outer loop vectorization to fully exploit the vector register level and thus to help the compiler to generate efficient SIMD code. We show experimentally the effectiveness of our proposal.
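
    The abstract above builds on scalar replacement and unroll and jam. As a rough illustration only (this is not the authors' code), the sketch below applies both transformations by hand to a plain C matrix product. The function names `matmul_naive` and `matmul_unroll_jam`, the 2x2 unroll factor, and the square row-major layout with an even dimension are all assumptions made for this example. The vectorial replacement and outer loop vectorization described in the abstract are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: C += A * B for n x n row-major float matrices. */
void matmul_naive(size_t n, const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Same computation after unroll and jam (i and j unrolled by 2, the
 * copies fused into one k loop) and scalar replacement (the 2x2 block
 * of C and the reused elements of A and B are held in local scalars).
 * Assumption for this sketch: n is even, so no cleanup loops are needed. */
void matmul_unroll_jam(size_t n, const float *a, const float *b, float *c)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        for (size_t j = 0; j < n; j += 2) {
            /* Scalar replacement: load the C block once. */
            float c00 = c[i * n + j];
            float c01 = c[i * n + j + 1];
            float c10 = c[(i + 1) * n + j];
            float c11 = c[(i + 1) * n + j + 1];
            for (size_t k = 0; k < n; k++) {
                /* Each loaded value of A and B is used twice. */
                float a0 = a[i * n + k];
                float a1 = a[(i + 1) * n + k];
                float b0 = b[k * n + j];
                float b1 = b[k * n + j + 1];
                c00 += a0 * b0;
                c01 += a0 * b1;
                c10 += a1 * b0;
                c11 += a1 * b1;
            }
            /* Store the C block back once, after the k loop. */
            c[i * n + j] = c00;
            c[i * n + j + 1] = c01;
            c[(i + 1) * n + j] = c10;
            c[(i + 1) * n + j + 1] = c11;
        }
    }
}
```

    In this sketch the inner loop performs four loads for four multiply-adds, instead of the two loads plus a load and store of C per multiply-add in the baseline. This kind of register-level reuse is what the two transformations are meant to expose.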

  • Non-Speculative Enhancements for the Scheduling Logic

     Gran Tejero, Ruben
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

  • Filtering directory lookups in CMPs (open access)

     Bosque, Ana; Viñals Yúfera, Víctor; Ibáñez Marín, Pablo Enrique; Llaberia Griño, Jose M.
    Euromicro Conference on Digital System Design: Architectures, Methods and Tools
    p. 207-216
    DOI: 10.1109/DSD.2010.85
    Presentation's date: 2010
    Presentation of work at congresses

    Coherence protocols consume an important fraction of power to determine which coherence action should take place. In this paper we focus on CMPs with a shared cache and a directory-based coherence protocol implemented as a duplicate of the local cache tags. We observe that a large fraction of directory lookups produce a miss, since the block looked up is not cached in any local cache. We propose to add a filter before the directory lookup in order to reduce the number of lookups to this structure. The filter identifies whether the current block was last accessed as data or as an instruction. With this information, looking up the whole directory can be avoided for most accesses. We evaluate the filter in a CMP with 8 in-order processors with 4 threads each and a memory hierarchy with a shared L2 cache. We show that a filter with a size of 3% of the tag array of the shared cache can avoid more than 70% of all comparisons performed by directory lookups with a performance loss of just 0.2% for SPLASH2 and 1.5% for Specweb2005. On average, the number of 15-bit comparisons avoided per cycle is 54 out of 77 for SPLASH2 and 29 out of 41 for Specweb2005. In both cases, the filter requires less than one read of 1 bit per cycle.
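
    As a quick check of the figures quoted in the abstract above, the per-cycle comparison counts can be turned into ratios. Both agree with the stated "more than 70%" (values rounded to three decimals):

```latex
\[
\frac{54}{77} \approx 0.701 \quad \text{(SPLASH2)}, \qquad
\frac{29}{41} \approx 0.707 \quad \text{(Specweb2005)}
\]
```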

  • ARQUITECTURA DE COMPUTADORS D'ALTES PRESTACIONS (CAP)

     Jimenez Castells, Marta; Pericas Gleim, Miquel; Navarro Guerrero, Juan Jose; Llaberia Griño, Jose M.; Llosa Espuny, Jose Francisco; Villavieja Prados, Carlos; Alvarez Martinez, Carlos; Jimenez Gonzalez, Daniel; Ramirez Bellido, Alejandro; Morancho Llena, Enrique; Fernandez Jimenez, Agustin; Pajuelo González, Manuel Alejandro; Olive Duran, Angel; Sanchez Carracedo, Fermin; Moreto Planas, Miquel; Verdu Mula, Javier; Abella Ferrer, Jaume; Valero Cortes, Mateo
    Competitive project

  • On reducing misspeculations on a pipelined scheduler (open access)

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    IEEE International Parallel and Distributed Processing Symposium
    p. 1-12
    DOI: 10.1109/IPDPS.2009.5160990
    Presentation's date: 2009-05
    Presentation of work at congresses

    Pipelining the scheduling logic, which exposes and exploits the instruction level parallelism, degrades processor performance. In a 4-issue processor, our evaluations show that pipelining the scheduling logic over two cycles degrades performance by 10% in SPEC-2000 integer benchmarks. This performance degradation is due to sacrificing the ability to execute dependent instructions in consecutive cycles. Speculative selection is a previously proposed technique that boosts the performance of a processor with a pipelined scheduling logic. However, this new speculation source increases the overall number of misspeculated instructions, and this useless work wastes energy. In this work we introduce a non-speculative mechanism named Dependence Level Scheduler (DLS), which not only tolerates the scheduling-logic latency but also reduces the number of misspeculated instructions with respect to a scheduler with speculative selection. In DLS, the selection of a group of one-cycle instructions (producer-level) is overlapped with the wake-up in advance of its group of dependent instructions. DLS is not speculative because the group of instructions woken in advance will compete for selection only after all producer-level instructions have been issued. On average, DLS reduces the number of misspeculated instructions with respect to a speculative scheduler by 17.9%. From the IPC point of view, the speculative scheduler outperforms DLS by 0.3%. Moreover, we propose two non-speculative improvements to DLS.

  • Store Buffer Design for Multibanked Data Caches

     Torres, E; Ibanez, P; Vinals-Yufera, V; Llaberia Griño, Jose M.
    IEEE transactions on computers
    Vol. 58, num. 10, p. 1307-1320
    DOI: 10.1109/TC.2009.57
    Date of publication: 2009-10
    Journal article

    This paper focuses on how to design a store buffer (STB) well suited to first-level multibanked data caches. The goal is to forward data from in-flight stores into dependent loads within the latency of a cache bank. Taking into account the store lifetime in the processor pipeline and the data forwarding behavior, we propose a particular two-level STB design in which forwarding is done speculatively from a distributed first-level STB made of extremely small banks, whereas a centralized, second-level STB enforces correct store-load ordering. In addition, the two-level STB admits two simplifications that leave performance almost unchanged. Regarding the second-level STB, we suggest removing its data forwarding capability, while for the first-level STB, it is possible to: 1) remove the instruction age checking and 2) compare only the least significant address bits. Experimentation covers both integer and floating point codes executing in dynamically scheduled processors. Following our guidelines and running SPEC-2K over an 8-way processor, a two-level STB with four 8-entry banks in the first level performs similarly to an ideal, single-level STB with 128-entry banks working at the first-level cache latency. Also, we show that the proposed two-level design is suitable for a memory-latency-tolerant processor.

  • A methodology to characterize critical section bottlenecks in DSM multiprocessors

     Sahelices, Benjamin; Ibañez, Pablo; Viñals, Victor; Llaberia Griño, Jose M.
    International European Conference on Parallel and Distributed Computing
    p. 149-161
    DOI: 10.1007/978-3-642-03869-3_17
    Presentation's date: 2009-09
    Presentation of work at congresses

    Understanding and optimizing the synchronization operations of parallel programs in distributed shared memory (DSM) multiprocessors is one of the most important factors leading to significant reductions in execution time. This paper introduces a new methodology for tuning the performance of parallel programs. We focus on the critical sections used to assure exclusive access to critical resources and data structures, proposing a specific dynamic characterization of every critical section in order to a) measure the lock contention, b) measure the degree of data sharing in consecutive executions, and c) break down the execution time, reflecting the different overheads that can appear. All the required measurements are taken using a multiprocessor simulator with a detailed timing model of the processor and memory system. We also propose a static classification of critical sections that takes into account how locks are associated with their protected data. The dynamic characterization and the static classification are correlated to identify key critical sections and infer code optimization opportunities (e.g. data layout), which when applied can lead to significant reductions in execution time (up to 33% in the SPLASH-2 scientific benchmark suite). By using the simulator we can also evaluate whether the performance of the applied code optimizations is sensitive to common hardware optimizations or not.

  • Characterization of Apache web server with Specweb2005

     Bosque, Ana; Ibañez, Pablo; Viñals, Victor; Stenstrom, Per; Llaberia Griño, Jose M.
    MEDEA Workshop MEmory performance: DEaling with Applications, systems and architecture in conjunction with PACT 2007 Conference.
    p. 73-80
    Presentation of work at congresses

  • Aceleración del cambio de propietario de un cerrojo en multiprocesadores DSM

     Rodriguez, Esther; Sahelices, Benjamin; Llanos, Diego R; Ibañez, Pablo; Viñals, Victor; Llaberia Griño, Jose M.
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 139-146
    Presentation of work at congresses

  • Memory Characterization of Apache using Specweb2005

     Bosque, Ana; Ibañez, Pablo; Viñals, Victor; Stenström, Per; Llaberia Griño, Jose M.
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 165-172
    Presentation of work at congresses

  • On Reducing Energy-Consumption by Late-Inserting Instructions into the Issue Queue

     Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    International Symposium on Low Power Electronics and Design
    p. 371-374
    Presentation's date: 2007-08-29
    Presentation of work at congresses

  • On Improving a Pipelined Scheduling Logic

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 75-82
    Presentation of work at congresses

  • On improving a pipelined scheduling logic

     Gran Tejero, Ruben; Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 1
    Presentation of work at congresses

  • On Tolerating the Scheduling-Loop Latency

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2007-10
    Report

  • A comparison of two policies for issuing instructions speculatively

     Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    Journal of systems architecture
    Vol. 53, num. 4, p. 170-183
    Date of publication: 2007-04
    Journal article

  • Predicting L2 Misses to Increase Issue-Queue Efficacy

     Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    4th Workshop on Memory Performance Issues (WMPI-2006) in conjunction with the 12th International Symposium on High-Performance Computer Architecture (HPCA-12)
    p. 29-35
    Presentation of work at congresses

  • An Enhancement for a Scheduling Logic Pipelined over two Cycles

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Jornadas de Paralelismo
    p. 1-6
    Presentation of work at congresses

  • Non-Speculative Enhancements for a Pipelined Scheduling Logic

     Llaberia Griño, Jose M.
    Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2006)
    Presentation's date: 2006-07-26
    Presentation of work at congresses

  • Speeding-Up Synchronizations in DSM Multiprocessors

     Dios, A De; Sahelices, B; Ibañez, P; Viñals, V; Llaberia Griño, Jose M.
    Euro-Par
    p. 473-484
    Presentation of work at congresses

  • An Enhancement for a Scheduling Logic Pipelined over two Cycles

     Llaberia Griño, Jose M.
    Jornadas de Paralelismo
    Presentation's date: 2006-09-18
    Presentation of work at congresses

  • An Enhancement for a Scheduling Logic Pipelined over two Cycles

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    ICCD 2006 XXIV IEEE International Conference on Computer Design
    p. 203-209
    Presentation of work at congresses

  • Non-Speculative Enhancements for a Pipelined Scheduling Logic

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2006)
    p. 1-4
    Presentation of work at congresses

  • Predicting L2 Misses to Increase Issue-Queue Efficacy

     Llaberia Griño, Jose M.
    4th Workshop on Memory Performance Issues (WMPI-2006) in conjunction with the 12th International Symposium on High-Performance Computer Architecture (HPCA-12)
    Presentation's date: 2006-02-11
    Presentation of work at congresses

  • Speeding-Up Synchronizations in DSM Multiprocessors

     Dios, A De; Sahelices, B; Ibañez, P; Viñals, V; Llaberia Griño, Jose M.
    Lecture notes in computer science
    Vol. 1, num. 4128, p. 473-484
    Date of publication: 2006-09
    Journal article

  • An Enhancement for a Scheduling Logic Pipelined over two Cycles

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2006-07
    Report

  • On Tolerating the Scheduling-Loop Latency Non-Speculatively

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2006-07
    Report

  • Planificador por Niveles de Dependencia

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2006-10
    Report

  • La Lógica de Lanzamiento a Ejecución de Instrucciones

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2006-10
    Report

  • Cache Miss Characterization of Commercial Workloads

     Bosque, Ana; Viñals, Victor; Ibañez, Pablo; Stenström, Per; Llaberia Griño, Jose M.
    Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2006)
    p. 201-204
    Presentation of work at congresses

  • Speeding-Up Synchronizations in DSM Multiprocessors

     Llaberia Griño, Jose M.
    Euro-Par
    Presentation's date: 2006-08-28
    Presentation of work at congresses

  • El optimizador de bucles del compilador Open64/ORC (parte 2) (open access)

     Santamaria Barnadas, Eduard; Jimenez Castells, Marta; Fernandez Jimenez, Agustin; Llaberia Griño, Jose M.
    Date: 2005-09-05
    Report

    Open64 and ORC (Open Research Compiler) are two open-source initiatives based on the SGI Pro64 compiler. Open64 is managed by members of the University of Delaware, and ORC is an extension of the compiler developed by Intel and the Chinese Academy of Sciences. For more information, see the respective web pages [2] and [1]. SGI Pro64 is a suite of optimizing compilers developed by SGI. It includes C, C++ and Fortran 90/95 compilers that follow the Linux IA-64 ABI and API standards. The source files are public and are distributed under the terms of the GNU General Public License. The compiler suite is available to run on Linux IA-32 and IA-64 platforms. This document continues the work begun in the technical reports "Introducción al compilador Open64/ORC" [10] and "El optimizador de bucles del compilador Open64/ORC (parte 1)" [11]. The first describes the components of the compiler and the intermediate representation used as the common interface between them. The second focuses specifically on one of the compiler's components: the loop optimizer.

  • Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors

     Garzaran, Maria Jesus; Prvulovic, Milos; Llaberia Griño, Jose M.; Viñals, Victor; Rauchwerger, Lawrence; Torrellas, Josep
    ACM transactions on architecture and code optimization
    Vol. 2, num. 3, p. 247-279
    Date of publication: 2005-09
    Journal article

  • Accurate and complexity-effective coherence predictors

     Bosque, Ana; Viñals, Victor; Llaberia Griño, Jose M.; Stenström, Per
    International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems
    p. 91-94
    Presentation's date: 2005
    Presentation of work at congresses

  • El optimizador de bucles del compilador Open64/ORC

     Jimenez Castells, Marta; Fernandez Jimenez, Agustin; Llaberia Griño, Jose M.; Santamaria Barnadas, Eduard
    Date: 2004-12-14
    Report

  • A Mechanism for Verifying Data Speculation

     Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    Euro-Par
    p. 525-534
    Presentation of work at congresses

  • Contents Management in First-Level Multibanked Data Caches

     Llaberia Griño, Jose M.
    Euro-Par
    Presentation's date: 2004-08-31
    Presentation of work at congresses

  • A Mechanism for Verifying Data Speculation

     Llaberia Griño, Jose M.
    Euro-Par
    Presentation's date: 2004-08-31
    Presentation of work at congresses

  • Contents Management in First-Level Multibanked Data Caches

     Torres, E F; Ibañez, P; Viñals, V; Llaberia Griño, Jose M.
    Euro-Par
    p. 516-524
    Presentation of work at congresses

  • Contents Management in First-Level Multibanked Data Caches

     Torres, E F; Ibañez, P; Viñals, V; Llaberia Griño, Jose M.
    Lecture notes in computer science
    Vol. 1, num. 3149, p. 516-524
    Date of publication: 2004-08
    Journal article

  • A Mechanism for Verifying Data Speculation

     Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    Lecture notes in computer science
    Vol. 1, num. 3149, p. 525-534
    Date of publication: 2004-08
    Journal article

  • Introducción al Compilador Open64/ORC

     Santamaria, Eduard; Jimenez Castells, Marta; Fernandez Jimenez, Agustin; Llaberia Griño, Jose M.
    Date: 2003-05
    Report
