Scientific and technological production

1 to 50 of 206 results
  • HIPEAC 3 - European Network of Excellence on High-Performance Embedded Architecture and Compilers

     Gil Gómez, Maria Luisa; Navarro Mas, Nacho; Martorell Bofill, Xavier; Valero Cortes, Mateo; Ayguade Parra, Eduard; Ramirez Bellido, Alejandro; Badia Sala, Rosa Maria; Labarta Mancho, Jesus Jose; Llaberia Griño, Jose M.
    Competitive project


  • Source code transformations for efficient SIMD code generation

     Berna Juan, Alejandro; Jimenez Castells, Marta; Llaberia Griño, Jose M.
    Date: 2012-01
    Report


  • Vectorized register tiling

     Berna Juan, Alejandro; Jimenez Castells, Marta; Llaberia Griño, Jose M.
    Date: 2012-01
    Report


  • Filtering directory lookups in CMPs

     Bosque Arbiol, Ana
    Universidad de Zaragoza
    Theses


  • Filtering directory lookups in CMPs with write-through caches

     Bosque Arbiol, Ana; Viñals, Victor; Ibañez, Pablo; Llaberia Griño, Jose M.
    International European Conference on Parallel and Distributed Computing
    p. 269-281
    DOI: 10.1007/978-3-642-23400-2_26
    Presentation's date: 2011-09-02
    Presentation of work at congresses


  • Source-to-Source transformations for efficient SIMD code generation  Open access

     Berna Juan, Alejandro; Jimenez Castells, Marta; Llaberia Griño, Jose M.
    Jornadas de Paralelismo
    p. 719-726
    Presentation's date: 2011-09
    Presentation of work at congresses


    In recent years, much effort has gone into making commercial compilers generate efficient SIMD instruction-based code sequences from conventional sequential programs. However, the small number of compilers that can automatically use these instructions achieve unsatisfactory results in most cases. Therefore, the code often has to be written manually in assembly language or using compiler built-in functions to achieve high performance. In this work, we present source-to-source transformations that help commercial vectorizing compilers to generate efficient SIMD code. Experimental results show that excellent performance can be achieved. In particular, for the matrix product problem (SGEMM), we achieve almost the same performance as hand-optimized numerical libraries. Our source-to-source transformations are based on the scalar replacement and unroll and jam transformations presented by Callahan et al. In particular, we extend scalar replacement to vectorial replacement and combine this transformation with unroll and jam and outer loop vectorization to fully exploit the vector register level and thus help the compiler generate efficient SIMD code. We show experimentally the effectiveness of our proposal.
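    The two classic transformations named in this abstract can be illustrated on a small matrix product. The sketch below is not the authors' code: the function names, the unroll factor of 2, and the assumption that n is even are all illustrative choices. It shows only the loop restructuring (unroll and jam on the two outer loops, then scalar replacement of the accumulators); the vectorial replacement described in the abstract would keep vectors rather than scalars in those temporaries.

```python
def matmul_reference(A, B, n):
    """Plain i-j-k triple loop: C = A * B for n x n matrices (lists of lists)."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_unroll_and_jam(A, B, n):
    """Same product after unroll and jam (factor 2 on i and j) and scalar replacement.

    Hypothetical sketch: assumes n is even, so the remainder loops are omitted.
    """
    assert n % 2 == 0, "this sketch omits the remainder loops"
    C = [[0.0] * n for _ in range(n)]
    for i in range(0, n, 2):          # outer loop i unrolled by 2
        for j in range(0, n, 2):      # outer loop j unrolled by 2
            # Scalar replacement: the 2x2 block of C is held in local
            # variables for the whole k loop instead of being re-read
            # from and re-written to the array on every iteration.
            c00 = c01 = c10 = c11 = 0.0
            for k in range(n):        # the unrolled bodies are jammed here
                a0 = A[i][k]
                a1 = A[i + 1][k]
                b0 = B[k][j]
                b1 = B[k][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            C[i][j] = c00
            C[i][j + 1] = c01
            C[i + 1][j] = c10
            C[i + 1][j + 1] = c11
    return C
```

    In the restructured loop each loaded element of A and B is reused twice, and the four accumulators stay in local variables, which is the register-level reuse that the transformations aim to expose.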

  • Non-Speculative Enhancements for the Scheduling Logic

     Gran Tejero, Ruben
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses


  • Filtering directory lookups in CMPs  Open access

     Bosque Arbiol, Ana; Viñals Yúfera, Víctor; Ibáñez Marín, Pablo Enrique; Llaberia Griño, Jose M.
    Euromicro Conference on Digital System Design: Architectures, Methods and Tools
    p. 207-216
    DOI: 10.1109/DSD.2010.85
    Presentation's date: 2010
    Presentation of work at congresses


    Coherence protocols consume a significant fraction of power to determine which coherence action should take place. In this paper we focus on CMPs with a shared cache and a directory-based coherence protocol implemented as a duplicate of the local cache tags. We observe that a large fraction of directory lookups produce a miss because the block looked up is not cached in any local cache. We propose adding a filter before the directory lookup in order to reduce the number of lookups to this structure. The filter identifies whether the current block was last accessed as data or as an instruction. With this information, looking up the whole directory can be avoided for most accesses. We evaluate the filter in a CMP with 8 in-order processors with 4 threads each and a memory hierarchy with a shared L2 cache. We show that a filter with a size of 3% of the tag array of the shared cache can avoid more than 70% of all comparisons performed by directory lookups with a performance loss of just 0.2% for SPLASH2 and 1.5% for Specweb2005. On average, the number of 15-bit comparisons avoided per cycle is 54 out of 77 for SPLASH2 and 29 out of 41 for Specweb2005. In both cases, the filter requires less than one read of 1 bit per cycle.
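    As a quick consistency check, the per-cycle figures quoted in this abstract can be turned into fractions and compared with the "more than 70%" claim. The snippet below uses only the numbers stated above; the variable names are illustrative.

```python
# (comparisons avoided per cycle, total comparisons per cycle), as quoted in the abstract
avoided = {"SPLASH2": (54, 77), "Specweb2005": (29, 41)}

# Fraction of 15-bit comparisons avoided for each workload
fractions = {name: a / t for name, (a, t) in avoided.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of 15-bit comparisons avoided")
```

    Both ratios come out slightly above 70% (about 70.1% and 70.7%), which agrees with the summary figure given in the abstract.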

  • ARQUITECTURA DE COMPUTADORS D'ALTRES PRESTACIONS (CAP)

     Jimenez Castells, Marta; Pericas Gleim, Miquel; Navarro Guerrero, Juan Jose; Llaberia Griño, Jose M.; Llosa Espuny, Jose Francisco; Villavieja Prados, Carlos; Alvarez Martinez, Carlos; Jimenez Gonzalez, Daniel; Ramirez Bellido, Alejandro; Morancho Llena, Enrique; Fernandez Jimenez, Agustin; Pajuelo González, Manuel Alejandro; Olive Duran, Angel; Sanchez Carracedo, Fermin; Moreto Planas, Miquel; Verdu Mula, Javier; Abella Ferrer, Jaume; Valero Cortes, Mateo
    Competitive project


  • A methodology to characterize critical section bottlenecks in DSM multiprocessors

     Sahelices, Benjamin; Ibañez, Pablo; Viñals, Victor; Llaberia Griño, Jose M.
    International European Conference on Parallel and Distributed Computing
    p. 149-161
    DOI: 10.1007/978-3-642-03869-3_17
    Presentation's date: 2009-09
    Presentation of work at congresses


    Understanding and optimizing the synchronization operations of parallel programs in distributed shared memory (DSM) multiprocessors is one of the most important factors leading to significant reductions in execution time. This paper introduces a new methodology for tuning the performance of parallel programs. We focus on the critical sections used to ensure exclusive access to critical resources and data structures, proposing a specific dynamic characterization of every critical section in order to a) measure the lock contention, b) measure the degree of data sharing in consecutive executions, and c) break down the execution time, reflecting the different overheads that can appear. All the required measurements are taken using a multiprocessor simulator with a detailed timing model of the processor and memory system. We also propose a static classification of critical sections that takes into account how locks are associated with their protected data. The dynamic characterization and the static classification are correlated to identify key critical sections and infer code optimization opportunities (e.g. data layout), which when applied can lead to significant reductions in execution time (up to 33% in the SPLASH-2 scientific benchmark suite). By using the simulator we can also evaluate whether the performance of the applied code optimizations is sensitive to common hardware optimizations.

  • On reducing misspeculations on a pipelined scheduler  Open access

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    IEEE International Parallel and Distributed Processing Symposium
    p. 1-12
    DOI: 10.1109/IPDPS.2009.5160990
    Presentation's date: 2009-05
    Presentation of work at congresses


    Pipelining the scheduling logic, which exposes and exploits the instruction level parallelism, degrades processor performance. In a 4-issue processor, our evaluations show that pipelining the scheduling logic over two cycles degrades performance by 10% in SPEC-2000 integer benchmarks. This degradation is due to sacrificing the ability to execute dependent instructions in consecutive cycles. Speculative selection is a previously proposed technique that boosts the performance of a processor with a pipelined scheduling logic. However, this new speculation source increases the overall number of misspeculated instructions, and this useless work wastes energy. In this work we introduce a non-speculative mechanism named Dependence Level Scheduler (DLS), which not only tolerates the scheduling-logic latency but also reduces the number of misspeculated instructions with respect to a scheduler with speculative selection. In DLS, the selection of a group of one-cycle instructions (producer-level) is overlapped with the early wake-up of its group of dependent instructions. DLS is not speculative because the group of instructions woken in advance will compete for selection only after all producer-level instructions have been issued. On average, DLS reduces the number of misspeculated instructions with respect to a speculative scheduler by 17.9%. In terms of IPC, the speculative scheduler outperforms DLS by 0.3%. Moreover, we propose two non-speculative improvements to DLS.

  • On Tolerating the Scheduling-Loop Latency

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2007-10
    Report


  • On Reducing Energy-Consumption by Late-Inserting Instructions into the Issue Queue

     Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    International Symposium on Low Power Electronics and Design
    p. 371-374
    Presentation's date: 2007-08-29
    Presentation of work at congresses


  • Characterization of Apache web server with Specweb2005

     Bosque Arbiol, Ana; Ibañez, Pablo; Viñals, Victor; Stenstrom, Per; Llaberia Griño, Jose M.
    MEDEA Workshop MEmory performance: DEaling with Applications, systems and architecture in conjunction with PACT 2007 Conference.
    p. 73-80
    Presentation of work at congresses


  • Aceleración del cambio de propietario de un cerrojo en el multiprocesadores DSM

     Rodriguez, Esther; Sahelices, Benjamin; Llanos, Diego R; Ibañez, Pablo; Viñals, Victor; Llaberia Griño, Jose M.
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 139-146
    Presentation of work at congresses


  • Memory Characterization of Apache using Specweb2005

     Bosque Arbiol, Ana; Ibañez, Pablo; Viñals, Victor; Stenström, Per; Llaberia Griño, Jose M.
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 165-172
    Presentation of work at congresses


  • On Improving a Pipelined Scheduling Logic

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 75-82
    Presentation of work at congresses


  • On improving a pipelined scheduling logic

     Gran Tejero, Ruben; Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 1
    Presentation of work at congresses


  • La Lógica de Lanzamiento a Ejecución de Instrucciones

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2006-10
    Report


  • Planificador por Niveles de Dependencia

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2006-10
    Report


  • An Enhancement for a Scheduling Logic Pipelined over two Cycles

     Llaberia Griño, Jose M.
    Jornadas de Paralelismo
    Presentation's date: 2006-09-18
    Presentation of work at congresses


  • Speeding-Up Synchronizations in DSM Multiprocessors

     Llaberia Griño, Jose M.
    Euro-Par
    Presentation's date: 2006-08-28
    Presentation of work at congresses


  • An Enhancement for a Scheduling Logic Pipelined over two Cycles

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2006-07
    Report


  • On Tolerating the Scheduling-Loop Latency Non-Speculatively

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2006-07
    Report


  • Non-Speculative Enhancements for a Pipelined Scheduling Logic

     Llaberia Griño, Jose M.
    Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2006)
    Presentation's date: 2006-07-26
    Presentation of work at congresses


  • Predicting L2 Misses to Increase Issue-Queue Efficacy

     Llaberia Griño, Jose M.
    4th Workshop on Memory Performance Issues (WMPI-2006) in conjunction with the 12th International Symposium on High-Performance Computer Architecture (HPCA-12)
    Presentation's date: 2006-02-11
    Presentation of work at congresses


  • An Enhancement for a Scheduling Logic Pipelined over two Cycles

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    ICCD 2006 XXIV IEEE International Conference on Computer Design
    p. 203-209
    Presentation of work at congresses


  • Non-Speculative Enhancements for a Pipelined Scheduling Logic

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2006)
    p. 1-4
    Presentation of work at congresses


  • An Enhancement for a Scheduling Logic Pipelined over two Cycles

     Gran Tejero, Ruben; Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Jornadas de Paralelismo
    p. 1-6
    Presentation of work at congresses


  • Predicting L2 Misses to Increase Issue-Queue Efficacy

     Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    4th Workshop on Memory Performance Issues (WMPI-2006) in conjunction with the 12th International Symposium on High-Performance Computer Architecture (HPCA-12)
    p. 29-35
    Presentation of work at congresses


  • Cache Miss Characterization of Commercial Workloads

     Bosque Arbiol, Ana; Viñals, Victor; Ibañez, Pablo; Stenström, Per; Llaberia Griño, Jose M.
    Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2006)
    p. 201-204
    Presentation of work at congresses


  • Speeding-Up Synchronizations in DSM Multiprocessors

     Dios, A De; Sahelices, B; Ibañez, P; Viñals, V; Llaberia Griño, Jose M.
    Euro-Par
    p. 473-484
    Presentation of work at congresses


  • El optimizador de bucles del compilador Open64/ORC (parte 2)  Open access

     Santamaria Barnadas, Eduard; Jimenez Castells, Marta; Fernandez Jimenez, Agustin; Llaberia Griño, Jose M.
    Date: 2005-09-05
    Report


    Open64 and ORC (Open Research Compiler) are two open-source initiatives based on the SGI Pro64 compiler. Open64 is managed by members of the University of Delaware, and ORC is an extension of the compiler developed by Intel and the Chinese Academy of Sciences. For more information, see the respective web pages [2] and [1]. SGI Pro64 is a suite of optimizing compilers developed by SGI. It includes C, C++ and Fortran 90/95 compilers that conform to the Linux IA-64 ABI and API standards. The source files are publicly available and are distributed under the terms of the GNU General Public License. The compiler suite is available to run on Linux IA-32 and IA-64 platforms. This document continues the work begun in the technical reports "Introducción al compilador Open64/ORC" [10] and "El optimizador de bucles del compilador Open64/ORC (parte 1)" [11]. The first describes the components of the compiler and the intermediate representation used as a common interface between them. The second focuses specifically on one of the compiler's components: the loop optimizer.

  • Accurate and complexity-effective coherence predictors

     Bosque Arbiol, Ana; Viñals, Victor; Llaberia Griño, Jose M.; Stenström, Per
    International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems
    p. 91-94
    Presentation's date: 2005
    Presentation of work at congresses


  • El optimizador de bucles del compilador Open64/ORC

     Jimenez Castells, Marta; Fernandez Jimenez, Agustin; Llaberia Griño, Jose M.; Santamaria Barnadas, Eduard
    Date: 2004-12-14
    Report


  • A Mechanism for Verifying Data Speculation

     Llaberia Griño, Jose M.
    Euro-Par
    Presentation's date: 2004-08-31
    Presentation of work at congresses


  • Contents Management in First-Level Multibanked Data Caches

     Llaberia Griño, Jose M.
    Euro-Par
    Presentation's date: 2004-08-31
    Presentation of work at congresses


  • Contents Management in First-Level Multibanked Data Caches

     Torres, E F; Ibañez, P; Viñals, V; Llaberia Griño, Jose M.
    Euro-Par
    p. 516-524
    Presentation of work at congresses


  • A Mechanism for Verifying Data Speculation

     Morancho Llena, Enrique; Llaberia Griño, Jose M.; Olive Duran, Angel
    Euro-Par
    p. 525-534
    Presentation of work at congresses


  • Software Logging under Speculative Parallelization

     Llaberia Griño, Jose M.; Garzarán, María Jesús; Prvulovic, Milos; Viñals, Victor; Rauchwerger, Lawrence; Torrellas, Josep
    Date of publication: 2003-12-31
    Book chapter


  • Using Software Logging to Support Multi-Version Buffering in Thread-Level Speculation

     Llaberia Griño, Jose M.
    12th International Conference on Parallel Architectures and Compilation Techniques (PACT'03)
    Presentation's date: 2003-09-27
    Presentation of work at congresses


  • Counteracting Bank Misprediction in Sliced First-Level Caches

     Llaberia Griño, Jose M.
    Euro-Par
    Presentation's date: 2003-08-26
    Presentation of work at congresses


  • Introducción al compilador Open64/ORC  Open access

     Santamaria Barnadas, Eduard; Jimenez Castells, Marta; Fernandez Jimenez, Agustin; Llaberia Griño, Jose M.
    Date: 2003-05-13
    Report


  • Introducción al Compilador Open64/ORC

     Santamaria, Eduard; Jimenez Castells, Marta; Fernandez Jimenez, Agustin; Llaberia Griño, Jose M.
    Date: 2003-05
    Report


  • Counteracting Bank Misprediction in Sliced First-Level Caches

     Torres, Enrique F; Ibañez, Pablo; Viñals, Victor; Llaberia Griño, Jose M.
    Euro-Par
    p. 586-596
    Presentation of work at congresses


  • Using Software Logging to Support Multi-Version Buffering in Thread-Level Speculation

     Garzarán, María Jesús; Prvulovic, Milos; Viñals, Victor; Llaberia Griño, Jose M.; Rauchwerger, Lawrence; Torrellas, Josep
    12th International Conference on Parallel Architectures and Compilation Techniques (PACT'03)
    p. 170-181
    Presentation of work at congresses


  • Address Prediction and Recovery Mechanisms  Open access

     Morancho Llena, Enrique
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses


    Mitigating the effect of the large latency of load instructions is one of the main challenges facing microprocessor designers. This thesis analyses one of the alternatives for tackling this problem: address prediction and speculative execution.

    Several authors have noticed that the effective addresses computed by load instructions are quite predictable. First of all, we study why this predictability appears; our study tries to detect the high-level language structures that are compiled into predictable load instructions. We also analyse the conventional address predictors in order to determine which ones are most appropriate for typical applications.

    Our study continues by proposing address predictors that use their storage structures more efficiently than previous predictors. Address predictors track history information of the load instructions; however, the requirements of predictable instructions differ from those of unpredictable instructions. We therefore propose an organization of the prediction tables that considers the existence of both kinds of instructions. We also show that there is a certain degree of redundancy in the prediction tables of the address predictors, and we propose organizing the tables so as to reduce this redundancy. These proposals allow us to reduce the area cost of the address predictors without impacting their performance.

    After that, we evaluate the impact of address prediction on processor performance. Our evaluations assume that address prediction is used to speculatively start some memory accesses and to speculatively execute their dependent instructions. On a correct prediction, all the speculative work is considered correct; on a misprediction, the speculative work must be discarded. Our study focuses on several aspects such as the interaction of address prediction and branch prediction, the implementation of verification mechanisms, the recovery mechanisms for address mispredictions, and the influence of several processor parameters (the issue-queue size, the cache latency and the issue width) on the performance impact of address prediction.

    Finally, we evaluate several recovery mechanisms for latency mispredictions. Latency prediction is a speculative technique used by the schedulers of some superscalar processors to deal with variable-latency instructions (for instance, load instructions). Our evaluations focus on a conventional recovery mechanism for latency mispredictions and on a new proposal. We also evaluate the proposed recovery mechanism in the scope of address prediction; we conclude that it represents a cost-effective alternative to the conventional recovery mechanisms used for address mispredictions.
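    To make the idea of address prediction concrete, the sketch below models a conventional stride predictor, one common kind of address predictor. It is not the organization proposed in the thesis: the class name, table size, confidence threshold and untagged indexing are all assumptions chosen for illustration. Each table entry records the last address and stride seen for a load, and a prediction is made only once the same stride has repeated often enough.

```python
class StrideAddressPredictor:
    """Illustrative stride predictor; all parameters are hypothetical defaults."""

    def __init__(self, entries=64, threshold=2, max_conf=3):
        self.entries = entries        # number of table entries
        self.threshold = threshold    # minimum confidence needed to predict
        self.max_conf = max_conf      # saturation value of the confidence counter
        self.table = {}               # index -> [last_addr, stride, confidence]

    def _index(self, pc):
        # Untagged, direct-mapped indexing: different loads may alias.
        return pc % self.entries

    def predict(self, pc):
        """Return the predicted effective address, or None if not confident."""
        entry = self.table.get(self._index(pc))
        if entry is None or entry[2] < self.threshold:
            return None
        return entry[0] + entry[1]

    def update(self, pc, addr):
        """Train the predictor with the address actually computed by the load."""
        idx = self._index(pc)
        entry = self.table.get(idx)
        if entry is None:
            self.table[idx] = [addr, 0, 0]
            return
        last, stride, conf = entry
        new_stride = addr - last
        if new_stride == stride:
            conf = min(conf + 1, self.max_conf)   # same stride again: gain confidence
        else:
            conf = 0                              # stride changed: start over
            stride = new_stride
        self.table[idx] = [addr, stride, conf]


# Usage: a load at a hypothetical PC walks an array with an 8-byte stride.
predictor = StrideAddressPredictor()
for address in (1000, 1008, 1016, 1024):
    predictor.update(0x40, address)
next_address = predictor.predict(0x40)
```

    After four accesses with a constant stride of 8, the entry is confident and predicts 1032 as the next address. A processor would use such a prediction to start the memory access speculatively and would discard the speculative work if the real address turned out to differ.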

  • Recovery mechanism for latency misprediction

     Morancho Llena, Enrique; Olive Duran, Angel; Llaberia Griño, Jose M.
    Date: 2001-11
    Report


  • A Cost-Effective Implementation of Multilevel Tiling

     Jimenez Castells, Marta; Llaberia Griño, Jose M.; Fernandez Jimenez, Agustin
    Date: 2001-02
    Report
