Scientific and technological production

1 to 50 of 103 results
  • A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization

     Roca Monfort, Jordi; Moya Del Barrio, Victor; González Rodríguez, Carlos; Escandell, Vicente; Murciego, Albert; Fernandez Jimenez, Agustin; Espasa Sans, Roger
    The visual computer
    Vol. 26, num. 6-8, p. 707-719
    DOI: 10.1007/s00371-010-0492-4
    Date of publication: 2010-06
    Journal article

  • A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization

     Roca Monfort, Jordi; Moya Del Barrio, Victor; Gonzalez, Carlos; Escandell, Vicente; Murciego, Albert; Fernandez Jimenez, Agustin; Espasa Sans, Roger
    Computer Graphics International Conference
    p. 707-719
    Presentation's date: 2010-06
    Presentation of work at congresses

    This paper shows that it is possible to efficiently break the barrier of a 1 triangle/clock rasterization rate for microtriangles in modern GPU architectures. The fixed throughput of the special-purpose culling and triangle setup stages of the classic pipeline limits the GPU's ability to rasterize many triangles in parallel when each covers very few pixels. In contrast, the growing shader core counts and GFLOPs of modern GPUs clearly suggest parallelizing this computation entirely across multiple shader threads, making use of the powerful wide-ALU instructions. In this paper, we present an efficient SIMD-like rasterization program targeted at very small triangles that scales well with the number of shader cores and outperforms traditional edge-equation-based algorithms. We have extended the ATTILA GPU shader ISA (del Barrio et al. in IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231–241, 2006) with two fixed-point instructions to meet the rasterization precision requirements. This paper also introduces a novel subpixel Bounding Box size optimization that adjusts the bounds much more finely, which is critical for small triangles, and doubles the efficiency of the 2x2-pixel stamp test. The proposed shader rasterization program can run on top of the original pixel shader program, so that selected fragments are rasterized, attribute-interpolated and pixel-shaded in the same pass. Our results show that our technique outperforms a classic rasterizer at 8 or more shader cores, with speedups as high as 4x for 16 shader cores.
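    The abstract contrasts the proposed program with traditional edge-equation rasterizers. As a hedged illustration of that baseline only (not the paper's 14-instruction ATTILA shader program, nor its subpixel Bounding Box optimization), the Python sketch below tests the pixel centers of a small triangle against three fixed-point edge functions while walking its bounding box in 2x2-pixel stamps. The 4-bit subpixel precision, the pixel-center convention, the inclusive edge test (no top-left fill rule) and all function names are assumptions made for the example.

```python
# Minimal sketch of classic edge-equation rasterization for a small triangle.
# Assumptions (not taken from the paper): 4 fractional bits of subpixel precision,
# pixel centers at (x + 0.5, y + 0.5), inclusive edge test (no top-left fill rule).

SUBPIXEL_BITS = 4
ONE = 1 << SUBPIXEL_BITS   # 1.0 in fixed point
HALF = ONE >> 1            # 0.5 in fixed point (pixel-center offset)


def to_fixed(v):
    """Convert a float coordinate to fixed point, rounding to nearest."""
    return int(round(v * ONE))


def ceil_div(a, b):
    """Ceiling division for integers (b > 0), correct for negative a."""
    return -((-a) // b)


def edge(ax, ay, bx, by, px, py):
    """Signed edge function; positive when P lies to the left of A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri):
    """Return the sorted list of pixels (x, y) whose centers the triangle covers."""
    (x0, y0), (x1, y1), (x2, y2) = [(to_fixed(x), to_fixed(y)) for x, y in tri]
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return []                      # degenerate triangle
    if area < 0:                       # normalize winding so the interior is left of every edge
        x1, y1, x2, y2 = x2, y2, x1, y1

    # Bounding box in pixel units, limited to pixels whose centers fall inside the
    # triangle's fixed-point extent; an empty box means no pixel can be covered.
    i_min = ceil_div(min(x0, x1, x2) - HALF, ONE)
    i_max = (max(x0, x1, x2) - HALF) // ONE
    j_min = ceil_div(min(y0, y1, y2) - HALF, ONE)
    j_max = (max(y0, y1, y2) - HALF) // ONE
    if i_min > i_max or j_min > j_max:
        return []

    covered = []
    # Walk the box in 2x2 quads ("stamps") aligned to even pixel coordinates.
    for qy in range(j_min & ~1, j_max + 1, 2):
        for qx in range(i_min & ~1, i_max + 1, 2):
            for py in (qy, qy + 1):
                for px in (qx, qx + 1):
                    if not (i_min <= px <= i_max and j_min <= py <= j_max):
                        continue           # quad pixel lies outside the bounding box
                    cx, cy = px * ONE + HALF, py * ONE + HALF
                    if (edge(x1, y1, x2, y2, cx, cy) >= 0 and
                            edge(x2, y2, x0, y0, cx, cy) >= 0 and
                            edge(x0, y0, x1, y1, cx, cy) >= 0):
                        covered.append((px, py))
    return sorted(covered)
```

    For example, the right triangle (0, 0), (4, 0), (0, 4) covers 10 pixel centers, rasterize([(0.2, 0.2), (0.9, 0.3), (0.3, 0.9)]) covers only pixel (0, 0), and the microtriangle (0.1, 0.1), (0.4, 0.1), (0.1, 0.4) yields an empty bounding box, so no edge test is executed at all. That last case hints at why tight bounds matter so much when triangles cover less than a pixel.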

  • L'assignatura Arquitectures de Computadors Actuals [The course "Current Computer Architectures"]

     Espasa Sans, Roger
    Jornades de Docència del Departament d'Arquitectura de Computadors. 10 Anys de Jornades
    p. 1-10
    Presentation of work at congresses

  • El doctorat del DAC [The DAC doctoral program]

     Gonzalez Colas, Antonio Maria; Ayguade Parra, Eduard; Espasa Sans, Roger; Garcia Vidal, Jorge; Navarro Guerrero, Juan Jose
    Jornades de Docència del Departament d'Arquitectura de Computadors. 10 Anys de Jornades
    p. 1-10
    Presentation of work at congresses

  • ATTILA: A Cycle-Level Execution-Driven Simulator For Modern GPU Architectures

     Moya Del Barrio, Victor; Gonzalez Rodriguez, Carlos; Roca, Jordi; Fernandez Jimenez, Agustin; Espasa Sans, Roger
    2006 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'06)
    p. 231-241
    Presentation of work at congresses

  • Workload Characterization of 3D Games

     Roca, Jordi; Moya, Victor; Gonzalez Rodriguez, Carlos; Solis, Chema; Fernandez Jimenez, Agustin; Espasa Sans, Roger
    2006 IEEE International Symposium on Workload Characterization, IISWC-2006
    p. 1
    Presentation of work at congresses

  • Workload Characterization of 3D Games

     Espasa Sans, Roger
    2006 IEEE International Symposium on Workload Characterization, IISWC-2006
    Presentation's date: 2006-10-25
    Presentation of work at congresses

  • A Single (Unified) Shader GPU Microarchitecture for Embedded Systems

     Moya, Victor; Gonzalez, Carlos; Roca, Jordi; Fernandez Jimenez, Agustin; Espasa Sans, Roger
    Lecture notes in computer science
    Vol. 1, num. 1, p. 286-301
    Date of publication: 2005-11
    Journal article

  • A Single (Unified) Shader GPU Microarchitecture for Embedded Systems

     Moya, Victor; Gonzalez, Carlos; Roca, Jordi; Fernandez Jimenez, Agustin; Espasa Sans, Roger
    2005 International Conference on High Performance Embedded Architectures & Compilers (HiPEAC'2005)
    p. 286-301
    Presentation of work at congresses

  • Shader Performance Analysis on a Modern GPU Architecture

     Moya, Victor; Gonzalez, Carlos; Roca, Jordi; Fernandez Jimenez, Agustin; Espasa Sans, Roger
    IEEE/ACM International Symposium on Microarchitecture
    p. 355-364
    Presentation of work at congresses

  • Binary Redundancy Elimination (open access)

     Fernandez Gomez, Manel
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    Two of the most important performance limiters in today's processor families come from the memory wall and from control dependencies. To address these issues, cache memories and branch predictors are well-known hardware proposals that exploit, among other things, temporal memory reuse and branch correlation. In other words, they try to exploit the dynamic redundancy existing in programs. This redundancy comes partly from the way programmers write source code, but also from limitations in the compilation model of traditional compilers, which introduce unnecessary memory and conditional branch instructions. We believe that today's optimizing compilers should be very aggressive in optimizing programs, and should therefore be expected to optimize a significant part of this redundancy away.

    On the other hand, optimizations performed at link time or applied directly to final program executables have received increased attention in recent years, due to limitations in the traditional compilation model. First, even when performing sophisticated interprocedural analyses and transformations, traditional compilers do not have the opportunity to optimize the program as a whole. A similar problem arises when applying profile-directed compilation techniques: large projects are forced to rebuild every source file to take advantage of profile information. By contrast, it would be more convenient to build the full application, instrument it to obtain profile data, and then re-optimize the final binary without recompiling a single source file.

    In this thesis we present new profile-guided compiler optimizations for eliminating the redundancy found in executable programs at the binary level (i.e., binary redundancy), even when these programs have been compiled with full optimizations using a state-of-the-art commercial compiler. In particular, our Binary Redundancy Elimination (BRE) techniques target both redundant memory operations and redundant conditional branches, which are the most important ones for addressing the performance issues of today's microprocessors mentioned above. These new proposals are mainly based on Partial Redundancy Elimination (PRE) techniques for eliminating partial redundancies in a path-sensitive fashion. Our results show that, by applying our optimizations, we achieve a 14% execution time reduction in our benchmark suite.

    In this work we also review the problem of alias analysis at the executable program level, identifying why memory disambiguation is one of the weak points of object code modification. We then propose several alias analyses to be applied in the context of link-time or executable code optimizers. First, we present a must-alias analysis to recognize memory dependencies in a path-sensitive fashion, which is used in our optimization for eliminating redundant memory operations. Next, we propose two speculative may-alias data-flow algorithms to recognize memory independencies. These may-alias analyses are based on introducing unsafe speculation at analysis time, which increases alias precision on important portions of code while keeping the analysis reasonably cost-efficient. Our results show that these analyses are very useful for increasing the memory disambiguation accuracy of binary code, which translates into opportunities for applying optimizations.

    All our algorithms, both for the analyses and the optimizations, have been implemented within a binary optimizer, which overcomes most of the existing limitations of traditional source-code compilers. Therefore, our work also points out the most relevant issues of applying our algorithms at the executable code level, where most of the high-level information available in traditional compilers is lost.
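    The thesis's BRE techniques are path-sensitive, PRE-based, and operate on real executables. The Python sketch below is deliberately much simpler and only illustrates the underlying idea of removing redundant loads within a single basic block: a load is redundant when an earlier load or store to the same base register and offset is still available in some register. The instruction tuples, register names, and the may-alias rule (same base with a different offset never aliases; a store through any other base may alias anything) are assumptions for the example, not the thesis's actual analyses or representation.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-IR (an assumption for this sketch, not the thesis's representation):
#   ("load",  dst, base, off)   dst <- mem[base + off]
#   ("store", src, base, off)   mem[base + off] <- src
#   ("add",   dst, a, b) / ("mov", dst, src)   plain register writes
Instr = Tuple


def eliminate_redundant_loads(block: List[Instr]) -> List[Instr]:
    """Local (single basic block) redundant-load elimination."""
    # (base register, offset) -> register currently holding that memory word
    avail: Dict[Tuple[str, int], str] = {}
    out: List[Instr] = []

    def kill(reg: str) -> None:
        # A write to `reg` invalidates addresses computed from it and values held in it.
        for key in [k for k, v in avail.items() if k[0] == reg or v == reg]:
            del avail[key]

    for ins in block:
        op = ins[0]
        if op == "load":
            _, dst, base, off = ins
            src = avail.get((base, off))
            if src is not None:
                if src != dst:                    # value already in a register: copy it
                    out.append(("mov", dst, src))
                    kill(dst)
                continue                          # if src == dst the load is simply dropped
            out.append(ins)
            kill(dst)
            if dst != base:                       # base overwritten => address no longer known
                avail[(base, off)] = dst
        elif op == "store":
            _, src, base, off = ins
            out.append(ins)
            # Conservative may-alias rule: same base with a different offset cannot alias
            # (assumes equal-size, non-overlapping accesses); any other base might.
            for key in [k for k in avail if k[0] != base or k[1] == off]:
                del avail[key]
            avail[(base, off)] = src              # store-to-load forwarding
        else:
            out.append(ins)
            kill(ins[1])                          # destination register is redefined
    return out
```

    For example, in the block `load r1,[sp+8]; load r3,[sp+8]` the second load becomes `mov r3, r1`, whereas a store through an unrelated base register such as `r6` forces later loads from `sp` to be kept. The whole-program, path-sensitive and speculative alias analyses described above exist precisely to prove more such loads redundant than this purely local, conservative rule can.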

  • Link-Time Path-Sensitive Memory Redundancy Elimination

     Fernandez, Manel; Espasa Sans, Roger
    10th International Symposium on High-Performance Computer Architecture (HPCA-10)
    p. 300-309
    Presentation of work at congresses

  • Link-Time Path-sensitive Memory Redundancy Elimination

     Fernandez Gomez, Manel; Espasa Sans, Roger
    10th International Symposium on High-Performance Computer Architecture (HPCA-10)
    p. 300-309
    Presentation of work at congresses

  • Link-Time Optimization Techniques for Eliminating Conditional Branch Redundancies

     Fernandez Gomez, Manel; Espasa Sans, Roger
    8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8) in conjunction with the IEEE 10th International Symposium on High-Performance Computer Architecture (HPCA-10)
    Presentation of work at congresses

  • Link-Time Path-Sensitive Memory Redundancy Elimination

     Espasa Sans, Roger
    10th International Symposium on High-Performance Computer Architecture (HPCA-10)
    Presentation's date: 2004-02-15
    Presentation of work at congresses

  • A Cost-Effective Architecture for Vectorizable Numerical and Multimedia Applications

     Quintana, Francisca; Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    Theory of computing systems
    Vol. 36, num. 5, p. 575-593
    Date of publication: 2003-09
    Journal article

  • A Combined Algorithm for Memory Redundancy Elimination on Executable Code

     Fernandez Gomez, Manel; Espasa Sans, Roger
    Date: 2003-06
    Report

  • Asim: A Performance Model Framework

     Emer, Joel; Ahuja, Pritpal; Borch, Eric; Luk, Chi-Keung; Manne, Srilatha; Mukherjee, Shubhendu S.; Patil, Harish; Wallace, Steven; Binkert, Nathan; Espasa Sans, Roger; Juan Hormigo, Antonio
    Computer
    Vol. 35, num. 2, p. 68-76
    Date of publication: 2002-02
    Journal article

  • Speculative Alias Analysis for Executable Code

     Fernandez, Manel; Espasa Sans, Roger
    Date: 2002-07
    Report

  • Tarantula: A Vector Extension to the Alpha Architecture

     Espasa Sans, Roger; Ardanaz, Federico; Emer, Joel; Felix, Stephen; Gago, Julio; Gramunt, Roger; Hernández, Isaac; Juan Hormigo, Antonio; Lowney, Geoffrey; Mattina, Matthew; Seznec, André
    The 29th Annual International Symposium on Computer Architecture (ISCA-2002)
    p. 281-292
    Presentation of work at congresses

  • Speculative Alias Analysis for Executable Code

     Fernandez, Manuel; Espasa Sans, Roger
    11th International Conference on Parallel Architectures and Compilation Techniques (PACT'02)
    p. 222-231
    Presentation of work at congresses

  • Three Dimensional Memory Vectorization for High Bandwidth Media Memory Systems

     Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    IEEE/ACM International Symposium on Microarchitecture
    p. 149-160
    Presentation's date: 2002-11-18
    Presentation of work at congresses

  • Tarantula: Next-generation Alpha with Vectors

     Espasa Sans, Roger
    XIII Jornadas de Paralelismo
    Presentation's date: 2002-09-09
    Presentation of work at congresses

  • N-dimensional Vector Architectures for Multimedia Applications

     Corbal San Adrian, Jesus
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

  • Three-Dimensional Vector Prefetches for Media Applications

     Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    Date: 2001-11
    Report

  • Load Redundancy Elimination on Executable Code

     Fernandez Gomez, Manel; Espasa Sans, Roger; Debray, Saumya
    Date: 2001-02
    Report

  • On the Efficiency of Reductions in µ-SIMD Media Extensions

     Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    10th International Conference on Parallel Architectures and Compilation Techniques (PACT'01)
    p. 83-94
    Presentation of work at congresses

  • A Cost Effective Architecture for Vectorizable Numerical and Multimedia Applications

     Quintana, Francisca; Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    Thirteenth ACM Symposium on Parallel Algorithms and Architectures (SPAA 2001)
    p. 1
    Presentation of work at congresses

  • DLP + TLP Processors for the Next Generation of Media Workloads

     Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    Seventh International Symposium on High Performance Computer Architecture (HPCA-7)
    p. 219-228
    Presentation of work at congresses

  • Instruction-Level Parallelism and Computer Architecture

     Ayguade Parra, Eduard; Dahlgren, Fredrik; Eisenbeis, Christine; Espasa Sans, Roger; Gao, Guang R.; Muller, Henk; Sakellariou, Rizos; Seznec, André
    Euro-Par
    p. 385
    Presentation of work at congresses

  • Load Redundancy Elimination on Executable Code

     Fernández, Manuel; Espasa Sans, Roger; Debray, Saumya
    Euro-Par
    p. 221-229
    Presentation of work at congresses

  • Performance Analysis of a Feasible Superscalar+ Vector Architecture

     Quintana, F.; Espasa Sans, Roger; Valero Cortes, Mateo
    Jornadas de Paralelismo
    p. 6-10
    Presentation of work at congresses

  • A Comparison between Superscalar and Vector Processors

     Quintana, Francisca; Espasa Sans, Roger; Valero Cortes, Mateo
    Lecture notes in computer science
    Vol. 1573, p. 548-560
    Date of publication: 1999-01
    Journal article

  • A Simulation Study of Decoupled Vector Architectures

     Espasa Sans, Roger; Valero Cortes, Mateo
    Journal of supercomputing
    Vol. 14, num. 2, p. 129-152
    Date of publication: 1999-10
    Journal article

  • Registers Size Influence on Vector Architectures

     Villa, L; Espasa Sans, Roger; Valero Cortes, Mateo
    Lecture notes in computer science
    Vol. 1573, p. 439-451
    Date of publication: 1999-10
    Journal article

  • Adding a vector unit to a superscalar processor

     Quintana, Francisca; Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    Date: 1999-06
    Report

  • MOM Instruction Set Architecture: Reference Manual

     Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    Date: 1999-10
    Report

  • Exploiting a new level of DLP with Matrix multimedia extensions

     Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    Date: 1999-10
    Report

  • Command Vector Memory Systems: High Performance at Low Cost

     Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    Date: 1999-01
    Report

  • MOM: a Matrix SIMD Instruction Set Architecture for Multimedia Applications

     Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    1999 ACM/IEEE Conference on Supercomputing (SC'99)
    p. 1-5
    Presentation of work at congresses

  • Dixie: A Retargetable Binary Instrumentation Tool

     Fernandez Gomez, Manel; Espasa Sans, Roger
    1st Workshop on Binary Translation
    p. 1-9
    Presentation of work at congresses

  • Adding a Vector Unit to a Superscalar Processor

     Quintana, Francisca; Corbal San Adrian, Jesus; Espasa Sans, Roger; Valero Cortes, Mateo
    ACM International Conference on Supercomputing (ICS'99)
    p. 1-10
    Presentation of work at congresses

  • Exploiting a New Level of DLP in Multimedia Applications

     Corbal San Adrian, Jesus; Valero Cortes, Mateo; Espasa Sans, Roger
    IEEE/ACM International Symposium on Microarchitecture
    p. 72-79
    Presentation's date: 1999-11-16
    Presentation of work at congresses

  • An Evaluation of Different DLP Alternatives for the Embedded Media Domain

     Salamí San Juan, Esther; Corbal San Adrian, Jesus; Valero Cortes, Mateo; Espasa Sans, Roger
    1st Workshop on Media Processors and DSPs (MP-DSP-1) in conjunction with the 32nd Annual ACM/IEEE International Symposium on Microarchitecture (MICRO-32)
    p. 100-109
    Presentation's date: 1999-11-15
    Presentation of work at congresses

  • Dixie: A Retargetable Binary Instrumentation Tool

     Fernandez Gomez, Manel; Espasa Sans, Roger
    X Jornadas de Paralelismo
    p. 143-148
    Presentation of work at congresses

  • Evaluación de Arquitecturas Vectoriales Avanzadas con Registros Cortos [Evaluation of Advanced Vector Architectures with Short Registers]

     Villa Vargas, Luis Alfonso
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

  • Dixie: A Retargetable Binary Instrumentation Tool

     Fernandez Gomez, Manel; Ramirez Bellido, Alejandro; Cernuda, S; Espasa Sans, Roger
    Date: 1998-12
    Report
