Scientific and technological production
1 to 50 of 153 results
  • Aprendizaje activo basado en problemas

     Alvarez Martinez, Carlos; Fernandez Jimenez, Agustin; Llosa Espuny, Jose Francisco; Sanchez Carracedo, Fermin
    Jornadas de Enseñanza Universitaria de la Informática
    Presentation's date: 2013-07
    Presentation of work at congresses

    For years, the authors of this work have tried various methods to encourage active student learning through problem solving, both in and out of class. For the last four academic years, in the course's problem sessions we have used a methodology in which students are asked each week to solve a small set of problems that will be worked on in class the following week. In class, we group them into teams of three or four, who discuss their respective solutions and hand in a consensus solution at the end of the session. This solution is returned to them, marked, in the next class. The results collected over these four years show that attending and actively participating in class greatly helps learning, and that working through and thinking about the problems beforehand helps even more, since it allows students to get more out of the classes. Over these four years, 78% of the students who completed at least 90% of the problems passed the course through the continuous assessment tests, without needing to sit the final exam, whereas 64% of the students who completed fewer than 50% of the problems did not pass the course.

  • Aprendizaje activo basado en problemas

     Alvarez Martinez, Carlos; Fernandez Jimenez, Agustin; Llosa Espuny, Jose Francisco; Sanchez Carracedo, Fermin
    ReVisión
    Date of publication: 2013-09
    Journal article

    For years, the authors of this work have tried various methods to encourage active student learning through problem solving, both in and out of class. For the last four academic years, in the course's problem sessions we have used a methodology in which students are asked each week to solve a small set of problems that will be worked on in class the following week. In class, we group them into teams of three or four, who discuss their respective solutions and hand in a consensus solution at the end of the session. This solution is returned to them, marked, in the next class. The results collected over these four years show that attending and actively participating in class greatly helps learning, and that working through and thinking about the problems beforehand helps even more, since it allows students to get more out of the classes. Over these four years, 78% of the students who completed at least 90% of the problems passed the course through the continuous assessment tests, without needing to sit the final exam, whereas 64% of the students who completed fewer than 50% of the problems did not pass the course.

  • Improving multithreading performance for clustered VLIW architectures.  Open access

     Gupta, Manoj
    Defense's date: 2013-06-14
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    Very Long Instruction Word (VLIW) processors are very popular in the embedded and mobile computing domains. Uses of VLIW processors range from Digital Signal Processors (DSPs), found in a plethora of communication and multimedia devices, to Graphics Processing Units (GPUs) used in gaming and high-performance computing devices. The advantage of VLIWs is their low-complexity, low-power design, which enables high performance at a low cost. The scalability of VLIWs is limited by the scalability of register file ports. It is not viable to have a VLIW processor with a single large register file because of the area and power consumption implications of the register file. Clustered VLIWs solve the register file scalability issue by partitioning the register file across multiple clusters, each with a set of functional units attached to the register file of that cluster. Using a clustered approach, a higher issue width can be achieved while keeping the cost of the register file within reasonable limits. Several commercial VLIW processors have been designed using the clustered VLIW model. VLIW processors can be used to run a large set of applications. Many of these applications have good Instruction Level Parallelism (ILP), which can be efficiently utilized. However, several applications, especially those dominated by control code, do not exhibit good ILP, and the processor is underutilized. Cache misses are another major source of resource underutilization. Multithreading is a popular technique to improve processor utilization. Interleaved MultiThreading (IMT) hides cache miss latencies by scheduling a different thread each cycle but cannot hide unused instruction slots. Simultaneous MultiThreading (SMT) can also remove ILP under-utilization by issuing multiple threads to fill the empty instruction slots. However, SMT has a higher implementation cost than IMT.
The thesis presents Cluster-level Simultaneous MultiThreading (CSMT), which supports a limited form of SMT where VLIW instructions from different threads are merged at cluster-level granularity. This lowers the hardware implementation cost to a level comparable to the cheap IMT technique. The more complex SMT combines VLIW instructions at individual operation-level granularity, which is quite expensive, especially for a mobile solution. We refer to SMT at operation level as OpSMT to reduce ambiguity. While previous studies restricted OpSMT on a VLIW to 2 threads, CSMT scales better, and up to 8 threads can be supported at a reasonable cost. The thesis proposes several other techniques to further improve CSMT performance. In particular, cluster renaming remaps the clusters used by instructions of different threads to reduce resource conflicts. Cluster renaming is quite effective in reducing issue-slot under-utilization and significantly improves CSMT performance. The thesis also proposes: a hybrid between IMT and CSMT, which increases the number of supported threads; heterogeneous instruction merging, where some instructions are combined using SMT and the rest using CSMT; and finally split-issue, a technique that allows an instruction to be issued in parts, making it easier to combine with others.
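
The interplay between cluster-level merging and cluster renaming can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions, not the hardware design from the thesis: the tuple-of-bundles instruction encoding, the function names `csmt_can_merge`, `csmt_merge` and `rename_clusters`, and the rotation-based renaming policy are all hypothetical.

```python
from typing import Tuple

# Toy encoding (an assumption for illustration): a VLIW instruction is a tuple
# with one entry per cluster; each entry is a tuple of operation names, and an
# empty tuple means the instruction leaves that cluster idle.
Bundle = Tuple[str, ...]
Instruction = Tuple[Bundle, ...]


def csmt_can_merge(a: Instruction, b: Instruction) -> bool:
    """Cluster-level check: merging is allowed only if no cluster is used by both."""
    return all(not (x and y) for x, y in zip(a, b))


def csmt_merge(a: Instruction, b: Instruction) -> Instruction:
    """Per cluster, take whichever instruction's bundle is non-empty."""
    if not csmt_can_merge(a, b):
        raise ValueError("instructions conflict on at least one cluster")
    return tuple(x or y for x, y in zip(a, b))


def rename_clusters(instr: Instruction, shift: int) -> Instruction:
    """Hypothetical renaming policy: rotate the cluster ids by `shift`."""
    n = len(instr)
    return tuple(instr[(i - shift) % n] for i in range(n))


# Two threads on a 4-cluster machine; both want cluster 0.
thread0 = (("add",), ("mul",), (), ())
thread1 = (("ld",), (), (), ())

conflict = not csmt_can_merge(thread0, thread1)
merged = csmt_merge(thread0, rename_clusters(thread1, 2))
```

In this toy setting the two instructions conflict as written, but after rotating the second thread's clusters by two positions they fit into a single issue. A real design would also have to account for inter-cluster communication and register placement, which this sketch ignores.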

  • Code Optimizations for Narrow Bitwidth Architectures  Open access

     Bhagat, Indu
    Defense's date: 2012-02-23
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    This thesis takes a HW/SW collaborative approach to tackle the problem of computational inefficiency in a holistic manner. The hardware is redesigned by restricting the datapath to merely 16 bits (integer datapath only) to provide an extremely simple, low-cost, low-complexity execution core that is best at executing the most common case efficiently. This redesign, referred to as the Narrow Bitwidth Architecture, is unique in that although the datapath is squeezed to 16 bits, it continues to offer the advantage of higher memory addressability, like contemporary wider-datapath architectures. Its interface to the outside (software) world is termed the Narrow ISA. The software is responsible for efficiently mapping the current stack of 64-bit applications onto the 16-bit hardware. However, this HW/SW approach introduces a non-negligible penalty in both dynamic code size and performance, even with a reasonably smart code translator that maps the 64-bit applications onto the 16-bit processor. The goal of this thesis is to design a software layer that harnesses the power of compiler optimizations to mitigate this performance penalty of the Narrow ISA. More specifically, this thesis focuses on compiler optimizations targeting the problem of how to compile a 64-bit program to a 16-bit datapath machine from the perspective of Minimum Required Computations (MRC). Given a program, the notion of MRC aims to infer how much computation is really required to generate the same (correct) output as the original program. Approaching perfect MRC is an intrinsically ambitious goal and requires oracle predictions of program behavior. Towards this end, the thesis proposes three heuristic-based optimizations to closely infer the MRC. The perspective of MRC unfolds into a definition of productiveness: if a computation does not alter the storage location, it is non-productive and hence need not be performed.
In this research, the definition of productiveness has been applied to different granularities of the data flow as well as the control flow of programs. Three profile-based code optimization techniques have been proposed: 1. Global Productiveness Propagation (GPP), which applies the concept of productiveness at the granularity of a function. 2. Local Productiveness Pruning (LPP), which applies the same concept at the much finer granularity of a single instruction. 3. Minimal Branch Computation (MBC), a profile-based code-reordering optimization technique that applies the principles of MRC to conditional branches. The primary aim of all these techniques is to reduce the dynamic code footprint of the Narrow ISA. The first two optimizations (GPP and LPP) speculatively prune the non-productive (useless) computations using profiles. Further, these two optimization techniques perform a backward traversal of the optimization regions to embed checks into the non-speculative slices, making them self-sufficient to detect mis-speculation dynamically. The MBC optimization is a use case of the broader concept of a lazy computation model. The idea behind MBC is to reorder the backslices containing narrow computations such that the minimal computations necessary to generate the same (correct) output are performed in the most frequent case; the rest of the computations are performed only when necessary. With the proposed optimizations, it can be concluded that there are ways to smartly compile a 64-bit application to a 16-bit ISA such that the overheads are considerably reduced.

    This thesis is motivated by the inherent computational inefficiency of current processors: although many contemporary applications have narrow bitwidth requirements (integer, networking and multimedia applications), the hardware ends up using the full datapath, using more resources than necessary and consuming more energy. This thesis takes a HW/SW approach to tackle the problem of computational inefficiency holistically. The hardware has been redesigned to restrict the datapath bitwidth to only 16 bits (the integer datapath only), providing a simple, low-power, low-complexity execution core designed to execute the common case efficiently. The redesign, called the Narrow Bitwidth Architecture in this thesis, is unique in that, although the datapath has been narrowed to 16 bits, the system still offers the advantage of addressing large amounts of memory, like processors with wider datapaths (currently 64 bits). Its interface to the outside world is called the Narrow ISA. In our proposal, the software is responsible for efficiently mapping the current software stack of 64-bit applications onto the 16-bit hardware. However, this HW/SW approach introduces non-negligible penalties in both dynamic code size and performance, even with a smart code translator that maps the 64-bit applications onto the 16-bit processor. The goal of this thesis is to design a software layer that exploits optimizations to reduce the negative performance impact of the Narrow ISA. Specifically, this thesis focuses on optimizations that address the problem of how to compile 64-bit programs for a 16-bit machine from the perspective of Minimum Required Computations (MRC).
Given a program, the notion of MRC attempts to infer how much computation is really needed to generate the same (correct) output as the original program. Approaching perfect MRC is an intrinsically ambitious goal that requires perfect predictions of program behavior. To this end, the thesis proposes three heuristic-based optimizations that attempt to infer the MRC. The use of MRC leads to a definition of productiveness: if a computation does not alter the value already stored, it is non-productive and therefore need not be performed. Three profile-based code optimizations have been proposed: 1. Global Productiveness Propagation (GPP) applies the concept of productiveness at function granularity. 2. Local Productiveness Pruning (LPP) applies the same concept at a much finer granularity, that of a single instruction. 3. Minimal Branch Computation (MBC) is a code-reordering technique that applies the principles of MRC to conditional branches. The main goal of all these techniques is to reduce the dynamic size of the narrow code. The first two optimizations (GPP and LPP) speculatively prune non-productive (unnecessary) computations using profiles. In addition, these two optimizations traverse the regions to be optimized backwards to add checks to the non-speculative code, making the technique self-sufficient to detect mis-speculation dynamically. The idea of the MBC optimization is to reorder the instructions that generate the conditional branch so that the minimum computations that generate the same (correct) output are executed in most cases; the remaining computations are executed only when necessary.
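
The definition of productiveness given in the abstract (a computation is non-productive if it does not alter the storage location) can be made concrete with a small profiling sketch. This is a minimal illustration under assumptions, not the GPP or LPP implementation: the trace format of `(location, value)` pairs, the function name `productive_writes`, and the choice to count the first write to an unseen location as productive are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def productive_writes(
    trace: Iterable[Tuple[Hashable, int]],
    initial: Optional[Dict[Hashable, int]] = None,
) -> List[bool]:
    """Flag each write in a (location, value) trace as productive or not.

    A write is treated as productive when it changes the value held at its
    location. Assumption: a write to a location with no known previous value
    counts as productive.
    """
    state = dict(initial or {})
    flags = []
    for location, value in trace:
        flags.append(location not in state or state[location] != value)
        state[location] = value
    return flags


# Hypothetical profile: r1 is repeatedly overwritten with the same value.
trace = [("r1", 0), ("r1", 0), ("r2", 5), ("r1", 0), ("r2", 7)]
flags = productive_writes(trace)
nonproductive_count = flags.count(False)
```

A profile of this kind only suggests which computations are candidates for speculative pruning; as the abstract notes, the optimized code still needs embedded checks to detect mis-speculation at run time.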

  • RUNNING STREAM-LIKE PROGRAMS ON HETEROGENEOUS MULTI-CORE SYSTEMS  Open access

     Carpenter, Paul
    Defense's date: 2011-10-24
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    All major semiconductor companies are now shipping multi-cores. Phones, PCs, laptops, and mobile internet devices will all require software that can make effective use of these cores. Writing high-performance parallel software is difficult, time-consuming and error-prone, increasing both time-to-market and cost. Software outlives hardware; it typically takes longer to develop new software than hardware, and legacy software tends to survive for a long time, during which the number of cores per system will increase. Development and maintenance productivity will be improved if parallelism and technical details are managed by the machine, while the programmer reasons about the application as a whole. Parallel software should be written using domain-specific high-level languages or extensions. These languages reveal implicit parallelism, which would be obscured by a sequential language such as C. When memory allocation and program control are managed by the compiler, the program's structure and data layout can be safely and reliably modified by high-level compiler transformations. One important application domain contains so-called stream programs, which are structured as independent kernels interacting only through one-way channels, called streams. Stream programming is not applicable to all programs, but it arises naturally in audio and video encoding and decoding, 3D graphics, and digital signal processing. This representation enables high-level transformations, including kernel unrolling and kernel fusion. This thesis develops new compiler and run-time techniques for stream programming. The first part of the thesis is concerned with a statically scheduled stream compiler. It introduces a new static partitioning algorithm, which determines which kernels should be fused in order to balance the loads on the processors and interconnects. A good partitioning algorithm is crucial if the compiler is to produce efficient code.
The algorithm also takes account of downstream compiler passes (specifically software pipelining and buffer allocation), and it models the compiler's ability to fuse kernels. The latter is important because the compiler may not be able to fuse arbitrary collections of kernels. This thesis also introduces a static queue sizing algorithm. This algorithm is important when memory is distributed, especially when local stores are small. The algorithm takes account of latencies and variations in computation time, and is constrained by the sizes of the local memories. The second part of this thesis is concerned with dynamic scheduling of stream programs. First, it investigates the performance of known online, non-preemptive, non-clairvoyant dynamic schedulers. Second, it proposes two dynamic schedulers for stream programs. The first is specifically for one-dimensional stream programs. The second is more general: it does not need to be told the stream graph, but it has slightly larger overhead. This thesis also introduces some support tools related to stream programming. StarssCheck is a debugging tool, based on Valgrind, for the StarSs task-parallel programming language. It generates a warning whenever the program's behaviour contradicts a pragma annotation. Such behaviour could otherwise lead to exceptions or race conditions. StreamIt to OmpSs is a tool to convert a streaming program in the StreamIt language into a dynamically scheduled task-based program using StarSs.

    All semiconductor companies are currently producing multi-cores. Phones, PCs, laptops, and mobile internet devices will need software that makes efficient use of these cores. Writing high-performance parallel software is difficult, laborious and error-prone, increasing both time-to-market and cost. Software has a longer life than hardware; it typically takes longer to develop new software than new hardware, and existing software can persist for a long time, during which the number of cores per system will increase. Development and maintenance productivity will improve if parallelism and technical details are managed by the machine, while the programmer reasons about the application as a whole. Parallel software should be written in domain-specific languages. These languages expose implicit parallelism, which is hidden by a sequential language such as C. When memory allocation and control structures are managed by the compiler, the program's structure and data layout can be modified safely and reliably by the compiler's high-level transformations. One important application domain consists of stream programs; these programs are structured as independent kernels that interact only through one-way channels, called streams. Stream programming is not applicable to all programs, but it arises naturally in audio and video encoding and decoding, 3D graphics, and digital signal processing. This representation enables high-level transformations, including kernel decomposition and fusion. This thesis develops new compilation techniques and run-time systems for stream programming. The first part of this thesis focuses on a statically scheduled stream compiler.
It presents a new static partitioning algorithm, which determines which kernels should be fused in order to balance the load on the processors and the interconnects. A good partitioning algorithm is essential for the compiler to produce efficient code. The algorithm also takes into account the subsequent compilation passes (specifically software pipelining and buffer allocation) and models the compiler's ability to fuse kernels. This thesis also presents a static queue sizing algorithm. This algorithm is important when memory is distributed, especially when the local memories are small. The algorithm takes into account latencies and variations in computation time, and respects the limit imposed by the size of the local memories. The second part of this thesis focuses on the dynamic scheduling of stream programs. First, it investigates the performance of online, non-preemptive, non-clairvoyant dynamic schedulers. Second, it proposes two dynamic schedulers for stream programs. The first is specifically for one-dimensional stream programs. The second is more general: it does not need the stream graph, but its overheads are slightly larger. This thesis also presents a set of support tools related to stream programming. StarssCheck is a debugging tool, based on Valgrind, for StarSs, a task-based parallel programming language. This tool generates a warning whenever the program's behaviour contradicts a pragma annotation. Such behaviour could otherwise cause exceptions or race conditions. StreamIt to OmpSs is a tool that converts a stream program written in the StreamIt language into a dynamically scheduled task program in StarSs.
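
To make the notions of kernels, streams and kernel fusion from this abstract concrete, here is a minimal sketch using Python generators. It is an illustration only and does not reflect StreamIt syntax or the thesis's compiler: the kernels `scale` and `offset`, the hand-written `fused_scale_offset`, and the sample input are all hypothetical.

```python
from typing import Iterable, Iterator


def scale(stream: Iterable[int], factor: int) -> Iterator[int]:
    """Kernel 1: reads one item from its input stream, writes one item out."""
    for x in stream:
        yield x * factor


def offset(stream: Iterable[int], delta: int) -> Iterator[int]:
    """Kernel 2: consumes the stream produced by kernel 1."""
    for x in stream:
        yield x + delta


def fused_scale_offset(stream: Iterable[int], factor: int, delta: int) -> Iterator[int]:
    """Hand-fused version: one kernel, no intermediate stream between the two steps."""
    for x in stream:
        yield x * factor + delta


samples = [1, 2, 3]
pipelined = list(offset(scale(samples, 2), 1))
fused = list(fused_scale_offset(samples, 2, 1))
```

Because the two kernels communicate only through a one-way stream, the fused kernel produces the same output while removing the intermediate channel. Deciding which kernels to fuse so that processor and interconnect loads stay balanced is the partitioning problem the abstract describes; this sketch does not attempt it.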

  • Evaluación formativa con feedback rápido usando mandos interactivos

     Alvarez Martinez, Carlos; Llosa Espuny, Jose Francisco
    Jornadas de Enseñanza Universitaria de la Informática
    Presentation's date: 2010-07
    Presentation of work at congresses

  • A low cost split-issue technique to improve performance of SMT clustered VLIW processors  Open access

     Gupta, Manoj; Sanchez Carracedo, Fermin; Llosa Espuny, Jose Francisco
    IEEE International Parallel and Distributed Processing Symposium
    Presentation's date: 2010-04
    Presentation of work at congresses

    Very Long Instruction Word (VLIW) processors are a popular choice in the embedded domain due to their hardware simplicity, low cost and low power consumption. Simultaneous MultiThreading (SMT) is a popular technique for improving processor performance. To maintain execution semantics, a VLIW instruction needs to be issued in its entirety, which restricts the opportunities for SMT. Split-issue at operation level is a technique that allows a VLIW instruction to be issued in parts without breaking execution semantics. Issuing an instruction in parts allows the non-conflicting parts of an instruction to be issued along with other instructions and improves SMT performance. However, implementing split-issue at operation level requires complex structures and is not practical for an embedded VLIW processor. This paper proposes cluster-level split-issue, which implements split-issue at a cluster-level boundary for clustered VLIW processors. Cluster-level split-issue has a very low hardware overhead in contrast to split-issue at operation level. Experimental results show that cluster-level split-issue, despite being more restrictive than split-issue at operation level, achieves similar performance and improves SMT performance significantly.
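
The idea of issuing an instruction in parts at cluster boundaries can be sketched with a toy model. This is a hedged illustration, not the mechanism proposed in the paper: the tuple-of-bundles encoding, the `busy` mask and the function name `split_issue` are hypothetical, and the sketch deliberately omits whatever the hardware does to preserve VLIW execution semantics across the parts.

```python
from typing import Tuple

# Toy encoding (an assumption): one bundle of operation names per cluster,
# with an empty tuple for an idle cluster.
Bundle = Tuple[str, ...]
Instruction = Tuple[Bundle, ...]


def split_issue(instr: Instruction, busy: Tuple[bool, ...]) -> Tuple[Instruction, Instruction]:
    """Split an instruction into the part issued now and the part deferred.

    `busy[i]` is True when cluster i is already taken this cycle by another
    thread. Bundles on free clusters are issued; bundles on busy clusters wait.
    """
    if len(instr) != len(busy):
        raise ValueError("instruction and busy mask must cover the same clusters")
    issued = tuple(b if not blocked else () for b, blocked in zip(instr, busy))
    deferred = tuple(b if blocked else () for b, blocked in zip(instr, busy))
    return issued, deferred


# Cluster 0 is occupied by a higher-priority thread in this cycle.
instr = (("ld",), ("add",), (), ("st",))
busy = (True, False, False, False)
issued_now, issued_later = split_issue(instr, busy)
```

Without split-issue, the whole instruction in this example would have to wait because of the single conflict on cluster 0; with it, only the conflicting bundle is delayed.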

  • Mandos interactivos en EC2

     Alvarez Martinez, Carlos; Llosa Espuny, Jose Francisco
    Jornades de Docència del Departament d'Arquitectura de Computadors
    Presentation's date: 2010-02
    Presentation of work at congresses

  • Uso de mandos interactivos para la evaluación formativa con realimentación rápida

     Alvarez Martinez, Carlos; Llosa Espuny, Jose Francisco
    ReVisión
    Date of publication: 2010-10
    Journal article

  • CSMT: Simultaneous Multithreading for Clustered VLIW Processors

     Gupta, Manoj; Sanchez Carracedo, Fermin; Llosa Espuny, Jose Francisco
    IEEE transactions on computers
    Date of publication: 2010-03
    Journal article

  • Instruction scheduling for clustered processors based on graph techniques

     Aleta Ortega, Alexandre
    Defense's date: 2009-10-15
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

  • Hybrid multithreading for VLIW processors  Open access

     Gupta, Manoj; Sanchez Carracedo, Fermin; Llosa Espuny, Jose Francisco
    International Conference on Compilers, Architecture, and Synthesis for Embedded Systems
    Presentation's date: 2009-10
    Presentation of work at congresses

    Several multithreading techniques have been proposed to reduce resource underutilization in Very Long Instruction Word (VLIW) processors. Simultaneous MultiThreading (SMT) is a popular technique that improves processor performance by issuing multiple instructions from different threads. In VLIW processors, SMT requires extra hardware to merge instructions from different threads. The complexity of this hardware increases substantially with the number of threads. On the other hand, techniques like Interleaved MultiThreading (IMT) do not need any merging hardware, and support a larger number of threads at reasonable cost. In this paper, we propose Hybrid MultiThreading (HMT), a technique that at each cycle merges instructions from only a subset of threads. HMT supports a reasonable number of threads with a low merging hardware cost. For instance, it is possible to support 8 hardware threads with merging hardware for only 2 threads. The experimental results show that using HMT improves multithreading performance significantly. Further, supporting 8 hardware threads with HMT but using 4-thread merging hardware achieves performance similar to merging 8 threads simultaneously, with a significantly lower merging hardware cost.

    © ACM, 2009. This is the author's version of the work: http://doi.acm.org/10.1145/1629395.1629403

    Postprint (author’s final draft)
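
The abstract says that HMT merges instructions from only a subset of threads at each cycle, but it does not say how the subset is chosen. The sketch below assumes a simple rotating window, purely for illustration: the function name `hmt_candidates`, its parameters and the round-robin policy are hypothetical and may differ from the policy evaluated in the paper.

```python
from typing import List


def hmt_candidates(cycle: int, num_threads: int = 8, merge_width: int = 2) -> List[int]:
    """Return the ids of the hardware threads offered to the merge logic this cycle.

    Assumed policy: a window of `merge_width` consecutive threads that advances
    by `merge_width` every cycle and wraps around, IMT-style.
    """
    if not 0 < merge_width <= num_threads:
        raise ValueError("merge_width must be between 1 and num_threads")
    start = (cycle * merge_width) % num_threads
    return [(start + k) % num_threads for k in range(merge_width)]


# 8 hardware threads, merge hardware sized for 2 threads (the abstract's example sizes).
schedule = [hmt_candidates(c) for c in range(5)]
```

Under this assumed policy the merge logic only ever sees `merge_width` threads, so its cost is that of a small SMT merger, while all hardware threads still get a turn over successive cycles.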

  • ARQUITECTURA DE COMPUTADORS D'ALTRES PRESTACIONS (CAP)

     Jimenez Castells, Marta; Pericas Gleim, Miquel; Navarro Guerrero, Juan Jose; Llaberia Griño, Jose M.; Llosa Espuny, Jose Francisco; Villavieja Prados, Carlos; Alvarez Martinez, Carlos; Jimenez Gonzalez, Daniel; Ramirez Bellido, Alejandro; Morancho Llena, Enrique; Fernandez Jimenez, Agustin; Pajuelo González, Manuel Alejandro; Olive Duran, Angel; Sanchez Carracedo, Fermin; Moreto Planas, Miquel; Verdu Mula, Javier; Abella Ferrer, Jaume; Valero Cortes, Mateo
    Participation in a competitive project

  • Thread merging schemes for multithreaded clustered VLIW processors

     Gupta, Manoj; Sanchez Carracedo, Fermin; Llosa Espuny, Jose Francisco
    International Conference on Parallel Processing
    Presentation's date: 2009-09
    Presentation of work at congresses

  • Estrategias para el diseño de laboratorios orientados al aprendizaje continuo

     Llosa Espuny, Jose Francisco
    Jornadas de Enseñanza Universitaria de la Informática
    Presentation's date: 2008-07-09
    Presentation of work at congresses

  • Estrategias para el diseño de laboratorios orientados al aprendizaje continuo

     Fernandez Jimenez, Agustin; Llosa Espuny, Jose Francisco; Sanchez Carracedo, Fermin
    Jornadas de Enseñanza Universitaria de la Informática
    Presentation's date: 2008-07
    Presentation of work at congresses

  • Power-efficient VLIW design using clustering and widening

     Pericas Gleim, Miquel; Ayguade Parra, Eduard; Zalamea, Javier; Valero Cortes, Mateo; Llosa Espuny, Jose Francisco
    International Journal of Embedded Systems
    Date of publication: 2008-10
    Journal article

  • La enseñanza de Estructura de Computadores en el EEES

     Sanchez Carracedo, Fermin; Fernandez Jimenez, Agustin; Llosa Espuny, Jose Francisco
    Jornadas de Enseñanza Universitaria de la Informática
    Presentation's date: 2007-07
    Presentation of work at congresses

  • Merge Logic for Clustered Multithreaded VLIW Processors

     Gupta, Manoj; Sanchez Carracedo, Fermin; Llosa Espuny, Jose Francisco
    10th Euromicro Conference on Digital Systems Design: Architectures, Methods and Tools (DSD 2007)
    Presentation of work at congresses

  • Performance evaluation of CSMT for VLIW processors

     Gupta, Manoj; Sanchez Carracedo, Fermin; Llosa Espuny, Jose Francisco
    International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems
    Presentation's date: 2007-07
    Presentation of work at congresses

  • Instruction Merge Logic for Clustered Multithreaded VLIW Processors

     Gupta, Manoj; Llosa Espuny, Jose Francisco; Sanchez Carracedo, Fermin
    Date: 2007-06
    Report

  • Performance Evaluation of Cluster-Level Simultaneous Multithreading for VLIW Processors

     Gupta, Manoj; Llosa Espuny, Jose Francisco; Sanchez Carracedo, Fermin
    Date: 2007-06
    Report

  • Merge Logic for Clustered Multithreaded VLIW Processors

     Gupta, Manoj; Llosa Espuny, Jose Francisco; Sanchez Carracedo, Fermin
    Date: 2007-06
    Report

  • Cluster-Level Simultaneous Multithreading for VLIW Processors

     Gupta, Manoj; Sanchez Carracedo, Fermin; Llosa Espuny, Jose Francisco
    25th IEEE International Conference on Computer Design, ICCD 2007
    Presentation's date: 2007-10
    Presentation of work at congresses

  • Kilo Instruction Processors

     Cristal Kestelman, Adrian
    Defense's date: 2006-04-18
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses


  • Cluster Level Multithreading for VLIW Processors

     Llosa Espuny, Jose Francisco
    Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2006)
    Presentation's date: 2006-07-26
    Presentation of work at congresses


  • Cluster-Level Simultaneous Multithreading for VLIW Processors

     Gupta, Manoj; Llosa Espuny, Jose Francisco; Sanchez Carracedo, Fermin
    Date: 2006-07
    Report


  • Cluster Level Multithreading for VLIW Processors

     Gupta, Manoj; Llosa Espuny, Jose Francisco; Sanchez Carracedo, Fermin
    Second International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2006)
    Presentation's date: 2006-07
    Presentation of work at congresses


  • Near-Optimal Padding for Removing Conflict Misses

     Vera Rivera, Francisco Javier; Llosa Espuny, Jose Francisco; Gonzalez Colas, Antonio Maria
    Lecture notes in computer science
    Date of publication: 2006-06
    Journal article


  • An Accurate Cost Model for Guiding Data Locality Transformations

     Vera, Xavier; Abella Ferrer, Jaume; Llosa Espuny, Jose Francisco; Gonzalez Colas, Antonio Maria
    ACM transactions on programming languages and systems
    Date of publication: 2005-09
    Journal article


  • Software and Hardware Techniques to Optimize Register File Utilization in VLIW

     Zalamea Leon, Francisco Javier; Llosa Espuny, Jose Francisco; Ayguade Parra, Eduard; Valero Cortes, Mateo
    International journal of parallel programming
    Date of publication: 2004-12
    Journal article


  • Register constrained Modulo Scheduling

     Zalamea Leon, Francisco Javier; Llosa Espuny, Jose Francisco; Ayguade Parra, Eduard; Valero Cortes, Mateo
    IEEE transactions on parallel and distributed systems
    Date of publication: 2004-05
    Journal article


  • Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units

     Pericas Gleim, Miquel; Ayguade Parra, Eduard; Zalamea Leon, Francisco Javier; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    Lecture notes in computer science
    Date of publication: 2004-11
    Journal article


  • Out-of-Order Commit Processors

     Llosa Espuny, Jose Francisco
    10th International Symposium on High Performance Computer Architecture (HPCA-10)
    Presentation's date: 2004-02-15
    Presentation of work at congresses


  • Out-of-Order Commit Processors

     Cristal Kestelman, Adrian; Ortega, Daniel; JOSEP, LLOSA; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    10th International Symposium on High Performance Computer Architecture (HPCA-10)
    Presentation of work at congresses


  • Hipeac - European Network of Excellence on High-Performance Embedded Architecture and Compilation

     Valero Cortes, Mateo; Navarro Guerrero, Juan Jose; Gil Gómez, Maria Luisa; Ramirez Bellido, Alejandro; Llosa Espuny, Jose Francisco; Morancho Llena, Enrique; Canal Corretger, Ramon; Moreto Planas, Miquel
    Participation in a competitive project


  • High Performance Computing

     Llosa Espuny, Jose Francisco
    11th International Conference
    Presentation's date: 2004-12-01
    Presentation of work at congresses


  • A Fast and Accurate Framework to Analyze and Optimize Cache Memory Behavior

     Vera, Xavier; Bermudo, Nerina; Llosa Espuny, Jose Francisco; Gonzalez Colas, Antonio Maria
    ACM transactions on programming languages and systems
    Date of publication: 2004-03
    Journal article


  • Mirs: Modulo Scheduling with Integrated Register Spilling

     Zalamea Leon, Francisco Javier; Llosa Espuny, Jose Francisco; Ayguade Parra, Eduard; Valero Cortes, Mateo
    Lecture notes in computer science
    Date of publication: 2003-01
    Journal article


  • A case for Resource-conscious Out-of-order Processors

     Cristal Kestelman, Adrian; Martínez, José F; JOSEP, LLOSA; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    Computer architecture letters
    Date of publication: 2003-10
    Journal article


  • Kilo-instruction Processors

     Cristal Kestelman, Adrian; Ortega Fernandez, Daniel; JOSEP, LLOSA; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    Lecture notes in computer science
    Date of publication: 2003-10
    Journal article


  • Power-Performance Trade-Offs in Wide and Clustered VLIW Cores for Numerical Codes

     Pericas Gleim, Miquel; Ayguade Parra, Eduard; Zalamea, Javier; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    Lecture notes in computer science
    Date of publication: 2003-10
    Journal article


  • Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units

     Pericas Gleim, Miquel; Ayguade Parra, Eduard; Zalamea, Javier; JOSEP, LLOSA; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    Third International Workshop on Systems, Architectures, MOdeling, and Simulation (SAMOS III)
    Presentation of work at congresses


  • Kilo-instruction Processors

     Cristal Kestelman, Adrian; Ortega Fernandez, Daniel; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    5th International Symposium on High Performance Computing (ISHPC 2003)
    Presentation of work at congresses


  • Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units

     Pericas Gleim, Miquel; Ayguade Parra, Eduard; Zalamea Leon, Francisco Javier; JOSEP, LLOSA; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    Third International Workshop on Systems, Architectures, MOdeling, and Simulation (SAMOS III)
    Presentation of work at congresses


  • Optimizing Program Locality Through CMEs and GAs

     Vera, Xavier; Abella Ferrer, Jaume; JOSEP, LLOSA; Llosa Espuny, Jose Francisco; Gonzalez Colas, Antonio Maria
    12th International Conference on Parallel Architectures and Compilation Techniques (PACT'03)
    Presentation of work at congresses


  • Power-performance trade-offs in wide clustered VLIW cores for numerical codes

     Pericas Gleim, Miquel; Ayguade Parra, Eduard; Zalamea, J; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    5th International Symposium on High Performance Computing (ISHPC 2003)
    Presentation of work at congresses


  • Power-performance trade-offs in wide and clustered VLIW Cores for numerical codes

     Llosa Espuny, Jose Francisco
    5th International Symposium on High Performance Computing (ISHPC 2003)
    Presentation's date: 2003-10-20
    Presentation of work at congresses


  • Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units

     Llosa Espuny, Jose Francisco
    Third International Workshop on Systems, Architectures, MOdeling, and Simulation (SAMOS III)
    Presentation's date: 2003-07-21
    Presentation of work at congresses


  • Out-of-Order Commit Processors

     Cristal Kestelman, Adrián; Martínez, José F; Ortega Fernandez, Daniel; Llosa Espuny, Jose Francisco; Valero Cortes, Mateo
    Date: 2003-07
    Report
