Acosta Ojeda, Carmelo Alexis
Total activity: 18
Department
Department of Computer Architecture
E-mail
carmelo.alexis.acosta@estudiant.upc.edu
Contact details
UPC directory


Scientific and technological production

  • Heterogeneity-awareness in multithreaded multicore processors  Open access

     Acosta Ojeda, Carmelo Alexis
    Defense's date: 2009-07-07
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    Over the last decades, Computer Architecture has experienced a series of revolutionary changes. The increasing transistor count on a single chip has led to some of the main milestones in the field, from the release of the first Superscalar (1965) to state-of-the-art Multithreaded Multicore Architectures, like the Intel Core i7 (2009).

    Moore's Law has held for almost half a century and is not expected to stop for at least another decade, and perhaps much longer. Moore observed a trend in process technology advances: the number of transistors that can be placed inexpensively on an integrated circuit has increased exponentially, doubling approximately every two years. Nevertheless, having more transistors available cannot always be directly translated into higher performance.

    The complexity of state-of-the-art software has reached heights unthinkable in prior ages, both in terms of the amount of computation and the complexity involved. A close analysis of this complexity shows that software is composed of smaller execution processes that, although they maintain a certain spatial/temporal locality, behave in an inherently heterogeneous way. That is, during execution the hardware runs very different portions of software, with huge differences in behavior and hardware requirements. This heterogeneity in software behavior is not specific to the latest videogame; it is inherent to software programming itself, and has been since the very beginning of Algorithmics.

    In this PhD dissertation we analyze in depth the inherent heterogeneity present in software behavior. We identify the main issues and sources of this heterogeneity, which prevent most state-of-the-art processor designs from reaching their maximum potential. As a result, the heterogeneity in software makes most current processors, commonly called general-purpose processors, overdesigned. That is, they have far more hardware resources than are really needed to execute the software running on them. This would not be a major problem if we were not concerned about the additional power consumption involved in software computation.

    The final goal of this PhD dissertation is to assign each portion of software exactly the amount of hardware resources it needs to fully exploit its maximal potential, without consuming more energy than strictly needed. That is, to obtain complexity-effective executions using the inherent heterogeneity in software behavior as a steering indicator. Thus, we start by analyzing in depth the heterogeneous behavior of the software run on top of general-purpose processors, and then match it to heterogeneously distributed hardware that explicitly exploits heterogeneous hardware requirements. Only by being heterogeneity-aware in software, and appropriately matching this software heterogeneity to hardware heterogeneity, can we effectively obtain better processor designs.

    The PhD dissertation comprises four main contributions that cover both multithreaded single-core (hdSMT) and multicore (TCA Algorithm, hTCA Framework and MFLUSH) scenarios, explained in depth in their corresponding chapters of the dissertation. Overall, these contributions cover a significant range of the design space of Heterogeneity-Aware Processors. Within this design space, we have focused on the state-of-the-art trend in processor design: Multithreaded Multicore (CMP+SMT) Processors.

    We place special emphasis on the MPsim simulation tool, specifically designed and developed for this PhD dissertation. This tool has already gone beyond this PhD dissertation, becoming a reference tool for an important group of researchers spread over the Computer Architecture Department (DAC) at the Polytechnic University of Catalonia (UPC), the Barcelona Supercomputing Center (BSC) and the University of Las Palmas de Gran Canaria (ULPGC).

  • Thread to core assignment in SMT on-chip multiprocessors  Open access

     Acosta Ojeda, Carmelo Alexis; Cazorla Almeida, Francisco Javier; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    Computer architecture news
    Date of publication: 2009
    Journal article

    State-of-the-art high-performance processors like the IBM POWER5 and Intel i7 show an industry trend towards on-chip Multiprocessors (CMP) with Simultaneous Multithreading (SMT) in each core. In these processors, the way in which applications are assigned to cores plays a key role in the performance of each application and in overall system performance. In this paper we show that system throughput depends heavily on the Thread to Core Assignment (TCA), regardless of the SMT Instruction Fetch (IFetch) Policy implemented in the cores. Our results indicate that a good TCA can improve the results of any underlying IFetch Policy, yielding speedups of up to 28%. Given the relevance of TCA, we propose an algorithm to manage it in CMP+SMT processors. The proposed throughput-oriented TCA Algorithm takes into account the workload characteristics and the underlying SMT IFetch Policy. Our results show that the TCA Algorithm obtains thread-to-core assignments within 3% of the optimal assignment for each case, yielding system throughput improvements of up to 21%.

  • Thread to core assignment in SMT on-chip multiprocessors  Open access

     Acosta Ojeda, Carmelo Alexis; Cazorla, Francisco J.; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    International Symposium on Computer Architecture and High Performance Computing
    Presentation's date: 2009-10-30
    Presentation of work at congresses

    State-of-the-art high-performance processors like the IBM POWER5 and Intel i7 show an industry trend towards on-chip Multiprocessors (CMP) with Simultaneous Multithreading (SMT) in each core. In these processors, the way in which applications are assigned to cores plays a key role in the performance of each application and in overall system performance. In this paper we show that system throughput depends heavily on the Thread to Core Assignment (TCA), regardless of the SMT Instruction Fetch (IFetch) Policy implemented in the cores. Our results indicate that a good TCA can improve the results of any underlying IFetch Policy, yielding speedups of up to 28%. Given the relevance of TCA, we propose an algorithm to manage it in CMP+SMT processors. The proposed throughput-oriented TCA Algorithm takes into account the workload characteristics and the underlying SMT IFetch Policy. Our results show that the TCA Algorithm obtains thread-to-core assignments within 3% of the optimal assignment for each case, yielding system throughput improvements of up to 21%.

  • MFLUSH: handling long-latency loads in SMT on-chip multiprocessors  Open access

     Acosta Ojeda, Carmelo Alexis; Cazorla, Francisco J.; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    International Conference on Parallel Processing
    Presentation's date: 2008-09-11
    Presentation of work at congresses

    Nowadays, there is a clear trend in industry towards using the growing number of transistors on chip to replicate execution cores (CMP), where each core supports Simultaneous Multithreading (SMT). State-of-the-art high-performance processors like the IBM POWER5 and POWER6 corroborate this CMP+SMT trend. Within each SMT core, any of the well-known SMT mechanisms may be applied to address SMT-related challenges. Among them, probably the most important issue in an SMT execution pipeline concerns the Instruction Fetch (IFetch) Policy. The FLUSH IFetch Policy represents a choice for throughput-oriented scenarios. It handles L2 cache misses in order to prevent any given execution thread from monopolizing hardware resources, at an additional energy cost due to instruction refetching. However, the new constraints imposed by the CMP+SMT scenario may affect well-known SMT mechanisms, like the FLUSH mechanism.

  • Core to Memory Interconnection Implications for Forthcoming on-Chip Multiprocessors

     Acosta Ojeda, Carmelo Alexis; Cazorla Almeida, Francisco Javier; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    IEEE/ACM International Symposium on Microarchitecture
    Presentation's date: 2007-12-01
    Presentation of work at congresses

  • A Complexity-Effective Simultaneous Multithreading Architecture

     Acosta Ojeda, Carmelo Alexis
    International Conference on Parallel Processing
    Presentation's date: 2005-06-14
    Presentation of work at congresses

  • hdSMT: an Heterogeneity-Aware Simultaneous Multithreading Architecture

     Acosta Ojeda, Carmelo Alexis; Falcon Samper, Ayose Jesus; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    Jornadas de Paralelismo
    Presentation of work at congresses

  • A Complexity-Effective Simultaneous Multithreading Architecture

     Acosta Ojeda, Carmelo Alexis; Falcon Samper, Ayose Jesus; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    International Conference on Parallel Processing
    Presentation of work at congresses

  • Complexity-effectiveness in multithreading architectures

     Acosta Ojeda, Carmelo Alexis; Falcon Samper, Ayose Jesus; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems
    Presentation's date: 2005
    Presentation of work at congresses

  • Introducing Kilo-Instruction Multiprocessor

     Galluzzi, Marco; Puente, V; Santana Jaria, Oliverio J.; Acosta Ojeda, Carmelo Alexis; Cristal Kestelman, Adrian; Valero Cortes, Mateo
    Jornadas de Paralelismo
    Presentation of work at congresses

  • Introducing kilo-instruction multiprocessors

     Acosta Ojeda, Carmelo Alexis
    Jornadas de Paralelismo
    Presentation's date: 2004-09-15
    Presentation of work at congresses

  • Heterogeneity-Aware architectures

     Acosta Ojeda, Carmelo Alexis
    Jornadas de Paralelismo
    Presentation's date: 2004-09-15
    Presentation of work at congresses

  • Heterogeneity-Aware Architectures

     Acosta Ojeda, Carmelo Alexis; Falcon Samper, Ayose Jesus; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    Jornadas de Paralelismo
    Presentation of work at congresses

  • A First Glance at a Heterogeneity-Aware Simultaneous Multithreading Architecture

     Acosta Ojeda, Carmelo Alexis; Falcon Samper, Ayose Jesus; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    Date: 2004-06
    Report

  • Dealing with Billions of Transistors

     Acosta Ojeda, Carmelo Alexis; Galluzzi, Marco; Vajapeyam, Sriram; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    XIV Jornadas de Paralelismo
    Presentation of work at congresses

  • Dealing with Billions of Transistors

     Acosta Ojeda, Carmelo Alexis
    XIV Jornadas de Paralelismo
    Presentation's date: 2003-09-15
    Presentation of work at congresses

  • CDE: A Compiler-driven, Dependence-Centric, Eager-executing Architecture for the Billion Transistors Era

     Acosta Ojeda, Carmelo Alexis
    4th Workshop on Complexity-Effective Design (WCED'03) in conjunction with the 30th Annual International Symposium on Computer Architecture (ISCA-2003)
    Presentation's date: 2003-06-07
    Presentation of work at congresses

  • CDE: A Compiler-driven, Dependence-Centric, Eager-executing Architecture for the Billion Transistors Era

     Acosta Ojeda, Carmelo Alexis; Vajapeyam, Sriram; Ramirez Bellido, Alejandro; Valero Cortes, Mateo
    4th Workshop on Complexity-Effective Design (WCED'03) in conjunction with the 30th Annual International Symposium on Computer Architecture (ISCA-2003)
    Presentation of work at congresses
