Pajuelo González, Manuel Alejandro
Total activity: 70


Scientific and technological production

1 to 50 of 70 results
  • Thread assignment of multithreaded network applications in multicore/multithreaded processors

     Radojkovic, Petar; Cakarevic, Vladimir; Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Cazorla Almeida, Francisco Javier; Nemirovsky, Mario; Valero Cortes, Mateo
    IEEE transactions on parallel and distributed systems
    Vol. 24, num. 12, p. 2513-2525
    DOI: 10.1109/TPDS.2012.311
    Date of publication: 2013-12
    Journal article


    The introduction of multithreaded processors comprised of a large number of cores with many shared resources makes thread scheduling, and in particular the optimal assignment of running threads to processor hardware contexts, one of the most promising ways to improve system performance. However, finding optimal thread assignments for workloads running in state-of-the-art multicore/multithreaded processors is an NP-complete problem. In this paper, we propose the BlackBox scheduler, a systematic method for thread assignment of multithreaded network applications running on multicore/multithreaded processors. The method requires minimal information about the target processor architecture and no data about the hardware requirements of the applications under study. The proposed method is evaluated with an industrial case study for a set of multithreaded network applications running on the UltraSPARC T2 processor. In most of the experiments, the proposed thread assignment method detected the best actual thread assignment in the evaluation sample. The method improved the system performance from 5 to 48 percent with respect to load balancing algorithms used in state-of-the-art OSs, and up to 60 percent with respect to a naive thread assignment.
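
    The sampling idea behind such a "black box" scheduler can be sketched in a few lines. The snippet below is a hypothetical illustration, not the published BlackBox algorithm: it draws random thread-to-context assignments, scores each with a caller-supplied performance estimator, and keeps the best one found. The toy `contexts` and `perf` model are our own assumptions.

    ```python
    import random

    def best_assignment(threads, contexts, perf, samples=200, seed=1):
        """Draw random thread-to-context assignments and keep the best.
        `perf(assignment)` returns an estimated throughput (higher is better)."""
        rng = random.Random(seed)
        best, best_score = None, float("-inf")
        for _ in range(samples):
            # each thread gets a distinct hardware context
            assignment = tuple(rng.sample(contexts, len(threads)))
            score = perf(assignment)
            if score > best_score:
                best, best_score = assignment, score
        return best, best_score

    # Toy model: 2 cores x 4 strands; throughput = number of distinct cores used,
    # i.e. spreading threads across cores is assumed to be best.
    contexts = [(core, strand) for core in range(2) for strand in range(4)]
    perf = lambda a: len({core for core, _ in a})
    ```

    With this toy model, `best_assignment(["rx", "tx"], contexts, perf)` returns an assignment that places the two threads on different cores, since that maximizes the estimator.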

  • The problem of evaluating CPU-GPU systems with 3d visualization applications

     Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Valero Cortes, Mateo
    IEEE micro
    Vol. 32, num. 6, p. 17-27
    DOI: 10.1109/MM.2012.13
    Date of publication: 2012-12
    Journal article


    Complex, computationally demanding 3D visualization applications can be used as benchmarks to evaluate CPU-GPU systems. However, because those applications are time dependent, their execution is not deterministic. Thus, measurements can vary from one execution to another. This article proposes a methodology that enforces the starting times of frames so that applications behave deterministically.
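
    The frame-time enforcement idea can be illustrated with a minimal sketch (names and structure are ours, not the article's): each frame is released at a fixed offset from a common time origin, so a slow frame cannot shift the start of the next one, and repeated runs see the same frame schedule.

    ```python
    import time

    def run_deterministic(frames, frame_period_s, render):
        """Release each frame at a fixed offset from a common origin, so
        timing variation in one frame cannot delay the start of the next."""
        t0 = time.perf_counter()
        starts = []
        for i, frame in enumerate(frames):
            target = t0 + i * frame_period_s
            while time.perf_counter() < target:
                pass                       # busy-wait until the scheduled start
            starts.append(time.perf_counter() - t0)
            render(frame)
        return starts
    ```

    Because start times depend only on the frame index and the period, measurements taken per frame become comparable across executions.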

  • Procedimiento, sistema y pieza de código ejecutable para controlar el uso de recursos de hardware de un sistema informático

     Pajuelo González, Manuel Alejandro; Verdu Mula, Javier
    Date of request: 2012-04-16
    Invention patent


    Method, system, and piece of executable code for controlling the use of the hardware resources of a computer system.

    The invention relates to a method for controlling the use of the hardware resources of a computer system by an application running on an operating system that comprises at least one application programming interface (API) and runs on that computer system, by means of a piece of executable code adapted to be injected into a process belonging to the application. The method comprises intercepting the process's call to the API service, and acting on a software entity belonging to the running process upon interception of that call.
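
    As a loose software analogy to the claimed interception mechanism (Python attribute patching standing in for native code injection; `intercept`, `api`, and `log` are illustrative names, not from the patent): a hook replaces an API entry point with a wrapper that acts on the caller before delegating to the original.

    ```python
    import types

    def intercept(obj, name, before):
        """Replace obj.<name> with a wrapper that notifies `before` and then
        delegates to the original -- an analogue of hooking an API entry
        point from injected code."""
        original = getattr(obj, name)
        def wrapper(*args, **kwargs):
            before(name, args)              # act on the caller before the API runs
            return original(*args, **kwargs)
        setattr(obj, name, wrapper)
        return original                     # keep a handle for un-hooking later

    # Hypothetical stand-in for an OS API surface:
    api = types.SimpleNamespace(read=lambda n: "x" * n)
    log = []
    intercept(api, "read", lambda name, args: log.append((name, args)))
    ```

    After hooking, `api.read(3)` still returns `"xxx"`, but the call and its arguments are observed in `log` first, which is the essence of controlling resource use at the API boundary.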

  • Procedimiento, sistema y pieza de código ejecutable para virtualizar un recurso de hardware asociado a un sistema informático

     Pajuelo González, Manuel Alejandro; Verdu Mula, Javier
    Date of request: 2012-04-16
    Invention patent


    Method, system, and piece of executable code for virtualizing a hardware resource associated with a computer system.

    A method for virtualizing hardware resources associated with a computer system by means of a piece of executable code adapted to be injected into a process belonging to an application running on an operating system that comprises at least one API and runs on the computer system. The method comprises intercepting a call from the process to an API service related to managing the data flow between the process and the hardware resource, and having the piece of code manage that data flow upon interception of the call.

  • Concurs Wayra Barcelona 2012

     Arroyo, Ignacio; Pajuelo González, Manuel Alejandro; Verdu Mula, Javier
    Award or recognition


  • Optimal task assignment in multithreaded processors: a statistical approach

     Cakarevic, Vladimir; Radojkovic, Petar; Moreto Planas, Miquel; Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Cazorla, Francisco J.; Nemirovsky, Mario; Valero Cortes, Mateo
    International Conference on Architectural Support for Programming Languages and Operating Systems
    p. 235-248
    DOI: 10.1145/2150976.2151002
    Presentation's date: 2012
    Presentation of work at congresses


    The introduction of massively multithreaded (MMT) processors, comprised of a large number of cores with many shared resources, has made task scheduling, in particular task to hardware thread assignment, one of the most promising ways to improve system performance. However, finding an optimal task assignment for a workload running on MMT processors is an NP-complete problem. Because the performance of the best possible task assignment is unknown, the room for improvement of current task-assignment algorithms cannot be determined. This is a major problem for the industry because it could lead to: (1) a waste of resources if excessive effort is devoted to improving a task assignment algorithm that already provides a performance that is close to the optimal one, or (2) significant performance loss if insufficient effort is devoted to improving poorly-performing task assignment algorithms. In this paper, we present a method based on Extreme Value Theory that allows the prediction of the performance of the optimal task assignment in MMT processors. We further show that executing a sample of several hundred or several thousand random task assignments is enough to obtain, with very high confidence, an assignment with a performance that is close to the optimal one. We validate our method with an industrial case study for a set of multithreaded network applications running on an UltraSPARC T2 processor.
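
    The sampling step the abstract describes can be sketched minimally (the toy performance function and all names are ours; the actual Extreme Value Theory fit to the tail is omitted): evaluate a random sample of assignments, then keep the best observed value and the top of the distribution, which is the data an EVT model would be fitted to.

    ```python
    import random

    def sampled_optimum(perf_of, n_assignments, sample_size, tail=10, seed=3):
        """Evaluate a random sample of assignments (by index); return the best
        observed performance and the top-`tail` scores, i.e. the extreme values
        a tail model would be fitted to in order to predict the true optimum."""
        rng = random.Random(seed)
        scores = sorted((perf_of(rng.randrange(n_assignments))
                         for _ in range(sample_size)), reverse=True)
        return scores[0], scores[:tail]
    ```

    With a few hundred or thousand samples, the best observed score is typically already close to the population optimum, which is the empirical point the paper makes precise.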

  • Thread to strand binding of parallel network applications in massive multi-threaded systems

     Radojkovic, Petar; Cakarevic, Vladimir; Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Cazorla, Francisco J.; Nemirovsky, Mario; Valero Cortes, Mateo
    ACM SIGPLAN notices
    Vol. 45, num. 5, p. 191-201
    Date of publication: 2010-05
    Journal article


    In processors with several levels of hardware resource sharing, like CMPs in which each core is an SMT, the scheduling process becomes more complex than in processors with a single level of resource sharing, such as pure-SMT or pure-CMP processors. Once the operating system selects the set of applications to simultaneously schedule on the processor (workload), each application/thread must be assigned to one of the hardware contexts (strands). We call this last scheduling step the Thread to Strand Binding or TSB. In this paper, we show that the TSB impact on the performance of processors with several levels of shared resources is high. We measure a variation of up to 59% between different TSBs of real multithreaded network applications running on the UltraSPARC T2 processor, which has three levels of resource sharing. In our view, this problem is going to be more acute in future multithreaded architectures comprising more cores, more contexts per core, and more levels of resource sharing. We propose a resource-sharing aware TSB algorithm (TSBSched) that significantly facilitates the problem of thread to strand binding for software-pipelined applications, representative of multithreaded network applications. Our systematic approach encapsulates both the characteristics of multithreaded processors under study and the structure of the software pipelined applications. Once calibrated for a given processor architecture, our proposal does not require hardware knowledge on the side of the programmer, nor extensive profiling of the application. We validate our algorithm on the UltraSPARC T2 processor running a set of real multithreaded network applications on which we report improvements of up to 46% compared to the current state-of-the-art dynamic schedulers.
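
    A naive placement in this spirit (purely illustrative, not the calibrated TSBSched algorithm; function and stage names are ours) packs consecutive software-pipeline stages onto strands of the same core before spilling to the next, so that communicating stages share core-level resources:

    ```python
    def bind_pipeline(stages, cores, strands_per_core):
        """Bind consecutive pipeline stages to strands of the same core first,
        spilling to the next core when strands run out.
        Returns {stage: (core, strand)}."""
        binding = {}
        for idx, stage in enumerate(stages):
            core, strand = divmod(idx, strands_per_core)
            if core >= cores:
                raise ValueError("not enough hardware contexts for all stages")
            binding[stage] = (core, strand)
        return binding
    ```

    For a 3-stage pipeline on 2 cores with 2 strands each, the first two stages land on core 0 and the third on core 1; a real resource-sharing aware binder would additionally weigh which resources each stage stresses.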

  • Runahead threads  Open access

     Ramirez Garcia, Tanausu
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses



    Research on multithreading has gained a lot of interest in the computer architecture community due to new commercial multithreaded and multicore processors. Simultaneous Multithreading (SMT) is one of these relatively new paradigms, combining the multiple instruction issue of superscalar processors with the ability of multithreaded architectures to exploit thread-level parallelism (TLP). The main feature of SMT processors is to execute multiple threads concurrently, increasing pipeline utilization by sharing many more resources than other types of processors.

    Shared resources are the key to simultaneous multithreading and what makes the technique worthwhile, but they also pose important challenges because threads compete for them in the processor core. On the one hand, although certain types and mixes of applications truly benefit from SMT, the differing characteristics of threads can unbalance resource allocation, diminishing the benefit of multithreaded execution. On the other hand, the memory wall problem is still present in these processors. SMT alleviates some of the latency problems arising from main memory's slowness relative to the CPU. Nevertheless, threads with high cache miss rates and large working sets are one of the major pitfalls of SMT processors. These memory-intensive threads tend to use processor and memory resources poorly, creating the worst resource-contention problems: they can clog up shared resources with long-latency memory operations without making progress, thereby hindering overall system performance.

    The main goal of this thesis is to alleviate these shortcomings in SMT scenarios. Its key contribution is the application of the runahead execution paradigm to the design of multithreaded processors, through Runahead Threads (RaT). RaT proves to be a promising alternative to prior SMT resource-management mechanisms, which usually restrict memory-bound threads in order to obtain higher throughput. The idea of RaT is to transform a memory-intensive thread into a light resource consumer by allowing it to progress speculatively: as soon as a thread suffers a long-latency load, RaT transforms it into a runahead thread while that miss is outstanding. The benefits of this simple action are twofold. While being a runahead thread, the thread uses the different shared resources without monopolizing or limiting them for other threads. At the same time, this fast speculative thread issues prefetches that overlap other memory accesses with the main miss, thereby exploiting memory-level parallelism (MLP).

    Regarding implementation, RaT adds very little extra hardware cost and complexity to an existing SMT processor. Through a simple checkpoint mechanism and a little additional control logic, the hardware contexts can be equipped with the runahead capability. By means of runahead threads, we thus alleviate both shortcomings simultaneously. First, RaT mitigates the long-latency load problem on SMT processors by exposing MLP: a thread prefetches data in parallel (if MLP is available), improving its individual performance rather than stalling on an L2 miss. Second, RaT prevents threads from clogging resources on long-latency loads: by the nature of runahead speculative execution, the L2-missing thread recycles the shared resources it uses faster, so memory-intensive threads no longer tie up critical processor resources.

    The main limitation of RaT is that runahead threads can execute useless instructions and unnecessarily consume execution resources when there is no prefetching to be exploited. This drawback results in inefficient runahead threads that do not contribute to the performance gain and increase dynamic energy consumption due to the extra speculatively executed instructions. We therefore also propose different solutions aimed at this major disadvantage of the Runahead Threads mechanism, yielding a set of complementary techniques that enhance RaT in terms of power consumption and energy efficiency. On the one hand, code semantic-aware runahead threads improve RaT efficiency using coarse-grain code-semantic analysis at run time. We provide different techniques that analyze the usefulness of particular code patterns, loops and subroutines, during runahead execution: depending on the prefetch opportunities observed, the runahead thread decides either to stall or to skip a loop or subroutine, reducing the number of useless runahead instructions. Some of the proposed techniques reduce the speculative instructions and wasted energy while achieving performance similar to RaT. On the other hand, the efficient runahead threads proposal is a finer-grained, generic technique that covers all runahead executions independently of the characteristics of the executed program. The key idea behind this scheme is to find out when, and for how long, a thread should run in runahead mode by predicting the useful runahead distance. The results show that the best of these distance-prediction approaches significantly reduces the number of extra speculative instructions and the power consumption, while maintaining the performance benefits of runahead threads, thereby improving the energy efficiency of SMT processors using the RaT mechanism.

    The evolution of Runahead Threads developed in this research provides not only high performance but also an efficient way of using shared resources in SMT processors in the presence of long-latency memory operations. As designers of future SMT systems will increasingly be required to optimize for a combination of single-thread performance, total throughput, and energy consumption, RaT-based mechanisms are promising options that provide a better balance of performance and energy than previous proposals in the field.

  • On the Problem of Evaluating the Performance of Multiprogrammed Workloads

     Cazorla, Francisco J.; Pajuelo González, Manuel Alejandro; Santana Jaria, Oliverio J.; Fernandez, Enrique; Valero Cortes, Mateo
    IEEE transactions on computers
    Vol. 59, num. 12, p. 1722-1728
    DOI: 10.1109/TC.2010.62
    Date of publication: 2010-03-18
    Journal article


  • Thread to strand binding of parallel network applications in massive multi-threaded systems

     Radojkovic, Petar; Cakarevic, Vladimir; Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Cazorla Almeida, Francisco Javier; Nemirovsky, Mario; Valero Cortes, Mateo
    ACM International Conference on Supercomputing
    p. 191-201
    Presentation's date: 2010-01
    Presentation of work at congresses


    In processors with several levels of hardware resource sharing, like CMPs in which each core is an SMT, the scheduling process becomes more complex than in processors with a single level of resource sharing, such as pure-SMT or pure-CMP processors. Once the operating system selects the set of applications to simultaneously schedule on the processor (workload), each application/thread must be assigned to one of the hardware contexts (strands). We call this last scheduling step the Thread to Strand Binding or TSB. In this paper, we show that the TSB impact on the performance of processors with several levels of shared resources is high. We measure a variation of up to 59% between different TSBs of real multithreaded network applications running on the UltraSPARC T2 processor, which has three levels of resource sharing. In our view, this problem is going to be more acute in future multithreaded architectures comprising more cores, more contexts per core, and more levels of resource sharing. We propose a resource-sharing aware TSB algorithm (TSBSched) that significantly facilitates the problem of thread to strand binding for software-pipelined applications, representative of multithreaded network applications. Our systematic approach encapsulates both the characteristics of multithreaded processors under study and the structure of the software pipelined applications. Once calibrated for a given processor architecture, our proposal does not require hardware knowledge on the side of the programmer, nor extensive profiling of the application. We validate our algorithm on the UltraSPARC T2 processor running a set of real multithreaded network applications on which we report improvements of up to 46% compared to the current state-of-the-art dynamic schedulers.

  • Characterizing the resource-sharing levels of the UltraSparc T2 processor

     Cakarevic, Vladimir; Radojkovic, Petar; Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Cazorla, Francisco J.; Nemirovsky, Mario; Valero Cortes, Mateo
    IEEE/ACM International Symposium on Microarchitecture
    p. 1-12
    DOI: 10.1145/1669112.1669173
    Presentation's date: 2009-12
    Presentation of work at congresses


    Thread level parallelism (TLP) has become a popular trend to improve processor performance, overcoming the limitations of extracting instruction level parallelism. Each TLP paradigm, such as Simultaneous Multithreading or Chip-Multiprocessors, provides different benefits, which has motivated processor vendors to combine several TLP paradigms in each chip design. Even if most of these combined-TLP designs are homogeneous, they present different levels of hardware resource sharing, which introduces complexities in operating system scheduling and load balancing. Commonly, processor designs provide two levels of resource sharing: Inter-core, in which only the highest levels of the cache hierarchy are shared, and Intra-core, in which most of the hardware resources of the core are shared. Recently, Sun Microsystems has released the UltraSPARC T2, a processor with three levels of hardware resource sharing: InterCore, IntraCore, and IntraPipe. In this work, we provide the first characterization of a three-level resource sharing processor, the UltraSPARC T2, and we show how multi-level resource sharing affects the operating system design. We further identify the most critical hardware resources in the T2 and the characteristics of applications that are not sensitive to resource sharing. Finally, we present a case study in which we run a real multithreaded network application, showing that a resource sharing aware scheduler can improve the system throughput up to 55%.

  • ARQUITECTURA DE COMPUTADORS D'ALTES PRESTACIONS (CAP)

     Jimenez Castells, Marta; Pericas Gleim, Miquel; Navarro Guerrero, Juan Jose; Llaberia Griño, Jose M.; Llosa Espuny, Jose Francisco; Villavieja Prados, Carlos; Alvarez Martinez, Carlos; Jimenez Gonzalez, Daniel; Ramirez Bellido, Alejandro; Morancho Llena, Enrique; Fernandez Jimenez, Agustin; Pajuelo González, Manuel Alejandro; Olive Duran, Angel; Sanchez Carracedo, Fermin; Moreto Planas, Miquel; Verdu Mula, Javier; Abella Ferrer, Jaume; Valero Cortes, Mateo
    Competitive project


  • Code semantic-aware runahead threads

     Ramirez Garcia, Tanausu; Pajuelo González, Manuel Alejandro; Santana Jaria, Oliverio J.; Valero Cortes, Mateo
    International Conference on Parallel Processing
    p. 1-8
    Presentation's date: 2009-09
    Presentation of work at congresses


    Memory-intensive threads can hoard shared resources without making progress on a simultaneous multithreading (SMT) processor, thereby hindering the overall system performance. A recent promising solution to overcome this important problem in SMT processors is Runahead Threads (RaT). RaT employs runahead execution to allow a thread to speculatively execute instructions and prefetch data instead of stalling for a long-latency load. The main advantage of this mechanism is that it exploits memory-level parallelism under long latency loads without clogging up shared resources. As a result, RaT improves the overall processor performance, reducing the resource contention among threads. In this paper, we propose simple code-semantic-based techniques to increase RaT efficiency. Our proposals are based on analyzing the prefetch opportunities (usefulness) of loops and subroutines during runahead thread executions. We dynamically analyze these particular program structures to detect when it is useful or not to control the runahead thread execution. By means of this dynamic information, the proposed techniques make a control decision either to avoid or to stall the loop or subroutine execution in runahead threads. Our experimental results show that our best proposal significantly reduces the speculative instruction execution (33% on average) while maintaining, and in some cases even improving (up to 3%), the performance of RaT.

  • Mapa Conceptual Global como herramienta para la vision global de un sistema operativo  Open access

     Verdu Mula, Javier; Lopez Alvarez, David; Pajuelo González, Manuel Alejandro
    Jornadas de Enseñanza Universitaria de la Informática
    p. 1-8
    Presentation's date: 2009-07
    Presentation of work at congresses


    Many courses have a syllabus whose topics are thoroughly interrelated. By the end of the course, students should have acquired the knowledge of each topic but, more importantly, they should know how the different topics interact with each other so as to obtain a global view of the course. However, students often focus on the topics in isolation, partly because we do not offer them tools that help relate the different parts of the course. In this work we present the use of a Global Concept Map (GCM) of a course as a teaching resource that helps students obtain an overall view of the whole syllabus. The experience was carried out as a complement to an active-learning class in an Operating Systems course, but we believe it can easily be applied to other courses.

  • Measuring operating system overhead on Sun UltraSparc T1 processor

     Radojkovic, Petar; Cakarevic, Vladimir; Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Gioiosa, Roberto; Cazorla Almeida, Francisco Javier; Nemirovsky, Mario; Valero Cortes, Mateo
    International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems
    Presentation's date: 2009-07
    Presentation of work at congresses

  • Experiencias en el uso de un mapa conceptual global en SO

     Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Lopez Alvarez, David
    Jornades de Docència del Departament d'Arquitectura de Computadors
    p. 1-22
    Presentation's date: 2009-02-13
    Presentation of work at congresses

  • Measuring operating system overhead on CMT processors  Open access

     Radojkovic, Petar; Cakarevic, Vladimir; Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Gioiosa, Roberto; Cazorla Almeida, Francisco Javier; Nemirovsky, Mario; Valero Cortes, Mateo
    Symposium on Computer Architecture and High Performance Computing
    p. 133-140
    Presentation's date: 2008-10-29
    Presentation of work at congresses

    Numerous studies have shown that Operating System (OS) noise is one of the reasons for significant performance degradation in clustered architectures. Although many studies examine OS noise for High Performance Computing (HPC), especially on multi-processor/multicore systems, most of them focus on 2- or 4-core systems. In this paper, we analyze the major sources of OS noise on a massively multithreaded processor, the Sun UltraSPARC T1, running Linux and Solaris. Since a real system is too complex to analyze, we compare those results with a low-overhead runtime environment: the Netra Data Plane Software Suite (Netra DPS). Our results show that the overhead introduced by the OS timer interrupt in Linux and Solaris depends on the particular core and hardware context in which the application is running. This overhead is up to 30% when the application executes on the same hardware context as the timer interrupt handler, and up to 10% when the application and the timer interrupt handler run on different contexts of the same core. We detect no overhead when the benchmark and the timer interrupt handler run on different cores of the processor.

  • Montando el puzzle: visión global de un sistema operativo

     Pajuelo González, Manuel Alejandro
    Jornadas de Enseñanza Universitaria de la Informática
    Presentation's date: 2008-07-09
    Presentation of work at congresses

  • Sistemes Operatius. Quadern de Laboratori

     Pajuelo González, Manuel Alejandro; Lopez Alvarez, David; Millan Vizuete, Amador; Heredero Lazaro, Ana M.; Durán, Alex; Herrero Zaragoza, José Ramón; Verdu Mula, Javier; Becerra Fontal, Yolanda; Morancho Llena, Enrique
    Date of publication: 2008-07
    Book

  • Montando el puzzle: visión global de un sistema operativo

     Verdu Mula, Javier; Lopez Alvarez, David; Pajuelo González, Manuel Alejandro
    Jornadas de Enseñanza Universitaria de la Informática
    p. 495-502
    Presentation's date: 2008-07
    Presentation of work at congresses

    Often, students who have proven perfectly capable of solving problems on any single topic of a course fail when faced with problems that combine knowledge from several topics. This is because there is a tendency to divide knowledge into small, unrelated packages. Students study and learn these packages, but they do not obtain an overall view: knowing every tree, rock, and animal in the forest, they do not understand the interactions that turn it into an ecosystem. In this work we present a methodology for obtaining an overall view of a course using active-learning techniques. Although the experience was carried out in an Operating Systems course, we believe the methodology can be adapted to other courses.

  • Understanding the overhead of the spin-lock loop in CMT architectures  Open access

     Cakarevic, Vladimir; Radojkovic, Petar; Verdu Mula, Javier; Gioiosa, Roberto; Pajuelo González, Manuel Alejandro; Cazorla Almeida, Francisco Javier; Nemirovsky, Mario; Valero Cortes, Mateo
    Workshop on the Interaction between Operating Systems and Computer Architecture
    p. 1-10
    Presentation's date: 2008-06-18
    Presentation of work at congresses

    Spin locks are a synchronization mechanism used to provide mutual exclusion for shared software resources. Spin locks are preferred over other synchronization mechanisms in several situations, such as when the average waiting time to obtain the lock is short, in which case the probability of getting the lock is high, or when it is not possible to use other synchronization mechanisms. In this paper, we study the effect that executing the Linux spin-lock loop on the Sun UltraSPARC T1 and T2 processors has on other running tasks, especially in the worst-case scenario where the workload shows high contention on a lock. For this purpose, we create a task that continuously executes the spin-lock loop and run several instances of this task together with other active tasks. Our results show that, when the spin-lock tasks run with other applications on the same core of a T1 or T2 processor, they introduce a significant overhead on the other applications: 31% on T1 and 42% on T2, on average. For both processors, we identify the fetch bandwidth as the main source of interaction between active threads and spin-lock threads. We propose four variants of the Linux spin-lock loop that require less fetch bandwidth. Our proposal reduces the overhead of the spin-lock tasks on the other applications to 3.5% and 1.5% on average for T1 and T2, respectively. This is a reduction of 28 percentage points with respect to the Linux spin-lock loop on T1, and of about 40 percentage points on T2.

  • Overhead of the spin-lock loop in UltraSPARC T2  Open access

     Cakarevic, Vladimir; Radojkovic, Petar; Cazorla Almeida, Francisco Javier; Gioiosa, Roberto; Nemirovsky, Mario; Valero Cortes, Mateo; Pajuelo González, Manuel Alejandro; Verdu Mula, Javier
    HiPEAC Industrial Workshop
    p. 1-2
    Presentation's date: 2008-06-04
    Presentation of work at congresses

    Spin locks are a task synchronization mechanism used to provide mutual exclusion for shared software resources. Spin locks perform well compared to other synchronization mechanisms in several situations, e.g., when tasks wait a short time on average to obtain the lock, when the probability of getting the lock is high, or when no other synchronization mechanism is available. In this paper we study the effect that the execution of spin locks creates in multithreaded processors. Besides the move to multicore architectures, recent industry trends show a strong move toward hardware multithreaded processors: the Intel P4, IBM POWER5 and POWER6, and Sun's UltraSPARC T1 and T2 all implement multithreading to various degrees. Sharing more processor resources can increase system performance, but it also increases the impact that simultaneously executing processes have on each other.

  • An Efficiency-aware Technique to guide Runahead Threads

     Ramirez Garcia, Tanausu; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Mutlu, Onur; Valero Cortes, Mateo
    Date: 2008-05
    Report

  • Una metodología para obtener la visión global de un SO

     Verdu Mula, Javier; Pajuelo González, Manuel Alejandro; Lopez Alvarez, David
    Jornades de Docència del Departament d'Arquitectura de Computadors
    p. 1-19
    Presentation's date: 2008-02-14
    Presentation of work at congresses

  • Computación de Altas Prestaciones V: Arquitecturas, Compiladores, Sistemas Operativos, Herramientas y Aplicaciones

     Ramirez Bellido, Alejandro; Valero Cortes, Mateo; Moreto Planas, Miquel; Cazorla Almeida, Francisco Javier; Abella Ferrer, Jaume; Figueiredo Boneti, Carlos Santieri; Gioiosa, Roberto; Pajuelo González, Manuel Alejandro; Quiñones Moreno, Eduardo; Verdu Mula, Javier; Guitart Fernández, Jordi; Fernandez Jimenez, Agustin; Garcia Almiñana, Jordi; Utrera Iglesias, Gladys Miriam
    Competitive project

  • Sistemes Operatius. Conceptes Bàsics

     Duran González, Alejandro; Herrero Zaragoza, José Ramón; Pajuelo González, Manuel Alejandro; Lopez Alvarez, David
    Date of publication: 2007-09
    Book

  • The Multi-State Processor: ROB-free architecture with precise recovery

     Pajuelo González, Manuel Alejandro; Galluzzi, Marco; Cristal Kestelman, Adrián; Oliverio, J Santana; Valero Cortes, Mateo
    Date: 2007-09
    Report

  • Introducing Runahead Threads

     Ramirez Garcia, Tanausu; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Valero Cortes, Mateo
    Date: 2007-07
    Report

  • Evaluating Multithreaded Architectures on Simulation Environments

     Vera Gomez, Javier; Cazorla Almeida, Francisco Javier; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernández, Enrique; Valero Cortes, Mateo
    Date: 2007-04
    Report

  • Implantación de la Evaluación Continuada en SO

     Lopez Alvarez, David; Pajuelo González, Manuel Alejandro; Herrero Zaragoza, José Ramón; Duran González, Alejandro
    Date: 2007-04
    Report

  • A New Proposal to Evaluate Multithreaded Processors

     Cazorla Almeida, Francisco Javier; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernandez Garcia, Enrique; Valero Cortes, Mateo
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 1
    Presentation of work at congresses

  • Measuring the Performance of Multithreaded Architectures

     Cazorla Almeida, Francisco Javier; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernandez Garcia, Enrique; Valero Cortes, Mateo
    2007 SPEC Benchmark Workshop
    p. 1
    Presentation of work at congresses

  • FAME: Evaluating multithreaded architectures

     Vera Rivera, Francisco Javier; Cazorla Almeida, Francisco Javier; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernandez Garcia, Enrique; Valero Cortes, Mateo
    Third International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2007)
    p. 123-126
    Presentation of work at congresses

  • Introducing Runahead Threads for SMT Processors

     Ramirez Garcia, Tanausu; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Valero Cortes, Mateo
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 35-42
    Presentation of work at congresses

  • A new proposal to evaluate multithreaded processors

     Vera, Javier; Cazorla, Francisco; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernandez Garcia, Enrique; Valero Cortes, Mateo
    XVIII Jornadas de Paralelismo. CEDI 2007 II Congreso Español de Informática.
    p. 27-34
    Presentation of work at congresses

  • Evaluacion continuada sin morir en el intento

     Lopez Alvarez, David; Pajuelo González, Manuel Alejandro; Herrero Zaragoza, José Ramón; Duran González, Alejandro
    Jornadas de Enseñanza Universitaria de la Informática
    p. 171-178
    Presentation of work at congresses

  • Resultats de la implantació de l'avaluació continuada a l'assignatura Sistemes Operatius

     Lopez Alvarez, David; Pajuelo González, Manuel Alejandro; Herrero Zaragoza, José Ramón; Duran González, Alejandro
    Jornades de Docència del Departament d'Arquitectura de Computadors. 10 Anys de Jornades
    p. 1-10
    Presentation of work at congresses

  • FAME: Fairly Measuring Multithreaded Architectures

     Cazorla Almeida, Francisco Javier; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernandez Garcia, Enrique; Valero Cortes, Mateo
    16th International Conference on Parallel Architectures and Compilation Techniques (PACT'07)
    p. 305-316
    Presentation of work at congresses

  • A Proposal for Continuous Assessment at Low Cost

     Lopez Alvarez, David; Herrero Zaragoza, José Ramón; Pajuelo González, Manuel Alejandro; Duran González, Alejandro
    37th Annual Frontiers in Education Conference Program
    p. 103
    Presentation of work at congresses

  • Runahead Threads: Reducing Resource Contention in SMT Processors

     Ramirez Garcia, Tanausu; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Valero Cortes, Mateo
    16th International Conference on Parallel Architectures and Compilation Techniques (PACT'07)
    p. 423
    Presentation of work at congresses

  • A First Glance at Runahead Threads

     Ramirez Garcia, Tanausu; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Valero Cortes, Mateo
    Third International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES 2007)
    p. 107-110
    Presentation of work at congresses

  • Implementando recuperaciones precisas en procesadores con consolidación fuera de orden

     Pajuelo González, Manuel Alejandro
    Jornadas de Paralelismo
    Presentation's date: 2006-09-18
    Presentation of work at congresses

  • A Simple Speculative Load Control Mechanism for Energy Saving

     Pajuelo González, Manuel Alejandro
    MEDEA Workshop (MEmory performance: DEaling with Applications, systems and architecture), held in conjunction with the PACT 2006 Conference
    Presentation's date: 2006-09-16
    Presentation of work at congresses

  • A Novel Evaluation Methodology to Obtain Fair Measurements in Multithreaded Architectures

     Vera Rivera, Francisco Javier; Cazorla Almeida, Francisco Javier; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernandez Garcia, Enrique; Valero Cortes, Mateo
    Workshop on Modeling, Benchmarking and Simulation (MoBS 2006), held in conjunction with the 33rd Annual International Symposium on Computer Architecture (ISCA-2006)
    p. 78-87
    Presentation's date: 2006-06-18
    Presentation of work at congresses

  • Kilo-Instruction Processors RunAhead and Prefetch

     Pajuelo González, Manuel Alejandro
    Third ACM International Conference on Computing Frontiers (CF'06)
    Presentation's date: 2006-05-03
    Presentation of work at congresses

  • A Novel Evaluation Methodology to Obtain Fair Measurements in Multithreaded Architectures

     Cazorla Almeida, Francisco Javier; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernández, Enrique; Valero Cortes, Mateo
    Date: 2006-05
    Report

  • A Novel Methodology to Obtaining Fair Measurements in Multithreaded Architectures

     Cazorla Almeida, Francisco Javier; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernandez Garcia, Enrique; Valero Cortes, Mateo
    Date: 2006-05
    Report

  • A Simple Speculative Load Control Mechanism for Energy Saving

     Ramirez Garcia, Tanausu; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Valero Cortes, Mateo
    MEDEA Workshop (MEmory performance: DEaling with Applications, systems and architecture), held in conjunction with the PACT 2006 Conference
    p. 1-2
    Presentation of work at congresses

  • Implementando recuperaciones precisas en procesadores con consolidación fuera de orden

     Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Valero Cortes, Mateo
    Jornadas de Paralelismo
    p. 13-18
    Presentation of work at congresses

  • Looking for Novel Ways to Obtain Fair Measurements in Multithreaded Architectures

     Cazorla Almeida, Francisco Javier; Pajuelo González, Manuel Alejandro; Oliverio, J Santana; Fernandez Garcia, Enrique; Valero Cortes, Mateo
    Jornadas de Paralelismo
    p. 37-42
    Presentation of work at congresses
