Gimenez Lucas, Judit
Total activity: 29
Department
Department of Computer Architecture
E-mail
judit.gimenez@upc.edu
Contact details
UPC directory


Scientific and technological production

1 to 29 of 29 results
  • Scalability analysis of Dalton, a molecular structure program

     Aguilar, Xavier; Schliephake, Michael; Vahtras, Olav; Gimenez Lucas, Judit; Laure, Erwin
    Future generation computer systems
    Date of publication: 2013-10
    Journal article

    Dalton is a molecular electronic structure program featuring common methods of computational chemistry based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized in, and holds a leading position in, the calculation of molecular properties, with a large worldwide user community (over 2000 licenses issued). In this paper, we present a performance characterization and optimization of Dalton. We also propose a solution that prevents the master/worker design of Dalton from becoming a performance bottleneck at larger process counts. With these improvements we obtain speedups of 4x, increasing the parallel efficiency of the code and enabling it to run on a much larger number of cores. © 2013 Elsevier B.V. All rights reserved.

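The scalability limit of a pure master/worker scheme, which the abstract alludes to, can be illustrated with a back-of-the-envelope throughput model. This is only an illustrative sketch with invented costs, not the paper's analysis: every task costs the master some serialized dispatch time, so throughput saturates regardless of how many workers are added.

```python
def master_worker_speedup(workers, t_task, t_dispatch):
    """Throughput-based speedup of a single-master scheme.

    Every task consumes t_dispatch of the master's serialized attention,
    so total task throughput is capped at 1 / t_dispatch no matter how
    many workers are added. (Hypothetical model, not the paper's.)
    """
    per_worker_rate = 1.0 / (t_task + t_dispatch)   # tasks per unit time per worker
    rate = min(workers * per_worker_rate, 1.0 / t_dispatch)
    return rate * t_task  # speedup relative to one dispatch-free worker

# With made-up costs (t_task = 100, t_dispatch = 1) the scheme saturates
# around 101 workers; beyond that, additional cores are simply wasted.
scaling = {w: master_worker_speedup(w, 100.0, 1.0) for w in (64, 128, 1024)}
```

Hierarchical dispatch (intermediate masters) or distributed work stealing removes the single 1/t_dispatch cap, which is the kind of restructuring the paper's improvements aim at.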
  • Detailed and simultaneous power and performance analysis

     Servat Gelabert, Harald; Llort Sanchez, German Matías; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Concurrency and computation. Practice and experience
    Date of publication: 2013-12
    Journal article

    On the road to Exascale computing, both performance and power must be tackled at multiple levels, from the system down to the processor. The processor itself is mainly responsible for serial node performance and for most of the energy consumed by the system. Thus, it is important to have tools that simultaneously analyze both performance and energy efficiency at the processor level. Performance tools have allowed analysts to understand, and even improve, the performance of an application running on a system. With the advent of processors able to measure their own power consumption, performance tools can extend their collection of metrics with energy-related ones and correlate the source code with both its performance and its energy efficiency. In this paper, we present a performance tool that has been extended to gather such energy metrics. The results of this tool are passed to a mechanism called folding that produces detailed metrics and source code references from coarse-grain sampling. We have used the tool with multiple serial benchmarks as well as parallel applications to demonstrate its usefulness by locating hot spots in terms of both performance and power drawn.

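The folding mechanism mentioned in the abstract can be sketched in a few lines. This is a strongly simplified reconstruction on synthetic data, not the tool's actual implementation: coarse samples taken sparsely across many iterations of a repetitive region are projected onto a single iteration's timeline, which together yield a detailed per-iteration profile.

```python
import random

def fold(samples, iteration_length):
    """Fold samples taken across many iterations of a repetitive region
    onto a single iteration's timeline.

    samples: list of (absolute_time, counter_value) pairs.
    Returns the samples sorted by their offset within one iteration.
    """
    folded = [(t % iteration_length, v) for t, v in samples]
    folded.sort(key=lambda p: p[0])
    return folded

# Simulate sparse sampling: one sample per iteration at a random offset,
# observing a counter whose rate depends on the phase of the iteration.
random.seed(0)
samples = []
for it in range(200):
    offset = random.uniform(0, 100.0)        # when the sample fires
    rate = 4.0 if offset < 50.0 else 1.0     # two phases per iteration
    samples.append((it * 100.0 + offset, rate))

profile = fold(samples, 100.0)
# Although each iteration contributed only one coarse sample, the folded
# profile exposes the two-phase structure within a single iteration.
```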
  • Framework for a productive performance optimization

     Servat Gelabert, Harald; Llort Sanchez, German Matías; Huck, Kevin A.; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Parallel computing
    Date of publication: 2013-08
    Journal article

    Modern supercomputers deliver large computational power, but it is difficult for an application to exploit it. One factor that limits application performance is single-node performance. While many performance tools use the microprocessor's performance counters to provide insight into serial node performance issues, the complex semantics of these counters pose an obstacle to an inexperienced developer. We present a framework that allows easy identification and qualification of serial node performance bottlenecks in parallel applications. The output of the framework is precise and capable of correlating performance inefficiencies with small regions of code within the application. The framework not only points to regions of code but also simplifies the semantics of the performance counters into metrics that refer to processor functional units. With such information the developer can focus on the identified code and improve it, knowing which processor execution unit is degrading performance. To demonstrate the usefulness of the framework we apply it to three already optimized applications using realistic inputs and, according to the results, modify their source code. With modifications that require little effort, we successfully increase the applications' performance by 10% to 30%, shortening the time required to reach the solution and/or allowing larger problem sizes to be tackled.

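The translation of raw counters into functional-unit-oriented metrics can be sketched as follows. Counter names and thresholds here are invented for illustration; they are not the framework's actual model.

```python
def derive_metrics(counters):
    """Translate raw hardware counter values (hypothetical names) into
    simpler, functional-unit-oriented metrics a developer can act on."""
    metrics = {
        "ipc": counters["instructions"] / counters["cycles"],
        "l2_misses_per_kilo_instruction":
            1000.0 * counters["l2_misses"] / counters["instructions"],
        "branch_misprediction_ratio":
            counters["branch_misses"] / counters["branches"],
    }
    # Point the developer at the unit most likely degrading performance
    # (thresholds are arbitrary, chosen only for the example).
    if metrics["l2_misses_per_kilo_instruction"] > 20.0:
        metrics["suspect_unit"] = "memory hierarchy"
    elif metrics["branch_misprediction_ratio"] > 0.05:
        metrics["suspect_unit"] = "branch predictor"
    else:
        metrics["suspect_unit"] = "none obvious"
    return metrics

example = derive_metrics({
    "cycles": 2_000_000, "instructions": 1_500_000,
    "l2_misses": 60_000, "branches": 200_000, "branch_misses": 4_000,
})
```

The point of such a mapping is that "40 L2 misses per kilo-instruction" suggests a concrete remedy (improve locality) in a way that a raw miss count does not.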
  • On the usefulness of object tracking techniques in performance analysis

     Llort Sanchez, German Matías; Servat Gelabert, Harald; González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    International Conference for High Performance Computing, Networking, Storage and Analysis
    Presentation's date: 2013-11-17
    Presentation of work at congresses

    Understanding the behavior of a parallel application is crucial if we are to tune it to achieve its maximum performance. Yet the behavior the application exhibits may change over time and depend on the actual execution scenario: particular inputs and program settings, the number of processes used, or hardware-specific problems. So beyond the details of a single experiment, a far more interesting question arises: how does the application behavior respond to changes in the execution conditions? In this paper, we demonstrate that object tracking concepts from computer vision have huge potential in the context of performance analysis. We leverage tracking techniques to analyze how the behavior of a parallel application evolves across multiple scenarios where the execution conditions change. This method provides comprehensible insights into the influence of different parameters on the application behavior, enabling us to identify the most relevant code regions and their performance trends. Copyright 2013 ACM.

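A greatly simplified analogue of the tracking idea: match each cluster centroid from one experiment to its nearest centroid in another, the way a tracker follows an object between video frames. The data and the greedy matching below are hypothetical illustrations, not the paper's algorithm.

```python
import math

def match_clusters(prev, curr):
    """Greedily match each cluster centroid from a previous experiment
    to its nearest unclaimed centroid in the current one.

    prev, curr: dicts mapping cluster label -> (x, y) centroid,
    e.g. (instructions, IPC) of a computation region.
    Returns a dict mapping previous label -> current label.
    """
    matches = {}
    available = dict(curr)
    for label, centroid in prev.items():
        if not available:
            break
        best = min(available, key=lambda l: math.dist(centroid, available[l]))
        matches[label] = best
        del available[best]
    return matches

# Two experiments of a hypothetical app: per-region (instructions, IPC).
# Doubling the task count shifts the centroids slightly, but tracking
# still identifies which region is which.
run_16_tasks = {"A": (1.0e9, 1.2), "B": (4.0e8, 0.6)}
run_32_tasks = {"c1": (3.9e8, 0.55), "c2": (9.5e8, 1.25)}
tracking = match_clusters(run_16_tasks, run_32_tasks)
```

Once regions are matched across scenarios, their metric trajectories (IPC dropping as processes increase, for instance) can be read off directly, which is the kind of trend analysis the abstract describes.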
  • Automatic refinement of parallel applications structure detection

     González Garcia, Juan; Huck, Kevin; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    IEEE International Parallel and Distributed Processing Symposium
    Presentation's date: 2012-05-21
    Presentation of work at congresses

    Analyzing parallel programs has become increasingly difficult due to the immense amount of information collected on large systems. In this scenario, cluster analysis has proved to be a useful technique to reduce the amount of data to analyze. A good example is the use of the density-based clustering algorithm DBSCAN to identify similar single program multiple data (SPMD) computing phases in message-passing applications. This structure detection simplifies the analyst's work, as all the available information is reduced to a small set of clusters. However, DBSCAN presents two major problems: it is very sensitive to its parametrization and it is not capable of correctly detecting clusters when the data set has different densities across the data space. In this paper, we introduce the Aggregative Cluster Refinement, an iterative algorithm that produces more accurate structure detections of SPMD phases than DBSCAN. In addition, it is able to detect clusters with different densities.
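For readers unfamiliar with DBSCAN, a compact textbook implementation applied to synthetic per-burst counter data shows how density-based clustering groups similar computation phases. This is generic DBSCAN, not the Aggregative Cluster Refinement introduced in the paper, and the data points are invented.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal textbook DBSCAN: returns one cluster id per point
    (-1 means noise)."""
    n = len(points)
    labels = [None] * n
    cluster = -1

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1                # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is also core: keep expanding
                queue.extend(k for k in more if labels[k] is None)
    return labels

# Synthetic per-task samples of two SPMD phases, plus one outlier:
# each point is (completed instructions, IPC) for one computation burst.
phase_a = [(10.0 + 0.1 * k, 1.0 + 0.01 * k) for k in range(6)]
phase_b = [(50.0 + 0.1 * k, 0.4 + 0.01 * k) for k in range(6)]
outlier = [(100.0, 3.0)]
labels = dbscan(phase_a + phase_b + outlier, eps=1.0, min_pts=4)
```

Note how the result depends on eps and min_pts: shrinking eps below the intra-phase spread would fragment each phase, which is exactly the parametrization sensitivity the abstract criticizes.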

  • Simulating whole supercomputer applications

     González Garcia, Juan; Casas, Marc; Gimenez Lucas, Judit; Moreto Planas, Miquel; Ramirez Bellido, Alejandro; Labarta Mancho, Jesus Jose; Valero Cortes, Mateo
    IEEE micro
    Date of publication: 2011-06
    Journal article

    Detailed simulations of large scale message-passing interface parallel applications are extremely time consuming and resource intensive. A new methodology that combines signal processing and data mining techniques plus a multilevel simulation reduces the simulated data by various orders of magnitude. This reduction makes possible detailed software performance analysis and accurate performance predictions in a reasonable time.

  • Comparing Different Clustering Algorithms on Performance Analysis of Parallel Applications

     González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-10
    Report

  • Automatic detection of fine grain application structure (3rd Version)

     González, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-02
    Report

  • Automatic detection of parallel applications structure

     González, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-04
    Report

  • Detailed performance analysis using coarse grain sampling

     Servat Gelabert, Harald; Llort Sanchez, German Matías; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-07
    Report

  • Automatic detection of parallel applications computation phases

     González, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-07
    Report

  • Automatic detection of fine grain application structure (2nd version)

     González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2007-10
    Report

  • Automatic detection of fine grain application structure

     González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2007-06
    Report

  • Scalability of Visualization and Tracing Tools

     Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit; Gonzalez, E Martine P; Servat Gelabert, Harald; Llort Sanchez, German Matías; Aguilar, X
    Parallel Computing: Current & Future Issues of High-End Computing
    Presentation of work at congresses

  • Blue Gene/L Performance Tools

     Martorell Bofill, Xavier; Smeds, N; Walkup, R; Almasi, G; Labarta Mancho, Jesus Jose; Escale Claveras, Francesc; Gimenez Lucas, Judit
    IBM journal of research and development
    Date of publication: 2005-03
    Journal article

  • What Multilevel Parallel programs do when you are not watching: A performance analysis case study comparing MPI/OpenMP, MLP, and nested OpenMP

     Jost, G; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Workshop on OpenMP Applications and Tools
    Presentation of work at congresses

  • Paramedir: A Tool for Programmable Performance Analysis

     Gimenez Lucas, Judit
    Computational Science - ICCS 2004
    Presentation's date: 2004-06-06
    Presentation of work at congresses

  • Paramedir: A Tool for Programmable Performance Analysis

     Jost, Gabriele; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Computational Science - ICCS 2004
    Presentation of work at congresses

  • Paramedir: A Tool for Programmable Performance Analysis

     Jost, Gabriele; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Lecture notes in computer science
    Date of publication: 2004-06
    Journal article

  • What Multilevel Parallel Programs Do When You Are Not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP and Nested OpenMP

     Jost, Gabriele; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Lecture notes in computer science
    Date of publication: 2004-05
    Journal article

  • Interfacing Computer Aided Parallelization and Performance Analysis

     Jost, Gabriele; Jin, Haoqiang; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Computational Science - ICCS 2003
    Presentation of work at congresses

  • Performance Prediction in a Grid environment

     Badia Sala, Rosa Maria; Escalé, Francesc; Gabriel, Edgar; Gimenez Lucas, Judit; Keller, Rainer; Labarta Mancho, Jesus Jose; Müller, Matthias S.
    First European Across Grids Conference
    Presentation of work at congresses

  • Performance Prediction in a Grid Environment

     Badia Sala, Rosa Maria; Escalé, Francesc; Gabriel, Edgar; Gimenez Lucas, Judit; Keller, Rainer; Labarta Mancho, Jesus Jose; Müller, Matthias S.
    First European Across Grids Conference
    Presentation of work at congresses

  • Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

     Jost, Gabriele; Jin, Haoqiang; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit; Caubet Serrabou, Jordi
    IEEE International Parallel and Distributed Processing Symposium
    Presentation of work at congresses

  • Interfacing Computer Aided Parallelization and Performance Analysis

     Jost, Gabriele; Jin, Haoqiang; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Lecture notes in computer science
    Date of publication: 2003-06
    Journal article

  • Performance Prediction in a Grid Environment

     Badia Sala, Rosa Maria; Escale Claveras, Francesc; Gabriel, Edgar; Gimenez Lucas, Judit; Keller, Rainer; Labarta Mancho, Jesus Jose; Müller, Matthias S.
    Lecture notes in computer science
    Date of publication: 2003-12
    Journal article

  • A Dynamic Tracing Mechanism for Performance Analysis of OpenMP Applications

     Caubet Serrabou, Jordi; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose; Derose, Luiz; Vetter, Jeffrey
    2nd International Workshop on OpenMP Applications and Tools
    Presentation of work at congresses

  • ESPRIT No.24757 CEPBA-TTN - Mechanism for enabling HPCN technology transfer in Europe (METIER)

     Valero Cortes, Mateo; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit; Torres Viñals, Jordi
    Participation in a competitive project

  • The Paros Operating System Microkernel (UPC-CEPBA-94-05)

     Labarta Mancho, Jesus Jose; Girona Turell, Sergio; Cortes Rossello, Antonio; Gimenez Lucas, Judit; Pujol Jensen, Cristina; Gregoris de la Fuente, Luis
    Date: 1994-06
    Report
