Gimenez Lucas, Judit
Total activity: 42
Department
Department of Computer Architecture
E-mail
judit.gimenez@upc.edu
Contact details
UPC directory



Scientific and technological production
  • Identifying code phases using piece-wise linear regressions

     Servat Gelabert, Harald; Llort Sanchez, German Matías; González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    IEEE International Parallel and Distributed Processing Symposium
    p. 941-951
    DOI: 10.1109/IPDPS.2014.100
    Presentation's date: 2014-05-19
    Presentation of work at congresses


    Node-level performance is one of the factors that may limit applications from reaching the supercomputers' peak performance. Studying node-level performance and attributing it to the source code results in valuable insight that can be used to improve the application efficiency, although performing such a study may be an intimidating task due to the complexity and size of the applications. We present in this paper a mechanism that takes advantage of combining piece-wise linear regressions, coarse-grain sampling, and minimal instrumentation to detect performance phases in the computation regions even if their granularity is very fine. This mechanism then maps the performance of each phase onto the application's syntactical structure, displaying a correlation between performance and source code. We introduce a methodology on top of this mechanism to describe the node-level performance of parallel applications, even for applications seen for the first time. Finally, we demonstrate the methodology by describing optimized in-production applications and further improving their performance by applying small transformations to the code based on the hints discovered.
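The core idea, fitting piecewise lines to sampled data and taking the breakpoints as phase boundaries, can be sketched as follows (a toy illustration on synthetic data, not the authors' implementation; function names are ours):

```python
# Hypothetical sketch: detect a phase change in a stream of sampled
# performance-counter values by fitting two linear pieces and picking the
# breakpoint that minimizes the combined squared error.

def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def best_breakpoint(xs, ys, min_seg=3):
    """Try every split point; return the one with the lowest total SSE."""
    best = None
    for k in range(min_seg, len(xs) - min_seg):
        _, _, sse_left = fit_line(xs[:k], ys[:k])
        _, _, sse_right = fit_line(xs[k:], ys[k:])
        if best is None or sse_left + sse_right < best[1]:
            best = (k, sse_left + sse_right)
    return best[0]

# Two synthetic phases: slope 1.0, then slope 5.0 starting at index 10.
xs = list(range(20))
ys = [x * 1.0 for x in range(10)] + [10 + (x - 9) * 5.0 for x in range(10, 20)]
print(best_breakpoint(xs, ys))  # 10
```

The paper combines this kind of regression with folded coarse-grain samples; a real implementation would also search for multiple breakpoints rather than a single one.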

  • Framework for a productive performance optimization

     Servat Gelabert, Harald; Llort Sanchez, German Matías; Huck, Kevin A.; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Parallel computing
    Vol. 39, num. 8, p. 336-353
    DOI: 10.1016/j.parco.2013.05.004
    Date of publication: 2013-08
    Journal article


    Modern supercomputers deliver large computational power, but it is difficult for an application to exploit such power. One factor that limits the application performance is the single node performance. While many performance tools use the microprocessor performance counters to provide insights on serial node performance issues, the complex semantics of these counters pose an obstacle to an inexperienced developer. We present a framework that allows easy identification and qualification of serial node performance bottlenecks in parallel applications. The output of the framework is precise and it is capable of correlating performance inefficiencies with small regions of code within the application. The framework not only points to regions of code but also simplifies the semantics of the performance counters into metrics that refer to processor functional units. With such information the developer can focus on the identified code and improve it by knowing which processor execution unit is degrading the performance. To demonstrate the usefulness of the framework we apply it to three already optimized applications using realistic inputs and, according to the results, modify their source code. By doing modifications that require little effort, we successfully increase the applications' performance from 10% to 30% and thus shorten the time required to reach the solution and/or allow facing increased problem sizes.

  • Detailed and simultaneous power and performance analysis

     Servat Gelabert, Harald; Llort Sanchez, German Matías; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Concurrency and computation. Practice and experience
    DOI: 10.1002/cpe.3188
    Date of publication: 2013-12
    Journal article


    On the road to Exascale computing, both performance and power must be tackled at different levels, from the system down to the processor. The processor itself is mainly responsible for serial node performance and for most of the energy consumed by the system. Thus, it is important to have tools that simultaneously analyze both performance and energy efficiency at the processor level. Performance tools have allowed analysts to understand, and even improve, the performance of an application that runs on a system. With the advent of recent processor capabilities to measure their own power consumption, performance tools can extend their collection of metrics with those related to energy consumption and provide a correlation between the source code, its performance and its energy efficiency. In this paper, we present a performance tool that has been extended to gather such energy metrics. The results of this tool are passed to a mechanism called folding that produces detailed metrics and source code references from coarse-grain sampling. We have used the tool with multiple serial benchmarks as well as parallel applications to demonstrate its usefulness by locating hot spots in terms of both performance and power drained.

  • Scalability analysis of Dalton, a molecular structure program

     Aguilar, Xavier; Schliephake, Michael; Vahtras, Olav; Gimenez Lucas, Judit; Laure, Erwin
    Future generation computer systems
    Vol. 29, num. 8, p. 2197-2204
    DOI: 10.1016/j.future.2013.04.013
    Date of publication: 2013-10
    Journal article


    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in the calculation of molecular properties, with a large world-wide user community (over 2000 licenses issued). In this paper, we present a performance characterization and optimization of Dalton. We also propose a solution that prevents the master/worker design of Dalton from becoming a performance bottleneck at larger process counts. With these improvements we obtain speedups of 4x, increasing the parallel efficiency of the code and enabling it to run on a much larger number of cores.

  • On the usefulness of object tracking techniques in performance analysis

     Llort Sanchez, German Matías; Servat Gelabert, Harald; González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    International Conference for High Performance Computing, Networking, Storage and Analysis
    DOI: 10.1145/2503210.2503267
    Presentation's date: 2013-11-17
    Presentation of work at congresses


    Understanding the behavior of a parallel application is crucial if we are to tune it to achieve its maximum performance. Yet the behavior the application exhibits may change over time and depend on the actual execution scenario: particular inputs and program settings, the number of processes used, or hardware-specific problems. So beyond the details of a single experiment a far more interesting question arises: how does the application behavior respond to changes in the execution conditions? In this paper, we demonstrate that object tracking concepts from computer vision have huge potential to be applied in the context of performance analysis. We leverage tracking techniques to analyze how the behavior of a parallel application evolves through multiple scenarios where the execution conditions change. This method provides comprehensible insights on the influence of different parameters on the application behavior, enabling us to identify the most relevant code regions and their performance trends.
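The tracking notion can be illustrated minimally (invented metric space and values; the paper's computer-vision techniques are far richer): each behavior cluster observed in one execution scenario is matched to the nearest cluster of a reference scenario, so a cluster's identity can be followed as conditions change.

```python
# Hypothetical sketch: match cluster centroids of run B to the nearest
# centroid of reference run A in (IPC, L1 miss ratio) space.

def nearest(centroid, candidates):
    """Index of the candidate centroid closest in Euclidean distance."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(range(len(candidates)), key=lambda i: dist2(centroid, candidates[i]))

# Centroids for two runs of the same code under different conditions.
run_a = [(1.8, 0.02), (0.6, 0.15), (1.1, 0.08)]
run_b = [(0.7, 0.14), (1.7, 0.03), (1.0, 0.09)]

mapping = {j: nearest(c, run_a) for j, c in enumerate(run_b)}
print(mapping)  # {0: 1, 1: 0, 2: 2}
```

Following the mapping across many scenarios yields a per-cluster trajectory, which is the kind of trend the paper extracts with proper tracking algorithms.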

  • Automatic refinement of parallel applications structure detection

     González Garcia, Juan; Huck, Kevin; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    IEEE International Parallel and Distributed Processing Symposium
    p. 1680-1687
    DOI: 10.1109/IPDPSW.2012.209
    Presentation's date: 2012-05-21
    Presentation of work at congresses


    Analyzing parallel programs has become increasingly difficult due to the immense amount of information collected on large systems. In this scenario, cluster analysis has proven to be a useful technique to reduce the amount of data to analyze. A good example is the use of the density-based clustering algorithm DBSCAN to identify similar single program multiple data (SPMD) computing phases in message-passing applications. This structure detection simplifies the analyst's work, as the whole information available is reduced to a small set of clusters. However, DBSCAN presents two major problems: it is very sensitive to its parametrization and is not capable of correctly detecting clusters when the data set has different densities across the data space. In this paper, we introduce the Aggregative Cluster Refinement, an iterative algorithm that produces more accurate structure detections of SPMD phases than DBSCAN. In addition, it is able to detect clusters with different densities.
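For reference, the DBSCAN baseline the paper improves on can be sketched in pure Python (a minimal textbook version over invented 2-D points; the real input would be hardware-counter vectors per computation burst):

```python
# Minimal DBSCAN sketch: label each point with a cluster id, or -1 for noise.
# eps is the neighborhood radius, min_pts the core-point density threshold.

def dbscan(points, eps, min_pts):
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # noise (may be claimed as border later)
            continue
        labels[i] = cid
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid     # border point: relabel, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cid
            more = neighbors(j)
            if len(more) >= min_pts:
                queue.extend(more)  # core point: keep expanding the cluster
        cid += 1
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
print(dbscan(pts, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The sensitivity the paper criticizes is visible here: shrinking eps or raising min_pts turns whole clusters into noise, which is what the Aggregative Cluster Refinement's iterative scheme is designed to avoid.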

  • The HOPSA workflow and tools

     Mohr, Bernd; Voevodin, Vladimir; Gimenez Lucas, Judit; Hagersten, Erik; Knüpfer, Andreas; Nikitenko, Dmitry A.; Nilsson, Mats; Servat Gelabert, Harald; Shah, Aamer; Winkler, Frank; Wolf, Felix; Zhukov, Ilya
    International Parallel Tools Workshop
    p. 127-146
    DOI: 10.1007/978-3-642-37349-7_9
    Presentation's date: 2012-09
    Presentation of work at congresses


    To maximise the scientific output of a high-performance computing system, different stakeholders pursue different strategies. While individual application developers are trying to shorten the time to solution by optimising their codes, system administrators are tuning the configuration of the overall system to increase its throughput. Yet, the complexity of today's machines with their strong interrelationship between application and system performance presents serious challenges to achieving these goals. The HOPSA project (HOlistic Performance System Analysis) therefore sets out to create an integrated diagnostic infrastructure for combined application and system-level tuning, with the former provided by the EU and the latter by the Russian project partners. Starting from system-wide basic performance screening of individual jobs, an automated workflow routes findings on potential bottlenecks either to application developers or system administrators with recommendations on how to identify their root cause using more powerful diagnostic tools. Developers can choose from a variety of mature performance-analysis tools developed by our consortium. Within this project, the tools will be further integrated and enhanced with respect to scalability, depth of analysis, and support for asynchronous tasking, a node-level paradigm playing an increasingly important role in hybrid programs on emerging hierarchical and heterogeneous systems.

  • Simulating whole supercomputer applications

     González Garcia, Juan; Casas, Marc; Gimenez Lucas, Judit; Moreto Planas, Miquel; Ramirez Bellido, Alejandro; Labarta Mancho, Jesus Jose; Valero Cortes, Mateo
    IEEE micro
    Vol. 31, num. 3, p. 32-45
    DOI: 10.1109/MM.2011.58
    Date of publication: 2011-06
    Journal article


    Detailed simulations of large scale message-passing interface parallel applications are extremely time consuming and resource intensive. A new methodology that combines signal processing and data mining techniques plus a multilevel simulation reduces the simulated data by various orders of magnitude. This reduction makes possible detailed software performance analysis and accurate performance predictions in a reasonable time.

  • Folding: detailed analysis with coarse sampling

     Servat Gelabert, Harald; Llort Sanchez, German Matías; Gimenez Lucas, Judit; Huck, Kevin A.; Labarta Mancho, Jesus Jose
    International Parallel Tools Workshop
    p. 105-118
    DOI: 10.1007/978-3-642-31476-6_9
    Presentation's date: 2011-09
    Presentation of work at congresses


    Performance analysis tools help application users find bottlenecks that prevent the application from running at full speed on current supercomputers. The level of detail and the accuracy of the performance tools are crucial to completely depict the nature of the bottlenecks. The details exposed depend not only on the nature of the tools (profile-based or trace-based) but also on the mechanism on which they rely (instrumentation or sampling) to gather information. In this paper we present a mechanism called folding that combines both instrumentation and sampling for trace-based performance analysis tools. The folding mechanism takes advantage of long execution runs and low-frequency sampling to finely detail the evolution of the user code with minimal overhead on the application. The reports provided by the folding mechanism are extremely useful to understand the behavior of a region of code at a very low level. We also present a practical study we have done in an in-production scenario with the folding mechanism and show that the results of the folding resemble those of high-frequency sampling.
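The folding idea, projecting low-frequency samples from many iterations into a single representative iteration by their offset relative to the enclosing iteration's start, can be sketched as follows (synthetic timestamps and invented names, not the MPItrace implementation):

```python
# Hypothetical folding sketch: samples taken at a low rate across many
# iterations are mapped into one normalized iteration [0, 1).

def fold(samples, iter_starts, iter_len):
    """Return (relative_time, value) pairs folded into one iteration."""
    folded = []
    for t, value in samples:
        # Find the iteration this sample belongs to.
        start = max(s for s in iter_starts if s <= t)
        folded.append(((t - start) / iter_len, value))
    return sorted(folded)

# Three iterations of length 100; one coarse sample lands in each, at a
# different offset, so together they detail the whole iteration.
samples = [(10, 'A'), (150, 'B'), (290, 'C')]
print(fold(samples, iter_starts=[0, 100, 200], iter_len=100))
# [(0.1, 'A'), (0.5, 'B'), (0.9, 'C')]
```

This is why long runs help: the more iterations sampled, the denser the folded profile becomes, without ever raising the sampling frequency on any single iteration.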

  • Estimation of MPI application performance on volunteer environments

     Nandagudi, Girish; Subhlok, Jaspal; Gabriel, Edgar; Gimenez Lucas, Judit
    International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms
    p. 511-520
    DOI: 10.1007/978-3-642-29737-3_56
    Presentation's date: 2011-09
    Presentation of work at congresses


    Emerging MPI libraries, such as VolpexMPI and P2P MPI, allow message passing parallel programs to execute effectively in heterogeneous volunteer environments despite frequent failures. However, the performance of message passing codes varies widely in a volunteer environment, depending on the application characteristics and the computation and communication characteristics of the nodes and the interconnection network. This paper has the dual goal of developing and validating a tool chain to estimate performance of MPI codes in a volunteer environment and analyzing the suitability of the class of computations represented by NAS benchmarks for volunteer computing. The framework is deployed to estimate performance in a variety of possible volunteer configurations, including some based on the measured parameters of a campus volunteer pool. The results show slowdowns by factors between 2 and 10 for different NAS benchmark codes for execution on a realistic volunteer campus pool as compared to dedicated clusters.

  • Trace spectral analysis toward dynamic levels of detail

     Llort Sanchez, German Matías; Casas Guix, Marc; Servat Gelabert, Harald; Huck, Kevin A.; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    IEEE International Conference on Parallel and Distributed Systems
    p. 332-339
    DOI: 10.1109/ICPADS.2011.142
    Presentation's date: 2011-12
    Presentation of work at congresses


    The emergence of Petascale systems has raised new challenges to performance analysis tools. Understanding every single detail of an execution is important to bridge the gap between the theoretical peak and the actual performance achieved. Tracing tools are the best option when it comes to providing detailed information about the application behavior, but not without liabilities. The amount of information that a single execution can generate grows so fast that it easily becomes unmanageable. An effective analysis in such scenarios necessitates the intelligent selection of information. In this paper we present an on-line performance tool based on spectral analysis of signals that automatically identifies the different computing phases of the application as it runs, selects a few representative periods and decides the granularity of the information gathered for these regions. As a result, the execution is completely characterized at different levels of detail, reducing the amount of data collected while maximizing the amount of useful information presented for the analysis.
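Period detection of this kind can be illustrated with a simple autocorrelation sketch (the paper's on-line spectral analysis is more elaborate; the signal and parameters here are synthetic):

```python
# Hypothetical sketch: find the dominant period of a trace-derived signal
# (e.g. MPI calls per time bin) as the lag maximizing the autocorrelation.

def dominant_period(signal, max_lag=None):
    """Lag > 0 with the highest autocorrelation of the mean-removed signal."""
    n = len(signal)
    max_lag = max_lag or n // 2
    mean = sum(signal) / n
    xs = [v - mean for v in signal]

    def autocorr(lag):
        return sum(xs[i] * xs[i + lag] for i in range(n - lag)) / (n - lag)

    return max(range(1, max_lag + 1), key=autocorr)

# A signal that repeats every 8 samples.
signal = [5, 1, 1, 1, 9, 1, 1, 1] * 8
print(dominant_period(signal))  # 8
```

Once the iteration period is known, a tool can keep full detail for a few representative periods and summarize the rest, which is the granularity selection the abstract describes.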

  • Unveiling internal evolution of parallel application computation phases

     Servat Gelabert, Harald; Llort Sanchez, German Matías; Gimenez Lucas, Judit; Huck, Kevin A.; Labarta Mancho, Jesus Jose
    International Conference on Parallel Processing
    p. 155-164
    DOI: 10.1109/ICPP.2011.35
    Presentation's date: 2011-09
    Presentation of work at congresses


    As access to supercomputing resources is becoming more and more commonplace, performance analysis tools are gaining importance in order to decrease the gap between the application performance and the supercomputers' peak performance. Performance analysis tools allow the analyst to understand the idiosyncrasies of an application in order to improve it. However, these tools require monitoring regions of the application to provide information to the analysts, leaving non-monitored regions of code unknown, which may result in a lack of understanding of important regions of the application. In this paper we describe an automated methodology that reports very detailed application insights and improves the analysis experience of trace-based performance tools. We apply this methodology to three production applications and provide suggestions on how to improve their performance. Our methodology uses computation burst clustering and a mechanism called folding. While clustering automatically detects the application structure, folding combines instrumentation and sampling to augment the performance analysis details. Folding provides fine-grain performance information from coarse-grain sampling on iterative applications. Folding results closely resemble the performance data gathered from fine-grain sampling, with an absolute mean difference of less than 5% and without the overhead of fine-grain sampling.

  • Scaling Dalton: a molecular electronic structure program

     Aguilar, Xavier; Schliephake, Michael; Vahtras, Olav; Gimenez Lucas, Judit; Laure, Erwin
    IEEE International Conference on e-Science
    p. 256-262
    DOI: 10.1109/eScience.2011.43
    Presentation's date: 2011-12
    Presentation of work at congresses


    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in the calculation of molecular properties, with a large world-wide user community (over 2000 licenses issued). In this paper, we present a characterization and performance optimization of Dalton that increases the scalability and parallel efficiency of the application. We also propose a solution that helps prevent the master/worker design of Dalton from becoming a performance bottleneck at larger process counts, further increasing the parallel efficiency.

  • Guided performance analysis combining profile and trace tools

     Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose; Pegenaute Bresme, Xavier; Wen, Hui-Fang; Klepacki, David; Chung, I-Hsin; Cong, Guojing; Voigtländer, Felix; Mohr, Bernd
    Workshop on Productivity and Performance
    p. 513-521
    DOI: 10.1007/978-3-642-21878-1_63
    Presentation's date: 2010-09
    Presentation of work at congresses


    Performance analysis is very important to understand the applications' behavior and to identify bottlenecks. Performance analysis tools should facilitate the exploration of the data collected and help to identify where the analyst has to look. While this functionality can promote the tools usage on small and medium size environments, it becomes mandatory for large-scale and many-core systems where the amount of data is dramatically increased. This paper proposes a new methodology based on the integration of profilers and timeline tools to improve and facilitate the performance analysis process.

  • On-line detection of large-scale parallel application's structure

     Llort Sanchez, German Matías; González Garcia, Juan; Servat Gelabert, Harald; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    IEEE International Parallel and Distributed Processing Symposium
    p. 1-10
    DOI: 10.1109/IPDPS.2010.5470350
    Presentation's date: 2010-04
    Presentation of work at congresses


    With larger and larger systems being constantly deployed, trace-based performance analysis of parallel applications has become a daunting task. Even if the amount of performance data gathered per single process is small, traces rapidly become unmanageable when merging together the information collected from all processes. In general, an efficient analysis of such a large volume of data is subject to a previous filtering step that directs the analyst's attention towards what is meaningful to understand the observed application behavior. Furthermore, the iterative nature of most scientific applications usually ends up producing repetitive information. Discarding irrelevant data aims at reducing both the size of traces and the time required to perform the analysis and deliver results. In this paper, we present an on-line analysis framework that relies on clustering techniques to intelligently select the most relevant information to understand how the application behaves, while keeping the trace volume at a reasonable size.

  • Performance data extrapolation in parallel codes

     González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    IEEE International Conference on Parallel and Distributed Systems
    p. 155-163
    DOI: 10.1109/ICPADS.2010.79
    Presentation's date: 2010-12
    Presentation of work at congresses


    Measuring the performance of parallel codes involves a compromise among many factors, the most important being which data to analyze. Current supercomputers can run applications on large numbers of processors, and the analysis data that can be extracted is correspondingly large and varied. This implies a hard compromise between the potential problems one wants to analyze and the information one is able to capture during the application execution. In this paper we present an extrapolation methodology to maximize the information extracted from a single application execution. It is based on a structural characterization of the applications performed using clustering techniques, the ability to multiplex the reading of hardware performance counters, and a projection process. As a result, we obtain approximated values of a large set of metrics for each phase of the application, with minimal error.
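The projection step can be illustrated as follows (a toy sketch with invented counter names: bursts in the same cluster are assumed to behave alike, so per-cluster averages fill in the counters each individual burst did not read):

```python
# Hypothetical sketch of counter extrapolation under multiplexing: each
# computation burst reads only a subset of counters; averaging per cluster
# projects a full metric set onto every burst of that cluster.

from collections import defaultdict

# (cluster_id, {counter: value}) per burst; each burst saw only some counters.
bursts = [
    (0, {'cycles': 1000, 'instructions': 2000}),
    (0, {'cycles': 1040, 'L1_misses': 30}),
    (1, {'cycles': 5000, 'instructions': 1500}),
    (1, {'cycles': 4960, 'L1_misses': 400}),
]

totals = defaultdict(lambda: defaultdict(list))
for cid, counters in bursts:
    for name, value in counters.items():
        totals[cid][name].append(value)

projected = {cid: {name: sum(vs) / len(vs) for name, vs in counts.items()}
             for cid, counts in totals.items()}
print(projected[0])  # {'cycles': 1020.0, 'instructions': 2000.0, 'L1_misses': 30.0}
```

Counters read by every burst (here 'cycles') double as a sanity check: if their per-cluster spread is small, the SPMD assumption behind the projection holds.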

  • Automatic evaluation of the computation structure of parallel applications

     González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    International Conference on Parallel and Distributed Computing, Applications and Technologies
    p. 138-145
    DOI: 10.1109/PDCAT.2009.52
    Presentation's date: 2009-12
    Presentation of work at congresses


    Many data mining techniques have been proposed for parallel application performance analysis, the most interesting being clustering analysis. In most cases, clustering has been used to detect processors with similar behavior. In previous work, we presented a different approach: clustering was used to detect the computation structure of the applications and how these different computation phases behave. In this paper, we present a method to evaluate the accuracy of this structure detection. This new method is based on the Single Program Multiple Data (SPMD) paradigm exhibited by real parallel programs. Assuming an SPMD structure, we expect that all tasks of a parallel application execute the same operation sequence. Using a Multiple Sequence Alignment (MSA) algorithm, we check the sequence ordering of the detected clusters to evaluate the quality of the clustering results.
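The evaluation idea can be illustrated with a pairwise stand-in for MSA (difflib's similarity ratio over synthetic cluster-id sequences; the paper uses a full multiple sequence alignment):

```python
# Sketch: under SPMD, every task should execute the same sequence of detected
# clusters; a sequence-similarity score against a reference task quantifies
# how well the detected structure matches that expectation.

from difflib import SequenceMatcher

# Cluster-id sequences observed per MPI task (synthetic).
task_sequences = [
    [1, 2, 3, 1, 2, 3],
    [1, 2, 3, 1, 2, 3],
    [1, 2, 3, 1, 2],      # one task is missing the final cluster occurrence
]

reference = task_sequences[0]
scores = [SequenceMatcher(None, reference, seq).ratio()
          for seq in task_sequences[1:]]
print([round(s, 2) for s in scores])  # [1.0, 0.91]
```

Scores near 1.0 across all tasks indicate a clean SPMD structure detection; low or uneven scores flag clusters that were split or merged incorrectly.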

  • Automatic detection of parallel applications computation phases

     González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    IEEE International Parallel and Distributed Processing Symposium
    p. 1-11
    DOI: 10.1109/IPDPS.2009.5161027
    Presentation's date: 2009-05
    Presentation of work at congresses


    Analyzing parallel programs has become increasingly difficult due to the immense amount of information collected on large systems. The use of clustering techniques has been proposed to analyze applications. However, while the objective of previous works is focused on identifying groups of processes with similar characteristics, we target a much finer granularity in the application behavior. In this paper, we present a tool that automatically characterizes the different computation regions between communication primitives in message-passing applications. This study shows how some of the clustering algorithms which may be applicable at a coarse grain are no longer adequate at this level. Density-based clustering algorithms applied to the performance counters offered by modern processors are more appropriate in this context. This tool automatically generates accurate displays of the structure of the application as well as detailed reports on a broad range of metrics for each individual region detected.

  • Detailed performance analysis using coarse grain sampling

     Servat Gelabert, Harald; Llort Sanchez, German Matías; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Workshop on Productivity and Performance
    p. 185-198
    DOI: 10.1007/978-3-642-14122-5_23
    Presentation's date: 2009-09
    Presentation of work at congresses

    Performance evaluation tools enable analysts to shed light on how applications behave both from a general point of view and at concrete execution points, but cannot provide detailed information beyond the monitored regions of code. Having the ability to determine when and which data has to be collected is crucial for a successful analysis. This is particularly true for trace-based tools, which can easily incur either unmanageably large traces or an information shortage. In order to mitigate the well-known resolution vs. usability trade-off, we present a procedure that obtains fine grain performance information using coarse grain sampling, projecting performance metrics scattered all over the execution into thoroughly detailed representative areas. This mechanism has been incorporated into the MPItrace tracing suite, greatly extending the amount of performance information gathered from statically instrumented points with further periodic samples collected beyond them. We have applied this solution to the analysis of two applications to introduce a novel performance analysis methodology based on the combination of instrumentation and sampling techniques.
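    The projection step the abstract describes can be sketched as follows: samples taken at a coarse period across many iterations of one instrumented region land at different offsets within the region, so folding them onto a single normalized timeline yields a fine-grain profile. The helper below is a minimal illustration under simplifying assumptions (constant iteration length, hypothetical names and data), not the MPItrace implementation.

    ```python
    import bisect

    def fold_samples(region_starts, region_len, samples, bins=4):
        """Project coarse-grain samples from many iterations of one
        instrumented region onto a single normalized timeline.

        region_starts: sorted entry timestamps of the region's iterations.
        region_len: assumed-constant duration of one iteration.
        samples: (timestamp, counter_value) pairs scattered over the run.
        Returns per-bin sample counts over the folded [0, 1) timeline.
        """
        histogram = [0] * bins
        for ts, _value in samples:
            k = bisect.bisect_right(region_starts, ts) - 1
            if k < 0:
                continue                    # sample before the first iteration
            offset = ts - region_starts[k]
            if offset >= region_len:
                continue                    # sample fell between iterations
            histogram[int(offset / region_len * bins)] += 1
        return histogram

    # Three iterations starting at t=0, 100, 200, each 80 time units long;
    # five samples spread over the whole run fold into one dense profile.
    starts = [0, 100, 200]
    samples = [(5, 0), (70, 0), (110, 0), (175, 0), (260, 0)]
    hist = fold_samples(starts, 80, samples, bins=4)
    ```

    With enough iterations, even a low sampling frequency populates every bin, which is the resolution vs. overhead trade-off the paper exploits.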

  • Detailed performance analysis using coarse grain sampling

     Servat Gelabert, Harald; Llort Sanchez, German Matías; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-07
    Report

  • Automatic detection of parallel applications computation phases

     González, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-07
    Report

  • Comparing Different Clustering Algorithms on Performance Analysis of Parallel Applications

     González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-10
    Report

  • Automatic detection of fine grain application structure (3rd Version)

     González, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-02
    Report

  • Automatic detection of parallel applications structure

     González, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2008-04
    Report

  • Automatic detection of fine grain application structure

     González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2007-06
    Report

  • Automatic detection of fine grain application structure (2nd version)

     González Garcia, Juan; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose
    Date: 2007-10
    Report

  • Blue Gene/L Performance Tools

     Martorell Bofill, Xavier; Smeds, N; Walkup, R; Almasi, G; Labarta Mancho, Jesus Jose; Escale Claveras, Francesc; Gimenez Lucas, Judit
    IBM journal of research and development
    Vol. 49, num. 2/3, p. 407-424
    Date of publication: 2005-03
    Journal article

  • Scalability of Visualization and Tracing Tools

     Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit; Gonzalez, E Martine P; Servat Gelabert, Harald; Llort Sanchez, German Matías; Aguilar, X
    Parallel Computing: Current & Future Issues of High-End Computing
    p. 869-876
    Presentation of work at congresses

  • Paramedir: A Tool for Programmable Performance Analysis

     Jost, Gabriele; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Lecture notes in computer science
    Vol. 3036, p. 466-469
    Date of publication: 2004-06
    Journal article

  • What Multilevel Parallel Programs Do When You Are Not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP and Nested OpenMP

     Jost, Gabriele; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Lecture notes in computer science
    num. 3349, p. 29-40
    Date of publication: 2004-05
    Journal article

  • Paramedir: A Tool for Programmable Performance Analysis

     Gimenez Lucas, Judit
    Computational Science - ICCS 2004
    Presentation's date: 2004-06-06
    Presentation of work at congresses

  • Paramedir: A Tool for Programmable Performance Analysis

     Jost, Gabriele; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Computational Science - ICCS 2004
    p. 466-470
    Presentation of work at congresses

  • What Multilevel Parallel programs do when you are not watching: A performance analysis case study comparing MPI/OpenMP, MLP, and nested OpenMP

     Jost, G; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Workshop on OpenMP Applications and Tools
    p. 29-40
    Presentation of work at congresses

  • Interfacing Computer Aided Parallelization and Performance Analysis

     Jost, Gabriele; Jin, Haoqiang; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Lecture notes in computer science
    Vol. 2660, num. 4, p. 181-190
    Date of publication: 2003-06
    Journal article

  • Performance Prediction in a Grid Environment

     Badia Sala, Rosa Maria; Escale Claveras, Francesc; Gabriel, Edgar; Gimenez Lucas, Judit; Keller, Rainer; Labarta Mancho, Jesus Jose; Müller, Matthias S
    Lecture notes in computer science
    Vol. 2970, num. 1, p. 257-264
    Date of publication: 2003-12
    Journal article

  • Performance Prediction in a Grid environment

     Badia Sala, Rosa Maria; Escalé, Francesc; Gabriel, Edgar; Gimenez Lucas, Judit; Keller, Rainer; Labarta Mancho, Jesus Jose; Müller, Matthias S
    First European Across Grids Conference
    Presentation of work at congresses

  • Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

     Jost, Gabriele; Jin, Haoqiang; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit; Caubet Serrabou, Jordi
    IEEE International Parallel and Distributed Processing Symposium
    Presentation of work at congresses

  • Interfacing Computer Aided Parallelization and Performance Analysis

     Jost, Gabriele; Jin, Haoqiang; Labarta Mancho, Jesus Jose; Gimenez Lucas, Judit
    Computational Science- ICCS 2003
    p. 181-190
    Presentation of work at congresses

  • A Dynamic Tracing Mechanism for Performance Analysis of OpenMP Applications

     Caubet Serrabou, Jordi; Gimenez Lucas, Judit; Labarta Mancho, Jesus Jose; DeRose, Luiz; Vetter, Jeffrey
    2nd International Workshop on OpenMP Applications and Tools
    p. 53-67
    Presentation of work at congresses

  • ESPRIT No.24757 CEPBA-TTN - Mechanism for enabling HPCN technology transfer in Europe (METIER)

     Labarta Mancho, Jesus Jose; Valero Cortes, Mateo; Gimenez Lucas, Judit; Torres Viñals, Jordi
    Competitive project

  • The Paros Operating System Microkernel (UPC-CEPBA-94-05)

     Labarta Mancho, Jesus Jose; Girona Turell, Sergio; Cortes Rossello, Antonio; Gimenez Lucas, Judit; Pujol Jensen, Cristina; Gregoris de la Fuente, Luis
    Date: 1994-06
    Report
