Vera, F.J.; Abella, J.; Llosa, J.; Gonzalez, A. ACM transactions on programming languages and systems Vol. 27, num. 5, p. 946-987 DOI: 10.1145/1086642.1086646 Data de publicació: 2005-09 Article en revista
Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality.The performance of the memory hierarchy can be improved by means of data and loop transformations. Tiling is a loop transformation that aims at reducing capacity misses by shortening the reuse distance. Padding is a data layout transformation targeted to reduce conflict misses.This article presents an accurate cost model that describes misses across different hierarchy levels and considers the effects of other hardware components such as branch predictors. The cost model drives the application of tiling and padding transformations. We combine the cost model with a genetic algorithm to compute the tile and pad factors that enhance the program performance.To validate our strategy, we ran experiments for a set of benchmarks on a large set of modern architectures. Our results show that this scheme is useful to optimize programs' performance. When compared to previous approaches, we observe that with a reasonable compile-time overhead, our approach gives significant performance improvements for all studied kernels on all architectures.
Vera, F.J.; Bermudo, N.; Llosa, J.; Gonzalez, A. ACM transactions on programming languages and systems Vol. 26, num. 2, p. 263-300 DOI: 10.1145/973097.973099 Data de publicació: 2004-03 Article en revista
The gap between processor and main memory performance increases every year. In order to overcome this problem, cache memories are widely used. However, they are only effective when programs exhibit sufficient data locality. Compile-time program transformations can significantly improve the performance of the cache. To apply most of these transformations, the compiler requires a precise knowledge of the locality of the different sections of the code, both before and after being transformed.Cache miss equations (CMEs) allow us to obtain an analytical and precise description of the cache memory behavior for loop-oriented codes. Unfortunately, a direct solution of the CMEs is computationally intractable due to its NP-complete nature.This article proposes a fast and accurate approach to estimate the solution of the CMEs. We use sampling techniques to approximate the absolute miss ratio of each reference by analyzing a small subset of the iteration space. The size of the subset, and therefore the analysis time, is determined by the accuracy selected by the user. In order to reduce the complexity of the algorithm to solve CMEs, effective mathematical techniques have been developed to analyze the subset of the iteration space that is being considered. These techniques exploit some properties of the particular polyhedra represented by CMEs.