Most computer manufacturers now offer chip multiprocessors (CMPs), driven by ever-increasing chip density. These CMPs have a broad range of characteristics, but all of them support the shared-memory programming model. As a result, every CMP implements a coherence protocol to keep local caches coherent. Coherence protocols consume a significant fraction of power to determine which coherence action to perform. Specifically, on CMPs with write-through local caches, a shared cache, and a directory-based coherence protocol implemented as a duplicate of the local cache tags, we have observed that energy is wasted in the directory for two main reasons. First, a significant fraction of directory lookups are useless because the target block is not located in any local cache. The power consumed by the directory could be reduced by filtering out these useless lookups. Second, useful directory lookups (those for which local copies of the target block exist) are performed on blocks that are shared by only a small number of processors. The directory power consumption could be reduced by limiting each lookup to the directory entries that actually hold a copy of the block. In this thesis we propose two filtering mechanisms, each focused on one of the problems described above: the first reduces the number of directory lookups performed, and the second reduces the associativity of directory lookups. Several implementations of both filtering approaches have been proposed and evaluated, all of them with very limited hardware complexity. Our results show that the power consumed by the directory can be reduced by as much as 30%.
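The two filtering ideas in this abstract can be illustrated with a toy comparison-counting model. This is only a sketch under stated assumptions, not the mechanisms evaluated in the thesis: the class `DuplicateTagDirectory`, its methods, and the exact sharer sets it keeps are hypothetical, and a real filter would be a small, possibly imprecise hardware structure.

```python
from collections import defaultdict


class DuplicateTagDirectory:
    """Toy model of a directory built as a duplicate of the local cache tags.

    Hypothetical illustration: it only counts tag comparisons and assumes
    the filter knows exactly which local caches hold each block.
    """

    def __init__(self, num_caches, ways):
        self.num_caches = num_caches
        self.ways = ways
        self.sharers = defaultdict(set)  # block address -> ids of caches holding it
        self.comparisons = 0             # tag comparisons performed so far

    def fill(self, cache_id, block):
        self.sharers[block].add(cache_id)

    def evict(self, cache_id, block):
        self.sharers[block].discard(cache_id)
        if not self.sharers[block]:
            del self.sharers[block]

    def lookup_unfiltered(self, block):
        # Baseline: compare against every way of every duplicate tag array.
        self.comparisons += self.num_caches * self.ways
        return set(self.sharers.get(block, ()))

    def lookup_filtered(self, block):
        holders = self.sharers.get(block)
        if not holders:
            # First idea: the block is in no local cache, so skip the lookup.
            return set()
        # Second idea: search only the tag arrays of caches that hold a copy.
        self.comparisons += len(holders) * self.ways
        return set(holders)


directory = DuplicateTagDirectory(num_caches=8, ways=4)
directory.fill(0, 0x40)
directory.fill(3, 0x40)

directory.lookup_unfiltered(0x80)   # useless lookup, still costs 8 * 4 comparisons
baseline_cost = directory.comparisons

directory.comparisons = 0
directory.lookup_filtered(0x80)     # filtered out, no comparisons
directory.lookup_filtered(0x40)     # two sharers, 2 * 4 comparisons
filtered_cost = directory.comparisons
print(baseline_cost, filtered_cost)
```

In this model a lookup for an uncached block costs nothing once filtered, and a lookup for a block with two sharers costs 8 comparisons instead of 32. The cache count and associativity used here are arbitrary illustration values.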
Bosque, A.; Viñals, V.; Ibáñez, P.; Llaberia, J. International European Conference on Parallel and Distributed Computing p. 269-281 DOI: 10.1007/978-3-642-23400-2_26 Presentation date: 2011-09-02 Conference paper presentation
Bosque, A.; Viñals, V.; Ibáñez, P.; Llaberia, J. Euromicro Conference on Digital System Design p. 207-216 DOI: doi.ieeecomputersociety.org/10.1109/DSD.2010.85 Presentation date: 2010 Conference paper presentation
Coherence protocols consume a significant fraction of power to determine which coherence action should take place. In this paper we focus on CMPs with a shared cache and a directory-based coherence protocol implemented as a duplicate of the local cache tags. We observe that a large fraction of directory lookups produce a miss because the block looked up is not cached in any local cache. We propose adding a filter before the directory lookup in order to reduce the number of lookups to this structure. The filter identifies whether the current block was last accessed as data or as an instruction. With this information, looking up the whole directory can be avoided for most accesses. We evaluate the filter in a CMP with 8 in-order processors with 4 threads each and a memory hierarchy with a shared L2 cache. We show that a filter with a size of 3% of the tag array of the shared cache can avoid more than 70% of all comparisons performed by directory lookups, with a performance loss of just 0.2% for SPLASH2 and 1.5% for Specweb2005. On average, the number of 15-bit comparisons avoided per cycle is 54 out of 77 for SPLASH2 and 29 out of 41 for Specweb2005. In both cases, the filter requires less than one read of 1 bit per cycle.
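The per-cycle averages quoted in this abstract can be checked against the "more than 70%" claim with a short calculation. The helper `avoided_fraction` is a hypothetical name introduced for illustration, and because the quoted averages appear to be rounded to whole comparisons, the resulting ratios are approximate.

```python
def avoided_fraction(avoided, total):
    """Fraction of tag comparisons avoided per cycle."""
    return avoided / total


# Averages quoted in the abstract (15-bit comparisons per cycle).
splash2 = avoided_fraction(54, 77)
specweb = avoided_fraction(29, 41)

print(f"SPLASH2:     {splash2:.1%} avoided, {77 - 54} comparisons remain per cycle")
print(f"Specweb2005: {specweb:.1%} avoided, {41 - 29} comparisons remain per cycle")
# SPLASH2:     70.1% avoided, 23 comparisons remain per cycle
# Specweb2005: 70.7% avoided, 12 comparisons remain per cycle
```

Both ratios come out slightly above 70%, which is consistent with the figure stated for the filter.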
Bosque, A.; Viñals, V.; Ibáñez, P.; Stenström, P.; Llaberia, J. International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems p. 201-204 Presentation date: 2006 Conference paper presentation
Bosque, A.; Viñals, V.; Llaberia, J.; Stenström, P. International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems p. 91-94 Presentation date: 2005 Conference paper presentation