Canal Corretger, Ramon
Total activity: 120
Expertise
DRAM, Memory Design, Microarchitecture, Nanotechnology Circuit Design, Near and Subthreshold Architectures, Non-Volatile Memories, Processor Design, SRAM, Variability
h index
14
Professional category
University lecturer
Doctoral courses
Doctor per la Universitat Politècnica de Catalunya
University degree
Enginyer en Informàtica
Research group
ARCO - Microarchitecture and Compilers
Department
Department of Computer Architecture
School
Barcelona School of Informatics (FIB)
E-mail
rcanal@ac.upc.edu
Contact details
UPC directory
Orcid
0000-0003-4542-204X
ResearcherID
E-7775-2014
Scopus Author ID
7004495853
Links of interest
Personal webpage


Scientific and technological production
1 to 50 of 120 results
  • Reliability In The Face of Variability in Nanometer Embedded Memories  Open access

     Ganapathy, Shrikanth
    Defense's date: 2014-04-28
    Universitat Politècnica de Catalunya
    Theses

    In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure: embedded memories. As variations manifest as a spread in power and performance, as a first step we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design stage ensure that conflicting requirements from higher levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses novel fully digital variation-tracking hardware based on embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate the optimal body-bias needed to meet the required yield targets. A novel variation-tolerant and soft-error hardened eDRAM cell is also proposed as an alternative candidate for replacing existing SRAM-based designs in latency-critical memory structures. In the ultra-low-power domain, where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design choices optimized simultaneously for the multiple metrics needed to maintain lifetime requirements.

  • DRAM-based coherent caches and how to take advantage of the coherence protocol to reduce the refresh energy

     Jaksic, Zoran; Canal Corretger, Ramon
    Design, Automation and Test in Europe
    Presentation's date: 2014-03
    Presentation of work at congresses

    Recent technology trends have turned DRAMs into an interesting candidate to substitute traditional SRAM-based on-chip memory structures (i.e. register file, cache memories). Nevertheless, a major obstacle to introducing these cells is that they lose their state (i.e. value) over time and have to be refreshed. This paper proposes the implementation of coherent caches with DRAM cells. Furthermore, we propose to use the coherence state to tune the refresh overhead. According to our analysis, an average of up to 57% of refresh energy can be saved. Also, compared with caches implemented in SRAM, total energy savings are on average up to 39%, depending on the refresh policy, with a performance loss below 8%.

  • SSFB: a highly-efficient and scalable simulation reduction technique for SRAM yield analysis

     Rana, Manish; Canal Corretger, Ramon
    Design, Automation and Test in Europe
    Presentation's date: 2014-03-24
    Presentation of work at congresses

    Estimating extremely low SRAM failure probabilities with the conventional Monte Carlo (MC) approach requires hundreds of thousands of simulations, making it impractical. To alleviate this problem, failure-probability estimation methods with a smaller number of simulations have recently been proposed, most notably variants of consecutive mean-shift based Importance Sampling (IS). In this method, a large amount of time is spent simulating data points that will eventually be discarded in favor of other data points with minimum norm. This can potentially increase the simulation time by orders of magnitude. To solve this very important limitation, in this paper we introduce SSFB: a novel SRAM failure-probability estimation method that has much better cognizance of the data points compared to conventional approaches. The proposed method starts with radial simulation of a single point and reduces discarded simulations in two ways: a) random sampling is performed only when it reaches a failure boundary, after which it continues again with radial simulation of a chosen point, and b) random sampling is performed only within a specific failure range, which decreases in each iteration. The proposed method is also scalable to higher dimensions (more input variables), as sampling is done on the surface of the hypersphere rather than within the hypersphere as other techniques do. Our results show that using our method we can achieve an overall 40x reduction in simulations compared to consecutive mean-shift IS methods while remaining within the 0.01-Sigma accuracy. © 2014 EDAA.

  • INFORMER: an integrated framework for early-stage memory robustness analysis

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Alexandrescu, Dan; Costenaro, Eric; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    Design, Automation and Test in Europe
    Presentation's date: 2014-03-24
    Presentation of work at congresses

    With the growing importance of parametric (process and environmental) variations in advanced technologies, it has become a serious challenge to design reliable, fast and low-power embedded memories. Adopting a variation-aware design paradigm requires a holistic perspective of memory-wide metrics such as yield, power and performance. However, accurate estimation of such metrics is largely dependent on circuit implementation styles, technology parameters and architecture-level specifics. In this paper, we propose a fully automated tool - INFORMER - that helps high-level designers estimate memory reliability metrics rapidly and accurately. The tool relies on accurate circuit-level simulations of failure mechanisms such as soft-errors and parametric failures. The statistics obtained can then help couple low-level metrics with higher-level design choices. A new technique for rapid estimation of low-probability failure events is also proposed. We present three use-cases of our prototype tool to demonstrate its diverse capabilities in autonomously guiding large SRAM based robust memory designs. © 2014 EDAA.

  • Thread row buffers: Improving memory performance isolation and throughput in multiprogrammed environments

     Herrero Abellanas, Enric; Gonzalez, Jose; Canal Corretger, Ramon; Tullsen, Dean
    IEEE transactions on computers
    Date of publication: 2013-09
    Journal article

    The widespread adoption of chip multiprocessors in recent years has increased the number of applications simultaneously accessing DRAM memories. Therefore, memory access patterns have also changed and this has reduced row buffer locality significantly, degrading performance and energy efficiency. Furthermore, concurrent execution of applications has also shown the need for performance isolation among threads in the memory controller to enforce quality of service in virtualized environments. Existing DRAM memories, however, enforce a tradeoff between throughput and isolation. To solve these problems, this paper proposes the addition of Thread Row Buffers (TRBs) to DRAM memories. TRBs keep an active row per thread, thereby increasing DRAM efficiency by avoiding alternate accesses to a limited number of rows and allowing the implementation of a memory scheduler not bound to the throughput-isolation tradeoff. Thread Row Buffers with Service Partitioning (TRB-SP) increase the row hit-rate by 38 percent with respect to FR-FCFS and by 11 percent with respect to Cache DRAM. This, in turn, increases overall performance by 17 and 7 percent, respectively. TRB-SP is also able to reduce the standard deviation of the memory access time of an application by 40 percent over FR-FCFS, 31 percent over PAR-BS, and 42 percent over Cache DRAM. © 1968-2012 IEEE.

  • Impact of FinFET technology introduction in the 3T1D-DRAM memory cell

     Amat Bertran, Esteve; Garcia Almudever, Carmen; Aymerich Capdevila, Nivard; Canal Corretger, Ramon; Rubio Sola, Jose Antonio
    IEEE transactions on device and materials reliability
    Date of publication: 2013-01-09
    Journal article

    In this paper, the 3T1D-DRAM cell based on FinFET devices is studied as an alternative to the bulk one. We observe an improvement in its behavior when IG and SG FinFETs are properly mixed, since together they provide a relevant increase in the memory circuit retention time. Moreover, our FinFET cell shows larger variability robustness, better performance at low supply voltage, and higher tolerance to elevated temperatures.

  • Impact of FinFET and III-V/Ge technology on logic and memory cell behavior

     Amat, Esteve; Calomarde Palomino, Antonio; Garcia Almudever, Carmen; Aymerich Capdevila, Nivard; Canal Corretger, Ramon; Rubio Sola, Jose Antonio
    IEEE transactions on device and materials reliability
    Date of publication: 2013-11-20
    Journal article

    In this work, we assess the performance of a ring oscillator and a DRAM cell when they are implemented with different technologies (planar CMOS, FinFET and III-V MOSFETs), and subjected to different reliability scenarios (variability and soft errors). FinFET-based circuits show the highest robustness against variability and soft error environments.

  • Variability robustness enhancement for 7nm FinFET 3T1D-DRAM cells

     Amat Bertran, Esteve; Garcia Almudever, Carmen; Aymerich Capdevila, Nivard; Rubio Sola, Jose Antonio; Canal Corretger, Ramon
    IEEE International Midwest Symposium on Circuits and Systems
    Presentation's date: 2013-08-05
    Presentation of work at congresses

    3T1D-DRAM cells will still be operative with 7nm FinFETs but their performance is significantly degraded when factoring in variability. In order to improve the cell robustness against device process variation and high environment temperatures, we propose a Dual-VT strategy. Our results show a larger retention time, significant cell spread reduction and reliable behavior up to 100°C.

  • An energy-efficient and scalable eDRAM-based register file architecture for GPGPU

     Jing, Naifeng; Shen, Yao; Lu, Yao; Ganapathy, Shrikanth; Mao, Zhigang; Guo, Minyi; Canal Corretger, Ramon; Liang, Xiaoyao
    Annual International Symposium on Computer Architecture
    Presentation's date: 2013-06
    Presentation of work at congresses

    The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF). The fast-increasing size of the RF makes the area cost and power consumption unaffordable for traditional SRAM designs in future technologies. In this paper, we propose to use embedded-DRAM (eDRAM) as an alternative in future GPGPUs. Compared with SRAM, eDRAM provides higher density and lower leakage power. However, the limited data retention time in eDRAM poses new challenges. Periodic refresh operations are needed to maintain data integrity. This is exacerbated by the scaling of eDRAM density, process variations and temperature. Unlike conventional CPUs, which make use of multi-ported RFs, most of the RFs in modern GPGPUs are heavily banked but not multi-ported, to reduce the hardware cost. This provides a unique opportunity to hide the refresh overhead. We propose two different eDRAM implementations based on 3T1D and 1T1C memory cells. To mitigate the impact of periodic refresh, we propose two novel refresh solutions using bank bubble and bank walk-through. In addition, for the 1T1C RF, we design an interleaved bank organization together with an intelligent warp scheduling strategy to reduce the impact of the destructive reads. The analysis shows that our schemes present better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs.

  • Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes

     Lorente, Vicente; Valero, Alejandro; Sahuquillo, Julio; Petit, Salvador; Canal Corretger, Ramon; López, Pedro; Duato, José
    Design, Automation and Test in Europe
    Presentation's date: 2013-03-20
    Presentation of work at congresses

    Low-power modes in modern microprocessors rely on low frequencies and low voltages to reduce the energy budget. Nevertheless, manufacturing-induced parameter variations can make SRAM cells unreliable, producing hard errors at supply voltages below Vccmin. Recent proposals provide rather low fault coverage due to the fault coverage/overhead trade-off. We propose a new fault-tolerant L1 cache, which combines SRAM and eDRAM cells in L1 data caches to provide 100% SRAM hard-error fault coverage. Results show that, compared to a conventional cache and assuming 50% failure probability at low-power mode, leakage and dynamic energy savings are 85% and 62%, respectively, with a minimal impact on performance.

  • Effectiveness of hybrid recovery techniques on parametric failures

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    International Symposium on Quality Electronic Design
    Presentation's date: 2013-03
    Presentation of work at congresses

    Modern-day microprocessors effectively utilise supply voltage scaling for tremendous power reduction. The minimum voltage beyond which a processor cannot operate reliably is defined as Vddmin. On-chip memories like caches are the most susceptible to voltage-noise induced failures because of process variations and reduced noise margins, thereby determining the whole processor's Vddmin. In this paper, we evaluate the effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Proactive read/write assist techniques like body-biasing (BB) and wordline boosting (WLB), when combined with reactive techniques like ECC and redundancy, are shown to offer better quality-energy-area trade-offs than their standalone configurations. Proactive techniques can help lower Vddmin (improving functional margin) for significant power savings, and reactive techniques ensure that the resulting large number of failures are corrected (improving functional yield). Our results in 22nm technology indicate that at scaled supply voltages, hybrid techniques can improve parametric yield by at least 28% when considering worst-case process variations.

  • Impact of positive bias temperature instability (PBTI) on 3T1D-DRAM cells

     Aymerich Capdevila, Nivard; Ganapathy, Shrikanth; Rubio Sola, Jose Antonio; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria
    Integration. The VLSI journal
    Date of publication: 2012-06
    Journal article

  • Distributed cooperative caching: an energy efficient memory scheme for chip multiprocessors

     Herrero Abellanas, Enric; González, José; Canal Corretger, Ramon
    IEEE transactions on parallel and distributed systems
    Date of publication: 2012-05
    Journal article

  • Variability mitigation mechanisms in scaled 3T1D-DRAM memories to 22 nm and beyond

     Amat Bertran, Esteve; Garcia Almudever, Carmen; Aymerich Capdevila, Nivard; Canal Corretger, Ramon; Rubio Sola, Jose Antonio
    IEEE transactions on device and materials reliability
    Date of publication: 2012-09-06
    Journal article

  • Comparison of SRAM cells for 10-nm SOI FinFETs under process and environmental variations

     Jaksic, Zoran; Canal Corretger, Ramon
    IEEE transactions on electron devices
    Date of publication: 2012-12
    Journal article

    We explore the 6T and 8T SRAM design spaces through read static noise margin (RSNM), word-line write margin, and leakage for future 10-nm FinFETs. Process variations are based on the ITRS and modeled at device (TCAD) level. We propose a method to incorporate them into a BSIM-CMG model card for time-efficient simulation. We analyze cells with different fin numbers, supply voltages, and temperatures. Results show a 1.8× improvement of RSNM for 8T SRAM cells, the need for stronger pull-downs to secure read stability in 6Ts, and high leakage sensitivity to temperature (10× between 40°C and 100°C). As a specific example, we show how the RSNM of a 6T SRAM cell can be improved by using back-gate biasing techniques for independent-gate FinFETs. We show how WLMN is increased by reducing the strength of the pull-up transistors when reverse back-gate biasing is applied to them, and how the RSNM can be increased by reducing the strength of the access transistors through reverse back-gate biasing of the pass-gate transistors. When combining these two techniques, RSNM can be improved by up to 25% without compromising cell write ability for any sample. In general, when compared to previous technologies, read stability is untouched, writeability is reduced, and leakage remains stable.

  • Process variability in sub-16nm bulk CMOS technology

     Rubio Sola, Jose Antonio; Figueras Pamies, Juan; Vatajelu, Elena Ioana; Canal Corretger, Ramon
    Date: 2012-03-01
    Report

  • IEEE On-Line Testing Symposium 2012

     Cruz Diaz, Josep-llorenç; Canal Corretger, Ramon
    Participation in a competitive project

  • Analysis of FinFET technology on memories

     Amat, E.; Asenov, Asen; Canal Corretger, Ramon; Cheng, B.; Cruz Diaz, Josep-llorenç; Jaksic, Zoran; Miranda, Miguel; Rubio Sola, Jose Antonio; Zuber, Paul
    IEEE International On-Line Testing Symposium
    Presentation's date: 2012-06-29
    Presentation of work at congresses

  • A novel variation-tolerant 4T-DRAM with enhanced soft-error tolerance

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Alexandrescu, Dan; Costenaro, Enrico; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    IEEE International Conference on Computer Design: VLSI in Computers and Processors
    Presentation's date: 2012-09-30
    Presentation of work at congresses

    In view of device scaling issues, embedded DRAM (eDRAM) technology is being considered as a strong alternative to conventional SRAM for use in on-chip memories. Memory cells designed using eDRAM technology, in addition to being logic-compatible, are variation tolerant and immune to noise present at low supply voltages. However, two major causes of concern are the data retention capability, which is worsened by parameter variations leading to frequent data refreshes (resulting in large dynamic power overhead), and the transient reduction of stored charge, which increases soft-error (SE) susceptibility. In this paper, we present a novel variation-tolerant 4T-DRAM cell whose power consumption is 20.4% lower than that of a similarly sized eDRAM cell. The retention time is improved by 2.04X on average while incurring a delay overhead of 3% on the read-access time. Most importantly, using a soft-error (SE) rate analysis tool, we have confirmed that the cell sensitivity to SEs is reduced by 56% on average in a natural working environment.

  • Enhancing 3T DRAMs for SRAM replacement under 10nm tri-gate SOI FinFETs

     Jaksic, Zoran; Canal Corretger, Ramon
    IEEE International Conference on Computer Design: VLSI in Computers and Processors
    Presentation's date: 2012-10-02
    Presentation of work at congresses

    In this paper, we present the dynamic 3T memory cell for future 10 nm tri-gate FinFETs as a potential replacement for the classical 6T SRAM cell in high-speed cache memories. We investigate read access time, retention time, and static power consumption of the cell when it is exposed to the effects of process and environmental variations. Process variations are extracted from the ITRS predictions and modeled at device level. For simulation, we use the 10 nm SOI tri-gate FinFET BSIM-CMG model card developed by the University of Glasgow, Device Modeling Group. When compared to the classical 6T SRAM, the 3T cell has 40% smaller area and leakage is reduced by up to 14 times, while access time is approximately the same. In order to achieve higher retention times, we propose several cell extensions which, at the same time, enable post-fabrication/run-time adaptability.

  • A novel variation-tolerant 4T-DRAM cell with enhanced soft-error tolerance

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Alexandrescu, Dan; Costenaro, Eric; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    IEEE International Conference on Computer Design: VLSI in Computers and Processors
    Presentation's date: 2012-10-02
    Presentation of work at congresses

    In view of device scaling issues, embedded DRAM (eDRAM) technology is being considered as a strong alternative to conventional SRAM for use in on-chip memories. Memory cells designed using eDRAM technology, in addition to being logic-compatible, are variation tolerant and immune to noise present at low supply voltages. However, two major causes of concern are the data retention capability, which is worsened by parameter variations leading to frequent data refreshes (resulting in large dynamic power overhead), and the transient reduction of stored charge, which increases soft-error (SE) susceptibility. In this paper, we present a novel variation-tolerant 4T-DRAM cell whose power consumption is 20.4% lower than that of a similarly sized eDRAM cell. The retention time is improved by 2.04X on average while incurring a delay overhead of 3% on the read-access time. Most importantly, using a soft-error (SE) rate analysis tool, we have confirmed that the cell sensitivity to SEs is reduced by 56% on average in a natural working environment.

  • Enhancing 6T SRAM cell stability by back gate biasing techniques for 10nm SOI FinFETs under process and environmental variations

     Jaksic, Zoran; Canal Corretger, Ramon
    International Conference Mixed Design of Integrated Circuits and Systems
    Presentation's date: 2012-05-26
    Presentation of work at congresses

  • Strain relevance on the improvement of the 3T1D cell performance

     Amat Bertran, Esteve; Garcia Almudever, Carmen; Aymerich Capdevila, Nivard; Canal Corretger, Ramon; Rubio Sola, Jose Antonio
    International Conference Mixed Design of Integrated Circuits and Systems
    Presentation's date: 2012-05-26
    Presentation of work at congresses

  • Impact of bulk/SOI 10nm FinFETs on 3T1D-DRAM cell performance

     Amat Bertran, Esteve; Garcia Almudever, Carmen; Aymerich Capdevila, Nivard; Canal Corretger, Ramon; Rubio Sola, Jose Antonio
    International Conference on Solid-State and Integrated Circuit Technology
    Presentation's date: 2012-10
    Presentation of work at congresses

    While the feasibility of SOI or bulk substrates for 10nm FinFETs has been shown, their impact on 3T1D memory performance has not yet been studied. In our study, bulk-based FinFETs show better behavior for golden devices. Nevertheless, when variation is factored in, SOI-based FinFETs present better tolerance and, consequently, a lower performance spread than bulk-based devices. Ambient temperature is always a detrimental factor for both multi-gate devices, but its impact is lower for the bulk ones.

  • Mitigation strategies of the variability in 3T1D cell memories scaled beyond 22nm  Open access  awarded activity

     Amat Bertran, Esteve; Garcia Almudever, Carmen; Aymerich Capdevila, Nivard; Canal Corretger, Ramon; Rubio Sola, Jose Antonio
    Conference on Design of Circuits and Integrated Systems
    Presentation's date: 2012
    Presentation of work at congresses

    The 3T1D cell has been proposed as a valid alternative for L1 cache memories to replace the 6T cell, which is highly affected by device variability. In this contribution, we show that 22nm 3T1D memory cells present significant tolerance to high levels of device parameter fluctuation. Moreover, we have observed that the variability of the write access transistor has become the most detrimental factor for 3T1D cell performance. Furthermore, resizing and temperature control are presented as strategies to mitigate the cell variability.

  • TRAMS Project: variability and reliability of SRAM memories in sub-22nm bulk-CMOS technologies

     Canal Corretger, Ramon; Rubio Sola, Jose Antonio; Asenov, Asen; Brown, Andrew; Miranda, Miguel; Zuber, Paul; Gonzalez Colas, Antonio Maria; Vera, Xavier
    Procedia Computer Science
    Date of publication: 2011-12-22
    Journal article

    The TRAMS (Terascale Reliable Adaptive Memory Systems) project addresses in an evolutionary way the ultimate CMOS scaling technologies and paves the way for revolutionary, most promising beyond-CMOS technologies. In this abstract we show the significant variability levels of future 18 and 13 nm bulk-CMOS device technologies as well as their dramatic effect on the yield of memory cells and circuits.

  • Cooperative caching for chip multiprocessors

     Chang, J. Chang; Herrero Abellanas, Enric; Canal Corretger, Ramon; Sohi, G.
    Date of publication: 2011
    Book chapter

  • Adaptive Memory Hierarchies For Next Generation Tiled Microarchitectures  Open access  awarded activity

     Herrero Abellanas, Enric
    Defense's date: 2011-07-05
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    Processor performance and memory performance have improved at different rates during the last decades, limiting processor performance and creating the well-known "memory gap". Solving this performance difference is an important research field, and new solutions must be proposed in order to have better processors in the future. Several solutions exist, such as caches, which reduce the impact of longer memory accesses and make up the system memory hierarchy. However, most existing memory hierarchy organizations were designed for single processors or traditional multiprocessors. Nowadays, the increasing number of available transistors has allowed the emergence of chip multiprocessors, which have different constraints and require new ad-hoc memory systems able to manage memory resources efficiently. Therefore, in this thesis we have focused on improving the performance and energy efficiency of the memory hierarchy of chip multiprocessors, ranging from caches to DRAM memories. In the first part of this thesis we have studied traditional cache organizations such as shared or private caches, and we have seen that they behave well only for some applications and that an adaptive system would be desirable. State-of-the-art techniques such as Cooperative Caching (CC) take advantage of the benefits of both worlds. This technique, however, requires a centralized coherence structure and has a high energy consumption. Therefore we propose Distributed Cooperative Caching (DCC), a mechanism that provides coherence to chip multiprocessors and applies the concept of cooperative caching in a distributed way. Through the use of distributed directories we obtain a more scalable solution that, in addition, has a more flexible and energy-efficient tag allocation method. We also show that applications make different uses of the cache and that an efficient allocation can take advantage of unused resources.
    We propose Elastic Cooperative Caching (ElasticCC), an adaptive cache organization able to redistribute cache resources dynamically depending on application requirements. One of the most important contributions of this technique is that adaptivity is fully managed by hardware and that all repartitioning mechanisms are based on distributed structures, allowing better scalability. ElasticCC is not only able to repartition cache sizes according to application requirements, but is also able to adapt dynamically to the different execution phases of each thread. Our experimental evaluation has also shown that the cache partitioning provided by ElasticCC is efficient and almost matches the off-chip miss rate of a configuration with twice the cache space. Finally, we focus on the behavior of DRAM memories and memory controllers in chip multiprocessors. Although traditional memory schedulers work well for uniprocessors, we show that new access patterns call for a redesign of some parts of DRAM memories. Several organizations exist for multiprocessor DRAM schedulers; however, all of them must trade off memory throughput against fairness. We propose Thread Row Buffers, an extended storage area in DRAM memories able to store a data row for each thread. This mechanism enables fair memory access scheduling without hurting memory throughput. Overall, in this thesis we present new organizations for the memory hierarchy of chip multiprocessors which focus on the scalability of the proposed structures and their adaptivity to application behavior. Results show that the presented techniques provide better performance and energy efficiency than existing state-of-the-art solutions.

  • On the effectiveness of hybrid mechanisms on reduction of parametric failures in caches

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    Date: 2011-12-05
    Report

    In this paper, we provide insight into the different proactive read/write assist methods (wordline boosting and adaptive body biasing) that help prevent and reduce parametric failures when coupled with reactive techniques like ECC and redundancy, which cope with already existing failures. While proactive and reactive techniques have previously been viewed as complementary, we show that this is not necessarily the case when considering the benefits of such hybrid schemes.

    Postprint (author’s final draft)

  • Dynamic fine-grain body biasing of caches with latency and leakage 3T1D-based monitors  Open access

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    Date: 2011-04-15
    Report

    In this paper, we propose a dynamically tunable fine-grain body biasing mechanism to reduce active & standby leakage power in caches under process variations.

  • MICROARQUITECTURA Y COMPILADORES PARA FUTUROS PROCESADORES II

     Parcerisa Bundó, Joan Manuel; Canal Corretger, Ramon; Tubella Murgadas, Jordi; Cruz Diaz, Josep-llorenç; Gonzalez Colas, Antonio Maria
    Participation in a competitive project

  • New reliability mechanisms in memory design for sub-22nm technologies

     Aymerich Capdevila, Nivard; Brown, A.; Canal Corretger, Ramon; Cheng, B.; Figueras Pamies, Juan; Gonzalez Colas, Antonio Maria; Herrero Abellanas, Enric; Markov, S.; Miranda, Miguel; Pouyan, Peyman; Ramirez Garcia, Tanausu; Rubio Sola, Jose Antonio; Vatajelu, I.; Vera, Xavier; Wang, W.; Zuber, Paul; Asenov, Asen
    IEEE International On-Line Testing Symposium
    Presentation of work at congresses

  • TRAMS Project: variability and reliability of SRAM memories in sub-22 nm Bulk-CMOS technologies

     Canal Corretger, Ramon; Rubio Sola, Jose Antonio; Asenov, Asen; Brown, A.; Miranda, Miguel; Zuber, Paul; Gonzalez Colas, Antonio Maria; Vera, Xavier
    European Future Technologies Conference and Exhibition
    Presentation's date: 2011
    Presentation of work at congresses

  • Impact of positive bias temperature instability (PBTI) on 3T1D-DRAM cells

     Aymerich Capdevila, Nivard; Ganapathy, Shrikanth; Rubio Sola, Jose Antonio; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria
    ACM Great Lakes Symposium on VLSI
    Presentation's date: 2011-05-18
    Presentation of work at congresses

    Memory circuits are playing a key role in complex multicore systems with both data and instructions storage and mailbox communication functions. There is a general concern that conventional SRAM cell based on the 6T structure could exhibit serious limitations in future CMOS technologies due to the instability caused by transistor mismatching as well as for leakage consumption reasons. For L1 data caches the new cell 3T1D DRAM is considered a potential candidate to substitute 6T SRAMs. We first evaluate the impact of the positive bias temperature instability, PBTI, on the access and retention time of the 3T1D memory cell implemented with 45 nm technology. Then, we consider all sources of variations and the effect of the degradation caused by the aging of the device on the yield at system level.

  • Dynamic fine-grain body biasing of caches with latency and leakage 3T1D-based monitors

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    IEEE International Conference on Computer Design: VLSI in Computers and Processors
    Presentation's date: 2011
    Presentation of work at congresses

  • Elastic Cooperative Caching: An Autonomous Dynamically Adaptive Memory Hierarchy for Chip Multiprocessors

     Herrero Abellanas, Enric; González, Jose; Canal Corretger, Ramon
    Computer architecture news
    Date of publication: 2010
    Journal article

  • Mèrits docents d'especial qualitat (Teaching merits of special quality)

     Canal Corretger, Ramon
    Award or recognition

  • vPROBE: Variation aware post-silicon power/performance binning using embedded 3T1D cells  Open access

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    Date: 2010-09-05
    Report

    In this paper, we present an on-die post-silicon binning methodology that takes into account the effect of static and dynamic variations and categorizes every processor based on power/performance. The proposed scheme is composed of discretization hardware that exploits the characteristic dependence of delay and leakage on variability sources for categorization.

  • TERASCALE RELIABLE ADAPTIVE MEMORY SYSTEMS

     Figueras Pamies, Juan; Vatajelu, Elena Ioana; Aymerich Capdevila, Nivard; Calomarde Palomino, Antonio; Moll Echeto, Francesc de Borja; Garcia Almudever, Carmen; Canal Corretger, Ramon; Pouyan, Peyman; Rubio Sola, Jose Antonio
    Participation in a competitive project

  • TERASCALE RELIABLE ADAPTIVE MEMORY SYSTEMS

     Canal Corretger, Ramon; Cruz Diaz, Josep-llorenç; Tubella Murgadas, Jordi; Gonzalez Colas, Antonio Maria
    Participation in a competitive project

  • Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors

     Herrero Abellanas, Enric; González, José; Canal Corretger, Ramon
    International Symposium on Computer Architecture
    Presentation's date: 2010-06-19
    Presentation of work at congresses

  • Power-efficient spilling techniques for chip multiprocessors

     Herrero Abellanas, Enric; González, José; Canal Corretger, Ramon
    International European Conference on Parallel and Distributed Computing
    Presentation's date: 2010-09-02
    Presentation of work at congresses

    Current trends in CMPs indicate that the core count will increase in the near future. One of the main performance limiters of these forthcoming microarchitectures is the latency and high demand of the on-chip network and the off-chip memory communication. To optimize the usage of on-chip memory space and reduce off-chip traffic, several techniques have been proposed that use the N-chance forwarding mechanism, a solution for distributing unused cache space in chip multiprocessors. This technique, however, can in some cases lead to unnecessary extra network traffic or inefficient cache allocation. This paper presents two alternative power-efficient spilling methods to improve the efficiency of the N-chance forwarding mechanism. Compared to traditional Spilling, our Distance-Aware Spilling technique provides an energy efficiency improvement (MIPS^3/W) of 16% on average and a 14% reduction in network usage in a ring configuration, while increasing performance by 6%. Our Selective Spilling technique avoids most of the unnecessary reallocations and doubles the reuse of spilled blocks, reducing network traffic by an average of 22%. A combination of both techniques reduces network usage by 30% on average without degrading performance, allowing a 9% increase in energy efficiency.

  • MODEST: a model for energy estimation under spatio-temporal variability

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    International Symposium on Low Power Electronics and Design
    Presentation's date: 2010-08-20
    Presentation of work at congresses

  • Circuit propagation delay estimation through multivariate regression-based modeling under spatio-temporal variability

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria; Rubio Sola, Jose Antonio
    Design, Automation and Test in Europe
    Presentation's date: 2010-03-08
    Presentation of work at congresses

  • Circuit propagation delay estimation through multivariate regression-based modeling under spatio-temporal variability

     Ganapathy, Shrikanth; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria; Rubio, Antonio
    Date: 2009-09-07
    Report

  • 2009-SGR-1250 Arquitectura i Compiladors (ARCO)

     Tubella Murgadas, Jordi; Parcerisa Bundó, Joan Manuel; Gonzalez Colas, Antonio Maria; Canal Corretger, Ramon; Cruz Diaz, Josep-llorenç; Molina Clemente, Carlos; Aliagas Castell, Carles; Aleta Ortega, Alexandre
    Participation in a competitive project

  • MICROARQUITECTURA I COMPILADORS (ARCO)

     Gibert Codina, Enric; Canal Corretger, Ramon; Cruz Diaz, Josep-llorenç; Parcerisa Bundó, Joan Manuel; Pons Solé, Marc; Aliagas Castell, Carles; Aleta Ortega, Alexandre; Molina Clemente, Carlos; Magklis, Grigorios; Unsal, Osman Sabri; Piñeiro Riobo, Jose Alejandro; Vera Rivera, Francisco Javier; Gonzalez Colas, Antonio Maria; Codina Viñas, Josep M; Tubella Murgadas, Jordi
    Participation in a competitive project

  • Using coherence information and decay techniques to optimize L2 cache leakage in CMPs  Open access

     Monchiero, Matteo; Canal Corretger, Ramon; Gonzalez Colas, Antonio Maria
    International Conference on Parallel Processing
    Presentation's date: 2009
    Presentation of work at congresses

    This paper evaluates several techniques to save leakage in CMP L2 caches by selectively switching off the less used lines. We primarily focus on private snoopy L2 caches. In this case, coherence must be enforced in all situations, especially when a line is turned off to save power. In particular, we introduce three techniques: the first turns off cache lines by using the coherence protocol invalidations, the second is an implementation of a cache decay technique specific to coherent caches, and the third is a performance-optimized decay-based technique for coherent caches. Experimental results, obtained with accurate performance/thermal/energy models, show that appreciable power savings can be achieved by properly designing a leakage optimization technique. We target a CMP composed of 4 cores and 1 to 8 MB of total cache. For 4 MB, the proposed techniques show a 13%, 30%, and 21% energy reduction, respectively, at the cost of 0%, 8%, and 2% performance loss. For other cache sizes the behavior is qualitatively similar.

  • Microarquitectura i compiladors (ARCO)

     Tubella Murgadas, Jordi; Gonzalez Colas, Antonio Maria; Parcerisa Bundó, Joan Manuel; Canal Corretger, Ramon; Cruz Diaz, Josep-llorenç; Molina Clemente, Carlos Maria; Aliagas Castell, Carles; Aleta Ortega, Alexandre; Deb, Abhishek; Sreekar Shenoy, Govind; Pavlou, Demos; Herrero Abellanas, Enric; Yazdanpanah Ahmadabadi, Fahimeh; Bhagat, Indu; Lira Rueda, Javier; Lupon Navazo, Marc; Pons Sole, Marc; Ranjan, Rakesh; Ganapathy, Shrikanth; Jaksic, Zoran
    Participation in a competitive project

  • An hybrid eDRAM/SRAM macrocell to implement first-level data caches

     Valero, Alejandro; Sahuquillo, Julio; Petit, Salvador; Lorente, Vicente; Canal Corretger, Ramon; López, Pedro; Duato, José
    IEEE/ACM International Symposium on Microarchitecture
    Presentation's date: 2009-12-14
    Presentation of work at congresses

    SRAM and DRAM cells have been the predominant technologies used to implement memory cells in computer systems, each having its advantages and shortcomings. SRAM cells are faster and require no refresh since reads are not destructive. In contrast, DRAM cells provide higher density and minimal leakage energy since there are no paths within the cell from Vdd to ground. Recently, DRAM cells have been embedded in logic-based technology, thus overcoming the speed limit of typical DRAM cells. In this paper we propose an n-bit macrocell that implements one static cell and n-1 dynamic cells. This cell is intended for use in an n-way set-associative first-level data cache. Our study shows that, in a four-way set-associative cache, this macrocell reduces leakage by about 75% and area by more than half compared to an SRAM-based cache of the same capacity, with a minimal impact on performance. Architectural mechanisms have also been devised to avoid refresh logic. Experimental results show that no performance is lost when the retention time is larger than 50K processor cycles. In addition, the proposed delayed writeback policy, which avoids refreshing, performs a similar number of writebacks to a conventional cache with the same organization, so no power is wasted.