Various constraints of Static Random Access Memory (SRAM) are leading designers to consider new memory technologies as candidates for building on-chip shared last-level caches (SLLCs). Spin-Transfer Torque RAM (STT-RAM) is currently regarded as the prime contender thanks to its better energy efficiency, smaller die footprint and higher scalability. However, STT-RAM also exhibits drawbacks, such as slow and energy-hungry write operations, that must be mitigated before it can be used in the SLLCs of the next generation of computers. In this work, we address these shortcomings with a new management mechanism for STT-RAM SLLCs. The approach builds on the previous observation that, although the stream of references arriving at the SLLC of a Chip MultiProcessor (CMP) exhibits limited temporal locality, it does exhibit reuse locality, i.e. blocks referenced several times have a high probability of being reused again. Consequently, conventional STT-RAM SLLC management mechanisms, which mainly focus on exploiting temporal locality, behave inefficiently. In this paper, we employ a cache management mechanism that selects the contents of the SLLC so as to exploit reuse locality instead of temporal locality. Specifically, our proposal inserts a Reuse Detector (RD) between the private cache levels and the STT-RAM SLLC. Its mission is to detect blocks that exhibit no reuse and prevent their insertion in the SLLC, thereby reducing both the number of write operations and the energy consumed in the STT-RAM. Our evaluation, using multiprogrammed workloads on quad-core, eight-core and 16-core systems, reveals that our scheme delivers, on average, SLLC energy reductions in the range of 30–37%, additional energy savings in main memory in the range of 6–8%, and performance improvements of 3% (quad-core), 7% (eight-core) and 14% (16-core) compared with an STT-RAM SLLC baseline where no RD is employed. More importantly, our approach outperforms DASCA, the state-of-the-art STT-RAM SLLC management scheme: depending on the specific scenario and the kind of applications used, it reports SLLC energy savings 4–11% higher than those of DASCA, delivers 1.5–14% higher performance, and achieves DRAM energy savings 2–9% higher than DASCA's.
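To make the idea concrete, the following is a minimal Python sketch of the bypass decision such a Reuse Detector could make. The organization (a small LRU table of block tags evicted at least once), its sizing and its interface are illustrative assumptions, not the paper's actual design.

```python
from collections import OrderedDict

# Minimal sketch of a Reuse Detector (RD) sitting between the private
# caches and an STT-RAM SLLC: a block is inserted in the SLLC only once
# it has demonstrated reuse (a second eviction from the private levels).

class ReuseDetector:
    def __init__(self, capacity=4096):
        self.capacity = capacity
        self.tags = OrderedDict()          # block tag -> None, LRU-ordered

    def observe_eviction(self, tag):
        """Called when a block is evicted from the private levels.
        Returns True if the block should be inserted in the SLLC
        (it has shown reuse) and False if it should bypass it."""
        if tag in self.tags:
            self.tags.move_to_end(tag)     # refresh its LRU position
            return True                    # seen before: likely to be reused
        self.tags[tag] = None              # first sighting: remember it, bypass
        if len(self.tags) > self.capacity:
            self.tags.popitem(last=False)  # drop the least recently seen tag
        return False

rd = ReuseDetector()
# Blocks 1 and 2 reach the SLLC only on their second eviction.
writes_avoided = sum(not rd.observe_eviction(t) for t in [1, 2, 1, 3, 2])
print(writes_avoided)  # 3 STT-RAM writes (and their energy) avoided
```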
This paper presents two mechanisms that can significantly improve the read performance of both hard disk and solid-state drives: KDSim and REDCAP. KDSim is an in-kernel disk simulator that provides a framework for simultaneously simulating the performance achieved by different I/O system mechanisms and algorithms, and for dynamically turning them on and off, or selecting between different options or policies, to improve overall system performance. REDCAP is a RAM-based disk cache that effectively enlarges the built-in cache present in disk drives. By using KDSim, this cache is dynamically activated or deactivated according to the throughput achieved. Results show that, by using KDSim and REDCAP together, a system can improve its I/O performance by up to 88% for workloads with some spatial locality on both hard disk and solid-state drives, while achieving the same performance as a ‘regular system’ for workloads with random or sequential access patterns.
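As a rough illustration of such an activation/deactivation policy, here is a hedged Python sketch; all names and parameters are hypothetical stand-ins for KDSim/REDCAP internals, not their actual interfaces.

```python
# Illustrative control decision: REDCAP stays enabled only while the
# in-kernel simulator predicts that it outperforms the plain disk path.

def redcap_control(simulated_on_ms, simulated_off_ms, enabled, margin=0.05):
    """Decide whether REDCAP should be (or remain) active.

    simulated_on_ms / simulated_off_ms: mean request service times that
    the simulator predicts, over the last observation window, for the
    system with and without the RAM cache.
    """
    if simulated_on_ms < simulated_off_ms * (1 - margin):
        return True       # cache is clearly winning: (re)activate it
    if simulated_on_ms > simulated_off_ms * (1 + margin):
        return False      # cache hurts (e.g. random workload): deactivate it
    return enabled        # within the hysteresis band: keep the current state
```

The hysteresis margin avoids oscillating between states when both configurations perform similarly over a window.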
We have created and evaluated an algorithm capable of deduplicating and clustering exact- and near-duplicate media items (photos and videos) shared on multiple social networks in the context of events. The algorithm works in an entirely ad hoc manner and requires no pre-calculation. When people attend events, they increasingly share event-related media items publicly on social networks so that their contacts can relive and witness the attended events. In the past, we have worked on methods to accumulate such public user-generated multimedia content in order to summarize events visually, for example in the form of media galleries or slideshows. In this paper, we first introduce the social-network-specific reasons and challenges that cause near-duplicate media items. Second, we detail an algorithm for deduplicating and clustering exact- and near-duplicate media items stemming from multiple social networks. Finally, we evaluate the algorithm's strengths and weaknesses and thoroughly compare it with the state-of-the-art feature detection algorithms SIFT, ASIFT and SURF, showing that for the given use case it performs almost equally well in terms of accuracy, but strongly outperforms them in terms of speed.
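As an illustration of the general deduplication-and-clustering task (not the paper's actual algorithm, which combines several social-network-specific signals), the following Python sketch groups images using a simple 8x8 average hash and union-find; the distance threshold is an arbitrary assumption.

```python
from itertools import combinations
from PIL import Image

def ahash(path, size=8):
    """Tiny perceptual hash: one bit per pixel of a downscaled grayscale."""
    img = Image.open(path).convert("L").resize((size, size))
    px = list(img.getdata())
    avg = sum(px) / len(px)
    return sum(1 << i for i, p in enumerate(px) if p > avg)

def hamming(a, b):
    return bin(a ^ b).count("1")

def cluster(paths, max_dist=10):
    """Union-find clustering: images whose hashes differ by at most
    max_dist bits land in the same near-duplicate cluster."""
    parent = list(range(len(paths)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    hashes = [ahash(p) for p in paths]
    for i, j in combinations(range(len(paths)), 2):
        if hamming(hashes[i], hashes[j]) <= max_dist:
            parent[find(i)] = find(j)       # merge near-duplicates
    groups = {}
    for i, p in enumerate(paths):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```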
The popular Nelder and Mead (NM) algorithm has four parameters, associated with the operations known as reflection, expansion, contraction and shrinkage. The authors set their values to 1, 2, 0.5 and 0.5, respectively, and these values have been used universally ever since. Here, we propose to use NM to calibrate itself. A computational experiment is carried out, and the results show that the parameter values originally proposed by NM are better than those obtained by more sophisticated means.
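For reference, here is a bare-bones Python implementation of NM with the four coefficients exposed, followed by a self-calibration in the spirit of the proposal: an outer NM run tunes the coefficients of an inner run on a test function. The concrete choices (Rosenbrock as test function, the iteration budgets) are our own assumptions, not the paper's experimental setup.

```python
import numpy as np

def nelder_mead(f, x0, alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5,
                step=0.1, iters=500):
    """Minimal NM with reflection (alpha), expansion (gamma),
    contraction (rho) and shrinkage (sigma) coefficients exposed."""
    n = len(x0)
    simplex = [np.asarray(x0, float)]
    for i in range(n):                       # initial simplex around x0
        v = simplex[0].copy(); v[i] += step
        simplex.append(v)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        c = np.mean(simplex[:-1], axis=0)    # centroid of all but the worst
        xr = c + alpha * (c - worst)         # reflection
        if f(best) <= f(xr) < f(simplex[-2]):
            simplex[-1] = xr
        elif f(xr) < f(best):                # expansion
            xe = c + gamma * (xr - c)
            simplex[-1] = xe if f(xe) < f(xr) else xr
        else:                                # contraction
            xc = c + rho * (worst - c)
            if f(xc) < f(worst):
                simplex[-1] = xc
            else:                            # shrink towards the best point
                simplex = [best + sigma * (v - best) for v in simplex]
    return min(simplex, key=f)

# Self-calibration: the outer objective is how well an inner NM run,
# using the candidate coefficients, minimizes a test function.
rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
meta = lambda p: rosen(nelder_mead(rosen, [-1.2, 1.0], *p, iters=200))
print(nelder_mead(meta, [1.0, 2.0, 0.5, 0.5], iters=50))
```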
An orchestration is a multi-threaded computation that invokes a number of remote services. In practice, the responsiveness of a web service fluctuates with demand; during surges in activity, service responsiveness may be degraded, perhaps even to the point of failure. An uncertainty profile formalizes a user's perception of the effects of stress on an orchestration of web services; it describes a strategic situation, modelled by a zero-sum angel-daemon game. Stressed web-service scenarios are analysed using game theory in a realistic way, lying between over-optimism (services are entirely reliable) and over-pessimism (all services are broken). The 'resilience' of an uncertainty profile can be assessed through the valuation of its associated zero-sum game. In order to demonstrate the validity of the approach, we consider two measures of resilience and a number of different stress models. It is shown how (i) uncertainty profiles can be ordered by risk (as measured by game valuations) and (ii) the structural properties of the resulting risk partial orders can be analysed.
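To illustrate how a game valuation yields a risk measure, the sketch below computes the value of a small zero-sum matrix game by linear programming; the payoff matrix is a toy stand-in for an actual uncertainty profile, not one from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def game_value(A):
    """Value of the zero-sum matrix game A (row player maximizes).
    Solves: max v  s.t.  (A^T x)_j >= v for all j,  sum(x) = 1,  x >= 0."""
    A = np.asarray(A, float)
    m, n = A.shape
    c = np.zeros(m + 1); c[-1] = -1.0              # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return -res.fun

# Toy profile: rows = angel's choice of which service to protect,
# columns = daemon's choice of which service to stress; entries are
# the orchestration's resulting utility (illustrative numbers).
print(game_value([[3, 1], [0, 2]]))   # 1.5, a resilience measure
```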
Many peer-to-peer (P2P) overlays require security services that could be provided through a Public Key Infrastructure. However, such infrastructures are bound up with a revocation system, such as Certificate Revocation Lists (CRLs). A system with a client/server structure, in which a Certificate Authority plays the role of a central server, is prone to the usual single-point-of-failure problems. If a single Authority has to distribute the whole CRL to all users, perhaps several millions of them in a structured P2P overlay, a bottleneck appears. Moreover, in these networks users often hold a set of pseudonyms, each bound to a certificate, which raises two additional issues: issuing the CRL and assuring its freshness. On the one hand, the list size grows exponentially with the number of network users; on the other hand, the lists must be updated more frequently, or the revocation data will not be fresh enough. To solve these problems, we propose a new distributed revocation system for the Kademlia network. Our system distributes CRLs using the overlay itself and, so as not to overload the storage of individual nodes, divides the lists into segments. This mechanism improves accessibility, increases availability and guarantees the freshness of the revocation data.
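The following Python sketch illustrates CRL segmentation over a Kademlia-style DHT. The dictionary stands in for the overlay's STORE/FIND_VALUE operations, and the key derivation, naming scheme and segment size are illustrative choices rather than the paper's exact design.

```python
import hashlib, json

dht = {}   # 160-bit key -> value; the real overlay would replicate this

def key(name):
    return hashlib.sha1(name.encode()).hexdigest()

def publish_crl(ca_id, epoch, revoked_serials, segment_size=1000):
    """Split a CRL into segments and store each under its own DHT key."""
    segments = [revoked_serials[i:i + segment_size]
                for i in range(0, len(revoked_serials), segment_size)]
    for idx, seg in enumerate(segments):
        dht[key(f"{ca_id}/crl/{epoch}/{idx}")] = json.dumps(seg)
    # Small, frequently refreshed header: keeps revocation data fresh
    # without forcing every node to re-fetch the whole list.
    dht[key(f"{ca_id}/crl/latest")] = json.dumps(
        {"epoch": epoch, "segments": len(segments)})

def is_revoked(ca_id, serial):
    head = json.loads(dht[key(f"{ca_id}/crl/latest")])
    for idx in range(head["segments"]):
        seg = json.loads(dht[key(f"{ca_id}/crl/{head['epoch']}/{idx}")])
        if serial in seg:
            return True
    return False

publish_crl("ca1", epoch=42, revoked_serials=list(range(2500)))
print(is_revoked("ca1", 1234), is_revoked("ca1", 9999))  # True False
```

A real deployment would rather shard segments by serial number, so that a verifier fetches only the single segment that could contain the certificate; the linear scan above is just for brevity.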
Rincon, D.; Agusti, A.; Botero, J.; Raspall, F.; Remondo, D.; Hesselbach, X.; Beck, M.; de Meer, H.; Niedermeier, F.; Giuliani, G. The Computer Journal, Vol. 56, No. 12, pp. 1518–1536. DOI: 10.1093/comjnl/bxt053. Published: 2013-12-01.
This work describes a novel approach to reducing energy consumption in data centres (DCs) that yields benefits both in running costs and in environmental impact. The method is based on introducing collaborative interactions and flexibility clauses into the contracts between all the entities of the DC ecosystem, i.e. all the actors along the energy production–consumption chain, from the energy provider to the Information Technology customer. The collaborative approach also integrates the interaction between federated DCs. The paper gives a detailed description of the architecture that enables interaction between the parties of the DC ecosystem; the architecture is designed to be deployed progressively, allowing traditional and ‘greened’ services to coexist, without modifying the existing DC automation and framework systems.
Ants are generally believed to follow an intensive work routine, and numerous tales and fables portray them as conscientious workers. Nevertheless, biologists have discovered that ants also rest for extended periods of time. This does not only hold for individual ants: interestingly, ant colonies exhibit synchronized activity phases that result from self-organization. In this work, self-synchronization in ant colonies is taken as the inspiration for a new mechanism of self-synchronized duty-cycling in mobile sensor networks. We assume that sensor nodes are equipped with energy-harvesting capabilities such as, for example, solar cells, and we show that the proposed self-synchronization mechanism can be made adaptive to variable energy resources. The main objective of this paper is to study and explore the swarm intelligence foundations of self-synchronized duty-cycling; with this purpose in mind, physical constraints such as packet collisions and packet loss are generally not considered.
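A toy simulation in this spirit is sketched below. The ring topology, update rule and all constants are our own illustrative assumptions, not the swarm model developed in the paper; as in the paper, collisions and losses are ignored.

```python
import random

# Toy self-synchronized duty-cycling: nodes wake spontaneously (at a
# rate that grows with harvested energy) or when a neighbour is active,
# and sleep when nearly drained. Activity spreads, drains the colony,
# and collective rest follows: synchronized activity phases emerge.

N, STEPS = 50, 200
neighbors = lambda i: [(i - 1) % N, (i + 1) % N]    # ring topology
active = [random.random() < 0.1 for _ in range(N)]
energy = [0.5] * N                                  # normalized battery

for t in range(STEPS):
    nxt = []
    for i in range(N):
        p_wake = 0.01 + 0.2 * energy[i]             # adaptive to resources
        induced = any(active[j] for j in neighbors(i))
        awake = induced or random.random() < p_wake
        nxt.append(awake and energy[i] > 0.05)      # too drained -> sleep
    for i in range(N):
        energy[i] += 0.01 if not nxt[i] else -0.02  # harvest vs. spend
        energy[i] = min(1.0, max(0.0, energy[i]))
    active = nxt
    if t % 20 == 0:
        print(t, sum(active), "active")             # bursts, then quiet
```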
Coverage problems are a flourishing topic in optimization, thanks to recent advances in the field of wireless sensor networks. The main coverage issue centres around critical conditions that require reliable monitoring and prohibit failures. This issue can be addressed by maximal-exposure paths, regarding which this article presents new results. Namely, it shows how to minimize the sensing range of a set of sensors in order to ensure the existence of a k-covered path between two points in a given region, where the coverage degree k ≥ 2 is fixed. The three types of regions studied are: a planar graph, the whole plane and a polygonal region.
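The decision version of this question is easy to approximate on a grid, which the hedged Python sketch below combines with a binary search on the common sensing range; it only illustrates the problem and is unrelated to the exact techniques of the article. Region size, resolution and tolerance are arbitrary assumptions.

```python
import math
from collections import deque

def k_covered(p, sensors, r, k):
    """Point p is k-covered if at least k sensors are within range r."""
    return sum(math.dist(p, s) <= r for s in sensors) >= k

def path_exists(src, dst, sensors, r, k, res=40, size=10.0):
    """BFS over grid cells whose centres are k-covered."""
    step = size / res
    cell = lambda p: (int(p[0] / step), int(p[1] / step))
    ok = [[k_covered(((i + .5) * step, (j + .5) * step), sensors, r, k)
           for j in range(res)] for i in range(res)]
    start, goal = cell(src), cell(dst)
    if not (ok[start[0]][start[1]] and ok[goal[0]][goal[1]]):
        return False
    seen, q = {start}, deque([start])
    while q:
        i, j = q.popleft()
        if (i, j) == goal:
            return True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < res and 0 <= nj < res \
                    and (ni, nj) not in seen and ok[ni][nj]:
                seen.add((ni, nj)); q.append((ni, nj))
    return False

def min_range(src, dst, sensors, k, lo=0.0, hi=15.0, eps=1e-2):
    while hi - lo > eps:                    # binary search on the range r
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if path_exists(src, dst, sensors, mid, k) \
                 else (mid, hi)
    return hi

sensors = [(2.0, 2.0), (5.0, 5.0), (8.0, 8.0), (5.0, 2.0), (2.0, 5.0)]
print(min_range((1.0, 1.0), (9.0, 9.0), sensors, k=2))
```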
The performance of high-performance computing (HPC) applications depends heavily on the memory subsystem, because their huge data sets do not fit into the cache hierarchy. Besides, energy efficiency has become a main design factor, so both performance and energy efficiency are primary goals in HPC designs. As a result, energy-efficient high-performance memory subsystems should be explored. In this paper, we extend the architecture of general-purpose processors by adding a software-managed local memory (LM) and a very simple programmable DMA controller. We demonstrate that these extensions, together with efficient run-time management, improve both performance and energy consumption. We perform an LM design-space exploration study for an Intel® Pentium® 4 platform, analyzing the performance, energy and energy-delay product of 27 computational loops from the NAS benchmarks. We show a 1.2x performance speedup and a 6.21% energy reduction on average when using a constrained 32 KB LM with commodity memory bandwidth (6.4 GB/s). More aggressive configurations (i.e. 256 KB LM + 12.8 GB/s) show at least 2.14x performance speedups and energy savings of 42.07% on average.
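The run-time pattern enabled by such an LM plus DMA controller is classic double buffering: while the processor computes on one resident tile, the DMA engine fetches the next. Below is a hedged Python analogue of that pattern, in which dma_fetch is a hypothetical stand-in for the programmable DMA engine (here a plain slice copy, so the sketch runs anywhere).

```python
TILE = 1024

def dma_fetch(src, start, n):
    """Stand-in for an asynchronous DMA transfer into local memory."""
    return src[start:start + n]

def sum_with_local_memory(data):
    total = 0
    buf = [dma_fetch(data, 0, TILE), None]   # buffer 0 pre-loaded
    for t, start in enumerate(range(0, len(data), TILE)):
        cur = t % 2
        nxt_start = start + TILE
        if nxt_start < len(data):            # overlap next transfer
            buf[1 - cur] = dma_fetch(data, nxt_start, TILE)
        total += sum(buf[cur])               # "compute" on resident tile
    return total

assert sum_with_local_memory(list(range(5000))) == sum(range(5000))
```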
In this paper, we develop a variant, regenerative randomization with Laplace transform inversion, of a previously proposed method (the regenerative randomization method) for the transient analysis of rewarded continuous-time Markov models. Such models find applications in the dependability and performability analysis of computer and telecommunication systems. The variant differs from regenerative randomization in that the truncated transformed model obtained in that method is solved using a Laplace transform inversion algorithm instead of standard randomization. Like regenerative randomization, the variant requires the selection of a regenerative state on which the performance of the method depends. For a class of models, class C', including typical failure/repair models, a natural selection for the regenerative state exists and, with that selection, theoretical results are available assessing the performance of the method in terms of "visible" characteristics. Using moderately sized class C' dependability models of a RAID 5 architecture, we compare the performance of the variant with that of regenerative randomization and randomization with steady-state detection for irreducible models, and with that of regenerative randomization and standard randomization for models with absorbing states. For irreducible models, the new variant seems to be about as fast as randomization with steady-state detection for models that are not too small when the initial probability distribution is concentrated in the regenerative state, and significantly faster than regenerative randomization when the model is stiff and not very large. For stiff models with absorbing states, the new variant is much faster than standard randomization and significantly faster than regenerative randomization when the model is not very large. In addition, the variant seems able to achieve stringent accuracy levels safely.
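For context, the sketch below implements standard randomization (also called uniformization), the textbook baseline the variant is compared against: pi(t) = sum over n of Poisson(Lambda*t; n) * pi0 * P^n with P = I + Q/Lambda. The fixed truncation point and the tiny failure/repair model are simplifications of our own; a real solver truncates with a controlled error bound.

```python
import numpy as np
from math import exp

def transient(Q, pi0, t, N=200):
    """Transient state probabilities of a CTMC with generator Q at time t,
    by standard randomization with a fixed truncation point N."""
    Q = np.asarray(Q, float)
    L = max(-Q.diagonal())              # uniformization rate Lambda
    P = np.eye(len(Q)) + Q / L          # randomized (uniformized) DTMC
    v = np.asarray(pi0, float)
    w = exp(-L * t)                     # Poisson weight for n = 0
    out = w * v
    for n in range(1, N):
        v = v @ P                       # advance the DTMC one step
        w *= L * t / n                  # Poisson weight recurrence
        out += w * v
    return out

# Tiny failure/repair model: state 0 = up, 1 = down.
Q = [[-0.001, 0.001],
     [ 0.1,  -0.1  ]]
print(transient(Q, [1.0, 0.0], t=100.0))   # availability vector at t = 100
```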
This paper presents a novel architectural model that implements the Multipath execution model for Prolog programs. Multipath performs a partial breadth-first traversal of SLD trees, which is shown to be more efficient than the standard depth-first traversal for most of the benchmarks. Its advantages can be exploited in either a sequential or a parallel implementation. In a sequential execution, Multipath reduces the number of operations by traversing more than one search path within a single control flow. Moreover, in a parallel environment, Multipath exploits path parallelism, a particular case of data parallelism that arises when exploring search trees. We present performance figures for a sequential implementation and measure the amount of parallelism exhibited by the execution model that a parallel implementation could exploit.
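As a generic illustration of a partial breadth-first traversal (not the paper's architectural model), the Python sketch below advances a bounded set of search paths in lock-step and keeps the overflow in a backlog that is explored later, depth-first style; the expand function is a stand-in for SLD resolution.

```python
def partial_bfs(root, expand, is_solution, width=4):
    """Traverse a search tree, advancing up to `width` paths per step."""
    frontier, backlog = [root], []     # backlog: paths deferred, DFS-style
    while frontier or backlog:
        if not frontier:
            frontier = [backlog.pop()] # resume a deferred path
        nxt = []
        for node in frontier:          # all active paths advance in lock-step
            if is_solution(node):
                yield node
            nxt.extend(expand(node))
        frontier, extra = nxt[:width], nxt[width:]
        backlog.extend(extra)          # defer what exceeds the path width

# Example: nodes are partial bit-strings, solutions have length 3.
expand = lambda s: [] if len(s) == 3 else [s + "0", s + "1"]
print(list(partial_bfs("", expand, lambda s: len(s) == 3, width=2)))
```

With width=1 this degenerates to depth-first search; larger widths trade memory for the chance to share work across simultaneously explored paths.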