Scientific and technological production
1 to 50 of 230 results
  • Query Optimization Engine for Graph Databases

     Larriba Pey, Josep
    Participation in a competitive project

  • Research endogamy as an indicator of conference quality

     Lopez Montolio, Sergio; Dominguez Sala, David; Larriba Pey, Josep
    SIGMOD record
    Date of publication: 2013-06
    Journal article

    Endogamy in scientific publications is a measure of the degree of collaboration between researchers. In this paper, we analyze the endogamy of a large set of computer science conferences and journals. We observe a strong correlation between the quality of those conferences and the endogamy of their authors: conferences where researchers collaborate with new peers are of significantly higher quality than conferences where researchers work in groups that remain stable over time.

  • Producer-consumer: the programming model for future many-core processors

     Prat Perez, Arnau; Dominguez Sala, David; Larriba Pey, Josep; Trancoso, Pedro
    International Conference on Architecture of Computing Systems
    Presentation's date: 2013
    Presentation of work at congresses

    The massive addition of cores on a chip is putting more pressure on accesses to main memory. To avoid this bottleneck, we propose the use of a simple producer-consumer model, which allows temporary results to be transferred directly from one task to another. These data transfers are performed within the chip, using on-chip memory, thus avoiding costly off-chip memory accesses. We implement this model on a real many-core processor, the 48-core Intel Single-chip Cloud Computer, using its on-chip memory facilities. We find that the Producer-Consumer model adapts well to such architectures and achieves good task and data parallelism. To evaluate the proposed platform, we implement a graph-based application using the Producer-Consumer model. Our tests show that the model scales very well, as it takes advantage of the on-chip memory. The execution times of our implementation are up to 9 times faster than those of the baseline implementation, which relies on storing the temporary results in main memory.
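
    The producer-consumer idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not target the Single-chip Cloud Computer: the bounded `queue.Queue` merely stands in for the on-chip buffer, and the names `producer`, `consumer`, `run_pipeline` and `buffer_size` are hypothetical.

```python
import queue
import threading


def producer(out_q, items):
    """Compute temporary results and hand them directly to the consumer."""
    for x in items:
        out_q.put(x * x)  # temporary result goes into the small shared buffer
    out_q.put(None)       # sentinel: signals that no more data will arrive


def consumer(in_q, results):
    """Consume temporary results as soon as they become available."""
    total = 0
    while True:
        item = in_q.get()
        if item is None:
            break
        total += item
    results.append(total)


def run_pipeline(items, buffer_size=4):
    """Run one producer and one consumer linked by a bounded buffer."""
    # Assumption: the bounded queue plays the role of the on-chip memory,
    # so intermediate values are never collected into one large structure.
    q = queue.Queue(maxsize=buffer_size)
    results = []
    t_prod = threading.Thread(target=producer, args=(q, items))
    t_cons = threading.Thread(target=consumer, args=(q, results))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return results[0]
```

    Calling `run_pipeline(range(5))` sums the squares produced by the first task without ever materialising the full list of intermediate values.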

  • Visible, near infrared and thermal hand-based image biometric recognition.  Open access

     Font Aragones, Xavier
    Defense's date: 2013-05-30
    Universitat Politècnica de Catalunya
    Theses

    Biometric recognition refers to the automatic identification of a person based on an anatomical characteristic or modality (e.g., fingerprint, palmprint, face) or a behavioural characteristic (e.g., signature). It is a key issue in any process concerned with security, shared resources or network transactions, among many others. It arises as the fundamental problem widely known as recognition, and it is a mandatory step before permission is granted. It is meant to protect key resources by allowing them to be used only by users who have been granted authority to use or access them. Biometric systems can operate in verification mode, where the question to be answered is "Am I who I claim I am?", or in identification mode, where the question is "Who am I?". The scientific community has increased its efforts to improve the performance of biometric systems. Depending on the application, many solutions work with several modalities or combine different classification methods. Since adding modalities causes some user inconvenience, many of these approaches will never reach the market. For example, working with iris, face and fingerprints requires some user effort to help acquisition. This thesis addresses hand-based biometric systems in a thorough way. The main contributions are a new multi-spectral hand-based image database and methods for performance improvement: A) The first multi-spectral hand-based image database covering both faces of the hand, palmar and dorsal. Biometric databases are a precious commodity for research, especially when they offer something new, such as visible (VIS), near infrared (NIR) and thermographic (TIR) images acquired at the same time. This database, with 100 users and 10 samples per user, constitutes a good starting point to test algorithms and the suitability of the hand for recognition. B) To deal correctly with raw hand data, some image preprocessing steps are necessary. Three different segmentation phases are deployed to deal specifically with VIS, NIR and TIR images. Some of the tough issues to address are overexposed images, rings on fingers, cuffs, cold fingers and image noise. Once the image is segmented, two different approaches are prepared to deal with the segmented data. These two approaches, called Holistic and Geometric, define the main focus when extracting the feature vector. These feature vectors can be used alone or combined. Many questions can be posed, e.g.: Which approach is better for recognition? Can fingers alone obtain better performance than the whole hand? Is thermographic hand information suitable for recognition, given its thermoregulation properties? A complete set of data ready to analyse, coming from the holistic and geometric approaches, has been designed and saved for testing. An innovative geometric approach related to curvature is demonstrated. C) Finally, the Biometric Dispersion Matcher (BDM) is used to explore how it works under different fusion schemes, as well as with different classification methods. This research intends to contrast what happens when using other methods close to BDM, such as Linear Discriminant Analysis (LDA). At this point, some interesting questions are answered, e.g. by taking advantage of the finger segmentation (as five different modalities) to find out whether the fingers can outperform the whole-hand data.

    Biometric recognition refers to the automatic identification of people using some anatomical characteristic or modality (fingerprint) or some behavioural characteristic (signature). It is a fundamental aspect of any process related to security, resource sharing or electronic transactions, among others. It becomes an essential step before authorization is granted. This authorization is understood to protect key resources, thereby allowing them to be used by users who have been authorized to use or access them. Biometric systems can operate in verification mode, which answers the question "Am I who I say I am?", or in identification mode, which answers the question "Who am I?". The scientific community has increased its efforts to improve the performance of biometric systems. Depending on the application, various solutions work with multiple modalities or combine different classification methods. Since increasing the number of modalities also creates problems for users, many of these approaches never reach the market. The thesis contributes mainly in three broad areas, all with the following common denominator: biometric recognition through the hands. i) The first constitutes the basis of any study, the data. In order to interpret and establish a sufficiently robust biometric recognition system with a clear focus on multiple sources of information, but with minimal effort on the part of the user, this multi-spectral hand database is built. Biometric databases are a highly valued resource for research, especially if they offer some new element, as is the case here: hand images in different electromagnetic spectra, visible (VIS), near infrared (NIR) and thermal (TIR). With a total of 100 users and 10 samples per user, it is a good starting point for studying and testing multi-biometric systems focused on the hands. ii) The second block addresses the two approaches found in the literature for processing raw data. These two approaches, called Holistic (treats the image as a whole) and Geometric (uses geometric calculations), define the focus when extracting the feature vector. Before applying either approach, however, different digital image preprocessing techniques must be applied to obtain the desired regions of interest. Different problems present in the images had to be solved in an original way for each of the image types: VIS, NIR and TIR. VIS: overexposed images, rings, sleeves, bracelets. NIR: painted nails, noise-like distortion in the images. TIR: cold fingers. The second area presents innovative aspects since, besides segmenting the image of the hand, each and every finger is segmented (feature-based approach). This makes it possible to contrast their recognition capability against that of the complete hand. Additionally, a set of geometric procedures is presented with the aim of comparing them with those derived from holistic extraction. The third and last area contrasts the classification procedure called Biometric Dispersion Matcher (BDM) in different situations. The first concerns its effectiveness with respect to other recognition methods, such as Linear Discriminant Analysis (LDA) or methods such as KNN or logistic regression. The other situations analysed involve multiple sources of information, when normalization techniques and/or combination (fusion) strategies are applied to improve the results. The results obtained leave no room for confusion and are certainly promising, in that they highlight the importance of combining complementary information to obtain superior performance.

  • A Coherent and Rich PaaS with a Common Programming Model

     Larriba Pey, Josep
    Participation in a competitive project

  • Shaping communities out of triangles

     Prat Perez, Arnau; Dominguez Sal, David; Brunat Blay, Josep Maria; Larriba Pey, Josep
    ACM International Conference on Information and Knowledge Management
    Presentation's date: 2012
    Presentation of work at congresses

    Community detection has arisen as one of the most relevant topics in the field of graph data mining due to its importance in many fields such as biology, social networks or network traffic analysis. The metrics proposed to shape communities are generic and follow two approaches: maximizing the internal density of such communities or reducing the connectivity of the internal vertices with those outside the community. However, these metrics take the edges as a set and do not consider the internal layout of the edges in the community. We define a set of properties oriented to social networks that ensure that communities are cohesive, structured and well defined. Then, we propose the Weighted Community Clustering (WCC), which is a community metric based on triangles. We prove that analyzing communities by triangles gives communities that fulfill the listed set of properties, in contrast to previous metrics. Finally, we experimentally show that WCC correctly captures the concept of community in social networks using real and synthetic datasets, and statistically compare some of the most relevant community detection algorithms in the state of the art.
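
    The abstract above builds its metric on triangles. The sketch below shows only the basic building block, counting the triangles each vertex participates in for an undirected graph; it does not reproduce the WCC formula, which the abstract does not state. The function name `triangles_per_vertex` and the edge-list input format are assumptions made for illustration.

```python
from itertools import combinations


def triangles_per_vertex(edges):
    """Return {vertex: number of triangles containing that vertex}."""
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {}
    for v, nbrs in adj.items():
        # A triangle through v is a pair of v's neighbours that are
        # themselves connected by an edge.
        counts[v] = sum(
            1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a]
        )
    return counts
```

    For the edge list `[(1, 2), (2, 3), (1, 3), (3, 4)]`, vertices 1, 2 and 3 each take part in one triangle, while vertex 4 takes part in none.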

  • GraphGen: a tool for automatic generation of multipartite graphs from arbitrary data

     Álvarez Garcia, Sandra; Baeza Yates, Ricardo; Brisaboa, Nieves R.; Larriba Pey, Josep; Pedreira, Oscar
    Latin American Web Congress
    Presentation's date: 2012-10
    Presentation of work at congresses

  • Efficient graph management based on bitmap indices

     Martinez Bazan, Norbert; Muntés Mulero, Víctor; Gómez Villamor, S.; Dominguez Sala, David; Aguila Lorente, Miguel Angel; Larriba Pey, Josep
    International Database Engineering and Applications Symposium
    Presentation's date: 2012
    Presentation of work at congresses

    The increasing amount of graph-like data from social networks, science and the web has sparked interest in analyzing the relationships between different entities. New specialized solutions in the form of graph databases, which are generic and able to adapt to any schema as an alternative to RDBMS, have appeared to manage attributed multigraphs efficiently. In this paper, we describe the internals of the DEX graph database, which is based on a representation of the graph and its attributes as maps and bitmap structures that can be loaded and unloaded efficiently from memory. We also present the internal operations used in DEX to manipulate these structures. We show that by using these structures, DEX scales to graphs with billions of vertices and edges with very limited memory requirements. Finally, we compare our graph-oriented approach to other approaches, showing that our system is better suited for typical out-of-core graph-like operations.
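
    As a rough illustration of the general idea of combining maps with bitmaps, the sketch below stores each vertex's neighbourhood as a bitmap held in a Python integer, so that set operations become bitwise operations. This is not the DEX storage layout, which the abstract does not detail; the class `BitmapGraph` and the helper `bits` are hypothetical.

```python
def bits(bitmap):
    """Return the positions of the set bits in an integer bitmap."""
    positions = []
    index = 0
    while bitmap:
        if bitmap & 1:
            positions.append(index)
        bitmap >>= 1
        index += 1
    return positions


class BitmapGraph:
    """Undirected graph: a map from vertex id to a neighbour bitmap."""

    def __init__(self):
        # vertex id -> int whose bit j is set when an edge to vertex j exists
        self.adj = {}

    def add_edge(self, u, v):
        self.adj[u] = self.adj.get(u, 0) | (1 << v)
        self.adj[v] = self.adj.get(v, 0) | (1 << u)

    def neighbors(self, u):
        return bits(self.adj.get(u, 0))

    def common_neighbors(self, u, v):
        # Intersection of two neighbourhoods is a single bitwise AND.
        return bits(self.adj.get(u, 0) & self.adj.get(v, 0))
```

    With edges (0, 1), (0, 2), (1, 2) and (2, 3), `neighbors(2)` yields `[0, 1, 3]` and `common_neighbors(0, 1)` yields `[2]`.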

  • Using Evolutive Summary Counters for Efficient Cooperative Caching in Search Engines  Open access

     Dominguez Sala, David; Aguilar Saborit, Josep; Surdeanu, Mihai; Larriba Pey, Josep
    IEEE transactions on parallel and distributed systems
    Date of publication: 2012-04
    Journal article

    We propose and analyze a distributed cooperative caching strategy based on the Evolutive Summary Counters (ESC), a new data structure that stores an approximate record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection, and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries to obtain approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy: one for the distribution of the cache contents, ESC-placement, and another for the search of documents in the distributed cache, ESC-search. While the former improves the hit rate of the system and keeps a large ratio of data accesses local, the latter reduces the network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in hit rate, throughput, and location recall for multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.

  • An Online Writer Recognition System Based On In-Air And On-Surface Trajectories  Open access

     Sesa Nogueras, Enric
    Defense's date: 2012-09-20
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

    The main motivation of this dissertation is the exploration of the field of online text-dependent writer recognition, in order to provide evidence of the usefulness of short sequences of text to perform identification and verification, which are the two tasks involved in recognition. From this motivation stem its main goals and contributions: an exploration performed from a practical perspective, thus requiring the development of a recognition system, and the gathering of evidence concerning the discriminative power of in-air trajectories (the trajectories described while not exerting any pressure on the writing surface, when the hand moves in the air while transitioning from one stroke to the next), i.e. their ability to discriminate among writers. In-air and on-surface trajectories have been analyzed from the perspective of information theory and the results yielded by this analysis show that, except for pressure, they contain virtually equal amounts of information and are notably non-redundant. This suggests that in-air trajectories may have a considerable discriminative power and that they may help improve the overall recognition performance when combined with on-surface trajectories. An innovative writer recognition system that fulfils the abovementioned practical goal has been devised. It follows an allographic approach, that is, it does not take into account the global characteristics of the text but focuses on character and character-fragment shapes. Strokes are considered the structural units of handwriting and any piece of text is regarded as two separate sequences, one of pen-up and one of pen-down strokes. The system relies on a pair of catalogues of strokes, built in an unsupervised manner by means of self-organizing maps, which allow mapping sequences of strokes into sequences of integers. 
The latter sequences, much simpler than the original ones, can be effectively compared by means of dynamic time warping, which takes advantage of the neighbouring properties exhibited by self-organizing maps. Measures obtained from each sequence can be combined in a later step. The recognition system has been experimentally tested using 16 uppercase words from the BiosecurID database, which contains 4 executions of each word donated by 400 writers. The experimental results obtained clearly sustain the claim that online words have a notable recognition potential and show the suitability of the allographic approach to perform writer recognition in the online text-dependent context. Regarding identification, the system compares positively to other word-based identification schemas. As for verification, the accuracy levels attained do not lie much below the accuracies reported for today's state-of-the-art signature verification methods. Furthermore, the results obtained from in-air trajectories have substantiated what the information analysis had already suggested: their considerable recognition power and their notable non-redundancy with respect to on-surface trajectories. Finally, a new method to generate synthetic samples of online words from real ones has been proposed. This method is based on the recognition system previously described, takes advantage of its main characteristics and can be seamlessly integrated into it. Synthetic samples are used to enlarge the enrolment sets, which has the effect of substantially improving the recognition accuracy of the system.

    The main motivation of this dissertation is research in the field of online text-dependent writer recognition, with the aim of providing evidence supporting the usefulness of short sequences for identification and verification, the two tasks that recognition comprises. Its most relevant goals derive from this motivation: an exploration carried out from a practical perspective, which therefore requires the development of a recognition system; and the search for evidence on the discriminative power of in-air trajectories (those executed without the writing tool exerting pressure on the surface, in the transitions between strokes), that is, their ability to recognize writers. In-air and on-surface trajectories have been analysed from the perspective of information theory. The results of this analysis show that, except for pressure, both kinds of trajectories contain practically identical amounts of information, with a notable level of non-redundancy. This suggests that in-air trajectories may have considerable discriminative power and that overall recognition capability can improve if they are combined with on-surface trajectories. An innovative recognition system has been developed that fulfils the practical goal. This system is based on an allographic approach, that is, it does not take into account the global characteristics of the text but focuses on the shapes of characters and their fragments. Strokes are considered the basic structural unit of handwriting, and any fragment of text is understood as a pair of separate sequences, one of on-surface strokes and one of in-air strokes. The system relies on a pair of stroke catalogues, built in an unsupervised manner with the help of self-organizing maps, which allow it to transform sequences of strokes into sequences of integers. These latter sequences, much simpler than the originals, can be compared effectively by means of dynamic time warping, which exploits the neighbourhood properties characteristic of self-organizing maps. The measures obtained from each sequence can be combined in a later step. The recognition system has been tested experimentally using the 16 uppercase words of the BiosecurID database, which contains 4 executions of each word donated by 400 people. The experimental results clearly support the claim that online words have notable discriminative power and endorse the suitability of the allographic approach for writer recognition in the online text-dependent context. As for identification, the system compares favourably with other word-based methods. As for verification, the accuracy levels obtained are not far from those achieved by signature verification methods representative of the current state of the art. Moreover, the results obtained from in-air trajectories have corroborated what the information analysis had suggested: their considerable discriminative power and their substantial lack of redundancy with respect to on-surface trajectories. Finally, a new system for generating synthetic samples of online words has been proposed. This method is based on the recognition system described above, takes advantage of its main characteristics and can be easily integrated into it. Synthetic samples are used to enlarge the enrolment sets, which substantially improves the accuracy of the system.
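
    The abstract above compares integer sequences by means of dynamic time warping. A minimal sketch of the textbook algorithm follows. It is not the thesis implementation: the default local cost `abs(x - y)` assumes that nearby integers denote similar strokes (as on a one-dimensional map), and the name `dtw_distance` and its `cost` parameter are hypothetical.

```python
def dtw_distance(a, b, cost=lambda x, y: abs(x - y)):
    """Classic dynamic time warping distance between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

    Because warping lets one element align with several, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 even though the sequences differ in length.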

  • Linked Data Benchmark Council

     Perez Casany, Marta; Martinez Bazan, Norbert; Escale Claveras, Francesc; Ferrer Sumsi, Miquel; Prat Perez, Arnau; Dominguez Sal, David; Larriba Pey, Josep
    Participation in a competitive project

  • Pla de transferencia tecnologica pttu de la Universitat Politècnica de Catalunya

     Larriba Pey, Josep
    Participation in a competitive project

  • Hybrid tables for speeding-up data accesses in hybrid database management systems  Open access

     Guisado Gamez, Joan; Wolski, Antoni; Zuzarte, Calisto; Larriba Pey, Josep; Muntés Mulero, Víctor
    Jornadas de Ingeniería del Software y Bases de Datos
    Presentation's date: 2011-09-06
    Presentation of work at congresses

  • Memory-, bandwidth-, and power-aware multi-core for a graph database workload

     Trancoso, Pedro; Martinez Bazan, Norbert; Larriba Pey, Josep
    International Conference on Architecture of Computing Systems
    Presentation's date: 2011-02-24
    Presentation of work at congresses

    Processors have evolved to the now de facto standard multicore architecture. The continuous advances in technology allow for increased component density, thus resulting in a larger number of cores on the chip. This, in turn, places pressure on the off-chip and pin bandwidth. Large Last-Level Caches (LLC), which are shared among all cores, have been used as a way to control off-chip requests. In this work we focus on analyzing the memory behavior of a modern demanding application, a graph-based database workload, which is representative of future workloads. We analyze the performance of this application for different cache configurations in terms of memory access time, bandwidth requirements, and power consumption. The experimental results show that the bandwidth requirements decrease as the number of clusters decreases and the LLC per cluster increases. This configuration is also the most power efficient. If, on the other hand, memory latency is the dominant factor and bandwidth is not a limitation, then the best configuration is the one with more clusters and smaller LLCs.

  • Gestió i projecció de projectes de recerca i innovació amb empreses internacionals

     Larriba Pey, Josep
    Participation in a competitive project

  • Recomanació i Exploració de Continguts Audiovisuals Orientats a l'Aprenentatge

     Larriba Pey, Josep
    Participation in a competitive project

  • EVALUACION DE LA CIENCIA EN ESPAÑA, CONTINUACION DEL PROYECTO TIN2008-01202-E/TIN

     Larriba Pey, Josep
    Participation in a competitive project

  • Social based layouts for the increase of locality in graph operations

     Prat Perez, Arnau; Dominguez Sala, David; Larriba Pey, Josep
    Lecture notes in computer science
    Date of publication: 2011
    Journal article

    Graphs provide a natural data representation for analyzing the relationships among entities in many application areas. Since the analysis algorithms perform memory intensive operations, it is important that the graph layout is adapted to take advantage of the memory hierarchy. Here, we propose layout strategies based on community detection to improve the in-memory data locality of generic graph algorithms. We conclude that the detection of communities in a graph provides a layout strategy that improves the performance of graph algorithms consistently over other state of the art strategies.

  • A discussion on the design of graph database benchmarks

     Dominguez Sala, David; Martinez Bazan, Norbert; Muntés Mulero, Víctor; Baleta Ferrer, Pedro; Larriba Pey, Josep
    Lecture notes in computer science
    Date of publication: 2011
    Journal article

  • Hybrid in-memory and on-disk tables for speeding-up table accesses  Open access

     Guisado Gamez, Joan; Wolski, Antoni; Zuzarte, Calisto; Larriba Pey, Josep; Muntés Mulero, Víctor
    International Conference on Database and Expert Systems Applications
    Presentation's date: 2010-08-30
    Presentation of work at congresses

    Main memory database management systems have become essential for response-time-bounded applications, such as those in telecommunications systems or the Internet, where users frequently access a table in order to get information or check whether an element exists, and require the response to be as fast as possible. Continuous data growth is making it unaffordable to keep entire relations in memory, and some commercial applications provide two different engines to handle data in-memory and on-disk separately. However, these systems assign each table to one of these engines, forcing large relations to be kept on secondary storage. In this paper we present TwinS, a hybrid database management system that allows managing hybrid tables, i.e. tables partially managed by both engines. Our objective is twofold: first, to allow large tables that do not fit in memory to partially benefit from in-memory management techniques and, second, to provide a way to discard unnecessary accesses to both memory and disk. Overall, we show that we can reduce response time when accessing a large table in the database. All our experiments have been run on a dual-engine DBMS: IBM-SolidDB.

  • Cooperative cache analysis for distributed search engines

     Dominguez Sala, David; Perez Casany, Marta; Larriba Pey, Josep
    International Journal of Information Technology, Communications and Convergence
    Date of publication: 2010
    Journal article

    In this paper, we study the performance of a distributed search engine from a data caching point of view using statistical tools on a varied set of configurations. We study two strategies to achieve better performance: cache-aware load balancing, which issues the queries to nodes that store the computation in cache; and cooperative caching (CC), which stores and transfers the available computed contents from one node in the network to others. Since cache-aware decisions depend on information about the recent history, we also analyse how the ageing of this information impacts the system performance. Our results show that the combination of both strategies yields better throughput than implementing cooperative caching or cache-aware load balancing individually, because of a synergistic improvement of the hit rate. Furthermore, the analysis concludes that the data structures used to monitor the system need only moderate precision to achieve optimal throughput.

  • Graph partitioning strategies for efficient BFS in shared-nothing parallel systems

     Muntés Mulero, Víctor; Martinez Bazan, Norbert; Larriba Pey, Josep; Pacitti, Esther; Valduriez, Patrick
    Lecture notes in computer science
    Date of publication: 2010
    Journal article

  • Desemantization for numerical microdata anonymization

     Pont Tuset, Jordi; Nin Guerrero, Jordi; Medrano Gracia, Pau; Larriba Pey, Josep; Muntés Mulero, Víctor
    Date of publication: 2010
    Book chapter

    The design of cryptographic and security protocols for new scenarios and applications can be computationally expensive. Examples include sensor or mobile ad-hoc networks, where thousands of nodes can be involved. In such cases, an automated tool that generates protocols for a predefined problem can be of great utility. This work uses genetic algorithm (GA) techniques for the automatic design of networked security protocols. When using GAs to optimize protocols, two aspects are critical: the genome definition and the evaluation function. We discuss how security protocols can be represented as binary strings and how those strings can be interpreted as security protocols; moreover, we define several basic criteria for evaluating security protocols. Finally, we present the software we developed for generating secure communication protocols and show some examples and the results obtained.

  • Analysis and Optimization of Question Answering Systems  Open access

     Dominguez Sala, David
    Defense's date: 2010-04-23
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

  • Creixement centre DAMA-UPC

     Larriba Pey, Josep
    Participation in a competitive project

  • PROCESADO DE ALTO RENDIMIENTO DE GRANDES CONJUNTOS DE DATOS REPRESENTADOS COMO GRAFOS

     Martinez Bazan, Norbert; Dominguez Sala, David; Gomez Villamor, Sergio; Larriba Pey, Josep
    Participation in a competitive project

  • Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark

     Dominguez Sala, David; Urbon Bayes, P.; Gimenez Vaño, A.; Gomez Villamor, Sergio; Martinez Bazan, Norbert; Larriba Pey, Josep
    Lecture notes in computer science
    Date of publication: 2010
    Journal article

  • DEX: ANÁLISIS DE DATOS

     Baleta Ferrer, Pedro; Coll Jimenez, Damaris; Trench Ribes, Nuria; Pau Fernandez, Raquel; Ventura Simon, Robert; Tomas Ozalla, Miguel; Guisado Gamez, Joan; Prat Perez, Arnau; Martinez Palau, Xavier; Larriba Pey, Josep
    Participation in a competitive project

  • Exploración Intuitiva de Recursos Bibliográficos a Gran Escala

     Larriba Pey, Josep
    Participation in a competitive project

  • Proposta XViB Call 5 ICT

     Larriba Pey, Josep
    Participation in a competitive project

  • EMPOWER, Web Users for Self Services in Digital Multimedia Libraries

     Larriba Pey, Josep
    Participation in a competitive project

  • DETECT, Intercambio de datos para la detección de peligrosos criminales

     Larriba Pey, Josep
    Participation in a competitive project

  • Cache-aware load balancing vs. cooperative caching for distributed search engines  Open access

     Dominguez Sala, David; Perez Casany, Marta; Larriba Pey, Josep
    IEEE International Conference on High Performance Computing and Communications
    Presentation's date: 2009-06-26
    Presentation of work at congresses

    In this paper we study the performance of a distributed search engine from a data caching point of view. We compare and combine two different approaches to achieve better hit rates: (a) sending the queries to the node that currently has the related data in its local memory (cache-aware load balancing), and (b) sending the cached contents to the node where a query is currently being processed (cooperative caching). Furthermore, we study the best scheduling points in the query computation at which queries can be reassigned to another node, and how this reassignment should be performed. Our analysis is guided by statistical tools applied to a real question answering system for several query distributions that are typically found in query logs.

  • CONTINUACIÓN DE LA CREACIÓN DE UN SISTEMA DE ANALISIS DE LA INVESTIGACIÓN EN ESPAÑA, TIN2007-30380

     Martinez Bazan, Norbert; Muntés Mulero, Víctor; Dominguez Sala, David; Pau Fernandez, Raquel; Gomez Villamor, Sergio; Larriba Pey, Josep
    Participation in a competitive project

  • Dynamic adaptive data structures for monitoring data streams

     Aguilar-Saborit, J; Trancoso, P; Muntés Mulero, Víctor; Larriba Pey, Josep
    Data and knowledge engineering
    Date of publication: 2008-07
    Journal article

  • Parallelizing record linkage for disclosure risk assessment

     Guisado Gamez, Joan; Prat Perez, Arnau; Nin Guerrero, Jordi; Muntés Mulero, Víctor; Larriba Pey, Josep
    Privacy in Statistical Databases
    Presentation's date: 2008-09-25
    Presentation of work at congresses

  • Improving microaggregation for complex record anonymization

     Pont Tuset, Jordi; Nin Guerrero, Jordi; Medrano Gracia, Pau; Larriba Pey, Josep; Muntés Mulero, Víctor
    International Conference on Modeling Decisions for Artificial Intelligence
    Presentation's date: 2008-10-31
    Presentation of work at congresses

    Microaggregation is one of the most commonly employed microdata protection methods. This method builds clusters of at least k original records and replaces the records in each cluster with the centroid of the cluster. Usually, when records are complex, i.e., the number of attributes of the data set is large, the data set is split into smaller blocks of attributes and microaggregation is applied to each block, successively and independently. In this way, the information loss incurred when collapsing several values to the centroid of their group is reduced, at the cost of losing the k-anonymity property when at least two attributes of different blocks are known by the intruder. In this work, we present a new microaggregation method called One dimension microaggregation (Mic1D − κ). This method gathers all the values of the data set into a single sorted vector, independently of the attribute they belong to. Then, it microaggregates all the mixed values together. Our experiments show that, using real data, our proposal obtains lower disclosure risk than previous approaches while the information loss is preserved.

  • ONN the use of neural networks for data privacy

     Pont Tuset, Jordi; Medrano Gracia, Pau; Nin Guerrero, Jordi; Larriba Pey, Josep; Muntés Mulero, Víctor
    International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM)
    Presentation of work at congresses

  • Premi BDigital Global d'Innovació Digital

     Larriba Pey, Josep; Martinez Bazan, Norbert; Gomez Villamor, Sergio; Pons, M; Rodríguez, A; Erola, P
    Award or recognition

  • On the use of Evolutive Summary Counters in Distributed Retrieval Systems

     Dominguez Sala, David; Aguilar Saborit, Josep; Surdeanu, Mihai; Larriba Pey, Josep
    Date: 2008-03
    Report

  • Load Balancing for Question Answering

     Dominguez Sala, David; Aguilar Saborit, Josep; Surdeanu, Mihai; Larriba Pey, Josep
    Date: 2008-06
    Report

  • Advances in Databases: Concepts, Systems and Applications

     Muntés Mulero, Víctor; Lafón, N; Aguilar Saborit, Josep; Larriba Pey, Josep
    Lecture notes in computer science
    Date of publication: 2007-04
    Journal article

  • Star join revisited: Performance internals for cluster architectures

     Aguilart, J; Muntés Mulero, Víctor; Zuzarte, C; Larriba Pey, Josep
    Data and knowledge engineering
    Date of publication: 2007-12
    Journal article

  • Genetic optimization for large join queries

     Muntés Mulero, Víctor
    Defense's date: 2007-05-28
    Department of Computer Architecture, Universitat Politècnica de Catalunya
    Theses

  • DEX: High-performance exploration on large graphs for information retrieval

     Martinez Bazan, Norbert; Muntés Mulero, Víctor; Gomez Villamor, Sergio; Nin Guerrero, Jordi; Sánchez Martínez, Mario; Larriba Pey, Josep
    ACM International Conference on Information and Knowledge Management
    Presentation of work at congresses

    Link and graph analysis tools are important devices to boost the richness of information retrieval systems. The Internet and the existing social networking portals are just a couple of situations where the use of these tools would be beneficial and enriching for users and analysts. However, the need to integrate different data sources and, even more importantly, the need for high-performance generic tools are at odds with the continuously growing size and number of data repositories. In this paper we propose and evaluate DEX, a high-performance graph database querying system that allows for the integration of multiple data sources. DEX makes graph querying possible in different flavors, including link analysis, social network analysis, pattern recognition and keyword search. The richness of DEX shows up in the experiments that we carried out on the Internet Movie Database (IMDb). Through a variety of these complex analytical queries, DEX proves to be a generic and efficient tool on large graph databases.

  • Anonymizing data via polynomial regression

     Nin Guerrero, Jordi; Pont Tuset, Jordi; Medrano Gracia, Pau; Larriba Pey, Josep; Muntés Mulero, Víctor
    Simposio en Ingeniería de Sistemas y Automática en Bioingeniería
    Presentation's date: 2007
    Presentation of work at congresses

  • On the use of semantic blocking techniques for data cleansing and integration  Open access

     Nin Guerrero, Jordi; Muntés Mulero, Víctor; Martinez Bazan, Norbert; Larriba Pey, Josep
    International Database Engineering and Applications Symposium
    Presentation's date: 2007
    Presentation of work at congresses

    Record Linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or by reducing the number of attribute comparisons, which reduces the computational time but very often decreases the quality of the results. However, the real bottleneck of RL is the post-process, where the results have to be reviewed by experts who decide which pairs or groups of records are real links and which are false hits. In this paper, we show that exploiting the relationships (e.g., foreign keys) established between one or more data sources makes it possible to find a new sort of semantic blocking method that improves the number of hits and reduces the amount of review effort.

  • L'assignatura Programació Conscient de l'Arquitectura

     Fernandez Jimenez, Agustin; Jimenez Gonzalez, Daniel; Larriba Pey, Josep; Morancho Llena, Enrique; Ramirez Bellido, Alejandro
    Jornades de Docència del Departament d'Arquitectura de Computadors. 10 Anys de Jornades
    Presentation of work at congresses
