A cost-based storage format selector for materialized results in big data frameworks

Author
Munir, R.; Abello, A.; Romero, O.; Thiele, M.; Lehner, W.
Type of activity
Journal article
Journal
Distributed and parallel databases
Date of publication
2019-05-08
First page
1
Last page
30
DOI
10.1007/s10619-019-07271-0
Repository
http://hdl.handle.net/2117/134838
URL
https://link.springer.com/article/10.1007/s10619-019-07271-0
Abstract
Modern big data frameworks (such as Hadoop and Spark) allow multiple users to perform large-scale analyses simultaneously by deploying data-intensive workflows (DIWs). The DIWs of different users share many common tasks (i.e., 50–80%), whose outputs can be materialized and reused in future executions. Materializing the output of such common tasks improves the overall processing time of DIWs and also saves computational resources. Current solutions for materialization store data on Distributed File System...
Citation
Munir, R. [et al.]. A cost-based storage format selector for materialized results in big data frameworks. "Distributed and parallel databases", 8 May 2019, p. 1-30.
Keywords
Big data, Cost model, Data-intensive workflows, HDFS, Materialized results, Storage format
Group of research
DTIM - Database Technologies and Information Management Group
IMP - Information Modelling and Processing
inLab FIB
inSSIDE - integrated Software, Service, Information and Data Engineering

Participants