HAL: hal-00684866, version 1

1st International IBM Cloud Academy Conference (ICA CON 2012), Research Triangle Park, North Carolina, United States (2012)
Towards Scalable Data Management for Map-Reduce-based Data-Intensive Applications on Cloud and Hybrid Infrastructures
Gabriel Antoniu (1), Julien Bigot (2), Christophe Blanchet (3), Luc Bougé (1), François Briant (4), Franck Cappello (5, 6, 7), Alexandru Costan (1), Frédéric Desprez (2), Gilles Fedak (2), Sylvain Gault (2), Kate Keahey (8), Bogdan Nicolae (5, 6), Christian Pérez (2), Anthony Simonet (2), Frédéric Suter (9), Bing Tang (2), Raphael Terreux (3)
(2012)

As Map-Reduce emerges as a leading programming paradigm for data-intensive computing, the frameworks that currently support it still have substantial shortcomings that limit its scalability. In this paper we discuss several directions in which progress is possible: storage efficiency under massively concurrent data access, scheduling, volatility, and fault tolerance. We frame this discussion in the context of the ongoing trend towards tighter integration of large-scale distributed platforms (clouds, cloud federations, enterprise desktop grids, etc.). We propose an approach that aims to overcome the limitations of existing Map-Reduce frameworks in order to achieve scalable, concurrency-optimized, fault-tolerant Map-Reduce data processing on hybrid infrastructures. This approach will be evaluated with real-life bioinformatics applications on existing Nimbus-powered cloud testbeds interconnected with desktop grids.
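The abstract refers to Map-Reduce as a programming paradigm without spelling out its structure. As a generic illustration only, and not the authors' BlobSeer/BitDew-based approach, the following minimal single-process Python sketch shows the three phases a Map-Reduce framework coordinates: a user-supplied map function, a shuffle step that groups intermediate pairs by key, and a user-supplied reduce function. The function names and the word-count task are hypothetical choices made for this example. A real framework runs these phases in parallel across many nodes, which is where the storage, scheduling and fault-tolerance concerns listed in the abstract arise.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical map function: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as a framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reduce function: combine all values for one key into one result."""
    return key, sum(values)


def map_reduce(splits: List[str]) -> Dict[str, int]:
    """Run map over every split, shuffle the pairs, then reduce each key group."""
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["map reduce on clouds", "clouds and desktop grids", "map reduce"]
    print(map_reduce(splits))
```

Because map calls are independent of one another and reduce calls only depend on their own key group, each phase can be spread over many machines; the shuffle is the step that moves intermediate data between them.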
1: INRIA - IRISA - KerData
2: LIP Lyon / Inria Grenoble Rhône-Alpes - AVALON
3: IBCP - Institut de biologie et chimie des protéines [Lyon]
4: IBM PSSC Montpellier - Innovation Lab.
5: INRIA Saclay - Ile de France - GRAND-LARGE
6: JLPC - Joint Laboratory for Petascale Computing [Illinois]
7: LRI - Laboratoire de Recherche en Informatique
8: ANL - Argonne National Laboratory
9: CC IN2P3 - Centre de Calcul de l'Institut national de physique nucléaire et de physique des particules
Domain: Computer Science / Parallel, Distributed and Shared Computing
Keywords: MapReduce; cloud computing; data-intensive computing; hybrid infrastructures; BlobSeer; BitDew; Nimbus; HLCM; Grid'5000
Attached file: ICACON2012-MapReduce.pdf (PDF, 977.8 KB)