| Domain | Computer Science/Distributed, Parallel, and Cluster Computing |
| Title | Towards Scalable Data Management for Map-Reduce-based Data-Intensive Applications on Cloud and Hybrid Infrastructures |
| Author(s) | Gabriel Antoniu (1), Julien Bigot (2), Christophe Blanchet (3), Luc Bougé (1), François Briant (4), Franck Cappello (5, 6, 7), Alexandru Costan (1), Frédéric Desprez (2), Gilles Fedak (2), Sylvain Gault (2), Kate Keahey (8), Bogdan Nicolae (5, 6), Christian Pérez (2), Anthony Simonet (2), Frédéric Suter (9), Bing Tang (2), Raphael Terreux (3) |
| Abstract | As Map-Reduce emerges as a leading programming paradigm for data-intensive computing, the frameworks that currently support it still have substantial shortcomings that limit its potential scalability. In this paper we discuss several directions in which progress is possible: storage efficiency under massive data access concurrency, scheduling, volatility, and fault tolerance. We place our discussion in the perspective of the current evolution towards the increasing integration of large-scale distributed platforms (clouds, cloud federations, enterprise desktop grids, etc.). We propose an approach that aims to overcome the current limitations of existing Map-Reduce frameworks in order to achieve scalable, concurrency-optimized, fault-tolerant Map-Reduce data processing on hybrid infrastructures. This approach will be evaluated with real-life bioinformatics applications on existing Nimbus-powered cloud testbeds interconnected with desktop grids. |
| Document language | English |
| Publication type | Conference paper (with proceedings) |
| Publication date | 2012 |
| Audience | International |
| Conference title | 1st International IBM Cloud Academy Conference - ICA CON 2012 |
| City | Research Triangle Park, North Carolina |
| Country | United States |
| Conference date | April 2012 |
| Keywords | MapReduce – cloud computing – data-intensive computing – hybrid infrastructures – BlobSeer – BitDew – Nimbus – HLCM – Grid'5000 |
| ANR project reference | ANR-10-SEGI-001 |
| ANR project year | 2010 |
| ANR project acronym | MapReduce |
| ANR project title | Traitement intensif de données à très grande échelle à l'aide du paradigme MapReduce sur des infrastructures de type cloud et hybrides (Very large-scale data-intensive processing using the MapReduce paradigm on cloud and hybrid infrastructures) |