A Javaspace-Based Framework for Efficient Fault-Tolerant Master-Worker Distributed Applications

Virginie Galtier 1, 2 Constantinos Makassikis 1, 3 Stéphane Vialle 1, 2, 3
3 ALGORILLE - Algorithms for the Grid
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : We propose a framework built around a JavaSpace to ease the development of bag-of-tasks applications. The framework can optionally and automatically tolerate transient crash failures occurring on any of the distributed elements. It relies on checkpointing and underlying middleware mechanisms to do so. To further improve checkpointing efficiency, both in size and frequency, the programmer can introduce intermediate user-defined checkpoint data and code within the task processing program. Used without fault tolerance, the framework accelerates application development, introduces no runtime overhead, and yields the expected speedup. With fault tolerance enabled, it allows applications to complete correctly despite failures, with limited runtime and data storage overheads. Experiments run with up to 128 workers study the impact of some user-related and implementation-related parameters on overall performance, and reveal good performance for classical JavaSpace-based master-worker application profiles.
Document type :
Conference papers

https://hal-supelec.archives-ouvertes.fr/hal-00618249
Contributor : Sébastien van Luchene
Submitted on : Thursday, September 1, 2011 - 11:20:19 AM
Last modification on : Wednesday, July 31, 2019 - 4:18:02 PM

Citation

Virginie Galtier, Constantinos Makassikis, Stéphane Vialle. A Javaspace-Based Framework for Efficient Fault-Tolerant Master-Worker Distributed Applications. 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing -PDP 2011, Feb 2011, Ayia Napa, Cyprus. pp.272-276, ⟨10.1109/PDP.2011.82⟩. ⟨hal-00618249⟩
