Distributed Convergence Detection Based on Global Residual Error Under Asynchronous Iterations - IRT SystemX
Journal article in IEEE Transactions on Parallel and Distributed Systems, year: 2018

Abstract

Convergence of classical parallel iterations is detected by performing a reduction operation at each iteration in order to compute a residual error relative to a potential solution vector. To run asynchronous iterations efficiently, blocking communication requests are avoided, which makes it hard to isolate and handle any global vector. While some termination protocols have been proposed for asynchronous iterations, very few of them are based on global residual computation and guarantee effective convergence. Moreover, the most effective and efficient existing solutions feature two reduction operations, which is a significant source of termination delay. In this paper, we present new non-intrusive protocols to compute a residual error under asynchronous iterations, requiring only one reduction operation. Considering various communication models, we show that some heuristics can even be introduced and formally evaluated. Extensive experiments with up to 5,600 processor cores confirm the practical effectiveness and efficiency of our approach.
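To make the classical scheme from the first sentence concrete, here is a minimal sketch of synchronous convergence detection by reduction. It is not the protocol proposed in the paper. It assumes a Jacobi solver and simulates the processes sequentially in one program; the function name `jacobi_with_reduction` and the block partitioning are hypothetical choices for illustration. Each simulated process computes the squared residual of its own block, and the sum of these contributions plays the role of the reduction (for example `MPI_Allreduce` with `MPI_SUM` in a real distributed code).

```python
import math
from typing import List, Tuple


def jacobi_with_reduction(
    A: List[List[float]], b: List[float], n_procs: int,
    tol: float = 1e-8, max_iter: int = 1000,
) -> Tuple[List[float], int, float]:
    """Hypothetical sketch: synchronous Jacobi with a global residual
    obtained by a reduction over per-process local contributions."""
    n = len(b)
    # Assumed block partition: process p owns a contiguous range of rows.
    blocks = [range(p * n // n_procs, (p + 1) * n // n_procs)
              for p in range(n_procs)]
    x = [0.0] * n
    global_res = math.inf
    for k in range(1, max_iter + 1):
        # Synchronous update: every block reads the same previous iterate x.
        x_new = x[:]
        for blk in blocks:
            for i in blk:
                s = sum(A[i][j] * x[j] for j in range(n) if j != i)
                x_new[i] = (b[i] - s) / A[i][i]
        x = x_new
        # Each process computes the squared residual norm of its own rows.
        local = [
            sum((b[i] - sum(A[i][j] * x[j] for j in range(n))) ** 2
                for i in blk)
            for blk in blocks
        ]
        # Reduction: combine the local contributions into the global norm.
        global_res = math.sqrt(sum(local))
        if global_res < tol:
            return x, k, global_res
    return x, max_iter, global_res


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]  # exact solution is [1, 1, 1]
    x, iters, res = jacobi_with_reduction(A, b, n_procs=2)
    print(x, iters, res)
```

Because every block is updated from the same iterate, the reduced value is the residual of one well-defined global vector. Under asynchronous iterations, blocks are updated and their residuals computed on unsynchronized versions of `x`, so summing local residuals in this way does not, by itself, yield the residual of any single global vector. That is the difficulty the abstract refers to.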
No full-text file deposited

Dates and versions

hal-01737234, version 1 (19-03-2018)

Cite

Frédéric Magoulès, Guillaume Gbikpi-Benissan. Distributed Convergence Detection Based on Global Residual Error Under Asynchronous Iterations. IEEE Transactions on Parallel and Distributed Systems, 2018, 29 (4), pp. 819-829. ⟨10.1109/TPDS.2017.2780856⟩. ⟨hal-01737234⟩