UMI 2958 - Research area: Computer Science
Journal article, Studia Informatica Universalis, Year: 2004

Design of a Multi-Strategy Parallelization for an Entire Application of Document Categorization on Low-Cost Multiprocessor PCs

Stéphane Vialle, Guillaume Schaeffer, Michel Ianotto

Abstract

This paper presents research on the parallelization of an entire document categorization application. The objective is a parallelization that can be used successfully on low-cost, widely available shared-memory multiprocessor PCs (not only on powerful and expensive supercomputers), without any change to the input, output, or user interface of the application (under Windows). This is a first step toward parallelization on a cluster of multiprocessor PCs, a more generic and still low-cost parallel architecture. In this article, we describe the parallel algorithms and programming techniques we designed to reach good performance on low-cost but limited PC architectures. This leads us to introduce different parallelization strategies for the different parts of the application, which must cope with numerous disk accesses and the variety of configurations chosen by users. Each parallelization is described and evaluated, and the overall performance of the final mix is reported on a 4-processor PC with SCSI disk technology and on a more recent 2-processor PC with IDE disk technology, leading to different but significant decreases in execution time. Because their low cost allows frequent upgrades and we consistently obtain worthwhile speedups, we can regularly upgrade our parallel machines to remain competitive with new sequential machines. The chosen application was designed first to make it easy to evaluate classification algorithms (useful to text-mining researchers), and second to detect errors in previous manual categorizations and to suggest changes (useful to end users).
No file deposited

Dates and versions

hal-01301161 , version 1 (11-04-2016)

Identifiers

  • HAL Id : hal-01301161 , version 1

Cite

Stéphane Vialle, Guillaume Schaeffer, Michel Ianotto. Design of a Multi-Strategy Parallelization for an Entire Application of Document Categorization on Low-Cost Multiprocessor PCs. Studia Informatica Universalis, 2004, 3 (1), pp.61-84. ⟨hal-01301161⟩