Conference papers

Efficient Batch-Incremental Classification Using UMAP for Evolving Data Streams

Maroua Bahri 1, 2 Bernhard Pfahringer 3 Albert Bifet 3, 1, 2 Silviu Maniu 4, 5
1 DIG - Data, Intelligence and Graphs
LTCI - Laboratoire Traitement et Communication de l'Information
5 VALDA - Value from Data
DI-ENS - Département d'informatique de l'École normale supérieure, Inria de Paris
Abstract : Learning from potentially infinite, high-dimensional data streams poses significant challenges for classification. For instance, k-Nearest Neighbors (kNN), one of the most commonly used algorithms in data stream mining, proves very resource-intensive in high-dimensional spaces. Uniform Manifold Approximation and Projection (UMAP) is a novel manifold technique and one of the most promising dimension reduction and visualization techniques in the non-streaming setting because of its high performance compared with competitors. However, no version of UMAP copes with the challenging context of streams. To overcome these limitations, we propose a batch-incremental approach that pre-processes data streams with UMAP, producing successive embeddings on a stream of disjoint batches to support incremental kNN classification. Experiments on publicly available synthetic and real-world datasets demonstrate the substantial gains that our proposal achieves over state-of-the-art techniques.
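
As an illustration of the batch-incremental idea described in the abstract, here is a minimal sketch in Python. It is not the authors' implementation. The stream is buffered into disjoint batches, each batch is reduced to a low-dimensional embedding, and the embedded instances feed a kNN classifier that keeps a sliding window. To stay self-contained, the sketch swaps UMAP for a simple feature-pooling reducer. The names `make_pooling_reducer`, `BatchIncrementalKNN`, `prequential_accuracy` and the parameters `k`, `batch_size`, `window_size` are all hypothetical.

```python
import math
from collections import Counter, deque


def make_pooling_reducer(out_dim):
    """Stand-in for UMAP (an assumption, not UMAP itself): averages
    contiguous groups of features, giving at most out_dim values per point."""
    def reduce_batch(batch):
        reduced = []
        for point in batch:
            size = math.ceil(len(point) / out_dim)
            groups = [point[i:i + size] for i in range(0, len(point), size)]
            reduced.append([sum(g) / len(g) for g in groups])
        return reduced
    return reduce_batch


class BatchIncrementalKNN:
    """kNN over a sliding window of embedded instances, updated per batch."""

    def __init__(self, reduce_batch, k=3, batch_size=50, window_size=200):
        self.reduce_batch = reduce_batch
        self.k = k
        self.batch_size = batch_size
        self.window = deque(maxlen=window_size)  # (embedded point, label)
        self.buffer = []                         # (raw point, label)

    def predict(self, x):
        # No embedded batch has been stored yet, so there is nothing to vote.
        if not self.window:
            return None
        z = self.reduce_batch([x])[0]
        nearest = sorted(self.window, key=lambda item: math.dist(z, item[0]))
        votes = Counter(label for _, label in nearest[:self.k])
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        # Buffer raw instances; embed and store them once a batch is full.
        self.buffer.append((x, y))
        if len(self.buffer) == self.batch_size:
            points = [p for p, _ in self.buffer]
            labels = [label for _, label in self.buffer]
            for z, label in zip(self.reduce_batch(points), labels):
                self.window.append((z, label))  # oldest entries fall out
            self.buffer.clear()


def prequential_accuracy(model, stream):
    """Test-then-train evaluation: predict each instance, then learn from it."""
    correct = total = 0
    for x, y in stream:
        correct += model.predict(x) == y
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0
```

The pooling reducer is stateless, so a single query point can be embedded on its own. A learned reducer such as UMAP would instead have to be fitted on a batch first and then used to project new points, which is one reason a batch-incremental design is a natural fit for it.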

https://hal.archives-ouvertes.fr/hal-03190032
Contributor : Silviu Maniu
Submitted on : Monday, April 5, 2021 - 8:04:25 PM
Last modification on : Wednesday, November 17, 2021 - 12:33:30 PM
Long-term archiving on : Tuesday, July 6, 2021 - 6:09:52 PM

File

bahri2020efficient.pdf
Files produced by the author(s)

Citation

Maroua Bahri, Bernhard Pfahringer, Albert Bifet, Silviu Maniu. Efficient Batch-Incremental Classification Using UMAP for Evolving Data Streams. IDA 2020 - 18th International Symposium on Intelligent Data Analysis, Apr 2020, Konstanz / Virtual, Germany. pp.40-53, ⟨10.1007/978-3-030-44584-3_4⟩. ⟨hal-03190032⟩
