Conference Papers, Year: 2011

HAMEX - a Handwritten and Audio Dataset of Mathematical Expressions

Abstract

In this paper, we present HAMEX, a new public dataset that contains mathematical expressions available in both their on-line handwritten form and their spoken audio form. We have designed this dataset so that, given a mathematical expression, its handwritten signal and its audio signal can be used jointly to design multimodal recognition systems. Here, we describe the steps that allowed us to acquire this dataset, from the creation of the mathematical expression corpora (including expressions from Wikipedia pages), through the data collection process itself, to the segmentation and transcription of the collected data. Currently, the dataset contains 4,350 on-line handwritten mathematical expressions written by 58 writers, and the corresponding audio expressions (in French) spoken by 58 speakers. The ground truth is also provided both for the handwritten expressions (as INKML files with the digital ink, the symbol segmentation, and the MATHML structure) and for the audio expressions (as XML files with the transcriptions of the spoken expressions).
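As an illustration of how ground truth of this kind might be read, the following is a minimal Python sketch, not the authors' tooling. It assumes the handwritten files follow common W3C InkML conventions: `trace` elements holding comma-separated "x y" points, and nested `traceGroup` elements carrying a symbol label plus `traceView` references for the segmentation. The embedded sample document, the `id` attribute naming, and the function names `read_traces` and `read_symbols` are hypothetical and may not match the actual HAMEX files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C InkML specification.
INKML_NS = {"ink": "http://www.w3.org/2003/InkML"}

# Hypothetical sample for "x +"; NOT taken from the HAMEX dataset.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace id="0">10 20, 11 22, 13 25</trace>
  <trace id="1">30 20, 30 40</trace>
  <trace id="2">20 30, 40 30</trace>
  <traceGroup>
    <traceGroup>
      <annotation type="truth">x</annotation>
      <traceView traceDataRef="0"/>
    </traceGroup>
    <traceGroup>
      <annotation type="truth">+</annotation>
      <traceView traceDataRef="1"/>
      <traceView traceDataRef="2"/>
    </traceGroup>
  </traceGroup>
</ink>"""


def read_traces(inkml_text):
    """Return {trace id: [(x, y), ...]} for every top-level trace."""
    root = ET.fromstring(inkml_text)
    traces = {}
    for trace in root.findall("ink:trace", INKML_NS):
        points = []
        for point in trace.text.strip().split(","):
            coords = point.split()
            # Assumption: the first two channels of each point are x and y.
            points.append((float(coords[0]), float(coords[1])))
        traces[trace.get("id")] = points
    return traces


def read_symbols(inkml_text):
    """Return [(symbol label, [trace ids])] from the nested trace groups."""
    root = ET.fromstring(inkml_text)
    symbols = []
    for group in root.findall("ink:traceGroup/ink:traceGroup", INKML_NS):
        label = group.find("ink:annotation", INKML_NS)
        refs = [view.get("traceDataRef")
                for view in group.findall("ink:traceView", INKML_NS)]
        symbols.append((label.text if label is not None else None, refs))
    return symbols


if __name__ == "__main__":
    print(read_traces(SAMPLE))
    print(read_symbols(SAMPLE))
```

Keeping strokes and symbol groups in separate structures mirrors the split between the digital ink and the symbol segmentation described in the abstract; a real loader would also need to handle the MATHML structure and whatever identifiers the dataset actually uses.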
Main file: Icdar_CameraReady_PID1943623.pdf (515.4 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00615210, version 1 (07-12-2019)

Identifiers

  • HAL Id: hal-00615210, version 1

Cite

Solen Quiniou, Harold Mouchère, Sebastian Peña Saldarriaga, Christian Viard-Gaudin, Emmanuel Morin, et al. HAMEX - a Handwritten and Audio Dataset of Mathematical Expressions. 11th International Conference on Document Analysis and Recognition, ICDAR 2011, Sep 2011, Beijing, China. ⟨hal-00615210⟩