Incremental Ontology-Based Extraction and Alignment in Semi-Structured Documents

Abstract : SHIRI is an ontology-based system for the integration of semi-structured documents related to a specific domain. The system's purpose is to allow users to access relevant parts of documents as answers to their queries. SHIRI uses RDF/OWL for the representation of resources and SPARQL for querying them. It relies on an automatic, unsupervised and ontology-driven approach for the extraction, alignment and semantic annotation of tagged elements of documents. In this paper, we focus on the Extract-Align algorithm, which exploits a set of named entity and term patterns to extract term candidates to be aligned with the ontology. It proceeds incrementally in order to populate the ontology with terms describing instances of the domain and to reduce access to external resources such as the Web. We evaluate it on an HTML corpus of calls for papers in computer science, and the results we obtain are very promising. They show how the incremental behaviour of the Extract-Align algorithm enriches the ontology, and that the number of terms (or named entities) aligned directly with the ontology increases.
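
The incremental behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Extract-Align implementation: the class `IncrementalAligner`, the `normalize` helper, the `toy_external` lookup and the concept labels are all hypothetical, and the pattern-based extraction step is not modelled. The sketch only shows the general idea that a term aligned once through an external resource is stored locally, so later occurrences are aligned directly.

```python
from typing import Callable, Dict, Optional


def normalize(term: str) -> str:
    """Lower-case a term and collapse whitespace (a hypothetical normalisation)."""
    return " ".join(term.lower().split())


class IncrementalAligner:
    """Aligns terms with concepts, growing a local lexicon as it goes.

    Assumption: the ontology's known terms are modelled as a plain dict
    from normalised term to concept label; a real system would use RDF/OWL.
    """

    def __init__(
        self,
        lexicon: Dict[str, str],
        external_lookup: Callable[[str], Optional[str]],
    ) -> None:
        self.lexicon = {normalize(t): c for t, c in lexicon.items()}
        self.external_lookup = external_lookup
        self.external_calls = 0  # how often the external resource was consulted

    def align(self, term: str) -> Optional[str]:
        key = normalize(term)
        # 1. Direct alignment against terms already known locally.
        concept = self.lexicon.get(key)
        if concept is not None:
            return concept
        # 2. Fall back to the external resource (stand-in for the Web).
        self.external_calls += 1
        concept = self.external_lookup(key)
        # 3. Remember a successful alignment so the next occurrence is direct.
        if concept is not None:
            self.lexicon[key] = concept
        return concept


def toy_external(term: str) -> Optional[str]:
    """Hypothetical external resource backed by a tiny in-memory table."""
    table = {"linz": "Location", "dexa": "Conference"}
    return table.get(term)


aligner = IncrementalAligner({"Austria": "Location"}, toy_external)
results = [
    aligner.align(t)
    for t in ["Austria", "DEXA", "dexa", "Linz", "unknown term"]
]
print(results, aligner.external_calls)
```

In this toy run the second occurrence of "dexa" is resolved from the local lexicon, so the external lookup is consulted for only three of the five terms.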
Document type :
Conference papers

https://hal-supelec.archives-ouvertes.fr/hal-00423575
Contributor : Evelyne Faivre
Submitted on : Tuesday, December 15, 2009 - 4:20:48 PM
Last modification on : Thursday, August 30, 2018 - 2:24:02 PM
Long-term archiving on : Tuesday, October 16, 2012 - 12:06:43 PM

File

DexaPublished.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00423575, version 1

Citation

Mouhamadou Thiam, Nacéra Bennacer Seghouani, Nathalie Pernelle, Moussa Lô. Incremental Ontology-Based Extraction and Alignment in Semi-Structured Documents. 20th International Conference DEXA (Database and Expert Systems Applications) 2009, Aug 2009, Linz, Austria. pp. 211-218. ⟨hal-00423575⟩
