
Contextual and Metadata-based Approach for the Semantic Annotation of Heterogeneous Documents

Abstract : In this paper, we present SHIRI-Annot, an automatic, ontology-driven and unsupervised approach for the semantic annotation of documents which contain more or less structured parts. The aim of this approach is to build an integration system called SHIRI which gives the user access to documents related to a specific domain. In this system, the querying process is guided by an ontology of the domain, and, unlike keyword-based search engines, the answers consist only of the pertinent parts of the documents. The ontology is described using the RDFS (Resource Description Framework Schema) language. The SHIRI-Annot approach consists of locating and then annotating concept instances and their semantic relations. The locating step combines existing annotation approaches in order to locate instances in the text. The annotation step exploits a set of metadata and a set of logical rule patterns which are automatically instantiated from the domain description. These metadata are taken from the ontology or are defined specifically for the annotation task. The resulting annotations are represented in the RDF (Resource Description Framework) language. We show, through a preliminary study made on a corpus of HTML documents, the usefulness of these specific metadata for representing the heterogeneity of documents. We also illustrate through examples how the SHIRI system exploits the metadata to approximate the user queries in order to provide more pertinent answers.
Document type :
Conference papers

Cited literature : 14 references
Contributor : Evelyne Faivre
Submitted on : Friday, December 11, 2009 - 4:18:27 PM
Last modification on : Thursday, July 8, 2021 - 3:47:57 AM
Long-term archiving on : Friday, May 28, 2010 - 11:09:35 PM


Files produced by the author(s)


  • HAL Id : hal-00293255, version 1



Mouhamadou Thiam, Nathalie Pernelle, Nacéra Bennacer Seghouani. Contextual and Metadata-based Approach for the Semantic Annotation of Heterogeneous Documents. 1st Workshop on Semantic Metadata Management and Applications (SeMMA 2008) at the 5th European Semantic Web Conference (ESWC 2008), Jun 2008, Tenerife, Spain. pp.16-28. ⟨hal-00293255⟩


