Contextual and Metadata-based Approach for the Semantic Annotation of Heterogeneous Documents

Abstract : In this paper, we present SHIRI-Annot, an automatic, ontology-driven and unsupervised approach for the semantic annotation of documents which contain more or less structured parts. The aim of this approach is to build an integration system called SHIRI, which gives the user access to documents related to a specific domain. In this system, the querying process is guided by an ontology of the domain and, unlike keyword-based search engines, the answers consist only of the pertinent parts of the documents. The ontology is described in RDFS (Resource Description Framework Schema). The SHIRI-Annot approach consists of locating and then annotating concept instances and their semantic relations. The locating step combines existing annotation approaches in order to locate instances in the text. The annotation step exploits a set of metadata and a set of logical rule patterns which are automatically instantiated from the domain description. These metadata are either taken from the ontology or defined specifically for the annotation task. The resulting annotations are represented in RDF (Resource Description Framework). We show, through a preliminary study carried out on a corpus of HTML documents, the usefulness of these specific metadata for representing the heterogeneity of documents. We also illustrate through examples how the SHIRI system exploits the metadata to approximate user queries in order to provide more pertinent answers.
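
The two steps named in the abstract (locating concept instances, then emitting RDF annotations) can be illustrated with a minimal sketch. It is not the SHIRI-Annot implementation: the namespace `http://example.org/shiri#`, the toy lexicon, the property names `containsInstanceOf` and `matchedTerm`, and the functions `locate_instances` and `annotate` are all hypothetical, and plain term matching stands in for the combination of annotation approaches the paper uses.

```python
import re

# Hypothetical namespace and toy lexicon: assumptions for illustration,
# not taken from the paper.
EX = "http://example.org/shiri#"
LEXICON = {
    "Person": ["researcher", "professor"],
    "Topic": ["semantic web", "ontology"],
}


def locate_instances(text):
    """Return sorted (concept, term) pairs whose term occurs in the text."""
    found = []
    lowered = text.lower()
    for concept, terms in LEXICON.items():
        for term in terms:
            # Whole-word, case-insensitive match of the lexicon term.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.append((concept, term))
    return sorted(found)


def annotate(node_id, text):
    """Emit N-Triples lines stating which concepts a document node mentions."""
    subject = f"<{EX}{node_id}>"
    triples = []
    for concept, term in locate_instances(text):
        # Hypothetical properties linking the node to a concept and a term.
        triples.append(f"{subject} <{EX}containsInstanceOf> <{EX}{concept}> .")
        triples.append(f'{subject} <{EX}matchedTerm> "{term}" .')
    return triples


if __name__ == "__main__":
    for line in annotate("node1", "A professor working on ontology design"):
        print(line)
```

Annotating at the level of a document node rather than an exact text span reflects the abstract's point that answers are made of pertinent document parts; how SHIRI-Annot actually chooses and describes those parts is specified in the paper, not here.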
Document type :
Conference papers

https://hal-supelec.archives-ouvertes.fr/hal-00293255
Contributor : Evelyne Faivre
Submitted on : Friday, December 11, 2009 - 4:18:27 PM
Last modification on : Wednesday, June 20, 2018 - 2:32:02 PM
Long-term archiving on : Friday, May 28, 2010 - 11:09:35 PM

File

PapierSemma.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00293255, version 1

Citation

Mouhamadou Thiam, Nathalie Pernelle, Nacéra Bennacer Seghouani. Contextual and Metadata-based Approach for the Semantic Annotation of Heterogeneous Documents. 1st Workshop on Semantic Metadata Management and Applications (SeMMA 2008) at the 5th European Semantic Web Conference (ESWC 2008), Jun 2008, Tenerife, Spain. pp.16-28. ⟨hal-00293255⟩
