Conference paper, Year: 2012

Controlled Knowledge Base Enrichment from Web Documents

Abstract

The Linked Open Data initiative has led to a growing number of RDF data sources being published on the Web. However, these data sources contain relatively little information compared to the documents available on the surface Web. Many annotation tools have been proposed in the last decade for the automatic construction and enrichment of knowledge bases. But while noticeable advances have been achieved in the extraction of concept instances, the extraction of semantic relations remains a challenging task when the structures and the vocabularies of the target documents are heterogeneous. In this paper, we propose a novel approach, called REISA, which enriches RDF/OWL knowledge bases with semantic relations using semi-structured documents annotated with concept instances. REISA produces weighted relation instances without exploiting lexico-syntactic or structural regularities in the documents. Neighboring domain entities in the annotated documents are used to generate the first sets of candidate relations according to the domain and range axioms defined in a domain ontology. The construction of these candidate sets relies on automated semantic controls performed with (i) the existing knowledge bases and (ii) the (inverse) functionality of the target relations. The selected relation candidates are then weighted according to the neighborhood distance between the annotated domain entities in the document. The proposed approach is complementary to classic pattern-matching and machine-learning approaches and achieves interesting results without exploiting document-level regularities. Experiments on two real web datasets show that (i) REISA extracts semantic relationships with precision values reaching 76.5% and that (ii) the weighting method is effective for ranking the relation candidates according to their precision.
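The pipeline outlined in the abstract (candidate generation from domain and range axioms, semantic controls against the knowledge base and relation functionality, then distance-based weighting) can be illustrated with a toy sketch. This is not the REISA implementation: the `Annotation` type, the `RELATIONS` table, the `max_distance` cutoff, and the `1 / (1 + distance)` weight are all assumptions made for illustration, since the abstract does not specify them, and inverse functionality is left out for brevity.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Annotation:
    entity: str    # identifier of the annotated concept instance
    concept: str   # ontology concept assigned by the annotation tool
    position: int  # index of the document node holding the annotation


# Hypothetical ontology fragment: relation -> (domain, range, is_functional)
RELATIONS = {"bornIn": ("Person", "City", True)}


def candidate_relations(annotations, known_facts, max_distance=3):
    """Return (subject, relation, object, weight) tuples, best weight first."""
    candidates = []
    for a, b in permutations(annotations, 2):
        distance = abs(a.position - b.position)
        if distance > max_distance:
            continue  # only neighboring entities are paired
        for rel, (domain, range_, functional) in RELATIONS.items():
            # Domain and range axioms: both concepts must match.
            if (a.concept, b.concept) != (domain, range_):
                continue
            # Knowledge-base control: skip facts that are already known.
            if (a.entity, rel, b.entity) in known_facts:
                continue
            # Functionality control: a functional relation cannot take a
            # second, different object for the same subject.
            if functional and any(
                s == a.entity and r == rel and o != b.entity
                for s, r, o in known_facts
            ):
                continue
            # Assumed weighting: closer entities get a higher weight.
            candidates.append((a.entity, rel, b.entity, 1.0 / (1 + distance)))
    return sorted(candidates, key=lambda c: -c[3])


# Toy document with four annotated nodes and one fact already in the KB.
annotations = [
    Annotation("alice", "Person", 0),
    Annotation("paris", "City", 1),
    Annotation("rome", "City", 3),
    Annotation("bob", "Person", 4),
]
known_facts = {("bob", "bornIn", "rome")}
ranked = candidate_relations(annotations, known_facts)
```

In this toy run, `bob` yields no candidates because `bornIn` is declared functional and a value for `bob` already exists in the knowledge base, while the two candidates for `alice` are ranked by how close the city annotation is to her annotation.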
No file deposited

Dates and versions

hal-00762334 , version 1 (07-12-2012)

Identifiers

  • HAL Id : hal-00762334 , version 1

Cite

Yassine Mrabet, Nacéra Bennacer Seghouani, Nathalie Pernelle. Controlled Knowledge Base Enrichment from Web Documents. WISE 2012, Nov 2012, Paphos, Greece. pp.312-325. ⟨hal-00762334⟩