Context-based Hierarchical Clustering for the Ontology Learning

Abstract : Ontologies provide a common layer that plays a major role in supporting information exchange and sharing. In this paper, we focus on the process of extracting ontological concepts from HTML documents. To improve this process, we propose an unsupervised hierarchical clustering algorithm, “Contextual Ontological Concept Extraction” (COCE), which applies the K-means partitioning algorithm incrementally and is guided by a structural context. This context exploits the HTML structure and the location of words to select the semantically closest co-occurrents of each word and to improve word weighting. Guided by this context definition, we perform an incremental clustering that refines the context of each word cluster to obtain semantically extracted concepts. The COCE algorithm can be run either automatically or interactively with a user. We evaluate our algorithm on HTML documents related to the tourism domain. Our results show that the context-based algorithm, through its incremental process and successive refinement of clusters, improves the conceptual quality of the clusters and the relevance of the extracted ontological concepts.

https://hal-supelec.archives-ouvertes.fr/hal-00259889
Contributor : Evelyne Faivre
Submitted on : Friday, February 29, 2008 - 4:13:40 PM
Last modification on : Wednesday, June 20, 2018 - 2:32:02 PM

Identifiers

  • HAL Id : hal-00259889, version 1

Citation

Lobna Karoui, Marie-Aude Aufaure, Nacéra Bennacer Seghouani. Context-based Hierarchical Clustering for the Ontology Learning. IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006) jointly with the IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2006) and the IEEE International Conference on Data Mining (ICDM 2006), Dec 2006, Hong-Kong, China. pp.420-427. ⟨hal-00259889⟩
