Front. Inform. Technol. Electron. Eng.  2012, Vol. 13 Issue (4): 257-267    DOI: 10.1631/jzus.C1101007
    
VDoc+: a virtual document based approach for matching large ontologies using MapReduce
Hang Zhang, Wei Hu, Yu-zhong Qu
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China; Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China
Abstract: Many ontologies have been published on the Semantic Web to be shared for describing resources. Among them, large ontologies from real-world domains pose a scalability problem for semantic technologies such as ontology matching (OM): current OM techniques either take too long to run or make strong assumptions about the running environment. To deal with this issue, we propose VDoc+, a three-stage approach for matching large ontologies that builds on the MapReduce framework and the virtual document technique. Specifically, two MapReduce processes are performed in the first stage to extract the textual descriptions of named entities (classes, properties, and instances) and blank nodes, respectively. In the second stage, the extracted descriptions are exchanged with neighbors in Resource Description Framework (RDF) graphs to construct virtual documents; this step also benefits from the MapReduce-based implementation. In the third stage, a word-weight-based partitioning method is used to compute similarities in parallel with the term frequency–inverse document frequency (TF-IDF) model. Experimental results on two large-scale real datasets and the benchmark testbed from the Ontology Alignment Evaluation Initiative (OAEI) show that the proposed approach significantly reduces the run time with minor loss in precision and recall.
Key words: Ontology matching; Virtual document; MapReduce; TF-IDF; Semantic Web
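The three stages summarized in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names, the damping factor `gamma`, the smoothed IDF formula, and the toy entities are all assumptions made for illustration. The map and reduce phases are simulated in memory, and the partitioning here is keyed by word only, which is a simplification of the word-weight-based partitioning the paper proposes.

```python
import math
from collections import Counter, defaultdict
from itertools import product

def virtual_documents(local, neighbors, gamma=0.5):
    """Combine each entity's own words with its neighbors' words, damped by gamma."""
    vdocs = {}
    for entity, words in local.items():
        doc = Counter(words)
        for n in neighbors.get(entity, []):
            for w, c in Counter(local.get(n, [])).items():
                doc[w] += gamma * c
        vdocs[entity] = doc
    return vdocs

def tfidf(vdocs):
    """Weight words by TF times a smoothed IDF (an assumed variant), then L2-normalize."""
    n = len(vdocs)
    df = Counter(w for doc in vdocs.values() for w in doc)
    vectors = {}
    for entity, doc in vdocs.items():
        total = sum(doc.values())
        vec = {w: (c / total) * (1.0 + math.log(n / df[w])) for w, c in doc.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors[entity] = {w: v / norm for w, v in vec.items()}
    return vectors

def map_phase(vectors, side):
    """Map: emit (word, (side, entity, weight)) so that each word is a partition key."""
    for entity, vec in vectors.items():
        for w, weight in vec.items():
            yield w, (side, entity, weight)

def reduce_phase(pairs):
    """Reduce: per word, multiply weights of entities from opposite ontologies and sum."""
    groups = defaultdict(list)
    for w, value in pairs:
        groups[w].append(value)
    sims = defaultdict(float)
    for values in groups.values():
        left = [(e, x) for s, e, x in values if s == "A"]
        right = [(e, x) for s, e, x in values if s == "B"]
        for (ea, xa), (eb, xb) in product(left, right):
            sims[(ea, eb)] += xa * xb
    return dict(sims)

def match(local_a, nbrs_a, local_b, nbrs_b, gamma=0.5):
    """Run the sketch end to end and return cosine similarities for cross-ontology pairs."""
    vd_a = virtual_documents(local_a, nbrs_a, gamma)
    vd_b = virtual_documents(local_b, nbrs_b, gamma)
    vectors = tfidf({**vd_a, **vd_b})  # IDF is computed over both ontologies together
    pairs = list(map_phase({e: vectors[e] for e in vd_a}, "A"))
    pairs += list(map_phase({e: vectors[e] for e in vd_b}, "B"))
    return reduce_phase(pairs)

# Hypothetical toy ontologies: entity -> words of its local description, and its neighbors.
local_a = {"a:Author": ["author", "writer"], "a:Paper": ["paper", "article"]}
nbrs_a = {"a:Author": ["a:Paper"], "a:Paper": ["a:Author"]}
local_b = {"b:Writer": ["writer", "person"], "b:Article": ["article", "publication"]}
nbrs_b = {"b:Writer": ["b:Article"], "b:Article": ["b:Writer"]}

sims = match(local_a, nbrs_a, local_b, nbrs_b)
for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Because only entities that share a word meet in the same reduce group, pairs with no common vocabulary are never compared, which is the usual motivation for partitioning by word in this kind of similarity join.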
Received: 2011-08-05 Published: 2012-04-07
CLC:  TP311  

Cite this article:

Hang Zhang, Wei Hu, Yu-zhong Qu. VDoc+: a virtual document based approach for matching large ontologies using MapReduce. Front. Inform. Technol. Electron. Eng., 2012, 13(4): 257-267.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1101007
http://www.zjujournals.com/xueshu/fitee/CN/Y2012/V13/I4/257
