Front. Inform. Technol. Electron. Eng.  2012, Vol. 13 Issue (4): 268-280    DOI: 10.1631/jzus.C1101008
    
Knowledge extraction from Chinese wiki encyclopedias
Zhi-chun Wang, Zhi-gang Wang, Juan-zi Li, Jeff Z. Pan
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Department of Computer Science, University of Aberdeen, Aberdeen AB24 3UE, UK

Abstract  The vision of the Semantic Web is to build a ‘Web of data’ that enables machines to understand the semantics of information on the Web. The Linked Open Data (LOD) project encourages people and organizations to publish open datasets in the Resource Description Framework (RDF) on the Web, which promotes the development of the Semantic Web. Among the LOD datasets, DBpedia has proved to be a successful structured knowledge base and has become the central interlinking hub of the Web of data in English. In Chinese, however, little linked data has been published and linked to DBpedia, which hinders the sharing of structured knowledge across both Chinese and cross-lingual resources. This paper presents an approach for building a large-scale Chinese structured knowledge base from Chinese wiki resources, including Hudong and Baidu Baike. The approach first builds an ontology based on the wiki category system and infoboxes, and then extracts instances from wiki articles. With Hudong as the source, the approach builds an ontology containing 19 542 concepts and 2381 properties; 802 593 instances are extracted and described using these concepts and properties, and 62 679 of them are linked to equivalent instances in DBpedia. With Baidu Baike as the source, the approach builds an ontology containing 299 concepts, 37 object properties, and 5590 datatype properties; 1 319 703 instances are extracted, and 84 343 of them are linked to instances in DBpedia. We provide RDF dumps and a SPARQL endpoint for accessing the resulting Chinese knowledge bases. These knowledge bases can be used not only for building Chinese linked data but also in applications of large-scale knowledge bases, such as question answering and semantic search.
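
To make the instance-extraction step more concrete, the following is a minimal sketch of how one wiki article's categories and infobox might be turned into RDF statements in N-Triples form. It is an illustration under stated assumptions, not the authors' implementation: the namespace `http://example.org/zhkb/`, the function names, and the sample article are all hypothetical, and the abstract does not say how the real system names its resources.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BASE = "http://example.org/zhkb/"  # hypothetical namespace, not the paper's


def iri(kind, name):
    """Build an IRI under the hypothetical base, percent-encoding the name."""
    return "<" + BASE + kind + "/" + quote(name, safe="") + ">"


def literal(value):
    """Escape a string as a plain N-Triples literal."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def article_to_ntriples(title, categories, infobox, dbpedia_iri=None):
    """Describe one article as N-Triples lines.

    Categories become rdf:type statements, infobox entries become
    property statements with literal values, and an optional DBpedia
    IRI becomes an owl:sameAs link.
    """
    subject = iri("resource", title)
    lines = []
    for category in categories:
        lines.append(f"{subject} <{RDF_TYPE}> {iri('concept', category)} .")
    for key, value in infobox.items():
        lines.append(f"{subject} {iri('property', key)} {literal(value)} .")
    if dbpedia_iri is not None:
        lines.append(f"{subject} <{OWL_SAME_AS}> <{dbpedia_iri}> .")
    return lines


# Placeholder article; the names and values are made up for illustration.
triples = article_to_ntriples(
    title="ExampleArticle",
    categories=["ExampleConcept"],
    infobox={"exampleProperty": "example value"},
    dbpedia_iri="http://dbpedia.org/resource/Example",
)
print("\n".join(triples))
```

In this sketch every infobox value is treated as a literal; separating object properties from datatype properties, as the reported Baidu Baike ontology does, would need an additional step that decides whether a value refers to another article.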

Key words: Semantic Web, Linked Data, Ontology, Knowledge base
Received: 10 August 2011      Published: 07 April 2012
CLC:  TP311  
Cite this article:

Zhi-chun Wang, Zhi-gang Wang, Juan-zi Li, Jeff Z. Pan. Knowledge extraction from Chinese wiki encyclopedias. Front. Inform. Technol. Electron. Eng., 2012, 13(4): 268-280.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/jzus.C1101008     OR     http://www.zjujournals.com/xueshu/fitee/Y2012/V13/I4/268

