J4  2013, Vol. 47 Issue (3): 385-399    DOI: 10.3785/j.issn.1008-973X.2013.03.001
Computer Technology
Subject-action-object-triples-based method for extraction of knowledge gene
XU Qi1,2, GU Xin-jian1
1. Industrial Engineering Center, Department of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China;
2. Mechatronics Technology Laboratory, Taizhou Vocational and Technical College, Taizhou 318000, China
Abstract:

Using the patent citation network as the carrier and starting from the basic characteristics of knowledge genes (stability, heredity, and variability), this work proposes a method for extracting knowledge genes based on subject-action-object triples. First, a connectivity algorithm is applied to analyze patent citation relationships, mine the knowledge flow of inheritance and development between citing and cited patents, and establish the knowledge evolution trajectory. Next, text parsing is used to extract subject-action-object triples from patent claims. Finally, semantic processing based on the WordNet lexical database computes semantic similarity, merges synonymous subject-action-object triples, and produces a knowledge gene map. To validate the method, 5 073 patents related to data mining and granted between 1975 and 1999 were collected from the United States Patent and Trademark Office database, and their geographical and annual distributions were analyzed. Patent citation relationships were retrieved from the National Bureau of Economic Research (NBER) patent data set, and the network analysis software Pajek was used to build the patent citation network that served as the experimental data sample. The experimental results show that the extracted subject-action-object triples exhibit the stability, heredity, and variability of knowledge genes and can therefore serve as one form of representation of a knowledge gene.
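The abstract does not spell out the connectivity algorithm on this page. As a rough illustration only, the sketch below assumes it resembles the search path count (SPC) measure commonly used for citation networks: each citation link is weighted by the number of source-to-sink paths that pass through it, so heavily traversed links mark the main line of knowledge inheritance. The function name `search_path_count`, the edge direction (cited patent to citing patent), and the patent labels `P1` to `P5` are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict, deque


def search_path_count(edges):
    """Weight each citation link by the number of source-to-sink paths through it.

    `edges` is an iterable of (cited, citing) pairs, so every link points
    along the assumed direction of knowledge flow. The graph must be acyclic.
    """
    edges = list(dict.fromkeys(edges))  # drop duplicate links, keep order
    succ = defaultdict(list)  # node -> nodes that cite it
    pred = defaultdict(list)  # node -> nodes it cites
    nodes = set()
    for cited, citing in edges:
        succ[cited].append(citing)
        pred[citing].append(cited)
        nodes.update((cited, citing))

    # Kahn's algorithm gives a topological order of the citation DAG.
    indegree = {n: len(pred[n]) for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("citation graph contains a cycle")

    # Number of paths reaching each node from any source (uncited-by-nothing node).
    from_sources = {}
    for n in order:
        from_sources[n] = sum(from_sources[p] for p in pred[n]) if pred[n] else 1

    # Number of paths from each node to any sink (node nobody cites yet).
    to_sinks = {}
    for n in reversed(order):
        to_sinks[n] = sum(to_sinks[s] for s in succ[n]) if succ[n] else 1

    return {(u, v): from_sources[u] * to_sinks[v] for u, v in edges}


# Hypothetical five-patent network: P1 is cited by P2 and P3,
# both are cited by P4, and P4 is cited by P5.
demo_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4"), ("P4", "P5")]
weights = search_path_count(demo_edges)
for link, weight in sorted(weights.items()):
    print(link, weight)
```

In this toy network the link from `P4` to `P5` lies on both source-to-sink paths and so receives the largest weight; following the highest-weight links from a source is one way to read off a candidate knowledge evolution trajectory.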

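Published: 2013-03-01
CLC number: TP 391.1
Funding:

Supported by the National Natural Science Foundation of China (51175463); the Key Program of the National Natural Science Foundation of China (71132007); and Phase III of the National "985 Project".

Corresponding author: GU Xin-jian, male, professor, doctoral supervisor.     E-mail: xjgu@zju.edu.cn
About the author: XU Qi (1983-), male, doctoral candidate, whose research focuses on data mining and knowledge networks. E-mail: xq_zju1983@sina.com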

Cite this article:

XU Qi, GU Xin-jian. Subject-action-object-triples-based method for extraction of knowledge gene[J]. J4, 2013, 47(3): 385-399.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2013.03.001
http://www.zjujournals.com/eng/CN/Y2013/V47/I3/385

