(1. Institute of Manufacturing Engineering, Zhejiang Province Key Laboratory of Advanced Manufacturing Technology,
Zhejiang University, Hangzhou 310027, China; 2. School of Management, Zhejiang University, Hangzhou 310058, China)
A method for discovering similar patent documents was proposed to help enterprises with patent application, protection, and utilization. A patent model tree was built from the characteristics of patent documents, and the tree and its nodes were defined. By analyzing the attribute values of the nodes, patent documents were categorized using vector space model (VSM) based text categorization together with the weighted similarities of patent name and patent abstract. Within each category, similar patents were then discovered using the weighted similarities of patent characteristics. Several ways of determining the weights of patent characteristics were discussed according to the practical needs of enterprises. A case study showed that the method can be used for patent categorization and similar-patent search.
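The weighted similarity of patent characteristics described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the term weighting scheme or the weight values, so the raw term-frequency vectors, the field names `name` and `abstract`, and the weights 0.4 and 0.6 used here are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a bag-of-words term-frequency vector.

    Assumption: whitespace tokenization and raw counts; a real system
    would likely add stop-word removal and TF-IDF weighting.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty field is treated as sharing nothing
    return dot / (norm_a * norm_b)


def weighted_similarity(p, q, weights):
    """Combine per-field cosine similarities using normalized weights.

    `p` and `q` map field names to text; `weights` maps field names to
    non-negative weights (hypothetical values, not taken from the paper).
    """
    total = sum(weights.values())
    score = sum(
        w * cosine_similarity(term_vector(p[f]), term_vector(q[f]))
        for f, w in weights.items()
    )
    return score / total


# Hypothetical patents and weights, for illustration only.
weights = {"name": 0.4, "abstract": 0.6}
patent_a = {"name": "gear pump", "abstract": "a pump driven by two gears"}
patent_b = {"name": "gear pump housing", "abstract": "a housing for a pump"}
print(weighted_similarity(patent_a, patent_b, weights))
```

Normalizing by the sum of the weights keeps the score in the range 0 to 1, so an enterprise can change the relative importance of each field without rescaling any similarity threshold.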
CHEN Ji-Xi, GU Xin-Jian, CHEN Guo-Hai, et al. Method of discovering similar patents based on vector space model and characteristics of patent documents. J4, 2009, 43(10): 1848-1852.