Computer Technology, Radio Electronics |
A context-aware index based text extraction framework |
JIN Cang-hong1, WU Ming-hui2, YING Jing1,2 |
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. Department of Computer Science and Engineering, Zhejiang University City College, Hangzhou 310015, China |