J4  2013, Vol. 47 Issue (6): 990-999    DOI: 10.3785/j.issn.1008-973X.2013.06.009
Computer Technology
Faceted Web search approach for large scale unstructured data
ZHU Fan-wei1,2, WU Ming-hui2,1, YING Jing1,2
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. Department of Computer Science and Engineering, Zhejiang University City College, Hangzhou 310015, China
Abstract:

To address the heterogeneity and unstructured nature of Web data, a minimum-cost faceted entity search approach for the Web, FacetedWeb, is proposed. The Web is annotated structurally with named entities, and the unstructured Web data are modeled as an entity-tuple database to support multi-type entity search and dynamic facet selection. A probabilistic ranking algorithm based on the random walk model is adopted, in which the personalized PageRank value of a node measures the relevance of a result, in order to construct a minimum-cost faceted interface. A prototype of FacetedWeb was implemented on the real Web dataset Clueweb. User evaluation data verified the effectiveness of FacetedWeb as a general-purpose faceted Web search engine, and a comparison with traditional entity search algorithms showed that FacetedWeb has clear advantages in both the efficiency and the precision of entity search.
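The abstract's ranking idea (scoring nodes by their personalized PageRank under a random walk) can be illustrated with a minimal sketch. This is not the FacetedWeb implementation: the toy graph, the node names, the restart probability `alpha = 0.15`, and the handling of dangling nodes are all assumptions chosen for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, alpha=0.15, iterations=50):
    """Power-iteration personalized PageRank on a directed graph.

    edges: iterable of (source, target) pairs.
    seeds: nodes the random walk restarts at (e.g. nodes matching a query).
    alpha: restart probability (assumed value, not taken from the paper).
    """
    seeds = set(seeds)
    out_links = defaultdict(list)
    nodes = set(seeds)
    for src, dst in edges:
        out_links[src].append(dst)
        nodes.update((src, dst))

    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)

    for _ in range(iterations):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            targets = out_links.get(n)
            if targets:
                # Spread the walking mass evenly over outgoing links.
                share = (1.0 - alpha) * score[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: send its walking mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1.0 - alpha) * score[n] / len(seeds)
        score = nxt
    return score


# Hypothetical entity graph: a query node linked to entities and facet values.
toy_edges = [
    ("q", "entity_a"),
    ("q", "entity_b"),
    ("entity_a", "facet_x"),
    ("entity_b", "facet_x"),
    ("entity_b", "facet_y"),
    ("unrelated", "facet_y"),
]
scores = personalized_pagerank(toy_edges, seeds=["q"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, `facet_x` is reachable from the query through both entities and so outranks `facet_y`, while `unrelated` receives no score because no walk starting at the seed can reach it.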

Published: 2013-11-22
CLC number: TP 311
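Funding:

Supported by the Tsinghua-Tencent Internet Innovation Technology Fund (2011-8).

Corresponding author: YING Jing, male, professor, doctoral supervisor. E-mail: yingj@zucc.edu.cn
About the author: ZHU Fan-wei (born 1984), female, Ph.D. candidate; research interests include Web search and data mining. E-mail: zhufanwei@zju.edu.cn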

Cite this article:

ZHU Fan-wei, WU Ming-hui, YING Jing. Faceted Web search approach for large scale unstructured data. J4, 2013, 47(6): 990-999.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2013.06.009
http://www.zjujournals.com/eng/CN/Y2013/V47/I6/990
