J4  2013, Vol. 47 Issue (6): 990-999    DOI: 10.3785/j.issn.1008-973X.2013.06.009
    
Faceted Web search approach for large scale unstructured data
ZHU Fan-wei (1,2), WU Ming-hui (2,1), YING Jing (1,2)
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. Department of Computer Science and Engineering, Zhejiang University City College, Hangzhou 310015, China

Abstract  

In order to investigate the cognitive processing of concrete and abstract icons under semantic match and mismatch conditions, 12 subjects were asked to judge whether the icon-word pairs presented on the screen had the same meaning and to press the buttons quickly, while their behavioral and event-related potential (ERP) data were collected. Experimental results showed that reaction times were longer and accuracy rates were lower for abstract icons than for concrete icons. Compared with the semantic mismatch condition, reaction times were shorter and accuracy rates were higher when the semantics matched. The ERP results indicated that abstract icons elicited a small N400 under the semantic match condition, and that both concrete and abstract icons elicited a larger N400 under the semantic mismatch condition. The N400 amplitudes differed markedly across brain areas. This study indicates that icon concreteness significantly affects icon comprehension.



Published: 22 November 2013
CLC:  TP 311  
Cite this article:

ZHU Fan-wei, WU Ming-hui, YING Jing. Faceted Web search approach for large scale unstructured data. J4, 2013, 47(6): 990-999.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2013.06.009     OR     http://www.zjujournals.com/eng/Y2013/V47/I6/990


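Faceted Web search approach for large-scale unstructured data

To address the heterogeneous and unstructured nature of Web data, a minimum-cost faceted Web entity search approach, FacetedWeb, is proposed. Named entities are used to annotate the Web with structure, and the unstructured Web data is modeled as a database of entity tuples to support multi-type entity search and dynamic facet selection. A probabilistic ranking algorithm based on the random walk model is adopted, in which the personalized PageRank value of a node measures the relevance of a result, in order to construct a minimum-cost faceted interface. A prototype of FacetedWeb was implemented on the real Web dataset Clueweb. User evaluation data verified the effectiveness of FacetedWeb as a general-purpose faceted Web search engine, and a comparison with traditional entity search algorithms showed that FacetedWeb has a clear advantage in both the efficiency and the precision of entity search.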
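The ranking idea in the abstract, scoring a node by its personalized PageRank value under a random walk, can be illustrated with a minimal sketch. This is not the FacetedWeb implementation: the function name `personalized_pagerank`, the toy graph, the damping factor of 0.85, and the handling of dangling nodes are all assumptions made for illustration only.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration personalized PageRank on a directed graph.

    graph maps each node to its out-neighbours; seeds is the teleport
    distribution (non-negative weights, normalised here to sum to 1).
    """
    # Collect every node that appears as a source, a target, or a seed.
    nodes = set(graph) | set(seeds)
    for targets in graph.values():
        nodes.update(targets)

    total = sum(seeds.values())
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)  # start the walk at the seed distribution
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps back to a seed.
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph.get(n, [])
            if out:
                # Otherwise it follows a uniformly chosen out-edge.
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Assumption: a dangling node returns its mass to the seeds.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
        rank = nxt
    return rank


# Hypothetical toy graph: a query node links to documents, which link to
# the entities they mention.
toy_graph = {
    "query": ["doc1", "doc2"],
    "doc1": ["entityA", "entityB"],
    "doc2": ["entityA"],
}
scores = personalized_pagerank(toy_graph, seeds={"query": 1.0})
ranked_entities = sorted(
    (n for n in scores if n.startswith("entity")),
    key=lambda n: scores[n],
    reverse=True,
)
print(ranked_entities)
```

In this toy graph, `entityA` is reachable from the query through both documents and `entityB` through only one, so the walk seeded at the query assigns `entityA` the higher score. Seeding the teleport distribution with the query is what makes the score query-dependent rather than a global PageRank.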

