The key of Web data fusion is to judge the credibility of conflicting Web data. The credibility of observing values was only rectified by their similarity relations, while the inclusion relations was ignored in existed algorithms. To solve this problem, the concept of inclusion degree was defined based on the characteristic analysis of the observing values; a modified model was proposed to rectify the credibility of observing values using inclusion degree was proposed. Taking the observing values as random variables, the reliability problem of the observation values could be attributed to a posterior probability distribution problem. Bayesian theory was adopted to define the conception of data source credibility, to derive the relationship model for data source credibility and observing value credibility, and to propose the Bayesian conflicting Web data credibility algorithm: DataCredibility. The experiment results show that the proposed DataCredibility algorithm achieves better accuracy, recall rate and F1 measure value compared with the baseline algorithms.
[1] 董永权.Deep Web数据集成关键问题研究 [D]. 济南:山东大学, 2010.
DONG Yongquan. Key problem of deep web data integration [D]. Jinan: Shandong University, 2010.
[2] FETTERLY D, MANASSE M, NAJORK M, et al. A largescale study of the evolution of web pages [C] ∥ In proceedings of the 12th international conference on World Wide Web. Budapest: ACM, 2003: 669678.
[3] CHANG K C C, HE B, LI C, et al. Structured databases on the web: observations and implications [J]. ACM Sigmod Record, 2004, 33(3): 6170.
[4] ENRIGHT A. Consumers trust information found online less than offline messages [J]. Internet Retailer, 2010, 25.
[5] DALVI N, MACHANAVAJJHALA A, PANG B. An analysis of structured data on the web [C] ∥ In proceedings of the 38th International Conference on Very Large DataBases. Istanbul: ACM, 2012: 680691.
[6] LI X, DONG X L, LYONS K, et al. Truth finding on the deep web: Is the problem solved? [C] ∥ In proceedings of the 38th International Conference on Very Large DataBases. Istanbul: ACM, 2012: 97108.
[7] YIN X X, HAN J W, YU P S. Truth discovery with multiple conflicting information providers on the Web [J]. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(6): 796808.
[8] GALLAND A, ABITEBOUL S, MARIAN A, et al. Corroborating information from disagreeing views [C] ∥ In proceedings of the third ACM international conference on Web search and data mining. New York: ACM, 2010: 131140.
[9] PASTERNACK J, ROTH D. Knowing what to believe (when you already know something) [C] ∥ In proceedings of the International Conference on Computational Linguistics. Beijing: ACM, 2010: 877885.
[10] JSANG A, MARSH S, POPE S. Exploring different types of trust propagation [C] ∥ International Conference on Trust Management. Berlin Heidelberg: Springer, 2006: 179192.
[11] 张志强,刘丽霞,谢晓芹,等.基于数据源依赖关系的信息评价方法研究[J].计算机学报, 2012, 35(11):23922402.
ZHANG Zhiqiang, LIU Lixia, XIE Xiaoqin, et al. Information evaluation based on source dependence [J]. Chinese Journal of Computers, 2012, 35(11): 23922402.
[12] 张永新,李庆忠,彭朝晖.基于Markov逻辑网的两阶段数据冲突解决方法 [J]. 计算机学报, 2012, 35(1): 101111.
ZHANG Yongxin, LI Qingzhong, PENG Zhaohui. 2stage data conflict resolution based on Markov logic networks [J]. Chinese Journal of Computers, 2012, 35(1): 101111.
[13] ZHAO B, RUBINSTEIN B I, GEMMELL J, et al. A bayesian approach to discovering truth from conflicting sources for data integration [C] ∥ In proceedings of the 38th International Conference on Very Large DataBases. Istanbul: ACM, 2012, 5(6): 550561.
[14] RAVALI P, ANISH D S, DSONG X L, et al. Fusing data with correlations [C] ∥ In proceedings of the SIGMOD, Snowbird: ACM, 2014: 433444.
[15] LI Q, LI Y, GAO J, et al. A confidenceaware approach for truth discovery on longtail data [C] ∥ In proceedings of the 41th International Conference on Very LargeDataBases. Kohala Coast: ACM, 2015: 425436.
[16] 曲开社,翟岩慧.偏序集,包含度与形式概念分析[J].计算机学报,2006,29(2): 219226.
QU Kaishe, ZHAI Yanhui. Posets, inclusion degree theory and FCA [J]. Chinese Journal of Computers, 2006.29(2): 219226.
[17] YIN X, DONG L. Data sets for data fusion experiments (III. Book) [EB/OL]. (20121207) [20151001]. http:∥lunadong.com/ fusionDataSets.htm.