Computer Technology |
|
|
|
|
Bayesian conflicting Web data credibility algorithm |
WANG Ji kui |
Key Laboratory of electronic commerce,Lanzhou University of Finance and Economics, Lanzhou 730000, China |
|
|
Abstract The key of Web data fusion is to judge the credibility of conflicting Web data. The credibility of observing values was only rectified by their similarity relations, while the inclusion relations was ignored in existed algorithms. To solve this problem, the concept of inclusion degree was defined based on the characteristic analysis of the observing values; a modified model was proposed to rectify the credibility of observing values using inclusion degree was proposed. Taking the observing values as random variables, the reliability problem of the observation values could be attributed to a posterior probability distribution problem. Bayesian theory was adopted to define the conception of data source credibility, to derive the relationship model for data source credibility and observing value credibility, and to propose the Bayesian conflicting Web data credibility algorithm: DataCredibility. The experiment results show that the proposed DataCredibility algorithm achieves better accuracy, recall rate and F1 measure value compared with the baseline algorithms.
|
Published: 08 December 2016
|
|
贝叶斯冲突Web数据可信度算法
Web数据融合的关键是判定冲突Web数据的可信度.已有算法只采用观察值之间的相似关系而忽视包含关系对可信度进行修正,针对这一问题,在分析观察值本身特性的基础上定义观察值包含度,提出利用观察值包含度对观察值可信度进行修正的模型.将观察值看作随机变量,观察值的可信度问题可归结为观察值的后验概率分布问题.在贝叶斯分析的基础上,定义数据源可信度,推导出数据源可信度与观察值可信度之间的关系模型|并提出基于贝叶斯理论的冲突Web数据可信度算法DataCredibility.实验结果表明,与基准算法相比,DataCredibility获得了更高的精确度、召回率及F1测度值.
|
|
[1] 董永权.Deep Web数据集成关键问题研究 [D]. 济南:山东大学, 2010.
DONG Yongquan. Key problem of deep web data integration [D]. Jinan: Shandong University, 2010.
[2] FETTERLY D, MANASSE M, NAJORK M, et al. A largescale study of the evolution of web pages [C] ∥ In proceedings of the 12th international conference on World Wide Web. Budapest: ACM, 2003: 669678.
[3] CHANG K C C, HE B, LI C, et al. Structured databases on the web: observations and implications [J]. ACM Sigmod Record, 2004, 33(3): 6170.
[4] ENRIGHT A. Consumers trust information found online less than offline messages [J]. Internet Retailer, 2010, 25.
[5] DALVI N, MACHANAVAJJHALA A, PANG B. An analysis of structured data on the web [C] ∥ In proceedings of the 38th International Conference on Very Large DataBases. Istanbul: ACM, 2012: 680691.
[6] LI X, DONG X L, LYONS K, et al. Truth finding on the deep web: Is the problem solved? [C] ∥ In proceedings of the 38th International Conference on Very Large DataBases. Istanbul: ACM, 2012: 97108.
[7] YIN X X, HAN J W, YU P S. Truth discovery with multiple conflicting information providers on the Web [J]. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(6): 796808.
[8] GALLAND A, ABITEBOUL S, MARIAN A, et al. Corroborating information from disagreeing views [C] ∥ In proceedings of the third ACM international conference on Web search and data mining. New York: ACM, 2010: 131140.
[9] PASTERNACK J, ROTH D. Knowing what to believe (when you already know something) [C] ∥ In proceedings of the International Conference on Computational Linguistics. Beijing: ACM, 2010: 877885.
[10] JSANG A, MARSH S, POPE S. Exploring different types of trust propagation [C] ∥ International Conference on Trust Management. Berlin Heidelberg: Springer, 2006: 179192.
[11] 张志强,刘丽霞,谢晓芹,等.基于数据源依赖关系的信息评价方法研究[J].计算机学报, 2012, 35(11):23922402.
ZHANG Zhiqiang, LIU Lixia, XIE Xiaoqin, et al. Information evaluation based on source dependence [J]. Chinese Journal of Computers, 2012, 35(11): 23922402.
[12] 张永新,李庆忠,彭朝晖.基于Markov逻辑网的两阶段数据冲突解决方法 [J]. 计算机学报, 2012, 35(1): 101111.
ZHANG Yongxin, LI Qingzhong, PENG Zhaohui. 2stage data conflict resolution based on Markov logic networks [J]. Chinese Journal of Computers, 2012, 35(1): 101111.
[13] ZHAO B, RUBINSTEIN B I, GEMMELL J, et al. A bayesian approach to discovering truth from conflicting sources for data integration [C] ∥ In proceedings of the 38th International Conference on Very Large DataBases. Istanbul: ACM, 2012, 5(6): 550561.
[14] RAVALI P, ANISH D S, DSONG X L, et al. Fusing data with correlations [C] ∥ In proceedings of the SIGMOD, Snowbird: ACM, 2014: 433444.
[15] LI Q, LI Y, GAO J, et al. A confidenceaware approach for truth discovery on longtail data [C] ∥ In proceedings of the 41th International Conference on Very LargeDataBases. Kohala Coast: ACM, 2015: 425436.
[16] 曲开社,翟岩慧.偏序集,包含度与形式概念分析[J].计算机学报,2006,29(2): 219226.
QU Kaishe, ZHAI Yanhui. Posets, inclusion degree theory and FCA [J]. Chinese Journal of Computers, 2006.29(2): 219226.
[17] YIN X, DONG L. Data sets for data fusion experiments (III. Book) [EB/OL]. (20121207) [20151001]. http:∥lunadong.com/ fusionDataSets.htm. |
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|