Please wait a minute...
浙江大学学报(工学版)  2024, Vol. 58 Issue (5): 918-930    DOI: 10.3785/j.issn.1008-973X.2024.05.005
计算机技术、通信技术     
基于网格空间团的多级同位模式挖掘方法
刘宇情1(),王丽珍2,*(),杨培忠1,朴丽莎2
1. 云南大学 信息学院,云南 昆明 650504
2. 滇池学院 理工学院,云南 昆明 650228
Multi-level co-location pattern mining algorithm based on grid spatial cliques
Yuqing LIU1(),Lizhen WANG2,*(),Peizhong YANG1,Lisha PIAO2
1. School of Information Science and Engineering, Yunnan University, Kunming 650504, China
2. Institute of Science and Technology, Dianchi College of Yunnan University, Kunming 650228 ,China
 全文: PDF(2343 KB)   HTML
摘要:

针对传统的多级同位模式挖掘方法未考虑到实际数据分布的网格特性,且从全局到区域的多级模式挖掘框架会导致算法效率低下的问题,提出逆向挖掘多级同位模式的新框架. 先挖掘区域同位模式,再由区域同位模式推导出全局同位模式,提出有效的剪枝策略提高挖掘效率. 考虑真实数据集中数据分布的网格特性,定义实例间的网格邻近关系,提出网格空间团及计算网格空间团的新颖方法. 在区域划分阶段,提出基于自适应网格密度峰值聚类的区域划分方法,基于2阶网格空间团的网格相似性来分配簇. 在合成和实际数据集上进行大量的实验,验证了提出方法的有效性、高效性和可扩展性,在真实数据集上的剪枝率可以达到78%.

关键词: 空间数据挖掘多级同位模式网格空间团密度峰值聚类(DPC)    
Abstract:

A novel framework of reverse mining of multi-level co-location patterns was proposed aiming at the problems that traditional methods of multi-level co-location pattern mining did not consider the grid characteristics of the real data distribution, and the multi-level mining framework from global to regional led to the algorithm inefficiency. The regional co-location patterns were first mined, and the global co-location patterns were deduced based on the mined regional patterns. Some pruning strategies were proposed to enhance the mining efficiency. The grid characteristics of the data distribution in real datasets were considered, and the grid neighbor relationship between instances was defined. The concept of grid spatial cliques with a novel method for calculating grid spatial cliques was defined. An adaptive grid density peak clustering strategy for partitioning regions was proposed in the regional division stage, and clusters were assigned based on the similarity of two-size grid spatial cliques. Extensive experiments were conducted on both synthetic and real-world datasets. The experimental results validated the effectiveness, efficiency and scalability of the proposed method. A pruning rate of up to 78% was achieved on real datasets.

Key words: spatial data mining    multi-level co-location pattern    grid spatial clique    density peak clustering (DPC)
收稿日期: 2023-07-03 出版日期: 2024-04-26
CLC:  TP 311  
基金资助: 国家自然科学基金资助项目(62276227, 62306266, 62266050);云南省基础研究计划资助项目(202201AS070015,202401AT070450);云南省创新团队资助项目(2018HC019) ;云南省智能系统与计算重点实验室建设项目(202205AG070003).
通讯作者: 王丽珍     E-mail: liuyuqing@mail.ynu.edu.cn;lzhwang@ynu.edu.cn
作者简介: 刘宇情(2000—),女,硕士生,从事空间数据挖掘的研究. orcid.org/0009-0005-2066-2639.E-mail:liuyuqing@mail.ynu.edu.cn
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  
刘宇情
王丽珍
杨培忠
朴丽莎

引用本文:

刘宇情,王丽珍,杨培忠,朴丽莎. 基于网格空间团的多级同位模式挖掘方法[J]. 浙江大学学报(工学版), 2024, 58(5): 918-930.

Yuqing LIU,Lizhen WANG,Peizhong YANG,Lisha PIAO. Multi-level co-location pattern mining algorithm based on grid spatial cliques. Journal of ZheJiang University (Engineering Science), 2024, 58(5): 918-930.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.05.005        https://www.zjujournals.com/eng/CN/Y2024/V58/I5/918

图 1  包含4个空间特征A、B、C、D的区域数据示例(左图)和一些模式的参与实例(右图)
图 2  说明网格相似度度量的用例
图 3  多级同位模式挖掘框架上的对比
图 4  候选参与实例的示例
图 5  真实数据集分布及其分区结果
方法EI
深圳高黎贡山
S-ML0.98920.8771
S-ML-20.87830.6564
FDPC-RCPM0.65320.6401
表 1  不同方法的EI值
图 6  挖掘到的区域模式频繁区域的比较
图 7  利用ML方法识别的全局模式的分布
图 8  利用提出方法识别的全局模式的分布
图 9  在不同频繁度阈值下的运行时间对比
图 10  在不同频繁度阈值下的区域模式数量对比
图 11  S-ML算法的剪枝率
图 12  合成数据集上的实验对比
1 HE Z, DENG M, XIE Z, et al Discovering the joint influence of urban facilities on crime occurrence using spatial co-location pattern mining[J]. Cities, 2020, 99: 102612
doi: 10.1016/j.cities.2020.102612
2 LI J, ADILMAGAMBETOV A, MOHOMED JABBAR M S, et al On discovering co-location patterns in datasets: a case study of pollutants and child cancers[J]. Geo Informatica, 2016, 20 (4): 651- 692
3 LU J L, WANG L Z, FANG Y, et al Mining strong symbiotic patterns hidden in spatial prevalent co-location patterns[J]. Knowledge-Based Systems, 2018, 146: 190- 202
doi: 10.1016/j.knosys.2018.02.006
4 WEILER M, SCHMID K A, MAMOULIS N, et al. Geo-social co-location mining [C]// 2nd International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data . Melbourne: ACM, 2015: 19-24.
5 CHEN Y M, CHEN X Y, LIU Z H, et al Understanding the spatial organization of urban functions based on co-location patterns mining: a comparative analysis for 25 Chinese cities[J]. Cities, 2020, 97: 102563
doi: 10.1016/j.cities.2019.102563
6 蒋希文, 王丽珍, TRAN V H 基于模糊密度峰值聚类的区域同位模式并行挖掘算法[J]. 中国科学:信息科学, 2023, 53 (7): 1281- 1298
JIANG Xiwen, WANG Lizhen, TRAN V H Parallel mining algorithm for regional co-location patterns based on fuzzy density peak clustering[J]. Scientia Sinica Informationis, 2023, 53 (7): 1281- 1298
7 刘新斌, 王丽珍, 周丽华 MLCPM-UC: 一种基于模式实例分布均匀系数的多级 co-location 模式挖掘算法[J]. 计算机科学, 2021, 48 (11): 208- 218
LIU Xinbin, WANG Lizhen, ZHOU Lihua MLCPM-UC: a multi-level co-location pattern mining algorithm based on uniform coefficient of pattern instance distribution[J]. Computer Science, 2021, 48 (11): 208- 218
doi: 10.11896/jsjkx.201000097
8 HUANG Y, SHEKHAR S, XIONG H Discovering colocation patterns from spatial data sets: a general approach[J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16 (12): 1472- 1485
doi: 10.1109/TKDE.2004.90
9 YOO J S, SHEKHAR S A joinless approach for mining spatial colocation patterns[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18 (10): 1323- 1337
doi: 10.1109/TKDE.2006.150
10 WANG L Z, ZHOU L H, LU J A, et al An order-clique-based approach for mining maximal co-locations[J]. Information Sciences, 2009, 179 (19): 3370- 3382
doi: 10.1016/j.ins.2009.05.023
11 张绍雪, 王丽珍, 陈文和 CPM-MCHM: 一种基于极大团和哈希表的空间并置模式挖掘算法[J]. 计算机学报, 2022, 45 (3): 526- 541
ZHANG Shaoxue, WANG Lizhen, TRAN V H An order-clique-based approach for mining maximal co-locations[J]. Chinese Journal of Computers, 2022, 45 (3): 526- 541
doi: 10.11897/SP.J.1016.2022.00526
12 EICK C F, PARMAR R, DING W, et al. Finding regional co-location patterns for sets of continuous variables in spatial datasets [C]// Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems . Irvine: ACM, 2008: 1-10.
13 MOHAN P, SHEKHAR S, SHINE J A, et al. A neighborhood graph based approach to regional co-location pattern discovery: a summary of results [C]// Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems . Chicago: ACM, 2011: 122-132.
14 DENG M, CAI J N, LIU Q L, et al Multi-level method for discovery of regional co-location patterns[J]. International Journal of Geographical Information Science, 2017, 31 (9): 1846- 1870
doi: 10.1080/13658816.2017.1334890
15 LIU Q L, LIU W K, DENG M, et al An adaptive detection of multilevel co-location patterns based on natural neighborhoods[J]. International Journal of Geographical Information Science, 2021, 35 (3): 556- 581
doi: 10.1080/13658816.2020.1775235
16 LIU W K, LIU Q L, DENG M, et al Discovery of statistically significant regional co-location patterns on urban road networks[J]. International Journal of Geographical Information Science, 2022, 36 (4): 749- 772
doi: 10.1080/13658816.2021.1981335
17 CELIK M, KANG J M, SHEKHAR S. Zonal co-location pattern discovery with dynamic parameters [C]// 7th IEEE International Conference on Data Mining . Omaha: IEEE, 2007: 433-438.
18 DING W, EICK C F, YUAN X, et al A framework for regional association rule mining and scoping in spatial datasets[J]. Geo-Informatica, 2011, 15: 1- 28
19 QIAN F, CHIEW K, HE Q M, et al Mining regional co-location patterns with kNNG[J]. Journal of Intelligent Information Systems, 2014, 42: 485- 505
doi: 10.1007/s10844-013-0280-5
20 WANG S, HUANG Y, WANG X S. Regional co-locations of arbitrary shapes [C]// Advances in Spatial and Temporal Databases: 13th International Symposium . Berlin: Springer, 2013: 19-37.
21 RODRIGUEZ A, LAIO A Clustering by fast search and find of density peaks[J]. Science, 2014, 344 (6191): 1492- 1496
doi: 10.1126/science.1242072
22 LIU Y H, MA Z M, YU F Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy[J]. Knowledge-Based Systems, 2017, 133: 208- 220
doi: 10.1016/j.knosys.2017.07.010
23 杨培忠, 王丽珍, 王晓璇, 等. 一种基于列计算的空间并置模式挖掘方法[J]. 中国科学: 信息科学, 2022, 52(6): 1053–1068.
YANG Peizhong, WANG Lizhen, WANG Xiaoxuan, et al. A spatial co-location pattern mining approach based on column calculation [J]. Scientia Sinica Informationis , 2022, 52(6): 1053–1068.
[1] 赵越,赵赫,谭海波,余斌,俞望年,马志宇. 基于小世界理论的区块链Kademlia网络改进方法[J]. 浙江大学学报(工学版), 2024, 58(1): 1-9.
[2] 崔璨,杨小虎,邱炜伟,黄方蕾. 基于GPU的区块链交易验签加速技术[J]. 浙江大学学报(工学版), 2023, 57(8): 1505-1515.
[3] 王进,李博涵,吴佳骏,宋欣洋. 支持日志乱序提交的分布式一致性协议[J]. 浙江大学学报(工学版), 2023, 57(2): 320-329.
[4] 孙亮,李晓风,赵赫,余斌,周桐,李皙茹. 基于NFT的实物上链资产化方法[J]. 浙江大学学报(工学版), 2022, 56(10): 1900-1911.
[5] 杨淑琴,马玉浩,方铭宇,钱伟行,蔡洁萱,刘童. 基于实例分割的复杂环境车道线检测方法[J]. 浙江大学学报(工学版), 2022, 56(4): 809-815, 832.
[6] 戴天伦,李博涵,臧亚磊,戴华,于自强,陈钢. PORP:面向无人驾驶的路径规划并行优化策略[J]. 浙江大学学报(工学版), 2022, 56(2): 329-337.
[7] 何苗,柏粉花,于卓,沈韬. 区块链中可公开验证密钥共享技术[J]. 浙江大学学报(工学版), 2022, 56(2): 306-312.
[8] 王友卫,凤丽洲. 基于合群度-隶属度噪声检测及动态特征选择的改进AdaBoost算法[J]. 浙江大学学报(工学版), 2021, 55(2): 367-376.
[9] 廖佳豪,於志文,刘一萌,郭斌. 移动群智感知平台设计与实现[J]. 浙江大学学报(工学版), 2020, 54(10): 1915-1922.
[10] 纪子龙,冀俊忠. 基于双萤火虫种群并行搜索的脑效应连接网络学习方法[J]. 浙江大学学报(工学版), 2020, 54(4): 694-703.
[11] 王万良,杨小涵,赵燕伟,高楠,吕闯,张兆娟. 采用卷积自编码器网络的图像增强算法[J]. 浙江大学学报(工学版), 2019, 53(9): 1728-1740.
[12] 万志远,陶嘉恒,梁家坤,才振功,苌程,乔林,周巧妮. Stack Overflow上机器学习相关问题的大规模实证研究[J]. 浙江大学学报(工学版), 2019, 53(5): 819-828.
[13] 朱凯龙,陆余良,黄晖,邓兆琨,邓一杰. 基于混合分析的二进制程序控制流图构建方法[J]. 浙江大学学报(工学版), 2019, 53(5): 829-836.
[14] 袁友伟, 余佳, 郑宏升, 王娇娇. 基于新颖性排名和多服务质量的云工作流调度算法[J]. 浙江大学学报(工学版), 2017, 51(6): 1190-1196.
[15] 王海艳, 程严. 基于离散系数的双向服务选择方法[J]. 浙江大学学报(工学版), 2017, 51(6): 1197-1204.