Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (9): 1843-1855    DOI: 10.3785/j.issn.1008-973X.2023.09.016
    
Clustering model of user community subgroup’s demand based on complex networks
Shu-tao ZHANG1, Zhi-qiang YANG1, Shi-jie WANG2, Shi-feng LIU2, Fan ZHANG1, Ai-min ZHOU1,*
1. School of Design Art, Lanzhou University of Technology, Lanzhou 730050, China
2. School of Mechanical and Electrical Engineering, Lanzhou University of Technology, Lanzhou 730050, China

Abstract  

A new clustering model was proposed to address two problems in user demand descriptions: missing association information and unclear trends in demand change. Web crawler technology was used to collect online user comments, and a heterogeneous part-of-speech directed link network was constructed from the Jieba word segmentation results. Vocabulary importance was calculated with the PageRank algorithm, and the ranked results were used to filter the vocabulary into a feature vocabulary set. The overlapping community detection algorithm was improved to enhance the similarity calculation of subgroup edge attributes, and an undirected weighted network of comment feature vocabulary was constructed. Hierarchical clustering of the network attributes was carried out by calculating the Jaccard distance in order to determine users' purchase decision information. The network links were then updated with the similarity results of multi-path nodes to forecast user demand. In a case study of a kettle, the improved modularity was 0.69 and the link prediction accuracy was 86%. The results show that the proposed clustering model clarifies the latent association information in the clustering results, and that the link prediction results are consistent with the objective characteristics of the group effect. The clustering results can help designers clarify the needs of user groups and carry out targeted design.



Key words: user decision-making, complex network, community discovery, link prediction
Received: 01 November 2022      Published: 16 October 2023
CLC:  TB 472  
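Fund: Supported by the National Natural Science Foundation of China (51705226, 52165033); the Gansu Provincial Youth Doctoral Fund (2022QB-047); the Gansu Provincial Higher Education Innovation Fund (2021A-020)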
Corresponding Authors: Ai-min ZHOU     E-mail: zhangsht@lut.edu.cn;51289547@qq.com
Cite this article:

Shu-tao ZHANG,Zhi-qiang YANG,Shi-jie WANG,Shi-feng LIU,Fan ZHANG,Ai-min ZHOU. Clustering model of user community subgroup’s demand based on complex networks. Journal of ZheJiang University (Engineering Science), 2023, 57(9): 1843-1855.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.09.016     OR     https://www.zjujournals.com/eng/Y2023/V57/I9/1843


Fig.1 Research process of user community subgroup demand clustering model based on complex networks
Fig.2 Web crawler technology flow
Fig.3 Schematic diagram of directed link network
Fig.4 Generation process of undirected weighted network
Fig.5 Clustering and module division process
Fig.6 Edge similarity calculation of directed network
Fig.7 Product renderings (partial)
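| Comment text | Purchase date | m |
|---|---|---|
| Really easy to use; very satisfied with the product; especially practical. | 2022-01-10 | 1 |
| It is compact, boils water quickly, makes almost no noise, and comes with many built-in functions. | 2022-03-01 | 1 |
| ⋮ | ⋮ | ⋮ |
| Looks great; the white color is easy on the eye and suits home use; planning to buy another one for the office. | 2022-01-09 | 121 |
| Keeps water warm well; the capacity is a little small, just enough for one person for a morning. | 2022-05-02 | 121 |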
Tab.1 User comment text data crawling results (partial)
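| Comment text | Word segmentation result | m |
|---|---|---|
| Really easy to use; very satisfied with the product; especially practical. | easy to use, satisfied, practical | 1 |
| It is compact, boils water quickly, makes almost no noise, and comes with many built-in functions. | health kettle, compact, boil water, speed, function | 1 |
| ⋮ | ⋮ | ⋮ |
| Looks great; the white color is easy on the eye and suits home use; planning to buy another one for the office | appearance, white, home, office | 121 |
| Keeps water warm well; the capacity is a little small, just enough for one person for a morning. | heat preservation, capacity, small | 121 |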
Tab.2 User comment text word segmentation results (partial)
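As a rough illustration of how segmentation results could feed a directed link network, the sketch below links each token to the token that follows it in the same comment. This adjacency rule, the function name `build_directed_links`, and the English glosses standing in for the Tab.2 tokens are assumptions made for illustration; the page does not state how the authors define the links.

```python
from collections import Counter


def build_directed_links(segmented_comments):
    """Count directed links between consecutive tokens of each comment.

    Assumption: a link src -> dst exists when dst directly follows src
    in the segmented comment. The paper may use a different rule.
    """
    links = Counter()
    for tokens in segmented_comments:
        for src, dst in zip(tokens, tokens[1:]):
            if src != dst:  # ignore self-loops from repeated tokens
                links[(src, dst)] += 1
    return links


# English glosses of the segmentation results listed in Tab.2.
comments = [
    ["easy to use", "satisfied", "practical"],
    ["health kettle", "compact", "boil water", "speed", "function"],
    ["heat preservation", "capacity", "small"],
]
links = build_directed_links(comments)
for (src, dst), count in sorted(links.items()):
    print(f"{src} -> {dst}: {count}")
```

The counter keeps the link direction, so the same structure can later be symmetrized if an undirected network is needed.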
Fig.8 Four types of part-of-speech link networks
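| Part of speech | Vocabulary count | Feature vocabulary |
|---|---|---|
| Verb | 573 | keep warm, purchase, control, clean, ···, operate, optimize, seal |
| Adjective or ordinal numeral (JJ) | 206 | only, automatic, practical, main, ···, large, same |
| Adjective (Adj) | 306 | convenient, beautiful, minimalist, clean, ···, exquisite, transparent, handy |
| Noun | 472 | quality, function, appearance, speed, ···, price, fashion, constant temperature |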
Tab.3 Filtered feature vocabulary (partial)
| V | v1 | v2 | v3 | v4 | v5 | v6 | ··· | v146 |
|---|---|---|---|---|---|---|---|---|
| v1 | 0 | 1 | 1 | 1 | 1 | 1 | ··· | 0 |
| v2 | 1 | 0 | 1 | 1 | 1 | 1 | ··· | 0 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| v146 | 0 | 0 | 0 | 0 | 0 | 0 | ··· | 0 |
Tab.4 Adjacency matrix of feature vocabulary undirected network (partial)
| V | R | w(vi) |
|---|---|---|
| v1 | 0.012 261 | 0.006 886 |
| v2 | 0.018 427 | 0.006 929 |
| v3 | 0.012 443 | 0.006 887 |
| v4 | 0.018 449 | 0.006 929 |
| ⋮ | ⋮ | ⋮ |
| v146 | 0.004 226 | 0.006 831 |
Tab.5 Feature vocabulary PageRank values and standardization (partial)
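For readers unfamiliar with how PageRank values such as R are obtained, the following is a minimal power-iteration sketch of the standard algorithm. The damping factor 0.85, the handling of dangling nodes, and the three-word toy graph are assumptions; the standardization that produces w(vi) in Tab.5 is not described on this page and is not reproduced here.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration.

    out_links maps each node to a list of nodes it points to.
    Rank held by dangling nodes (no out-links) is spread uniformly.
    """
    nodes = sorted(set(out_links) | {v for ts in out_links.values() for v in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src, targets in out_links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: "quality" is pointed to by two other words.
toy_links = {
    "quality": ["price"],
    "price": ["quality"],
    "appearance": ["quality"],
}
ranks = pagerank(toy_links)
for word, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {value:.4f}")
```

Words with higher rank would be kept when filtering a feature vocabulary set; the cut-off itself is a separate design choice.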
| V | W(vi) |
|---|---|
| v1 | 2.301 981 |
| v2 | 3.901 660 |
| v3 | 1.609 552 |
| v4 | 4.324 156 |
| ⋮ | ⋮ |
| v146 | 0.288 848 |

Tab.6 Updated node weights (partial)
| E | w(eij) |
|---|---|
| v1-v2 | 0.240 561 |
| v1-v3 | 0.061 063 |
| v1-v4 | 0.236 022 |
| v1-v5 | 0.072 965 |
| v1-v6 | 0.138 447 |
| v1-v7 | 0.043 440 |
| ⋮ | ⋮ |
| v146-v138 | 0.046 154 |
Tab.7 Weight of edges between nodes (partial)
Fig.9 Undirected weighted network of comment feature vocabulary
Fig.10 Undirected weighted network clustering results of feature vocabulary
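| No. | User purchase behavior decision dimension | Feature vocabulary |
|---|---|---|
| 01 | Product texture (touch) | packaging/transparency/hand feel/glass/texture/···/household/clarity |
| 02 | Form design (vision) | material/appearance/style/beautiful/elegant/high-end/···/fashionable/look |
| 03 | Service (cost performance) | quality (质量)/logistics/price/brand/quality (品质)/···/cheap/cost performance |
| 04 | Practicality | capacity/effect/convenient/difficulty/cleaning/feature/···/quick/hot water |
| 05 | Controllability | mode/menu/operation/simple/temperature/···/function/performance |
| 06 | Safety | safe/odor/taste/plastic/sealing/materials used/···/flaw/fine |
| 07 | Experience | tasteful/concise/minimalist/attractive/form/exquisite/···/compact/pattern |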
Tab.8 Decision dimension of user purchase behavior
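The grouping of feature words into decision dimensions relies on Jaccard-distance hierarchical clustering. The sketch below shows the general idea on a hypothetical six-word network: it computes the Jaccard distance between closed node neighborhoods and merges clusters by average linkage until a distance threshold is exceeded. The use of node neighborhoods (rather than the edge attributes mentioned in the abstract), the linkage rule, the threshold 0.5, and the toy edges are all assumptions.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage_clusters(neighborhoods, threshold):
    """Agglomerative clustering: merge the closest pair of clusters
    until the smallest average Jaccard distance exceeds threshold."""
    clusters = [[v] for v in sorted(neighborhoods)]

    def linkage(c1, c2):
        total = sum(
            jaccard_distance(neighborhoods[u], neighborhoods[v])
            for u in c1 for v in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical network: two triangles joined by one bridge edge.
edges = [
    ("glass", "texture"), ("glass", "hand feel"), ("texture", "hand feel"),
    ("price", "logistics"), ("price", "brand"), ("logistics", "brand"),
    ("texture", "price"),
]
neighborhoods = {}
for u, v in edges:
    neighborhoods.setdefault(u, {u}).add(v)  # closed neighborhood includes the node itself
    neighborhoods.setdefault(v, {v}).add(u)

clusters = average_linkage_clusters(neighborhoods, threshold=0.5)
print(clusters)
```

Lowering the threshold yields more, smaller groups; raising it merges the two triangles into one.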
Fig.11 Cross-summarization of sample and decision dimensions
| Module No. (ψij) | Module 1 | Module 2 | Module 3 | Module 4 | Module 5 | Module 6 | Module 7 |
|---|---|---|---|---|---|---|---|
| Module 1 | | 0.016 668 619 | 0.086 354 438 | 0.069 068 441 | 0.016 313 308 | 0.087 846 043 | 0.006 154 762 |
| Module 2 | 0.016 668 619 | | 0.013 357 219 | 0.004 687 216 | 0.006 308 408 | 0.016 572 784 | 0.022 698 338 |
| Module 3 | 0.086 354 438 | 0.013 357 219 | | 0.254 936 1 | 0.047 531 501 | 0.039 625 591 | 0.013 705 564 |
| Module 4 | 0.069 068 441 | 0.004 687 216 | 0.254 936 1 | | 0.031 339 468 | 0.101 056 076 | 0.038 998 219 |
| Module 5 | 0.016 313 308 | 0.006 308 408 | 0.047 531 501 | 0.031 339 468 | | 0.019 495 627 | 0.003 386 003 |
| Module 6 | 0.087 846 043 | 0.016 572 784 | 0.039 625 591 | 0.101 056 076 | 0.019 495 627 | | 0.046 899 991 |
| Module 7 | 0.006 154 762 | 0.022 698 338 | 0.013 705 564 | 0.038 998 219 | 0.003 386 003 | 0.046 899 991 | |
Tab.9 Interconnect edge weights between modules
Fig.12 Screenshot of network modularity calculation
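As background for the modularity figure, the sketch below computes the standard Newman modularity of a weighted undirected network, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the total edge weight, L_c the weight inside community c, and d_c the summed node strength of c. This is the textbook definition, not the improved modularity reported in the abstract (0.69), whose formula is not given on this page. The function name and the toy network are hypothetical.

```python
def weighted_modularity(edges, community):
    """Newman modularity for an undirected weighted graph without self-loops.

    edges: {(u, v): weight}, each undirected pair listed once.
    community: {node: community label}.
    """
    total = sum(edges.values())  # m, total edge weight
    strength = {}                # weighted degree of each node
    internal = {}                # edge weight inside each community
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
        if community[u] == community[v]:
            label = community[u]
            internal[label] = internal.get(label, 0.0) + w
    strength_sum = {}            # d_c, summed strength per community
    for node, s in strength.items():
        label = community[node]
        strength_sum[label] = strength_sum.get(label, 0.0) + s
    return sum(
        internal.get(c, 0.0) / total - (strength_sum[c] / (2.0 * total)) ** 2
        for c in strength_sum
    )


# Hypothetical network: two unit-weight triangles joined by one bridge.
toy_edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 1.0,
}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = weighted_modularity(toy_edges, two_groups)
q_single = weighted_modularity(toy_edges, one_group)
print(q_split, q_single)
```

Placing every node in a single community gives Q = 0, which is why modularity is only informative when comparing alternative partitions.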
| E* | w(eij) |
|---|---|
| v1-v24 | 0.05 |
| v1-v100 | 0.43 |
| v1-v107 | 0.24 |
| v2-v9 | 0.07 |
| v2-v72 | 0.09 |
| v2-v102 | 0.08 |
| ⋮ | ⋮ |
| v130-v134 | 0.03 |

Tab.10 New edge weights between nodes (partial)
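To give a sense of how a similarity score over multiple paths can propose new edges, the sketch below scores an unconnected node pair by summing the products of edge weights along all two-step and three-step walks, with the three-step term attenuated by a factor eps. This is a generic weighted local-path index used as a stand-in; the exact multi-path similarity used by the authors is not given on this page, and the function name, eps = 0.1, and the toy weights are assumptions.

```python
def path_similarity(weights, x, y, eps=0.1):
    """Weighted local-path score: 2-step walks plus eps times 3-step walks.

    weights: {(u, v): weight} for an undirected graph, each pair once.
    """
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    neighbors_x = adj.get(x, {})
    # Walks x - z - y
    two_step = sum(w_xz * adj[z].get(y, 0.0) for z, w_xz in neighbors_x.items())
    # Walks x - a - b - y
    three_step = sum(
        w_xa * w_ab * adj[b].get(y, 0.0)
        for a, w_xa in neighbors_x.items()
        for b, w_ab in adj[a].items()
    )
    return two_step + eps * three_step


# Hypothetical weighted network; "a" and "c" are not directly connected.
toy_weights = {
    ("a", "b"): 0.5, ("b", "c"): 0.4,
    ("a", "d"): 0.2, ("d", "e"): 0.5, ("e", "c"): 1.0,
}
score_ac = path_similarity(toy_weights, "a", "c")
print(round(score_ac, 4))
```

Pairs with the highest scores would be the candidates for new links; how the score is mapped to a new edge weight is left open here.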
Fig.13 Undirected weighted prediction network clustering results of feature vocabulary
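| No. | User purchase behavior decision dimension | Feature vocabulary |
|---|---|---|
| 01 | Product texture (touch) | hand feel/transparency/packaging/quality/buy/···/seal/clean |
| 02 | Practicality | size/capacity/convenient/boil water/purchase/···/practical/sound |
| 03 | Derived functions | heat preservation/effect/time/temperature/mobile phone/···/Bluetooth/mobile phone |
| 04 | Form design (vision) | appearance/good-looking/workmanship/material/function/···/exquisite/elegant |
| 05 | Experience | portable/feature/design/installation/filter/···/plastic/handle |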
Tab.11 Prediction results of user purchase behavior decision dimension
Fig.14 Area under the curve (AUC) calculation result for link prediction
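In link prediction, AUC is commonly computed by comparing the scores of held-out (test) edges with the scores of nonexistent edges: AUC = (n' + 0.5 n'') / n, where n' counts comparisons won by the test edge, n'' counts ties, and n is the number of comparisons. The sketch below evaluates this exhaustively over all pairs. The scores are hypothetical and are not taken from the paper; whether the authors sample comparisons or enumerate them is not stated on this page.

```python
def link_prediction_auc(test_scores, nonexistent_scores):
    """AUC = (wins + 0.5 * ties) / comparisons over all score pairs."""
    wins = 0
    ties = 0
    for s_test in test_scores:
        for s_non in nonexistent_scores:
            if s_test > s_non:
                wins += 1
            elif s_test == s_non:
                ties += 1
    comparisons = len(test_scores) * len(nonexistent_scores)
    return (wins + 0.5 * ties) / comparisons


# Hypothetical similarity scores.
held_out = [0.9, 0.6, 0.3]      # edges removed from the network for testing
nonexistent = [0.5, 0.3, 0.1]   # node pairs that are truly unconnected
auc = link_prediction_auc(held_out, nonexistent)
print(round(auc, 4))
```

A value of 0.5 corresponds to random scoring, so values well above 0.5 indicate that the similarity measure separates real links from nonexistent ones.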
Fig.15 Latent Dirichlet allocation topic model verifies decision dimension clustering results
Year Time span NT NTP
2022 Jan.–Mar. 7 4
Jan.–Jun. 5 7
Jan.–Sep. 6 5
Jan.–Dec.
2021 Jan.–Mar. 7 4
Jan.–Jun. 5 3
Jan.–Sep. 6 3
Jan.–Dec. 4 4
2020 Jan.–Mar. 3 6
Jan.–Jun. 7 5
Jan.–Sep. 6 8
Jan.–Dec. 7 6
2019 Jan.–Mar. 4 6
Jan.–Jun. 6 8
Jan.–Sep. 5 6
Jan.–Dec. 5 5
2018 Jan.–Mar. 5
Jan.–Jun. 6
Jan.–Sep. 6
Jan.–Dec. 4
Tab.12 User demand forecast result validity test