Microblog topics summarization algorithm merging sentential semantic structure model

doi:10.3785/j.issn.1008-973X.2015.12.011

JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)

Computer Technology

LIN Meng, LUO Sen-lin, JIA Cong-fei, HAN Lei, YUAN Yu-jiao, PAN Li-min

1. School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China

Abstract

A microblog summarization framework based on a sentential semantic structure model is proposed to provide concise summaries that help users quickly grasp the essence of a topic. Sentential semantic features are extracted with the sentential semantic structure model. A latent Dirichlet allocation (LDA) topic model is used to compute pairwise sentence similarities from the sentential semantic structure and to build the similarity matrix. Sentences are then clustered into subtopics, which yields the sentential relationship features. The most informative sentences are extracted from each subtopic by combining the sentential semantic features and the relationship features. In the experiments, the ROUGE values exceed those of the comparison algorithms at compression ratios of 0.5%, 1.0% and 1.5%. At a compression ratio of 1.5%, ROUGE-1 reaches 51.30% and ROUGE-SU* reaches 25.27%. The results indicate that introducing the sentential semantic structure model deepens the semantic analysis of sentences and that the extracted semantic features strengthen the expression of sentence semantics. Using both sentential semantic features and relationship features enriches the feature representation, reduces information loss, increases the semantic relevance of similar data, and reduces the impact of noise. The proposed method also generalizes well and can be applied to a variety of topics.
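The pipeline described in the abstract (similarity matrix, subtopic clustering, per-subtopic sentence selection) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sentence already has a topic-distribution vector, such as an LDA model might produce, and a precomputed semantic feature score. The greedy threshold clustering, the mean-similarity relationship feature, and the `threshold` and `alpha` parameters are illustrative assumptions, since the abstract does not specify them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(vectors):
    """Pairwise sentence similarities as a square matrix."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

def greedy_clusters(sim, threshold=0.8):
    """Assumed stand-in for subtopic clustering: a sentence joins the
    first cluster whose seed sentence is similar enough, otherwise it
    starts a new cluster."""
    clusters = []
    for i in range(len(sim)):
        for cluster in clusters:
            if sim[i][cluster[0]] >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def summarize(vectors, semantic_scores, threshold=0.8, alpha=0.5):
    """Pick one sentence index per subtopic by a combined weight.
    alpha (hypothetical) balances semantic and relationship features."""
    sim = similarity_matrix(vectors)
    picked = []
    for cluster in greedy_clusters(sim, threshold):
        def weight(i):
            # Relationship feature (assumption): mean similarity to the
            # other sentences in the same subtopic.
            others = [sim[i][j] for j in cluster if j != i]
            relation = sum(others) / len(others) if others else 0.0
            return alpha * semantic_scores[i] + (1 - alpha) * relation
        picked.append(max(cluster, key=weight))
    return picked

# Toy data: four sentences over two topics, with made-up semantic scores.
topic_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
semantic_scores = [0.3, 0.7, 0.6, 0.4]
print(summarize(topic_vectors, semantic_scores))  # → [1, 2]
```

In this toy run the sentences fall into two subtopics, and the sentence with the higher combined weight is chosen from each.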

Published: 31 December 2015

CLC:

TP 391

About the author:

LIN Meng (born 1991), female, master's student, engaged in research on Chinese information processing. ORCID: 0000-0002-1970-5532. E-mail: lemon0919@bit.edu.cn

Cite this article:

LIN Meng, LUO Sen-lin, JIA Cong-fei, HAN Lei, YUAN Yu-jiao, PAN Li-min. Microblog topics summarization algorithm merging sentential semantic structure model. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2015, 49(12): 2316-2325.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2015.12.011 OR http://www.zjujournals.com/eng/Y2015/V49/I12/2316

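融合句义结构模型的微博话题摘要算法 (Microblog topic summarization algorithm incorporating a sentential semantic structure model)

To obtain the core content of a topic more quickly from massive microblog data, a microblog topic summarization method incorporating a sentential semantic structure model is proposed. The method uses the sentential semantic structure model to extract the semantic cases of each sentence and thereby obtain its semantic features. Based on the LDA topic model, it uses the sentential semantic structure to compute the semantic similarity between each pair of sentences, builds a similarity matrix, and partitions the sentences into subtopic classes to obtain the relationship features of the sentences. The semantic features and relationship features are then combined, and the most informative sentences within each subtopic are selected as the summary. At compression ratios of 0.5%, 1.0% and 1.5%, the ROUGE values are all clearly better than those of the comparison systems. At a compression ratio of 1.5%, ROUGE-1 reaches 51.30% and ROUGE-SU* reaches 25.27%. The experimental results show that the analysis method incorporating the sentential semantic structure model deepens the level of semantic analysis of sentences, and that the extracted sentential semantic features strengthen the expression of semantic information. The sentence weighting method, which considers both semantic features and relationship features, enriches the feature representation of sentences, reduces the loss of semantic information, strengthens the semantic relevance of data of the same class, and effectively reduces the impact of noise, thereby improving the relevance of the summary to the topic. In addition, the proposed method generalizes fairly well across different topics and has a wide range of application.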
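For readers unfamiliar with the metrics reported above, the sketch below shows a simplified ROUGE-1 recall (clipped unigram overlap divided by the number of reference unigrams) and a compression ratio. It is an illustration only. The official ROUGE package has options such as stemming and stop-word removal that are omitted here. The function names are hypothetical, and measuring the compression ratio as summary length over source length in the same unit is an assumption, since the abstract does not define it.

```python
from collections import Counter

def rouge_1_recall(candidate_tokens, reference_tokens):
    """Clipped unigram overlap divided by the reference unigram count."""
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Each reference token is matched at most as often as it appears
    # in the candidate summary.
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

def compression_ratio(summary_len, source_len):
    """Assumed definition: summary length over source length."""
    return summary_len / source_len

reference = "the cat sat on the mat".split()
candidate = "the cat lay on a mat".split()
score = rouge_1_recall(candidate, reference)
print(round(score, 4))  # → 0.6667

# Under the assumed definition, a 15-sentence summary of a
# 1000-sentence topic corresponds to a 1.5% compression ratio.
ratio = compression_ratio(15, 1000)
```

Here four of the six reference unigrams are matched ("the" once, "cat", "on", "mat"), which gives a recall of 4/6.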
