Vis Inf  2020, Vol. 4 Issue (2): 99-108    DOI: 10.1016/j.visinf.2020.04.003
Article
Visual exploration of latent space for traditional Chinese music
Jingyi Shen, Runqi Wang, Han-Wei Shen
The Ohio State University, 2015 Neil Ave, Columbus, OH 43210, United States
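Summary:

Generating compact and effective numerical representations of data is a fundamental step in many machine learning tasks. Handcrafted features were traditionally used, but as deep learning has begun to show its potential, using deep learning models to extract compact representations has become a new trend. Among these approaches, adopting vectors from a model's latent space is the most popular. Several studies have focused on visual analysis of latent spaces in natural language processing (NLP) and computer vision.

However, relatively little research has addressed music information retrieval (MIR), especially in combination with visualization. To fill this gap, Han-Wei Shen's team at The Ohio State University in the United States proposes a visual analysis system that uses autoencoders to support the analysis and exploration of traditional Chinese music. Because suitable traditional Chinese music data were lacking, they constructed a labeled dataset from a collection of pre-recorded audio and then converted the recordings into spectrograms.

The system takes as input music features learned by two deep learning models (a fully connected autoencoder and a long short-term memory (LSTM) autoencoder). Through interactive selection, similarity calculation, clustering, and listening, the work shows that the latent representations of the encoded data enable the system to identify essential musical elements, laying the foundation for further analysis and retrieval of Chinese music.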

Abstract: Generating compact and effective numerical representations of data is a fundamental step for many machine learning tasks. Traditionally, handcrafted features were used, but as deep learning has begun to show its potential, using deep learning models to extract compact representations has become a new trend. Among these approaches, adopting vectors from the model's latent space is the most popular. Several studies have focused on visual analysis of latent space in NLP and computer vision. However, relatively little work has been done for music information retrieval (MIR), especially work incorporating visualization. To bridge this gap, we propose a visual analysis system that uses Autoencoders to facilitate analysis and exploration of traditional Chinese music. Due to the lack of suitable traditional Chinese music data, we construct a labeled dataset from a collection of pre-recorded audio and then convert the recordings into spectrograms. Our system takes as input music features learned by two deep learning models (a fully-connected Autoencoder and a Long Short-Term Memory (LSTM) Autoencoder). Through interactive selection, similarity calculation, clustering and listening, we show that the latent representations of the encoded data allow our system to identify essential music elements, which lays the foundation for further analysis and retrieval of Chinese music in the future.
Key words: Music information retrieval    Latent space analysis    Long Short-Term Memory    Autoencoder    Traditional Chinese music
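
As a rough illustration of two steps named in the abstract (converting audio into spectrograms and computing similarity between feature vectors), the following is a minimal sketch, not the authors' implementation. The function names, the frame and hop sizes, and the synthetic test tones are all assumptions made for illustration. In particular, a time-averaged spectrum stands in for the latent vector that the paper obtains from an Autoencoder, and cosine similarity is assumed as the similarity measure.

```python
import numpy as np


def spectrogram(signal, frame_len=1024, hop=512):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One row per frame, one column per non-negative frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))


def feature_vector(signal):
    """Hypothetical stand-in for a latent vector: the time-averaged spectrum.

    The paper uses Autoencoder latent vectors instead; this only keeps the
    example self-contained.
    """
    return spectrogram(signal).mean(axis=0)


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic one-second tones (assumed test data, not the paper's dataset).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone_a = np.sin(2 * np.pi * 440 * t)        # 440 Hz
tone_b = np.sin(2 * np.pi * 440 * t + 0.5)  # same pitch, shifted phase
tone_c = np.sin(2 * np.pi * 880 * t)        # one octave higher

same_pitch = cosine_similarity(feature_vector(tone_a), feature_vector(tone_b))
other_pitch = cosine_similarity(feature_vector(tone_a), feature_vector(tone_c))
print(f"same pitch: {same_pitch:.3f}, different pitch: {other_pitch:.3f}")
```

Under these assumptions, the two 440 Hz tones should score much closer to each other than either does to the 880 Hz tone. That ordering is the kind of relationship a similarity view over learned latent vectors is meant to expose.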
Published: 2020-06-03
Corresponding author: Jingyi Shen     E-mail: shen.1250@osu.edu

Cite this article:

Jingyi Shen, Runqi Wang, Han-Wei Shen. Visual exploration of latent space for traditional Chinese music. Vis Inf, 2020, 4(2): 99-108.

Link to this article:

http://www.zjujournals.com/vi/CN/10.1016/j.visinf.2020.04.003        http://www.zjujournals.com/vi/CN/Y2020/V4/I2/99
