Visual exploration of latent space for traditional Chinese music

doi:10.1016/j.visinf.2020.04.003

Vis Inf, 2020, Vol. 4, Issue 2: 99-108

Article

Jingyi Shen, Runqi Wang, Han-Wei Shen

The Ohio State University, 2015 Neil Ave, Columbus, OH 43210, United States

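Summary:

Generating compact and effective numerical representations of data is a fundamental step in many machine learning tasks. Handcrafted features were traditionally used, but as deep learning began to show its potential, using deep learning models to extract compact representations has become a new trend. Among these approaches, adopting vectors from the model's latent space is the most popular. Several studies have focused on visual analysis of latent spaces in natural language processing (NLP) and computer vision.

However, relatively little work has been done on music information retrieval (MIR), especially work that incorporates visualization. To fill this gap, Han-Wei Shen's team at The Ohio State University proposed a visual analysis system that uses autoencoders to support the analysis and exploration of traditional Chinese music. Because suitable traditional Chinese music data was lacking, they constructed a labeled dataset from a collection of pre-recorded audio and then converted the recordings into spectrograms.

The system takes as input music features learned by two deep learning models (a fully connected autoencoder and a long short-term memory (LSTM) autoencoder). Through interactive selection, similarity calculation, clustering, and listening, the authors show that the latent representations of the encoded data enable the system to identify essential musical elements, laying the foundation for further analysis and retrieval of Chinese music.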

Abstract: Generating compact and effective numerical representations of data is a fundamental step for many machine learning tasks. Traditionally, handcrafted features are used, but as deep learning starts to show its potential, using deep learning models to extract compact representations has become a new trend. Among these approaches, adopting vectors from the model’s latent space is the most popular. Several studies have focused on visual analysis of latent space in NLP and computer vision. However, relatively little work has been done for music information retrieval (MIR), especially work incorporating visualization. To bridge this gap, we propose a visual analysis system that uses Autoencoders to facilitate analysis and exploration of traditional Chinese music. Due to the lack of suitable traditional Chinese music data, we construct a labeled dataset from a collection of pre-recorded audio recordings and then convert them into spectrograms. Our system takes music features learned from two deep learning models (a fully-connected Autoencoder and a Long Short-Term Memory (LSTM) Autoencoder) as input. Through interactive selection, similarity calculation, clustering and listening, we show that the latent representations of the encoded data allow our system to identify essential music elements, which lays the foundation for further analysis and retrieval of Chinese music in the future.

Key words: Music information retrieval; Latent space analysis; Long Short-Term Memory; Autoencoder; Traditional Chinese music
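As a rough illustration of two steps the abstract names (converting audio into spectrograms, then computing similarity), the sketch below builds a magnitude spectrogram with a naive DFT and compares frames by cosine similarity. It is not the authors' pipeline: the paper compares latent vectors produced by autoencoders, whereas this sketch compares raw spectrogram frames as a stand-in. The synthetic tones, the function names (`spectrogram`, `cosine_similarity`, `tone`), the frame size, the hop size, and the sample rate are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram via a naive DFT over Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    bins = frame_size // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        frames.append([
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(bins)
        ])
    return frames


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tone(freq_hz, n_samples=256, sample_rate=8000):
    """Synthetic sine tone (hypothetical stand-in for a recorded clip)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


low = spectrogram(tone(500))    # assumed 500 Hz test tone
high = spectrogram(tone(2000))  # assumed 2000 Hz test tone

# Frames of the same pitch should be far more similar than frames of different pitch.
same_pitch = cosine_similarity(low[0], low[1])
different_pitch = cosine_similarity(low[0], high[0])
print(same_pitch > different_pitch)
```

In a setup closer to the one described, the vectors passed to `cosine_similarity` would be encoder outputs rather than spectrogram frames; the comparison step itself would stay the same.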

Published: 2020-06-03

Corresponding author: Jingyi Shen. E-mail: shen.1250@osu.edu

Cite this article:

Jingyi Shen, Runqi Wang, Han-Wei Shen. Visual exploration of latent space for traditional Chinese music. Vis Inf, 2020, 4(2): 99-108.

Link to this article:

http://www.zjujournals.com/vi/CN/10.1016/j.visinf.2020.04.003 or http://www.zjujournals.com/vi/CN/Y2020/V4/I2/99
