Evaluating single-channel speech separation performance in transform-domain

doi:10.1631/jzus.C0910087

Front. Inform. Technol. Electron. Eng.

2010, Vol. 11, Issue 3: 160-174

Pejman MOWLAEE, Abolghasem SAYADIYAN, Hamid SHEIKHZADEH

Department of Electronic Engineering, Amirkabir University of Technology, Tehran 15875-4413, Iran

Abstract: Single-channel separation (SCS) is a challenging scenario in which the objective is to segregate speaker signals from their mixture with high accuracy. In this research, a novel framework called the subband perceptually weighted transformation (SPWT) is developed to provide a perceptually relevant feature that replaces the commonly used magnitude of the short-time Fourier transform (STFT). The main objectives of the proposed SPWT are to lower the spectral distortion (SD) and to improve the ideal separation quality. The performance of the SPWT is compared with that obtained using the mixmax and Wiener filter methods. A comprehensive statistical analysis is conducted to compare the quantization performance and ideal separation quality of the SPWT with those of the log-spectrum and magnitude-spectrum features. Our evaluations show that the SPWT provides lower SD values and a more compact SD distribution, leading to more acceptable subjective separation quality as evaluated by the mean opinion score.

Key words: Single-channel separation (SCS); Magnitude spectrum; Vector quantization (VQ); Subband perceptually weighted transformation (SPWT); Spectral distortion (SD)
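The abstract names spectral distortion (SD) as the objective measure and the mixmax and Wiener filter methods as baselines, without giving formulas. The sketch below is illustrative only and makes several assumptions: SD is taken to be the common per-frame log-spectral distortion in dB, the Wiener baseline is the standard power-ratio gain, and a binary dominance mask stands in for the max-based (mixmax-style) assignment. It is not the full mixmax estimator and not the SPWT itself. All function names are hypothetical.

```python
import numpy as np

def spectral_distortion_db(ref_mag, est_mag, eps=1e-12):
    """Per-frame log-spectral distortion in dB (assumed definition).

    ref_mag, est_mag: magnitude spectra of shape (frames, bins).
    Returns one SD value per frame.
    """
    ref_db = 20.0 * np.log10(np.maximum(ref_mag, eps))
    est_db = 20.0 * np.log10(np.maximum(est_mag, eps))
    # Root-mean-square difference of the dB spectra over frequency bins.
    return np.sqrt(np.mean((ref_db - est_db) ** 2, axis=-1))

def wiener_gain(mag1, mag2, eps=1e-12):
    """Ideal Wiener-style gain for speaker 1 from known source magnitudes."""
    p1, p2 = mag1 ** 2, mag2 ** 2
    return p1 / (p1 + p2 + eps)

def binary_mask(mag1, mag2):
    """Binary dominance mask: 1 where speaker 1 is at least as strong.

    A simplification standing in for a max-based assignment; an assumption,
    not the mixmax estimator evaluated in the paper.
    """
    return (mag1 >= mag2).astype(float)
```

Applying either gain to the mixture magnitude and then computing `spectral_distortion_db` against the true source magnitude gives one SD value per frame, whose distribution can then be compared across features.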

Received: 2009-02-12; Published: 2010-03-01

CLC: TN912.3

Note: A preliminary version of this paper was presented at the 7th ACS/IEEE International Conference on Computer Systems and Applications, Rabat, Morocco, 2009.

Cite this article:

Pejman MOWLAEE, Abolghasem SAYADIYAN, Hamid SHEIKHZADEH. Evaluating single-channel speech separation performance in transform-domain. Front. Inform. Technol. Electron. Eng., 2010, 11(3): 160-174.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C0910087 or http://www.zjujournals.com/xueshu/fitee/CN/Y2010/V11/I3/160
