Front. Inform. Technol. Electron. Eng.  2010, Vol. 11 Issue (3): 160-174    DOI: 10.1631/jzus.C0910087
    
Evaluating single-channel speech separation performance in transform-domain
Pejman MOWLAEE, Abolghasem SAYADIYAN, Hamid SHEIKHZADEH
Department of Electronic Engineering, Amirkabir University of Technology, Tehran 15875-4413, Iran
Abstract: Single-channel separation (SCS) is a challenging scenario in which the objective is to segregate speaker signals from their mixture with high accuracy. In this research, a novel framework called subband perceptually weighted transformation (SPWT) is developed to offer a perceptually relevant feature to replace the commonly used magnitude of the short-time Fourier transform (STFT). The main objectives of the proposed SPWT are to lower the spectral distortion (SD) and to improve the ideal separation quality. The performance of the SPWT is compared with that obtained using the mixmax and Wiener filter methods. A comprehensive statistical analysis is conducted to compare the SPWT quantization performance, as well as the ideal separation quality, with that of two other features: the log-spectrum and the magnitude spectrum. Our evaluations show that the SPWT provides lower SD values and a more compact distribution of SD, leading to more acceptable subjective separation quality as evaluated using the mean opinion score.
Key words: Single-channel separation (SCS); Magnitude spectrum; Vector quantization (VQ); Subband perceptually weighted transformation (SPWT); Spectral distortion (SD)
Received: 2009-02-12    Published: 2010-03-01
CLC:  TN912.3  
Note: A preliminary version of this paper was presented at the 7th ACS/IEEE International Conference on Computer Systems and Applications, Rabat, Morocco, 2009

Cite this article:

Pejman MOWLAEE, Abolghasem SAYADIYAN, Hamid SHEIKHZADEH. Evaluating single-channel speech separation performance in transform-domain. Front. Inform. Technol. Electron. Eng., 2010, 11(3): 160-174.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C0910087
http://www.zjujournals.com/xueshu/fitee/CN/Y2010/V11/I3/160
