Speaker adapted dynamic lexicons containing phonetic deviations of words

doi:10.1631/jzus.A0820761

Journal of Zhejiang University-SCIENCE A (Applied Physics & Engineering)

2009, Vol. 10, Issue 10: 1461-1475

Computer & Automation

Bahram VAZIRNEZHAD, Farshad ALMASGANJ, Seyed Mohammad AHADI, Ari CHANEN

Biomedical Engineering Department, Amirkabir University of Technology, Hafez Avenue, Tehran, Iran; Electrical Engineering Department, Amirkabir University of Technology, Hafez Avenue, Tehran, Iran; Language and Knowledge Management Research Lab, School of Information Technologies, University of Sydney, NSW, Australia

Abstract: Speaker variability is an important source of speech variation that makes continuous speech recognition difficult. Adapting automatic speech recognition (ASR) models to speaker variations is a well-known strategy for coping with this challenge. Almost all such techniques focus on adaptation within the acoustic models of ASR systems. Although variations in acoustic features constitute an important portion of inter-speaker variation, they do not cover variations at the phonetic level. Phonetic variations are known to form an important part of these variations and are influenced by both micro-segmental and suprasegmental factors. Inter-speaker phonetic variations are influenced by the structure and anatomy of a speaker’s articulatory system and by his or her speaking style, which is driven by many background characteristics such as accent, gender, age, and socioeconomic and educational class. The effect of inter-speaker variations in the feature space may cause explicit phone recognition errors. These errors can be compensated for later by giving lexicon entries appropriate pronunciation variants that account for likely phone misclassifications in addition to pronunciation variation. In this paper, we introduce speaker adaptive dynamic pronunciation models, which generate different lexicons for various speaker clusters and different ranges of speech rate. The models are hybrids of speaker adapted contextual rules and dynamic generalized decision trees, which take into account word phonological structure, rate of speech, unigram probabilities, and stress to generate pronunciation variants of words. Employing the set of speaker adapted dynamic lexicons in a Farsi (Persian) continuous speech recognition task results in word error rate reductions of as much as 10.1% in a speaker-dependent scenario and 7.4% in a speaker-independent scenario.
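To make the idea of a speaker-adapted, rate-dependent lexicon concrete, the following is a minimal sketch and not the authors' implementation. It assumes a lookup table of contextual phone rewrite rules keyed by speaker cluster and speech-rate range, standing in for the paper's hybrid of contextual rules and decision trees (unigram probabilities and stress are not modeled). The names `ContextRule`, `apply_rule`, `build_lexicon`, and `lexicon_for`, the cluster and rate labels, the toy word, and every probability are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextRule:
    """A hypothetical contextual rewrite rule: target -> replacement."""
    target: str            # phone to rewrite
    replacement: str       # substitute phone; "" means deletion
    left: Optional[str]    # required left neighbour, None = any
    right: Optional[str]   # required right neighbour, None = any
    prob: float            # assumed rule probability for one cluster/rate


def apply_rule(phones, rule):
    """Apply the rule at every matching position of the baseform.

    Returns the variant as a tuple, or None if the rule never matches.
    "#" marks the word boundary.
    """
    out, changed = [], False
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else "#"
        right = phones[i + 1] if i + 1 < len(phones) else "#"
        matches = (
            p == rule.target
            and (rule.left is None or rule.left == left)
            and (rule.right is None or rule.right == right)
        )
        if matches:
            changed = True
            if rule.replacement:
                out.append(rule.replacement)
        else:
            out.append(p)
    return tuple(out) if changed else None


def build_lexicon(baseforms, rules, min_prob=0.1):
    """Map each word to {variant: score}, pruning low-probability rules."""
    lexicon = {}
    for word, phones in baseforms.items():
        variants = {tuple(phones): 1.0}  # canonical form is always kept
        for rule in rules:
            if rule.prob < min_prob:
                continue
            variant = apply_rule(phones, rule)
            if variant is not None:
                variants[variant] = max(variants.get(variant, 0.0), rule.prob)
        lexicon[word] = variants
    return lexicon


# Illustrative data only: one toy word and one made-up deletion rule
# whose probability depends on the speech-rate range.
BASEFORMS = {"dast": ["d", "a", "s", "t"]}
RULES = {
    ("cluster_A", "fast"): [ContextRule("t", "", "s", None, 0.4)],
    ("cluster_A", "slow"): [ContextRule("t", "", "s", None, 0.05)],
}


def lexicon_for(cluster, rate):
    """Select the rule set for a cluster/rate pair and build its lexicon."""
    return build_lexicon(BASEFORMS, RULES.get((cluster, rate), []))
```

Under these assumptions, `lexicon_for("cluster_A", "fast")` keeps both the canonical form and the reduced variant, while the slow-rate lexicon keeps only the canonical form because the rule falls below the pruning threshold. This is the sense in which the lexicon is dynamic: the set of variants changes with the speaker cluster and speech rate.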

Key words: Pronunciation models, Continuous speech recognition, Lexicon adaptation

Received: 18 December 2008 Published: 14 August 2009

CLC:

TP391.42

Cite this article:

Bahram VAZIRNEZHAD, Farshad ALMASGANJ, Seyed Mohammad AHADI, Ari CHANEN. Speaker adapted dynamic lexicons containing phonetic deviations of words. Journal of Zhejiang University-SCIENCE A (Applied Physics & Engineering), 2009, 10(10): 1461-1475.

URL:

http://www.zjujournals.com/xueshu/zjus-a/10.1631/jzus.A0820761 OR http://www.zjujournals.com/xueshu/zjus-a/Y2009/V10/I10/1461
