Development of a Speech Corpus for Speaker Verification Research in Multilingual Environment
Utpal Bhattacharjee1, Kshirod Sarmah2

1Utpal Bhattacharjee, Department of Computer Science and Engineering, Rajiv Gandhi University, Rono Hills, Doimukh, Arunachal Pradesh, India.
2Kshirod Sarmah, Department of Computer Science and Engineering, Rajiv Gandhi University, Rono Hills, Doimukh, Arunachal Pradesh, India.
Manuscript received on January 01, 2013. | Revised Manuscript received on January 02, 2013. | Manuscript published on January 05, 2013. | PP: 443-446 | Volume-2, Issue-6, January 2013. | Retrieval Number: F1244112612/2013©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license.

Abstract: Automatic Speaker Verification (ASV) refers to the task of verifying the claimed identity of a speaker based on speech data. The decision made by a speaker verification system is binary: it returns either “Yes” or “No” depending on the credibility of the claim, as determined by a scoring technique. The output of an automatic speaker verification system is highly dependent on the database used for training and testing the system. The results obtained by a speaker verification system are meaningless if the recording specifications and environment for the training and testing data are not known. This paper describes the methodology and experimental setup used for the development of a speech corpus for the evaluation of text-independent speaker verification systems in a multilingual environment. Four major languages of Arunachal Pradesh (a North-Eastern frontier state of India, bordering China), namely Nyishi, Adi, Galo and Apatani, along with English and Hindi, have been considered for the development of the speech corpus. Each speaker has been recorded in three languages: English, Hindi and a local language, which must be the mother tongue of the informant. A basic characteristic of this corpus is the presence of both native and non-native speakers. English and Hindi have been considered as non-native languages for the speakers. Though the corpus has been developed primarily for speaker and language recognition research, it can also be used for various other studies, including the influence of non-nativeness on speaker and language recognition, and accent recognition.
Keywords: Speaker Verification, Speech corpus, Multilingual, Non-native.
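
The binary “Yes”/“No” decision described in the abstract can be illustrated with a minimal sketch. It assumes the scoring technique yields a single similarity score where higher values favour the claimed identity, and that the decision is made by comparing that score against a fixed threshold. The function name `verify`, the threshold value and the example scores are hypothetical and are not taken from the paper.

```python
def verify(score: float, threshold: float) -> str:
    """Accept or reject an identity claim from a similarity score.

    Assumption: a higher score means the test utterance is more likely
    to come from the claimed speaker. The paper does not specify the
    scoring technique or the threshold.
    """
    return "Yes" if score >= threshold else "No"


# Hypothetical trials: (claimed speaker, language of the test utterance, score)
trials = [
    ("spk01", "Nyishi", 0.82),
    ("spk01", "English", 0.41),
]

THRESHOLD = 0.5  # illustrative value only

for speaker, language, score in trials:
    print(speaker, language, verify(score, THRESHOLD))
```

In practice the threshold would be tuned on held-out trials, which is one reason the recording conditions of the training and testing data need to be documented.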