IIITH-CSTD Corpus: Crowdsourced Strategies for the Collection of a Large-scale Telugu Speech Corpus
Mirishkar Sai Ganesh,VISHNU VIDYADHARA RAJU V,Meher Dinesh Naroju,Sudhamay Maity,Veera Prakash Yalla,Anil Kumar Vuppala
ACM Transactions on Asian and Low-Resource Language Information Processing, TALLIP, 2023
@article{bib_IIIT_2023, AUTHOR = {Mirishkar Sai Ganesh and VISHNU VIDYADHARA RAJU V and Meher Dinesh Naroju and Sudhamay Maity and Veera Prakash Yalla and Anil Kumar Vuppala}, TITLE = {IIITH-CSTD Corpus: Crowdsourced Strategies for the Collection of a Large-scale Telugu Speech Corpus}, JOURNAL = {ACM Transactions on Asian and Low-Resource Language Information Processing}, YEAR = {2023}}
Due to the lack of a large annotated speech corpus, many low-resource Indian languages struggle to utilize recent advancements in deep neural network architectures for Automatic Speech Recognition (ASR) tasks. Collecting large-scale databases is an expensive and time-consuming task. Current approaches lack extensive traditional expert-based data acquisition guidelines, as they are tedious and complex. In this work, we present the International Institute of Information Technology Hyderabad-Crowd Sourced Telugu Database (IIITH-CSTD), a Telugu corpus collected through crowdsourcing
Outcomes of Speech to Speech Translation for Broadcast Speeches and Crowd Source Based Speech Data Collection Pilot Projects
Anil Kumar Vuppala,Veera Prakash Yalla,Mirishkar Sai Ganesh,VISHNU VIDYADHARA RAJU V
International Conference on Big Data Analytics, BDA, 2021
@inproceedings{bib_Outc_2021, AUTHOR = {Anil Kumar Vuppala and Veera Prakash Yalla and Mirishkar Sai Ganesh and VISHNU VIDYADHARA RAJU V}, TITLE = {Outcomes of Speech to Speech Translation for Broadcast Speeches and Crowd Source Based Speech Data Collection Pilot Projects}, BOOKTITLE = {International Conference on Big Data Analytics}, YEAR = {2021}}
Speech-to-Speech Machine Translation (SSMT) applications and services use a three-step process. Speech recognition is the first step, producing transcriptions. This is followed by text-to-text language translation and, finally, text-to-speech synthesis. As data availability and computing power improved, these individual steps evolved. However, despite significant progress, the first stage always introduces errors arising from speech recognition, accent, etc. Once past the speech recognition stage, these errors become more prevalent and often degrade the later stages. This chapter presents a complete pipeline for transferring speaker intent in SSMT involving humans in the loop. First, the SSMT pipeline is discussed and analyzed for broadcast speeches and talks on a few sessions of Mann Ki Baat, where the source language is Hindi and the target languages are English and Telugu. To perform this task, industry-grade APIs from Google, Microsoft, CDAC, and IITM have been used for benchmarking. Next, the challenges faced while building the pipeline are discussed, and potential solutions are introduced. The chapter then introduces a framework developed to
CSTD-Telugu Corpus: Crowd-Sourced Approach for Large-Scale Speech data collection
Ganesh S Mirishkar,VISHNU VIDYADHARA RAJU V,Meher Dinesh Naroju,Sudhamay Maity,Veera Prakash Yalla,Anil Kumar Vuppala
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), APSIPA, 2021
@inproceedings{bib_CSTD_2021, AUTHOR = {Ganesh S Mirishkar and VISHNU VIDYADHARA RAJU V and Meher Dinesh Naroju and Sudhamay Maity and Veera Prakash Yalla and Anil Kumar Vuppala}, TITLE = {CSTD-Telugu Corpus: Crowd-Sourced Approach for Large-Scale Speech data collection}, BOOKTITLE = {Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)}, YEAR = {2021}}
Speech is a natural mode of communication. India is a densely populated country with a diverse population, and spoken language is the primary medium of interaction among its people. Many Indian languages are spoken around the globe. The unavailability of large volumes of transcribed and annotated speech data is often a hurdle to building reliable automatic speech recognition (ASR) systems for Indian languages. Crowdsourcing strategies are effective for collaboratively collecting speech data resources. This paper describes the experience of large-scale speech data collection for the Telugu language through mobile and web-based applications. Using this crowd-contributed speech, the performance of a baseline ASR system is reported for clean speech. ASR performance under pink and white noise is also compared across various deep neural network (DNN) based acoustic models. The paper also details the frameworks used and the challenges encountered during their implementation. The framework adopted for collecting the speech data is rapid and cost-saving, and it can be extended to all the other Indian languages. Index Terms: ASR, Crowd-sourced, TDNN, GMM, SGMM