An Efficient Algorithm To Collect Minimal Speech Corpora

Saad Irtza; Sarmad Hussain

Published: 2016-06-22

Abstract

Phonetically rich and balanced corpora are popular for training speech recognition systems, but they are costly to develop. Various greedy algorithms have been developed to collect such corpora. Recording and transcribing these speech corpora requires significant effort, so there is motivation to reduce their size further. This paper presents an algorithm for doing so. Earlier work shows that different phonemes require different amounts of training data. The current work builds on these findings to reduce the amount of phonetically rich training data. Experiments show that the algorithm reduces the size of an Urdu speech corpus by 56.49% without degrading accuracy.

Issue

2015: Volume 17 JULY 2015

Section

Electrical Engineering and Computer Science

Pakistan Journal of Engineering and Applied Sciences
