Toggling and Circular Partial Distortion Elimination Algorithms to Speedup Speaker Identification based on Vector Quantization

Muhammad Afzal, Mohammad A. Maud, Ali Hammad Akbar

Abstract


Vector quantization (VQ) competes effectively with contemporary speaker identification techniques. However, VQ-based real-time speaker identification systems suffer from latency, because distances must be computed between a large number of feature vectors and the code vectors of the speakers' codebooks to find the best match in the database. The identification time depends on the dimension and count of the extracted feature vectors as well as on the number of codebooks. Previous speedup techniques for VQ-based speaker identification reduce the test vector count through prequantization and prune out unlikely speakers. However, the reported speedup factors come with accuracy degradation. This paper proposes techniques to speed up the closest code vector search (CCS) based on the stationarity of speech. A proximity relationship is substantiated among the code vectors extracted through the LBG process of codebook generation. Based on the high correlation of proximate code vectors, the circular partial distortion elimination (CPDE) and toggling-CPDE algorithms are proposed to speed up CCS. Further speedup is proposed by pruning the test feature vector sequence for unlikely codebooks during the best-match speaker search. Our empirical results show that integrating the proposed techniques achieves average speedup factors of up to 5.8 for 630 registered speakers of the TIMIT 8 kHz corpus and up to 6.6 for 230 speakers of the NIST-1999 database.
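The abstract does not spell out CPDE, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It shows classic partial distortion elimination (stop accumulating a squared distance as soon as it reaches the best distortion found so far), combined with one assumed reading of "circular": the search starts at a given codebook index, for example the winner for the previous frame, and wraps around the codebook. The function name `pde_search`, the `start` parameter, and the operation counter are hypothetical names introduced for this example.

```python
from typing import List, Sequence, Tuple


def pde_search(
    x: Sequence[float],
    codebook: List[Sequence[float]],
    start: int = 0,
) -> Tuple[int, float, int]:
    """Closest code vector search with partial distortion elimination.

    Visits code vectors circularly, beginning at index `start` (an assumed
    interpretation of "circular"; the abstract gives no details).
    Returns (best index, best squared distance, number of multiplications).
    """
    n = len(codebook)
    best_idx = -1
    best_dist = float("inf")
    ops = 0  # counts squared-difference terms actually computed
    for step in range(n):
        idx = (start + step) % n  # wrap around the codebook
        dist = 0.0
        for xi, ci in zip(x, codebook[idx]):
            diff = xi - ci
            dist += diff * diff
            ops += 1
            if dist >= best_dist:
                break  # partial distortion already too large: eliminate
        else:
            # Loop finished without elimination, so this is the new best.
            best_idx, best_dist = idx, dist
    return best_idx, best_dist, ops


# Toy data (made up for illustration): a 4-entry, 2-dimensional codebook.
codebook = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0], [9.0, 9.0]]
frame = [9.2, 9.1]

cold = pde_search(frame, codebook, start=0)  # start from the first entry
warm = pde_search(frame, codebook, start=3)  # start from a likely winner
```

Both calls return the same winning index, because elimination never discards the true nearest code vector. Starting near the likely winner only tightens the elimination threshold earlier, which is why a good starting guess (such as the previous frame's winner, if speech is locally stationary) can reduce the number of multiplications.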


Copyright (c) 2016 Muhammad Afzal, Mohammad A. Maud, Ali Hammad Akbar
