Automatic Recognition of Correctly Pronounced English Words using Machine Learning

Ronalyn C. Pedronan
Rizaldy Jr. A. Manglal-lan
Kristine Joy B. Galasinao
Reychell P. Salvador
James Patrick A. Acang

Keywords

Digital Signal Processing, Hidden Markov Model, Mel Frequency Cepstral Coefficient, Pronunciation Recognition, Speech Recognition

Abstract

Speech recognition is a form of human-machine communication in which a computer interprets spoken input. This research addresses the problem of recognizing whether English words are pronounced correctly. With a view to using this technology in education, the researchers gathered voice samples from middle-grade students and labelled them against ground-truth English pronunciation data from Google. The words were drawn from the students' current curriculum, and they were grouped by number of syllables to observe how the model performs as the words to be recognized grow more complex. Because many voice and speech features could be considered, the researchers selected three well-known feature extraction techniques for evaluation. Results show that the model combining Mel Frequency Cepstral Coefficients (MFCC) with Linear Predictive Coding (LPC) outperformed the other models, achieving high and stable recognition rates. It was also observed that the model needs only four syllables to reach its optimum recognition rate of 100% when recognizing English words. To make the model more robust to noise, an automatic signal segmentation approach is needed to detect the significant components of the signal for analysis.
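
The abstract does not say how the MFCC and LPC features were computed or combined. As a rough illustration only, the sketch below extracts both from each frame with NumPy and concatenates them into one vector per frame. Every parameter (pre-emphasis factor 0.97, 25 ms frames with a 10 ms hop, 26 mel filters, 13 cepstral coefficients, LPC order 12) and the simple concatenation are assumptions taken from common textbook defaults, not the authors' settings, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical MFCC + LPC front end. All defaults below are assumed
# textbook values, not parameters reported in the paper.

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, dropping the ragged tail."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(frames, sr, n_filters=26, n_ceps=13):
    """Log mel-filterbank energies decorrelated with a DCT."""
    n_fft = max(512, 1 << (frames.shape[1] - 1).bit_length())  # FFT size >= frame length
    windowed = frames * np.hamming(frames.shape[1])
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2 / n_fft  # power spectrum
    energies = spectrum @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(energies + 1e-10), n_ceps)

def lpc(frame, order):
    """LPC coefficients a[1..order] by the autocorrelation method (Levinson-Durbin).
    Convention: x[n] is predicted as -sum_k a[k] * x[n - k]."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # degenerate (e.g. silent) frame: stop early
            break
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]

def mfcc_lpc_features(x, sr, frame_ms=25, hop_ms=10, n_ceps=13, lpc_order=12):
    """Per-frame vectors: n_ceps MFCCs followed by lpc_order LPC coefficients."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    emphasized = np.append(x[0], x[1:] - 0.97 * x[:-1])  # pre-emphasis (assumed 0.97)
    frames = frame_signal(emphasized, frame_len, hop)
    ceps = mfcc(frames, sr, n_ceps=n_ceps)
    window = np.hamming(frame_len)
    lpcs = np.array([lpc(f * window, lpc_order) for f in frames])
    return np.hstack([ceps, lpcs])
```

For one second of audio sampled at 8 kHz, `mfcc_lpc_features` returns a (98, 25) array: 98 frames, each holding 13 MFCCs followed by 12 LPC coefficients. In a word recognizer these frame sequences would then be scored by a per-word model such as an HMM, a stage this sketch does not cover.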

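The abstract names automatic signal segmentation as a needed improvement but does not specify a method. One simple, widely used option is endpoint detection by short-time energy, which keeps runs of frames whose energy exceeds a fraction of the loudest frame. The sketch below is illustrative only and is not the authors' approach: the frame sizes, the 10% threshold, and the minimum run length are assumed values, the function names are hypothetical, and a fixed energy threshold is itself sensitive to noise, so a practical system would likely need an adaptive threshold or extra cues such as zero-crossing rate.

```python
import numpy as np

# Hypothetical energy-based endpoint detector; all thresholds are assumptions.

def short_time_energy(x, frame_len, hop):
    """Sum of squared samples in each frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2) for i in range(n_frames)])

def detect_speech_segments(x, sr, frame_ms=20, hop_ms=10, threshold_ratio=0.1, min_frames=3):
    """Return (start_sample, end_sample) pairs for runs of at least min_frames
    consecutive frames whose energy exceeds threshold_ratio * max frame energy."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energy = short_time_energy(x, frame_len, hop)
    active = energy > threshold_ratio * energy.max()
    segments = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i  # a run of active frames begins
        elif not is_active and start is not None:
            if i - start >= min_frames:  # discard very short bursts
                segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None and len(active) - start >= min_frames:  # run reaches the end
        segments.append((start * hop, (len(active) - 1) * hop + frame_len))
    return segments
```

Each returned pair marks a region that could be passed on to feature extraction, while the low-energy frames around it are discarded.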