Automatic Recognition of Correctly Pronounced English Words using Machine Learning

Ronalyn C. Pedronan
Rizaldy Jr. A. Manglal-lan
Kristine Joy B. Galasinao
Reychell P. Salvador
James Patrick A. Acang


Digital Signal Processing, Hidden Markov Model, Mel Frequency Cepstral Coefficient, Pronunciation Recognition, Speech Recognition


Speech recognition is a form of human-machine communication in which the computer interprets speech. This research addresses the problem of recognizing the correct pronunciation of English words. With a view to applying this technology in education, the researchers gathered voice samples from middle graders and labelled them against ground-truth English pronunciation data from Google. The words were taken from the students' current curriculum and were clustered by number of syllables to see how the model performs as the complexity of the words to be recognized increases. Since there are numerous voice and speech features to consider, the researchers selected three known feature extraction techniques for evaluation. Results show that the Mel Frequency Cepstral Coefficient with Linear Predictive Coding model performs better than the other models, with high and stable recognition rates. It was also observed that the model needs only four syllables to reach its optimum recognition rate of 100% when recognizing English words. To make the model more robust to noise, an automatic signal segmentation approach is needed to detect the significant components of the signal for analysis.
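As an illustration of what automatic signal segmentation could look like, the following is a minimal sketch of short-time energy thresholding, one simple and commonly used way to separate speech-like regions from silence. It is not the approach used or proposed in this research; the function names, the frame length of 160 samples, the threshold ratio of 0.1, and the synthetic test signal are all assumptions chosen for the example.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (hypothetical helper)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_segments(samples, frame_len=160, threshold_ratio=0.1):
    """Return (start, end) sample indices of runs of frames whose energy
    exceeds threshold_ratio times the largest frame energy.

    frame_len and threshold_ratio are illustrative defaults, not tuned values.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    segments = []
    seg_start = None
    for i, energy in enumerate(energies):
        active = energy > threshold
        if active and seg_start is None:
            seg_start = i * frame_len            # a new active region begins
        elif not active and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # region ended
            seg_start = None
    if seg_start is not None:                    # signal ended while still active
        segments.append((seg_start, len(energies) * frame_len))
    return segments


# Synthetic example: silence, then a 440 Hz tone sampled at 16 kHz, then silence.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800
segments = detect_segments(signal)
print(segments)  # [(800, 2400)]
```

A fixed energy threshold of this kind is sensitive to background noise, which is why a noise-robust system would likely need an adaptive threshold or additional cues beyond energy alone.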

