Joint speech and speaker recognition using neural networks
Xue, Xiaoguo (2013)
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-201305107567
Abstract
Speech is the main method of communication between human beings. Since the invention of the computer, people have been trying to make computers understand natural speech. Speech recognition is a technology closely connected with computer science, signal processing, linguistics and intelligent systems. It has been a "hot" subject not only in research but also in practical applications, and speaker and speech recognition are now used frequently in everyday life.
Even though the aims of speaker recognition and speech recognition are different, they share the same algorithms. A neural network is a technology that tries to mimic the functions of the human brain. With the development of neural networks over the past few decades, their use in speech and speaker recognition has become very popular and successful.
This thesis introduces the main procedures: signal pre-processing, feature extraction, and neural network design and implementation. Mel Frequency Cepstrum Coefficients (MFCC) are presented as the best available approximation of the features of the human ear. A back-propagation neural network is used to design the recognition system, and an implementation has been made on the Matlab platform. The experimental results show that the system works well and that it can be improved by using more training samples. This research provides a good foundation for a future implementation on a real-time DSP system.
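As an illustration of the pre-processing and feature-extraction stages named in the abstract, the following is a minimal MFCC sketch. It is not the thesis's Matlab implementation: it is written in Python with NumPy, and the function names and every parameter value (16 kHz sampling, 400-sample frames, 160-sample hop, 512-point FFT, 26 mel filters, 13 coefficients, 0.97 pre-emphasis) are assumed defaults chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula; 1000 Hz maps to roughly 1000 mel.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centres are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    # All default values here are assumptions, not taken from the thesis.
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, then log compression.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # 5. DCT-II (unnormalised) decorrelates the log energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None]
                   * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

# Usage: one second of synthetic noise standing in for a recorded utterance.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
print(features.shape)
```

Each row of `features` is one frame's coefficient vector; vectors of this kind would serve as the input to a back-propagation network such as the one the abstract describes.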