Surveying and analyzing open-source AI tools for human expression recognition
Ipinze Tutuianu, Gianmarco (2023)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2023053116805
Abstract
The ability to perceive and understand human emotions plays a crucial role in various aspects of human-computer interaction and artificial intelligence systems. In recent years, the advent of deep learning techniques, particularly Convolutional Neural Networks (CNNs), has revolutionized the field of facial emotion recognition.
This thesis explores the realm of facial emotion recognition through the lens of open-source tools and CNNs. It aims to test, analyze, train, and design an accurate and robust model capable of decoding the intricate and subtle emotional cues expressed on the human face, and to evaluate these tools with respect to accuracy, ease of use, practicality, and technical robustness.
These open-source tools and algorithms are widely used by industry and the community to develop image classification models and real-time implementations. We focus primarily on two open-source, ready-to-use tools, DeepFace and FaceApi, and on open-source frameworks such as TensorFlow and Keras, with which we use models including VGG16, VGG19, MobileNet, Xception, and ResNet50, among others.
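As a rough illustration of how the ready-to-use tools are exercised, the following is a minimal sketch of running emotion analysis with DeepFace; the image path is a placeholder, and the exact shape of the returned result can vary between DeepFace versions.

```python
# Minimal sketch: emotion analysis with DeepFace (image path is a placeholder).
from deepface import DeepFace

result = DeepFace.analyze(
    img_path="face.jpg",       # hypothetical input image
    actions=["emotion"],       # run only the emotion classifier
    enforce_detection=False,   # do not raise an error if no face is found
)

# Recent DeepFace versions return a list with one entry per detected face.
print(result[0]["dominant_emotion"])   # e.g. "happy"
print(result[0]["emotion"])            # per-emotion scores
```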
We also create models from scratch to test the efficiency of pre-trained weights, using three different datasets (RAF-DB, FER2013, and AffectNet) and two different class-balancing strategies.
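To make the training setup more concrete, here is a minimal Keras sketch of the pre-trained-weights approach combined with one common class-balancing strategy (inverse-frequency class weights). The dataset path, class count, and hyperparameters are placeholder assumptions, not the exact configuration used in this thesis.

```python
# Sketch: VGG16 with ImageNet weights plus class weights for imbalanced data.
# Dataset path, class count, and hyperparameters are placeholder assumptions.
import numpy as np
import tensorflow as tf

NUM_CLASSES = 7  # typical FER setup: seven basic emotion classes

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",              # hypothetical directory with one folder per class
    image_size=(224, 224),
    batch_size=32,
)
preprocess = tf.keras.applications.vgg16.preprocess_input
train_ds = train_ds.map(lambda x, y: (preprocess(x), y))

base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False         # freeze the pre-trained backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# One class-balancing strategy: weight each class inversely to its frequency.
labels = np.concatenate([y.numpy() for _, y in train_ds])
counts = np.bincount(labels, minlength=NUM_CLASSES)
class_weight = {i: len(labels) / (NUM_CLASSES * max(c, 1)) for i, c in enumerate(counts)}

model.fit(train_ds, epochs=10, class_weight=class_weight)
```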
The results showed that ready-to-use tools are the best option for first-time users: their tutorials, repositories, and guides are easy to follow and user-friendly, and they require neither training nor powerful hardware.
The CNNs achieved their best results with 224 x 224 images, but also performed well when trained on 48 x 48 images, sacrificing some accuracy for faster training.
Based on the findings of this study, further research should be conducted to determine the optimal image resolution for facial expression recognition; multimodal approaches might also yield better results.