CAPTCHA Recognition System
Zhu, Wenhao (2019)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2019052611969
Abstract
A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a public, fully automated program that distinguishes human users from computers. This thesis developed a CAPTCHA recognition system that can be deployed on the NAO humanoid robot, enabling it to pass the Turing test, and that can also be deployed on web services to provide a recognition service.
The recognition system uses a convolutional neural network to extract features from the CAPTCHA image and encodes the data with one-hot encoding, which is widely used in multiclass classification. The project was developed in Python, and the TensorFlow and Keras libraries were used to build the neural network. The NAO robot used is version V5, and the code was tested on Ubuntu 16.04.
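As an illustration of the one-hot encoding mentioned above, the following minimal sketch encodes a numeric CAPTCHA string as a flat vector and decodes it again. The function names `encode` and `decode`, the example string, and its four-digit length are assumptions made for illustration; the abstract does not state the CAPTCHA length or how the thesis implements the encoding.

```python
import numpy as np

NUM_CLASSES = 10  # digits 0-9; the thesis dataset contains only numbers


def encode(text):
    """Return a flat one-hot vector of length len(text) * NUM_CLASSES."""
    vec = np.zeros((len(text), NUM_CLASSES), dtype=np.float32)
    for pos, ch in enumerate(text):
        vec[pos, int(ch)] = 1.0  # mark the class of the digit at this position
    return vec.reshape(-1)


def decode(vec):
    """Invert encode(): take the argmax of each block of NUM_CLASSES entries."""
    digits = np.asarray(vec).reshape(-1, NUM_CLASSES).argmax(axis=1)
    return "".join(str(d) for d in digits)


# Hypothetical four-digit CAPTCHA (the length is an assumption).
label = encode("4071")
restored = decode(label)
```

Because `decode` takes an argmax per block, the same function could also be applied to a network output of the same shape containing per-position class scores.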
The final recognition model achieved about 99.67% accuracy on the training dataset and 98.10% accuracy on the test dataset with a suitable optimizer and loss function. Owing to the properties of one-hot encoding when the data are normalized, the reported accuracy is slightly higher than the accuracy observed in real applications. Because CAPTCHAs that combine numbers and letters require a large amount of data, the dataset in this thesis consists only of numeric CAPTCHAs; the system could be improved by using datasets that contain both numbers and letters.
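The abstract does not say how the accuracy was computed. Purely as an illustrative assumption, one way a reported figure can exceed real-world performance is when accuracy is counted per character rather than per whole CAPTCHA. The sketch below compares the two measures; the function names and sample predictions are hypothetical and are not results from the thesis.

```python
def char_accuracy(preds, truths):
    """Fraction of individual characters predicted correctly."""
    total = sum(len(t) for t in truths)
    correct = sum(
        p == t
        for pred, truth in zip(preds, truths)
        for p, t in zip(pred, truth)
    )
    return correct / total


def captcha_accuracy(preds, truths):
    """Fraction of CAPTCHAs in which every character is correct."""
    return sum(p == t for p, t in zip(preds, truths)) / len(truths)


# Hypothetical data: one wrong digit in the third CAPTCHA.
truths = ["1234", "5678", "9012", "3456"]
preds = ["1234", "5678", "9013", "3456"]

per_char = char_accuracy(preds, truths)      # 15 of 16 characters correct
per_image = captcha_accuracy(preds, truths)  # 3 of 4 CAPTCHAs fully correct
```

A single wrong digit lowers the per-character score only slightly but marks the whole CAPTCHA as failed, which is the outcome that matters when the system is used in practice.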