Text Search Web Application
Liu, Jingyu (2020)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2020112624423
Abstract
This project provides Wärtsilä with a solution for storing documents, including contracts and proposals, and for extracting the text from them.

The Text Search Web Application was built with the Django framework and the Pytesseract Python module. The thesis project consists of four main parts. The first is the Django framework. Django is an open-source web framework based on the Python language that lightens the workload of designing database-driven websites. The second is MongoDB, a Not Only Structured Query Language (NoSQL) database. With the help of MongoDB, the scanned files can be stored locally in a file storage system, which reduces the stereo matching time and the information storage significantly. The third is Asynchronous JavaScript and XML (AJAX), a set of web development techniques. AJAX lets the project POST data from web forms to the backend server in cases where Django was not able to do so. The last is the Pytesseract Python module. Pytesseract is an optical character recognition (OCR) library for Python that retrieves text data from image files. It is a wrapper for Google’s Tesseract-OCR Engine, and in this thesis it is deployed as a simple module that extracts words and fetches their positions in files.

The web application was implemented with a client-server web architecture. It allows users to register accounts for storing the documents.
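
As an illustration of the word extraction and position fetching described above, the following is a minimal sketch and not the thesis's actual code. It assumes the per-word dictionary that `pytesseract.image_to_data` returns when called with `output_type=pytesseract.Output.DICT`; the helper names `ocr_image` and `find_word` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

# (left, top, width, height) of a word on the page, in pixels
Box = Tuple[int, int, int, int]


def ocr_image(path: str) -> Dict[str, list]:
    """Run Tesseract on one image and return per-word data as a dict of lists."""
    # Imported lazily so find_word() can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )


def find_word(data: Dict[str, list], query: str) -> List[Box]:
    """Return the bounding box of every recognised word equal to `query`.

    The comparison ignores case and surrounding whitespace.
    """
    needle = query.strip().lower()
    if not needle:
        # Tesseract emits empty text entries for blocks and lines; skip them.
        return []
    boxes: List[Box] = []
    for i, word in enumerate(data["text"]):
        if word.strip().lower() == needle:
            boxes.append(
                (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
            )
    return boxes
```

With the Tesseract engine installed, `find_word(ocr_image("scan.png"), "contract")` would list where the word appears on a scanned page; the file name here is only a placeholder.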