Text Search Web Application

Liu, Jingyu

Liu, Jingyu (2020)

The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2020112624423

Abstract

This project provides a solution for Wärtsilä to store documents, including contracts and proposals, and to extract text from them. The Text Search Web Application was built with the Django framework and the Pytesseract Python module. The thesis project consists of four main parts. The first is the Django framework. Django is an open-source web framework based on the Python language that lightens the workload of designing database-driven websites. The second is MongoDB, a Not Only Structured Query Language (NoSQL) database. With the help of MongoDB, the scanned files can be stored locally in a file storage system, which reduces the stereo matching time and the information storage significantly. The third is Asynchronous JavaScript and XML (AJAX), a set of web development techniques. AJAX helps the project POST data from web forms to the backend server in cases where Django was not able to complete this. The last is the Pytesseract Python module. Pytesseract is an optical character recognition (OCR) library for Python that retrieves text data from image files. It is a wrapper for Google’s Tesseract-OCR Engine, and in this thesis it is used as a simple module for extracting words and fetching their positions in files.
The web application was implemented with a client-server web architecture. It allows users to register accounts for storing documents.
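
The word extraction and position lookup mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the OCR result is available in the tab-separated layout that `pytesseract.image_to_data` returns by default, and it searches that output for a word. The sample data and the function name `find_word_boxes` are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample in the tab-separated layout returned by
# pytesseract.image_to_data (one row per detected element).
# The values are made up for illustration, not taken from the thesis.
SAMPLE_TSV = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext",
    "1\t1\t0\t0\t0\t0\t0\t0\t600\t200\t-1\t",
    "5\t1\t1\t1\t1\t1\t40\t30\t120\t24\t96\tService",
    "5\t1\t1\t1\t1\t2\t170\t30\t140\t24\t93\tcontract",
    "5\t1\t1\t1\t2\t1\t40\t70\t110\t24\t91\tContract",
    "5\t1\t1\t1\t2\t2\t160\t70\t90\t24\t88\tvalue",
])


def find_word_boxes(tsv_text, query):
    """Return the bounding boxes of every word equal to `query`.

    The comparison ignores case. Rows without text (page, block and
    line entries) are skipped.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    hits = []
    for row in reader:
        word = (row.get("text") or "").strip()
        if word and word.lower() == query.lower():
            hits.append({
                "word": word,
                "page": int(row["page_num"]),
                "left": int(row["left"]),
                "top": int(row["top"]),
                "width": int(row["width"]),
                "height": int(row["height"]),
            })
    return hits


if __name__ == "__main__":
    for hit in find_word_boxes(SAMPLE_TSV, "contract"):
        print(hit)
```

With a real scanned document, the string passed to `find_word_boxes` would come from something like `pytesseract.image_to_data(Image.open(path))`, which requires Pillow and a local Tesseract installation. How the thesis stores or indexes these positions is not stated in the abstract.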

Collections

Theses (Open collection)