Automated Parsing and Integration of Job Requirements from Public Sources: A Case Study in Excel and SQL

Neittamo, Joona (2024)


The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-202404085998

Abstract

Extracting structured information from Excel files is becoming increasingly essential in an era of data-driven decision-making. This thesis explores the main challenges of extracting and processing data from Excel files whose layouts and patterns vary from file to file, and it presents versatile strategies for handling these complexities. Using datasets of job requirement information collected by a web-parsing robot, the study addresses the main obstacles encountered during extraction: location, structure, and pattern repetition.
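The location problem can be illustrated with a minimal sketch. It is not the method used in the thesis, and it rests on three assumptions. First, the sheet has already been read into a plain list of rows, for example with a library such as openpyxl. Second, a wanted value sits to the right of a text label whose position changes between files. Third, the names `find_label` and `value_right_of` are hypothetical. Under these assumptions, the sketch searches for the label by its text rather than a fixed cell address:

```python
from typing import Any, Optional

# Hypothetical representation: a sheet as a list of rows of cell values.
Grid = list[list[Any]]


def find_label(grid: Grid, label: str) -> Optional[tuple[int, int]]:
    """Return (row, col) of the first cell whose text equals `label`,
    ignoring case and surrounding whitespace, or None if it is absent."""
    target = label.strip().casefold()
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if isinstance(value, str) and value.strip().casefold() == target:
                return r, c
    return None


def value_right_of(grid: Grid, label: str) -> Any:
    """Return the first non-empty cell to the right of `label` in the same row."""
    pos = find_label(grid, label)
    if pos is None:
        return None
    r, c = pos
    for value in grid[r][c + 1:]:
        if value not in (None, ""):
            return value
    return None


# Usage: the label sits at a different position than a fixed template would expect.
sheet = [
    [None, "Job posting", None],
    ["", None, None],
    [None, "Requirements:", None, "SQL, Excel"],
]
print(value_right_of(sheet, "requirements:"))  # SQL, Excel
```

Matching on label text instead of a hard-coded address lets the same extraction rule survive rows or columns being inserted or shifted between files.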

The introductory chapter establishes the context of Excel parsing and emphasizes the importance of efficient data extraction for informed decision-making. The literature review that follows examines known tools, methods, and obstacles in extracting data from Excel files, providing a comprehensive grounding in the subject. The chapter on location-based challenges investigates how to identify relevant cells and how to handle merged cells and spans. The chapter on structural challenges addresses normalizing inconsistent data formats and extracting hierarchical data. Finally, the chapter on pattern repetition examines how to detect repetitive structures and how to manage irregular patterns effectively.
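Two of the problems named above are merged cells and repeated record blocks. A minimal sketch below illustrates both; it is not the thesis's implementation. Libraries such as openpyxl typically store a merged range's value only in its top-left cell and leave the other cells empty. The sketch therefore copies that value across the whole range, then splits the sheet into repeated records that each start at a known label. Several details are assumptions:

- the range tuple format and the rectangular grid
- the "Title" start label
- the function names `unmerge_fill` and `split_blocks`

```python
from typing import Any

Grid = list[list[Any]]
# Hypothetical range format: (first_row, first_col, last_row, last_col), 0-based, inclusive.
MergedRange = tuple[int, int, int, int]


def unmerge_fill(grid: Grid, merged: list[MergedRange]) -> Grid:
    """Return a copy of a rectangular grid where every cell of each merged
    range holds the value of that range's top-left cell."""
    out = [list(row) for row in grid]  # copy rows so the input is not modified
    for r0, c0, r1, c1 in merged:
        value = out[r0][c0]
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                out[r][c] = value
    return out


def split_blocks(grid: Grid, start_label: str) -> list[Grid]:
    """Split rows into repeated record blocks, each beginning at a row whose
    first cell equals `start_label`; rows before the first start are skipped."""
    blocks: list[Grid] = []
    for row in grid:
        first = row[0] if row else None
        if isinstance(first, str) and first.strip() == start_label:
            blocks.append([row])  # a new record starts here
        elif blocks:
            blocks[-1].append(row)  # continuation of the current record
    return blocks


# Usage: a header merged across two columns, a "Skills" label merged over two rows.
sheet = [
    ["Job listings", None],
    ["Title", "Data analyst"],
    ["Skills", "SQL"],
    [None, "Excel"],
    ["Title", "Developer"],
    ["Skills", "Python"],
]
filled = unmerge_fill(sheet, [(0, 0, 0, 1), (2, 0, 3, 0)])
for block in split_blocks(filled, "Title"):
    print(block)
```

Filling merged ranges before splitting matters here. Without it, the row `[None, "Excel"]` would lose its "Skills" label, even though it still lands in the correct record.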

The concluding chapter synthesizes the findings of the study and their implications. Together, the identified challenges and their corresponding solutions advance data extraction practice, improving the efficiency and precision of these processes.

Collections

Theses (Open collection)