Harvesting Statistical Metadata from an Online Repository for Data Analysis and Visualization : Concept Application on Theseus
Gebresilassie, Sem (2014)
Gebresilassie, Sem
Metropolia Ammattikorkeakoulu
2014
All rights reserved
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:amk-201505219436
Abstract
Theses and publications from Finnish universities of applied sciences are available in an open online repository called Theseus. The repository provides an application programming interface (API) for harvesting its contents. By using this API properly, it is possible to collect the metadata of thesis documents and reuse it for other purposes.
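A minimal sketch of such harvesting follows. It assumes that Theseus exposes a standard OAI-PMH interface; the abstract itself only mentions an API and a communication protocol, and the base URL below is hypothetical and should be checked. The sketch requests Dublin Core (`oai_dc`) records with the `ListRecords` verb and follows `resumptionToken` elements page by page. It is written in Python rather than the PHP used in the thesis, and the `fetch` parameter lets the HTTP layer be swapped out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Union

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# ASSUMPTION: hypothetical base URL; check the repository's documentation.
BASE_URL = "https://www.theseus.fi/oai/request"


def build_request_url(base_url: str, token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords URL.

    The first request names the metadata format; follow-up requests carry
    only the resumptionToken, as OAI-PMH requires.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def http_fetch(url: str) -> bytes:
    """Download one response page as raw bytes (ElementTree reads the encoding)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url: str,
            fetch: Callable[[str], Union[str, bytes]] = http_fetch
            ) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens page by page."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(build_request_url(base_url, token)))
        page = root.find(f"{OAI_NS}ListRecords")
        if page is None:
            return  # e.g. an <error code="noRecordsMatch"/> response
        yield from page.findall(f"{OAI_NS}record")
        token_el = page.find(f"{OAI_NS}resumptionToken")
        # A missing or empty resumptionToken marks the last page.
        if token_el is None or not (token_el.text or "").strip():
            return
        token = token_el.text.strip()
```

Passing a stub `fetch` that returns canned XML makes the paging logic testable without network access.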
The main aim of this thesis is to explain how to gather the author name, title, submission year, keywords, subjects, department, university, language, and number of pages of every thesis document in Theseus, and then reuse the gathered data to build a Web portal. The portal provides tools for examining thesis documents and for visualizing statistics on the contribution of each university of applied sciences in Finland. To achieve this, automated agents (robots) that fetch the metadata of thesis documents and store it in a separate MySQL database were written in PHP. The Google Charts API was used extensively to visualize the gathered statistics.
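The parse-and-store step, together with one statistic the portal could chart, might look roughly like this. It is a sketch under stated assumptions, not the thesis's implementation. Python's built-in `sqlite3` stands in for MySQL. The field mapping is assumed (for example, `dc:publisher` is taken to hold the university). Department and page count are left out because they are not standard Dublin Core elements. The helper names `parse_record`, `create_schema`, `store`, and `theses_per_university` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_record(record: ET.Element) -> Optional[dict]:
    """Map one oai_dc record to portal fields (ASSUMED mapping).

    Returns None for records without metadata, e.g. deleted ones.
    """
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    if dc is None:
        return None

    def first(tag: str) -> Optional[str]:
        el = dc.find(tag, NS)
        return el.text.strip() if el is not None and el.text else None

    date = first("dc:date")
    return {
        "author": first("dc:creator"),
        "title": first("dc:title"),
        "year": int(date[:4]) if date and date[:4].isdigit() else None,
        "language": first("dc:language"),
        "university": first("dc:publisher"),  # ASSUMPTION
        "subjects": [s.text.strip() for s in dc.findall("dc:subject", NS) if s.text],
    }


def create_schema(conn: sqlite3.Connection) -> None:
    """One row per thesis; keywords/subjects go to a child table."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS thesis (
            id INTEGER PRIMARY KEY, author TEXT, title TEXT,
            year INTEGER, language TEXT, university TEXT);
        CREATE TABLE IF NOT EXISTS subject (
            thesis_id INTEGER REFERENCES thesis(id), subject TEXT);
    """)


def store(conn: sqlite3.Connection, meta: dict) -> None:
    """Insert one thesis and its subjects using parameterized queries."""
    cur = conn.execute(
        "INSERT INTO thesis (author, title, year, language, university) "
        "VALUES (?, ?, ?, ?, ?)",
        (meta["author"], meta["title"], meta["year"],
         meta["language"], meta["university"]))
    conn.executemany("INSERT INTO subject (thesis_id, subject) VALUES (?, ?)",
                     [(cur.lastrowid, s) for s in meta["subjects"]])


def theses_per_university(conn: sqlite3.Connection) -> list:
    """Count theses per university as a header row plus data rows."""
    rows = conn.execute(
        "SELECT university, COUNT(*) FROM thesis "
        "GROUP BY university ORDER BY COUNT(*) DESC, university").fetchall()
    return [["University", "Theses"]] + [list(r) for r in rows]
```

The returned list has the header-row-plus-data-rows shape that Google Charts' `google.visualization.arrayToDataTable` accepts once it is serialized to JSON for the page.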
The thesis first discusses the structure of Theseus and its communication protocol, followed by a summary of the concepts and technologies used in the data extraction process. It then illustrates how these concepts are applied to parse and store the metadata of every thesis document in Theseus. Finally, the resulting Web portal and its benefits are briefly described.