Web crawler and information management
Worku, Abey (2011)
Metropolia Ammattikorkeakoulu
2011
All rights reserved
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2011062212412
Abstract
This project was carried out for a company that runs a networking website for people who are interested in music. The objective was to develop an application that collects and stores data from two distinct websites, http://www.muusikoiden.net and http://www.imperiumi.net. Both websites contain extensive information about artists and their gig schedules, which was the company's main interest. A further goal was to compile a single data table containing all the information gathered from these two sites.
The application was built around a web crawler that uses the Document Object Model (DOM) interface, which represents the elements of a web page as a structured tree that programs written in different languages can access. The crawler was designed to iterate over every page of both websites. After parsing, the application analyzed the stored data to detect name prefixes and removed those prefixes wherever they occurred. Finally, a complete table containing all the required information was built.
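The abstract does not include the application's source code or state which programming language was used, so the following is only a minimal Python sketch of the described steps under explicit assumptions. It assumes each site exposes its gig listings as numbered pages of well-formed table markup, with the artist in the first cell and the date in the second; pages are obtained through a caller-supplied function (the hypothetical `fetch_page` parameter) so the example runs without network access; and the prefix list `PREFIXES` is hypothetical, because the actual prefixes are not named. It reads each page through the standard-library `xml.dom.minidom` DOM implementation, which requires well-formed markup; real-world HTML would usually need a more tolerant parser.

```python
from xml.dom import Node, minidom

# Hypothetical prefixes: the abstract does not say which name prefixes were removed.
PREFIXES = ("The ", "DJ ")


def text_of(node):
    """Concatenate the text of all descendant text nodes of a DOM node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts)


def parse_gigs(markup):
    """Read (artist, date) pairs from the <td> cells of one page via the DOM."""
    doc = minidom.parseString(markup)
    gigs = []
    for row in doc.getElementsByTagName("tr"):
        cells = row.getElementsByTagName("td")
        if len(cells) < 2:  # skip header rows that use <th> instead of <td>
            continue
        gigs.append((text_of(cells[0]).strip(), text_of(cells[1]).strip()))
    return gigs


def strip_prefix(name):
    """Remove the first matching prefix (case-insensitive) from an artist name."""
    for prefix in PREFIXES:
        if name.lower().startswith(prefix.lower()):
            return name[len(prefix):]
    return name


def crawl(fetch_page, max_pages=1000):
    """Visit pages 1, 2, ... until a page has no gigs; return {artist: set of dates}."""
    table = {}
    for page in range(1, max_pages + 1):
        gigs = parse_gigs(fetch_page(page))
        if not gigs:
            break
        for artist, date in gigs:
            table.setdefault(strip_prefix(artist), set()).add(date)
    return table


def merge(*tables):
    """Combine per-site tables into one table keyed by the normalized artist name."""
    combined = {}
    for table in tables:
        for artist, dates in table.items():
            combined.setdefault(artist, set()).update(dates)
    return combined


# Usage with two hypothetical in-memory "sites" instead of real HTTP requests.
site_a = {1: "<table><tr><td>The Example Band</td><td>2011-05-01</td></tr></table>"}
site_b = {1: "<table><tr><td>Example Band</td><td>2011-06-10</td></tr></table>"}
table = merge(crawl(lambda p: site_a.get(p, "<table/>")),
              crawl(lambda p: site_b.get(p, "<table/>")))
print({artist: sorted(dates) for artist, dates in table.items()})
# prints {'Example Band': ['2011-05-01', '2011-06-10']}
```

Because prefixes are stripped before rows are keyed, an artist listed as "The Example Band" on one site and "Example Band" on the other ends up in a single row of the combined table. Stopping at the first empty page is one simple way to "iterate over every page"; a real crawler might instead follow "next page" links found in the DOM.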
A web crawler is a powerful application for locating and accessing information in web documents. By combining the collected information, it is possible to create new services or carry out useful analyses.