
Building Application Powered by Web Scraping

Phan, Huy (2019)

dc.contributor.author: Phan, Huy
dc.date.accessioned: 2019-04-18T09:26:45Z
dc.date.available: 2019-04-18T09:26:45Z
dc.date.issued: 2019
dc.identifier.uri: http://www.theseus.fi/handle/10024/166489
dc.description.abstract: Being able to collect and process online content can help businesses make informed decisions. With the explosion of data available online, this process cannot practically be accomplished by manual browsing, but it can be done with web scraping, an automated technique that collects just the necessary data. This paper examines the use of web scrapers in building web applications in order to identify the major advantages and challenges of web scraping. Two applications based on web scrapers are built to study how a scraper can help developers retrieve and analyze data. One has a web scraper backend that fetches data from web stores on demand. The other scrapes and accumulates data over time. A good web scraper requires a robust, fault-tolerant, multi-component architecture. The retrieval logic can be complicated because the data can come in different formats. A typical application based on a web scraper requires regular maintenance in order to function smoothly. Site owners may not want a robot scraper to visit and extract data from their sites, so it is important to check a site's policy before trying to scrape its contents. It will be beneficial to look into ways to optimize the scraper traffic. The next step after data retrieval is to have a well-defined pipeline that processes the raw data to extract just the meaningful data the developer intended to get.
dc.language.iso: eng
dc.rights: All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
dc.title: Building Application Powered by Web Scraping
dc.type.ontasot: Bachelor's thesis
dc.identifier.urn: URN:NBN:fi:amk-201904175517
dc.subject.specialization: Software Engineering
dc.subject.degreeprogram: Information and Communications Technology
dc.subject.yso: Web Scraping
dc.subject.yso: Web Crawler
dc.subject.yso: HTML
dc.subject.yso: Python (programming languages)
dc.relation.contractor: Huy Phan
dc.subject.discipline: Degree Programme in Information Technology

