Web log pre-processing
Ufwinki, Ezekiel (2012)
Ufwinki, Ezekiel
Turun ammattikorkeakoulu
2012
All rights reserved
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2012091213593
Abstract
Over the past decade, the Internet has grown rapidly, particularly in the era of Web 2.0 and browser/server (B/S) applications. With the arrival of blogs, virtual communities, online office tools, e-commerce, e-government, B2B, C2C, and other emerging Web applications, the Web has become one of the core elements of human life and work. How can we enhance the value of a Web site, give users a better experience, and quickly find the information needed to understand users' needs? How can we improve the competitiveness of e-commerce applications and survive the fierce competition on the Internet? The answers to these questions can be found in the vast amounts of Web data. The combination of data mining technology and Internet applications therefore constitutes a very active and important field of study: Web mining.
Because access log files have a similar structure and content on every Web server, Web logs are a natural data source for Web mining, and mining them has universal, practical significance. However, raw Web log data is voluminous and contains a lot of noise, so it is not suitable for Web mining until it has been pre-processed. Data pre-processing accounts for more than 50% of the total Web mining workload. This thesis introduces the Web log and log pre-processing methods, and examines the maximum forward path and frequent traversal path algorithms, based on the use of http://shopping.yahoo.com/.