Links Are All You Need: Graph Embeddings for Website Analysis and Classification
Govender, Praven (2025)
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:amk-2025051411598
Abstract
This thesis explores the practical utility of domain-level graph embeddings derived from hyperlink structures at web scale. Using a 21TB subset of Common Crawl (n.d.) data, we show that these embeddings capture relationships between websites without requiring any content processing. In classifying news publishers by political bias and factual reporting, the approach achieves results competitive with state-of-the-art methods, while offering greater flexibility because the same embeddings can be reused for other tasks. Beyond classification, we show how these embeddings support analytical tasks such as similarity analysis and clustering, making them a practical tool for web-scale analysis. Key contributions include a methodology for creating and using domain-level graph embeddings at scale, strong empirical evidence of their effectiveness for website classification, and insights into their broader applications for understanding website relationships through link structure alone.
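The abstract does not describe how page-level hyperlinks become a domain-level graph or how embeddings are compared. The following is a minimal Python sketch under stated assumptions, not the thesis's implementation. It collapses page-level hyperlinks into weighted domain-to-domain edges, using the URL host with a leading `www.` stripped as a stand-in for the domain (a simplification that differs from a proper registered-domain lookup). It then ranks domains by cosine similarity of their embedding vectors. The function names (`domain_of`, `domain_edges`, `cosine`, `most_similar`) and the toy vectors are hypothetical. Training the embeddings themselves is out of scope and not shown.

```python
from collections import Counter
from math import sqrt
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Hypothetical helper: approximate a URL's domain as its host name
    (urlsplit lower-cases it) with a leading 'www.' removed.
    A real pipeline might map hosts to registered domains instead."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_edges(page_links):
    """Collapse page-level (source_url, target_url) hyperlinks into
    weighted domain-level edges, dropping links within a single domain."""
    edges = Counter()
    for src, dst in page_links:
        s, d = domain_of(src), domain_of(dst)
        if s and d and s != d:
            edges[(s, d)] += 1
    return edges


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(query, embeddings, k=3):
    """Return the k domains whose embeddings are most similar to the query's."""
    q = embeddings[query]
    scored = [(other, cosine(q, vec))
              for other, vec in embeddings.items() if other != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Usage with made-up links and toy 2-dimensional vectors (illustrative only).
links = [
    ("https://www.example.com/a", "https://news.example.org/x"),
    ("https://example.com/b", "https://news.example.org/y"),
    ("https://example.com/b", "https://example.com/c"),  # same domain, dropped
]
print(domain_edges(links))
# Counter({('example.com', 'news.example.org'): 2})

toy_embeddings = {"a.com": [1.0, 0.0], "b.com": [0.9, 0.1], "c.com": [0.0, 1.0]}
print([name for name, _ in most_similar("a.com", toy_embeddings, k=2)])
# ['b.com', 'c.com']
```

Keeping edge counts as weights preserves how strongly one domain links to another. Dropping intra-domain links keeps site navigation from dominating the graph. Both are illustrative design choices rather than details taken from the thesis.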