Topic Detection, Sentiment Analysis, and Multi-label Classification of Tweets for Improving Airport Services Quality
Anashchenkov, Fedor (2024)
All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024120131645
Abstract
Analyzing user-generated content from social media can give commercial organizations significant insight into how their customers evaluate the services provided and what potential pitfalls they run into. However, collecting, preparing, and subsequently analyzing such content is challenging. The main objective of this thesis is to explore a way of analyzing social media content, specifically user messages from X (formerly Twitter) that concern airports, and of producing an outcome that would help improve airport service quality (ASQ). A further objective is to find out whether accessibility is discussed in relation to ASQ, which aspects of it are discussed, and how broad the discussion of accessibility in airports is.
Completed within the AI Driver research project, the thesis uses a dataset of over 1 million tweets that mention international airports, along with other metadata. First, the collected data is cleaned and explored, highlighting possible challenges and limitations of this type of content. This is followed by the preprocessing stage, which takes the specifics of the collected data into account. Next, messages that discuss airport services are located with a rule-based technique based on the Airports Council International (ACI) ASQ measurement and a custom dictionary of keywords and phrases assigned to each service category. The presence of various airport services in the data is then studied further, with special attention to accessibility, in line with the set objectives.
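As an illustration of the rule-based step described above, the following is a minimal sketch of dictionary-based tagging. It is not the thesis's actual dictionary or code: the category names, the keywords, and the helpers `compile_patterns` and `tag_services` are hypothetical, chosen only to show how a tweet can be assigned to zero, one, or several service categories.

```python
import re

# Hypothetical categories and keywords, for illustration only.
# The thesis builds its own dictionary on top of the ACI ASQ measurement.
SERVICE_KEYWORDS = {
    "security": ["security check", "screening", "tsa"],
    "wayfinding": ["signage", "signs", "directions"],
    "accessibility": ["wheelchair", "accessible", "mobility assistance"],
}


def compile_patterns(keywords_by_category):
    """Build one case-insensitive, whole-word regex per service category."""
    patterns = {}
    for category, keywords in keywords_by_category.items():
        alternatives = "|".join(re.escape(keyword) for keyword in keywords)
        patterns[category] = re.compile(
            r"\b(?:" + alternatives + r")\b", re.IGNORECASE
        )
    return patterns


PATTERNS = compile_patterns(SERVICE_KEYWORDS)


def tag_services(text):
    """Return the sorted list of categories whose keywords occur in the text."""
    return sorted(
        category
        for category, pattern in PATTERNS.items()
        if pattern.search(text)
    )


if __name__ == "__main__":
    tweet = "Waited an hour at the security check, and no wheelchair was available"
    print(tag_services(tweet))
```

Because a single message can match several categories at once, the output of such a tagger is naturally a set of labels per tweet rather than a single class.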
Sentiment analysis is performed to understand the direction of customer perception of ASQ, using emotional attitude as an observable measure. Polarity scores are calculated with the VADER lexicon, and each tweet is then assigned to a sentiment category based on its score. The polarity scores are then visualized for each airport service category, providing a convenient tool for assessing ASQ. The final part of the thesis is dedicated to finding AI-based alternatives to the rule-based technique for detecting user messages related to airport services. For this task, five off-the-shelf machine-learning algorithms from the popular Python library scikit-learn are trained and tested for multi-label classification.
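The mapping from polarity score to sentiment category can be sketched as follows. The abstract does not state the cut-off values used, so the threshold of 0.05 here is an assumption: it is the conventional default suggested for VADER's compound score, which ranges from -1 to 1. The function names and the sample rows are hypothetical, and the scores are passed in as plain numbers so that the sketch does not depend on a particular VADER package.

```python
from collections import defaultdict


def sentiment_category(compound, threshold=0.05):
    """Map a compound polarity score in [-1, 1] to a sentiment category.

    The 0.05 threshold is an assumed conventional default,
    not a value taken from the thesis.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def mean_polarity_by_service(rows):
    """Average the compound score per service category.

    rows is an iterable of (service_category, compound_score) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for service, compound in rows:
        totals[service] += compound
        counts[service] += 1
    return {service: totals[service] / counts[service] for service in totals}


if __name__ == "__main__":
    # Hypothetical (service category, compound score) pairs.
    sample = [("security", -0.6), ("security", -0.2), ("accessibility", 0.5)]
    for service, score in sample:
        print(service, score, sentiment_category(score))
    print(mean_polarity_by_service(sample))
```

Per-category averages of this kind are one simple quantity that a visualization of polarity by airport service could be built on.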