Zero-Shot Anomaly Detection in Alphanumeric Vehicle Data using Large Language Models : a Design Science Approach

Braack, Julian (2025)

Braack_Julian.pdf (3.183 MB)

Restricted access

Permanent address of the publication:
https://urn.fi/URN:NBN:fi:amk-2025111928556

Abstract

This master’s thesis examines the use of large language models for zero-shot
anomaly detection in alphanumeric vehicle datasets, filling a gap where
traditional statistical methods face limitations. While numerical data can be
reliably assessed with algorithms like Local Outlier Factor or Isolation Forest, the
high-dimensional nature of alphanumeric serial numbers makes them difficult to
model with established algorithms. Using an iterative design science approach,
this study develops and tests a Proof-of-Concept Python application that uses
state-of-the-art large language models to detect anomalies in real-world vehicle
datasets. Apart from prompt engineering, the models are intentionally not
fine-tuned, so the approach can be applied without in-depth knowledge of large
language models.
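
A minimal sketch of the zero-shot setup described above, assuming a Python workflow like the Proof-of-Concept: the column values are numbered and embedded in a plain instruction (no labelled examples, no fine-tuning), and the model is asked for a JSON list of row indices. The prompt wording, the JSON format, and the functions `build_zero_shot_prompt` and `parse_anomaly_indices` are hypothetical and not taken from the thesis; the model call itself is omitted and its reply is simulated.

```python
import json

# Hypothetical zero-shot prompt: an instruction plus the data, with no labelled
# examples and no fine-tuning. Doubled braces survive str.format() as literal braces.
PROMPT_TEMPLATE = """You are a data quality assistant.
The values below come from one column of a vehicle dataset.
Most values share a common format. Identify the values that deviate from it.
Answer only with JSON of the form {{"anomalies": [<zero-based index>, ...]}}.

Values:
{values}
"""


def build_zero_shot_prompt(values: list[str]) -> str:
    """Number each value so the model can refer to it by its row index."""
    numbered = "\n".join(f"{i}: {v}" for i, v in enumerate(values))
    return PROMPT_TEMPLATE.format(values=numbered)


def parse_anomaly_indices(response_text: str, n_values: int) -> list[int]:
    """Parse the model's JSON reply and keep only indices that exist.

    Dropping malformed, non-integer, or out-of-range indices is a simple guard
    against hallucinated references to rows that were never sent.
    """
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict) or not isinstance(payload.get("anomalies"), list):
        return []
    return sorted(
        {
            i
            for i in payload["anomalies"]
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(i, int) and not isinstance(i, bool) and 0 <= i < n_values
        }
    )


# Made-up serial numbers; index 2 is visibly shorter than the rest.
serials = ["AB1234567", "AB1234568", "AB12", "AB1234570"]
prompt = build_zero_shot_prompt(serials)
# The prompt would be sent to an LLM here; a reply that also cites a
# non-existent row 7 is simulated instead.
reply = '{"anomalies": [2, 7]}'
print(parse_anomaly_indices(reply, len(serials)))  # → [2]
```

Asking for indices rather than echoed values keeps parsing deterministic and makes it easy to reject references to rows that do not exist.
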
The theoretical background covers data management, anomaly detection, and
the core principles of large language models. Results show that large language
models, especially Google’s Gemini 2.5 Pro, can effectively identify anomalies in
both numerical and alphanumeric data. Compared to statistical algorithms, large
language models offer the benefit of processing alphanumeric inputs, adding a
valuable extension to the anomaly detection toolkit. However, challenges such as
hallucination, inconsistent counting of string lengths, and sensitivity to highly
anomalous datasets highlight current limitations. Additionally, statistical methods remain
more efficient, scalable, and cost-effective for purely numerical datasets. The
findings confirm that large language models can be applied in a zero-shot manner
to detect anomalies in alphanumeric datasets. Beyond the automotive industry,
these insights can be applied to other fields where alphanumeric identifiers are
essential. This work advances both academic discussion and practical
applications, providing a foundation for future research on fine-tuned models and
industrial implementation.
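
The abstract reports that the models count string lengths inconsistently. Because length and character composition are exact properties, one option is to compute them deterministically and compare them with what the model reports. The sketch below is a hypothetical rule-based cross-check, not the method evaluated in the thesis; `rule_based_flags`, `char_class_pattern`, and the default `min_share` threshold of 10 % are assumptions, and the serial numbers are made up.

```python
from collections import Counter


def char_class_pattern(value: str) -> str:
    """Map letters to 'A' and digits to '9'; keep any other character as-is."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)


def rule_based_flags(values: list[str], min_share: float = 0.1) -> dict[int, list[str]]:
    """Flag rows whose exact length or character-class pattern is rare.

    A length or pattern counts as rare when it occurs in fewer than
    `min_share` of all values. Returns {row index: [reasons]}.
    """
    n = len(values)
    if n == 0:
        return {}
    lengths = Counter(len(v) for v in values)
    patterns = Counter(char_class_pattern(v) for v in values)
    flags: dict[int, list[str]] = {}
    for i, v in enumerate(values):
        pattern = char_class_pattern(v)
        reasons = []
        if lengths[len(v)] / n < min_share:
            reasons.append(f"rare length {len(v)}")
        if patterns[pattern] / n < min_share:
            reasons.append(f"rare pattern {pattern}")
        if reasons:
            flags[i] = reasons
    return flags


# Made-up column: 11 regular values, one with swapped character classes,
# and one truncated value.
serials = [f"AB{1234560 + i}" for i in range(11)] + ["A1B234567", "AB12"]
print(rule_based_flags(serials))
# → {11: ['rare pattern A9A999999'], 12: ['rare length 4', 'rare pattern AA99']}
```

The value at index 11 keeps the regular length and differs only in its character pattern, so a length check alone would miss it. The threshold has to be chosen per column: with ten rows or fewer, no single value can fall below the default 10 % share.
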

Collections

Theses (Restricted-access collection)