Enhancing IoRT with Multimodal AI: A Framework for LLMs, Computer Vision, and Predictive ML

Shourov, Md Ashikur Rahman (2025)

The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2025121636979

Abstract

Artificial intelligence is increasingly used in industrial and connected systems, but most available
solutions rely on a single data type, such as text, images, or sensor readings. This restricts
their ability to understand complex situations. This thesis aimed to design and test a lightweight
multimodal framework that combines large language models, computer vision, and predictive machine
learning within an Internet of Robotic Things (IoRT) system. The goal was to investigate whether these
three AI components can work together on edge hardware while maintaining low latency and acceptable
accuracy.
The development work followed a research-based approach. Lightweight models (DistilBERT,
YOLOv8n, and a Random Forest classifier) were selected and combined in an event-driven
pipeline. The system was implemented on a Raspberry Pi 5, and the modules communicated
via MQTT. It was tested with public datasets, open-source tools, and synthetic IoRT sensor logs.
Performance was measured in terms of accuracy, latency, resource usage, and the stability of the
entire system under real-time conditions.
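
As a rough illustration of what an event-driven, topic-based pipeline of this kind might look like, the sketch below routes three input types to three handlers and collects their results. It is not the thesis implementation: the EventBus class is an in-process stand-in for an MQTT broker, the topic names are hypothetical, and the three handlers are trivial placeholders for DistilBERT, YOLOv8n, and the Random Forest classifier.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for an MQTT broker (exact topic match only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # Deliver the payload synchronously to every subscriber of the topic.
        for handler in list(self._handlers[topic]):
            handler(payload)


def build_pipeline(bus, results):
    """Wire three placeholder modules to hypothetical topics."""

    # Placeholder for the language module: keyword lookup, not DistilBERT.
    def on_command(text):
        intent = "stop" if "stop" in text.lower() else "move"
        bus.publish("iort/result/intent", intent)

    # Placeholder for the vision module: passes labels through, not YOLOv8n.
    def on_frame(labels):
        bus.publish("iort/result/objects", sorted(set(labels)))

    # Placeholder for the predictive module: fixed threshold, not a Random Forest.
    def on_sensor(reading):
        bus.publish("iort/result/anomaly", reading["temperature_c"] > 80.0)

    bus.subscribe("iort/command", on_command)
    bus.subscribe("iort/camera", on_frame)
    bus.subscribe("iort/sensor", on_sensor)

    # Collect the latest result of each module in a shared dictionary.
    for key in ("intent", "objects", "anomaly"):
        bus.subscribe(
            f"iort/result/{key}",
            lambda value, key=key: results.__setitem__(key, value),
        )


bus = EventBus()
results = {}
build_pipeline(bus, results)

bus.publish("iort/command", "Please stop the arm")
bus.publish("iort/camera", ["person", "box", "person"])
bus.publish("iort/sensor", {"temperature_c": 91.5})
print(results)
```

In a real deployment the bus would be replaced by an MQTT client and broker, and each handler would call the corresponding model; the routing idea stays the same.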
The results showed that the system handled commands, detected objects, and flagged abnormal sensor
values at the same time. The combined pipeline also responded within a short time window and remained
stable even with noisy inputs. The multimodal structure outperformed the single-modality configurations,
improving decision quality and reducing wrong actions.
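
One way to picture how combining modalities could reduce wrong actions is a small rule-based fusion step, sketched below. The abstract does not describe the fusion logic used in the thesis, so everything here is an assumption for illustration: the Observation fields, the decide function, the 0.6 confidence threshold, and the "person" blocking label are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    intent: str               # command intent from the language module, e.g. "move"
    intent_confidence: float  # confidence of that intent, between 0 and 1
    objects: tuple            # object labels reported by the vision module
    anomaly: bool             # True if the sensor module flagged abnormal values


def decide(obs, min_confidence=0.6, blocking_labels=("person",)):
    """Return an action by checking the three modalities in priority order."""
    # An abnormal sensor reading overrides any command.
    if obs.anomaly:
        return "halt"
    # An uncertain command is not executed.
    if obs.intent_confidence < min_confidence:
        return "ask_again"
    # A move command is deferred while a blocking object is in view.
    if obs.intent == "move" and any(label in blocking_labels for label in obs.objects):
        return "wait"
    return obs.intent


print(decide(Observation("move", 0.9, ("box",), False)))
print(decide(Observation("move", 0.9, ("person",), False)))
print(decide(Observation("move", 0.9, ("box",), True)))
```

A text-only system would execute "move" in all three cases; in this sketch the vision and sensor checks defer or block the second and third.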
The study concludes that edge devices can run multimodal intelligence with the help of
lightweight AI models. The framework can be extended with more sensors, improved fusion
logic, or more powerful edge accelerators. Students and engineers interested in building similar IoRT
prototypes can also reproduce the development process described in this work.

Collections

Theses (Open collection)