Developing an AI-powered multimodal chatbot

Chen, Hua; Le, Kiet

Developing an AI-powered multimodal chatbot

Chen, Hua; Le, Kiet (2025)

Avaa tiedosto

Chen_Le.pdf (6.689Mt)

Lataukset:

Chen, Hua

Le, Kiet

2025

Näytä kaikki kuvailutiedot

Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:amk-2025052114086

Tiivistelmä

This thesis investigates designing and implementing a chatbot system based on Google's Gemini AI model that supports multimodal interaction. To overcome the limitations of traditional text-based systems, the chatbot incorporates text, emojis, audio, and images. It utilizes the processing capabilities of Gemini 2.0 Flash within a custom web interface built with an HTML/CSS/JavaScript frontend and a Flask backend. This architecture supports integrated communication across multiple modalities in a structured and extensible manner, enabling more flexible user interactions.
Several technical challenges were addressed, including the definition of a standardized payload format for communication with the Gemini 2.0 Flash API and the implementation of security mechanisms. These included the use of bcrypt with a 12-round key derivation function and salting for password hashing, the prevention of SQL injection through ORM-level query parameterization, and the structured management of user session lifecycles to ensure data integrity and enforce access control.
The system was evaluated over multiple development iterations using predefined test scenarios that included diverse input types, such as text, emojis, audio, and images. The results indicated consistent system behavior across modalities and adherence to the intended design specifications. The thesis also provides future research directions to enhance applicability to practical use cases.

Kokoelmat

Opinnäytetyöt (Avoin kokoelma)