Analysis of Customer Journey Video Data Using Eye-Tracking and Multimodal AI

Heikkinen, Sami; Väisänen, Jaani; Kaikonen, Hannu; Makkula, Sami; Markkanen, Petteri

Proceedings_260105_accepted_manuscript.pdf (615.4 KB)

Note: embargoed file, publicly available from 5 January 2027.

Editors: Chira, Camelia; Matei, Oliviu; Pop, Florin; Pop-Sitar, Petrică

Springer Nature

2026

doi:10.1007/978-3-032-12478-4_1

Permanent address of the publication:
https://urn.fi/URN:NBN:fi-fe2026022516193

Abstract

This study introduces a novel methodological approach for analyzing customer journey data using eye-tracking technology and multimodal AI. Customer journey research faces significant limitations due to its reliance on resource-intensive qualitative methods that are difficult to scale. We address this gap by employing vision language models (VLMs) to automate the interpretation of eye-tracking data, eliminating the manual coding bottleneck while enabling analysis of larger and more diverse datasets. Our research evaluates five locally deployable VLMs to determine the optimal balance between semantic accuracy and computational efficiency. Using data collected with Tobii eye-tracking glasses, we developed an analytical pipeline that synchronizes gaze location data with verbal think-aloud protocols to create a comprehensive multimodal dataset. Results demonstrate that Gemma3 (4B parameters) achieved 100% semantic accuracy on our test set while maintaining reasonable processing efficiency (43.26 s per image). When validated against human coding across the complete dataset, the model achieved a 74.2% recall rate. The integration of eye-tracking and verbal data revealed distinctive attention patterns, including “navigational uncertainty,” “confirmatory scanning,” and “socially-mediated attention,” throughout the customer journey. Our approach provides objective behavioral evidence of visual attention that complements traditional self-reported measures, enabling more comprehensive touchpoint analysis while aligning with event-driven perspectives from process mining research. This methodology offers promising applications for service design: it can identify discrepancies between reported and actual customer attention patterns and provides a foundation for developing automated behavioral indicators that detect moments of customer confusion, decision-making, or confirmation throughout service journeys.
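The abstract does not say how gaze and think-aloud data were synchronized or how recall against human coding was computed. The Python sketch below shows one plausible approach. It assumes timestamped gaze samples and a transcript segmented into utterances with start and end times on a shared clock. The names `GazeSample`, `Utterance`, `align_gaze_to_speech`, and `recall` are hypothetical and are not taken from the paper or from any Tobii API.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class GazeSample:
    t: float  # seconds since recording start (assumed shared clock with the audio)
    x: float  # hypothetical normalized scene-camera coordinates
    y: float


@dataclass
class Utterance:
    start: float  # seconds, inclusive
    end: float    # seconds, exclusive
    text: str


def align_gaze_to_speech(gaze, utterances):
    """Pair each think-aloud utterance with the gaze samples recorded during it.

    Windows are half-open [start, end), so a sample exactly on a boundary
    is assigned to the later utterance only.
    """
    gaze = sorted(gaze, key=lambda g: g.t)
    times = [g.t for g in gaze]
    aligned = []
    for u in utterances:
        lo = bisect_left(times, u.start)  # first sample at or after start
        hi = bisect_left(times, u.end)    # first sample at or after end
        aligned.append((u, gaze[lo:hi]))
    return aligned


def recall(model_labels, human_labels):
    """Share of human-coded (item, label) pairs that the model also produced."""
    human = set(human_labels)
    if not human:
        return 0.0
    return len(human & set(model_labels)) / len(human)
```

Under these assumptions, recall treats the human coding as ground truth and counts only the human-coded items the model reproduced, so labels the model adds beyond the human coding do not lower the score.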
