Abstract Image Similarity with Pre-Trained Image Embeddings
Kosova, Ron (2025)
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:amk-2025052013701
Abstract
This thesis aims to show that systems that understand images abstractly can be built readily with modern advances in computer vision. Specifically, it draws on the findings of neural style transfer research, which produced some of the first and most successful approaches to representing style and content separately. It also sets out to compare this approach with more recent systems of image understanding that are guided by natural language.
The thesis is structured as follows. First, the theoretical background is described, with particular focus on the concepts and models that form the backbone of the systems presented in this thesis. Second, the tools and methodologies used for both the theoretical information gathering and the practical implementation and evaluation are listed and described. Third, the implementation details are outlined, putting into practice the theoretical knowledge presented in the earlier sections. Finally, the evaluation results are presented, and possible explanations for the systems' behaviour are discussed along with future research that may improve it.
Overall, this thesis designs and presents several systems, built on these theoretical ideas, that achieve good results in abstract image representation without any training beyond their pretext tasks. The thesis also offers valuable insights into how the models forming the backbone of these systems “see” and “understand” images, drawn from an analysis of the qualitative evaluation results. Finally, it presents a few solutions and extensions that build on these findings.
