Optimisation of the fraud detection algorithm and the timeliness of data collection by the TNA application
Bilgic, Sezayi (2024)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024100926261
Abstract
VAT fraud causes large losses to public budgets, and VAT carousel fraud is one such fraud at the European level; fraudsters should be tracked before they disappear. Eurofisc, a network of experts that exchanges data about VAT sales or acquisitions throughout the European Union, therefore launched the Transaction Network Analysis (TNA) application in 2019, a data mining tool that helps business experts qualify traders as fraudster or non-fraudster. The thesis aims to find a machine learning model that helps business experts detect and qualify traders more quickly and that also optimises data collection.
The implementation was carried out in SAS E-Miner software, a tool whose requirements are based on historical data exported from the Transaction Network Analysis application. SEMMA, a quantitative research method, was applied within SAS E-Miner with the goal of providing a classification model that predicts a binary target variable from input variables. A decision tree was chosen as the classification model; it was trained on a large dataset of input variables to yield a binary target variable, fraud or non-fraud. Data preparation and data cleaning consisted of handling missing data and selecting the most relevant input features to feed into the training process. The decision tree classification algorithm was applied using Gini as the splitting criterion, and data binning regrouped the data and brought a better result.
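The role of binning and the Gini splitting criterion described above can be illustrated with a minimal Python sketch. It is not the SAS E-Miner implementation used in the thesis; the transaction amounts, labels, bin edges and function names below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def bin_value(value, edges):
    """Return the index of the bin a value falls into, given ascending bin edges."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def best_binary_split(bins, labels):
    """Find the bin threshold t that minimises the weighted Gini impurity
    of the two groups (bin <= t) and (bin > t)."""
    n = len(labels)
    best = (None, gini(labels))  # start from the unsplit node
    for t in sorted(set(bins))[:-1]:
        left = [y for b, y in zip(bins, labels) if b <= t]
        right = [y for b, y in zip(bins, labels) if b > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: one numeric input variable and a binary target.
amounts = [120, 300, 450, 9000, 15000, 22000, 800, 17000]
labels = ["non-fraud", "non-fraud", "non-fraud", "fraud",
          "fraud", "fraud", "non-fraud", "fraud"]

edges = [1000, 10000]  # assumed bin edges: low, medium, high
bins = [bin_value(a, edges) for a in amounts]
threshold, score = best_binary_split(bins, labels)
print(threshold, score)
```

In this toy example the binned variable separates the two classes completely at the first bin boundary, so the weighted impurity after the split drops from 0.5 to 0. Real data would rarely split this cleanly, and a full decision tree repeats this search recursively over all input variables.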
The result is a decision tree classification model that is close to optimal. The model was tested and validated, and its input features can help to produce predictions and improve data collection in order to detect and qualify fraudsters quickly.
In summary, the research shows that a decision tree machine learning classification model can detect and provide information to prequalify traders as fraudster or non-fraudster. On the other hand, this classification model will encourage better data collection, and data processing through TNA will yield better results. In future developments, the selection of input variables should be analysed in depth to improve the classification model.