INTERNATIONAL JOURNAL OF NOVEL RESEARCH AND DEVELOPMENT International Peer Reviewed & Refereed Journals, Open Access Journal ISSN Approved Journal No: 2456-4184 | Impact factor: 8.76 | ESTD Year: 2016
The financial sector is undergoing a digital transformation across products, services, and business models, with the aim of automating most manual financial transactions and related services. As a result, detecting fraud in financial transactions has become a priority for all financial institutions: modern technology and global communication have greatly increased fraud and the damage it causes. This paper tests different approaches to detecting fraud on a real data set of financial payment transactions. The dataset, obtained from Kaggle, consists of 6 million records and 10 features, with each record labeled "fraudulent" or "non-fraudulent". The features are examined through exploratory data analysis, and only 6 are kept for testing, including payment type, account balance, and transaction amount. Two supervised machine learning algorithms, a random forest and a support vector classifier, are used to detect fraudulent transactions. The dataset is large, so processing it and training the models requires substantial computing power. A further challenge is the highly imbalanced class distribution: fraudulent transactions make up 0.1% of the records and non-fraudulent transactions 99.9%. This study addresses both issues. To handle the class imbalance, it investigates oversampling the minority class with the Synthetic Minority Oversampling Technique (SMOTE) and undersampling the majority class with random undersampling. Computational efficiency is achieved with Apache Spark, which provides distributed processing for large volumes of data. The best performance is achieved by the random forest algorithm on the oversampled dataset, with a precision of 99.95, an F1 score of 0.999, a recall of 0.999, a geometric mean of 99.9%, and a model training time of 13.9 minutes.
This article provides valuable insights into using large-scale, highly imbalanced big data sets to predict and generate financial fraud alerts.
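As an illustration of the two rebalancing strategies and the imbalance-aware metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the paper's Spark pipeline: the function names, the toy two-feature rows, and the confusion-matrix counts are all hypothetical, and the oversampler implements only the core SMOTE idea (interpolating between a minority row and one of its nearest minority neighbours), not a full library implementation.

```python
import math
import random


def random_undersample(majority, n_keep, rng):
    """Keep a random subset of n_keep majority-class rows, without replacement."""
    return rng.sample(majority, n_keep)


def smote_like_oversample(minority, n_new, k, rng):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base row, excluding the row itself.
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def imbalance_metrics(tp, fp, tn, fn):
    """Precision, recall, F1, and geometric mean from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate (sensitivity)
    specificity = tn / (tn + fp)   # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall, "f1": f1, "g_mean": g_mean}


# Hypothetical toy data: 4 "fraudulent" rows versus 40 "non-fraudulent" rows.
rng = random.Random(0)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(x), float(y)) for x in range(5, 13) for y in range(5, 10)]

# Option 1: oversample the minority class up to the majority class size.
synthetic = smote_like_oversample(
    minority, n_new=len(majority) - len(minority), k=2, rng=rng
)
oversampled_minority = minority + synthetic

# Option 2: undersample the majority class down to the minority class size.
undersampled_majority = random_undersample(majority, n_keep=len(minority), rng=rng)

# Hypothetical confusion-matrix counts, chosen only to exercise the formulas.
metrics = imbalance_metrics(tp=90, fp=10, tn=9890, fn=10)
print(len(oversampled_minority), len(undersampled_majority), metrics)
```

The geometric mean combines recall on the fraudulent class with specificity on the non-fraudulent class, so a model that labels every transaction non-fraudulent scores zero on it even though its plain accuracy on a 99.9% majority class would look high.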
Keywords:
Fraud Detection, Big data analytics, Apache Spark
Cite Article:
"A CASE STUDY ON FINANCIAL FRAUD DETECTION WITH BIG DATA ANALYTICS", International Journal of Novel Research and Development (www.ijnrd.org), ISSN: 2456-4184, Vol. 8, Issue 1, page no. a54-a58, January 2023. Available: http://www.ijnrd.org/papers/IJNRD2301007.pdf