Fraud Detection in Machine Learning – Analytics Insight

Food, clothes, accessories, or furniture: everything you used to buy from shops along busy streets can now be bought online. E-commerce is a multibillion-dollar industry, and it is still growing. Needless to say, it has also attracted criminals, who are active in every online sector you can name.
Fraud detection with machine learning is possible because models can learn from past fraud data to recognize patterns and predict the legitimacy of future transactions. In most cases, it is more effective than manual review because of the speed and efficiency with which it processes information.
Some types of internet fraud are:
1. ID forgery. Nowadays IDs are fabricated so well that it is almost impossible for humans to verify their legitimacy and prevent identity fraud.
With AI, various features of an ID card's appearance can be analysed to assess the authenticity of the document. This lets companies set their own security criteria for requests that require certain ID documents.
2. Bank loan scams. These can happen when a person contacts you and offers a loan scheme with suspiciously favourable conditions. The person will ask for your bank details or for an upfront payment, without providing proper company information, and may even use an international contact number. AI can help with such fraud by using previous loan application records to filter out likely loan defaulters.
3. Email phishing. This is a kind of cybercrime in which fake sites and messages are sent to users, asking them to share personal data. A person who is not careful may enter confidential data that leaves them vulnerable to threats. The best way to avoid this fraud is for users to be careful themselves; however, AI can help detect fraudulent emails by filtering them with basic machine learning algorithms such as regression.
4. Credit card fraud. This is the most common type of payment fraud, because card details are stored online, which makes them easier for criminals and hackers to access. Cards sent through the mail can also be easily intercepted. One way to filter such fraudulent transactions using machine learning is discussed below.
5. Identity theft. Machine learning for detecting identity theft helps check valuable identity documents such as passports, PAN cards, or driver's licenses in real time. Moreover, biometric information can sometimes be required to improve security further. These security methods need in-person authentication, which greatly reduces the chance of fraud.
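As a rough illustration of the regression-based email filtering mentioned in item 3, here is a minimal sketch of a logistic regression classifier trained with gradient descent, using only the Python standard library. The keyword list, the training emails, and all function names are made up for illustration; a real filter would learn from a large labelled corpus and use richer features.

```python
import math

# Hypothetical keyword vocabulary; a real filter would learn it from data.
KEYWORDS = ["verify", "password", "urgent", "account", "click"]

# Made-up training emails: label 1 = phishing, 0 = legitimate.
TRAIN = [
    ("urgent verify your account password now", 1),
    ("click here to verify your password", 1),
    ("urgent click to unlock your account", 1),
    ("your account statement is attached", 0),
    ("meeting notes for tomorrow", 0),
    ("lunch on friday", 0),
]

def featurize(text):
    """Binary features: 1.0 if the keyword occurs in the email, else 0.0."""
    words = set(text.lower().split())
    return [1.0 if k in words else 0.0 for k in KEYWORDS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(text, weights, bias):
    """Estimated probability that the email is phishing."""
    x = featurize(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(examples, lr=0.5, epochs=200):
    """Per-example gradient descent on the logistic (cross-entropy) loss."""
    weights, bias = [0.0] * len(KEYWORDS), 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = featurize(text)
            error = predict_proba(text, weights, bias) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

weights, bias = train(TRAIN)
p_phish = predict_proba("urgent please verify your password", weights, bias)
p_legit = predict_proba("see you at lunch tomorrow", weights, bias)
print(f"phishing-like: {p_phish:.3f}, ordinary: {p_legit:.3f}")
```

The email containing several suspicious keywords receives a higher score than the ordinary one; a threshold on that score would then decide which messages get filtered.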
Model to predict fraud using credit card data:
Here a well-known Kaggle credit card dataset (creditcard.csv) is used to demonstrate how fraud detection works with a simple neural network model.

import pandas as pd
import numpy as np
import tensorflow as tf
import keras
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

Have a look at the dataset. The Amount column is standardized so that it is on a scale comparable to the other features, and the Time column is removed because it is treated as irrelevant here.

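# Load the credit card transactions.
data = pd.read_csv('creditcard.csv')
# Standardize Amount to zero mean and unit variance.
data['Amount_norm'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1))
# Drop the raw Amount column and the unused Time column.
data = data.drop(['Amount'], axis=1)
data = data.drop(['Time'], axis=1)
# Drop the last row.
data = data[:-1]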
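To make the scaling step concrete, the sketch below shows what standardization computes: each value has the column mean subtracted and is divided by the population standard deviation, which is the calculation StandardScaler performs by default. The amounts are made-up sample values, and the helper name standardize is hypothetical.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Made-up transaction amounts, for illustration only.
amounts = [2.5, 10.0, 149.62, 378.66, 1200.0]
scaled = standardize(amounts)

print([round(v, 3) for v in scaled])
print("mean:", round(fmean(scaled), 6), "std:", round(pstdev(scaled), 6))
```

After scaling, a very large amount no longer dominates the other inputs purely because of its units.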

After this cleaning, our dataset contains a total of 29 features (V1 to V28 plus Amount_norm) and one target, all numeric with no missing values.

Our target is the Class column, which indicates whether a particular credit card transaction is fraudulent. The dataset is split into train and test sets, keeping the usual 80:20 ratio. (random_state is fixed so that you can reproduce the split.)

X = data.iloc[:, data.columns != 'Class']
y = data.iloc[:, data.columns == 'Class']

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.2, random_state=0)
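The idea behind fixing random_state can be sketched with the standard library alone: shuffle the row indices with a seeded generator, then cut off 20% for testing. This is a simplified illustration, not scikit-learn's implementation, so it will not reproduce the exact rows that train_test_split selects; the function name split_rows is hypothetical.

```python
import random

def split_rows(rows, test_size=0.2, seed=0):
    """Shuffle row indices with a fixed seed, then split into train and test."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(10))  # stand-in for 10 transactions
train_a, test_a = split_rows(rows, seed=0)
train_b, test_b = split_rows(rows, seed=0)

print("train:", train_a)
print("test:", test_a)
print("same split on rerun:", train_a == train_b and test_a == test_b)
```

Because fraud cases are rare in this kind of data, it can also be worth passing stratify=y to train_test_split so that both splits keep the same proportion of fraudulent transactions.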

We use the Sequential model from the Keras library to build a neural network with 3 dense layers. The output layer contains a single neuron with a sigmoid activation, which outputs the probability of the positive (fraud) class; thresholding that probability gives either a positive or a negative class.
The model is then compiled with the Adam optimizer, though it is highly recommended that you try out different hyperparameter values yourself, such as the number of units in each layer, the activation, the optimizer, etc., to see what works best for a given dataset.

model = Sequential()
model.add(Dense(units=16, activation='relu', input_dim=29))
model.add(Dense(units=16, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train the network on the training split.
model.fit(X_train, y_train, batch_size=32, epochs=15)
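To see what this small network involves, the sketch below counts the trainable parameters of the three dense layers (29 inputs, then 16, 16, and 1 units) and works through the sigmoid output and the binary cross-entropy loss for a single prediction. The helper names are hypothetical, and the clipping value eps is an assumed small constant used only to avoid log(0).

```python
import math

def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [29, 16, 16, 1]
param_counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(param_counts)

def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

p = sigmoid(2.0)                 # example raw output of the last neuron
predicted_class = int(p > 0.5)   # same 0.5 threshold idea as in the article

print("parameters per layer:", param_counts, "total:", total_params)
print("probability:", round(p, 4), "class:", predicted_class)
print("loss if truly fraud:", round(binary_crossentropy(1, p), 4))
print("loss if truly legitimate:", round(binary_crossentropy(0, p), 4))
```

The loss is small when the predicted probability agrees with the true label and large when it does not, which is what training tries to minimize.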

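After running the model for a few epochs, we see that it reaches 99.97% accuracy very quickly. Because fraudulent transactions make up only a tiny fraction of this dataset, accuracy alone can look high even when frauds are missed, so the precision and recall for the fraud class deserve attention too. Below, y_pred contains the predictions made by our model on the test data, and a neat summary of its performance is shown.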

y_pred = model.predict(X_test)
y_pred = (y_pred > 0.5)

print(classification_report(y_test, y_pred))
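The figures in such a report can be reproduced by hand from the four confusion-matrix counts. The sketch below uses made-up counts (not results from this model) to show how precision, recall, and F1 for the fraud class are derived, and compares them with a trivial baseline that labels every transaction as legitimate. The function name summarize is hypothetical.

```python
def summarize(tp, fp, fn, tn):
    """Accuracy plus precision, recall, and F1 for the positive (fraud) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up counts for a test set with 100 frauds among 56,960 transactions.
model_like = summarize(tp=80, fp=10, fn=20, tn=56850)
# Baseline that predicts "legitimate" for every transaction.
always_legit = summarize(tp=0, fp=0, fn=100, tn=56860)

for name, scores in [("model-like", model_like), ("always legit", always_legit)]:
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

Both rows show accuracy above 99%, yet the baseline catches no fraud at all, which is why recall on the fraud class is the number to watch.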

In this way, we were able to build a highly accurate model to identify fraudulent transactions. Models like this come in very handy for risk management purposes.

Author Bio:
Pavan Vadapalli, Director of Engineering @ upGrad, an ed-tech platform in India which provides data science, machine learning courses. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.
