Want to learn more? Take the full course at
https://learn.datacamp.com/courses/fraud-detection-in-python at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
---
This video is about traditional fraud detection methods versus machine learning models. As a data scientist, you'll often be asked to defend your method of choice, so it is important to understand the intricacies of both methods. You'll also get a refresher on machine learning models to help you with the exercises.
Traditionally, fraud analysts use rule-based systems to detect fraud. For example, in the case of credit cards, analysts might create rules based on location and block transactions from risky zip codes. They might also create rules to block transactions from cards that are used too frequently, for example within the last 30 minutes.
Some of these rules can be highly effective at catching fraud, whilst others are not and result in false alarms too often.
A major limitation of rule-based systems is that the threshold for each rule is fixed and does not adapt as fraudulent behaviour changes over time. It's also very difficult to determine what the right threshold should be.
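As a rough illustration, the two rules described above (risky zip codes and cards used too often within 30 minutes) could be sketched in plain Python as follows. The zip codes, the limit of three transactions, the dictionary keys and the function name `is_blocked` are all assumptions made up for this example, not values from the course.

```python
from datetime import datetime, timedelta

# Hypothetical values, chosen only for illustration.
RISKY_ZIP_CODES = {"10001", "60601"}
MAX_TRANSACTIONS = 3
WINDOW = timedelta(minutes=30)


def is_blocked(transaction, history):
    """Return True if any fixed rule fires for this transaction.

    transaction: dict with 'card_id', 'zip_code' and 'timestamp' keys
    history: list of earlier transactions with the same keys
    """
    # Rule 1: location rule, block transactions from risky zip codes.
    if transaction["zip_code"] in RISKY_ZIP_CODES:
        return True

    # Rule 2: frequency rule, count this card's transactions in the window.
    window_start = transaction["timestamp"] - WINDOW
    recent = [
        t for t in history
        if t["card_id"] == transaction["card_id"]
        and window_start <= t["timestamp"] < transaction["timestamp"]
    ]
    return len(recent) >= MAX_TRANSACTIONS


now = datetime(2024, 1, 1, 12, 0)
history = [
    {"card_id": "A", "zip_code": "94105", "timestamp": now - timedelta(minutes=m)}
    for m in (5, 10, 20)
]
new_transaction = {"card_id": "A", "zip_code": "94105", "timestamp": now}
print(is_blocked(new_transaction, history))  # True: three uses in 30 minutes
```

Note that the thresholds are constants in the code: nothing changes them unless an analyst edits them by hand.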
Second, a rule gives you a yes/no outcome, unlike machine learning, where you can get a probability value. With probabilities, you can fine-tune the outcomes much better to the number of cases you want to inspect as a fraud team. Effectively, with a machine learning model you can easily determine how many false positives and false negatives are acceptable; with rules that's much harder.
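To make the point about probabilities concrete, here is a small sketch of how moving a threshold trades off false positives against false negatives. The probabilities and labels are invented for illustration, and `confusion_counts` is a hypothetical helper, not part of any library.

```python
# Hypothetical fraud probabilities from a model, with the true labels
# (1 = fraud, 0 = legitimate). These numbers are made up for illustration.
probabilities = [0.05, 0.10, 0.35, 0.40, 0.65, 0.80, 0.95]
true_labels = [0, 0, 1, 0, 0, 1, 1]


def confusion_counts(probabilities, true_labels, threshold):
    """Return (cases flagged, false positives, false negatives) at a threshold."""
    flagged = [p >= threshold for p in probabilities]
    false_pos = sum(f and y == 0 for f, y in zip(flagged, true_labels))
    false_neg = sum((not f) and y == 1 for f, y in zip(flagged, true_labels))
    return sum(flagged), false_pos, false_neg


for threshold in (0.3, 0.5, 0.9):
    n_flagged, fp, fn = confusion_counts(probabilities, true_labels, threshold)
    print(f"threshold={threshold}: flagged={n_flagged}, "
          f"false positives={fp}, false negatives={fn}")
```

On this toy data, a threshold of 0.3 flags five cases with two false positives and no false negatives, while 0.9 flags a single case with no false positives but two missed frauds. A fraud team can pick the threshold that matches the number of cases it is able to inspect.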
Rule-based systems also cannot capture interactions between features the way machine learning models can. Suppose, for example, that the size of a transaction only matters in combination with its frequency when determining whether a transaction is fraudulent. A rule-based system cannot really deal with that.
Machine learning models don't have these limitations. They can be retrained on new data and can therefore capture new fraudulent behaviour. They can capture interactions between features and work with probabilities rather than yes/no answers. Machine learning models therefore typically perform better in fraud detection.
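A toy sketch of the interaction problem: in the made-up data below, a transaction is fraudulent only when both the amount and the recent transaction count are high. The code tries every rule of the form "flag if one feature exceeds a threshold" and compares the best result with a check that uses both features together. The data, the cut-offs of 500 and 5, and the helper names are assumptions for illustration only.

```python
# Toy transactions: (amount, recent_count, is_fraud).
# Made-up data in which fraud occurs only when BOTH values are high.
data = [
    (20, 1, 0), (25, 8, 0), (980, 1, 0), (950, 9, 1),
    (30, 7, 0), (870, 2, 0), (920, 8, 1), (15, 2, 0),
]


def accuracy(predict):
    """Share of transactions for which predict(amount, count) is correct."""
    return sum(predict(a, f) == bool(y) for a, f, y in data) / len(data)


def best_single_feature_accuracy(index):
    """Best accuracy of any rule of the form 'flag if feature > threshold'."""
    values = sorted({row[index] for row in data})
    thresholds = [values[0] - 1] + values
    return max(
        accuracy(lambda a, f, t=t: (a, f)[index] > t) for t in thresholds
    )


print(best_single_feature_accuracy(0))           # amount only: 0.875
print(best_single_feature_accuracy(1))           # recent count only: 0.875
print(accuracy(lambda a, f: a > 500 and f > 5))  # both together: 1.0
```

No single-feature threshold gets every case right here, while the combined condition does. An analyst would have to know the interaction in advance to write that combined condition by hand; models such as decision trees can learn it from the data.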
However, machine learning models are not always the holy grail. Some simple rules might prove to be quite capable of catching fraud. You therefore want to explore whether you can combine models with rules, to improve overall performance.
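One simple way to combine the two, sketched here under assumptions, is to flag a transaction when either a trusted rule fires or the model's fraud probability crosses a threshold. The function `combined_flag`, the zip code and the default threshold of 0.5 are hypothetical; feeding rule outcomes into the model as extra features would be another option.

```python
def combined_flag(transaction, fraud_probability, threshold=0.5,
                  risky_zip_codes=frozenset({"10001"})):
    """Flag a transaction if a fixed rule OR the model score says so.

    fraud_probability is assumed to come from an already trained model.
    """
    rule_fired = transaction["zip_code"] in risky_zip_codes
    model_fired = fraud_probability >= threshold
    return rule_fired or model_fired


# The rule catches a case the model scores as low risk.
print(combined_flag({"zip_code": "10001"}, fraud_probability=0.10))  # True
# The model catches a case that no rule covers.
print(combined_flag({"zip_code": "94105"}, fraud_probability=0.85))  # True
# Neither fires, so the transaction passes.
print(combined_flag({"zip_code": "94105"}, fraud_probability=0.10))  # False
```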
Because you'll be working with machine learning models in the exercises, here's a quick refresher about how to define one with scikit-learn.
First, you always want to start by splitting your data into a training set and a test set.
The second step is to define which model you want to use, and define its parameters. Let's take a very simple linear model, without defining any parameters.
You then continue by fitting your model to your training data: you pass X_train and y_train into the model.
Your model has now been trained, and you can obtain predictions by calling model.predict() on X_test.
The last step is to compare the model's predictions with the true values by passing y_predicted and y_test to a test metric. Here, we obtain an R-squared score for our linear model. We'll practice this once again in the exercise on a different model.
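Put together, the five steps above might look like the following minimal sketch. The video does not show which dataset is used, so the synthetic data from `make_regression`, the 30% test size and the `random_state` values are assumptions made so that the example runs on its own.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real dataset (an assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Step 1: split the data into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Step 2: define the model, here a linear model with default parameters.
model = LinearRegression()

# Step 3: fit the model to the training data.
model.fit(X_train, y_train)

# Step 4: obtain predictions for the test set.
y_predicted = model.predict(X_test)

# Step 5: compare the predictions with the true values using R-squared.
print(r2_score(y_test, y_predicted))
```

The same split, define, fit, predict and score pattern carries over to the classification models used for fraud detection; only the model class and the metric change.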
In the following chapter you'll learn how to adapt classification models to effectively detect fraud cases. In chapter 3 you'll explore the situation where there are no reliable labels, and you need to flag potential fraudsters by clustering your data. Lastly, in chapter 4 you'll learn how to further improve your models by analyzing text data and applying topic modelling to detect fraud.
Let's practice!