Anomaly Detection

Guide to Anomaly Detection

What is Anomaly Detection

Anomaly detection is the practice of observing a system and flagging unusual behavior. In banking, for example, it is used to identify fraud. If a credit card holder who usually makes a small number of transactions suddenly makes many expensive ones in quick succession, the card might be blocked on suspicion of being stolen. This can be implemented entirely in software, so that both the detection and the block happen automatically.

Assumptions in Anomaly Detection

Many variables can be assumed to behave like Gaussian random processes, meaning they have a certain mean and variance. The mean may also float over time, drifting in one direction. Under this assumption, we can construct a model of what we are observing from historical data. We can also fold new data into the model as it arrives, to account for any natural change in the mean.
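One way to let the model follow a drifting mean is an exponentially weighted update, where each new observation nudges the estimates a little. The sketch below is only an illustration under that assumption: the class name RunningGaussian and the smoothing factor alpha are hypothetical choices, not something the text above prescribes.

```python
class RunningGaussian:
    """Tracks a slowly drifting mean and variance (illustrative sketch)."""

    def __init__(self, mean, var, alpha=0.1):
        self.mean = mean    # current estimate of the mean
        self.var = var      # current estimate of the variance
        self.alpha = alpha  # assumed smoothing factor, between 0 and 1

    def update(self, x):
        """Fold one new observation into the estimates."""
        delta = x - self.mean
        # Move the mean a fraction alpha of the way towards the new value.
        self.mean += self.alpha * delta
        # Exponentially weighted variance update.
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)


model = RunningGaussian(mean=10.0, var=1.0, alpha=0.5)
for value in [12.0, 12.0, 12.0, 12.0]:
    model.update(value)
print(model.mean, model.var)
```

A larger alpha makes the model adapt faster to drift but also makes it more sensitive to individual outliers, so the value would need to be tuned to the data.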

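With this model, we can flag an anomaly whenever an observation lies a few standard deviations from the mean. For example, one rule could be to report an anomaly when a value is more than two standard deviations from the mean. Under a Gaussian model, roughly 5% of ordinary observations fall outside that range, so the choice of threshold trades sensitivity against false alarms.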
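The two-standard-deviation rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the mean and standard deviation are estimated once from historical data; the function names fit_gaussian and is_anomaly and the sample values are made up for the example.

```python
import statistics


def fit_gaussian(history):
    """Estimate the mean and sample standard deviation from historical data."""
    mean = statistics.fmean(history)
    std = statistics.stdev(history)
    return mean, std


def is_anomaly(value, mean, std, threshold=2.0):
    """Return True if value is more than `threshold` standard deviations from the mean."""
    if std == 0:
        # Degenerate case: no spread in the history, so any deviation is unusual.
        return value != mean
    return abs(value - mean) / std > threshold


# Hypothetical history of some measured quantity (mean 20, standard deviation 2).
history = [20.0, 22.0, 19.0, 21.0, 18.0, 20.0, 23.0, 17.0]
mean, std = fit_gaussian(history)

print(is_anomaly(21.0, mean, std))  # False: half a standard deviation away
print(is_anomaly(95.0, mean, std))  # True: far outside the usual range
```

The threshold is a parameter so that the rule can be made stricter or looser without changing the rest of the code.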

Software Implementation of Anomaly Detection

To make this efficient, it is necessary to implement all of this in software. There are several suitable platforms and programming languages; which one you choose should probably depend on your previous knowledge and the goals of your project. One great advantage is that we as humans don’t have to do any of the modelling ourselves: as long as we assume that the variable behaves as a Gaussian, the software will calculate and update the mean and variance. Of course, it is important to first analyze the data and check that it behaves approximately like a Gaussian process. If it does not, it is possible to transform the variable, for example by taking its square or its logarithm. The goal of the transformation is to make the distribution of the random process appear Gaussian. This first step of analyzing the data is crucial and will determine how effective the anomaly detection is in the end.
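As a rough illustration of the transformation step, the sketch below compares the skewness of a variable before and after taking its logarithm. A symmetric distribution such as the Gaussian has a skewness of zero, so a value close to zero is one (necessary but not sufficient) sign that the transformed data looks more Gaussian. The skewness helper and the sample values are assumptions made for this example, and a real analysis would also look at a histogram or run a normality test.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean cubed deviation in units of standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical, strongly right-skewed data (each value doubles the previous one).
amounts = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]

# The logarithm compresses large values and spreads out small ones.
log_amounts = [math.log(v) for v in amounts]

print("skewness before:", skewness(amounts))
print("skewness after: ", skewness(log_amounts))
```

In this constructed example the raw values are clearly skewed to the right, while the log-transformed values are evenly spaced and therefore symmetric. The logarithm only applies to strictly positive data; other variables may call for a different transformation.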

Summary – Anomaly Detection

We have covered the basic ideas behind anomaly detection. The tool is extremely useful and is widely used in research and industry, including the IT industry. For example, Google uses it to automatically detect spammers, fraudsters and other ill-behaved users. As a user base or the amount of data scales up, it becomes impossible to employ humans to monitor everything, since there is too much to monitor. Instead, automated software needs to be implemented.