Anomaly detection, or outlier detection, is the identification of data points, observations, or events that do not conform to expected patterns of a given group. Anomalies or outliers occur very infrequently but can signify a large and significant threat, such as cyber intrusion, financial fraud, compliance violation, and machinery malfunction, to businesses. Anomaly detection has traditionally relied on subject matter experts to curate and set business rules to trigger red flags in data. This traditional approach is inherently flawed because:

  • As the methodology is highly dependent on subject matter experts’ knowledge and experience, the results can be subjective or biased.
  • It entails a long and expensive development cycle to cultivate and refine the set of business rules.
  • Oftentimes, business rules are not precise enough, difficult to manage overtime and may contradict one another.

Machine learning-based anomaly detection algorithms are a leap forward from the rule-based solution. However, conventional machine learning algorithms oftentimes fail to achieve a satisfactory performance level to end users due to several reasons.

  • First, as anomalies are rare and expensive to be investigated, many organizations have few or no historical labels on anomalies. The lack of historical data and sparsity in labels impede the use of supervised machine learning algorithms.
  • Different anomaly detection algorithms rely on different assumptions and are domain specific. There is no one-size-fits-all anomaly detection algorithm.
  • Customizing anomaly detection algorithms is non-trivial as it relies heavily on feature engineering. This is an increasing challenge with rapidly rising data velocity and variety in today’s business world.