In recent years, financial fraud has exploded as companies and consumers increasingly use online services. A recent PwC survey found that 51% of surveyed organizations reported experiencing fraud in the past two years, the highest level in PwC’s 20 years of research. Common fraud scenarios include identity theft, ATM skimming, and money laundering.
Artificial Intelligence (AI) and machine learning (ML) can provide fast, scalable, and cost-effective methods for detecting fraud that are more accurate than traditional rule-based systems and better able to detect new and evolving fraud patterns. Yet techniques that can explain the results of ML models, known as Explainable AI, are increasingly necessary to address regulatory requirements, comply with ethical imperatives, and ensure accurate results. In this blog post, we will discuss the application of Explainable AI to ML-based risk analytics.
Machine Learning for Fraud Detection
Companies increasingly need innovative technologies to improve their fraud management, given the large volumes of data (such as events and transactions) to be analyzed. Machine learning has thus become a highly relevant approach in a variety of industries, including financial services.
Supervised and Unsupervised Models
A combination of supervised and unsupervised ML techniques for fraud detection can help identify existing patterns of fraud while also integrating anomaly detection. Supervised learning builds predictive models from labeled data and detects patterns that resemble those it was trained on, while unsupervised learning finds patterns, including anomalies, in unlabeled data.
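To make this concrete, here is a minimal sketch of how the two approaches can be combined, assuming scikit-learn, a purely numeric transactions table, and illustrative names (the transactions.csv file, the is_fraud label, and the alert thresholds are all hypothetical):

```python
# Minimal sketch: combining a supervised classifier with an unsupervised
# anomaly detector for fraud scoring (column names are illustrative).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, IsolationForest
from sklearn.model_selection import train_test_split

transactions = pd.read_csv("transactions.csv")       # hypothetical dataset
features = transactions.drop(columns=["is_fraud"])
labels = transactions["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

# Supervised model: learns patterns present in historically labeled fraud.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)
fraud_probability = clf.predict_proba(X_test)[:, 1]

# Unsupervised model: flags transactions that look unlike anything seen before,
# including fraud patterns that have no labeled examples yet.
iso = IsolationForest(contamination=0.01, random_state=0)
iso.fit(X_train)
anomaly_score = -iso.score_samples(X_test)            # higher = more anomalous

# Escalate a transaction for review if either signal is high.
alerts = (fraud_probability > 0.9) | (anomaly_score > np.quantile(anomaly_score, 0.99))
print(f"{alerts.sum()} of {len(X_test)} test transactions escalated for review")
```

In practice the two signals would more likely feed a case-management queue with tuned thresholds rather than the hard cut-offs shown here, but the overall structure is the same.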
Privacy-Preserving Fraud Detection
Today, more companies are using privacy-enhancing technologies (PETs) to securely collaborate on data for more effective fraud detection.
BNY Mellon is one of the world’s largest custodian banks. Working with Inpher, BNY Mellon built a fraud detection model on collaborative data that was 20% more accurate than a model trained on its internal data alone.
At Sibos 2023, an emerging group of banks and technology providers unveiled SWIFT’s Federated AI initiative, a privacy-preserving platform to enable data collaboration for anomaly detection.
Explainability for ML-based Fraud Detection
Explaining the results of ML systems is becoming an ethical and regulatory imperative for applications with life-changing impacts (see, for example, Articles 13-14 and 22 of the EU’s General Data Protection Regulation, or GDPR). In the context of ML-based financial fraud detection, Explainable AI can protect consumers (e.g., by uncovering possible discrimination and bias), ensure compliance with financial supervision and internal governance frameworks, and allay data management and usage concerns. In sum, explainability supports Trustworthy AI.
Benefits of Explainability
In addition to the benefits mentioned above, such as building trust with employees and customers and complying with the law, Explainable AI offers a number of operational advantages to businesses employing ML-based risk analytics.
Reduce False Positives
A high false positive rate (i.e., legitimate transactions incorrectly labeled as fraud) is a common problem in fraud detection systems. Explanations can enhance operational efficiency by reducing false positives: fraud analysts who can see why a transaction was flagged can more quickly differentiate genuinely fraudulent anomalies from rare but innocuous behavior by legitimate customers, saving time in the manual inspection of cases.
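As an illustration, the sketch below attaches each flagged transaction’s top-ranked features to its alert so an analyst can see at a glance why it was raised. It assumes the SHAP library and reuses the clf model, data splits, and fraud_probability scores from the earlier sketch; the 0.9 alert threshold is illustrative.

```python
# Sketch: attach the top contributing features to each flagged transaction so
# an analyst can quickly see *why* it was flagged.
import numpy as np
import shap

explainer = shap.TreeExplainer(clf)
flagged = X_test[fraud_probability > 0.9]

sv = explainer.shap_values(flagged)
# Depending on the shap version, this is either a list [class0, class1] or an
# array with a trailing class dimension; take the fraud-class slice either way.
fraud_sv = sv[1] if isinstance(sv, list) else sv[..., 1]

for i, (_, row) in enumerate(flagged.iterrows()):
    top = np.argsort(-np.abs(fraud_sv[i]))[:3]         # three strongest drivers
    reasons = ", ".join(f"{flagged.columns[j]}={row.iloc[j]}" for j in top)
    print(f"Alert {i}: top drivers -> {reasons}")
```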
Improve and Debug Models
Explainable AI can uncover a variety of problems with ML models. In fraud detection, data shifts (due to seasonal spending patterns, for example, or events like a natural disaster) can lead to concept drift, where the relationships the model learned no longer match the incoming data and its predictions degrade. Explainable AI enables improving, debugging, and more seamlessly reusing ML models, saving time and money.
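One simple way to put this into practice is to track how the model’s global feature attributions shift between a reference window and a recent window of transactions; a large shift is a hint of drift worth investigating. The sketch below reuses the SHAP explainer from the previous sketch, and the window DataFrames (X_reference_window, X_recent_window) and the 50% threshold are purely illustrative.

```python
# Sketch: a simple drift check comparing per-feature attributions on a
# reference window of transactions against a recent window.
import numpy as np

def mean_abs_attribution(explainer, X):
    """Average absolute SHAP contribution per feature over a window of data."""
    sv = explainer.shap_values(X)
    fraud_sv = sv[1] if isinstance(sv, list) else sv[..., 1]
    return np.abs(fraud_sv).mean(axis=0)

# Hypothetical DataFrames: e.g., last quarter vs. the most recent week.
baseline = mean_abs_attribution(explainer, X_reference_window)
current = mean_abs_attribution(explainer, X_recent_window)

relative_shift = np.abs(current - baseline) / (baseline + 1e-9)
for name, change in zip(X_reference_window.columns, relative_shift):
    if change > 0.5:  # illustrative alerting threshold
        print(f"Attribution drift on '{name}': {change:.0%} relative change")
```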
Head Off Adversarial Attacks
Explainable AI is one means of enhancing protection against adversarial attacks: for example, when a cybercriminal inserts crafted instances into a model’s training data, knowing they will disrupt learning and lead to misclassifications.
Challenges with Explainability
Explainable AI presents several challenges that must be carefully considered and addressed.
Time and Accuracy Trade-offs
Most notably, there are trade-offs among accuracy, explainability, and run time. In real-time systems such as fraud detection, the time required to generate an explanation matters. Typically, the larger the background dataset, the more reliable the explanation. Yet using the entire dataset is computationally expensive, so in most cases relying on a subsample of the data is preferable, even though this reduces the reliability of the explanation.
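The sketch below illustrates this trade-off with a model-agnostic SHAP explainer, comparing a full background dataset against a k-means summary of it. It reuses clf and the data splits from the earlier sketches; the choices of 50 clusters and 200 samples are illustrative, and actual timings will depend on the data and hardware.

```python
# Sketch: trading explanation reliability for run time by summarizing the
# background dataset before running a model-agnostic explainer.
import time
import shap

def predict_fraud(X):
    return clf.predict_proba(X)[:, 1]

# Full background: most reliable but expensive for KernelSHAP; a k-means
# summary trades some reliability for a large speed-up.
background_full = X_train                     # can be very slow on large data
background_small = shap.kmeans(X_train, 50)   # 50 representative points

instance = X_test.iloc[:1]                    # one transaction to explain
for label, background in [("full", background_full), ("kmeans-50", background_small)]:
    kernel_explainer = shap.KernelExplainer(predict_fraud, background)
    start = time.time()
    kernel_explainer.shap_values(instance, nsamples=200)
    print(f"{label} background: explained one transaction in {time.time() - start:.1f}s")
```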
Privacy
Financial datasets contain sensitive personal and corporate information, so the data is generally protected and anonymized, making it difficult to explore. In addition, decisions must be made about which confidential features to include, since explanations that are understandable to an end user could expose sensitive information and compromise privacy requirements.
Imbalanced Datasets
Another challenge of using machine learning for fraud detection is the imbalanced nature of the datasets. Fortunately, fraudulent cases are far rarer than genuine ones in financial data; however, this imbalance makes it harder for the model to learn fraudulent patterns.
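Two common mitigations, sketched below with illustrative scikit-learn settings and the train/test split from the first sketch, are to weight the rare fraud class more heavily during training and to evaluate the model on fraud-class precision and recall rather than raw accuracy. (Resampling techniques such as SMOTE are another option, not shown here.)

```python
# Sketch: countering class imbalance with class weighting and
# imbalance-aware evaluation metrics.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# 1) Weight the rare fraud class more heavily during training.
weighted_clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
weighted_clf.fit(X_train, y_train)

# 2) Judge the model on precision/recall for the fraud class, not accuracy:
#    a model that predicts "genuine" for everything can still look ~99% accurate.
print(classification_report(y_test, weighted_clf.predict(X_test), digits=3))
```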
Explainability Techniques
The article “Explainable Machine Learning for Fraud Detection,” by Ismini Psychoula, Andreas Gutmann, Pradip Mainali, S. H. Lee, Paul Dunphy, and Fabien A. P. Petitcolas, explores generating explanations for fraud detection systems based on two of the most prominent techniques, LIME and SHAP. The authors note that reliability and practical considerations for real time systems are relatively unexplored.
Attribution techniques such as LIME and SHAP explain a single prediction by ranking the features that most influenced it. (For more on SHAP values, stay tuned for our upcoming blog post.)
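As a rough illustration of what such an attribution looks like, the sketch below uses LIME to explain a single flagged transaction, ranking the conditions that most influenced the prediction. It assumes numeric features and reuses the clf model and data splits from the earlier sketches; the choice of five features is illustrative.

```python
# Sketch: explaining one transaction with LIME and printing the
# top-ranked feature contributions.
from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=list(X_train.columns),
    class_names=["genuine", "fraud"],
    mode="classification",
)

instance = X_test.iloc[0].values
explanation = lime_explainer.explain_instance(
    instance, clf.predict_proba, num_features=5
)

# Each entry is (feature condition, weight toward the fraud class).
for condition, weight in explanation.as_list():
    print(f"{condition}: {weight:+.3f}")
```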
The article concludes: “While SHAP gives more reliable explanations, LIME is faster. In real time systems it is not always feasible to explain everything. It may be beneficial to use a combination of both methods where LIME is utilized to provide real time explanations for fraud prevention and SHAP is used to enable regulatory compliance and examine the model accuracy in retrospective.”
Conclusion
In a world of escalating financial fraud, Explainable AI has the potential to make powerful, ML-based fraud detection a realistic option that aligns with regulations, ethics, and the need for accuracy.