Healthcare Fraud Mitigation – How Data, GenAI and Machine Learning Can Help


In the U.S., healthcare fraud is an ongoing problem. In fact, the National Health Care Anti-Fraud Association (NHCAA) estimates that 3-10% of health insurance spending is lost to some form of healthcare fraud. Some government and law enforcement agencies place the loss at over $300 billion per year across the entire U.S. healthcare system. With widespread impacts on both individuals and businesses, fraud can raise health insurance premiums, expose people to unneeded medical procedures, and even increase taxes. And while government regulations and public-private partnerships are attempting to address this enormous problem, they could go further in leveraging innovation that is readily available to help.

In this post, we look at how AI and machine learning (ML) can help combat fraud, and why further development and innovation in this area could have an even greater payoff.


What Is Healthcare Fraud?

Healthcare fraud has many faces and forms. It can be committed by medical providers, patients, or others who carry out deliberate deceptions in order to receive illegal benefits, funds or payments. 

As the primary agency in the U.S. for investigating healthcare fraud, for both federal and private insurance programs, the FBI lists the most common types of healthcare fraud. These include:

Medical provider fraud

  • Double billing: Submitting multiple claims for the same service
  • Phantom billing: Billing for a service visit or supplies the patient never received

Patient (or other individual) fraud

  • Identity theft/identity swapping: Using another person’s health insurance 
  • Impersonating a healthcare professional: Billing for health services without a license

Prescription fraud

  • Forgery: Creating or using forged prescriptions
  • Diversion: Diverting legal prescriptions for illegal uses, such as reselling
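
Some of these patterns leave a visible trace in claims data. As a minimal sketch, possible double billing can be surfaced by grouping claims that share the same provider, patient, service code and date. The record layout and field names below are hypothetical and the data is made up; a real payer would also need to account for legitimate repeat services, so a match here is only a lead for review, not proof of fraud.

```python
from collections import defaultdict

# Hypothetical claim records; field names and values are illustrative only.
claims = [
    {"claim_id": "C1", "provider": "P100", "patient": "M1", "code": "99213", "date": "2023-03-01"},
    {"claim_id": "C2", "provider": "P100", "patient": "M1", "code": "99213", "date": "2023-03-01"},
    {"claim_id": "C3", "provider": "P200", "patient": "M2", "code": "80053", "date": "2023-03-02"},
]


def find_possible_double_billing(claims):
    """Return lists of claim IDs that share provider, patient, code and date."""
    groups = defaultdict(list)
    for claim in claims:
        key = (claim["provider"], claim["patient"], claim["code"], claim["date"])
        groups[key].append(claim["claim_id"])
    # Keep only the groups where the same service was claimed more than once.
    return [ids for ids in groups.values() if len(ids) > 1]


duplicates = find_possible_double_billing(claims)
print(duplicates)
```

Phantom billing and identity swapping are much harder to spot with a rule like this, which is part of why the data sharing and ML approaches discussed later in this post matter.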


The Role of Regulations in Healthcare Fraud

Recognizing the extensive damage caused by healthcare fraud, the federal government has taken steps to prevent fraud, waste and abuse across the healthcare system.

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) mandated the establishment of a national Health Care Fraud and Abuse Control Program (HCFAC). HCFAC coordinates federal, state and local law enforcement activities. The HCFAC reports that in 2022 more than $1.7 billion was returned to the federal government or paid to individuals thanks to its healthcare anti-fraud and abuse activities.

A number of laws deter and punish healthcare fraud. Fraud can be prosecuted under both civil statutes (False Claims Act, Physician Self-Referral Law) and criminal statutes (Criminal Healthcare Fraud Statute, Anti-Kickback Statute). 

Many states have also strengthened their insurance fraud laws and penalties, and now require health insurers to meet certain standards of fraud detection, investigation and referral in order to remain licensed in the state.


Public-Private Partnerships 

As we discussed in a previous post on insurance fraud, collaboration is key to combating fraud. Healthcare data can be categorized as practitioners’ data, administrative claims data, and clinical data. Together these three data sources form a near-complete picture of fraud activity, yet it is extremely challenging to bring them all together.

Several organizations are working to fight healthcare fraud with public-private partnerships.

National Health Care Anti-Fraud Association

The NHCAA is a private-public organization that brings together the anti-fraud units of most private health payers in the U.S., as well as the relevant federal and state law enforcement and regulatory agencies. By promoting private-public cooperation at both the case and policymaking levels, and by enabling the sharing of investigative information among health insurers and government agencies, the NHCAA aims to improve the ability of both sectors to detect and prevent healthcare fraud and abuse.

Healthcare Fraud Prevention Partnership

The Healthcare Fraud Prevention Partnership (HFPP) is a voluntary, public-private partnership between the U.S. government, state agencies, law enforcement, private health insurance plans, and healthcare anti-fraud associations. The HFPP aims to identify and reduce fraud, waste, and abuse across the healthcare sector by leveraging data acquisition and aggregation, information sharing, and cross-payer research studies.

To facilitate collaboration, the HFPP’s Trusted Third Party information technology infrastructure from GDIT is maintained in an Amazon Web Services (AWS) Infrastructure as a Service environment that is accredited under the Federal Risk and Authorization Management Program (FedRAMP).


AI for Fraud Detection and Prevention

As referenced above, detecting and preventing healthcare fraud necessitates acquiring and analyzing large amounts of data. Not surprisingly, organizations are increasingly looking to AI and machine learning (ML) to derive the most value from that data. For example, a recent report describes the new legal scope of the HFPP to generate risk scores, potentially using advanced fraud detection derived from ML.  

However, organizations working to leverage ML for better fraud detection face some significant challenges.

Diversity of data and systems

Technological innovation is especially challenging in the U.S. due to complex, varied data systems and diverse health models. The different systems and identifiers involved in data collection and extraction pose obstacles to integrating data from different sources.

Research shows that when it comes to ML for fraud detection, investing more in the data understanding and preparation phases leads to better results.
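
To make the identifier problem concrete, here is a small sketch of one data preparation step: normalizing provider identifiers so that records from two systems can be joined. The two sources, their field names and the ID values are all made up for illustration, and real record linkage is considerably harder than stripping punctuation.

```python
import re


def normalize_provider_id(raw):
    """Keep only the digits so differently formatted IDs compare equal."""
    return re.sub(r"\D", "", raw)


# Two hypothetical sources that write the same provider ID in different ways.
claims_source = [
    {"provider_id": "NPI-1234567893", "claim_amount": 250.0},
    {"provider_id": "1999999984", "claim_amount": 90.0},
]
registry_source = [
    {"npi": " 1234567893 ", "specialty": "Cardiology"},
]

# Index the registry by normalized ID, then try to match every claim against it.
registry_by_id = {normalize_provider_id(r["npi"]): r for r in registry_source}

linked, unmatched = [], []
for claim in claims_source:
    match = registry_by_id.get(normalize_provider_id(claim["provider_id"]))
    if match is not None:
        linked.append({**claim, "specialty": match["specialty"]})
    else:
        unmatched.append(claim)

print(len(linked), "linked;", len(unmatched), "unmatched")
```

Tracking the unmatched records is as useful as the join itself, since they show where the sources disagree before any model is trained.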

Silos can also pose a problem, such as between statisticians and data analysts on one side and auditors and investigators on the other. 

Need for explainability

It is important to be able to show that fraud detection is fair. Complex algorithms can be hard for people throughout the anti-fraud process to understand and use. More generally, healthcare datasets are large, high-dimensional and heavily imbalanced, with legitimate claims far outnumbering fraudulent ones, which makes them challenging to model.
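
The imbalance problem is easy to illustrate. In the sketch below, which uses entirely synthetic labels (1,000 claims, of which 20 are fraudulent), a "model" that never flags anything scores higher on accuracy than one that actually catches most of the fraud. This is why precision and recall tend to be more informative than accuracy for this kind of problem. The numbers are invented for illustration and do not come from any real dataset.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Synthetic ground truth: 20 fraudulent claims followed by 980 legitimate ones.
y_true = [1] * 20 + [0] * 980

# Baseline that never flags a claim.
never_flag = [0] * 1000

# Candidate that catches 15 of the 20 frauds and wrongly flags 30 legitimate claims.
some_flags = [1] * 15 + [0] * 5 + [1] * 30 + [0] * 950

baseline = evaluate(y_true, never_flag)
candidate = evaluate(y_true, some_flags)
print("never flag:", baseline)
print("candidate: ", candidate)
```

The baseline reaches 98% accuracy with zero recall, while the candidate has slightly lower accuracy but finds three quarters of the fraud.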

Explainable AI can foster more transparency, although it also introduces technical challenges. Recent research presents careful feature selection as a step toward enabling more transparent, explainable models. 


Data privacy and quality

HIPAA regulations around protected health information (PHI) mean that most fraud detection models discussed in research literature are based on either synthetic data or data collected in a de-identified manner, such as large Medicare datasets. The fraud detection models developed using such aggregated data extracts can be difficult to apply in the real world. 

While detection methods based on publicly available data can be difficult to leverage in a business setting, methods using private data sources (electronic health records, private insurance data) are limited by data privacy and legal issues, and thus are also difficult to replicate in the real world.


Privacy-Preserving Data Collaboration

Despite all the measures discussed above, Medicaid and the Children’s Health Insurance Program (CHIP), for example, showed a generally steady increase in the percentage of improper payments from 2012 to 2019. There is a clear need for further steps and innovation.

New technologies can support information sharing and research that is more applicable in the real world. For example, privacy-enhancing technologies (PETs), such as secure multiparty computation, support data collaboration for data analytics and ML. PETs enable secure, privacy-preserving computations in which the collaborating parties do not need to share their data or expose it to each other.
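
To give a feel for how parties can compute together without exposing their inputs, here is a toy sketch of additive secret sharing, one of the building blocks commonly used in secure multiparty techniques. It is an illustration only and not a description of any particular product: the payer names and counts are invented, the modulus is an arbitrary choice, and the sketch ignores networking and protections against dishonest participants.

```python
import secrets

# Public modulus that all parties agree on (an arbitrary choice for this sketch).
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split a value into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; any incomplete subset reveals nothing about the value."""
    return sum(shares) % MODULUS


# Each payer's private count of suspicious claims for one provider (made up).
private_counts = {"payer_a": 12, "payer_b": 7, "payer_c": 30}
n = len(private_counts)

# Every payer splits its count and sends one share to each party.
all_shares = [share(count, n) for count in private_counts.values()]

# Each party adds up the shares it received, never seeing another payer's count.
partial_sums = [sum(column) % MODULUS for column in zip(*all_shares)]

# Only the combined partial sums reveal the total across all payers.
total = reconstruct(partial_sums)
print(total)  # 49
```

Each individual share is a uniformly random number, so a party holding one share of another payer's count learns nothing from it; only the final total is revealed.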

PETs are already being used with healthcare data, offering both reassurance and use cases to the healthcare anti-fraud ecosystem.

In one case, a large healthcare provider wanted to facilitate early prognosis of cardiovascular diseases by using more private data. However, privacy laws like HIPAA and confidentiality concerns prevent contributors from sharing data outside of organizational firewalls. Inpher’s XOR platform enabled researchers to privately compute across organizational data sources in order to increase both sample size and patient attributes, leading to improved model performance and disease prognosis.

Through ongoing innovation, including advanced privacy-preserving technologies, the fraud detection ecosystem can continue to advance in the fight against costly healthcare fraud and abuse.

To learn about Inpher’s Privacy-Preserving AI and ML solutions, visit our website.