Up to 10% of the pharmacy claims submitted to health plans and insurance companies are estimated to be fraudulent. A two-step process that combines unsupervised and supervised learning techniques can be used to identify fraud, waste and abuse in the pharmacy industry effectively. The claims that pharmacies submit to insurance companies and health plans are rich in signals that enable prediction of fraud, waste and abuse. Suspicious activity can be identified by detecting anomalies in the data with unsupervised techniques such as clustering, univariate and multivariate outlier analysis, link analysis and simulated fraud signatures.
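As a rough illustration of the univariate outlier analysis mentioned above, the sketch below flags pharmacies whose billed totals sit far from the median, using a modified z-score based on the median absolute deviation. It is plain Python rather than Spark, and the pharmacy IDs, amounts, function names and the 3.5 threshold are all assumptions made for the example, not details from the talk.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-score of each value, based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(totals_by_id, threshold=3.5):
    """Return the IDs whose absolute modified z-score exceeds the threshold."""
    ids = list(totals_by_id)
    scores = modified_z_scores([totals_by_id[i] for i in ids])
    return {i for i, s in zip(ids, scores) if abs(s) > threshold}


# Hypothetical monthly billed totals per pharmacy (illustrative values only).
monthly_claim_totals = {
    "PH001": 1200.0,
    "PH002": 1350.0,
    "PH003": 1100.0,
    "PH004": 1280.0,
    "PH005": 9800.0,
}

suspicious = flag_outliers(monthly_claim_totals)
print(suspicious)
```

A median-based score is used here because a single extreme biller would inflate an ordinary mean and standard deviation and could mask itself.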
Where known cases of fraud, waste and abuse are available, supervised learning techniques such as Random Forest and Neural Networks can be used to identify transactions that resemble historical fraud signatures. The outputs of these techniques are then combined to generate a fraud score for each player (member, pharmacy and prescriber).
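The abstract does not say how the outputs are combined, so the following is only one possible sketch: a weighted average of an unsupervised anomaly score and a supervised fraud probability, both assumed to be scaled to the range 0 to 1. The weights, the entity IDs and the input scores are hypothetical.

```python
def combine_scores(anomaly_score, supervised_prob,
                   w_unsupervised=0.4, w_supervised=0.6):
    """Weighted average of two scores in [0, 1]; the weights are an assumption."""
    for name, value in (("anomaly_score", anomaly_score),
                        ("supervised_prob", supervised_prob)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total = w_unsupervised + w_supervised
    return (w_unsupervised * anomaly_score + w_supervised * supervised_prob) / total


# Hypothetical per-player inputs: (anomaly score, supervised fraud probability).
players = {
    ("pharmacy", "PH005"): (0.9, 0.8),
    ("prescriber", "DR17"): (0.2, 0.1),
    ("member", "M1042"): (0.5, 0.5),
}

fraud_scores = {
    player: round(combine_scores(anomaly, prob), 3)
    for player, (anomaly, prob) in players.items()
}

# List players from most to least suspicious.
for player, score in sorted(fraud_scores.items(), key=lambda kv: -kv[1]):
    print(player, score)
```

In practice the weights would need to be tuned against investigated cases; a fixed 40/60 split is used here only to keep the example small.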
In this talk, we’re going to illustrate how machine learning (Spark MLlib and GraphX) was used to identify suspicious activity such as co-conspiracies among pharmacies, prescribers (doctors) and others to commit fraud. We’re also going to demonstrate how the fraud score was determined in this pharmacy claims fraud detection application.
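One simple form of link analysis that could surface candidate co-conspiracies is to group pharmacies and prescribers that are connected through shared claims. The sketch below does this with a small union-find in plain Python; it is a stand-in for a graph library such as GraphX, not the talk's implementation, and the edges and IDs are hypothetical.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical (pharmacy, prescriber) pairs that appear together on claims.
claim_links = [
    ("PH005", "DR17"),
    ("PH005", "DR22"),
    ("PH009", "DR22"),
    ("PH001", "DR03"),
]

components = sorted(connected_components(claim_links), key=len, reverse=True)
for group in components:
    print(sorted(group))
```

A component on its own is not evidence of collusion, since legitimate pharmacies and prescribers also share patients; a real pipeline would presumably weight edges by claim volume or by the fraud scores of the players involved before reviewing a group.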
Session hashtag: #DS9SAIS
https://databricks.com/session/pharmacy-claims-fraud-detection-using-apache-spark