Improving Completeness and Accuracy of Real World Data


The pharmaceutical industry is increasingly treating patient experience as a core dimension of bringing new drugs to market. Shifting patient expectations, combined with innovative technologies, will have a dramatic impact on drugs and healthcare in the coming years. To keep pace with these trends, pharma companies are turning to patient data to power their decision-making.

Real world data (RWD) accounts for 95% of patient data, as opposed to the meagre 5% covered by clinical trials. Pharma companies are spending close to USD 20 million annually on generating RWD-based insights. However, data fragmentation and non-standardized formats across RWD sources, coupled with incomplete and/or inaccurate data capture, raise concerns about the quality of RWD. In one such instance, a key biomarker had low coverage in one data source (<10%) but better coverage in another (>50%). We improved the coverage by experimenting with techniques such as Random Forest and Neural Networks to predict the values of the biomarker in the low-coverage dataset.

In parallel, there is a boom in the use of machine learning (ML) for data quality processes, which can aid stakeholders in overcoming the obstacles they face when consuming RWD. Various ML/DL algorithms can be implemented for the imputation of missing data, the prediction of variables completely absent from a data source, and the detection of anomalies, thereby improving the completeness and accuracy of the data. The effectiveness of these methods is measured through a combination of accuracy parameters, benchmarking against results from industry-standard publications, and improvement in the number of potential studies. In this webinar, we’ll explore:

  • What are the challenges in using Real World Data for product commercialization?
  • How can ML algorithms be leveraged to improve the quality of RWD sources?
  • What are the RWD elements (such as biomarkers) that could enrich a study based on patient data?


Speakers:

  • Bingcao Wu, M.S., Associate Director, Real-World Market Access Analytics, Janssen Scientific Affairs
  • Siddhant Deshmukh, Engagement Manager, Mu Sigma

Sponsored by:

Mu Sigma
