A Novel Interpretable Machine Learning Approach as a Commercial Decision Support Tool

Avgoustinos Filippoupolitis PhD; Michael Kusnetsov PhD; Nicola Lazzarini PhD; Hariklia Eleftherohorinou PhD—Machine Learning & Artificial Intelligence Solutions Global Unit, Real World Solutions, IQVIA
Interpretability of a machine learning (ML) model is of high importance, as it enables users to understand which features contribute to a prediction. Because the ML model is no longer seen as a ‘black-box’, interpretability promotes trust and provides actionable insights into the model’s outputs. In this work we present an interpretability approach that goes beyond global feature contribution and allows the attribution of the relative importance of ML drivers to individual predictions and to population sub-groups. We demonstrate the results of our approach on an ML model based on Gradient Boosting Trees, trained to classify Heart Failure with Preserved Ejection Fraction (HFpEF) patients. We further demonstrate how our approach enables the identification of sub-cohorts for which a feature is important although its global relative importance is low, which allows high-value market segments to be identified.

Keywords: Model Interpretability, SHAP Values, Feature Importance, Real World Data, Gradient Boosting Trees

1. Background
Machine Learning (ML) applications show strong potential as commercial decision support tools, with a growing body of literature demonstrating that ML outperforms traditional commercial analytics. As ML adoption in commercial applications increases, interpreting the results and understanding the patterns identified by the ML model become essential for translating ML outputs into actions that improve patient outcomes and deliver commercial impact. However, most ML interpretation approaches focus on calculating the average relative importance of the ML drivers across an entire population and fall short in interpreting targeted sub-populations of high value. This matters most in rare diseases and specialty brands, where populations are small, diverse, and at times hard to specify with medical codes. Launch and commercial teams then struggle to understand the ML inputs and outputs and to develop effective targeted strategies, which often results in overlooked high-value market segments and missed commercial opportunities.
2. Objectives
The novel approach presented in this paper takes traditional ML feature attribution techniques one step further: it attributes the relative importance of ML drivers to specific population sub-groups. It builds on previously published work that successfully used ML to identify commercially viable patient populations, and now helps explain the patterns learned by the model so that they can be translated into actions. We compare results with traditional approaches for calculating the global relative importance of ML drivers, and we discuss the benefits of our approach in both flexibility and interpretability. Annual health care marketing spending increased from $17.7 billion in 1997 to $29.9 billion in 2016, with direct-to-consumer advertising for prescription drugs increasing from $2.1 billion to $9.6 billion during the same period.1 Against this background, our decision support tool for sub-population identification and ML interpretation can have a strong impact on optimizing resource allocation and increasing revenue for pharmaceutical companies.
3. Data
We employed patient-level data that were extracted from transactional IQVIA US prescription and medical claims between 2010 and 2019. Prescription data are received from pharmacies, and contain information such as product, provider, age, gender, and date of service. Medical claims data are derived from office-based professionals, ambulatory and general health care sites, and include diagnosis and procedure information. The size of the dataset was 18.1 million patients, which is more than 400 times larger than the next largest dataset reported in the literature for Heart Failure patient classification.2
4. Methods
To construct a dataset for supervised learning, patients diagnosed with Heart Failure with Preserved Ejection Fraction (HFpEF) between 2015 and 2019 are defined as positive, while non-diagnosed patients are considered negative. We demonstrate results on a binary classifier based on Gradient Boosting Trees3 for diagnosing HFpEF. HFpEF is a complex clinical condition, manifested by signs of heart failure, left ventricular diastolic dysfunction, and a preserved left ventricular systolic function.4 It has been predicted that 50% of hospitalized heart failure patients in the US will have HFpEF by 2020.5 We used features capturing information on demographics, treatments, procedures and symptomatology, including temporal associations between the timing of events. These features were selected based on clinical expert opinion as potential risk factors for HFpEF. After applying a 1% prevalence filter, the total number of features was 98.
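
The prevalence filter can be sketched as follows. This is a minimal illustration rather than the pipeline used in this work: it assumes that "prevalence" means the share of patients with a non-missing, non-zero value for a feature, and the feature names and the `prevalence_filter` helper are hypothetical.

```python
def prevalence_filter(rows, min_prevalence=0.01):
    """Keep features that are present for at least `min_prevalence` of patients.

    `rows` is a list of dicts mapping feature name -> value, one per patient.
    Assumption: a feature counts as present when its value is neither None nor 0.
    """
    n_patients = len(rows)
    counts = {}
    for row in rows:
        for name, value in row.items():
            if value is not None and value != 0:
                counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, c in counts.items() if c / n_patients >= min_prevalence)


# Toy data: 1,000 hypothetical patients.
patients = [
    {
        "diuretic_use": 1 if i % 4 == 0 else 0,  # present for 25% of patients
        "rare_procedure": 1 if i < 5 else 0,     # present for 0.5% of patients
    }
    for i in range(1000)
]
kept = prevalence_filter(patients)
print(kept)
```

In this toy example only the first feature clears the 1% threshold, so the second would be dropped before training.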

Table 1. Characteristics of Dataset
| Characteristic | Patients with HFpEF | Patients without HFpEF |
| --- | --- | --- |
| Age (mean) | 69.74 | 64.57 |
| Age (std) | 8.46 | 9.32 |
| Gender (% male) | 45.79% | 45.84% |
| Gender (% female) | 54.21% | 54.16% |
| Count | 1,646,563 | 16,465,630 |

Table 1 summarizes the characteristics of our dataset, along with the class ratio. We identified the best model hyper-parameters using a Bayesian optimization approach.6 In particular, we use a Tree-Structured Parzen Estimator (TPE) algorithm to explore the hyper-parameter space. Traditionally, hyper-parameter selection is based on grid search, an exhaustive search over a specified subset of hyper-parameter values. The Bayesian optimizer instead evaluates candidate values iteratively and uses the results of earlier evaluations to decide which region of the search space to try next. The TPE algorithm has been shown to outperform both grid search and random search over the configuration space of hyper-parameters. However, the performance of the Bayesian optimization approach depends on the probability distributions that define the domain of hyper-parameters over which to search.
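
For reference, the criterion that TPE optimizes is usually written as below. This is the standard formulation of the algorithm, not a description of the exact configuration used in this work; the symbols are notation introduced here, with $x$ a hyper-parameter configuration, $y$ the validation loss it produces, $y^{*}$ a loss threshold, and $\gamma$ the fraction of trials treated as good.

```latex
p(x \mid y) =
\begin{cases}
\ell(x) & \text{if } y < y^{*} \\
g(x)    & \text{if } y \ge y^{*}
\end{cases},
\qquad
p(y < y^{*}) = \gamma,
\qquad
\mathrm{EI}_{y^{*}}(x) \propto \left( \gamma + \frac{g(x)}{\ell(x)} \, (1 - \gamma) \right)^{-1}
```

Each iteration therefore proposes the configuration that is most likely under the density of good trials $\ell$ relative to the density of the remaining trials $g$. Both densities are built from the prior distributions specified for the search space, which is why the choice of those distributions affects the result.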

We identified and analyzed key drivers of the trained model using SHapley Additive exPlanations (SHAP) values, an interpretability approach based on Shapley values from cooperative game theory.7 SHAP values describe how each feature used for modeling contributes to any individual prediction made by the model. The approach is model-agnostic, and efficient implementations exist for tree-based models such as Gradient Boosting Trees. SHAP values have two significant advantages over other existing interpretability methodologies. Firstly, they have a rigorous theoretical underpinning: they are the unique additive feature attribution that satisfies local accuracy, missingness and consistency.7 Secondly, they enable a much wider suite of analytic and visualization techniques, as we show below.
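
To make the idea of per-prediction attribution concrete, the sketch below computes exact Shapley values for a single prediction by enumerating feature subsets. It is an illustration under stated assumptions, not the tree-specific algorithm used in practice: the three-feature `toy_model`, the reference patient in `background`, and the rule of replacing absent features with background values are all hypothetical choices.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for one instance `x` by enumerating feature subsets.

    Features outside a subset are replaced by the `background` value
    (a simplifying assumption made for this sketch).
    """
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their real value.
        mixed = [x[i] if i in subset else background[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    # Hypothetical risk score: age in decades, hypertension flag, dyspnea flag.
    age, hypertension, dyspnea = features
    return 0.1 * age + 0.5 * hypertension + 0.2 * hypertension * dyspnea


x = [8.0, 1.0, 1.0]           # the patient being explained
background = [6.0, 0.0, 0.0]  # reference patient
phi = shapley_values(toy_model, x, background)
base = toy_model(background)
print(phi, base + sum(phi), toy_model(x))
```

The attributions sum to the difference between the patient's prediction and the reference prediction, which is the local accuracy property; the interaction term is split equally between the two features involved.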

Figure 1: ROC Curve of the Predictive Model and the Area Under the Curve Value (AUC)

5. Results
Figure 1 shows the Receiver Operating Characteristic (ROC) curve, which confirms that our approach performed well in identifying HFpEF patients, with an Area Under the Curve (AUC) value of 0.939. As our dataset is imbalanced, with a class ratio of 1:10 between positive (HFpEF) and negative (non-HFpEF) patients (see Table 1), the Precision-Recall curve is a more informative evaluation than the ROC curve, because precision directly reflects the number of false positives relative to true positives.8
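
The following sketch shows how the AUC can be read as a ranking probability and what the class ratio implies for precision. The class counts are taken from Table 1; the six labels and scores are made up for illustration, and `roc_auc` is a hypothetical helper rather than the evaluation code used in this work.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class counts from Table 1.
n_pos, n_neg = 1_646_563, 16_465_630
prevalence = n_pos / (n_pos + n_neg)
print(round(prevalence, 4))  # expected precision when patients are ranked at random

# Hypothetical scores for six patients (1 = HFpEF, 0 = non-HFpEF).
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))
```

With a prevalence of about 9.1%, a random ranking yields a precision of roughly 0.09 at every recall level, which is the baseline against which a precision-recall result on this dataset should be read.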

Figure 2: Precision-Recall Curve of the Predictive Model and the Average Precision of the Model

Figure 2 depicts the Precision-Recall curve for our model, with an Average Precision score of 0.672. In particular, the ML model achieved 91% and 86% precision at 10% and 20% recall, respectively, in identifying HFpEF patients and their sub-populations, improving by more than 20% on the performance reported in the literature.9
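
Precision at a fixed recall and average precision can both be read off a ranked list of patients, as in the sketch below. The ten labels and scores are invented, the helper names are hypothetical, and the code assumes that no two patients share the same score.

```python
def precision_at_recall(labels, scores, target_recall):
    """Precision at the first score threshold whose recall reaches `target_recall`."""
    ranked = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    true_pos = 0
    for k, (_, y) in enumerate(ranked, start=1):
        true_pos += y
        if true_pos / total_pos >= target_recall:
            return true_pos / k
    raise ValueError("target recall is never reached")


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), reverse=True)
    true_pos, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / k
    return total / true_pos


# Hypothetical patients, already listed from highest to lowest score.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.85, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
print(precision_at_recall(labels, scores, 0.5))
print(precision_at_recall(labels, scores, 0.75))
print(average_precision(labels, scores))
```

Reporting precision at low recall, as above, describes the quality of the highest-scored patients, which is the group a commercial team would act on first.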

Figure 3: The Top 15 Features Contributing to the Predictions; The X-Axis Shows Mean Absolute SHAP Values for Each Feature in the Test Set

To illustrate the utility of our interpretability approach, we first apply it to the entire population. Figure 3 shows the top fifteen features contributing to the prediction of HFpEF, where the value of each feature is its mean absolute SHAP value over the test set. This is comparable to the insights produced by traditional ML interpretation approaches, which focus on calculating the average relative importance of the ML drivers across an entire population. Beyond global feature importance attribution, our approach can also attribute the relative importance of ML drivers to specific sub-populations. The model features shown in Figure 3 are in accordance with prior research and with guidelines10 recommending control of HFpEF symptoms with diuretics as well as management of comorbidities, including hypertension, because these appear to be the drivers of the inflammation that lies at the root of the condition.
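
The global ranking described above reduces to a simple aggregation once per-patient SHAP values are available. The sketch below assumes a small matrix of made-up SHAP values with hypothetical feature names; `mean_abs_shap` is an illustrative helper, not a library function.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value over all patients."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical feature names and made-up SHAP values, one row per patient.
feature_names = ["age", "days_since_hypertension", "days_since_dyspnea"]
shap_rows = [
    [0.30, 0.20, 0.00],
    [-0.10, 0.25, 0.00],
    [0.40, 0.15, 0.00],
    [-0.20, 0.20, 0.60],
]
ranking = mean_abs_shap(shap_rows, feature_names)
for name, value in ranking:
    print(name, round(value, 3))
```

Taking the absolute value discards the direction of each contribution, and averaging hides patients such as the last one, for whom the lowest-ranked feature carries the largest contribution. Both are reasons to look beyond the global ranking.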

Figure 4: Dependence Plot for Age-Based Population Segmentation Identifies a Patient Sub-Group for Which Age Has High Significance in Predicting Them as Diagnosed, Compared to the Global Population

A first example of this capability is depicted in Figure 4, which shows the contribution of Age to the predictions. The x axis depicts the age of each patient, while the y axis denotes the importance (SHAP value) of that age for predicting the patient as diagnosed with the disease. Importance values near the dotted horizontal line (the zero-contribution boundary) do not contribute significantly to the prediction. Values above the boundary push the prediction towards a positive diagnosis, while values below it push the prediction towards a negative diagnosis. Our interpretability approach therefore not only highlights age as a globally important feature, but also identifies a specific group of patients (80 years old) for which age is more crucial in predicting them as diagnosed with the disease than it is for the rest of the population. This can guide commercial teams in designing physician education strategies around points of intervention, supporting earlier diagnosis and treatment of highly probable HFpEF patients and the further classification of HFpEF patients into sub-populations.
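
A dependence plot pairs each patient's feature value with that feature's SHAP value, and a numerical summary of it can be obtained by averaging within value bins. The sketch below uses invented ages and SHAP values that merely mimic the shape described above; the `mean_shap_by_bin` helper and the ten-year bin width are assumptions made for illustration.

```python
from collections import defaultdict


def mean_shap_by_bin(values, shap_values, bin_width):
    """Average SHAP value of one feature within bins of that feature's value."""
    bins = defaultdict(list)
    for v, s in zip(values, shap_values):
        lower = int(v // bin_width) * bin_width  # lower edge of the bin
        bins[lower].append(s)
    return {lower: sum(s) / len(s) for lower, s in sorted(bins.items())}


# Made-up ages and the corresponding SHAP values of the age feature.
ages = [52, 58, 63, 67, 71, 76, 80, 80, 80, 85]
age_shap = [-0.30, -0.25, -0.10, -0.05, 0.05, 0.10, 0.60, 0.55, 0.65, 0.20]
profile = mean_shap_by_bin(ages, age_shap, bin_width=10)
for lower, mean_value in profile.items():
    print(f"{lower}-{lower + 9}: {mean_value:+.3f}")
```

Bins whose mean lies well above zero mark the age ranges in which the feature pushes predictions towards a positive diagnosis.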

Figure 5: Dependence Plot for Days from First Occurrence of Hypertension, Indicates that this Feature’s Contribution Is Significant Across All Patients

To further demonstrate how our interpretability approach identifies patient sub-groups, Figure 5 illustrates the feature contribution significance for the days since first occurrence of hypertension. As expected, the feature contribution varies for different values of the days since first occurrence; however, we can confirm that the contribution of this feature is significant across all patients, as the importance values are consistently above the zero-contribution boundary. We should also note that the accumulation of negative importance values near the start point of the x axis corresponds to the subgroup of patients that do not have hypertension present in their history, and indicates that the absence of this feature is contributing towards making a negative diagnosis.

Figure 6: Dependence Plot for First Occurrence of Dyspnea on Exertion, Identifies Patient Sub-Groups for which Dyspnea Has High Significance, Although the Global Relative Importance of Dyspnea is Low

Our interpretability approach can also reveal useful insights for features with a low global relative importance. An example is depicted in Figure 6, which illustrates the feature contribution significance for the days since first occurrence of dyspnea on exertion. As we can confirm from Figure 3, this feature has a low global contribution significance. We can also note that for the sub-population that had the first occurrence of dyspnea within the last two years before HFpEF diagnosis, dyspnea was indeed not important in predicting them as diagnosed with the disease. However, for the sub-population that had the first occurrence of dyspnea three years before HFpEF diagnosis, dyspnea was an important factor in predicting them as diagnosed with the disease. These results illustrate how we can combine the attribution of the relative importance of ML drivers to specific population sub-groups, to identify sub-cohorts for whom a specific feature is important even though traditional approaches calculate the global relative importance of the feature to be low. This would allow pharma companies to identify high-value market segments that traditional approaches often fail to target.
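
One way to flag such sub-cohorts programmatically is to compare a feature's mean absolute SHAP value inside a candidate sub-group with its global mean. The sketch below does this for invented data; the `subgroup_lift` helper, the 730-day cut-off and all numbers are hypothetical, chosen only to echo the pattern described above.

```python
def subgroup_lift(values, shap_values, in_subgroup):
    """Ratio of mean |SHAP| inside a sub-group to mean |SHAP| over all patients."""
    global_mean = sum(abs(s) for s in shap_values) / len(shap_values)
    selected = [abs(s) for v, s in zip(values, shap_values) if in_subgroup(v)]
    return (sum(selected) / len(selected)) / global_mean


# Made-up data: days since first dyspnea on exertion and the feature's SHAP values.
days_since_dyspnea = [0, 0, 0, 0, 0, 0, 200, 500, 1100, 1150]
dyspnea_shap = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.02, -0.02, 0.50, 0.46]

# Candidate sub-group: first occurrence more than two years (730 days) ago.
lift = subgroup_lift(days_since_dyspnea, dyspnea_shap, lambda d: d > 730)
print(round(lift, 2))
```

A lift well above 1 indicates a sub-cohort in which the feature matters far more than its global ranking suggests; in practice the candidate sub-groups would be chosen from the dependence plot or from clinical input.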
6. Business Implications
The enhanced insights enabled by our approach benefit commercial teams by helping them interpret ML outcomes, identify relevant patients, and find intervention points across the patient journey for early diagnosis and treatment. They also help design effective physician education strategies and improve the efficiency of marketing strategies. This approach can find wide application across commercial uses of ML and can help bridge the gap from ‘black-box’ to ‘glass-box’ ML, as ML becomes increasingly embedded in commercial decision making. The proposed approach can also be applied to rare diseases, subject to the limitations related to class balance. In particular, because the number of patients with a rare disease is very limited, the positive cohort will be small compared to the negative cohort. This presents a challenge for all ML algorithms, as the trained model must achieve high predictive performance while learning from a highly imbalanced dataset.

One limitation of the presented approach is that its usefulness depends on the accuracy of the underlying trained model. When a model cannot accurately predict whether a patient will be diagnosed with the disease, the calculated global and individual feature contributions provide insights that are less credible. A second limitation relates to combinations of features and the significance of their joint contribution. Currently, our approach attributes the relative importance of single ML drivers (features) to sub-populations. Extending this calculation to combinations of features is a direction for future work.
7. Conclusions
We presented an ML interpretation approach that takes ML a step further by helping users understand the drivers and patterns behind the model predictions. We applied our approach to a complex disease and market, Heart Failure, and demonstrated its ability to identify the drivers behind predictions for population sub-groups, with important implications for understanding these sub-populations and acting on them in a timely manner. Our approach can identify patient sub-populations of high value, expose the drivers behind the HFpEF diagnosis and highlight patients for which a specific drug is likely to yield improved outcomes.
About the Authors
Avgoustinos Filippoupolitis leads the Data Science group of IQVIA Machine Learning & Artificial Intelligence Solutions, focusing on research, design and development of scalable machine learning methodologies for a wide range of areas including but not limited to digital disease diagnostics, HCP segmentation and targeting, and decision support algorithms. He is responsible for the design and development of Machine Learning methodologies and algorithms, using Real World Evidence (RWE), patient-level, and commercial datasets. Avgoustinos holds an MEng in Electrical and Computer Engineering from the National Technical University of Athens, and an MSc in Signal Processing from Imperial College London. He obtained his PhD in Emergency Simulation and Decision Support Algorithms at Imperial College London. Before joining IQVIA, Avgoustinos was a Senior Lecturer in Disruptive Technologies at the University of Greenwich.

Michael Kusnetsov is a Data Scientist at IQVIA with 10 years of multi-disciplinary experience. He conducts research for the machine learning team, focusing on model performance and interpretability. Michael has worked in the legal and healthcare industries with exposure to the financial and regulatory sectors. He received his PhD in Financial Mathematics from London School of Economics as well as MRes in Financial Computing from University College London and MSc in Mathematical Finance from Imperial College London. Michael is also a non-practicing qualified solicitor.

Nicola Lazzarini is a Machine Learning Data Scientist with a computational and bioinformatics background, working at IQVIA since 2018. He conducts research-related and client projects that involve the development of highly predictive and interpretable machine learning models. Nicola has over 7 years of experience in applying machine learning techniques to biomedical and pharma data, both in the academic and industrial fields. He has published various research papers in high quality academic journals and top-tier conferences, including Nature Biotechnology and BMC Bioinformatics. His work has been cited by 200+ research papers. Nicola received his PhD in Computer Science from Newcastle University, UK. He also received both a BSc and an MSc in Computer Engineering from the University of Padova, Italy.

Hariklia Eleftherohorinou is the global Head of AI and Machine Learning Solutions on Real World Data at IQVIA. The AI product portfolio spans several application areas including but not limited to cost-effective study design, commercial strategy development, AI-led disease diagnostics and screening, patient identification, sales force effectiveness, and AI-driven brand optimization. She is a trusted advisor to clients, guiding and supporting them as they build their Predictive and AI/ML capabilities, translating predictions and insights into actions and informed decision making following a ‘glass-box’ approach. Prior to IQVIA, Hariklia was the Data Science and Advanced Analytics Practice Leader at Deloitte UK Consulting. She holds an MEng in Electrical and Computer Engineering from the Aristotle University of Thessaloniki, and an MSc in Bioinformatics & Systems Biology and a PhD in Machine Learning in Medicine from Imperial College London.
1 Schwartz LM, Woloshin S. Medical marketing in the United States, 1997-2016. JAMA. 2019 Jan 1;321(1):80-96.

2 Ahmad T, Lund LH, Rao P, Ghosh R, Warier P, Vaccaro B, Dahlström U, O’Connor CM, Felker GM, Desai NR. Machine learning methods improve prognostication, identify clinically distinct phenotypes, and detect heterogeneity in response to therapy in a large cohort of heart failure patients. Journal of the American Heart Association. 2018 Apr 12;7(8):e008081.

3 Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu TY. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems 2017 (pp. 3146-3154).

4 Huis AE, De Man FS, Van Rossum AC, Handoko ML. How to diagnose heart failure with preserved ejection fraction: the value of invasive stress testing. Netherlands Heart Journal. 2016 Apr 1;24(4):244-51.

5 Oktay AA, Rich JD, Shah SJ. The emerging epidemic of heart failure with preserved ejection fraction. Current Heart Failure Reports. 2013 Dec 1;10(4):401-10.

6 Bergstra J, Yamins D, Cox DD. Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference 2013 Jun (pp. 13-20).

7 Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 2017 (pp. 4765-4774).

8 Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE. 2015;10(3).

9 Austin PC, Tu JV, Ho JE, Levy D, Lee DS. Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of heart failure subtypes. Journal of Clinical Epidemiology. 2013 Apr 1;66(4):398-407.

10 Deaton C, Benson J. Time for correct diagnosis and categorisation of heart failure in primary care. British Journal of General Practice. 2016;66(652):554-555.