Using Bayesian Reasoning to Predict When a Patient Will Discontinue Therapy

Can we predict with any accuracy whether an individual patient is likely to discontinue drug therapy? We're not talking about knowing if a patient has stopped using their meds, but if they're likely to do so. It might sound like something out of "Mission Impossible," but it's just data analytics at work in pharma. Yes, we're that cool.

This post recaps a presentation given at last year's PMSA conference by Jean-Patrick Tsang, founder and president of Bayser, a Chicago-based consulting firm dedicated to pharmaceutical sales and marketing. JP is an expert in patient-level data and related analyses, and he is applying a novel approach, Bayesian reasoning, to predict when a patient will discontinue therapy.

Look at the persistence curve of any chronic therapy and you'll invariably see a significant drop in the earlier stage of the therapy. We all know there is a slew of events that influence discontinuation. For starters, let's include factors like:
  • Admission to an ER
  • A visit to another doctor for a second opinion
  • A change in dosing
  • A side effect
  • An increased co-pay
  • The drug simply not working
  • Negligence (just forgetting to take it)
  • Psychology (If I'm taking medication, that means I'm sick.)
We know these events have an impact, but we are not sure of its magnitude. We also sense that the impact may differ depending on whether the patient is in the early stage or the later stage of therapy.

Nor can we tell whether the difference is limited to magnitude or also involves directionality. If we could somehow quantify the impact of an event on patient discontinuation while differentiating, say, between early stage (ramp-up) and later stage (cruising), we would be able to establish which events are material and which ones are not.

We would then be able to focus on the important ones and identify relevant interventions both at the physician and patient levels that would significantly reduce the odds that the patient would discontinue therapy. We could gain additional insights by analyzing the impact on competitive drugs.

Enter Bayesian Reasoning

Bayesian reasoning is about how we update our belief in light of evidence. If we do not know much, we’ll assume that each patient has the same probability of discontinuing therapy. We can derive this from the adherence curve of the patient cohort we are tracking.

Patient data then supplies the evidence. Suppose we learn that patient 12345, who is under the care of Doctor John Smith, has not filled the prescription for the last two months, was admitted to the hospital, underwent a procedure, saw a second physician, received a new diagnosis, was ordered a new lab test, or experienced some other event.

For each event, we have a likelihood ratio: how likely the event is to appear if the patient goes on to discontinue, compared with how likely it is if the patient remains on the drug. We combine the likelihoods of all observed events with the starting probability to arrive at a new probability of discontinuation that is specific to that one patient. This is called the posterior probability. We usually apply a threshold to convert the probability into a yes/no answer.
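
The update described above can be sketched in a few lines of Python. This is a minimal illustration, not the model from the presentation: it assumes the events are conditionally independent given the outcome (a naive Bayes simplification), and the prior, the likelihood ratios and the 0.5 threshold are made-up numbers chosen only for the example.

```python
# Hypothetical likelihood ratios: P(event | discontinues) / P(event | stays).
# The values are illustrative assumptions, not estimates from real data.
LIKELIHOOD_RATIOS = {
    "missed_refill_2_months": 6.0,
    "hospital_admission": 2.5,
    "saw_second_physician": 1.8,
    "new_lab_test": 0.9,
}


def posterior_discontinuation(prior, events, ratios=LIKELIHOOD_RATIOS):
    """Update a prior probability with the likelihood ratio of each observed event.

    Assumes the events are conditionally independent given the outcome.
    """
    odds = prior / (1.0 - prior)      # probability -> odds
    for event in events:
        odds *= ratios[event]         # Bayes' rule in odds form
    return odds / (1.0 + odds)        # odds -> probability


def will_discontinue(prior, events, threshold=0.5):
    """Turn the posterior into a yes/no flag using a threshold."""
    return posterior_discontinuation(prior, events) >= threshold


# Assumed cohort-level prior: 20% of patients discontinue.
prior = 0.20
patient_12345 = ["missed_refill_2_months", "hospital_admission"]
posterior = posterior_discontinuation(prior, patient_12345)
print(f"posterior = {posterior:.3f}, flag = {will_discontinue(prior, patient_12345)}")
```

Working in odds keeps the arithmetic simple: each event just multiplies the running odds. A ratio above 1 pushes the patient toward "likely to discontinue," and a ratio below 1 pulls the other way.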

Then What?

Many events that drive discontinuation have a strong clinical rationale, including hospitalization and visits to additional specialists. But certain events may signal opportunities for intervention, including changes in dosing and changes in the days' supply of the drug.

When we know patient 12345 is likely to discontinue therapy, we can target interventions. That means contacting the patient's doctor through visits from sales reps, emails and other forms of contact that let the physician know the patient is likely to stop.

  • We can predict discontinuations with high accuracy.
  • More work is needed to effectively triage at-risk patients to a specific intervention.
  • The analytical path to identify root causes can now be more focused.
"This is a success story that shows that we can indeed predict with great accuracy when a patient will discontinue therapy, which some still have trouble believing we can accomplish," said JP Tsang.

It's about explainability. We need to understand why the algorithm is saying what it's saying. Explainability is very difficult to achieve but we can start with transparency. Bayesian reasoning, we found out, strikes a good balance between accuracy (neural net) and transparency.

Combining Secondary & APLD Advanced Analytics and Primary Analytics

What do you do if your primary research and secondary analytics don't come up with the same conclusions? At a recent PMSA conference, Igor Rudychev, head of U.S. digital, data, analytics and innovations at AstraZeneca Oncology, gave a presentation that delved into this issue.

Here's the crux of it.

Historically, we know that primary market research drives pharmaceutical decision-making. Senior leadership is making major strategic and tactical marketing decisions based on a variety of factors, including:
  • Awareness and familiarity with the drug
  • Percentage of trialists, prescribers or switchers
  • Perceptions (including perceptions of efficacy and tolerability)
  • Likelihood to prescribe in the future
  • Discussions with sales reps
  • Barriers to prescribing
  • Brand perception and satisfaction
  • Influencer nominations/mapping
  • Inputs to forecasts
  • Market shares
Secondary analytics, by contrast, draws primarily on patient-level data and typically covers:
  • Optimization
  • Targeting
  • Segmentation
  • Prescriber analysis, including early or late adopters and historical prescribing patterns
  • Sources of business
  • Durations of therapy
  • Spheres of influence
  • Inputs to forecasts
  • Market shares
But here's the problem. In pharma, decisions are made using primary data, but that data is incomplete. Sales decisions are made using secondary data. Combining the two is the optimal way to improve patient outcomes. Yet the two methods, even when measuring the same thing, often produce different results.

Two Models Research the Same Thing, Different Results. Now What?

One way to approach this is to compare the goals of primary research and secondary analytics. Many of those goals overlap and answer the same questions.

Primary goals:
  • Market shares
  • Inputs to forecast
  • Influencer nominations/mapping
  • Likelihood to prescribe in the future
  • Had discussion with sales rep
  • Sources of business
  • Durations of therapy
Secondary goals:
  • Market shares
  • Inputs to forecast
  • Influence mapping
  • Innovators/Laggards Analysis, probability to prescribe
  • Call execution
  • Sources of business
  • Durations of therapy
Some of those goals overlap, but the two methods can produce different results. Say that primary research finds a market share of 30%, a likelihood to prescribe of 80% and a 40% rate of discussions with a sales rep. Great! But your secondary analytics put market share at 41%, the probability to prescribe in the future (based on analogs) at 50% and sales rep discussions at 70%.

This happens often, usually because the two methods rest on different assumptions. So, now what?

The key is to triangulate the data and look at the subset where the data sources overlap.
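
To make the idea concrete, here is a rough Python sketch of comparing the two sources on the HCPs they have in common. Everything in it is hypothetical: the HCP identifiers, the 0/1 brand indicators, and the simplification of "market share" to the fraction of HCPs who prescribe the brand. It is meant only to show how the overlap helps separate a sampling skew from a measurement difference, not to reproduce the approach from the presentation.

```python
# Hypothetical toy data: 1 = HCP prescribes our brand, 0 = prescribes a competitor.
# Keys are HCP ids; values come from the survey (primary) or from claims (secondary).
primary_survey = {"A": 1, "B": 0, "C": 0, "D": 0}
secondary_claims = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 1, "F": 1, "G": 1, "H": 0}


def share(indicators):
    """Fraction of HCPs flagged as prescribing the brand."""
    values = list(indicators)
    return sum(values) / len(values)


# HCPs that appear in both sources.
overlap = sorted(primary_survey.keys() & secondary_claims.keys())

primary_share = share(primary_survey.values())
secondary_share = share(secondary_claims.values())
secondary_on_overlap = share(secondary_claims[hcp] for hcp in overlap)

# Gap that remains on the very same HCPs: points to measurement differences.
measurement_gap = secondary_on_overlap - primary_share
# Gap between the overlap and the full claims universe: points to sample skew.
sampling_gap = secondary_share - secondary_on_overlap

print(f"primary={primary_share:.3f} secondary={secondary_share:.3f} "
      f"overlap={secondary_on_overlap:.3f}")
print(f"measurement gap={measurement_gap:.3f} sampling gap={sampling_gap:.3f}")
```

In this toy data, part of the disagreement survives even when both sources look at the same four HCPs, and the rest comes from the surveyed HCPs not being representative of everyone in the claims data.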

Let's use one example. Using a machine learning/AI model, you can create a subset from, say, claims data that mimics the complete HCP and patient populations and is representative of the payor and patient population. You can then train the ML/AI model on this subset and estimate market shares, making sure to capture the parameters driving the initial data skews.

Then, you can apply the model to the primary research subset and compare the numbers. This improves the model.

It's further possible to create behavioral HCP segmentation based on both primary and secondary data using that overlap and use the results for targeting.

The point is to link primary and secondary data to train the ML/AI model. It's about linking attitudinal primary variables with secondary variables in claims.

Pros and Cons

When you're talking about the projection of attitudinal variables for every HCP for targeting, the standard approach is to just use secondary variables from claims to create secondary segments.

What if, with the Qual Variables Projection approach, you project variables from primary research onto the HCPs in the secondary data? Here are the pros and cons.

Standard approach
  • Uses only secondary data for individual HCP parametrization
  • Uses only secondary data for targeting
  • Often secondary data is not enough to create uniform segments
Qual Variables Projection approach
  • Allows qual variables to be introduced probabilistically into the secondary data
  • Models data first and improves projectability of segmentation
  • Creates more uniform segments
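
One way to picture a probabilistic projection is a nearest-neighbor sketch: each HCP without survey data inherits the average survey answer of the most similar surveyed HCPs. The presentation does not say which model was used, so the choice of k-nearest neighbors, the two claims-based features and all the values below are assumptions made purely for illustration.

```python
import math

# Hypothetical surveyed HCPs: claims-based features plus one survey answer
# (1 = "likely to prescribe", 0 = "not likely").
# Features: (patient volume, share of new-to-brand scripts), both scaled to 0-1.
surveyed = [
    ((0.9, 0.8), 1),
    ((0.8, 0.7), 1),
    ((0.7, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.3, 0.2), 0),
    ((0.1, 0.3), 0),
]


def project_probability(features, k=3):
    """Estimate P(likely to prescribe) as the mean answer of the k nearest surveyed HCPs."""
    nearest = sorted(surveyed, key=lambda row: math.dist(row[0], features))[:k]
    return sum(answer for _, answer in nearest) / k


# HCPs that appear only in the secondary data (ids are made up).
unsurveyed = {"hcp_101": (0.85, 0.75), "hcp_102": (0.15, 0.2), "hcp_103": (0.5, 0.45)}
projected = {hcp: project_probability(feats) for hcp, feats in unsurveyed.items()}

for hcp, probability in projected.items():
    print(f"{hcp}: {probability:.2f}")
```

The output is a probability rather than a hard label, which is the point of the approach: an HCP who sits between the two groups gets an in-between value instead of being forced into one segment.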
So, what do we learn from all of this?

AI and ML allow us to bridge primary research and secondary analytics. They also allow us to resolve major differences between the results of primary and secondary data analysis. Many primary research techniques could be improved with secondary data analytics.

Bottom line: It's important to communicate to the leadership and decision-makers that pure primary research data could be skewed. Data enhanced with secondary analytics should be used in their strategic and tactical decision-making instead.

Complete data allows us to understand which medicines work best for which patients. In the end, it's about saving lives.

We Were Data Scientists Before Data Science Was Cool: New Challenges for the Profession

Suddenly, being a data scientist is cool. And in high demand.

Why? Because these days, data makes the world go round. Nearly every industry in our economic ecosystem is clamoring for it.

If a company, no matter the industry, is not using Big Data to chart and forecast customers' journeys, better connect with them, ferret out their wants and needs before they even know what they are (thank you, Netflix, for creating that perception in people's minds), and otherwise using the numbers to enhance the customer experience, it will be left in the dust by competitors that do.

The increased demand for data in all sectors of the economy has created a boom in the data science field. According to Forbes magazine, the fastest-growing jobs in the country today are data scientist, machine learning engineer and big data engineer. In the blink of an eye, every company needs people who can make sense of data. LinkedIn conducted a survey and found there are 6.5 times as many data scientists working today as there were just five years ago. For machine learning engineers, that multiple jumps to 9.8.

"The field has exploded within the past four or five years," says Nuray Yurt, head of enterprise data science at Novartis. But, she points out, while the need for data pros continues to ramp up, which is a good thing on many levels for the profession, it also brings with it some challenges for the data scientists themselves.

Challenges for Data Scientists Today

The situation can be loosely compared to the disruption the corporate training field went through back when the internet was first starting to change the way every company on earth worked. People got into the training profession because they liked teaching in front of a classroom, which is where the bulk of training happened pre-internet. But very soon after the screech of dial-up technology began connecting every desk in every office to the World Wide Web, someone got the idea that training should happen online, so trainees could sit at those very desks and get the knowledge they needed on their own schedule. Suddenly, trainers had to learn an entirely new skillset — creating online learning modules. It was not what they signed up for, but it quickly became an essential part of the job.

Data scientists are finding themselves in a similar predicament today. The nuts and bolts of analyzing data are always evolving, but the skills to do the job, like analyzing statistics, computer knowledge and business knowledge, remain the same. What's new for data scientists are the so-called soft skills that are becoming necessary parts of the job.

"Data scientists need to be curious, open minded, quick learners and have the right personality fit now," Yurt explains.

Communication skills are a vital part of that. Why? Because industries that are newly reliant on data, like sales, customer service and hospitality, are hiring data scientists to help them make sense of it all. And, gently put, the people who run those companies are not data scientists nor have they ever had one on staff. As Yurt notes, everyone now knows what to do with data, but few know what it takes to glean that data, analyze it and translate it into actionable goals and strategies for companies to implement. So, data scientists are suddenly put into the position of emerging from their offices where they've been happily crunching numbers on their own and explaining to higher-ups what the data science actually means, in language they can understand.

The temptation may be to "dumb down" the explanation, but Yurt says that's a mistake.

"The challenge for data scientists today is being able to communicate complex concepts to people who don't understand them without diluting the complexity," she says. That last part is the key. People in industries new to data need to understand the complexity of the process, or it diminishes the data science field as a whole. It also puts funding and potentially jobs at risk if people don't entirely get the fact that analyzing and interpreting data is a science that Hal from accounting wasn't trained for.

"We need to communicate why and how what we do makes a difference," she says.

Another challenge for data scientists is the need to be more open minded. "We need to be OK with change," Yurt says. "Our jobs won't be the same as they always were, and we need to be OK with that."

Application of NLP to Detect Adverse Events in Patients

At the last PMSA conference, Ketan Walia, senior associate of decision science at Axtria, and his colleague Rushil Goyal, also a senior associate of decision science at Axtria, presented "Application of NLP to Detect Adverse Events in Patients," which generated a lot of interest. They looked into the automated detection of adverse drug reactions using social media text data leveraging natural language processing and machine learning, and gave conference attendees a rundown of what they found. In case you missed it, here's a recap.

Why Are Adverse Drug Reactions (ADR) Significant?

Getting a handle on ADR will significantly benefit the industry, leading to huge savings in healthcare costs and better patient compliance.

"ADR detection is a very significant task which typically doesn’t get as much traction as it needs," Walia says. "Especially considering the fact that adverse reactions related to a drug could affect the entire life cycle of the drug from clinical trials to the time it is launched in the market. Around 90% of ADR are underreported and there is often a big delay by the time they get formally reported and registered. This creates a huge lag in the system called a delayed feedback syndrome. This eventually hurts drug performance in the long run, greatly impacting safety of patients and commercial gains for the manufacturer."

ADR is a top cause of morbidity. Here are the grim stats:
  • 6.7% of hospitalized patients have a serious ADR with a fatality rate of 0.32%
  • Adverse reactions to drugs cause 100,000 deaths yearly
  • ADRs are the 4th leading cause of death in the U.S.
  • 90% are underreported
There's an urgent need for action. Adverse drug reactions, and their effect on drug approval, are seriously damaging the commercial outlook of drugs.

One place to start, Walia found, was Twitter.

Wait, Twitter?

Yes, Twitter. The social media giant is a potential gold mine of information about ADR. Its 645 million users generate about 9,100 tweets every second, some of them about their own health and response to medications.

Twitter has widely been used in other frontline industries like retail, e-commerce, consumer durables, service and more for opinion mining, customer intelligence and gauging customer satisfaction levels.

However, the pharmaceutical and life sciences industry has not widely used Twitter as a data source; doing so is not yet standard practice. Here's what led Walia and his team to Twitter:

Delayed feedback syndrome: "For our topic while doing literature review we realized that around 90% of ADR cases are underreported, which results in delayed feedback syndrome and many times ADR are officially registered only after their market launch. This hurts entire USP (unique selling point) of the product/drug and continues to affect drug performance throughout its lifecycle," Walia explains. "To mitigate these shortcomings, we were looking to build a pharmacovigilance system which could provide automated feedback, possibly in real time."

While researching, Walia and his team realized that although ADR are underreported, patients do not hesitate to go online and vent about their experiences in almost real time. So one reason to use Twitter as a data source stems from the shortcomings of the present system and from the nature of the problem the team was trying to solve.

Lack of data sources: Little ADR data is collected and regularly made publicly available for commercial use. Walia found that Twitter solves this problem: all the data is publicly available and comes directly from the affected patients themselves.

How Walia Used Twitter Data for Pharmacovigilance

Step 1: Data Acquisition
The first part of the process was to collect tweets as a source of potential ADR. Arizona State University collected 10,000 tweets corresponding to a list of 81 drugs as per IMS Health Top 100 Drugs. What they found was raw, unstructured data: people's thoughts, feelings and experiences. Next, it was time to remove the "noisy" information, like retweets, advertisements and URL links, boiling the tweets down to the information the team really needed: a patient's reaction to the drug they were taking.
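
A rough idea of what this noise removal can look like in Python is shown below. It is a sketch under simple assumptions, not the team's actual pipeline: retweets are recognized by the common "RT @" prefix, URLs by a basic pattern, and advertisement filtering is left out because it needs rules or a model of its own. The example tweets are made up, and "drugx" is a placeholder rather than a real product.

```python
import re

# Basic pattern for links; real-world URL matching can be more involved.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def is_retweet(tweet):
    """Treat tweets starting with 'RT @' as retweets (a common convention)."""
    return tweet.lstrip().startswith("RT @")


def clean_tweet(tweet):
    """Strip URLs and collapse leftover whitespace."""
    without_urls = URL_PATTERN.sub("", tweet)
    return " ".join(without_urls.split())


def remove_noise(tweets):
    """Drop retweets, empty tweets and exact duplicates; clean what remains."""
    seen = set()
    kept = []
    for tweet in tweets:
        if is_retweet(tweet):
            continue
        cleaned = clean_tweet(tweet)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            kept.append(cleaned)
    return kept


raw = [
    "RT @someone: drugx gave me a headache",
    "Started drugx last week and I feel dizzy all day http://example.com/abc",
    "Started drugx last week and I feel dizzy all day",
    "   ",
]
print(remove_noise(raw))
```

Of the four raw tweets, only one cleaned tweet survives: the retweet and the blank entry are dropped, and the two copies of the same complaint collapse into one once the link is removed.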

Step 2: Tweet Pre-Processing
Pre-processing involved segmentation of the raw text, sentence splitting and tokenization. It was about converting words into numbers so the data could be analyzed.
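
Here is a minimal sketch of those pre-processing steps using only the Python standard library. The splitting and tokenizing rules are deliberately naive assumptions (production systems usually rely on a dedicated NLP library), and the sample text is invented.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def build_vocabulary(token_lists):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


text = "Started drugx last week. I feel dizzy all day!"
sentences = split_sentences(text)
tokens = [tokenize(sentence) for sentence in sentences]
vocab = build_vocabulary(tokens)

# Each sentence becomes a list of integer ids, ready for numerical analysis.
encoded = [[vocab[token] for token in sentence] for sentence in tokens]
print(encoded)
```

The integer ids carry no meaning by themselves; they are simply a lookup key that later steps use to build counts or to fetch word vectors.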

Step 3: Feature Engineering
The third stage in the process involved coming up with a representation in which words with similar meanings are grouped together. The simplest starting point is the "bag of words," which records how often each word occurs in a text and ignores word order. (The word clouds most everyone has seen, with some words very large and some tiny, are a familiar picture of such counts.) Ketan Walia explains: "The way it works is that you feed in 'Bag of word' representation of words to this algorithm and it runs a neural network on the background and converts bag of word representation into a more generalizable vector representation called 'Word Embeddings.' Once you get word embeddings for all the words in your data set you can now feed these word embeddings instead of bag of words to a machine-learning algorithm."
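
The contrast between the two representations can be sketched as follows. The bag-of-words part is standard counting. The embeddings, however, are tiny hand-written vectors invented for this example, since real embeddings are learned by a neural network from a large corpus, and averaging word vectors is just one simple way (assumed here) to turn them into a single vector per tweet.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs; word order is ignored."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def average_embedding(tokens, embeddings):
    """Represent a tweet as the mean of its word vectors (unknown words are skipped)."""
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return []
    return [sum(dimension) / len(vectors) for dimension in zip(*vectors)]


vocabulary = ["dizzy", "feel", "headache", "i", "nauseous"]

# Toy 2-dimensional embeddings, made up for illustration. Words with similar
# meanings (the symptom words) are given similar vectors.
toy_embeddings = {
    "dizzy": [0.9, 0.1],
    "nauseous": [0.8, 0.2],
    "headache": [0.7, 0.3],
    "feel": [0.1, 0.9],
    "i": [0.0, 1.0],
}

tweet = ["i", "feel", "dizzy", "dizzy"]
sparse_vector = bag_of_words(tweet, vocabulary)
dense_vector = average_embedding(tweet, toy_embeddings)
print(sparse_vector)
print(dense_vector)
```

The bag-of-words vector treats "dizzy" and "nauseous" as unrelated columns, while the embedding places them close together, which is what makes the second representation more generalizable.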

Step 4: Binary Classification
This step involved classifying sentences as ADR or not-ADR, then testing and evaluating the results using various cross-validation techniques. This is where deep learning comes in. Its main advantage is that it can handle highly complex, unstructured data like text.
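
The evaluation side of this step, k-fold cross-validation, can be sketched without any deep learning at all. In the example below a deliberately crude keyword classifier stands in for the neural network, purely so the code stays short and runnable; the six labeled tweets are invented, and "drugx" is a placeholder name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def train_keyword_classifier(texts, labels):
    """Stand-in model: keep the words seen only in ADR-positive training texts."""
    adr_words, other_words = set(), set()
    for text, label in zip(texts, labels):
        (adr_words if label == 1 else other_words).update(text.lower().split())
    return adr_words - other_words


def predict(keywords, text):
    """Label a text ADR (1) if it contains any learned keyword, else not-ADR (0)."""
    return int(any(word in keywords for word in text.lower().split()))


def cross_validate(texts, labels, k=3):
    """Return the accuracy measured on each held-out fold."""
    scores = []
    for train, test in k_fold_indices(len(texts), k):
        keywords = train_keyword_classifier(
            [texts[i] for i in train], [labels[i] for i in train]
        )
        correct = sum(predict(keywords, texts[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return scores


texts = [
    "drugx made me dizzy",
    "picked up drugx today",
    "dizzy and nauseous on drugx",
    "drugx refill is ready",
    "so nauseous after drugx",
    "drugx works fine for me",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = ADR, 0 = not-ADR
scores = cross_validate(texts, labels, k=3)
print(scores, sum(scores) / len(scores))
```

Each tweet is used for testing exactly once and never appears in the training data of its own fold, so the averaged score is a fairer estimate than accuracy measured on the training tweets themselves.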

Step 5: Named Entity Recognition
Walia and his team used a Hidden Markov Model to annotate words and phrases directly related to ADR. The model reached a 63% accuracy rate when trained to automatically annotate ADR-positive tweets.
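
For readers who have not met a Hidden Markov Model before, the sketch below shows how one can tag each word of a tweet as part of an ADR mention or not, using the standard Viterbi algorithm to find the most probable tag sequence. The two-tag scheme and every probability in the tables are invented for illustration; in a real system they would be estimated from annotated tweets, and nothing here reflects the team's actual model.

```python
import math

STATES = ["O", "ADR"]  # O = outside an ADR mention, ADR = part of one

# All probabilities are made up for this example.
START = {"O": 0.8, "ADR": 0.2}
TRANSITION = {
    "O": {"O": 0.8, "ADR": 0.2},
    "ADR": {"O": 0.4, "ADR": 0.6},
}
EMISSION = {
    "O": {"drugx": 0.3, "made": 0.3, "me": 0.3,
          "dizzy": 0.02, "nauseous": 0.02, "and": 0.06},
    "ADR": {"drugx": 0.02, "made": 0.02, "me": 0.02,
            "dizzy": 0.4, "nauseous": 0.4, "and": 0.14},
}
UNKNOWN = 1e-6  # small probability for words missing from the tables


def viterbi(tokens):
    """Return the most probable tag sequence for the tokens (log-space Viterbi)."""
    if not tokens:
        return []

    def emit(state, word):
        return math.log(EMISSION[state].get(word, UNKNOWN))

    # best[t][s] = (log-probability of the best path ending in s at step t, previous state)
    best = [{s: (math.log(START[s]) + emit(s, tokens[0]), None) for s in STATES}]
    for word in tokens[1:]:
        column = {}
        for s in STATES:
            score, prev = max(
                (best[-1][p][0] + math.log(TRANSITION[p][s]), p) for p in STATES
            )
            column[s] = (score + emit(s, word), prev)
        best.append(column)

    # Backtrack from the best final state to recover the full path.
    state = max(STATES, key=lambda s: best[-1][s][0])
    tags = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        tags.append(state)
    return list(reversed(tags))


tweet = ["drugx", "made", "me", "dizzy", "and", "nauseous"]
tags = viterbi(tweet)
print(list(zip(tweet, tags)))
```

Note how "and" ends up tagged as part of the ADR phrase: on its own it is ambiguous, but the transition probabilities favor staying inside a mention between two symptom words. That use of context is the main thing an HMM adds over looking up words one at a time.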


Maximizing knowledge of a drug’s safety profile and integrating it into commercial planning will have greater influence with regulators, payers, and ultimately patients and prescribers.

The end result is a modeling framework that provides an artificial intelligence-based system which can automatically stream drug-related posts online (from Twitter in this case), interpret the text and classify whether it pertains to an ADR. If it does, the ADR-positive tweets are further analyzed within the framework to tag the words and phrases that directly pertain to the ADR, giving the user the most relevant and concise intelligence. In short, it gives users the tools to perform pharmacovigilance and extract the most relevant information in an automated fashion.