Application of NLP to Detect Adverse Events in Patients

At the last PMSA conference, Ketan Walia, senior associate of decision science at Axtria, and his colleague Rushil Goyal, also a senior associate of decision science at Axtria, presented "Application of NLP to Detect Adverse Events in Patients," which generated a lot of interest. They examined how natural language processing and machine learning can be used to automatically detect adverse drug reactions in social media text, and gave conference attendees a rundown of what they found. In case you missed it, here's a recap.

Why Are Adverse Drug Reactions (ADR) Significant?

Getting a handle on ADR will significantly benefit the industry, leading to huge savings in healthcare costs and better patient compliance.

"ADR detection is a very significant task which typically doesn’t get as much traction as it needs," Walia says. "Especially considering the fact that adverse reactions related to a drug could affect the entire life cycle of the drug from clinical trials to the time it is launched in the market. Around 90% of ADR are underreported and there is often a big delay by the time they get formally reported and registered. This creates a huge lag in the system called a delayed feedback syndrome. This eventually hurts drug performance in the long run, greatly impacting safety of patients and commercial gains for the manufacturer."

ADR is a top cause of morbidity. Here are the grim stats:
  • 6.7% of hospitalized patients have a serious ADR with a fatality rate of 0.32%
  • Adverse reactions to drugs cause 100,000 deaths yearly
  • ADRs are the 4th leading cause of death in the U.S.
  • 90% are underreported

There's an urgent need for action. Adverse drug reactions, and their effect on drug approval, seriously damage the commercial outlook of drugs.

One place to start, Walia found, was Twitter.

Wait, Twitter?

Yes, Twitter. The social media giant is a potential gold mine of information about ADR. Its 645 million users generate about 9,100 tweets every second, some of them about their own health and response to medications.

Twitter has been widely used in other frontline industries such as retail, e-commerce, consumer durables and services for opinion mining, customer intelligence and gauging customer satisfaction levels.

However, using Twitter as a data source is not yet standard practice in the pharmaceutical and life science industry. Here's what led Walia and his team to Twitter:

Delayed feedback syndrome: "For our topic while doing literature review we realized that around 90% of ADR cases are underreported, which results in delayed feedback syndrome and many times ADR are officially registered only after their market launch. This hurts entire USP (unique selling point) of the product/drug and continues to affect drug performance throughout its lifecycle," Walia explains. "To mitigate these shortcomings, we were looking to build a pharmacovigilance system which could provide automated feedback, possibly in real time."

While researching, Walia and his team realized that although ADR are underreported through formal channels, patients do not hesitate to go online and vent about their experiences in almost real time. So one reason to use Twitter as a data source stems from the shortcomings of the present system and from the nature of the problem the team was trying to solve.

Lack of data sources: Little ADR data is collected and regularly made publicly available for commercial use. Walia found that Twitter addresses this problem: the data is publicly available and comes directly from the affected patients themselves.

How Walia Used Twitter Data for Pharmacovigilance

Step 1: Data Acquisition
The first part of the process was to collect tweets as a source of potential ADR reports. Arizona State University had collected 10,000 tweets corresponding to 81 drugs from the IMS Health Top 100 Drugs list. What the team found was raw, unstructured data: people's thoughts, feelings and experiences. Next, it was time to remove the "noisy" information, such as retweets, advertisements and URL links, boiling the tweets down to the information the team really needed: a patient's reaction to the drug they were taking.
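
As a rough illustration of this kind of noise removal, here is a minimal Python sketch. It is not the team's actual code: the function name `clean_tweets`, the regular expressions and the sample tweets are all assumptions made for this example, and it only handles retweets, URLs and exact duplicates (spotting advertisements would need additional rules or a classifier).

```python
import re

# Hypothetical patterns for this sketch; real tweet cleaning is usually messier.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
RETWEET_PATTERN = re.compile(r"^RT\s+@\w+", re.IGNORECASE)


def clean_tweets(raw_tweets):
    """Drop retweets, strip URLs, and remove exact duplicate texts."""
    cleaned = []
    seen = set()
    for text in raw_tweets:
        if RETWEET_PATTERN.match(text.strip()):
            continue  # a retweet repeats someone else's post
        text = URL_PATTERN.sub("", text)          # remove links
        text = re.sub(r"\s+", " ", text).strip()  # tidy leftover whitespace
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


# Made-up sample tweets, for illustration only.
sample = [
    "RT @someone: this drug made me dizzy",
    "Started drugX last week and I can't sleep http://t.co/abc",
    "Started drugX last week and I can't sleep",
]
cleaned_sample = clean_tweets(sample)
print(cleaned_sample)
```

In this sample the retweet is dropped, the link is stripped, and the two remaining tweets collapse into one because they are identical once the URL is gone.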

Step 2: Tweet Pre-Processing
Pre-processing involved segmenting the raw text, splitting it into sentences and tokenizing those sentences into individual words. The goal was to break the text into units that could then be converted into numbers for analysis.
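
To make sentence splitting and tokenization concrete, here is a small Python sketch using only regular expressions. The helper names `split_sentences` and `tokenize` and the example tweet are assumptions for illustration; the recap does not say which tools the team used, and production pipelines typically rely on a dedicated NLP library instead.

```python
import re


def split_sentences(text):
    """Split text at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Lowercase a sentence and return word tokens, keeping contractions whole."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


# Made-up tweet, for illustration only.
tweet = "Took my pill at 8. Now I can't stop shaking!"
sentences = split_sentences(tweet)
tokens = [tokenize(sentence) for sentence in sentences]
print(tokens)
```

Each sentence becomes a list of lowercase tokens with punctuation removed, which is the form the next step expects.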

Step 3: Feature Engineering
The third stage in the process involved coming up with a numerical representation in which words with similar meanings end up close together. The starting point is the "bag of words," which represents a piece of text simply by the words it contains and how often each one occurs, ignoring word order. Ketan Walia explains: "The way it works is that you feed in 'Bag of word' representation of words to this algorithm and it runs a neural network on the background and converts bag of word representation into a more generalizable vector representation called 'Word Embeddings.' Once you get word embeddings for all the words in your data set you can now feed these word embeddings instead of bag of words to a machine-learning algorithm."
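
The bag-of-words starting point can be sketched in a few lines of Python. This is an illustrative example only: the function names and toy documents are assumptions, and it deliberately stops at word counts. Training word embeddings requires a neural model and a much larger corpus, so that part is not shown here.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map every distinct token to a column index, in alphabetical order."""
    vocab = sorted({token for doc in tokenized_docs for token in doc})
    return {token: index for index, token in enumerate(vocab)}


def bag_of_words(tokens, vocab):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(tokens)
    vector = [0] * len(vocab)
    for token, count in counts.items():
        if token in vocab:  # tokens outside the vocabulary are ignored
            vector[vocab[token]] = count
    return vector


# Toy tokenized tweets, for illustration only.
docs = [["drug", "made", "me", "dizzy"], ["dizzy", "and", "dizzy", "again"]]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```

Note that each vector is as long as the vocabulary and mostly zeros, and that "dizzy" and a synonym such as "lightheaded" would occupy unrelated columns. Dense word embeddings are meant to address both limitations.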

Step 4: Binary Classification
This step involved classifying each sentence as ADR or not-ADR, then testing and evaluating the model using various cross-validation techniques. Here the team relied on deep learning, whose main advantage is that it can handle highly complex, unstructured data such as text.
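
The evaluation idea, k-fold cross-validation of a binary ADR classifier, can be sketched as follows. To keep the example short and dependency-free, it swaps in a simple perceptron rather than the deep learning model described in the talk. The function names, the two-feature toy data and the choice of three folds are all assumptions for illustration.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_items) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def train_perceptron(X, y, epochs=20):
    """Fit a linear classifier with the classic perceptron update rule."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            activation = bias + sum(w * f for w, f in zip(weights, features))
            error = label - (1 if activation > 0 else 0)
            if error:  # update only on misclassified examples
                weights = [w + error * f for w, f in zip(weights, features)]
                bias += error
    return weights, bias


def predict(model, features):
    """Return 1 (ADR) or 0 (not-ADR) for one feature vector."""
    weights, bias = model
    return 1 if bias + sum(w * f for w, f in zip(weights, features)) > 0 else 0


def cross_validate(X, y, k=3):
    """Train on k-1 folds, test on the held-out fold, and return each accuracy."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = train_perceptron([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Toy features: [symptom-word count, other-word count]; label 1 means ADR.
X = [[1, 0], [0, 1], [2, 0], [0, 2], [1, 0], [0, 1]]
y = [1, 0, 1, 0, 1, 0]
fold_accuracies = cross_validate(X, y, k=3)
print(fold_accuracies)
```

The toy data is perfectly separable, so every fold scores well; real tweets are far noisier, which is why held-out evaluation matters.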

Step 5: Named Entity Recognition
Walia and his team used a Hidden Markov Model to annotate the words and phrases directly related to ADR. In their work, the Hidden Markov Model reached a 63% accuracy rate at automatically annotating ADR-positive tweets.
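
To show how a Hidden Markov Model can tag words in a tweet, here is a minimal Viterbi decoding sketch in Python. Everything numeric in it is invented for illustration: the two tags ("ADR" and "O" for other), the start, transition and emission probabilities, and the fallback probability for unseen words are hypothetical values, not parameters from the team's model. A real model would estimate them from annotated tweets.

```python
def viterbi(tokens, states, start_p, trans_p, emit_p, unknown_p=1e-6):
    """Return the most probable tag sequence for tokens under the given HMM."""
    if not tokens:
        return []
    # best[t][s] = (probability of the best path ending in state s, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(tokens[0], unknown_p), None)
             for s in states}]
    for token in tokens[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(token, unknown_p), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical hand-set parameters, for illustration only.
states = ["O", "ADR"]
start_p = {"O": 0.8, "ADR": 0.2}
trans_p = {"O": {"O": 0.8, "ADR": 0.2}, "ADR": {"O": 0.4, "ADR": 0.6}}
emit_p = {
    "O": {"this": 0.2, "drug": 0.2, "gave": 0.2, "me": 0.2},
    "ADR": {"severe": 0.4, "headache": 0.4, "nausea": 0.2},
}

tokens = ["this", "drug", "gave", "me", "severe", "headache"]
tags = viterbi(tokens, states, start_p, trans_p, emit_p)
print(list(zip(tokens, tags)))
```

With these made-up probabilities, "severe" and "headache" are tagged as ADR and the rest as O. For longer sequences, implementations usually work with log probabilities to avoid numerical underflow.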

Conclusion

Maximizing knowledge of a drug’s safety profile and integrating it into commercial planning gives manufacturers greater influence with regulators, payers, and ultimately patients and prescribers.

The end result is a modeling framework that provides an artificial intelligence based system that can automatically stream drug-related posts (from Twitter in this case), interpret the text and classify whether it pertains to an ADR. ADR-positive tweets are then analyzed further within the same framework to tag the words and phrases that pertain directly to the ADR, giving the user the most relevant and concise intelligence. In short, it gives users the tools to perform pharmacovigilance and extract the most relevant information in an automated fashion.

