Artificial Intelligence and Machine Learning in Medical R&D: Hope or Hype?
To the relief, perhaps, of more than a few in the Partnering for Cures audience, Atul Butte of the University of California, San Francisco, began the lunchtime plenary discussion on artificial intelligence and machine learning by asking, “What on earth do all these terms mean?” He described artificial intelligence (AI) as taking aspects of human intelligence and modeling them with computers, machine learning as one type of AI in which computers learn from data in a supervised or unsupervised fashion, and deep learning as a form of machine learning loosely modeled on the architecture of the brain.
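Butte’s supervised-versus-unsupervised distinction can be made concrete with a toy sketch. Everything below is illustrative only: the numbers, labels, and thresholds are invented, not drawn from any study the panel discussed. Supervised learning fits a model to examples that come with labels; unsupervised learning finds structure in the same kind of data when no labels are given.

```python
# Toy illustration of supervised vs. unsupervised learning (pure Python).
# All data values and labels here are invented for illustration.

def nearest_centroid_fit(points, labels):
    """Supervised: learn one centroid per class from labeled examples."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def nearest_centroid_predict(centroids, x):
    """Assign a new point to the class with the closest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def kmeans_1d(points, k=2, iters=20):
    """Unsupervised: group unlabeled points around k cluster centers."""
    centers = sorted(points)[:k]  # naive initialization
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for x in points:
            nearest = min(centers, key=lambda c: abs(x - c))
            clusters[nearest].append(x)
        centers = [sum(v) / len(v) for v in clusters.values() if v]
    return sorted(centers)

# Supervised: a hypothetical measurement with disease labels provided.
model = nearest_centroid_fit([1.0, 1.2, 5.0, 5.4],
                             ["healthy", "healthy", "disease", "disease"])
print(nearest_centroid_predict(model, 4.8))  # prints: disease

# Unsupervised: the same measurements with no labels; two groups emerge.
print(kmeans_1d([1.0, 1.2, 5.0, 5.4]))  # two centers, near 1.1 and 5.2
```

The point of the sketch is only the contrast: the first function needs the labels to learn anything, while the second recovers the same two groups without ever seeing them.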
And why does everyone in medical R&D seem to be talking about them right now? Butte and three other experts agreed that this is a unique time with a number of factors converging to make the application of computational approaches especially important in the fight against disease and for improved human health. Butte summed it up as a time when “we have great hardware, open software libraries, plentiful data sets to learn from (including genomics and electronic health records), and lots of hard questions in biology that still need answers.”
Iya Khalil, co-founder and chief commercial officer at GNS Healthcare, noted that while advances in computation have been underway for some time, what’s changed is the availability of data. Data are now deeper, richer, more varied, and available in greater quantities, allowing us to illuminate human biology and health in a more robust and actionable way.
Alice Zhang, CEO and co-founder of Verge Genomics, became an entrepreneur three months before finishing graduate school, a move born of her frustration that drug development, particularly in neurodegeneration, is still fundamentally a guessing game. She saw an exciting convergence of new technologies in machine learning with a deeper understanding of the human brain as “an opportunity to eliminate the guesswork.”
And John Baldoni, senior vice president at GSK Pharmaceuticals, pointed to factors like greater data sharing, high-performance computing capacity, modeling, and high-throughput cell biology as key enablers for efforts like the ATOM Consortium. This recently launched effort aims to mine the “dark data” on failed compounds within companies to generate models and hypotheses that can accelerate drug development.
Another common theme among the speakers was the need to, as Zhang put it, “bring in both computation and biology from day one. The only path to fully realizing AI’s potential is breaking down the silos between software and drug development.” Khalil eschewed the notion of “practicing AI” in favor of thinking in terms of how to solve problems: “At the core of solving problems in biology now, you have to have AI in your toolkit, and keep learning along the way. The most sophisticated algorithms are meaningless without good data to train them on.”
Butte turned the conversation back to data, asking the speakers whether they view it as “oil” (i.e., the world’s most valuable resource) or “soil” (i.e., a regenerating medium for growth). Baldoni replied neither: “Data is the currency of the new pharmaceutical world,” to be invested in creating value for patients and society, an investment that generates returns. Zhang and Khalil commented on the proliferation of new types of data, which are necessary if we’re going to get the answers we seek from these computational approaches. Said Zhang, “Biology is unique in its ‘missing data’ problem. We don’t fundamentally understand a lot of disease biology, unlike other applications of AI.” Verge is trying to address this challenge by, for instance, collecting living brain tissue from Parkinson’s patients undergoing deep brain stimulation, giving the company a view of early disease progression, and by pioneering single-nucleus sequencing to help verify cell types and clarify signals. GNS is working with data both “broad” and “deep”: from health systems as well as clinical research data and patient registries, such as those collected by groups like the Multiple Myeloma Research Foundation.
Zhang called patient-level data “our biggest hair-on-fire problem.” All felt that big cohorts like the All of Us Research Program and cohorts organized by patient groups would be extremely useful. And all related to the challenges of insufficient data standardization, lamenting the time and effort devoted to “data munging.” Zhang noted that one-third of Verge’s first 1 million lines of code were devoted to data curation.
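To give a flavor of the “data munging” the panelists lamented: the same clinical measurement often arrives from different sources under different field names, units, and date formats, and must be mapped onto one common schema before any model can be trained. The sketch below is a hypothetical, minimal example; the field names, unit conversions, and records are invented, not taken from any system the speakers described.

```python
# Hypothetical sketch of clinical data harmonization ("data munging").
# All field names, records, and conversion rules are invented.

from datetime import datetime

def harmonize(record):
    """Map one source-specific patient record onto a common schema."""
    out = {}
    # Different sources name the same identifier differently.
    out["patient_id"] = (record.get("patient_id")
                         or record.get("PTID")
                         or record.get("subject"))
    # Weight may arrive in kilograms or pounds.
    if "weight_kg" in record:
        out["weight_kg"] = float(record["weight_kg"])
    elif "weight_lb" in record:
        out["weight_kg"] = round(float(record["weight_lb"]) * 0.453592, 1)
    # Visit dates may arrive in several formats; normalize to ISO 8601.
    raw = record.get("visit_date") or record.get("DOV")
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y"):
        try:
            out["visit_date"] = datetime.strptime(raw, fmt).date().isoformat()
            break
        except ValueError:
            pass
    return out

# Two records describing comparable visits, from two hypothetical sites.
site_a = {"PTID": "A-001", "weight_lb": "154", "DOV": "03/15/2017"}
site_b = {"patient_id": "B-042", "weight_kg": "69.9", "visit_date": "2017-03-15"}

print(harmonize(site_a))  # same schema as site_b after harmonization
print(harmonize(site_b))
```

Multiplied across dozens of sources and hundreds of fields, code like this is where curation effort of the kind Zhang described tends to accumulate.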