AI in biopharma research: A time to focus and scale


Despite recent advancements,¹ biopharma research in drug R&D remains expensive and time-consuming, although there are numerous opportunities to build capabilities that enhance productivity and provide probability-of-success gains. In this time of rapid growth of AI in biopharma, attention today is on how to make the most of the opportunity to deliver value at scale by fully integrating AI approaches into scientific process changes. In this article, we outline how biopharma companies can harness AI-driven discovery to deliver patient benefit, and why now is the time for a shift from pursuing select marquee partnerships and self-contained capability builds, to focusing on coordinated investment in research AI with impact to show for it.

The goal of the research phase in drug R&D is to generate as many quality drug candidates as possible, as quickly as possible, with the highest probability of successful transition to clinical development. The discovery process has historically been a convergent, stepped, pass–fail funnel with attrition at every step, which is highly inefficient given the number of compounds initially tested.² (Less than 0.1 percent of candidate molecules pass from screening to Phase I, with approximately 30 percent of new molecular entity costs being spent in discovery, in the context of $1.1 billion to $2.8 billion spent to bring each drug to market. Oliver J. Wouters et al., “Estimated research and development investment needed to bring a new medicine to market, 2009–2018,” JAMA, March 3, 2020.) Ideally, this process should promote for testing only those compounds that are relevant to targets likely to lead to effective drugs for patients. AI can help identify the most promising compounds and targets at every step of the value chain so that fewer, more successful experiments are conducted in the lab to achieve the same number of leads.

The AI-driven drug discovery industry: Jury still out on impact

Why now is the time for AI-enabled drug discovery

Technology advances and regulatory openness to innovation have now combined to make AI-enabled drug discovery a practicable proposition.

In Europe, regulatory openness to in silico and synthetic-derived insights has been facilitated by the ICH M7 guideline as adopted in the EU, which enables quantitative structure-activity relationship (QSAR) assessment of toxicity instead of traditional assay-based approaches. At the same time, the US Food and Drug Administration (FDA) recognizes single-arm trials in rare diseases with control groups incorporating real-world evidence (RWE).

In parallel, technology is advancing on two fronts with (1) standardized approaches to industrialization and scaling of machine learning (ML)—for example, MLOps (ML operations) and DataOps (data operations) alongside customized services and platforms; and (2) development of deep-learning approaches for designing new molecules and computer vision, which are increasingly accessible through public code repositories and academic literature.

The AI-driven drug discovery industry has grown significantly over the past decade, fueled by new entrants in the market, significant capital investment, and technology maturation. These AI-driven companies fall broadly into two categories: providers of AI enablement for biopharma as a service only, including software as a service (SaaS); and providers of AI enablement that have, in parallel with their services, their own AI-enabled drug development pipeline (see sidebar “Why now is the time for AI-enabled drug discovery”).

Our research has identified nearly 270 companies working in the AI-driven drug discovery industry, with more than 50 percent of the companies based in the United States, though key hubs are emerging in Western Europe and Southeast Asia.³ The number of AI-driven companies with their own pipeline is still relatively small today (approximately 15 percent have an asset in preclinical development). Those with new molecular entities (NMEs) in clinical development (Phase I and II) have predominantly in-licensed assets or have developed assets using traditional techniques.⁴

The growth in the AI-driven drug discovery space has caught the attention of established biopharma companies, and there has been a rapid rise in partnerships between traditional biopharma companies and AI-driven drug companies (Exhibit 1). However, there is a significant concentration in partnership activity and funding toward a small number of AI-driven players with high valuations, multiple deals, and significant capital raised (62 percent CAGR in investment over the past decade). Over half the capital invested in the space is concentrated in only ten companies (all based in the United States or United Kingdom). This is partly because of the difficulty biopharma companies and investors have in evaluating the long tail of AI-driven players. We have seen biopharma companies that are deeply interested in this space struggle to determine what emerging players do, where they operate along the value chain, the distinctiveness of their technology, and which technologies have demonstrable impact.

Two potential obstacles need to be overcome to unlock impact from AI enablement in partnerships among biopharma companies and AI-driven discovery players. First, AI-enabled discovery approaches (including via partnerships) are often kept at arm’s length from internal day-to-day R&D; they proceed as an experiment and are not anchored in a biopharma company’s scientific and operational processes to achieve impact at scale. Second, investment in digitized drug discovery capabilities and data sets within internal R&D teams is all too frequently made to leverage partner platforms and enrich partners’ IP, rather than to build the biopharma company’s own end-to-end tech stack and capabilities.

When hurdles are overcome, partnerships can come to fruition, and examples exist across the discovery value chain. AstraZeneca’s long-standing collaboration with BenevolentAI resulted in the identification of multiple new targets in idiopathic pulmonary fibrosis, with subsequent broadening of the scope to other therapeutic areas (TAs).⁵ Sumitomo Dainippon Pharma worked with Exscientia to identify DSP-1181 for obsessive compulsive disorder in less than a quarter of the time typically taken for drug discovery processes (under 12 months versus four and a half years)—with ambitions to enter the molecule into Phase I trials.⁶

Similarly, building AI-enablement capabilities in-house within biopharma companies is difficult: assembling the cross-functional teams required to drive the transformation is challenging, and AI enablement is often implemented in a relatively isolated way. AI-enabled approaches are frequently undertaken separately from day-to-day science, with AI-based tools not fully integrated into routine research activities.

Biopharma companies, therefore, need to strike a balance between internal capability building and partnerships with AI-enabled drug discovery companies. Successful biopharma partnerships in the AI space should have some core benefits: biopharma companies gain access to technology (AI platforms, algorithms, and infrastructure), data (such as curated labeled cell images, screening, ADMET⁷ data), talent (a ready supply of data scientists and data engineers to build AI pipelines while training biopharma talent), and assurances of data protection in relation to a highly specific strategic intent to maximize patient impact (for example, to co-develop a certain molecule class in a specific TA).

Substantial impact from building enterprise capabilities in-house

When biopharma companies successfully integrate AI processes in day-to-day science and assemble cross-functional teams with the right skill sets (data science, engineering, software development, epidemiology, discovery sciences, clinical, and design), we have observed significant impact along the value chain (Exhibit 2):

Hypothesis generation capabilities—simplified hypothesis generation tasks in experimental biology fields from several weeks of researcher time to curated lists in minutes by combining real-world data (RWD), genomics data, and scientific literature through a knowledge graph for target identification
Large-molecule-structure inference—100 times acceleration in time to generation of protein structures (for example, for peptide or mRNA-vaccine-antigen generation) for target identification
Computer vision technology—up to ten times acceleration achieved for screening-plate-image analysis, with higher accuracy than classical approaches, harnessing deep-learning approaches (for instance, convolutional neural networks) for target validation and hit identification
In silico medicinal chemistry—30 to 50 percent acceleration in small molecule, high-throughput screening, using approaches such as molecular property prediction in an iterative screening loop (versus the existing approach of randomized selection of compounds) for hit identification
In silico cheminformatics—more than two times improvement over baseline on the key metric of “efficacy observed,” over 100 times the number of in silico experiments possible compared with previous screening, and faster design of compounds optimized for drug delivery efficacy, for lead optimization
Knowledge-graph-based hypothesis generation and drug repurposing—rapid identification of novel indications for existing investigational new drugs (INDs) or marketed drugs via genomic information and pathways associated with specific disease phenotypes, accelerating time to new treatments for patients, as part of the preclinical phase of R&D
Indication finding leveraging genomics—prioritizing indications to pursue for novel mechanisms of action (MoAs), finding new greenfield indications for life cycle management, prioritizing or deprioritizing ongoing programs within clinical plan by stopping low probability of success programs early and reducing patient burden in clinical trials; informing diligence of molecules for licensing with an independent view of biological potential, as part of the preclinical phase of R&D

Biopharma companies that maximize the impact of AI enablement can move beyond minimum viable product (MVP) individual use cases and build research systems (Exhibit 3).

Glossary of key pharma AI R&D terms

Artificial intelligence/machine learning

Often used interchangeably, AI and ML have subtly different meanings. A subset of AI, machine learning refers to algorithms that improve by design through the addition of new data. AI describes, more broadly, a computer system that can solve problems based on data available in its environment, in the context of achieving specific goals.

Deep learning

A subset of machine learning, deep learning utilizes an artificial neural-network structure, typically involving a very large number of parameters to be calibrated, to identify patterns in large data sets and subsequently to make predictions based on this calibrated neural network.

Molecular property prediction/iterative screening

An approach at the intersection of machine learning and drug discovery that utilizes machine learning approaches to determine patterns in candidate molecule structures that predict their likelihood of being a successful drug candidate. Typically, in an “iterative screening” setup, an ML algorithm will be trained based on historic screening experiments and will predict which of the remaining compounds in the screening library are likely to be hits; as further batches of experiments are conducted, the algorithm increases its precision and predictive power. The algorithms learn from both successful and unsuccessful predictions.

Knowledge graphs

A type of database that integrates data from multiple sources and creates links across them, typically in the same domain (for example, drug discovery). Data is organized in accordance with an ontology—a web of concepts, including the codified relationships between concepts—and its controlled vocabulary. Knowledge graphs can be used to infer connections across multiple data sources, including link prediction and insight generation.
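As a rough illustration of how connections can be inferred across a knowledge graph, the sketch below stores facts as subject-predicate-object triples and proposes candidate indications for a drug by following drug-to-gene and gene-to-disease links. All entity names (drug_A, gene_X, disease_1), the predicates, and the function names are hypothetical placeholders rather than real biology or any specific product's schema. A production system would more likely use a graph database and statistical link-prediction models than this simple two-hop traversal.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); not real biology.
TRIPLES = [
    ("drug_A", "inhibits", "gene_X"),
    ("drug_B", "inhibits", "gene_Y"),
    ("gene_X", "associated_with", "disease_1"),
    ("gene_X", "associated_with", "disease_2"),
    ("gene_Y", "associated_with", "disease_2"),
    ("drug_A", "treats", "disease_1"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for fast lookups."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def candidate_indications(index, drug):
    """Two-hop inference: drug -> inhibited gene -> associated disease.

    Returns {disease: [supporting genes]} for diseases the drug is not
    already recorded as treating.
    """
    known = index.get((drug, "treats"), set())
    candidates = {}
    for gene in sorted(index.get((drug, "inhibits"), set())):
        for disease in sorted(index.get((gene, "associated_with"), set())):
            if disease not in known:
                candidates.setdefault(disease, []).append(gene)
    return candidates


index = build_index(TRIPLES)
for drug in ("drug_A", "drug_B"):
    print(drug, candidate_indications(index, drug))
```

Each candidate keeps the list of genes that support it, so a researcher can trace why a link was suggested before committing lab time to it.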

DataOps/MLOps

DataOps (data operations) is an automated, process-oriented methodology used by analytics and data teams to improve quality and reduce the cycle time of advanced analytics. MLOps (ML operations) refers to software engineering practices applied to IT operations (for example, packaging and deploying production software) in the context of machine learning and artificial intelligence.

Research systems harness synergies created by putting AI at the center of the research engine to enhance the outcome of experiments—instead of simply being a preparatory step for real-world experiments in isolation. They act as feedback loops to refine the predictive capability and stability of AI algorithms and inform experimental design (for more key definitions, see sidebar “Glossary of key pharma AI R&D terms”). An example is “iterative screening”: results of an initial round of high-throughput screening⁸ are used to train a machine learning (ML) algorithm. The ML algorithm can learn which underlying compound structures are most effective against a target and suggest other molecules in the library to prioritize for testing. As the ML algorithm gathers more data, its predictions rapidly become more accurate, and a disproportionately large number of “hits” are identified for the relative amount of the library screened. These research systems reduce overall costs, have higher probability of success, accelerate R&D processes (and therefore time to patient impact), and are fully integrated for specific use cases.
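The iterative screening loop described above can be sketched in a few lines. This is a toy simulation under stated assumptions, not a description of any company's method: compounds are 8-bit fingerprints, the "assay" is a hidden rule (`is_hit`), and the model is a deliberately simple per-bit hit-rate score (`fit_weights`). Batch sizes, the budget, and all function names are illustrative choices.

```python
import random
from itertools import product

N_BITS = 8  # toy fingerprint length (assumption)


def is_hit(fp):
    """Hidden assay rule, unknown to the model: bits 0 and 1 must be set."""
    return fp[0] == 1 and fp[1] == 1


def fit_weights(screened):
    """Per-bit weight: hit rate among screened compounds with that bit set,
    minus the overall hit rate among screened compounds."""
    overall = sum(label for _, label in screened) / len(screened)
    weights = []
    for j in range(N_BITS):
        labels = [label for fp, label in screened if fp[j] == 1]
        rate = sum(labels) / len(labels) if labels else overall
        weights.append(rate - overall)
    return weights


def score(fp, weights):
    """Sum the weights of the bits that are set in this fingerprint."""
    return sum(w for bit, w in zip(fp, weights) if bit)


def iterative_screen(library, initial=64, batch=32, budget=128, seed=0):
    """Screen a random initial batch, then repeatedly retrain and screen
    the top-scoring untested compounds until the budget is used."""
    rng = random.Random(seed)
    remaining = list(library)
    rng.shuffle(remaining)
    screened = [(fp, int(is_hit(fp))) for fp in remaining[:initial]]
    remaining = remaining[initial:]
    while len(screened) < budget and remaining:
        weights = fit_weights(screened)
        remaining.sort(key=lambda fp: score(fp, weights), reverse=True)
        n = min(batch, budget - len(screened))
        picked, remaining = remaining[:n], remaining[n:]
        screened.extend((fp, int(is_hit(fp))) for fp in picked)
    return screened


library = list(product((0, 1), repeat=N_BITS))  # all 256 toy compounds
screened = iterative_screen(library)
hits_found = sum(label for _, label in screened)
total_hits = sum(is_hit(fp) for fp in library)
expected_random = len(screened) * total_hits / len(library)
print(f"screened {len(screened)} of {len(library)} compounds")
print(f"hits found: {hits_found} of {total_hits}")
print(f"expected hits from purely random selection: {expected_random:.0f}")
```

Comparing the hits found with the expectation under purely random selection gives a simple measure of enrichment. In a real setting the hidden rule is the wet-lab assay, and the model would typically be a learned property predictor rather than per-bit hit rates.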

What does it take to successfully implement AI in biopharma research?

By implementing digital and data science tools and concepts, biopharma can capture the full value of current portfolios and develop core technologies, competences, and IP to drive future research (such as AI-enabled large-molecule and antibody design). Current AI-driven drug discovery companies are already developing their own, significantly more cost-efficient drug discovery pipelines, so it would be beneficial for established players to identify how they, too, can fully integrate novel technologies into standard research processes. While partnering is one option—where it provides access to data, technology, and talent, and the risk of partners exploiting a company’s IP to become a future competitor in the medium to long term is low—marquee partnerships cannot be the only way to develop in-house drug discovery capabilities. As such, it is critical for biopharma companies to work out how to shift from investing in nonintegrated, lighthouse use cases or partnerships to making AI an integral part of everyday research. With this in mind, here are four areas to consider:

1. Strategy and design-backed road-mapping. Biopharma companies can develop a top-down, C-level strategy, setting out the ways in which AI-enabled discovery will be a critical enabler of future performance. A significant aspect is to understand where the current organizational pain points lie, what the potential gains could be, and where the organization wants to lead the industry (versus only being competitive) in the context of how the space/competitors are expected to move in the future. This strategy needs to be specific, time-bound, linked to value at stake, and have strong alignment among (and sponsorship from) senior leaders—including the heads of R&D, research, and data science. Underpinning this strategy is the need for sufficient resources (balanced across talent, data, and infrastructure investment) to support the capability building and talent acquisition required to make it a reality, or recognition of the trade-offs on IP and capability building if only pursuing external partnerships. Alignment between R&D and digital functions is paramount to ensure balanced co-investment (financial and management time) and for the impact generated from initiatives to be shared appropriately. In addition, it is important to carefully consider which elements of the AI-enabled drug discovery approach will be supported by partnerships versus built in-house.


We recommend a design thinking approach to determine which parts of discovery research to tackle, and in which order. This involves studying, end-to-end, common research processes, where there may be two to three steps that are bottlenecks for researchers, and which could be significantly unlocked via AI—for example, automated image analysis for critical cell assays or lead optimization. Design thinking could help companies determine which areas could benefit most from AI, the implementation road map, and the success indicators to track progress and impact (for example, time from target identification to candidate selection, costs associated with target identification).

For R&D and data science leaders, the focus should not be solely on advanced-analytics use cases: there is significant value in cracking established problems, with applications such as basic automation using data transformation pipelines (such as dose response curve fitting), digital operational dashboards, or building data platforms and infrastructure (such as knowledge graphs). For example, building a single data platform for all preclinical data generated can prevent experimental duplication and enhance data sharing across the organization—our experience shows this can reduce months of hypothesis generation time to a few days. The impact includes dramatically increased speed, freeing up people for more productive tasks, and increasing quality of analyses.
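To make the dose-response example concrete, the following minimal sketch fits a two-parameter Hill curve by linear regression on logit-transformed responses. It assumes responses are already normalized to the range 0 to 1 (fixed bottom and top), which is a simplification. Routine pipelines more commonly fit a four-parameter logistic model with nonlinear least squares. The function names and the synthetic data are illustrative assumptions.

```python
import math


def hill(conc, ec50, slope):
    """Normalized Hill response in (0, 1) for a given concentration."""
    return 1.0 / (1.0 + (ec50 / conc) ** slope)


def fit_hill(concs, responses, eps=1e-6):
    """Estimate (ec50, slope) from normalized responses.

    Uses the identity logit(y) = slope * (ln(conc) - ln(ec50)), so an
    ordinary least-squares line through (ln conc, logit y) gives both values.
    """
    xs, ys = [], []
    for conc, resp in zip(concs, responses):
        resp = min(max(resp, eps), 1.0 - eps)  # keep the logit finite
        xs.append(math.log(conc))
        ys.append(math.log(resp / (1.0 - resp)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ec50 = math.exp(-intercept / slope)
    return ec50, slope


# Synthetic, noise-free data generated from known parameters (illustrative).
concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
responses = [hill(c, ec50=2.0, slope=1.5) for c in concs]
ec50_hat, slope_hat = fit_hill(concs, responses)
print(f"estimated EC50 = {ec50_hat:.3f}, Hill slope = {slope_hat:.3f}")
```

Wrapping a fit like this in a data transformation pipeline means every plate is processed the same way, which is the kind of basic automation the paragraph above refers to.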

2. Relentless value delivery focused on quarterly value releases (QVRs). It is critical that R&D, data science, and data engineering collaborate closely and iterate on delivery of use cases in an agile way. The research process frequently includes specific constraints and ways of working (such as steps and hand offs in the experimental methodology) that need to be accounted for to ensure uptake of the tools and systems that are built (in addition to updating scientific processes and standard operating procedures and introducing financial and performance-based incentives). To consider AI-enablement delivery holistically, leaders can line up key building blocks, as in this specific example focused on “high-throughput screening”:

Blueprinting. Develop a list of use cases across the value chain, prioritizing according to impact, complexity, and business value; then select the highest-need use cases.
Digital and analytics solutions. Build and automate screening algorithms that link molecular descriptors (for example, molecule structure in the form of a SMILES⁹ string) with desired output, or a hit.
Data continuum. Collect experimental data in a reusable way (for instance, with FAIR-data principles¹⁰); build master tables from equipment and existing libraries.
Tech capabilities. Design and build technical infrastructure and data architecture for data extraction and automated gathering.
Talent and agile operating model. Coach data science, data engineering, and translator/product owners on tools and delivery methodologies, iteratively testing and learning to deliver products via a collaborative environment.
Adoption and scaling (including change management). Design new screening protocols and experimental strategy, incorporating ML-based algorithms. Ensure the whole research organization (from leaders to lab technicians) understands what the company is trying to achieve and how daily activities need to change.
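For the "digital and analytics solutions" building block, one very simple way to link a SMILES string to a screening decision is to turn it into a feature set and rank library compounds by similarity to a known hit. The sketch below uses overlapping character n-grams of the SMILES text as a crude stand-in for the structure-based fingerprints that a cheminformatics toolkit would normally compute. The choice of known hit, the library, and the function names are illustrative assumptions only.

```python
def smiles_ngrams(smiles, n=2):
    """Return the set of overlapping character n-grams in a SMILES string.

    Assumption: text n-grams are a crude proxy for real molecular
    fingerprints, which require parsing the molecular graph.
    """
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_by_similarity(known_hit, library, n=2):
    """Sort library SMILES by descending similarity to a known hit."""
    ref = smiles_ngrams(known_hit, n)
    scored = [(tanimoto(ref, smiles_ngrams(s, n)), s) for s in library]
    return sorted(scored, reverse=True)


known_hit = "CC(=O)O"  # acetic acid, treated as a hypothetical hit
library = ["CCO", "c1ccccc1", "CC(=O)N"]  # ethanol, benzene, acetamide
ranking = rank_by_similarity(known_hit, library)
for similarity, smiles in ranking:
    print(f"{smiles:10s} similarity = {similarity:.3f}")
```

Text-based features can assign identical scores to different molecules, so this is best read as a placeholder for where a proper descriptor calculation and a trained model would sit in the screening workflow.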

Once key AI-enabled use cases are aligned, delivery must be highly organized so as to demonstrate ongoing impact: core requirements and potential synergies must be identified, along with gaps in ongoing cross-cutting road maps. This means departing from long-term road maps that deliver impact in multiyear cycles and focusing instead on QVRs (which produce measurable value after each quarterly sprint, such as AI-enablement of a scientific process) while continuously reprioritizing based on organizational needs. This approach enables AI use cases to be developed more efficiently, by dynamically front-loading priority data ingestion and team capacity, with mission-critical assets deployed as required (Exhibit 4).

All core digital processes in research can be delivered with incremental quarterly delivery; however, the nature of “value” delivery may vary. Moonshot programs (in tech, this could be the advent of AlphaFold¹¹) require long-term road maps and typically a dedicated ML research group to deliver potentially groundbreaking discoveries with impact in biopharma. Such programs may not deliver an AI product every quarter as other digital initiatives do, but an insight, report, or decision should still be delivered on a regular basis.

3. IP, capability building, and developing translation expertise through partnerships. While there is certainly evidence for the benefits of partnership in specific areas, including to access unique technologies, data, or solution types, managing these partnerships exclusively at arm’s length and keeping novel methods or solutions separate from day-to-day research mean that necessary future capabilities for a transformation in drug discovery may not be built.

Biopharma companies should be selective and specific about the capabilities to be delivered by partnerships versus those built in-house. Similarly, a balanced approach to in-house and external talent (notably, the data scientists and data engineers needed to work with researchers in developing the algorithms and technology backbones to support prioritized areas) is vital. Often overlooked but mission critical for AI enablement are “translators” or “product owners” with deep business, clinical, scientific, AI/ML, and systems-architecture understanding. People in these roles have a product ownership mindset, and they understand and dynamically evaluate all elements of the analytics team to maintain focus on value and impact delivery, thereby assuring successful project delivery.

4. Industrialization of AI with MLOps and reusable analytical assets. For the capabilities a biopharma company builds in-house, it is essential to have the right enablers in place to support scaling across research activities: the right technology infrastructure and methodologies, especially DataOps and MLOps and an appropriate data architecture (for example, graph databases or Data Vault 2.0 technology). DataOps (data operations) enables companies to gain more value from their data by accelerating the process of building models. MLOps involves ensuring the right platforms, tools, services, and roles with the right team operating model and standards for delivering AI reliably and at scale. Technical-architecture enablers to support processing compute-intensive workflows such as AlphaFold, molecular-dynamics simulations, optimization models, and image-recognition workflows are a core requirement. Furthermore, enabling concepts such as Data Vault 2.0 techniques and graph databases are table stakes as AI capabilities scale.

To successfully deploy research systems, development teams must build multiple interrelated components (data connectors and pipelines, models, APIs, and visual interfaces) that work seamlessly to drive adoption among end users. Fragmentation of code bases and components, and reduced productivity due to integration challenges, are natural risks that arise when multiple tools are deployed across different domains and teams. Enforcing coding standards in development and harmonizing coding approaches across teams increase long-term productivity and solution robustness. Additionally, harmonization enables sharing of reusable components (data connectors, feature libraries, model-based embeddings) across projects: for example, reusing graph neural-network molecular embeddings both for hit prediction and for lead optimization aimed at toxicity reduction. As the emerging research platform grows in complexity, “assetization” of reusable components becomes an increasingly important source of development productivity (with twice the productivity for teams that embrace it) and an important in-house capability that requires a dedicated team with a product-centered mindset.¹²

The question today is whether biopharma companies will move analytics investments beyond a focus on individual projects and marquee partnerships to transforming research at scale. A shift to focusing on specific scientific and operational pain points and building AI into fully integrated research systems—with a road map to scale—will enable biopharma companies to capture real business and patient impact from using AI in research.
