enlightenbio  Blog

The Importance of Investment in Multimodal Data in the AI Era

As the artificial intelligence (AI) revolution picks up momentum, it is important for us not to lose focus on innovation within the healthcare data ecosystem. We stand at the brink of a transformative era where the deep interrogation and use of historical data assets can unlock incredible potential – to contextualize and guide future research with unprecedented efficiency, enabling data-driven decisions that enhance the entire R&D lifecycle.

This includes evidence generation for regulatory submissions, contextualization of early phase study results, and understanding complex clinical and biological correlations within disease states for drug targeting and repurposing efforts. By synergizing robust biostatistical methods with cutting-edge AI technology, we can realize this vision, but it will require strategic investments aimed at the creation of fit-for-purpose datasets from truly multimodal sources. Fortunately, important strides have been made in recent years on the infrastructure, data linkage, and analytical fronts, creating an environment ripe for partnerships in the space.

Let’s dive in…

Evidence Generation Using Historical Data is Expanding, Resulting in a Need for Rich, Multimodal Data

The gold standard for evidence generation in a regulatory setting will always be contemporaneous randomized controls. In cases with high unmet need or poor prognosis, or in non-regulatory settings, externally controlled studies using historical data are becoming a go-to option (Khachatryan et al., 2023; Friends of Cancer Research, 2019). Some notable recent examples of this approach include:

While the best sources of external data for evidence generation are collected in a controlled setting, such as a clinical trial or registry (Yap et al., 2021), the current scale and growth rate of historical trial and registry data are not large enough to meet existing demand, let alone satisfy a future where emerging biomarkers and other data elements are required for the development of precision-targeted therapies. These data assets are also largely static compared with ever-evolving real-world data (RWD), which creates a challenge when addressing missing data and questions that require data not collected in the context of the initial study.

On the other hand, RWD is not without its challenges either. It often lacks important endpoints like progression-free survival (PFS), overall response rate (ORR), and reporting of adverse events (AEs). It also tends to lack necessary data elements that are not commonly collected in the clinical setting, such as prognostic criteria collected at required intervals, comprehensive disease severity metrics, and results from third parties that can be important for making apples-to-apples comparisons. A key advantage of RWD, however, is its adaptability through linkage, which allows the integration of new data sources, and time itself, to enhance the quality and scale of the assets. Many companies have sought to capitalize on these advantages, resulting in real-world evidence (RWE) enjoying enhanced visibility in recent years.

Now that we have a baseline of where the industry and data landscape stand, what’s next?

The Frontier of Emerging Data Modalities in Evidence Generation

The convergence of genomics and artificial intelligence is yielding unprecedented breakthroughs, one of the most promising being the application of circulating tumor DNA (ctDNA) for disease monitoring. This innovative approach can significantly improve diagnosis, treatment selection, and recurrence monitoring in hematological malignancies and in oncology more broadly, particularly in the context of minimal residual disease (MRD). Recent studies highlight the potential of ctDNA as a critical endpoint in multiple myeloma (MM), a concept now endorsed by the FDA’s Oncologic Drugs Advisory Committee (ODAC). Additionally, burgeoning evidence supports the utility of ctDNA as a pivotal tool in the early detection and management of solid tumors, even if sensitivity thresholds reduce its utility for assessing MRD (Bittla et al., 2023; Kotani et al., 2023).

Studies continue to emerge showcasing the broader importance of biomarkers in clinical development. For instance, the correlation of biomarker use with decreased failure rates in phase III trials highlights the potential of personalized medicine (Mohamed et al., 2022).

So there’s a specific value case and trend for the broader class of biomarkers as an underrepresented modality in the evidence generation setting. Given the scope of this post, I won’t even go into digital pathology, patient-reported outcomes, biometrics, social determinants of health, and many more emerging modalities that have the potential to enhance evidence generation strategy across the research and healthcare landscape (Figure 1). Suffice it to say there is a big opportunity to make real progress here.

Figure 1: Data modalities and opportunities for multimodal biomedical AI. Source: “Multimodal biomedical AI” (Acosta et al., 2022)

AI offers transformative potential when it comes to aggregation and analysis, but how does it affect the approach to these important data modalities?

The Limitations of AI’s Ability to Address Data Gaps Underpin the Need for Real Multimodal Data Assets

Figure 2: Anticipated growth of synthetic data use in AI models (source: “Are Synthetic Data a Real Concern? Substantive Predictions and Mindful Considerations Through 2030,” with the original source being a Gartner report).

There is some indication that the use of AI-generated synthetic data will expand to dwarf that of real data for many applications in the future (Figure 2). However, synthetic data, which is typically derived from real patients by reassociating attributes using various machine learning models, has profound limitations in meeting the needs of an increasingly biomarker-informed data environment.

Cohorts derived from synthetic data are a useful tool for specific use cases, particularly in maintaining privacy, reducing costs, confirming hypotheses in situations where you don’t need to show your work to regulators, and increasing diversity and scale to answer clearly defined questions (Giuffrè & Shung, 2023; Gonzales et al., 2023). However, the process of generating a synthetic cohort can dilute biological signals and other important associations while amplifying artifacts in the data, greatly diminishing the effectiveness of exploratory research. Finally, it is currently beyond our scientific capabilities to generate a relevant synthetic genome with significant utility for most research (Oprisanu et al., 2021), which leaves us at an impasse in our ability to refine molecular-based disease subtypes and apply them to drug development and treatment.

So if AI doesn’t provide an easy button for assembling the rich datasets of the future, what are the innovators doing?

Industry Data Asset Players are Capitalizing

Investment in the formation of real multimodal data assets is on the rise, and innovators are homing in on the characteristics that make data assets best suited for generating regulatory-grade, fit-for-purpose datasets at scale. These include high quality, completeness, auditability/traceability (aka data provenance), and regulatory (HIPAA, GDPR) compliance.

The results of this strategic partnership approach are beginning to surface with increased frequency, with some notable recent examples including:

But we are not done yet…

The Healthcare Data Asset Endgame – Patient Centricity

Figure 3: Luna PBC model for patients as stakeholders in the healthcare research industry (source: LunaDNA).

With the recent FDA draft guidance on patient-focused drug development, it is clear that the industry is charting a course that will increasingly champion the needs of patients alongside evidence generation requirements. This has a significant upside for both the RWD asset industry and patients, resulting in increased longitudinality, access to consent for linkage, enhanced diversity, and equity/benefit to patients and their care providers.

There are many strategies to engage patients in the healthcare data landscape (and, by extension, R&D), but patient data ownership is the ultimate form of patient centricity and the likely destination of the process if regulators and patient advocates get their way. The idea behind patient data ownership is not new. Figure 3 outlines the model proposed by the now defunct LunaDNA, which fell victim to being ahead of its time. However, there are many emerging and ongoing efforts to embrace patient-centric models (Ciitizen, Milu Health, PicnicHealth, and xCures) that have showcased the power patients wield when it comes to unlocking their data for use in their own care and/or reuse in generating insights and evidence. I expect this trend will only continue, eventually yielding a registrational trial using an external control of high quality, regulatory-grade RWD.

While AI offers transformative potential in healthcare, foundational data challenges remain. To realize the full promise of AI, we must invest in real multimodal data assets. The innovators who are able to best scale and integrate emerging data types as they become relevant in the evidence generation space will enjoy a strong position in the market moving forward, while patient centricity is a distinct advantage in that approach.

___________________________________

Mike Furgason – enlightenbio guest blogger

Mike Furgason, PhD is a Precision Healthcare Technologist and Portfolio Manager with over a decade of experience in medical affairs, precision genomics, and product strategy across industry and academia. In 2024, he founded DataThreads Strategy Partners, an independent consulting group focused on healthcare data asset management, partnership, and M&A as well as GTM and commercial strategy in the evidence generation space. For more information, feel free to reach out to mike@data-threads.com or connect via LinkedIn.

References

Acosta, J. N., Falcone, G. J., Rajpurkar, P., & Topol, E. J. (2022). Multimodal biomedical AI. Nature Medicine, 28(9), 1773–1784.

Bittla, P., Kaur, S., Sojitra, V., Zahra, A., Hutchinson, J., Folawemi, O., & Khan, S. (2023). Exploring Circulating Tumor DNA (CtDNA) and Its Role in Early Detection of Cancer: A Systematic Review. Cureus, 15(9), e45784.

Giuffrè, M., & Shung, D. L. (2023). Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digital Medicine, 6(1), 186.

Gonzales, A., Guruswamy, G., & Smith, S. R. (2023). Synthetic data in health care: A narrative review. PLOS Digital Health, 2(1), e0000082.

Khachatryan, A., Read, S. H., & Madison, T. (2023). External control arms for rare diseases: building a body of supporting evidence. Journal of Pharmacokinetics and Pharmacodynamics, 50(6), 501–506.

Kotani, D., Oki, E., Nakamura, Y., Yukami, H., Mishima, S., Bando, H., Shirasu, H., Yamazaki, K., Watanabe, J., Kotaka, M., Hirata, K., Akazawa, N., Kataoka, K., Sharma, S., Aushev, V. N., Aleshin, A., Misumi, T., Taniguchi, H., Takemasa, I., … Yoshino, T. (2023). Molecular residual disease and efficacy of adjuvant chemotherapy in patients with colorectal cancer. Nature Medicine, 29(1), 127–134.

Mohamed, L., Manjrekar, S., Ng, D. P., Walsh, A., Lopes, G., & Parker, J. L. (2022). The Effect of Biomarker Use on the Speed and Duration of Clinical Trials for Cancer Drugs. The Oncologist, 27(10), 849–856.

Oprisanu, B., Ganev, G., & De Cristofaro, E. (2021). On Utility and Privacy in Synthetic Genomic Data. In arXiv [q-bio.GN]. arXiv. http://arxiv.org/abs/2102.03314

Yap, T. A., Jacobs, I., Baumfeld Andre, E., Lee, L. J., Beaupre, D., & Azoulay, L. (2021). Application of Real-World Data to External Control Groups in Oncology Clinical Trial Drug Development. Frontiers in Oncology, 11, 695936.

Yin, X., Davi, R., Lamont, E. B., Thaker, P. H., Bradley, W. H., Leath, C. A., 3rd, Moore, K. M., Anwer, K., Musso, L., & Borys, N. (2023). Historic Clinical Trial External Control Arm Provides Actionable GEN-1 Efficacy Estimate Before a Randomized Trial. JCO Clinical Cancer Informatics, 7, e2200103.
