enlightenbio  Blog

The Critical Link Between CROs and Sponsors: Making Data Work for You

Maximizing Focus, Minimizing Cost: The Strategic Logic of CRO Engagement

Outsourcing to Contract Research Organizations (CROs) has been a fundamental practice in therapeutic and device discovery and early development for life science companies since the late 20th century, and the business drivers for this model remain highly relevant today.    

Companies, often referred to as Sponsors, engage CROs due to several key business advantages:

  • Focus on Core Competencies: Outsourcing allows sponsoring pharmaceutical or biotech companies to focus on their core capabilities in discovering, developing, and launching differentiated products. Essential but non-core work, such as routine assays, can be outsourced, enabling companies to maintain a lean operational structure.   
  • Cost Efficiency: Routine assays and testing often cost less when outsourced than a Sponsor could achieve in-house. A significant amount of routine laboratory work is outsourced to large offshore CROs in regions where labor and operational costs are lower.
  • Optimized Resource Utilization: For smaller companies with lower volumes of routine assays, outsourcing eliminates the need to establish and maintain qualified in-house capabilities that would not be utilized full time. 
  • Access to Specialized Expertise and Infrastructure: Outsourcing specialty assays that require significant investment in specific skills, advanced instrumentation, and specialized lab accreditation proves to be more cost effective than developing these capabilities internally. 

Effective external collaborations with CROs, academic groups, and other co-development partners present a variety of challenges to Sponsors, ranging from administrative hurdles to laboratory workflow, intellectual property protection, project management, and data exchange. In this post, I would like to focus on the challenge of data exchange between the CRO and the Sponsor. It’s important to note that some of the solutions described can also address several of the other challenges.

History and Strategic Role of CROs

Despite CROs being a longstanding cornerstone of our industry and the clear need for efficient data exchange, we often see gaps between the structure of the data and results CROs provide and what Sponsors prefer to receive. These discrepancies are often due to data flow and business misalignments and are becoming increasingly significant as therapeutic discovery and development continue to evolve.

New Modalities, New Demands

The urgency to improve this data landscape is heightened by the increasing complexity of new drug modalities and the growing amount of data they generate. A broader spectrum of therapeutics and devices, from traditional small molecules to various nucleic acid, peptide, microbiome, and cell-based modalities, is now being developed. These complex modalities are not only challenging to represent but also more difficult to produce and deliver, necessitating greater collaboration across scientific disciplines. The complexity is further amplified by the intricate biology involved. All these factors point to a clear need for next-generation discovery informatics solutions.

Furthermore, the rapid advancements in computational capabilities, machine learning, and artificial intelligence are greatly enhancing our ability to develop new therapeutic modalities and to accelerate scientific discovery. To fully leverage these powerful tools, it’s essential that our data is “AI-ready”. The principle of ‘Garbage In, Garbage Out’ remains critically important. This means we need to ensure our datasets include both negative and positive results, along with robust test sets, comprehensive training data, and detailed metadata. With meticulously curated data, scientists can confidently feed the data into AI models, leading to more accurate and meaningful insights. 

Two CRO Subsegments Representing Different Types of Data Challenges 

The two subsegments are: 1) therapeutic discovery and preclinical development and 2) target identification through multi-omics.    

  1. Therapeutic discovery and preclinical development: This area spans chemical characterizations and various biological assays to test a therapeutic lead’s suitability for progression. The tests utilize numerous technologies and formats, including colorimetric and (immuno)fluorescence, analytical chemistry such as LC-UV and LC-UV-MS, and limited gene expression and genomics. These services generate large data volumes due to the high number of samples analyzed, replicates, and the management of experimental conditions and metadata. They remain a cornerstone of the therapeutic discovery industry.
  2. Target identification through multi-omics (Conesa and Beck, 2019): Multi-omics plays a crucial role in target identification and development. It encompasses a range of biochemical tests that measure genomic profiles, transcriptomes, proteomes, metabolomes, and many other molecular layers, including coding and non-coding, stable and non-stable RNA, transcription factors, and epigenomics. Multi-omics enables biologists to extract meaningful biological insights; an example is the understanding of which of the 4-6 million non-coding variants in a typical human genome are pathogenic. The added power of multi-omics over single ‘omics’ studies is evidenced by the proliferation of affordable multi-omics services companies. However, the complexity of managing and integrating such multidimensional datasets represents a challenge beyond that of the large data volumes, experimental conditions, and metadata in the drug discovery subsegment.

Scientists and chemists, by the nature of their roles, are not data scientists

The growing volume of experiments and, with that, experimental data presents significant challenges for data curation and utilization, regardless of whether the work is performed at CROs or Sponsors. As Ming Tommy Tang highlights in ‘A Bioinformatician’s Life’, the process involves much more than just running software or creating plots; borrowing from his message: It’s not just “run Seurat” or “make the volcano plot.” It encompasses sysadmin work, data wrangling, QC, validation, debugging, modeling, communication.   

Ideally, scientists and chemists should be able to analyze and interpret their results to the fullest extent possible, given their complete understanding of the experimental context. At a minimum, all routine data tasks should be easily manageable for them. Not only does this approach enhance scientific effectiveness, but it also allows data scientists and other IT professionals to focus on more complex, higher-level, and creative work.

The CRO perspective 

From a CRO perspective, the most common pricing structure is per assay (e.g., enzyme assays, composition assays, cell sorting, mass spectrometric analyses, microarrays, gene expression, sequencing). CROs must cover the fixed costs of buildings and instrumentation, as well as variable costs such as labor and consumables, while keeping their customer fees as low as possible. As with many services businesses, margins are low and competition is fierce.  

Given these factors, the standard viable contract service product focuses on performing assays and delivering results in the most routine and cost-effective manner. To protect already thin profit margins, services such as data processing and analysis, reporting, and formatting for upload to data portals and integration by Sponsors have traditionally not been included in the standard CRO model; customers typically pay an additional fee for these services.

The Sponsor perspective 

On the other hand, scientists at the Sponsor organization want data to be ready for analysis, iteration, and interpretation as quickly and cost-effectively as possible. Currently, results are handed over to Sponsors in various formats, such as raw or partially processed data files, metadata files, and basic reports. These can include unstructured, multi-tab spreadsheets, .csv, and/or .txt files, which are often created through manual copying, pasting, and other error-prone manual steps.

This approach often leads to inefficiencies for both parties. The back-and-forth communication with a CRO regarding errors and version control can be time-consuming. Additionally, when a single CRO performs multiple assays on a set of samples, the results might not be integrated, leaving this to the Sponsor. Furthermore, Sponsors working with multiple CROs for the same assays often receive differently formatted results, requiring significant reformatting efforts. These data exchange practices are inefficient for everyone involved, including the CROs themselves.    
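To make the reformatting burden concrete, here is a minimal Python sketch of how a Sponsor might map results from two CROs onto one common layout. Everything in it is an assumption for illustration: the CRO labels, the column names, the unit conversion, and the sample values do not come from any real CRO deliverable.

```python
import csv
import io

# Hypothetical per-CRO column mappings onto one Sponsor-side schema.
# All column names and unit factors below are illustrative assumptions.
COLUMN_MAPS = {
    "cro_a": {"Compound ID": "compound_id", "Assay": "assay", "IC50 (nM)": "ic50_nm"},
    "cro_b": {"cmpd": "compound_id", "assay_name": "assay", "ic50_uM": "ic50_nm"},
}
# Multipliers that convert each CRO's reported IC50 into nanomolar.
UNIT_FACTORS = {"cro_a": 1.0, "cro_b": 1000.0}


def harmonize(csv_text, cro):
    """Return the rows of one CRO's CSV export in the common schema."""
    mapping = COLUMN_MAPS[cro]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Rename each source column to its common-schema name.
        row = {target: raw[source].strip() for source, target in mapping.items()}
        # Convert the reported value to nanomolar.
        row["ic50_nm"] = float(row["ic50_nm"]) * UNIT_FACTORS[cro]
        # Record which CRO the row came from.
        row["source"] = cro
        rows.append(row)
    return rows


# Made-up example files: same compound and assay, different layouts and units.
cro_a_file = "Compound ID,Assay,IC50 (nM)\nCPD-001,Kinase X,12.5\n"
cro_b_file = "cmpd,assay_name,ic50_uM\nCPD-001,Kinase X,0.25\n"

combined = harmonize(cro_a_file, "cro_a") + harmonize(cro_b_file, "cro_b")
for row in combined:
    print(row)
```

Even in this toy case, someone has to write and maintain a mapping for every CRO and every assay, which is exactly the effort that agreed formats would remove.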

Business realities are driving multiple approaches to this data challenge

One business reality is that larger Sponsors often have the resources and leverage to implement bespoke systems for data exchange with CROs, ranging from file-sharing platforms to dedicated Electronic Lab Notebook (ELN) installations. While this approach caters to their specific needs, it incurs significant costs for both the Sponsor and the CRO, and it unfortunately leaves smaller Sponsors at a disadvantage.

There is a growing urgency to address this gap in the data landscape, and I am encouraged to see increased attention on informatics solutions from Sponsors, CROs, and scientific software providers.

Therapeutic discovery and development

For therapeutic discovery and development, software solutions focus primarily on improving workflow efficiency. This includes features like efficient data entry and curation, robust project management tools (such as Kanban boards), and enhanced collaboration capabilities. Even so, a significant portion of the work, approximately 80%, is still dedicated to data preparation.

“Therefore, the focus needs to be on quality control and standardization of data formats, templates, and reports that CROs can easily upload to Sponsors, along with all the necessary metadata for seamless integration into existing data pipelines.”
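As one hedged illustration of what such quality control could look like, the Python sketch below checks a delivered CSV file against a simple template before it enters a Sponsor pipeline. The required column names and the sample files are hypothetical and are not drawn from any specific platform or standard.

```python
import csv
import io

# Hypothetical template: columns the Sponsor expects in every delivery.
REQUIRED_COLUMNS = ["sample_id", "assay", "result", "unit", "run_date"]


def validate_delivery(csv_text):
    """Return a list of readable problems; an empty list means the file passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        # Without the expected columns, row-level checks are not meaningful.
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Every required field must be filled in.
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
        # A non-empty result must parse as a number.
        result = (row["result"] or "").strip()
        if result:
            try:
                float(result)
            except ValueError:
                problems.append(f"line {line_no}: result is not numeric")
    return problems


# Made-up deliveries: one clean, one with a text result and a missing date.
good_file = "sample_id,assay,result,unit,run_date\nS1,ELISA,0.82,OD450,2025-01-15\n"
bad_file = "sample_id,assay,result,unit,run_date\nS2,ELISA,n/a,OD450,\n"

print(validate_delivery(good_file))
print(validate_delivery(bad_file))
```

A check of this kind could run on the CRO side before upload, so that errors are caught at the source rather than through back-and-forth with the Sponsor.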

Recent advancements in commercial software offer dedicated CRO modules. These types of newer offerings have emerged within the last two to three years and aim to streamline data interplay between Sponsors and their CRO vendors. These CRO modules are typically integrated into a larger platform adopted by the Sponsor. They come equipped with workflows and templates specifically designed for project-based collaborations, ensuring proper intellectual property (IP) protection between the Sponsor and individual CROs. The adoption of such a module can significantly enhance efficiency for both CRO and Sponsor.  Additionally, a key benefit for the Sponsor is the ability to manage all external collaborations through a single, unified system.

Notable examples of commercial software featuring these types of CRO modules include:

  • Revvity Signals Synergy: Built on the Signals ELN platform and specifically designed for managing the extensive assay data generated by CROs.
  • Chemaxon Design Hub: This platform focuses on the ideation and design phases of therapeutic discovery and manages the integration of assay data to support lead optimization. Its Synthesis/Collaborator module, intended for external partners and collaborators like CROs, is available via separate licensing.
  • Benchling Data Entry Assistant: This AI feature takes the process a step further, using large language models (LLMs) to assist with extracting information from lab documents and parsing it into the ELN and reports.  

While Sponsors usually dictate the terms of collaboration, there is a growing trend towards increased CRO agency. For instance, the CRO XChem has adopted Collaborative Drug Discovery’s CDD Vault for its own use and that of its Sponsor customers. Although CDD Vault does not offer a distinct, separately licensed or subscribed ‘CRO module’, it possesses crucial features that make it suitable for XChem’s needs. These include cloud hosting (eliminating heavy IT burdens), robustness, project management capabilities and, most importantly, user-friendliness that empowers scientists and chemists to perform routine data management and analysis without requiring IT support.

Empowering Multi-Omics Workflows: From Data Generation to Interpretation

Given the informatics needed to curate, analyze, and interpret multi-omics data, there is a growing number of specialized multi-omics services companies. Some traditional CROs are now partnering with these new genomics services providers, who often offer their own informatics tools. Examples include Fios Genomics, which assists with bioinformatics, and Azenta Life Sciences’ GENEWIZ, which performs the testing services as well.

A promising new category of ‘omics’ tools is emerging to address the data exchange gap. These tools are designed for adoption by both CROs and Sponsor scientists, with the goal of overall ease of use. They aim to assist scientists throughout the entire workflow, from finding and importing datasets, to curating, processing, analyzing, visualizing and interpreting data. These software solutions are typically vendor-hosted, which reduces implementation effort for users and their organizations. With appropriate permission structures, confidential project management is also ensured. These types of tools can empower CROs and provide smaller Sponsors with equal benefits. An example of such a new offering that empowers scientists and their organizations is the multi-omics (including Mass Spec) EuropaDX Data Exploration Solution.

To Conclude

In today’s data-driven R&D landscape, structured assay data delivery is no longer optional – it’s essential. As drug modalities become increasingly complex, so does the data they generate. Seamlessly integrating this data into Sponsor workflows, especially in real time, is crucial for accelerating insights and informed decision-making. While large pharmaceutical companies often have the leverage to demand that CROs deliver assay results in highly structured, directly ingestible formats, smaller Sponsors frequently face challenges. They often rely on in-house or off-the-shelf tools that may lack flexibility, scalability, or compatibility. Although some platforms and standards exist to bridge this gap, there is a clear need for more purpose-built, scalable solutions to prevent data integration from becoming a bottleneck to innovation.

“Ultimately, if data is paramount, then its efficient delivery is equally critical for its effective use.”

References

Conesa, A., Beck, S. Making multi-omics data accessible to researchers. Sci Data 6, 251 (2019). https://doi.org/10.1038/s41597-019-0258-4

Carol Preisig – enlightenbio Guest Blogger

Carol Preisig, Ph.D., MBA began her career at a CRO, where she developed a new service offering and later transitioned from the laboratory to commercial operations. Since then she has gained extensive experience on the scientific software vendor side. Her work includes selling scientific software to CROs, assisting her pharmaceutical customers in enabling their CROs, and contributing to the development, productization, and pricing of scientific software modules specifically designed for CROs in the therapeutic discovery and multi-omics subsegments.
