enlightenbio Customer Spotlight: Cypher Genomics Clinical Content Curation

enlightenbio consists of a unique team of life scientists with extensive industry experience in content acquisition and software tool development for the management, analysis, and knowledge extraction of omics data in research, target selection and validation, drug discovery, and the clinical setting. We partner with customer companies to help them build better solutions that are embraced by their customers and the scientific community.

We are excited to introduce our new “Customer Spotlight” blog series, in which we will explore some of the projects we do for our customers. We will profile some of the customer companies with whom we are lucky to work, and provide high-level summaries of project specifics to give you a better understanding of who we work with and the type of projects we can do for you.

In this first Customer Spotlight post, we focus on Cypher Genomics, a leading genome informatics company offering a highly accurate, rapid, and robust interpretation software solution for users of human genome sequencing data. Its proprietary, automated genomic interpretation platform allows clinical laboratories to apply Cypher Genomics’ market-leading sensitivity and specificity profiles to the development of molecular tests for diagnostic and prognostic use. Pharmaceutical companies can use the platform in their quest to discover biomarkers from whole genome sequence data derived from sample sizes typical of early-stage drug development studies. Through Mantis™, its genome interpretation software-as-a-service offering, and Coral™, a biomarker discovery service, Cypher Genomics improves patient health care by facilitating improved diagnostic accuracy and earlier interventions, optimizing therapeutic approaches, reducing adverse drug reactions, and ultimately contributing to cost reductions. Cypher Genomics is located in San Diego, California. Notable current customers and/or development partners include Illumina, Sequenom, and the Scripps Translational Science Institute: (1) Cypher Genomics and Illumina Enter agreement to facilitate genomic biomarker discovery; (2) Cypher Genomics and Sequenom announce development agreement; and (3) Scripps Health and Complete Genomics Announce Collaboration Aimed at Unlocking Genetic Secrets of Healthy Aging.

As of spring 2015, enlightenbio has been working closely with Cypher Genomics for almost a year, managing multiple content projects that include, but are not limited to:

Creating a customized Disease Ontology Framework
- A multi-level disease ontology containing references and links to publicly available ontologies and disease classification systems such as OMIM and MeSH.
- Continuous expansion of this ontology as needed with each additional content acquisition project.
Refining and extending a comprehensive Variant-to-Disease Association Database
- A database that links inheritable genetic variants with disease phenotypes, supported by literature and database references.
- Content acquired includes whether a particular genetic variant is linked to a particular disease in an autosomal dominant or recessive relationship, or has simply been observed in patients with that disease.
- Curation of the effects of specific mutations on a protein’s molecular function, e.g. designating mutations as gain or loss of function, affecting key signaling pathways, or altering drug sensitivity.
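To make the shape of such content more concrete, here is a minimal Python sketch of how a single variant-to-disease record could be represented. It is an illustration only, not Cypher Genomics’ or enlightenbio’s actual schema: the class names, field names, and all example values (variant, gene, identifiers, reference) are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class Inheritance(Enum):
    """How the variant relates to the disease (categories taken from the list above)."""
    AUTOSOMAL_DOMINANT = "autosomal dominant"
    AUTOSOMAL_RECESSIVE = "autosomal recessive"
    OBSERVED_ONLY = "observed in patients"


class FunctionalEffect(Enum):
    """Curated effect of the mutation on the protein's molecular function."""
    GAIN_OF_FUNCTION = "gain of function"
    LOSS_OF_FUNCTION = "loss of function"
    UNKNOWN = "unknown"


@dataclass
class DiseaseTerm:
    """A node in the disease ontology with links to public classification systems."""
    name: str
    xrefs: dict = field(default_factory=dict)  # e.g. {"OMIM": "...", "MeSH": "..."}


@dataclass
class VariantDiseaseAssociation:
    """One curated link between an inheritable variant and a disease phenotype."""
    variant: str
    gene: str
    disease: DiseaseTerm
    inheritance: Inheritance
    effect: FunctionalEffect = FunctionalEffect.UNKNOWN
    references: list = field(default_factory=list)  # literature or database IDs

    def is_supported(self) -> bool:
        """A record counts as supported when it cites at least one reference."""
        return len(self.references) > 0


# Placeholder values only; none of these identifiers refer to real records.
example = VariantDiseaseAssociation(
    variant="c.100A>G",
    gene="GENE1",
    disease=DiseaseTerm("Example disease", {"OMIM": "000000", "MeSH": "D000000"}),
    inheritance=Inheritance.AUTOSOMAL_DOMINANT,
    effect=FunctionalEffect.LOSS_OF_FUNCTION,
    references=["PMID:0000000"],
)
print(example.is_supported())
```

Keeping the inheritance mode and functional effect as enumerations rather than free text is one way to enforce the kind of consistency a curation protocol asks for.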

All of these projects start with the creation of a custom, optimized protocol and content acquisition strategy that efficiently meets Cypher Genomics’ content needs. Our content acquisition team consults with Cypher Genomics scientists to identify and further define the critical content needs and appropriate scope for each project, balancing time and cost concerns with high-quality deliverables. Once the protocol is established, enlightenbio contracts with multiple teams to provide a flexible, scalable, and on-demand curation service. enlightenbio has dedicated curation and quality control teams based in the U.S., India, and Canada that ensure all deliverables are highly accurate and complete.

The following describes some of the content and ontology services enlightenbio currently offers and can manage for you. Please contact us at any time to discuss your specific content curation needs.

enlightenbio’s Content & Ontology Development Services

Our goal is to provide our customers with customized content acquisition and management solutions that optimize their analysis tools and services. Content solutions may include integrating and curating additional content, management of ongoing curation processes, and development of customized content databases. For our unique experience and background, see Who we are. We provide the following content services:

Content curation protocol development
Ontology and vocabulary development
Database development
Biological literature curation process management
Content extraction projects
3rd party content licensing
Providing and managing knowledge extraction services for drug development, disease understanding, ontology and synonym library generation

Ontology Development

We offer expertise in the design and development of biomedical ontologies according to taxonomic principles and semantic guidelines. We are familiar with common web ontology languages and semantic formats such as OWL and RDF. We can assist with the development of customized ontologies on a variety of biomedical topics according to the client’s requirements, and have experience with a number of third party ontology sources.

We have years of hands-on experience with developing proprietary biomedical ontologies from the ground up. The complexity of the Disease Ontology (DO) is shown below in the snapshot by Du et al., 2009. Below are some specific examples of ontology-related projects that we have successfully conducted:

Development of a macromolecule ontology based on information from EntrezGene
Development of a chemical ontology based on information from PubChem and ChemID
Linking of anatomical entities via ‘physical part of’ relationships
Alignment of commercially available biological process ontologies with the open source Gene Ontology (GO)
Alignment of commercially available disease ontologies with publicly available SNOMED, MeSH and ICD

A snapshot of the DO graph, highlighting the complexity of the DO – courtesy Du P. et al. Bioinformatics 2009; 25:i63-i68.

Controlled Vocabulary Development

enlightenbio can assist with the development of controlled vocabularies and synonym libraries in a number of biomedical areas. A robust controlled vocabulary ensures semantic consistency and object identity for curation and mapping projects, and reduces the occurrence of mis-mappings due to shared synonyms. It also improves search results and unifies content related to a biological concept. Since the same concept is often referred to by multiple distinct terms, a comprehensive synonym library allows for information from multiple sources to map to the same concept and enables detection and removal of duplicate concepts.
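As a rough illustration of the idea, the Python sketch below maps several terms to one concept and flags synonyms that are shared by more than one concept, which are candidates for mis-mappings or duplicate concepts. It is a simplified sketch, not enlightenbio’s actual tooling: the `SynonymLibrary` class, its methods, and the concept IDs and names are all hypothetical.

```python
from collections import defaultdict


class SynonymLibrary:
    """Toy synonym library: many terms map to one concept; shared terms are flagged."""

    def __init__(self):
        self._preferred = {}            # concept ID -> preferred name
        self._index = defaultdict(set)  # normalized term -> set of concept IDs

    @staticmethod
    def _normalize(term: str) -> str:
        # Case-fold and collapse whitespace so trivial variants match.
        return " ".join(term.lower().split())

    def add_concept(self, concept_id, preferred_name, synonyms=()):
        """Register a concept under its preferred name and all of its synonyms."""
        self._preferred[concept_id] = preferred_name
        for term in (preferred_name, *synonyms):
            self._index[self._normalize(term)].add(concept_id)

    def lookup(self, term):
        """Return the sorted IDs of every concept the term can refer to."""
        return sorted(self._index.get(self._normalize(term), set()))

    def shared_synonyms(self):
        """Terms claimed by more than one concept, for manual review."""
        return {t: sorted(ids) for t, ids in self._index.items() if len(ids) > 1}


# Hypothetical concepts; the IDs and names are placeholders.
library = SynonymLibrary()
library.add_concept("C1", "Gene A", synonyms=["GA1", "alpha factor"])
library.add_concept("C2", "Gene B", synonyms=["alpha factor"])

print(library.lookup("ga1"))
print(library.shared_synonyms())
```

In this sketch an ambiguous term is not silently resolved. It is returned with every matching concept so that a curator can decide whether the concepts should be merged or the synonym restricted to one of them.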

enlightenbio brings years of experience in developing comprehensive synonym libraries, which include synonyms and identifiers for genes, chemicals, drugs, cells, anatomical parts, biological processes, diseases, tox phenotypes and experimental methods. Below are some examples of synonym-related projects we have been involved with:

Integration of gene synonyms and identifiers from EntrezGene
Integration of chemical/drug synonyms from PubChem, ChemID and FDA
Integration of disease synonyms and identifiers from OMIM, SNOMED and MeSH
Integration of biological process synonyms and identifiers from Gene Ontology (GO)
Integration of cell line synonyms and identifiers from ATCC
Addition of novel synonyms based on curation of published literature

Knowledge Modeling

enlightenbio can assist with design and development of knowledge models for storage of biomedical relationships from the published literature, third party sources, or a client’s proprietary content source. Knowledge models are typically developed in conjunction with an underlying ontology, but they can also be implemented in a stand-alone format without ontology mapping.

We have extensive experience in developing a proprietary Knowledge Representation System, and have designed knowledge models for a wide array of biological relationship types. Below are some examples of the types of findings for which we have developed knowledge models:

Molecular interactions (A binds B)
Functional interactions (A increases activation of B)
Expression/transcription/translation regulation events
- Transcription regulators (transactivation/transrepression)
- microRNA-based regulation of mRNA expression
- siRNA-based gene silencing
- Epigenetic events (acetylation/methylation)
Gene/molecule-to-disease associations
- Up/downregulation of genes observed in disease states
- Associations between mutations and hereditary disease states based on individual clinical research studies
- Genome-wide association studies relating SNPs to disease states
- Cytogenetic studies associating copy number variation of genes with disease states
Gene/molecule-to-process relationships (e.g. Gene A increases cell proliferation)
Disease and toxicology phenotype biomarkers (e.g. Gene A is a diagnostic biomarker for phenotype X)
Drug-related content
- Drug-to-disease indication relationships
- Drug-to-molecular target relationships
Pharmacogenomic relationships (i.e. effect of mutations on drug efficacy, response, metabolism or toxicity)
Cell/tissue localization/expression of genes/molecules
Subcellular compartmentalization of genes/molecules
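For readers who think in code, the finding types above can be pictured as typed subject-relation-object statements, each tied to its source. The Python sketch below is a simplified illustration under that assumption, not the proprietary Knowledge Representation System itself; the `Relation` values, the `Finding` fields, and the source identifiers are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    """A few relationship types drawn from the list above (not exhaustive)."""
    BINDS = "binds"
    INCREASES_ACTIVATION_OF = "increases activation of"
    REGULATES_EXPRESSION_OF = "regulates expression of"
    ASSOCIATED_WITH_DISEASE = "is associated with disease"
    IS_BIOMARKER_FOR = "is a biomarker for"
    INDICATED_FOR = "is indicated for"
    TARGETS = "targets"
    LOCALIZED_TO = "is localized to"


@dataclass(frozen=True)
class Finding:
    """One curated statement: subject, typed relation, object, and its source."""
    subject: str
    relation: Relation
    obj: str
    source: str                      # literature or database reference
    qualifier: Optional[str] = None  # e.g. "diagnostic" for a biomarker finding

    def sentence(self) -> str:
        """Render the finding as a short human-readable statement."""
        text = f"{self.subject} {self.relation.value} {self.obj}"
        return f"{text} ({self.qualifier})" if self.qualifier else text


def findings_about(findings, entity):
    """Return every finding in which the entity is the subject or the object."""
    return [f for f in findings if entity in (f.subject, f.obj)]


# Placeholder entities and sources, mirroring the examples in the list above.
findings = [
    Finding("A", Relation.BINDS, "B", "PMID:0000001"),
    Finding("A", Relation.INCREASES_ACTIVATION_OF, "B", "PMID:0000002"),
    Finding("Gene A", Relation.IS_BIOMARKER_FOR, "phenotype X", "PMID:0000003",
            qualifier="diagnostic"),
]

for finding in findings_about(findings, "A"):
    print(finding.sentence())
```

When a model like this sits on top of an ontology, the subject and object strings would typically be replaced by ontology concept identifiers. In a stand-alone format, plain names as shown here are enough.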

Content Acquisition and Curation

enlightenbio offers expertise in content acquisition strategy development and biocuration management. There are a number of ways that we can help customers with their content acquisition needs, including expert curation of the published literature or other unstructured text sources, content source identification and bibliography development, and integration of structured content from third party content sources.

For expert curation, we can develop customized biocuration protocols based on the customer’s content requirements, and depending on project size/scope, we can also team up with the customer’s software team to develop web-based curation tools, processes, and workflows to ensure high-quality and cost-efficient curation. In cases where the required content is available in structured format from a third party database, the preferred strategy is to import such content from the original source. For such cases, we will help define the requirements for content import and work with the client to do the integration. If necessary, we can also assist with licensing of third party content.

We have extensive experience in managing content acquisition processes and workflows. We have developed processes to streamline curation of high-quality biological relationships from the peer-reviewed literature and other unstructured content sources, and to allow for integration of structured content from public and commercially available databases.

Below are examples of structured content from third party database sources commonly requested to be integrated into proprietary customer databases:

Protein-protein interactions (BIOGRID, INTACT, BIND, DIP, MINT, MIPS, COGNIA dBs)
Gene-to-disease relationships (OMIM dB)
SNP-to-disease relationships (GWAS dB)
Gene-to-biological process relationships (GO dB)
MicroRNA-mRNA targeting content (TargetScan, miRecords and miRBase dBs)
Endogenous chemical reaction content (HumanCyc)
Endogenous metabolite content (HMDB)
Tissue expression body atlas (GNF)
Cell line expression atlas (NCI)
Drug-to-disease indications (ClinicalTrials.gov)
Hazardous substance information (HSDB)
Chemical carcinogenicity information (CCRIS)

Pathway Curation

enlightenbio can assist clients with curation of signaling and metabolic pathways on a broad array of topics such as cancer biology, cell cycle regulation, apoptosis signaling, growth factor signaling, kinase cascade signaling, second messenger signaling, cytokine/chemokine signaling, immune cell signaling, disease-specific signaling, nuclear receptor signaling, cardiovascular and neurobiology-related signaling, and tox signaling. Our flexible business model allows us to tailor our work to the customer’s specific needs with regard to definition of the pathway topics of interest based on content requirements, development of customized pathway curation protocols, and authoring of pathways based on available literature.

enlightenbio offers extensive experience in pathway curation services, including developing and maintaining a library of signaling and metabolic pathways.