For this month’s “Company Spotlight” blog series we are reviewing Nebion, a Swiss bioinformatics company, in an interview with Philip Zimmermann, Co-founder and CEO of Nebion. Nebion is a 30-person company headquartered in Zurich, which has developed a data mining engine for high-throughput biological data such as microarray and RNA-seq expression data. The data content provided consists of manually curated public and proprietary studies.
The following summarizes questions and answers from the interview with Philip Zimmermann.
Enlightenbio: Tell us more about Nebion – Your business is focused on delivering structured curated data for discovery research.
- What specific need(s) or challenges are you aiming to address with Nebion?
- What products/services do you offer?
Philip Zimmermann: Researchers in the pharmaceutical and biotech industry increasingly complement the analysis of their internal data with selected publicly available omics data. However, finding the relevant studies, curating them and integrating them into your analysis workflow is a major challenge. We address the need of integrating the wealth of publicly available data with internal data for combined analysis, enhanced interpretation, target and biomarker discovery, animal or cell model finding, indication finding, drug safety assessment, and other use cases for transcriptomic data.
We offer GENEVESTIGATOR, a database of deeply curated public gene expression datasets (both microarray and RNA-seq data) combined with innovative visualizations for various use cases. Additionally, we offer scientific curation services for the processing, annotation, and integration of public and private data.
EB: Tell us more about Nebion. What is the history of Nebion from when you started to today?
PZ: The development of the GENEVESTIGATOR platform started in 2004 in the laboratory of Professor Dr. Wilhelm Gruissem at ETH Zürich.
- It consisted of a few hundred microarray samples from internal and public experiments on Arabidopsis, stored in a MySQL database, and with browser-based visualization tools.
- The idea was to aggregate results from multiple experiments into individual plots that answer concrete biological questions such as “In what tissues is gene X expressed?” or “How does gene X respond to various treatments?”
- In 2006, the database was extended to include mouse data, and we started making use of ontologies and controlled vocabularies to describe each experiment. At that stage, GENEVESTIGATOR was a purely academic tool freely available to everyone in academia, and a small fee was requested from industry users to help us fund the project.
- In 2008, the project was moved into an ETH Zürich spin-off with the objective of professionalizing its development, expanding the applications to various industries, and allowing the continued development of this popular platform. It was at that point that we started to curate human data for the biopharmaceutical, nutrition, and biotech industries.
- Since then, we expanded the platform to incorporate data from more than 15 organisms, moved from microarrays to RNA-seq data, and more recently also to single-cell RNA-seq studies.
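The aggregation idea described above (pooling samples from many experiments to answer a question such as “In what tissues is gene X expressed?”) can be illustrated with a minimal Python sketch. This is not Genevestigator’s implementation or API; the sample records, the gene names, and the function `mean_expression_by_tissue` are hypothetical and made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up sample records:
# (experiment, tissue, gene, expression level).
SAMPLES = [
    ("exp1", "root", "GENE_X", 8.0),
    ("exp1", "leaf", "GENE_X", 2.0),
    ("exp2", "root", "GENE_X", 10.0),
    ("exp2", "flower", "GENE_X", 5.0),
    ("exp3", "leaf", "GENE_X", 4.0),
    ("exp3", "root", "GENE_Y", 1.0),
]


def mean_expression_by_tissue(samples, gene):
    """Pool samples from all experiments and average one gene's level per tissue."""
    levels = defaultdict(list)
    for _experiment, tissue, sample_gene, level in samples:
        if sample_gene == gene:
            levels[tissue].append(level)
    # Rank tissues from highest to lowest mean expression.
    ranked = [(tissue, mean(values)) for tissue, values in levels.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for tissue, level in mean_expression_by_tissue(SAMPLES, "GENE_X"):
        print(f"{tissue}: {level:.1f}")
```

A real compendium would also need normalization across platforms and curated, controlled-vocabulary tissue labels before such averages become meaningful, which is the curation effort the interview goes on to describe.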
EB: What customer segment are you targeting with Nebion? Who is benefitting from using your software solutions?
PZ: Scientists working on target discovery, biomarker discovery, drug safety and toxicology, functional genomics, and basically all areas of industry where transcriptomic data is used or where information about gene expression patterns is of value.
EB: Are Nebion’s products and services geared towards the sophisticated computational scientist or bioinformatician or do you also provide a GUI (graphical user interface)-type solution in support of users that are not familiar with the command-line or advanced computing? What are the types of users and organizations for whom Nebion solutions have been built?
PZ: Our offerings target both command-line oriented users (computational scientists, bioinformaticians and statisticians) as well as users who work exclusively with our GUI. For this purpose, we developed a well-defined API as well as export tools, allowing extensive but precise export of relevant portions of data for subsequent analysis with a user’s preferred tools, and a user-friendly graphical interface with partly unique, partly standard tools for the exploration of very large amounts of data. Both communities are well represented among our clients.
EB: Can you tell us more about the datasets Nebion is curating? How are those data sets selected/prioritized?
PZ: We integrate three types of data:
- baseline data to increase our coverage of different tissues, cell types and conditions,
- large consortium data that many users wish to have, such as TCGA, GTEx, CCLE, etc., and
- disease or research area specific datasets that our existing customers wish to have enriched.
EB: You offer curated data for Biopharma and for Agrobiotech. Are you exclusively focusing on these two types of sectors? How about the basic research sector?
PZ: The priorities are defined by our customers, and they are indeed primarily industrial customers from biopharma and agrobiotech. However, the largest number of users comes from academic research, for whom we offer either the Professional edition at an “academic friendly” fee or the Basic version, which is entirely free for academic researchers.
EB: Can you provide some examples of pharma/healthcare end-users and organizations that are currently users of Nebion? How are they applying and integrating Nebion’s solutions in their regular day-to-day activities?
PZ: Roche has licensed our curated data and tools across multiple sites to strengthen their evidence-based target and biomarker discovery; the curated public data enables them to efficiently tap into the wealth of publicly available transcriptomic data. Novartis, on the other hand, uses Nebion’s curated data to complement their internal knowledgebases of biological signatures, allowing them to connect internal results with public data for improved interpretation and prioritization. Galapagos, a clinical-stage biopharmaceutical company, has licensed our Enterprise solution to integrate high-quality public transcriptomic data with internal data for their target and biomarker discovery pipelines. They combine our offering with solutions from other providers in the bioinformatics space to enable efficient and cost-effective storage, processing, analysis and interpretation of omics data, combining our data with other data types that they have produced in-house.
Syngenta, in the agro-biotech sector, has been working with Nebion for several years to complement and harmonize their internal crop transcriptomic data with public data, facilitate cross-species analysis with novel ortholog mappings and tools, and make use of our tools to identify genes regulated in specific conditions or treatments.
In the academic sector, a good example of an established collaboration is with Professor Dr. Diana Mechtcheriakova from the Medical University of Vienna, Austria. Her team has been using Genevestigator extensively for ovarian cancer research, and more specifically, to develop novel methods for targeting and clinical decision-making strategies.
EB: Nebion offers two different types of software, called Genevestigator and Genevisible.
- Can you explain the purpose and applications of these types of software?
- What is required to run the software? Is it all web-based or does the software run locally?
- Where do the Genevestigator and Genevisible data reside? What if pharma wants to add their proprietary data into the mix: will you allow them to store that data behind their firewall, or is all of the data somewhere else, like the cloud? Can pharma users integrate their own data into Nebion software so that it is only visible to them, like a private instance?
- Can any type of content or data be integrated into Nebion or is it exclusively expression data?
PZ: Genevestigator is the full-fledged solution with all features (database, search engine, rich graphical interface, API, public and internal data integration, etc.), while Genevisible is a free, browser-based interface to the Genevestigator database that was designed for simple queries on desktop or mobile devices. Genevestigator is the commercial solution of choice for industry and academic labs performing a more professional and in-depth use of our curation and data integration. Genevisible is a quick look-up tool that was designed to introduce users to some of the types of meta-analysis queries typical of Genevestigator, such as finding the top tissues, cancers, cell lines or conditions relevant for a gene of interest. But it does not contain drill-down possibilities and advanced analysis functionalities such as the powerful biomarker search, clustering, co-expression or gene set enrichment capabilities of Genevestigator.
Genevestigator is an application installed on your computer. When running, it connects to the server to fetch the data and search results. We provide installation packages for Windows, Mac and Linux, which auto-update when there is a new version of Genevestigator.
For the Genevestigator Enterprise edition, the server software can be installed in-house and completely sealed off from the internet, giving our customers full control over access to the proprietary data that is added to the public data in Genevestigator.
EB: Besides active data curation, does Nebion also integrate content from 3rd party databases, such as OMIM or ClinVar, to enrich expression data with public knowledge?
PZ: We typically link out to third party databases of other data types (e.g. gene, protein or variant databases) rather than integrating their content into our own database. However, databases that we have included are Gene Ontology categories, Reactome pathways, and ENSEMBL mappings to various gene models so that users can better interpret their results within our platform, or refer them to their preferred gene models within Genevestigator.
EB: Do you support the integration of proprietary expression and other data as a service for your customers?
- And if so, can your users integrate their proprietary content/data themselves or is this only possible via a Nebion service?
- How long does it take to integrate a large amount of new data into Nebion, and how long does it take to extract findings from these newly added data?
PZ: Yes, customers can integrate their proprietary data. This is only possible in collaboration with us because we need to make sure all standard operating procedures (SOPs) for data curation are respected. To do so, we work with them to process and curate their data so that it is fully harmonized with public data for combined analyses. In fact, the processing and scientific curation of studies is a science of its own that requires training and experience. However, customers who have worked with us for several years have built streamlined pipelines to efficiently and cost-effectively integrate private data into our systems, and with our growing experience we can assist new customers in establishing these processes.
In terms of timelines, this depends on how one defines “large amounts of data”. Companies who intend to integrate their proprietary data typically start with a few dozen key studies. This can be done within 2-3 months. A more frequent use case is the continuous integration of new studies, which is typically done in batches of a handful of studies within a few weeks.
EB: Can one integrate various different data, including microarray data or expression data from various different platforms, such as Illumina or Oxford Nanopore? Are you normalizing the data to support this?
PZ: We have built quality control and data processing pipelines for more than 60 different microarray or RNA-seq platforms, including from Illumina, Affymetrix, Agilent, and 10X Genomics. Most often, we follow the trends and needs of our customers and build the corresponding platforms. It is important to note that a substantial effort is needed not only to set up a new platform, but also to maintain it over the years: to map the measurements to the latest reference genomes/transcriptomes, to integrate current versions of the main gene models (e.g. ENSEMBL, Entrez, UniProt, HUGO, AGI, etc.), and to maintain mappings between these models. For RNA-seq, a switch to a new reference genome requires re-processing the entire database so that all studies are fully harmonized and based on the same genome version. This represents quite a unique capability and customer offering.
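To picture what maintaining mappings between gene models involves, here is a minimal Python sketch of translating measurements from one identifier scheme to another while tracking identifiers that have no counterpart. It is only an illustration under stated assumptions: the identifiers, the mapping table `ID_MAP`, and the function `remap_gene_ids` are hypothetical and do not reflect Nebion’s pipelines or any real gene model.

```python
# Hypothetical mapping table between two made-up identifier schemes.
ID_MAP = {
    "A0001": "B_alpha",
    "A0002": "B_beta",
}


def remap_gene_ids(measurements, id_map):
    """Translate measurement keys to the target scheme.

    Returns the remapped measurements plus a list of source identifiers
    that have no entry in the mapping and therefore need curator attention.
    Assumes a one-to-one mapping; real gene models also contain
    one-to-many and many-to-one relationships.
    """
    mapped = {}
    unmapped = []
    for gene_id, value in measurements.items():
        target_id = id_map.get(gene_id)
        if target_id is None:
            unmapped.append(gene_id)
        else:
            mapped[target_id] = value
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = remap_gene_ids({"A0001": 5.0, "A0003": 1.0}, ID_MAP)
    print(mapped)
    print(unmapped)
```

Every time a gene model releases a new version, a table like this has to be regenerated and the unmapped identifiers reviewed, which gives a sense of why the upkeep is described as a substantial effort.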
EB: On your website you state that end-users can apply machine learning (ML) through clean data. How much of that machine learning are you offering via your software solution, and how much of it is user-driven? In other words, do you offer machine learning data mining capabilities for your customers?
PZ: Our primary goal is to curate and assemble high quality data for various bioinformatic applications and algorithms, including ML. We apply machine learning internally for improving our curation, or for generating results to be used with our tools (e.g. with our ortholog prediction tool), and we have an ongoing internal research project where ML is used to learn from specific subsets of data to predict the state of individual samples. The users will at some point benefit from our ML implementations via tools that offer them better insights into their data, or from new types of applications previously not feasible on a single-study level. Regarding users applying ML themselves, our approach has been to make the data “analysis-ready” such that companies can use high quality, well described datasets for their internal projects, whether they apply ML or other methods. We are convinced that this quality and depth of details are key to successfully applying ML, allowing a better representation of co-variates and reducing noise. Our slogan “learning from the transcriptome” refers to the idea that a very large, deeply curated and fully standardized database of expression data can precisely unravel what is happening in different tissues, diseases, treatments and genotypes at a given time point.
EB: Are there any specific infrastructure requirements to run Genevestigator, especially when processing and integrating large quantities of data (“big data”) combined with complex machine learning algorithms?
PZ: In the standard version, which accesses our servers, you can run Genevestigator on any standard laptop or desktop computer. The heavy lifting is done on our servers. For an in-house installation, a standard compute server needs to be provided.
EB: Who do you view as your current competition and why? What differentiates Nebion from these other players in the market?
PZ: While we have a lot of unique content, capabilities and tools, several indirect competitors exist who target the same customer segment.
- OmicSoft is probably the closest, offering a combination of databases, tools and pre-integrated public omics content. While OmicSoft grew from the bioinformatics infrastructure side and later added curated content to its offerings, Nebion came from the biocuration/data side and is more specialized with a deeper curation and a broader spectrum of species.
- Illumina’s BaseSpace Correlation Engine is also an integrator of public omics data but at a different level and for slightly different use cases.
- And of course, competition is often present in-house among large pharma and biotech companies through bioinformatics teams who have been assigned to integrate public data.
What differentiates Nebion is the quality of the data and meta-data and our approach of making the curated studies readily available to customers for analysis with their internal pipelines. As a Swiss company, we are very strict about achieving the highest quality in our curation and software development. For example, we go to great lengths to verify and enrich the meta-data of every sample, involving domain experts to perform and peer-review the curation. We are also meticulous in our compendium-wide cross-study standardization, ensuring that studies curated in the past are re-processed when new reference genomes are used, and re-annotated according to the current vocabularies and standards.
EB: Nebion is a European company.
- Would you say the majority of customers are located in the UK, Europe, or the US?
- How do you approach a market such as the US market?
PZ: The majority of customers are in Europe and the UK, but we also have a substantial and growing customer base in the US. At the moment, sales and business development are handled by personnel who are employed in Zurich but travel regularly to the US for conferences and customer visits. Our US customers are very well supported thanks to modern screen sharing and communication technologies. Additionally, we have co-marketing agreements with other companies with a complementary offering, such as Qlucore or Genestack.
EB: What do you see as the biggest challenge(s) the industry is currently facing and how do these challenges affect what Nebion does? How will the industry – or Nebion – need to overcome the challenges?
PZ: For the pharmaceutical industry, patient-centric healthcare and patent expiration are two key components pressuring them to adopt novel approaches to drug discovery. The use of high-throughput molecular and clinical data is clearly becoming a mainstay of drug discovery. While machine learning and artificial intelligence methods are increasingly being used for target discovery or patient stratification, they will not perform well unless the quality and relevance of the underlying data are guaranteed. We therefore strongly believe in the application of standardized data processing and expert curation to achieve high-quality, well-described patient data, and Nebion has all the required technologies and expertise to achieve this, not only for transcriptomic data, but also for other data types.
The agro-biotech industry has just gone through a phase of consolidation, with only a handful of companies controlling most of the world’s seed markets for the main crops like wheat, rice, maize or soybean. More targeted but integrative approaches will be needed. In this case as well, genomic and transcriptomic data are playing important roles in developing higher-yield crops in an environmentally friendlier way.
EB: Is there anything else you would like to share with the readership?
PZ: By operating with different industries, we aim to not only improve the quality of life of human beings, but also to keep our planet healthy and biodiverse for generations to come. To achieve this, structuring and efficiently mining the wealth of omics data being produced is critical to help researchers find answers and propose robust solutions.