ONTOFORCE Wants to Increase Data Literacy and Empower Citizen Data Scientists

This month’s “Company Spotlight” provides a closer look at ONTOFORCE, the creator of a semantic search technology platform for insight generation. I spoke with Valerie Morel, CEO of ONTOFORCE, about the company’s platform DISQOVER, which uses the concept of a knowledge graph to integrate, link, and structure heterogeneous data sources. The integrated semantic search capability makes it possible to turn siloed data sources into a single, uniform, and high-quality knowledgebase. DISQOVER seamlessly integrates public data with internal data. It can also connect with other software used within a company to further open, analyze, and visualize the data. To date, ONTOFORCE, a 50-person Belgian company, has had two seed rounds in 2013 and 2016 and a Series A round of €7.5M in 2018. ONTOFORCE just completed another funding round from VCs that are current shareholders – details are undisclosed. The funding raised recently is aimed at realizing commercial growth and at growing the sales and marketing organization, not only in Europe but also in the US.

While several other players in the sector, including Ontotext, Biomax Informatics, and Innoplexus, also use knowledge graphs to connect siloed data, there are also companies that use entirely different AI-driven technologies to achieve the same result. This group includes Causaly (which focuses on well-defined use cases earlier in R&D), Cambridge Semantics, and Amazon Neptune. The main difference between these players and ONTOFORCE in the semantic search and information exploration sectors is that ONTOFORCE provides a user interface that addresses the needs of its various users, including the bench research scientist, who is often forgotten when it comes to large data, information management, and data mining systems. ONTOFORCE is more about discovery and providing data insights to end users, with the focus being the scientist.

Following is my interview with ONTOFORCE’s CEO Valerie Morel.

Enlightenbio: How did ONTOFORCE get started? What were the initial goals and what is the main focus today? What business needs is the company addressing?

Valerie Morel: ONTOFORCE was founded in 2011 by Hans Constandt out of frustration at not being able to find the right information to help him understand the underlying causes of his sick child’s disease. He was working as a data scientist in an international pharmaceutical company while trying to understand what was wrong. He quickly realized that if it was so challenging for him, a data scientist, to gather all the necessary information, it must be even more difficult for other parents and patients. He had to tap into many different data sources, a rather inefficient process. As a result, Hans started ONTOFORCE with the patient in mind. The initially built platform was intended as a business-to-consumer (B2C) offering, focusing on the patient. Soon thereafter, the company pivoted to a business-to-business (B2B) offering targeting the pharmaceutical sector. B2C offerings in the information sector have their challenges, and this was especially true more than a decade ago. While the company has since pivoted, the original idea still stands:

“Our goal is to help the patient via a tool that supports pharmaceutical companies bringing their drugs to market faster and more efficiently resulting in improved patient outcome.”

I joined ONTOFORCE in early 2021, taking over from the founder as CEO, with the goal of growing and scaling the company and putting it on the map, also from a commercial perspective. We currently have a staff of about 50 people, of whom 15 have started since I joined. There are a few more open positions to fill this year, mainly in sales and marketing, customer support, and customer success. While we are focusing on growing these various internal organizations, we are also continuing to invest in our product DISQOVER, as it is important to always stay at the forefront of technology. This is a product that obviously requires continuous investment.

We have a core team in Ghent, Belgium, and we have a sales and solution delivery organization in Boston. The solution delivery part of the organization is meant to work hand in hand with our customers to go the extra mile, and not only implement the DISQOVER platform, but also to help them to drive the value on their use cases, to help with change management, or to onboard their users. So typically, the solution delivery team works very closely with the customers. Lastly, we are also re-establishing the US sales and post-sales teams.

EB: What excites you about ONTOFORCE, why did you join the company, and what keeps you going every day?

VM: I was already well aware of ONTOFORCE and their product prior to joining them. This was in my previous role as Chief Revenue Officer at Bluebee (acquired by Illumina in June 2020) when we invested in the partnership between the two companies at that time. While the partnership didn’t happen, I was really impressed with DISQOVER, the product ONTOFORCE had developed. I believe that as a product and as a technology, it is absolutely superior. Furthermore, the company has a solid reputation. When I was offered the CEO role, I knew it was the right decision to accept the offer. Besides, I really like the team and the attitude of the people at ONTOFORCE. The company culture is very customer-centric, and it is thus a perfect basis for being our customers’ trusted partner. Everybody goes above and beyond for the customer. The team at ONTOFORCE is next level, and it shows, as we hear again and again from customers how much they appreciate all of our efforts and the support we provide. This is absolutely key and resonates well with my personal values. I believe we have reached a point where more and more people want a company culture that fits their personal values, which is the case at ONTOFORCE – with the product, with the team, and with the solid customer base that we developed. Besides, it’s really fun and exciting to work here. And obviously there is also a higher purpose in working in the life sciences sector, as ultimately we are contributing to improving patient outcomes, which keeps me going.

“…more and more people want a company culture that fits their personal values, which is the case at ONTOFORCE …”

EB: If I understand this right, Hans Constandt tried to provide information in a concise and reusable way, so nobody has to read through the mountain of unorganized information in various locations, correct?

VM: In essence, yes. To achieve this, we have built the knowledge platform DISQOVER, a tool that allows scientists to find links faster and more efficiently between disparate data that are spread across data silos/databases. From a technology perspective, we use the concept of a knowledge graph to integrate, to link, and to structure the data, and then make it accessible via semantic search. It allows people to turn siloed data into a single, uniform, and high-quality knowledgebase. A really important aspect is that this data is not only important for data scientists, but also for bench scientists who actively search for information themselves.
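
The linking idea described above can be illustrated with a minimal Python sketch. It is not ONTOFORCE's implementation: the three "silos", every identifier, and the predicate names (`targets`, `measures`, and so on) are hypothetical and chosen only to show how records from separate sources become connected once they are expressed as triples that share node identifiers.

```python
# Hypothetical records from three separate silos; all identifiers are made up.
trials_db = [{"trial_id": "TRIAL-001", "condition": "lung cancer", "target_gene": "EGFR"}]
genes_db = [{"symbol": "EGFR", "name": "epidermal growth factor receptor"}]
internal_assays = [{"assay_id": "ASSAY-42", "gene": "EGFR", "result": "overexpressed"}]


def build_graph():
    """Map each silo onto (subject, predicate, object) triples with shared node IDs."""
    triples = set()
    for t in trials_db:
        node = f"trial:{t['trial_id']}"
        triples.add((node, "studies_condition", f"disease:{t['condition']}"))
        triples.add((node, "targets", f"gene:{t['target_gene']}"))
    for g in genes_db:
        triples.add((f"gene:{g['symbol']}", "has_name", g["name"]))
    for a in internal_assays:
        node = f"assay:{a['assay_id']}"
        triples.add((node, "measures", f"gene:{a['gene']}"))
        triples.add((node, "has_result", a["result"]))
    return triples


def neighbours(triples, node):
    """Return everything directly linked to `node`, in either direction."""
    linked = set()
    for s, _, o in triples:
        if s == node:
            linked.add(o)
        elif o == node:
            linked.add(s)
    return linked


graph = build_graph()
egfr_links = neighbours(graph, "gene:EGFR")
```

Because the public trial record and the internal assay both point at the same `gene:EGFR` node, a single lookup on that node surfaces both, which is the kind of cross-silo link the interview describes.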

In essence, there are hundreds of health research databases, and for a new study, for example a new clinical trial, scientists need to go into all the different databases to understand what research has already been done. And that includes internally developed data, which is often not harmonized and structured. Each of these hundreds of data sources resides in its own database, with its own user interface and its own logic. As a consequence, finding information can be an extremely time-consuming process. Even worse, researchers often don’t know what piece of information they are missing that might make all the difference for their research. This is where the magic of DISQOVER happens – we are combining all these internal, public, and third-party data sources, which are pre-ingested into the ONTOFORCE system, the DISQOVER platform.

In total, we manage 140 public data sources for our customers. In addition, customers can ingest their own internal, proprietary data and take advantage of the combined sources within DISQOVER. Their private, internal data can stay within their environment. Connecting internal and external public data sources in a harmonized way is often where researchers get the most valuable insights. It’s of value not only for asking questions, but also for exploring the data and testing hypotheses. Most importantly, we cover use cases from early research, to drug discovery, to clinical, and even regulatory marketing. Figure 1 lists the various use cases DISQOVER supports today, which also include some transversal use cases such as competitive intelligence, which is of course very relevant across the drug development cycle.

“Connecting internal and external, public data sources in a harmonized way, is where researchers get the most valuable insights from.”

Figure 1: The various use cases ONTOFORCE supports with its DISQOVER platform (from drug discovery, to clinical studies, registration, competitive analysis and commercialization) – source ONTOFORCE.

To sum it up, we focus exclusively on life sciences, with our main customers being pharmaceutical and biopharmaceutical companies. While the big focus is on pharma, we do have a number of large hospitals and universities among our customers as well.

EB: When you work with new organizations, do you add new content to your database? In other words, do you enrich your content to support your new customer?

VM: The public data is already pre-ingested into the system. With that we have gene content, biomarker and disease content, clinical trials information, and other data types. In addition, we will always work with the customer to support their varied use cases and address their scientific questions, and yes, this sometimes does require us to identify and add new content to our system. To better understand their needs, we try to understand what private data they intend to ingest into DISQOVER. With all this information in hand, we start adding the required content and connect the dots between the public data and private data that is relevant to support their use cases. As an example, we recently started to work with the Princess Maxima Pediatric Oncology Center in the Netherlands, which is using the combination of research and real-world evidence type data for their clinical research.

On many occasions, there will also be the addition of licensed data – if the customer has a license to a commercial data source – which we then typically ingest for those customers. We already have plugins for all those providers because the whole idea is to bring all the information together in one place.

EB: Who are your specific end users and what is the biggest value you provide to them?

VM: I believe this is where ONTOFORCE is unique. There are obviously many players on the market that provide semantic search, knowledge graphs, or knowledge tools, but what really differentiates us is our focus on the end user. Many solutions on the market focus on the data scientists or on the computational teams. Our unique focus, though, is the bench scientists who need to perform their own searches, test their own hypotheses, and explore the data by navigating the knowledge graph in an intuitive way. At the same time, we make sure our platform is open and accessible via APIs to extract data and to work with the data directly.

While we do have a knowledge graph underneath, we do not advertise ourselves as the knowledge graph company. At the end of the day, it’s the data that is important, combined with a user interface that is intuitive for non-data scientists. We call it: “empowering citizen data science”. We want to increase data literacy, and as such support our customers in becoming data-driven organizations. That’s really the space that we’re playing in.

Some of our customers have enterprise knowledge graphs, but are still using DISQOVER in conjunction with them to be able to reach out to the scientists and the end user community. What we try to help with is overcoming the data scientist bottleneck. For bench researchers, the computational data scientists are their go-to people, but since those data scientists are super busy, bench researchers often have to wait a long time to get their answers. This is not a good solution to live with. We really want to empower citizen data scientists, specifically for exploratory search where you are testing a single hypothesis, not hundreds of them: something that bench researchers can quickly address themselves.

EB: That brings me right to the next question. What types of data do you integrate and how often do you update the data?

VM: The public data is continuously updated, but of course not every data source provides the same frequency of updates. For most sources, we pull in new data and process it on a weekly basis and make it available as such to our customers. There are also exceptions, like permits, where we upload new data daily. So it really depends on how relevant the updates are. Then of course, there is also the private customer data, which our customers have full control over.

Our public data is shared via our federated system. This is the system our users access to get to the data. We host that data and maintain it. We also perform data modelling, which our customers can then use to query and explore the data. ONTOFORCE uses RDS (Remote Data Subscription) to publish data sets (see Figure 2) that contain information from the public domain. With RDS, the data is imported at the client side, and that is typically how customers integrate it with their own data. Within that system, the client is free to change the data model, to tweak it, and to determine how frequently and when they pull in new data. So these two systems really work together.
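
The publish-and-pull arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DISQOVER or RDS API: the `Publisher` and `Subscriber` classes, the version counter, and the `rename_fields` mapping are all hypothetical names introduced here to show a client that decides when to pull and how to adapt the data model locally.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    """Stands in for the hosted side that publishes versioned public data sets."""
    version: int = 0
    records: dict = field(default_factory=dict)

    def publish(self, records):
        self.version += 1
        self.records = dict(records)


@dataclass
class Subscriber:
    """Stands in for a client installation that pulls data when it chooses to."""
    rename_fields: dict = field(default_factory=dict)  # hypothetical local data-model tweak
    synced_version: int = 0
    local: dict = field(default_factory=dict)

    def pull(self, publisher):
        """Import the latest published version; return True if anything changed."""
        if publisher.version == self.synced_version:
            return False
        self.local = {
            key: {self.rename_fields.get(name, name): value for name, value in rec.items()}
            for key, rec in publisher.records.items()
        }
        self.synced_version = publisher.version
        return True


hub = Publisher()
hub.publish({"TRIAL-001": {"phase": "II", "condition": "hepatitis B"}})

client = Subscriber(rename_fields={"condition": "indication"})
first_pull = client.pull(hub)   # new version available, so data is imported
second_pull = client.pull(hub)  # nothing new has been published since
```

The point of the design is that the publishing side controls what is released, while the subscribing side controls when it is imported and how it is modelled locally.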

Figure 2: Schematic overview of Remote Data Subscription within the DISQOVER platform – source is ONTOFORCE.

EB: You mentioned that your customers can update their data whenever they need to, but let’s assume I am a bench researcher and I want to include a new data set. Can I do that? Or do I have to go through a process via contacting the data scientist to help me? How does that work?

VM: Adding data sources or changing the data model would be the role of what we call the expert users, typically somebody with a data scientist profile within the organization. While we could do it for them, we very much encourage our customers to do it themselves. We can quickly bootstrap our customers to create fast value from our data, but ingesting internal data into our system, or changing/adding data sources, is typically done by a data wrangler within the organization. That said, we do have customers that heavily rely on our solution delivery team to help them with data ingestion. But typically, a bench scientist would not add data themselves.

EB: What other numbers can you share with me that describe the vastness of your database/data model?

VM: The entire knowledge graph contains about 250 million instances, which we call nodes, and over 10 billion triples (i.e., statements that describe the properties of nodes and the relationships between them). This is huge, and when the private data of a customer is added, it can become even bigger. Having said this, this is not a big data platform compared to, for example, a genomics platform that incorporates individual genomic variations to look for relationships and insights. We don’t do that; we focus on the metadata.
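
As a rough back-of-the-envelope check on these figures, the short Python sketch below divides the quoted triple count by the quoted node count and then shows, on two made-up triples, the difference between a property and a relationship. The toy identifiers and the convention that node IDs contain a colon are assumptions made for this illustration only.

```python
# Figures quoted in the interview; both are approximate.
nodes = 250_000_000
triples = 10_000_000_000

# Average number of statements per node implied by those two figures.
triples_per_node = triples / nodes

# Toy illustration of the vocabulary: each triple is (subject, predicate, object).
toy_triples = [
    ("trial:T1", "has_phase", "II"),          # a property of a node
    ("trial:T1", "studies", "disease:hepB"),  # a relationship between two nodes
]

# In this toy convention, anything containing ":" is a node ID; plain values are literals.
toy_nodes = {s for s, _, _ in toy_triples} | {o for _, _, o in toy_triples if ":" in o}
```

With the quoted numbers this works out to roughly 40 triples per node on average, which fits the description of a metadata graph in which each node carries many properties and links.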

EB: What are typical questions you can ask within the DISQOVER platform?

VM: The questions are really quite detailed, scientific questions. For example:

Which clinical studies have CT lung data and EGFR expression data?
Which phase II clinical trials have been conducted for hepatitis B?
What oncology trials reported cardiovascular adverse events and is there sequencing data available with those trials? The reason being: I may be interested in getting the sequencing data for all patients that were part of the trial.

So it’s really very much about answering those specific, scientific questions that require a lot of connections. There is a lot of information behind those questions, e.g., clinical trials data, gene data, adverse events data, and much more. So it’s really combining all that data in the backend so that a very natural question can be asked in the front end.

One thing to note is that one cannot ask a question the way one would in Google. That’s not how it works, but it is overall a very intuitive interface to ask questions and drill down. So you can start by searching for a disease or a clinical trial and then drill down, for example into phase II, or oncology trials, or adverse events. So you actually narrow down your data in a very intuitive way, but we are not yet at the point where you can just throw a question at the system.
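
The drill-down style of exploration described above can be sketched as successive filtering steps. The Python example below is a simplified illustration, not DISQOVER's query interface: the trial records, field names, and the `drill_down` helper are hypothetical, and it mirrors the earlier example question about oncology trials with cardiovascular adverse events and sequencing data.

```python
# Hypothetical, hand-made trial records; none of these correspond to real trials.
trials = [
    {"id": "T1", "area": "oncology", "phase": "II",
     "adverse_events": {"cardiovascular"}, "has_sequencing": True},
    {"id": "T2", "area": "oncology", "phase": "III",
     "adverse_events": {"hepatic"}, "has_sequencing": True},
    {"id": "T3", "area": "oncology", "phase": "II",
     "adverse_events": {"cardiovascular"}, "has_sequencing": False},
    {"id": "T4", "area": "infectious disease", "phase": "II",
     "adverse_events": set(), "has_sequencing": False},
]


def drill_down(records, **facets):
    """Keep records matching every facet; set-valued fields match on membership."""
    def matches(record, key, wanted):
        value = record[key]
        if isinstance(value, (set, frozenset)):
            return wanted in value
        return value == wanted

    return [r for r in records if all(matches(r, k, v) for k, v in facets.items())]


# Start broad, then narrow one facet at a time.
step1 = drill_down(trials, area="oncology")
step2 = drill_down(step1, adverse_events="cardiovascular")
step3 = drill_down(step2, has_sequencing=True)
```

Each step narrows the previous result set, so the user never has to phrase the full question up front; the final set answers it by construction.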

EB: Is ONTOFORCE, at the moment, using an NLP approach to enrich their data?

VM: The integration of unstructured content to make knowledge more accessible is a growing market. For us, this is absolutely something that we are focusing on from a future product development perspective. Currently, we use third-party tools for our public data offering. On the customer side, we typically partner with the customer if they already have a tool, or work together with a partner, to bring in not only structured data but also unstructured data, and then to extract causal relationships between the data points. We have several off-the-shelf plug-ins in place, e.g., for Amazon Comprehend Medical, Termite, or Averbis.

Our customers are currently bringing in unstructured data, but they’re first processing it using another tool and then ingesting it into DISQOVER, through our plugins. So yes, this is something that is on our radar.

EB: How do you see the field of heterogeneous data sources and data silos moving forward? There are obviously data challenges and artificial intelligence (AI) is a component to address this. Any thoughts?

VM: Yes, we see organizations turning more and more data-centric. That’s something that will not stop, and we see this trend especially in pharmaceutical companies that are currently undergoing a digital transformation journey. Up until now, it was mostly the largest organizations that focused on digital transformation, but we can now also see mid-size and smaller companies transforming – there are biotech companies that are starting to take an approach that is 100% computational. This is a tendency that will not stop – we also saw this at Bio-IT in Boston earlier this year. On top of that, we hear a lot about FAIR data, which is an excellent data framework. Now is the time to apply those guidelines and to really put them into practical solutions. As our VP of Solutions, Bérénice Wulbrecht, says:

“Maintenant il faut le FAIR(e)!”

In French, “faire” means “to do”, so the phrase translates roughly as “Now we have to do it!” It is now time to really incorporate those FAIR data principles into daily operations, with a consistent user experience to facilitate adoption. This is important, because …

“…you’re not just making data FAIR for the data scientists or for the computational teams, but also for every end user and through this you empower citizen data science.”

It is extremely important to us to get the most value out of the digital transformation. When we talk to customers, the data scientists really fall in love with the product, but the budget comes from the business side. So, that’s really where the user adoption is super important.

EB: Is there anything you want to say about the long-term vision of ONTOFORCE? How does that affect what you will do in the near-term, if you can talk about it?

VM: Our mission is really to make data actionable (for people and machines, as we call it) and to help our customers with their digital transformation by providing access via APIs and being open. This is super important to us.

Our system is incredibly open and fits – and it really needs to fit – into a bigger tech stack, especially for our larger customers. It’s not a little product that is used in isolation. The value really comes from all the effort to harmonize the data, to structure the data, to link the data, and make it actionable. But we are unique in that we are really continuing to put our development dollars into empowering citizen data science and providing an intuitive user experience to the varied users including business users.

“At the end of the day, I believe the value comes from the questions that the scientists are able to answer.”

So the long-term vision of the company is really to focus on our strengths and to continuously invest in the user experience and interoperability.