AGBT 2014 showcases sequence data analysis software solutions

This past week, 850 researchers traveled to Marco Island – despite snow storms and airport shutdowns – to mingle with like-minded peers, to hear about the latest advancements in sequencing technologies and the software tools that address the sequencing analysis bottleneck, or perhaps simply to party for four consecutive days.

Four days of scientific talks, with 78 presenters

The talks were as usual of high caliber and a good mix of different applications and science:

One of my favorite real-life applications was presented by Joe DeRisi (UCSF): an encephalitis use case demonstrating the value of sequencing in delivering results quickly, with the potential of saving lives.
The funniest talk, yet still great science, was by Thomas Gilbert, on monitoring biodiversity with its challenges and its potential.
The much-awaited talk was by Dave Jaffe, presenting the first real Oxford Nanopore data.
Valerie Schneider presented the new human genome reference assembly GRCh38 (hg38), which incorporates modeled centromere sequence and for which the RefSeq annotations are now released.

Two poster sessions with over 100 scientific and commercial posters rounded out the scientific program in a lively environment and stimulated many scientific discussions.

Software everywhere

Newly added this year were the software demo sessions held at the Hilton, with 28 commercial and academic software solutions represented. Clearly, the many software tools presented this time around highlight the fact that the analysis component is now a critical part of working with sequence data. The tools demoed ranged from sample and data management to sequence data analysis and knowledge extraction, for both the clinical and the research sector.

A few software highlights:

Agilent built their own software tool, SureCall, for clinical researchers who study gene panels for inherited diseases and cancer. SureCall analyzes, visualizes, and contextualizes sequence data using a single application.
BioTeam showcased their Galaxy Appliance.
CLC Bio (recently acquired by Qiagen) presented the newly developed CLC Cancer Research workbench.
Lab7 demoed Lab7 ESP, the first-of-its-kind “sample, methods, data, and pipeline management tool”, an all-in-one enterprise solution.
Maverix Biomics (raised $6M in January of 2014) presented their cloud-based data management and analysis tool targeted at the non-computational scientist.
Omicia (raised $6.8M in January of 2014 to accelerate whole genome interpretation) demoed their newly developed diagnostic pipeline with a very nice user interface targeted at non-computational users and clinicians.
Station X presented its latest version of GenePool with a wealth of newly added functionality including free access to TCGA expression data.

For more information on all the different tools, check out the Illumina blog.

Many commercial activities one could engage in

More than 20 suites and lounges where companies held wine and whisky tastings, demoed their latest software, or had serious one-on-one meetings.
Five commercial workshops provided another avenue to hear from your favorite vendor on what they are up to.
12 sponsors, with Enzymatics as the surprising gold sponsor.
Four commercial workshops with eight bronze sponsor talks.

And of course there were the many parties over the course of the four days, which provided a great environment for networking and also for simply having a good time.

Highlights worth mentioning

PacBio released 54x-coverage human genome data to accelerate the understanding of genome-wide variation at all genome size scales, and to improve assembly techniques. This should be of value to the bioinformatics and wider scientific communities that study various forms of structural variation across the human genome. To access the full data set, send an email to pbdata@pacificbiosciences.com.

David Jaffe’s (Broad Institute) talk created quite a bit of buzz (in discussions, on Twitter, and in blog posts) as he presented the first “real” Oxford Nanopore MinION data with their long Nanopore reads. The read lengths were on average 5kb plus, ranging up to 20kb, with longer reads supposedly being possible soon. The expectations of course were high, as the community has been waiting for two years since Oxford Nanopore’s initial high-profile announcement at AGBT in 2012. At the same time, the talk seemed overshadowed by the now very impressive PacBio high-quality long read lengths. In conjunction with this talk, Oxford Nanopore announced their early access program and started to send out invites to their MinION Access Program (MAP).

Qiagen announced launch plans for several novel products designed to significantly reduce the most critical bottlenecks in next-generation sequencing, such as sample preparation and bioinformatics. These included the GeneRead custom panels and bioinformatics solutions for sequence data analysis such as the CLC Cancer Research Workbench and Ingenuity® Clinical, currently in early access. Reportedly, Ingenuity by now has 200,000 samples in Ingenuity Variant Analysis (IVA) for over 2,000 users. It will be interesting to see the adoption rate of the Clinical solution in the already crowded solutions space. However, it is expected that, as the “content leader”, Ingenuity will set itself apart from other solution providers.

And then there was the Illumina lounge. Always packed, it showcased a NextSeq and the impressive new library preparation instrument NeoPrep, a fully automated solution for 16 samples to be launched mid-year. BaseSpace got its spot as well, with lots of interest in the many different third-party apps. Still, there seems to be no change in sight that would allow adding data other than Illumina's – an understandable business decision that, based on end-user feedback, should perhaps be reconsidered.

GenomOncology came out of stealth mode and announced a collaboration with Roswell Park Cancer Institute (RPCI) on an informatics solution for next-generation sequencing. The newly developed informatics solution will enable the association of sequencing results with knowledge resources to define actionable mutations. GenomOncology and RPCI are working together to develop a software platform that integrates laboratory information management systems, electronic health records, information technology, and bioinformatics, and that provides a workflow enabling genomic analysts and pathologists to create actionable reports.

Station X launched GenePool Reference™, which includes free access to TCGA gene expression data. By making the public TCGA (The Cancer Genome Atlas, a compendium of genomic and clinical data for cancer research) gene expression data set freely available to the research community via GenePool, Station X removes the burden of downloading and managing the data while integrating it directly into its analysis environment. In addition, the clinical metadata has been compiled and curated by Station X scientists, making the data even more readily useful.

Enzymatics launched Archer targeted sequencing technology, based on (1) Anchored Multiplex PCR (AMP™), a new method for scalable targeted library construction, and (2) Archer™ Analysis Pipeline software. The software analyzes sequence data and generates a simple, multi-tiered report of all mutations in the target sequences, aimed at non-computational scientists. The software is part of the package that delivers data to their customers and is not available as a stand-alone offering.

Image credit: Erlich lab