The goal of clinical genomics is to provide a diagnosis for difficult-to-solve inherited and somatic disorders and thereby enable a personalized therapeutic approach. Unfortunately, despite costly and extensive testing, only about 50% of people with rare genetic disorders receive a correct molecular diagnosis. Similarly, only a small percentage of cancer patients have their malignancies treated based on specific molecular alterations. The reasons why classical cytogenetics, FISH, microarray, targeted panel, whole exome, and even generic whole genome testing fail to provide a genetic diagnosis are manifold:
- The biggest limitation of cytogenetics is its resolution, which is typically around 5-10 megabases and varies with chromosomal position. Resolution is even poorer for malignant samples, which are also often resistant to culturing.
- Microarrays allow the detection of copy number changes and unbalanced structural variants (SVs) in the ten-kilobase range, but they do not reveal the precise genomic location or orientation of duplicated segments. They also fail to detect copy-number-neutral structural variants such as inversions and translocations.
- Targeted panel sequencing often misses pathogenic variants because of inappropriate panel selection. Syndrome-causing genes may be missing from the panel design, or the available panels may simply be out of date: the list of new disease-causing genes grows daily, while adding a gene to a capture panel requires redesign and revalidation. Targeted panels also often lack mitochondrial genome assessment, even though mitochondrial mutations can cause the symptoms for which the test was ordered. Furthermore, technical limitations make copy number and structural variant calling unreliable.
- Whole exome capture is a better option, but it misses pathogenic variants outside exons. Capture often yields poor coverage at intron-exon junctions, so variants near these junctions are not called. Copy number variant (CNV) and SV calling is difficult, especially for small exons or partial exon involvement; for example, an intronic CNV that only partially overlaps an exon will be missed. Copy-number-neutral structural variants such as inversions and translocations are also not detected by exome sequencing. Lastly, exomes often lack mitochondrial genome assessment and cannot detect pathogenic repeat expansions.
- Most commercial whole genome tests do not analyze or report structural variants in the fifty to ten thousand base pair range. The reason is that analyzing short-read data with the Manta software, which relies on detecting chimeric reads, produces thousands of false positive structural variant calls, largely because 150 bp paired-end reads are mismapped onto an imperfect GRCh38 reference. Very few labs make the effort to sort through thousands of calls to identify the true positives. For the same reason, copy-number-neutral structural variants, such as reciprocal translocations of entire chromosome arms, may go unnoticed. Genome data allow detection of repeat expansions, such as the one causing Fragile X syndrome, but the sizing is not precise. Short-read genome sequencing also cannot characterize contractions of large repeat arrays, such as the D4Z4 contraction that causes facioscapulohumeral muscular dystrophy (FSHD).
- In addition to the difficulty of detecting various variant types, limited understanding of the effects of the detected variants is a major cause of missed diagnoses. This is due in part to insufficient understanding of how a given variant affects protein structure, and in part to the fact that over half of protein-coding genes still lack a disease association. Variants in structural RNA-coding genes and in intronic and intergenic regions are even harder to interpret.
“Missing copy number variants and structural variants due to the inherent problems with short-read sequencing and available genomic references is a major cause of missing a diagnosis.”
A new paradigm of testing is needed to increase the sensitivity, specificity, and speed of genetic diagnosis
The current testing paradigm consists of multiple blood draws and sample submissions to perform, in succession over months and sometimes years, individual tests that each have inherent shortcomings. Discouraged by multiple negative or misleading reports, the patient is set on a diagnostic journey that never seems to end. A new paradigm is proposed that consists of the simultaneous performance of whole genome sequencing and optical genome mapping to detect all types of genomic variation, together with transcriptome analysis to establish the functional significance of the findings. In this paradigm, Bionano Optical Genome Mapping (OGM) would replace karyotyping, microarray analysis, repeat expansion sizing, and FSHD testing, while whole genome sequencing (WGS) would replace microarray, targeted panel, exome, mitochondrial genome sequencing, repeat expansion testing, mitochondrial DNA depletion testing, and uniparental disomy testing.
This approach uses technologies with complementary strengths to achieve greater sensitivity and specificity, as well as faster and more accurate genetic diagnoses. WGS can provide single-nucleotide resolution of the breakpoints associated with structural variants, whereas OGM is better for detecting and visualizing structural variants. Genome sequencing is better for detecting expansions of short repeats, whereas optical genome mapping is better at sizing large “short repeats” and counting long repeats.
The effects of the identified DNA variants can be observed and evaluated by interrogating RNA sequence data. Variants in regulatory regions such as enhancers and promoters can cause observable changes in RNA expression levels, while the role of variants in coding regions and at intron-exon junctions can be assessed by examining the exon composition of the resulting processed RNA. Indeed, RNA sequencing is an essential confirmatory tool in the discovery of new genetic disorders.
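To make the expression-level reasoning concrete, the following Python sketch flags genes whose expression in a patient is an outlier relative to a reference cohort. It is a minimal illustration, assuming normalized expression values (for example TPM) and a z-score on log2-transformed values; the function names, the pseudocount, and the |z| ≥ 3 threshold are hypothetical choices rather than the author's pipeline, and dedicated count-based methods model RNA-seq data more rigorously.

```python
import math
from statistics import mean, stdev


def expression_zscores(patient, cohort, pseudocount=1.0):
    """Z-score of the patient's log2 expression for each gene versus a reference cohort.

    patient: dict mapping gene -> normalized expression (e.g. TPM) in the proband
    cohort:  dict mapping gene -> list of normalized expression values in reference samples
    """
    scores = {}
    for gene, value in patient.items():
        ref = [math.log2(x + pseudocount) for x in cohort[gene]]
        sd = stdev(ref)  # requires at least two reference samples
        if sd == 0:
            continue  # no variation in the cohort, so a z-score is undefined
        scores[gene] = (math.log2(value + pseudocount) - mean(ref)) / sd
    return scores


def flag_outliers(scores, threshold=3.0):
    """Return genes whose absolute z-score meets the (hypothetical) threshold."""
    return sorted(gene for gene, z in scores.items() if abs(z) >= threshold)


# Hypothetical example: GENE_A is nearly silenced in the patient, GENE_B is normal.
cohort = {"GENE_A": [10, 12, 11, 9, 10], "GENE_B": [50, 55, 48, 52, 51]}
patient = {"GENE_A": 0.5, "GENE_B": 50}
scores = expression_zscores(patient, cohort)
print(flag_outliers(scores))  # ['GENE_A']
```

An outlier flagged this way only points to a candidate; whether a nearby regulatory variant explains it still has to be judged by the reviewer.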
The same approach can also be applied to carrier testing of couples who have difficulty conceiving or carrying a pregnancy to term, and to somatic testing, in which a paired tumor-normal sample set is compared.
“Combined simultaneous performance of OGM and WGS and establishing the functional significance of variants identified using transcriptome analysis can significantly increase the speed and accuracy of clinical diagnoses both for constitutional and somatic cases.”
The technological changes that allow for this new paradigm
Novel technology: Optical genome mapping (OGM)
OGM is a technology developed by Bionano Genomics to reveal elusive copy-number-neutral structural variants. OGM begins with the isolation of ultra-high molecular weight (200-300 kb) DNA from blood, bone marrow aspirates, cultured cells (including chorionic villi and amniocytes), tissue, or tumor biopsies. A single enzymatic reaction places fluorescent labels throughout the genome at a palindromic 6 bp sequence motif. Because a given 6 bp sequence is expected about once every 4,096 bp (4^6) in random sequence, this yields an average density of roughly one label every few kilobases, varying with the local occurrence of the recognition sequence. Hundreds of millions of individual labeled DNA molecules are linearized in nanochannel arrays on a chip and imaged by fluorescence microscopy in the Saphyr® OGM System instrument. Depending on the time allotted for scanning, the whole genome can be covered 100 to 1,400 times in this manner. The resulting de novo consensus genome map is then compared with a reference genome map calculated from the occurrences of the 6 bp motif in the reference genome. Alternatively, two de novo assembled genome maps can be compared with each other. Altered distances between two labels in the proband relative to the reference can indicate deletions, insertions, or repeat expansions, while an altered orientation of the label (barcode) pattern can indicate inversions. Translocations appear as a single assembled map whose segments align to regions on different chromosomes.
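The comparison of label distances can be sketched in Python. This minimal example assumes the labeling motif is CTTAAG (the recognition site of Bionano's DLE-1 enzyme, used here as an assumption since the text does not name the motif) and that sample and reference labels have already been aligned one-to-one; real OGM software also has to handle missing and extra labels, sizing error, and assembly. All function names and the 500 bp reporting threshold are hypothetical.

```python
import re


def label_positions(sequence, motif="CTTAAG"):
    """0-based start positions of the labeling motif in a DNA sequence.

    Because the assumed motif is palindromic (it equals its own reverse
    complement), scanning one strand finds every site on both strands.
    """
    # The lookahead also reports overlapping matches.
    return [m.start() for m in re.finditer(f"(?={motif})", sequence.upper())]


def interval_size_changes(reference_labels, sample_labels):
    """Sample-minus-reference length of each inter-label interval.

    Assumes both lists contain the same, already aligned labels in order.
    Positive values suggest an insertion or expansion, negative a deletion.
    """
    ref_gaps = [b - a for a, b in zip(reference_labels, reference_labels[1:])]
    sample_gaps = [b - a for a, b in zip(sample_labels, sample_labels[1:])]
    return [s - r for r, s in zip(ref_gaps, sample_gaps)]


def call_size_changes(changes, min_size=500):
    """(interval index, size change) pairs at or above the hypothetical reporting threshold."""
    return [(i, d) for i, d in enumerate(changes) if abs(d) >= min_size]


# Toy example: the sample lacks 800 bp between the second and third labels.
site = "CTTAAG"
reference = "A" * 100 + site + "A" * 1000 + site + "A" * 2000 + site
sample = "A" * 100 + site + "A" * 1000 + site + "A" * 1200 + site
changes = interval_size_changes(label_positions(reference), label_positions(sample))
print(call_size_changes(changes))  # [(1, -800)]
```

Working with inter-label distances rather than sequence is what lets OGM see large insertions, deletions, and expansions that short reads cannot span.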
Reduced cost of sequencing: New whole genome sequencing (WGS) technology
High-throughput sequencers have come a long way in the last 20 years. The MGI DNBSEQ-T7 sequencer supports whole human genome analysis for what an exome test cost a few years ago, yet provides much higher quality data that can be reanalyzed at any time as understanding of genetic disorders (i.e., variant-to-gene-to-disease information) grows.
Improved sorting and annotation of variants: Artificial intelligence-assisted WGS data analysis tools
Secondary analysis pipelines have also evolved significantly in speed, accuracy, and scope. The Illumina DRAGEN Bio-IT platform provides accurate whole genome alignment and variant calling in one to two hours, depending on the processing power assigned within the cloud computing environment. The platform does more than call small variants: it reports structural variants identified by Manta from chimeric reads, copy number variants based on coverage depth assessment, and repeat expansions using ExpansionHunter. It also reports HLA genotypes and pharmacogenomic star alleles, along with an extensive array of analytical performance metrics.
Tertiary analysis, consisting of annotation and prioritization of the detected variants, is performed with Genoox, an artificial intelligence (AI)-assisted software platform. The software serves as a consultant; the decision about the causative role of any variant is made by the medical director reviewing the case. Genoox is especially helpful because it supports co-analysis of OGM and WGS data while annotating the different types of variants.
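Coverage-depth CNV calling rests on a simple idea: in a diploid genome, a region with half the median read depth is likely a single-copy deletion, and one with 1.5 times the median a duplication. The Python sketch below illustrates only that principle; it is not DRAGEN's CNV algorithm, and it assumes equal-sized bins whose depths are already corrected for GC content and mappability, which production callers handle explicitly. Function names are hypothetical.

```python
from statistics import median


def copy_number_from_depth(bin_depths, ploidy=2):
    """Estimate an integer copy number per bin by scaling depth to the genome-wide median.

    Assumes most bins are at the normal ploidy, so the median represents `ploidy` copies.
    """
    med = median(bin_depths)
    return [round(ploidy * depth / med) for depth in bin_depths]


def merge_segments(copy_numbers, ploidy=2):
    """Merge runs of consecutive bins with the same non-normal copy number.

    Returns (first_bin, last_bin, copy_number) tuples.
    """
    segments = []
    start = 0
    for i in range(1, len(copy_numbers) + 1):
        # A run ends at the last bin or where the copy number changes.
        if i == len(copy_numbers) or copy_numbers[i] != copy_numbers[start]:
            if copy_numbers[start] != ploidy:
                segments.append((start, i - 1, copy_numbers[start]))
            start = i
    return segments


# Hypothetical mean depths for eleven consecutive bins.
depths = [30, 31, 29, 30, 15, 14, 16, 30, 45, 44, 30]
cn = copy_number_from_depth(depths)
print(merge_segments(cn))  # [(4, 6, 1), (8, 9, 3)]
```

Because depth says nothing about where extra copies sit, a duplication called this way still needs OGM or breakpoint reads to place it in the genome.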
Accessibility to results at a reduced cost: A graded reporting cost architecture from the collected genomic dataset
“Stepwise analysis and reporting of genomic data, such as basic exome, expanded exome, and whole genome analysis can provide financial relief for the patients.”
The first step is the creation of a “basic exome” report. Although it is limited to the coding regions of the genome plus or minus 100 base pairs, it is derived from genome sequence data and is therefore superior to a capture-based exome report. It includes all known pathogenic variants listed in ClinVar, regardless of where they are found in the genome, and provides single-exon-resolution CNV calling and mitochondrial genome analysis. If this is negative, an “expanded exome” report is created that also reports on all clinically relevant repeat expansions. If the patient’s symptoms are still not explained, a whole genome report is created that covers single nucleotide variants, CNVs, and structural variants throughout the genome, as well as mitochondrial DNA depletion assessment. The advantage of this architecture is that the patient pays only the price difference between successive reports (basic exome, expanded exome, whole genome) and does not have to submit a new sample. Because a whole genome dataset already exists, no additional library preparation or sequencing is needed, which speeds up report generation.
Although combined OGM and WGS is recommended from the outset, OGM can be performed after whole genome sequencing if financial considerations do not allow both at once. Follow-up transcriptome sequencing allows functional assessment of the significance of the variants detected by genomic testing. The following section describes clinical situations in which optical genome mapping can substantially improve diagnostic yield.
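The scope of the “basic exome” tier can be expressed as a simple region filter over the genome-wide call set. The Python sketch below assumes coding intervals given as 1-based inclusive (chromosome, start, end) tuples and a set of known ClinVar pathogenic positions; the names, the position-only ClinVar lookup (real matching should compare alleles too), and the toy coordinates are hypothetical. It keeps a variant if it falls within 100 bp of a coding interval or is a known pathogenic site anywhere in the genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # 1-based position


def padded_targets(coding_intervals, padding=100):
    """Pad each (chrom, start, end) interval by `padding` bp per side and merge overlaps."""
    by_chrom = {}
    for chrom, start, end in coding_intervals:
        by_chrom.setdefault(chrom, []).append((max(1, start - padding), end + padding))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1] + 1:  # overlapping or adjacent: extend the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def in_basic_exome(variant, targets, clinvar_pathogenic_sites):
    """True if the variant is a known pathogenic site or lies in a padded coding interval."""
    if (variant.chrom, variant.pos) in clinvar_pathogenic_sites:
        return True
    return any(start <= variant.pos <= end for start, end in targets.get(variant.chrom, []))


# Toy coordinates, not real genes.
targets = padded_targets([("chr1", 1000, 1200), ("chr1", 1250, 1400), ("chr2", 5000, 5100)])
clinvar = {("chr1", 3000)}
calls = [Variant("chr1", 1450), Variant("chr1", 3000), Variant("chr2", 5300)]
print([in_basic_exome(v, targets, clinvar) for v in calls])  # [True, True, False]
```

Since the filter only narrows what is reported from an existing genome-wide call set, moving to a higher tier means widening the filter rather than re-sequencing.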
Clinical Situations Demonstrating the Value of Incorporating OGM Data Within a WGS Analysis Pipeline
The first example illustrates the usefulness of OGM to properly characterize complex chromosomal rearrangements.
The patient was a male child with a complex developmental phenotype that had not been previously described (publication in preparation). Whole genome sequencing showed an apparent 61 kilobase duplication on the distal p arm of chromosome 7, as well as a 22 kilobase deletion on the distal q arm of chromosome X. OGM showed no tandem or inverted duplication on chromosome 7, which indicated that the extra copy of the region detected by WGS had been inserted elsewhere in the genome. Interestingly, OGM showed an insertion rather than a deletion on chromosome X, seemingly contradicting the WGS data. Co-analysis of the OGM and WGS datasets revealed that the duplicated chromosome 7 fragment had been inserted into the distal q arm of chromosome X, with deletion of the 22 kb segment at the insertion site. Arriving at the precise structural rearrangement relied on recognizing the label pattern of the chromosome 7 fragment within the problematic region of the X chromosome and on identifying chimeric reads spanning the breakpoints between chromosome 7 and chromosome X.
The second example demonstrates the usefulness of OGM in the diagnosis of facioscapulohumeral muscular dystrophy (FSHD).
FSHD is a rare genetic disorder that is extremely hard to diagnose using traditional molecular methods such as pulsed-field gel electrophoresis and Southern blotting. Diagnosis requires counting the number of approximately 3.3 kb D4Z4 repeat units in the repeat arrays present on both chromosome 4 and chromosome 10. Because only a contraction (fewer than 11 repeats) on chromosome 4 is clinically significant, it is important to determine the chromosomal origin of an observed contraction. Diagnosis also requires determining whether the last repeat has an associated intact polyadenylation site (4qA allele) or not (4qB allele). Based on the labelling pattern of the repeat regions, OGM can identify the number of repeats, their chromosomal origin, and whether the transcript can be polyadenylated (see Figure 1).
Short-read WGS alone is ineffective for diagnosing this condition. OGM is the state-of-the-art technology for the more than 90% of people with FSHD who carry a D4Z4 contraction (FSHD1). The remaining roughly 10% require WGS, because their condition, referred to as FSHD2, is caused by small variants in the machinery that regulates gene expression from the D4Z4 repeats.

Figure 1: OGM labelling pattern near the D4Z4 repeat regions on chromosomes 4 and 10, evaluated for FSHD diagnosis. The length of the repeat region (purple bar) is used to calculate the number of repeats. The four examples (not from the same individual) show how the length of the repeat region correlates with the calculated repeat count. The pattern of labels to the left of the repeat region identifies chromosome 4 (top three panels) or chromosome 10 (bottom panel), and the pattern of labels to the right of the repeat defines the qA or qB allele.
The third example shows how OGM can be used to size large (greater than 500 bp) tandem repeat expansions.
Tandem repeat expansions are a common cause of neurological conditions such as fragile X syndrome and ALS. Repeat size correlates with carrier status, disease manifestation, and the severity of the condition, so sizing these expansions is important for diagnosis and for predicting outcome. Repeat expansions can be detected from WGS, but sizing large expansions requires Southern blotting, which is impractical and often imprecise. OGM is an outstanding tool for this purpose. It is especially useful for sizing repeats longer than 500 bp, such as those associated with fragile X syndrome, myotonic dystrophy (DM1 and DM2), and ALS/frontotemporal dementia (C9orf72).
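The repeat-count calculation described for Figure 1 amounts to dividing the measured array length by the size of one D4Z4 unit. The Python sketch below assumes a unit length of about 3,300 bp, an array length already measured from the OGM label pattern, and chromosome and allele assignments already read from the flanking labels; the function names are hypothetical, and a real interpretation must also account for measurement error and complex alleles.

```python
D4Z4_UNIT_BP = 3300  # approximate length of one D4Z4 repeat unit


def d4z4_repeat_count(array_length_bp, unit_bp=D4Z4_UNIT_BP):
    """Estimate the number of D4Z4 units from the measured length of the repeat array."""
    return round(array_length_bp / unit_bp)


def fshd1_pattern(chromosome, allele, array_length_bp):
    """Return (repeat count, whether the allele matches the FSHD1 pattern).

    The FSHD1 pattern assumed here is a contracted array (1-10 units)
    on chromosome 4 followed by the polyadenylatable qA allele.
    """
    count = d4z4_repeat_count(array_length_bp)
    contracted = 1 <= count <= 10
    return count, contracted and chromosome == "4" and allele == "qA"


print(fshd1_pattern("4", "qA", 26_400))   # (8, True)
print(fshd1_pattern("10", "qA", 26_400))  # (8, False): contraction on chromosome 10
print(fshd1_pattern("4", "qB", 26_400))   # (8, False): non-permissive qB allele
print(fshd1_pattern("4", "qA", 99_000))   # (30, False): normal-sized array
```

All three inputs are needed together, which is why a method that reports only repeat length, without chromosome of origin and allele, cannot settle the diagnosis.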
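Once OGM has sized an expanded tract, converting its length into a repeat count is simple arithmetic. The Python sketch below assumes the tract length in base pairs has already been estimated (for example from the change in label-to-label distance relative to the reference) and uses the repeat units of the genes associated with the conditions named above; the FMR1 categories follow commonly used clinical thresholds (normal below 45, intermediate 45-54, premutation 55-200, full mutation above 200 CGG repeats). Function names are hypothetical.

```python
# Repeat unit per gene (fragile X: FMR1; DM1: DMPK; DM2: CNBP; ALS/FTD: C9orf72).
REPEAT_UNITS = {"FMR1": "CGG", "DMPK": "CTG", "CNBP": "CCTG", "C9orf72": "GGGGCC"}


def repeat_count(gene, tract_length_bp):
    """Convert an estimated repeat tract length in bp into a repeat count."""
    return round(tract_length_bp / len(REPEAT_UNITS[gene]))


def classify_fmr1(cgg_repeats):
    """Commonly used FMR1 CGG repeat categories."""
    if cgg_repeats < 45:
        return "normal"
    if cgg_repeats <= 54:
        return "intermediate"
    if cgg_repeats <= 200:
        return "premutation"
    return "full mutation"


# A hypothetical 900 bp FMR1 tract corresponds to 300 CGG repeats.
n = repeat_count("FMR1", 900)
print(n, classify_fmr1(n))  # 300 full mutation
```

The arithmetic also shows why the 500 bp cutoff matters: a full FMR1 mutation (more than 200 CGG repeats) already exceeds 600 bp, beyond what short reads can span.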
Conclusion
The combined OGM, WGS, and transcriptome analysis paradigm proposed here provides a superior clinical alternative to the current fragmented molecular diagnostic process. The proposed paradigm is fast, sensitive, highly specific, and affordable. In addition, it generates definitive data that can be reanalyzed for years to come as understanding of genomic variation becomes more complete.
Peter L. Nagy MD PhD is a Stanford-trained, board-certified Anatomic and Molecular Genetic pathologist. He started the Personalized Genomic Medicine Division in the Pathology Department of Columbia University Medical Center in 2011, served as Chief Medical Officer of Medical Neurogenetics Laboratories from 2016, and founded Praxis Genomics in 2019. His company’s mission is to improve the sensitivity and specificity of constitutional and somatic genetic testing and thus the quality of patient care. He believes that, for the time being, this is best accomplished by combining short-read whole genome sequencing, optical genome mapping, and transcriptome analysis. Praxis Genomics, headquartered in Atlanta, GA, provides clinical genetic testing and counseling using this new paradigm. In addition, Praxis Genomics provides custom research or industrial applications of Optical Genome Mapping, Whole Genome, and Transcriptome Sequencing for academic or commercial clients.