Today I’m delivering the research genomics lecture at NCH’s Myology Training Course, an annual, week-long, in-person training program that covers numerous aspects of clinical, research, and laboratory topics relevant to the field. In a stroke of excellent timing, Monica H. Wojcik and colleagues from the Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium have just published a review on diagnostic testing beyond the exome. In other words, they review the currently available tests and diagnostic procedures that may elucidate a molecular diagnosis for a patient with a Mendelian disorder when exome sequencing (ES) has failed to do so.
This is a subject I know something about, having spent more than a decade studying rare genetic diseases in a large genome center. A negative ES report is no longer the end of the road, as there are numerous other possible strategies to uncover the genetic basis of a rare disorder. This review covers them well, so I thought it would make a useful blog post.
First, it needs to be said that we are living in a golden era of genetic testing. Technological advances have enabled comprehensive assays for genomic interrogation – microarrays, exome sequencing, and genome sequencing – while improvements in bioinformatics and community-created resources like gnomAD and ClinVar have strengthened our ability to identify and classify disease-causing variants. Best practice guidelines from organizations like the American College of Medical Genetics and Genomics (ACMG) have been modified to leverage the strengths of these new technologies. Under those guidelines, for patients with congenital anomalies, developmental delays, or intellectual disability, exome/genome sequencing should be the first- or second-line test. One reason is that an increasing number of genetic conditions are clinically similar but genetically heterogeneous. Another is the observation that ~5% of diagnosed patients have multiple genetic conditions.
When Comprehensive Genetic Testing Fails
Despite all of this progress, some 50-60% of individuals with a suspected Mendelian condition remain undiagnosed after comprehensive molecular testing. Why does testing fail to produce a diagnosis? It’s a question I spend a lot of time thinking about, discussing with colleagues, and posing to candidates during interviews. Generally, the reasons fall into two categories:
Category 1: The causal variant was detected, but:
- The genetic basis of the disorder is not yet known. New disease genes are being discovered every day, but we have a long way to go.
- The gene is associated with disease, but the patient represents a new phenotypic manifestation or severity, i.e. variable expressivity.
- The variant is inherited from an unaffected parent, i.e. incomplete penetrance.
- There is not enough evidence to call the variant pathogenic. Variant interpretation is challenging, especially in the scenario where information is limited concerning the pattern of disease-causing variants, the origin of the variant in the patient, or both.
Category 2: The causal variant was not detected, because:
- The gene (or exon) is not interrogated by the sequencing assay, i.e. poorly captured for ES or poorly covered for GS.
- The variant is difficult to detect by short-read sequencing, e.g. structural variants and trinucleotide repeat expansions.
- The variant lies in a noncoding region (note, it might well be detected by GS, but could be challenging to interpret).
- The variant is epigenetic, not genetic.
- The disorder is not genetic but has an infectious or other acquired origin.
This is a partial list of reasons that the most comprehensive test available (currently exome sequencing in most situations) fails to make a diagnostic finding.
Post-Exome Testing Options
The central focus of “Beyond the Exome” is the set of options for further diagnostic testing, many of which fall under research. They include:
Exome Sequencing Reanalysis
A key advantage of ES as a genetic test is the ability to re-analyze data when new phenotypic information emerges and/or when some time (usually 2-3 years or more) has passed. The yield of ES reanalysis can vary widely, but a systematic review estimated the additional diagnostic yield at 15% and recommended reanalysis 18 months after the initial test. Generally speaking, diagnoses made by ES reanalysis are the result of:
- New gene discovery for Mendelian conditions, i.e. identification of a variant in a gene now associated with disease. Consistently the major contributor to new diagnoses.
- Resolution of previously known variants of uncertain significance (VUS) as pathogenic.
- Improvements in bioinformatics pipelines for variant calling and annotation.
As the authors of this review highlight, diagnoses found on exome reanalysis may also be in known disease genes not previously thought to explain the phenotype, where the clinical interpretation of a variant has changed due to novel data such as additional clinical information, new variant inheritance information, segregation data from other affected family members, newly published case reports, or an expansion of the phenotype associated with the gene. Clinical re-evaluation and clinician input are also essential in these scenarios.
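To make the first two reanalysis triggers concrete, here is a minimal sketch of how stored candidate variants might be re-screened against updated knowledge. Everything in it is an assumption for illustration: the `Variant` record, the `flag_for_reanalysis` helper, and the placeholder gene names are hypothetical, and a real pipeline would pull gene-disease associations and classifications from curated resources rather than hand-built sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str            # placeholder gene symbol (hypothetical)
    hgvs: str            # variant description used as a lookup key
    classification: str  # classification at the time of the original report


def flag_for_reanalysis(variants, old_disease_genes, new_disease_genes, reclassified):
    """Return (variant, reason) pairs that deserve a second look.

    old_disease_genes / new_disease_genes: gene sets at the time of the
    original analysis and today. reclassified: dict mapping a variant's
    hgvs string to its current classification, if that has changed.
    """
    newly_associated = set(new_disease_genes) - set(old_disease_genes)
    flagged = []
    for v in variants:
        if v.gene in newly_associated:
            # Trigger 1: the gene has since been linked to a Mendelian condition.
            flagged.append((v, "gene newly associated with disease"))
        elif v.classification == "VUS" and reclassified.get(v.hgvs) in {
            "pathogenic",
            "likely_pathogenic",
        }:
            # Trigger 2: a prior VUS now has enough evidence to be upgraded.
            flagged.append((v, "VUS reclassified"))
    return flagged


if __name__ == "__main__":
    stored = [
        Variant("GENE_A", "c.1A>G", "VUS"),
        Variant("GENE_B", "c.2C>T", "VUS"),
        Variant("GENE_C", "c.3G>A", "VUS"),
    ]
    hits = flag_for_reanalysis(
        stored,
        old_disease_genes={"GENE_B", "GENE_C"},
        new_disease_genes={"GENE_A", "GENE_B", "GENE_C"},
        reclassified={"c.2C>T": "pathogenic"},
    )
    for variant, reason in hits:
        print(variant.gene, variant.hgvs, reason)
```

The third trigger, improved variant calling and annotation, is not modeled here because it requires re-running the pipeline on the raw data rather than re-screening stored calls.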
Short-read Genome Sequencing
Genome sequencing (GS) in most contexts means “short-read” genome sequencing — paired-end sequencing of 150-bp to 250-bp reads from the ends of ~350-500 bp fragments by whole genome shotgun approaches. Illumina platforms continue to dominate this market. Compared to ES, GS has some key advantages:
- More uniform coverage of genes and exons, including certain genes which are notoriously difficult to capture (especially immune genes) for exome sequencing.
- Identification of copy number variants (CNVs) and structural variants (SVs), usually with better sensitivity and resolution than SNP microarrays.
- Comprehensive interrogation of noncoding regions that may harbor pathogenic variants, such as introns, promoters, regulatory elements, etc.
It’s important to note that while GS is increasingly being offered on a clinical basis, it is fairly exome-centric in terms of variants reported. In other words, although millions of noncoding variants are identified by GS, our ability to interpret them remains limited. The incremental diagnostic yield of GS in exome-negative patients varies but is probably in the 5-15% range. As expected, some diagnoses made by GS involve types or sizes of SVs that are difficult to detect by other assays. Some are splice-region variants. Some are variants in poorly captured genes. Yet a significant proportion of findings afforded by GS are made not because the detection was superior, but rather because GS was performed later and on a research basis. This can enable the identification of candidate variants and the exclusion of other genetic causes.
Long-read Targeted/Genome Sequencing
Single molecule long-read sequencing is commercially available on two platforms — Pacific Biosciences and Oxford Nanopore Technologies — and produces reads that are significantly longer than standard GS approaches: 10,000 to 15,000 bp on average, compared to 150 bp. On a per-base level, these platforms have a higher rate of sequencing error than sequencing-by-synthesis (Illumina) approaches, especially in certain sequence contexts (e.g. homopolymers). However, even with slightly diminished accuracy, long reads in this size range are extremely useful for resolving structural variants / complex rearrangements and for interrogating otherwise hard-to-sequence regions of the genome. We have used PacBio long-read sequencing to:
- Identify causal variants in syndromic rare disease patients that were poorly covered / not detected by GS. Example: Polyalanine repeat expansions in HOXD13.
- Resolve the genomic breakpoints of translocations, inversions, and other complex rearrangements.
- Determine the phase of two variants in the same gene, e.g. somatic variants in PTEN in hemimegalencephaly patients.
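The phasing use case in the last bullet comes down to counting reads that span both variant positions. The sketch below is a simplified illustration, not our production method: the `phase_two_variants` function and the dict-per-read input format are hypothetical, and it assumes the alleles at each position have already been extracted from aligned long reads. Mosaic (somatic) variants and sequencing errors would need more careful handling than a simple majority.

```python
from collections import Counter


def phase_two_variants(reads, pos_a, alt_a, pos_b, alt_b):
    """Classify two variants as in cis or in trans from spanning reads.

    reads: iterable of dicts mapping genomic position -> observed base
    (a hypothetical, pre-extracted representation of each long read).
    Returns (call, counts), where call is "cis", "trans", or "undetermined".
    """
    counts = Counter()
    for read in reads:
        if pos_a not in read or pos_b not in read:
            continue  # the read does not span both sites, so it is uninformative
        has_a = read[pos_a] == alt_a
        has_b = read[pos_b] == alt_b
        if has_a and has_b:
            counts["both_alt"] += 1  # both variants on one molecule: cis evidence
        elif has_a or has_b:
            counts["one_alt"] += 1   # exactly one variant on the molecule: trans evidence
        else:
            counts["neither"] += 1   # consistent with the other haplotype under cis

    if counts["both_alt"] > counts["one_alt"]:
        call = "cis"
    elif counts["one_alt"] > counts["both_alt"]:
        call = "trans"
    else:
        call = "undetermined"
    return call, dict(counts)


if __name__ == "__main__":
    example_reads = [
        {100: "T", 200: "G"},  # carries variant A only
        {100: "C", 200: "A"},  # carries variant B only
        {100: "T"},            # does not reach position 200
    ]
    print(phase_two_variants(example_reads, 100, "T", 200, "A"))
```

With short reads, very few molecules would cover both positions unless the variants sit within a few hundred bases of each other, which is why reads in the 10-15 kb range make this kind of direct counting possible.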
Naturally, there are disadvantages to long-read sequencing compared to traditional ES and GS approaches. The first and most obvious disadvantage is the cost, which can be 3-4x higher than standard GS. The “DNA cost” is also high, as long read technologies require a large amount of high-molecular-weight DNA. Informatics pipelines are not as mature for long-read technologies, so the analysis cost is higher as well.
RNA Sequencing (RNA-seq)
Transcriptome profiling by RNA sequencing is, in my opinion, one of the most powerful research genomics tools for undiagnosed patients. RNA can be co-extracted from blood along with DNA, and RNA sequencing is relatively inexpensive. RNA-seq provides a lot of useful information, including:
- Comprehensive gene expression measurements with higher precision than microarray testing.
- Isoform expression, i.e. the expression level of each exon and splicing of adjacent exons.
- Quantification of allele-specific expression, i.e. the balance of alleles of a variant in expressed transcripts.
- Splicing patterns, including both canonical and disruptive splicing.
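As an illustration of the allele-specific expression point, one common way to ask whether the two alleles at a heterozygous site are expressed in balance is an exact binomial test on the RNA-seq read counts. The sketch below is an assumption-laden example rather than a description of any particular tool: the function name `allelic_imbalance_pvalue` is hypothetical, the expected reference fraction of 0.5 ignores reference-mapping bias, and the simple implementation is only suitable for per-site depths up to several hundred reads.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Two-sided exact binomial p-value for allelic imbalance at one site.

    ref_reads / alt_reads: RNA-seq reads supporting each allele at a
    heterozygous site. expected_ref_fraction: assumed share of reference
    reads under balanced expression (0.5 here; real data often show a
    slight reference bias that this sketch does not model).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    p = expected_ref_fraction
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the outcomes that are no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Balanced expression versus a site where only one allele is seen.
    print(allelic_imbalance_pvalue(10, 10))
    print(allelic_imbalance_pvalue(20, 0))
```

A very small p-value at a site where the DNA shows a clean heterozygous call suggests that one allele is under-represented in the transcript pool, for example through nonsense-mediated decay or a regulatory variant on that haplotype.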
RNA-seq data is most useful when paired with genomic data. In our hands, it has been most useful in identifying “missing” variants in known disease genes (e.g. a second hit in a recessive gene in patients who have a single pathogenic variant by standard testing but otherwise fit the condition well). Many of those missing second hits are deep intronic variants with occult disruptions to canonical splicing, but we also see splice-disrupting variants in coding regions and intronic splice regions (outside the canonical splice site, but close to the intron-exon junction). RNA-seq can also resolve VUS by showing the variant’s impact on mRNA transcripts. We have used it both to prove and disprove effects on splicing.
The main disadvantage to RNA-seq is that it is only informative if the gene is expressed in available tissue. Otherwise you get no reads, or too few reads to infer splicing patterns. Many genes are expressed in blood, which is why RNA-seq from blood can be useful even for disorders that affect other systems / developmental timepoints. However, we should be cautious about over-promising the number of genes expressed at high enough levels to analyze: in my experience, it’s only around 50%. RNA-seq has made the most gains in disorders for which disease-relevant tissue is available, e.g. muscle diseases (due to muscle biopsy). Many genes have tissue-specific expression and that tissue often is not available for research testing.
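Before promising anything from RNA-seq, it helps to check which genes of interest are actually expressed in the tissue you have. The sketch below shows one way to do that from a table of expression values. The `analyzable_genes` helper, the placeholder gene names, and the default cutoff of 1 TPM are all assumptions for illustration; the right threshold depends on sequencing depth and on whether you need expression levels or splice-junction evidence.

```python
def analyzable_genes(tpm_by_gene, genes_of_interest, min_tpm=1.0):
    """Return the genes expressed at or above min_tpm, plus their fraction.

    tpm_by_gene: dict mapping gene symbol -> expression (TPM) in the
    sampled tissue. Genes missing from the dict are treated as unexpressed.
    min_tpm: hypothetical cutoff, chosen here only as an example.
    """
    expressed = sorted(
        g for g in genes_of_interest if tpm_by_gene.get(g, 0.0) >= min_tpm
    )
    fraction = len(expressed) / len(genes_of_interest) if genes_of_interest else 0.0
    return expressed, fraction


if __name__ == "__main__":
    # Placeholder values, not real measurements.
    blood_tpm = {"GENE_A": 12.5, "GENE_B": 0.2, "GENE_C": 3.0}
    panel = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    genes, frac = analyzable_genes(blood_tpm, panel)
    print(genes, frac)
```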
Optical Genome Mapping (OGM) and Epigenetic Methylation Profiling
Both of these are relatively new/emerging technologies with a lot of promise that have already begun to be implemented in some clinical areas. Optical Genome Mapping (OGM) is not sequencing per se, but high-resolution imaging of long labeled DNA molecules coupled with sophisticated informatics to map the physical structure of chromosomes. OGM is therefore very useful for identifying CNVs, SVs, and complex rearrangements with higher resolution (to ~500 bp) than standard of care approaches. Like long-read sequencing, it is costly in terms of reagents and input DNA requirements. OGM in fact requires a specialized sample prep, so you need access to fresh patient material (blood or fresh/frozen tissue) to do the library prep, and that library is only useful for OGM.
Epigenetic profiling, which in the current field usually refers to DNA methylation profiling by microarray, is another diagnostic tool available on clinical and/or research basis depending on the phenotype. Among Mendelian disorders, it has found the most success in diagnosing neurodevelopmental/ID disorders caused by mutations with altered global methylation profiles, i.e. mutations in methylation pathway genes and transcription factors. Methylation profiles for the test patient are generated and clustered alongside reference cohorts of individuals with known diagnoses, assigning the patient a cluster and a confidence score. Doing this type of analysis thus requires access to a large reference cohort of profiles from the same tissue type from many patients with known disorders. If diagnostic, it does not provide sequence-level information, i.e. the mutation responsible for the aberrant methylation pattern. But it can tell you where to look, and in some cases can resolve VUS in genes with abnormal methylation profiles.
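The clustering step described above can be illustrated with a toy nearest-centroid classifier. This is a deliberately simplified stand-in and not how clinical methylation classifiers are built: the function `classify_methylation_profile`, the Euclidean distance, the margin-based confidence score, and the cohort labels are all assumptions made for the example. Real profiles span many thousands of CpG sites and are scored with trained statistical models.

```python
from math import sqrt


def classify_methylation_profile(profile, reference_cohorts):
    """Assign a test profile to the closest reference cohort.

    profile: list of methylation beta values (0 to 1), one per CpG site.
    reference_cohorts: dict mapping cohort label -> list of profiles from
    individuals with that known diagnosis (same sites, same tissue type).
    Returns (label, confidence), where confidence is a hypothetical margin
    score in [0, 1]: 1 minus the ratio of best to second-best distance.
    """
    if len(reference_cohorts) < 2:
        raise ValueError("need at least two reference cohorts to compare")

    def centroid(profiles):
        # Mean beta value at each site across the cohort.
        return [sum(site) / len(site) for site in zip(*profiles)]

    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    distances = {
        label: distance(profile, centroid(profiles))
        for label, profiles in reference_cohorts.items()
    }
    ranked = sorted(distances, key=distances.get)
    best, runner_up = distances[ranked[0]], distances[ranked[1]]
    confidence = 1.0 - best / runner_up if runner_up > 0 else 0.0
    return ranked[0], confidence


if __name__ == "__main__":
    # Two-site toy profiles with placeholder cohort labels.
    cohorts = {
        "disorder_X": [[0.9, 0.1], [0.8, 0.2]],
        "control": [[0.1, 0.9], [0.2, 0.8]],
    }
    print(classify_methylation_profile([0.9, 0.1], cohorts))
```

The toy example also shows why the reference cohort matters so much: the classifier can only ever return one of the labels it was given, so a disorder missing from the reference set cannot be diagnosed this way.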
In summary, a nondiagnostic ES report is no longer the end of the road for patients with Mendelian disorders. A growing number of other assays, many of which are only available on a research basis, can provide answers to a considerable proportion of patients and families.