Clinical Sequencing

The Challenges of Variants of Uncertain Significance (VUS)

April 30, 2024 by dkoboldt

As genetic testing continues to expand in both clinical and research settings, variants of uncertain significance (VUS) present a persistent challenge. For the uninitiated, VUS is one of five classifications assigned to genetic variants under ACMG guidelines which indicate the likelihood that a variant causes disease.

Variant interpretation scale — Variant interpretation categories (NHGRI, Guide to Interpreting Genomic Reports: A Genomics Toolkit)

Generally, uncertain significance is the default classification for variants that cannot otherwise be classified as pathogenic/likely pathogenic (i.e. disease causing) or benign/likely benign (not disease causing).

If you’re familiar with genetic testing trends over the last decade, you probably know that VUS are increasingly prevalent on genetic testing reports. In part, that’s due to technological advances, e.g. high-throughput DNA sequencing, that make it possible to interrogate more of the patient’s genome in a timely and cost-effective manner. Gene panels for specific conditions now often encompass thousands of genes, and comprehensive testing — genome or exome sequencing — is increasingly available as a first-tier or second-tier test.

Increasing knowledge — specifically, the number of genes associated with disease — is another important contributor to the VUS explosion. The pace of gene discovery accelerated in the NGS era and continues to grow, as illustrated by the statistics provided by the Online Mendelian Inheritance in Man (OMIM) database:

OMIM pace of gene discovery — The Pace of Gene Discovery (Credit: OMIM)

Long story short, more variants detected in every patient (by comprehensive sequencing) combined with more genes that are possibly reportable (due to association with disease) means more variants on genetic testing reports. And, as I’m about to tell you, for reportable variants, VUS will often be the expected classification.

ACMG Variant Interpretation 101

First, a very brief introduction to the types of evidence that are used when interpreting variants and how they are represented. ACMG evidence codes are letter/number combinations.

The first letter indicates the type of evidence (Pathogenic or Benign).
The next one or two letters indicate the strength of evidence (Very Strong, Strong, Moderate, or SuPporting).
The number is a category we use to keep them all straight.

Combining ACMG Rules for pathogenic variants — Evidence required for P/LP variants (Richards et al, Genetics in Medicine, 2015)

So for example, PS2 is the evidence code applied when a variant occurs de novo in a patient with confirmed maternity/paternity. This is the second (2) type of strong (S) evidence of pathogenicity (P), hence the code PS2. For another example, when a variant’s population allele frequency is greater than expected for the disorder, it gets the code BS1 (the first type of benign strong evidence). The weakest level of evidence, supporting, is given the strength-designation P. For example, BP1 applies when you have a missense variant in a gene in which almost all disease-causing variants are truncating/loss-of-function.
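To make the naming scheme concrete, here is a minimal Python sketch that splits an evidence code into its three parts. The function name and lookup tables are my own illustration, not part of any ACMG tooling, and the "A" (stand-alone, as in BA1) strength comes from Richards et al. 2015 rather than from the description above.

```python
import re

# Strength abbreviations as described in the post. "A" (stand-alone, e.g. BA1)
# is taken from Richards et al. 2015 and is an addition for completeness.
STRENGTHS = {"VS": "very strong", "S": "strong", "M": "moderate",
             "P": "supporting", "A": "stand-alone"}
DIRECTIONS = {"P": "pathogenic", "B": "benign"}

# First letter: direction. Next one or two letters: strength. Then a number.
CODE_PATTERN = re.compile(r"^([PB])(VS|S|M|P|A)(\d+)$")

def parse_evidence_code(code):
    """Split an ACMG evidence code such as 'PS2' into (direction, strength, number)."""
    match = CODE_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognizable ACMG evidence code: {code!r}")
    direction, strength, number = match.groups()
    return DIRECTIONS[direction], STRENGTHS[strength], int(number)
```

The sketch only checks the shape of a code. It does not verify that a given combination (say, a hypothetical "BM1") actually exists in the guidelines.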

When a variant is assessed, each type of evidence is evaluated to see if it applies. The final set of evidence is combined into a formula to determine the final classification. The rules for combination to get a pathogenic or likely pathogenic variant are shown in the figure above. So for example, for a variant with very strong (VS) evidence of pathogenicity, only one additional strong evidence code is required to classify it as pathogenic. If that second piece of evidence is moderate strength, the variant would be classified likely pathogenic. There’s a similar formula for benign/likely benign evidence.
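For readers who think in code, the pathogenic side of the combination rules can be sketched in Python. This is my own transcription of the table in Richards et al. 2015, so treat it as an illustration rather than a validated classifier. The function name and arguments are hypothetical, and I use "at least" comparisons where the table says "1", which is a simplification.

```python
def classify_pathogenic(very_strong=0, strong=0, moderate=0, supporting=0):
    """Apply the pathogenic / likely pathogenic combination rules.

    Arguments are counts of pathogenic evidence codes at each strength.
    Returns 'pathogenic', 'likely pathogenic', or None if neither threshold
    is met. Benign evidence is not considered here.
    """
    vs, s, m, p = very_strong, strong, moderate, supporting

    pathogenic = (
        # Very strong plus: a strong, or two moderate, or moderate + supporting,
        # or two supporting
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        # Two or more strong
        or s >= 2
        # One strong plus enough moderate/supporting evidence
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    if pathogenic:
        return "pathogenic"

    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and m >= 1)
        or (s >= 1 and p >= 2)
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    if likely_pathogenic:
        return "likely pathogenic"
    return None
```

With this sketch, `classify_pathogenic(very_strong=1, strong=1)` returns "pathogenic" and `classify_pathogenic(very_strong=1, moderate=1)` returns "likely pathogenic", matching the two examples in the paragraph above.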

How We Get To VUS

What if we have some evidence that a variant is pathogenic, but not enough to meet this threshold? Or worse, what if we have a lot of benign evidence but one pathogenic code? The ACMG guidelines lay it out:

Conflicting ACMG interpretation evidence guidelines — Guideline for conflicting evidence (Richards et al, GiM 2015)

VUS Due to Conflicting Evidence

Under the ACMG framework, every variant is assessed both for pathogenic and benign evidence criteria. It is thus quite possible — and does happen on a regular basis — that a variant has both types of evidence, i.e. conflicting evidence. For example, a variant that does not segregate with disease in a family (BS4) and has no predicted effect on the encoded protein (BP4) might still be rare in the general population (PM2).

Another example we often encounter is a missense variant that is rare (PM2), segregates with disease (PP1), and is computationally predicted to be damaging (PP3), but in a gene in which most known disease-causing variants are null variants (BP1). As written above, under ACMG rules, any variant with both types of evidence, no matter the tipping of the scale, defaults to VUS.
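The conflicting-evidence rule, as I have described it here, is simple enough to express in a few lines of Python. The helper below is a hypothetical illustration that only looks at the first letter of each code. It follows the simplified rule from this post (any mix of pathogenic and benign codes means VUS) and does not weigh the evidence.

```python
def has_conflicting_evidence(codes):
    """Return True if a set of ACMG codes mixes pathogenic (P...) and benign (B...) evidence."""
    # The first letter of every code gives its direction: P or B.
    directions = {code.strip().upper()[0] for code in codes if code.strip()}
    return {"P", "B"} <= directions


def default_classification(codes):
    """Simplified rule from the post: mixed evidence defaults to VUS."""
    if has_conflicting_evidence(codes):
        return "uncertain significance"
    return None  # no conflict; the normal combination rules would apply
```

Both examples above come out as conflicting: `["BS4", "BP4", "PM2"]` and `["PM2", "PP1", "PP3", "BP1"]`.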

VUS Due to Insufficient Evidence

This is the more common pathway to classifying a variant as VUS: there is not enough evidence of pathogenicity to meet the threshold of likely pathogenic, or there’s benign evidence but not enough for likely benign. For example, a novel missense variant (PM2) in a dominant disease gene which is computationally predicted to damage the encoded protein (PP3), without additional evidence, is a VUS (PM2, PP3). Missense variants in general struggle to garner enough evidence to reach pathogenicity due to the strength of evidence codes that can be applied to them; more on that in the next section.

Variants in new or emerging disease genes are especially prone to the “Insufficient Evidence” VUS classification because the etiology of disease is still being established. If only a handful of disease-causing variants have been reported, it’s often difficult to ascertain:

Whether null variants or missense variants are the predominant type of causal variants
The presence of mutation hotspots or critical functional domains in which variants almost always cause disease
The maximum population frequency of established disease-causing variants

Also, relatively new disease genes rarely have robust functional studies that can be used to enhance variant classification. These problems are all exacerbated for missense variants.

Some Variants Have It Easy: Null Variants and De Novo Mutations

You will note that having Very Strong evidence gets you a long way toward classifying a variant as pathogenic. Unfortunately, there is only one type of evidence that carries the weight Very Strong: PVS1. This is reserved for null variants, i.e. the types of variants (e.g., nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single exon or multiexon deletion) that are “assumed to disrupt gene function by leading to a complete absence of the gene product by lack of transcription or nonsense-mediated decay of an altered transcript.” (Richards et al 2015).

As I mentioned earlier, null variants that qualify for PVS1 only need one more piece of moderate-strength evidence to reach likely pathogenic. That’s great for null variants, but such variants represent a tiny fraction of the variants encountered in most genes in most patients. Missense variants are far more prevalent but face an uphill battle toward pathogenicity.

It’s a similar story for de novo mutations: a variant in a dominant disease gene that occurred de novo is rewarded with the strong evidence code PS2. That’s a long way toward a pathogenic classification. However, applying PS2 requires that you test both parents *and* that the parental relationships, especially paternity, have been confirmed. Again, it’s great when the stars align and you can do this. It’s also one of the reasons most labs prefer having family trios (proband and both parents) whenever possible. Yet we live in the real world where:

Children are sometimes adopted or in foster care
Families cannot afford all available testing
Parents may no longer be alive
Parents may be incarcerated
Parents may be unwilling to participate in genetic studies

In these situations, testing both parents is not an option and that usually prevents some of the strongest evidence from being applied.

Consequences of Null and De Novo Variant Bias

The biases that favor null variants and de novo mutations may have scientific underpinnings, but they also exert real-world consequences that often skew the perceptions of emerging disease genes. Let’s be honest: it is far easier to publish a cohort of patients who all have de novo loss-of-function mutations in the same gene. I have been a part of multiple GeneMatcher collaborations in which the study leaders either gave preference to patients with null / de novo variants or were forced to do so to get the work published.

This often means the first few papers linking a gene to a disease describe only de novo / null variants, and that becomes the expected etiology of disease. Even established disease genes can be affected by the null-variant bias: because missense variants are harder to classify as likely pathogenic, they often remain VUS. Anyone who glances at the landscape of disease-causing (P/LP) variants for a gene might (incorrectly) assume that only null variants cause disease. I strongly encourage researchers to push back when they are told that the first paper will only include the “easy button” LOF/de novo variants.

Effects of Changing Variant Interpretation Guidelines

The ACMG 2015 guidelines for interpretation of sequence variants were published 8 years ago this month. It was an important milestone in our field, the members of which increasingly recognized that many variants reported as disease-causing were (in retrospect) probably not. The methods for classifying variants were inconsistent, and there was no universal set of rules that someone could apply. The 2015 guidelines provided such a framework.

However, a lot can change in eight years, and although the ClinGen Sequence Variant Interpretation (SVI) working group has released subsequent recommendations on the use of computational evidence for missense variants and refining classification of splicing variants, these are interim guidance.

The long-awaited revised framework for variant interpretation, which implements a points system to improve accuracy/consistency, is not yet published. In theory, it will help us resolve some VUS. That remains to be seen. Just as some types of evidence can be assigned higher strength (e.g. computational predictions of variant impact), other types of evidence may be blunted (e.g. rareness in the population). We won’t know until the revised guidelines are published, which probably will not be in 2024.
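Since the revised framework is unpublished, I can only sketch what a points system might look like. The Python below uses the point scale proposed by Tavtigian et al. (Human Mutation, 2020), which maps supporting, moderate, strong, and very strong evidence to 1, 2, 4, and 8 points. Whether the final guidelines adopt these exact values and thresholds is an assumption on my part, and the function names are hypothetical.

```python
# Point values per evidence strength, following Tavtigian et al. 2020.
# The revised ACMG guidelines may end up using different numbers.
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very strong": 8}

def score(pathogenic=(), benign=()):
    """Net points: pathogenic evidence adds, benign evidence subtracts."""
    return sum(POINTS[s] for s in pathogenic) - sum(POINTS[s] for s in benign)

def classify_by_points(total):
    """Map a net point total to a five-tier classification."""
    if total >= 10:
        return "pathogenic"
    if total >= 6:
        return "likely pathogenic"
    if total >= 0:
        return "uncertain significance"
    if total >= -6:
        return "likely benign"
    return "benign"
```

One design consequence worth noting: under a scheme like this, a single benign supporting code no longer forces a variant to VUS by itself. It simply subtracts a point from the total.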

What About Variants in Candidate Genes?

I should take this moment to remind you — as I sometimes have to remind myself — that ACMG variant interpretation should only be applied to sequence variants that affect established disease genes. It should not really be used for variants in unknown genes or candidate genes not yet associated with human disease. That’s because we can’t assess pathogenicity of a variant without a definitive link between the gene and disease.

For clarity, we try to avoid the use of VUS when discussing candidate genes. Occasionally I hear the term GUS — for Gene of Uncertain Significance — and I really like it, but it does not seem to have gained much momentum.

More VUS, More Problems

The increasing number of VUS on genetic testing reports — and our inability to definitively classify them — present significant challenges for clinicians, laboratories, and patient families.

For the lab, a VUS is a non-diagnostic outcome. VUS can be reported, but generally in the dreaded “Section 2” of the test report.
Patients with VUS thus may not qualify for gene therapy or clinical trials if those are available.
Clinicians must decide whether or not to pursue further testing, either to clarify the VUS or to keep searching.

Which VUS Merit Further Scrutiny?

It’s important to emphasize here that not all VUS are created equal. Because of the conflicting-evidence-means-VUS rules described above, plenty of variants receive this classification but are extremely unlikely to be disease-causing. On the other hand, sometimes VUS offer a promising potential diagnosis in a patient who otherwise has no significant findings. Perhaps the most important question to be answered is the phenotypic overlap, i.e. whether the gene’s associated condition matches the patient clinical presentation. This is why good clinical phenotyping is critical for genetic testing, especially when interpreting uncertain results.

The number, strength, and types of ACMG evidence codes that accompany a VUS classification are also relevant considerations. Some of the proposed revisions of variant classification guidelines allow for tiering of VUS into subsets representing the amount of pathogenic evidence behind them. If these come to pass, they’ll offer laboratories a useful tool for communicating how much evidence supports a given VUS. In the meantime, I tend to refer to them as weak or strong VUS, with the latter category possibly warranting follow-up. Examples of strong VUS include:

A VUS that is compound-heterozygous with a pathogenic variant in a recessive disease gene.
A VUS with multiple pathogenic criteria that segregates with disease in a gene that fits the phenotype. For example, VUS (PM2, PP3, PP5) would indicate a rare variant that’s predicted to be deleterious and has been reported as disease-causing by another laboratory.
A VUS with a predicted effect that could be evaluated by additional testing, such as metabolic/biochemical testing or even RNA-seq for potential splice variants.

This leads to the last section of my post, the million-dollar question.

How Can We Resolve a VUS?

I get asked this question all the time. Honestly, if you’re reading this post and have some ideas, I’d love if you shared them in the comments section below. Note, resolution can go either way: building a case for pathogenicity for a suspicious VUS, or ruling out a VUS that might otherwise be a concern. Here are some strategies we and other groups have tried.

Segregation testing. Determining the segregation pattern and disease status in family members informs, at the very least, the plausibility of a variant fit and there are ACMG evidence codes for both segregation (PP1) and non-segregation (BS4).
Clinical evaluations. The clinician can review patient/family medical records, bring them in for another clinical visit, or refer them to a relevant specialty to determine the presence (or absence) of clinical features associated with the disorder.
Checking the latest population allele frequency databases to determine the variant’s prevalence in presumed-healthy individuals.
Reaching out to other laboratories who have reported the variant according to ClinVar or the literature can sometimes yield useful information.
Identifying additional patients with the variant can provide or strengthen certain categories of pathogenicity evidence. This is something we use in my ClinGen Variant Curation Expert Panel to resolve VUS in the RPE65 gene.
Additional patient testing, such as biochemical/metabolic testing, methylation profiling, etc. that would support or exclude the diagnosis
Variant functional studies in cells, organoids, or animal models. Obviously we’d love this to clarify any variant, but it can be expensive and time-consuming.

Sometimes strategies like the ones above can push a variant to a more definitive classification, and sometimes not. The hard truth is that some VUS cannot be resolved at the present time. Formal classifications aside, the clinicians can make their own judgements about uncertain findings, and counsel and treat the patients accordingly.

The Importance of Patient Phenotype in Genetic Testing

February 23, 2024 by dkoboldt

The tools and resources we have for human genomic analysis continue to grow in scale and quality. Computational tools like REVEL and SpliceAI leverage machine learning to provide increasingly accurate predictions of the effects of variants. Public databases of sequence variation like the newly expanded gnomAD tell us how common they are in populations. Community-supported resources like ClinVar continue to curate disease-gene associations and interpretations of those variants.

It follows that genetic testing should continue to improve, especially in the setting of rare disorders. Ten years ago, some of the earliest exome sequencing studies of Mendelian disorders showed that with a fairly straightforward filtering approach, it was possible to winnow the set of coding variants identified in a patient (usually in the tens of thousands) to just a handful of compelling candidate variants. And that was ten years ago.

Predictive Genomics for Rare Disorders

Recently, I began to wonder if we are approaching a GATTACA-like future of predictive genomics, at least for rare genetic conditions. If you obtained the genome sequence of a family trio and all you knew was that the proband has a genetic disorder, would it be possible to identify the most likely causal variant(s) and thus predict the rare disorder the patient has? The answer to this question is probably obvious from the theme of this post, but let’s consider it as a thought experiment. So you have trio WGS data from a family and all you know is that the proband is affected. Maybe it’s a severely ill baby in the NICU, or an as-yet-undiagnosed patient coming to Genetics clinic. In this scenario, you might:

Run the trio WGS data through your existing pipeline to identify genomic variants (SNVs, indels, and CNVs).
Annotate all variants with population frequency, in silico predictions, gene / disease associations, ClinVar status, etc.
Identify variants that fit a Mendelian inheritance model (de novo, recessive, or X-linked)
Remove variants that are too common in populations to cause the disease associated with their gene
Apply automated variant interpretation to determine which variants reach pathogenicity
Retain pathogenic variants in disease-associated genes that fit the inheritance for those genes

With these fairly intuitive steps, you’ll likely get a rather short list of candidates and it would be straightforward to rank them so that the most probable diagnostic findings are at the top. This process would be very amenable to automation, so it could be done at speed and scale. Will that become the new paradigm for genetic testing in rare disorders?
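The filtering and ranking steps above could be sketched roughly as follows in Python. Everything about the data model is hypothetical: I assume each annotated variant is a dict with made-up field names (`segregation`, `gene_inheritance`, `population_af`, `gene_max_af`, `classification`), and that variant calling, annotation, and automated interpretation have already happened upstream.

```python
# Which observed segregation patterns are compatible with which gene-level
# inheritance modes. This mapping is a simplifying assumption.
COMPATIBLE = {
    "de novo": {"dominant", "x-linked"},
    "recessive": {"recessive"},
    "x-linked": {"x-linked"},
}
# Only pathogenic / likely pathogenic variants are retained, in this order.
RANK = {"pathogenic": 0, "likely pathogenic": 1}

def shortlist(variants):
    """Filter annotated trio variants and rank the survivors."""
    kept = []
    for v in variants:
        fits_model = v["segregation"] in COMPATIBLE             # Mendelian model
        rare_enough = v["population_af"] <= v["gene_max_af"]    # not too common
        reportable = v["classification"] in RANK                # reaches pathogenicity
        fits_gene = fits_model and v["gene_inheritance"] in COMPATIBLE[v["segregation"]]
        if rare_enough and reportable and fits_gene:
            kept.append(v)
    # Pathogenic before likely pathogenic, then rarer variants first.
    return sorted(kept, key=lambda v: (RANK[v["classification"]], v["population_af"]))
```

Note what is missing from every field in this sketch: nothing describes the patient. That gap is the subject of the next section.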

The Missing Component of Genome-Driven Analysis

The predictive genetics approach described above has a rational basis but does not account for some crucial information: the clinical phenotype and family history of the patient being tested. Clinical correlation — the overlap between patient symptoms and disease features — has an outsized influence on whether or not a result can be considered diagnostic. In our work, which is research, we encounter (at a surprising frequency) genetic variants that:

Are in a known disease gene
Segregate with the inheritance mode associated with that gene
Reach pathogenicity under ACMG guidelines, but
Are associated with a disease that is not clinically apparent in the proband.

In a world where the majority of tests are non-diagnostic and variants of uncertain significance (VUS) are increasingly prevalent, it is hard to ignore these variants. Naturally, we go back to the clinicians and/or medical records to verify that the patient does not have the disease. If there’s no clinical correlation, these are not considered diagnostic findings. No matter how compelling the variants are.

The Power of Phenotype

Admittedly, my perspective is biased: I work on translational research studies that primarily enroll undiagnosed patients. Often they have already undergone extensive genetic/molecular testing as part of their standard of care. When a clinician orders such testing, they provide patient clinical information. On the laboratory side, especially for comprehensive tests like exome/genome sequencing, patient clinical features are critical. The order forms collect extensive details about patient symptoms, which are converted into standardized disease terms (e.g. HPO terms) and used to identify/prioritize variants for interpretation.

Most rare diseases have genetic origins, and many of the genes responsible give rise to highly specific patterns of patient symptoms. Individually, a single patient symptom may not have significant diagnostic value, but the collective picture of patient clinical features can be very powerful. Especially when some of those features are specific and/or unusual. Even a rudimentary system that ranks a patient’s genetic variants based on clinical feature overlap (the number of features shared between the patient and the disease) helps put the most plausible genetic findings at the top.
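Here is what such a rudimentary ranking might look like in Python: count the HPO terms shared between the patient and each candidate gene's associated disorder, then sort. The gene names and the gene-to-phenotype annotations are placeholders I made up for illustration. A real system would pull those from a curated resource and would likely weight specific or unusual features more heavily than a simple count does.

```python
def rank_by_overlap(patient_terms, candidates):
    """Rank candidate genes by the number of HPO terms shared with the patient.

    patient_terms: iterable of HPO IDs observed in the patient.
    candidates: dict mapping a gene name to the HPO IDs of its associated disorder.
    Returns a list of (gene, shared_count), highest overlap first.
    """
    patient = set(patient_terms)
    scored = [(gene, len(patient & set(terms))) for gene, terms in candidates.items()]
    # Sort by descending overlap, then by gene name for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Example with placeholder genes and illustrative annotations.
patient = {"HP:0001250",   # seizure
           "HP:0001263",   # global developmental delay
           "HP:0000252"}   # microcephaly
candidates = {
    "GENE_A": {"HP:0001263"},
    "GENE_B": {"HP:0001250", "HP:0001263", "HP:0000252", "HP:0001252"},
    "GENE_C": {"HP:0001252"},
}
ranking = rank_by_overlap(patient, candidates)
```

In this toy example, GENE_B shares all three patient features and lands at the top, while GENE_A matches only the nonspecific developmental delay.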

Good clinical phenotyping also provides a powerful tool to exclude candidate findings. This is useful because some medical conditions that warrant genetic testing are associated with a wide range of disorders. In the pediatric setting, for example, global developmental delay is associated with thousands of genetic disorders and thus casts a very wide net. However, for many such disorders, global delays occur alongside a number of other distinctive clinical features. If these are not present in the proband, they can often be ruled out. This reduces the search space and interpretation burden for the laboratory.

Limitations of Phenotype-Driven Analysis

Despite these advantages, a phenotype-centric approach to genetic testing has some important limitations.

Variable expressivity. Many genetic disorders have clinically significant features that can vary from one patient to the next, even within families.
Phenocopies. I love this word, which refers to disorders that resemble one another clinically but have different underlying causes.
Pleiotropy. On the other hand, some genes give rise to multiple disorders which can be clinically very distinct.
Phenotype expansion. For many genetic disorders, our understanding of the full phenotypic spectrum changes over time. This is especially true for new/emerging rare disorders for which the clinical description is based on a small number of patients.
Patient evolution. For many patients, the clinical picture changes over time. In the pediatric setting this is a major consideration, as lots of key diagnostic features take time to manifest or be clinically apparent.
Blended phenotypes. At least 5-10% of patients suspected of having a monogenic disorder have multiple genetic conditions, and their presentation can thus be a confounding combination of the associated features.

The OMIM Curation Bottleneck

The Online Mendelian Inheritance in Man (OMIM) database is one of the most vital resources in human genetics. For many/most laboratories, OMIM is the primary and definitive source for the genes, inheritance patterns, and clinical manifestations associated with genetic disorders. The information in OMIM is curated from the peer-reviewed biomedical literature by trained experts at Johns Hopkins University. This manual curation is why the resource is so widely trusted by the community. However, it’s a double-edged sword because curation takes expertise, time, and funding. The latter two have been a challenge for OMIM, especially since the pace of genetic discovery has accelerated in the past decade. Simply put, there’s way too much literature for OMIM to curate it all.

This bottleneck has real consequences. We look to OMIM as our trusted source of information about disease genes, but that information is increasingly outdated or incomplete. Given the powerful influence that clinical correlation has over genetic testing results… well, it’s a problem. And not one that the OMIM curators will be able to solve on their own. The good news is that there are more sustainable efforts under way. ClinGen, for example, is both standardizing the way information is collected/curated and leveraging expert volunteers (i.e. crowdsourcing) from the community to manage the workload. We still have a long way to go because ClinGen is a relatively new endeavor. However, it’s a more sustainable model that we should continue to support with funding and volunteerism.

In other words, if you’re not part of a ClinGen working group or panel, please think about joining one.

Clinical Genome Sequencing Replaces Exome Sequencing

January 19, 2024 by dkoboldt

This month our clinical laboratory began offering genome sequencing as an orderable in-house test. It’s a milestone achievement made possible by a talented multidisciplinary team and 3+ years of pre-clinical work under a translational research study. Yes, clinical genome sequencing was already available to our clinical geneticists — as a sendout test to commercial laboratories — but there are distinct advantages to providing this state-of-the-art test in-house. Especially the rapid genome sequencing (rGS) test, for which results are called out in just a few days. We have years of data showing that genomic testing results can inform patient care in acute cases. Not to over-hype it, but sometimes it saves lives.

Still, that is not my story to tell, so this post is more about the transition from exome to genome sequencing in a (pediatric) hospital setting. It seems likely that many institutions (not just ours) will make the leap this year. There are several factors driving this change, but one of them is simply the ever-increasing speed/throughput of next-generation sequencing instruments. For a long period, approximately 2014-2020, exome sequencing was a more practical choice as the mainstay comprehensive genetic test.

The Exome Advantage

Often patients who qualified for genetic testing would first get cytogenetic and microarray testing for chromosomal abnormalities and CNVs, respectively. Depending on the patient’s clinical features, the next step would often be a gene panel, followed by exome sequencing. As a clinical test, exome sequencing was attractive as a comprehensive test because:

Exome capture kits had matured significantly, achieving consistent coverage and enabling fairly reliable deletion/duplication calling.
In terms of laboratory costs, generating ~40-50 Gbp (gigabase-pairs) of data per sample was far cheaper than generating the ~120 Gbp required for a genome.
Turnaround times were pretty good.
The variant interpretation was likely to be gene- and exon-centric anyway.

Simply put, exome sequencing interrogated virtually all genes with a reasonable turnaround time and cost, so it made sense as the comprehensive test. If it was working so well, the natural question might be:

Why move to genome sequencing?

Speed, for one thing. The hybridization process (where probes capture target regions) adds about 1-1.5 days to the laboratory prep time between library creation and when things get loaded on the sequencer. The instruments are now so fast that this increases the lab time by about 50% compared to going straight to genome. The throughput is also so high that exome libraries need to be increasingly multiplexed (i.e. run lots of things at once) to be sequenced. Believe it or not, that can also introduce a delay because one has to wait until enough samples have accumulated to pool and sequence them.

“We don’t have enough samples to sequence” is a phrase I never thought I would hear. Man, how a decade can change things.

Reagent costs are a factor, too, since exome kits cost money. As the per-base cost of sequencing goes down, the savings you get from exome capture instead of genome decrease as well. The capture also requires more input DNA, which can be an issue when dealing with precious clinical samples. So genome sequencing is faster, requires less DNA, and ends up costing about the same for reagents. That’s on top of the obvious advantages GS offers in terms of variant detection.

Does genome sequencing have a higher diagnostic yield than exome sequencing?

In most cases, it should. That’s the theoretical answer. GS interrogates both coding and noncoding regions, and it’s better suited to detecting copy number variants (CNVs) and structural variants (SVs) because the breakpoints of such variants often lie in noncoding regions. Plus, exome capture introduces some hybridization biases which, while somewhat addressable during analysis, make it harder to detect changes in sequence depth that signal the presence of a copy number variant.

However, in my opinion, a major diagnostic advantage of genome sequencing comes from its ability to cover genes and exons that don’t play nicely with exome capture. Immune system genes, for example, are notorious for their poor coverage by exome sequencing. We have numerous examples of diagnostic variants uncovered by genome sequencing which were missed by exome testing due to coverage. From the clinician’s point of view, genetic test results from genome sequencing (even when nondiagnostic) come with more confidence that all of the relevant exons and genes have been interrogated.

A second advantage of genome sequencing is the ability to find deep intronic “second hits” in patients who have a single pathogenic variant in a recessive disease gene. Under exome sequencing, you generally have to do another test. With genome data, labs can at least screen nearby noncoding regions (introns, etc) to see if a second variant is present. Computational tools to predict splicing effects of variants have improved substantially in the past few years to the point where SpliceAI scores have been incorporated into ACMG/AMP guidelines. With clinical GS, upon the identification of a single variant in a promising recessive gene, labs can thus screen the data for rare variants in trans that are predicted to disrupt splicing. We have done this in a translational research setting and I think it will be a major source of improved diagnostic rates.
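A second-hit screen along these lines might be sketched in Python as below. The field names (`id`, `gene`, `population_af`, `spliceai`, `phase`) and the default cutoffs are assumptions for illustration only, not our laboratory's actual thresholds. I also assume that SpliceAI scores and phasing relative to the first hit were computed upstream.

```python
def second_hit_candidates(first_hit, variants, min_spliceai=0.2, max_af=0.001):
    """Find rare, predicted splice-disrupting variants in the same gene as first_hit.

    first_hit: dict describing the known pathogenic variant.
    variants: list of dicts for all variants called in the patient.
    Variants known to be in cis with the first hit are excluded; variants
    of unknown phase are kept for follow-up (e.g. parental testing).
    """
    hits = [
        v for v in variants
        if v["gene"] == first_hit["gene"]
        and v["id"] != first_hit["id"]
        and v["population_af"] <= max_af
        and v["spliceai"] >= min_spliceai
        and v["phase"] != "cis"
    ]
    # Strongest splicing predictions first.
    return sorted(hits, key=lambda v: v["spliceai"], reverse=True)
```

Anything this screen returns is still only a candidate. Confirming that a deep intronic variant actually disrupts splicing usually takes RNA-level evidence.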

When should a clinical genome be ordered for an exome-negative patient?

This is an important question as clinical GS becomes more widely available. We know that 50-70% of exome tests are nondiagnostic, and it’s reasonable to assume that most patients who have undergone comprehensive testing in the last decade had an exome, not a genome. As I wrote in my recent post on post-exome strategies for Mendelian disorders, a negative exome result means that genome should be considered as the next step. If the clinical test already was genome, this changes the calculus.

I think it will be difficult to establish a perfect set of rules because every patient is different. However, I’d suggest that clinical GS should be considered when:

The WES testing was done more than two years ago. This seems to be the sweet spot for exome reanalysis anyway, because enough new genes and disease-causing variants have been discovered to significantly boost diagnostic rates.
New and relevant phenotypic information has emerged. Clinical exome testing is almost always guided/driven by the phenotypic data provided to the laboratory. If that changes, so too could the result. In particular, new phenotypes or features with significant genetic associations (dysmorphism, seizures, metabolic changes, neurological/neuromuscular changes, etc.) can significantly impact how variants are considered.
There is medical urgency. A patient who continues to decline, or whose care is limited due to the lack of diagnosis, stands to benefit.
Previous testing or new knowledge hints at a possible diagnosis. So-called “Section 2” variants and newly identified genes/pathways relevant for a patient’s phenotype may justify a harder look at certain loci.
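Purely as an illustration, the four suggestions above can be written as a small Python checklist. The function and argument names are hypothetical, and this is a sketch of my informal criteria, not a clinical decision rule.

```python
def reasons_to_consider_genome(years_since_exome, new_phenotype=False,
                               medically_urgent=False, new_lead=False):
    """Return the list of suggested criteria that an exome-negative patient meets."""
    reasons = []
    if years_since_exome > 2:
        reasons.append("exome testing was done more than two years ago")
    if new_phenotype:
        reasons.append("new and relevant phenotypic information has emerged")
    if medically_urgent:
        reasons.append("there is medical urgency")
    if new_lead:
        reasons.append("previous testing or new knowledge hints at a diagnosis")
    return reasons
```

An empty list does not mean genome sequencing is inappropriate. As noted above, every patient is different, and the checklist only flags the situations I listed.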

What are the limitations of genome sequencing as a first-tier test?

No test is perfect, and despite the many advantages of genome as a first-line test, it comes with some limitations. GS may have similar experimental costs, but it comes with higher analysis costs (especially computational processing and data storage) because it’s 3-4x more data per sample. Processing that is an occasional cost, but storing data is like the Netflix subscription that never ends. The human staffing costs of interpretation are also higher because there are more variants (and detected variant classes) to evaluate. Balancing workload among staff also becomes more challenging, especially for rapid turnaround tests. And on the technical side, there is sequence depth to consider: Typical depths for exome sequencing (150-200x) have more power to detect somatic/mosaic variation. Patients undergoing testing for conditions associated with somatic mutations — the obvious example being tumor sequencing — are likely to benefit more from exome or panel testing.
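The depth argument can be illustrated with a simple binomial model in Python. The assumptions here are mine: reads sample the variant allele independently at the mosaic allele fraction, a variant caller needs some minimum number of supporting reads, and genome depth is around 40x (roughly what ~120 Gbp spread over a ~3.1 Gbp genome would give). Real callers and real data are messier than this.

```python
from math import comb

def detection_power(depth, allele_fraction, min_alt_reads):
    """Probability of seeing at least min_alt_reads variant reads at a site.

    Models the number of variant-supporting reads as Binomial(depth, allele_fraction).
    """
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction)**(depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Hypothetical scenario: a mosaic variant present in 10% of reads,
# with a caller that requires at least 5 supporting reads.
genome_power = detection_power(depth=40, allele_fraction=0.10, min_alt_reads=5)
exome_power = detection_power(depth=150, allele_fraction=0.10, min_alt_reads=5)
```

Under these assumptions the 150x exome detects the variant almost every time, while the 40x genome misses it more often than not.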

Post-Exome Strategies for Mendelian Disorders

August 28, 2023 by dkoboldt

Today I’m delivering the research genomics lecture at NCH’s Myology Training Course, an annual, week-long, in-person training program that covers numerous aspects of clinical, research, and laboratory topics relevant to the field. In a stroke of excellent timing, Monica H. Wojcik and colleagues from the Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium have just published a review on diagnostic testing beyond the exome. In other words, they review the currently available tests and diagnostic procedures that may elucidate a molecular diagnosis for a patient with a Mendelian disorder when exome sequencing (ES) has failed to do so.

A flowchart for choosing assays for a Mendelian patient after exome testing is negative — Post-exome strategies for Mendelian patients (Credit: Wojcik et al, AJHG 2023)

This is a subject I know something about, having spent more than a decade studying rare genetic diseases in a large genome center. A negative ES report is no longer the end of the road, as there are numerous other possible strategies to uncover the genetic basis of a rare disorder. This review covers them well, so I thought it would make a useful blog post.

First, it needs to be said that we are living in a golden era of genetic testing. Technological advances have enabled comprehensive assays for genomic interrogation – microarrays, exome sequencing, and genome sequencing – while improvements in bioinformatics and community-created resources like gnomAD and ClinVar have improved our ability to identify and classify disease-causing variants. Best practice guidelines from organizations like the American College of Medical Genetics have been modified to leverage the strengths of these new technologies. Under those guidelines, for patients with congenital anomalies, developmental delays, or intellectual disability, exome/genome sequencing should be the first- or second-line test. One of the reasons for this is the fact that an increasing number of genetic conditions are clinically similar but genetically heterogeneous. Another is the observation that ~5% of diagnosed patients have multiple genetic conditions.

When Comprehensive Genetic Testing Fails

Despite all of this progress, some 50-60% of individuals with a suspected Mendelian condition remain undiagnosed after comprehensive molecular testing. Why does testing fail to produce a diagnosis? It’s a question I spend a lot of time thinking about, discussing with colleagues, and posing to candidates during interviews. Generally, the reasons fall into two categories:

Category 1: The causal variant was detected, but:

The genetic basis of the disorder is not yet known. New disease genes are being discovered every day, but we have a long way to go.
The gene is associated with disease, but the patient represents a new phenotypic manifestation or severity, i.e. variable expressivity.
The variant is inherited from an unaffected parent, i.e. incomplete penetrance.
There is not enough evidence to call the variant pathogenic. Variant interpretation is challenging, especially in the scenario where information is limited concerning the pattern of disease-causing variants, the origin of the variant in the patient, or both.

Category 2: The causal variant was not detected, because:

The gene (or exon) is not interrogated by the sequencing assay, i.e. poorly captured for ES or poorly covered for GS.
The variant is difficult to detect by short-read sequencing, e.g. structural variants and trinucleotide repeat expansions.
The variant lies in a noncoding region (note, it might well be detected by GS, but could be challenging to interpret).
The variant is epigenetic, not genetic.
The disorder is not genetic but has an infectious or other acquired origin.

This is a partial list of reasons that the most comprehensive test available (currently exome sequencing in most situations) fails to make a diagnostic finding.

Post-Exome Testing Options

The central focus of “Beyond the Exome” is the set of options for further diagnostic testing, many of which fall under research. They include:

Exome Sequencing Reanalysis

A key advantage of ES as a genetic test is the ability to re-analyze data when new phenotypic information emerges and/or when some time (usually 2-3 years or more) has passed. The yield of ES reanalysis can vary widely, but a systematic review estimated the increased diagnostic yield at 15% and recommended that reanalysis is warranted 18 months after the initial test. Generally speaking, diagnoses made by ES reanalysis are the result of:

New gene discovery for Mendelian conditions, i.e. identification of a variant in a gene now associated with disease. Consistently the major contributor to new diagnoses.
Resolution of previously known variants of uncertain significance (VUS) as pathogenic.
Improvements in bioinformatics pipelines for variant calling and annotation.

As the authors of this review highlight, diagnoses found on exome reanalysis may also be in known disease genes not previously thought to explain the phenotype, where the clinical interpretation of a variant has changed due to novel data such as additional clinical information, new variant inheritance information, segregation data from other affected family members, newly published case reports, or an expansion of the phenotype associated with the gene. Clinical re-evaluation and clinician input are also essential in these scenarios.

Short-read Genome Sequencing

Genome sequencing (GS) in most contexts means “short-read” genome sequencing — paired-end sequencing of 150-bp to 250-bp reads from the ends of ~350-500 bp fragments by whole genome shotgun approaches. Illumina platforms continue to dominate this market. Compared to ES, GS has some key advantages:

More uniform coverage of genes and exons, including certain genes which are notoriously difficult to capture (especially immune genes) for exome sequencing.
Identification of copy number variants (CNVs) and structural variants (SVs), usually with better sensitivity and resolution than SNP microarrays.
Comprehensive interrogation of noncoding regions that may harbor pathogenic variants, such as introns, promoters, regulatory elements, etc.

It’s important to note that while GS is increasingly being offered on a clinical basis, it is fairly exome-centric in terms of variants reported. In other words, although millions of noncoding variants are identified by GS, our ability to interpret them remains limited. The incremental diagnostic yield of GS in exome-negative patients varies but is probably in the 5-15% range. As expected, some diagnoses made by GS involve types or sizes of SVs that are difficult to detect by other assays. Some are splice-region variants. Some are variants in poorly-captured genes. Yet a significant proportion of findings afforded by GS are made not because the detection was superior, but rather because GS was performed later and on a research basis. This can enable the identification of candidate variants and the exclusion of other genetic causes.

Long-read Targeted/Genome Sequencing

Single molecule long-read sequencing is commercially available on two platforms — Pacific Biosciences and Oxford Nanopore Technologies — and produces reads that are significantly longer than standard GS approaches: 10,000 to 15,000 bp on average, compared to 150 bp. On a per-base level, these platforms have a higher rate of sequencing error than sequencing-by-synthesis (Illumina) approaches, especially in certain sequence contexts (e.g. homopolymers). However, even with slightly diminished accuracy, long reads in this size range are extremely useful for resolving structural variants / complex rearrangements and for interrogating otherwise hard-to-sequence regions of the genome. We have used PacBio long-read sequencing to:

Identify causal variants in syndromic rare disease patients that were poorly covered / not detected by GS. Example: Polyalanine repeat expansions in HOXD13.
Resolve the genomic breakpoints of translocations, inversions, and other complex rearrangements.
Determine the phase of two variants in the same gene, e.g. somatic variants in PTEN in hemimegalencephaly patients.
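
The phasing use case in the last item is easy to sketch. Here is a minimal illustration, assuming you have already extracted, for each long read that spans both variant sites, which allele it carries at each site. The function name, the "ref"/"alt" encoding, and the `min_reads` threshold are all hypothetical choices for this example, not part of any particular tool.

```python
from collections import Counter


def phase_two_variants(read_alleles, min_reads=3):
    """Infer whether two variants are in cis or in trans.

    read_alleles: iterable of (allele_at_site1, allele_at_site2) tuples,
    one per long read spanning both sites, each allele "ref" or "alt".
    min_reads: assumed minimum number of supporting reads to make a call.
    """
    counts = Counter()
    for a1, a2 in read_alleles:
        if a1 == "alt" and a2 == "alt":
            counts["cis"] += 1      # both variants on the same molecule
        elif {a1, a2} == {"ref", "alt"}:
            counts["trans"] += 1    # the read carries only one of the two
        # ref/ref reads are treated as uninformative here, which is the
        # conservative choice for somatic/mosaic variants.

    cis, trans = counts["cis"], counts["trans"]
    if cis > trans and cis >= min_reads:
        return "cis"
    if trans > cis and trans >= min_reads:
        return "trans"
    return "undetermined"


# Invented read counts for illustration only.
example_reads = [("alt", "alt")] * 6 + [("ref", "ref")] * 5 + [("alt", "ref")]
example_phase = phase_two_variants(example_reads)
```

A simple majority vote tolerates the occasional sequencing error at one site. A real analysis would also weigh base quality and the expected variant allele fraction.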

Naturally, there are disadvantages to long-read sequencing compared to traditional ES and GS approaches. The first and most obvious disadvantage is the cost, which can be 3-4x higher than standard GS. The “DNA cost” is also high, as long read technologies require a large amount of high-molecular-weight DNA. Informatics pipelines are not as mature for long-read technologies, so the analysis cost is higher as well.

RNA Sequencing

Transcriptome profiling by RNA sequencing is, in my opinion, one of the most powerful research genomics tools for undiagnosed patients. RNA can be co-extracted from blood along with DNA, and RNA sequencing is relatively inexpensive. RNA-seq provides a lot of useful information, including:

Comprehensive gene expression measurements with higher precision than microarray testing.
Isoform expression, i.e. the expression level of each exon and splicing of adjacent exons.
Quantification of allele-specific expression, i.e. the balance of alleles of a variant in expressed transcripts.
Splicing patterns, including both canonical and disruptive splicing.
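
To make the allele-specific expression item concrete, here is a minimal sketch that takes RNA-seq read counts for the two alleles at a heterozygous site and asks whether they depart from the expected 50/50 balance, using an exact two-sided binomial test. The function name is hypothetical and this is not our pipeline. Real analyses also have to contend with reference mapping bias and overdispersion, which a plain binomial test ignores.

```python
from math import comb


def allelic_imbalance(ref_reads, alt_reads):
    """Return (alt_fraction, p_value) for a heterozygous site.

    The p-value is an exact two-sided binomial test against a null of
    equal expression from both alleles (p = 0.5).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    k = min(ref_reads, alt_reads)
    # Probability of seeing k or fewer reads from the less-expressed allele.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null is symmetric, so double the tail and cap at 1.
    p_value = min(1.0, 2 * tail)
    return alt_reads / n, p_value


# Invented counts: balanced expression versus complete loss of one allele.
balanced = allelic_imbalance(ref_reads=10, alt_reads=10)
monoallelic = allelic_imbalance(ref_reads=20, alt_reads=0)
```

Strong skew toward the reference allele at a heterozygous site is the kind of signal that can point to a variant subject to nonsense-mediated decay or to a regulatory variant on the other haplotype.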

RNA-Seq data is most useful when paired with genomic data. In our hands, it has been most useful in identifying “missing” variants in known disease genes (e.g. a second hit in a recessive gene in patients who have a single pathogenic variant by standard testing but otherwise fit the condition well). Many of those missing second hits are deep intronic variants with occult disruptions to canonical splicing, but we also see splice-disrupting variants in coding regions and intronic splice regions (outside canonical splice site, but close to the intron-exon junction). RNA-seq can also resolve VUS by showing the variant’s impact on mRNA transcripts. We have used it both to prove and disprove effects on splicing.

The main disadvantage to RNA-seq is that it is only informative if the gene is expressed in available tissue. Otherwise you get no reads, or too few reads to infer splicing patterns. Many genes are expressed in fibroblasts, which is why RNA-seq from blood can be useful even for disorders that affect other systems / developmental timepoints. However, we should be cautious about over-promising the number of genes expressed at high enough levels to analyze: in my experience, it’s only around 50%. RNA-seq has made the most gains in disorders for which disease-relevant tissue is available, e.g. muscle diseases (due to muscle biopsy). Many genes have tissue-specific expression and that tissue often is not available for research testing.

Optical Genome Mapping (OGM) and Epigenetic Methylation Profiling

Both of these are relatively new/emerging technologies with a lot of promise that have already begun to be implemented in some clinical areas. Optical Genome Mapping (OGM) is not sequencing per se, but high-resolution imaging of long labeled DNA molecules coupled with sophisticated informatics to map the physical structure of chromosomes. OGM is therefore very useful for identifying CNVs, SVs, and complex rearrangements with higher resolution (to ~500 bp) than standard of care approaches. Like long-read sequencing, it is costly in terms of reagents and input DNA requirements. OGM in fact requires a specialized sample prep, so you need access to fresh patient material (blood or fresh/frozen tissue) to do the library prep, and that library is only useful for OGM.

Epigenetic profiling, which in the current field usually refers to DNA methylation profiling by microarray, is another diagnostic tool available on clinical and/or research basis depending on the phenotype. Among Mendelian disorders, it has found the most success in diagnosing neurodevelopmental/ID disorders caused by mutations with altered global methylation profiles, i.e. mutations in methylation pathway genes and transcription factors. Methylation profiles for the test patient are generated and clustered alongside reference cohorts of individuals with known diagnoses, assigning the patient a cluster and a confidence score. Doing this type of analysis thus requires access to a large reference cohort of profiles from the same tissue type from many patients with known disorders. If diagnostic, it does not provide sequence-level information, i.e. the mutation responsible for the aberrant methylation pattern. But it can tell you where to look, and in some cases can resolve VUS in genes with abnormal methylation profiles.
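
The "cluster against reference cohorts" idea can be illustrated with a toy nearest-centroid classifier. To be clear, this is a stand-in I chose for simplicity, not the algorithm behind any clinical methylation test. The function names, the tiny three-probe profiles, and the confidence formula are all assumptions made for the example.

```python
from math import dist
from statistics import fmean


def centroid(profiles):
    """Average methylation (beta) value per probe across a cohort."""
    return [fmean(column) for column in zip(*profiles)]


def classify_profile(patient, reference_cohorts):
    """Assign a patient profile to the nearest reference cohort.

    reference_cohorts: dict mapping a label to a list of profiles, all
    measured on the same probes (and ideally the same tissue type).
    Needs at least two cohorts. Returns (label, confidence), where the
    confidence is a hypothetical score in [0.5, 1] that approaches 1 as
    the nearest centroid becomes much closer than the runner-up.
    """
    ranked = sorted(
        (dist(patient, centroid(profiles)), label)
        for label, profiles in reference_cohorts.items()
    )
    (d_best, best_label), (d_next, _) = ranked[0], ranked[1]
    total = d_best + d_next
    confidence = 1 - d_best / total if total > 0 else 0.5
    return best_label, confidence


# Invented beta values for three probes; real arrays have hundreds of thousands.
reference = {
    "control": [[0.1, 0.2, 0.8], [0.2, 0.2, 0.9]],
    "disorder_A": [[0.8, 0.7, 0.2], [0.9, 0.8, 0.1]],
}
label, confidence = classify_profile([0.8, 0.8, 0.2], reference)
```

Even in this toy version, the dependence on the reference cohorts is obvious: a patient can only be assigned to a disorder that is already represented.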

Summary

In summary, a nondiagnostic ES report is no longer the end of the road for patients with Mendelian disorders. A growing number of other assays, many of which are only available on a research basis, can provide answers to a considerable proportion of patients and families.

Rare Disease Research in 2023

March 3, 2023 by dkoboldt

This week we celebrate Rare Disease Day, which aims to raise awareness for rare diseases and improve the medical and research support for individuals with rare diseases and their families. I’ve worked in the rare disease research space for several years, and it does feel like the importance of these conditions is increasingly recognized. Earlier this week, U.S. Senators Sherrod Brown (D-OH) and John Barrasso (R-WY) announced the passage of their Senate resolution designating February 28, 2023 as “Rare Disease Day” in the US. When the two major political parties in our country agree on something, it’s kind of a big deal.

Our institution, like many others, hosted an event for patients, families, clinicians, and researchers who are invested in rare disease. It was a special occasion this year as we returned to in-person and celebrated the recognition of Nationwide Children’s Hospital as a Rare Disease Center of Excellence by the National Organization for Rare Disorders (NORD). The keynote speaker was Dr. Jerry Mendell, a pioneer in gene therapy, who shared some of the challenges and successes of developing viral gene therapies for devastating diseases like spinal muscular atrophy and Duchenne Muscular Dystrophy. Watching videos of his patients before and after therapy… well, it’s inspiring to say the least.

The State of Rare Disease Research in 2023

The first step to treating a patient with a rare disease is to obtain a molecular diagnosis. Because the vast majority of rare diseases have genetic origins, this is an area that has seen dramatic advances in the era of high-throughput sequencing. I listen to many scientific talks about disease genetics and gene discovery. Many of them report around 8,000 gene-disease associations currently in the OMIM database. This week I heard that modern counts are higher, something closer to 12,000. Anecdotally, it’s obvious to those of us who work in gene discovery that at least 2-3 new disease-gene relationships are published in peer-reviewed journals per week.

For many clinical and research laboratories, the OMIM Database is the primary authority on human gene-disease associations. It’s a curated resource that is free to the public and maintained by Johns Hopkins. A few months ago, I downloaded the entire database of OMIM associations and did some rudimentary analyses. At that time, OMIM contained:

18,077 recognized human genes
7,333 phenotypes mapped to a gene
6,473 of which are genetic disorders (versus susceptibility and quantitative trait associations)
5,955 of which are Mendelian disorders with an established pattern of inheritance
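
For a sense of proportion, the counts above can be turned into percentages. This small sketch assumes the last three figures are nested subsets (each "of which" refers to the line before it), which is how I read the list. The variable names are just for illustration.

```python
# Counts as listed above.
phenotypes_mapped = 7333
genetic_disorders = 6473
mendelian_disorders = 5955


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Share of gene-mapped phenotypes that are genetic disorders.
pct_genetic = percent(genetic_disorders, phenotypes_mapped)
# Share of genetic disorders with an established Mendelian inheritance pattern.
pct_mendelian = percent(mendelian_disorders, genetic_disorders)

print(f"{pct_genetic}% of mapped phenotypes are genetic disorders")
print(f"{pct_mendelian}% of those have an established inheritance pattern")
```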

OMIM disease inheritance pie chart — Major inheritance patterns of OMIM disease genes (Nov 22)

Inheritance of OMIM Disease Genes

As you can see from the pie chart, autosomal recessive disorders are the most prevalent category (53%), followed by autosomal dominant (35%). I imagine the gap between those two categories has been shrinking, as the majority of newly discovered genes in this era are associated with de novo mutations and thus classified as autosomal dominant. A small proportion of disorders are described as both AD and AR. Sometimes that is true. Other times, most patients in the literature are biallelic (AR), but there’s some [usually 10+ years old] published study of a patient with disease who has only one heterozygous variant in the gene. In my experience, if that patient were studied with modern approaches — whole-genome sequencing and transcriptome analysis — there’s a good chance a second variant would be identified.

X-linked inheritance patterns were lumped together for the pie chart, but most are X-linked recessive (n=216), as opposed to X-linked dominant (n=62) or unspecified X-linked inheritance (n=69). With the latter category, if you go and read about those disorders, many have only been reported in affected males and thus are likely to be X-linked recessive.

Somatic inheritance is not really a Mendelian pattern, but I included it as the 5th most common category of OMIM disease genes with >200 curated. Most of these are cancers or cancer syndromes associated with well-known genes like APC, KRAS, and PIK3CA, but there are a handful of somatic-mosaic disorders recognized. This is another category likely to grow substantially as our ability to detect disease-causing variation in relevant tissues and cell types improves.

Other inheritance patterns are recognized by OMIM but are too rare to be shown in a pie chart. These include things like digenic inheritance (n=16) and pseudo-autosomal inheritance (n=4). There are even a handful of Y-linked disorders. Frankly, I’m dubious of most associations in these categories, but once something is established in OMIM, it’s hard to go back.

Curation versus Gene Discovery

One of the major reasons OMIM is so popular is that it’s curated and updated with biomedical literature. In the genomics field, we have come to love curated resources because, generally speaking, the quality is much higher. Yet in the case of OMIM, it is clear that curation lags far behind the pace of discovery. This is unsurprising. New disease genes and expanded phenotypes are being published daily across a wide array of peer-reviewed journals. It’s hard for anyone to keep up.

Yet for patients with undiagnosed rare diseases, recent discoveries are critical. Multiple studies of the reanalysis of previously-negative exome tests have demonstrated that once 2-3 years have passed, a significant proportion (10-20%) yield diagnostic results when reanalyzed. Most of the time, the new diagnosis comes not from pipeline improvements or sophisticated analytical tools, but from the fact that the diagnostic finding is in a newly discovered disease gene, or represents a new phenotypic manifestation of a known disease gene.

Given the value of up-to-date information, how do we strike the balance? One possibility is to leverage crowdsourcing of experts in the field. This is what the Clinical Genome Resource (ClinGen) is doing with its expert panels, which interact with various working groups. I’m part of the Variant Curation Expert Panel (VCEP) for RPE65, a gene linked to autosomal recessive Leber Congenital Amaurosis (progressive blindness). Our duty is to help establish extremely rigorous guidelines for the interpretation of variants in this gene. This goes a level beyond ACMG guidelines, which are fairly broad, and applies some gene/disease-specific rules for how each type of evidence should be evaluated. The VCEP is complemented by a GCEP (Gene Curation Expert Panel) which reviews and establishes gene-disease associations.

Everyone on an expert panel is a volunteer, and each panel has members with complementary expertise. Since we all have a personal stake in diagnostics for this gene (or condition), we’re all invested in doing the work and reaching a useful consensus when necessary. I just took a look and wow, there are many GCEPs and VCEPs already established. That seems encouraging.

Rare Disease Genomics in 2023

There is plenty of work to do, of course. Many groups including ours will continue sequencing undiagnosed patients and connecting with other investigators via Matchmaking to help establish new disease genes. Trend-wise, I expect at least two new paradigms will emerge:

Leveraging large-scale biobanks with genomic data to validate new disease genes and identify additional patients to better define phenotypes.
Applying ever-more-sophisticated functional assays, including organoids, gene-edited model organisms, and other approaches, to support and explore the mechanisms of disease.

It’s going to be a busy year, but a good one, for Rare Diseases.
