The tools and resources we have for human genomic analysis continue to grow in scale and quality. Computational tools like REVEL and SpliceAI leverage machine learning to provide increasingly accurate predictions of the effects of variants. Public databases of sequence variation like the newly expanded gnomAD tell us how common they are in populations. Community-supported resources like ClinVar continue to curate disease-gene associations and interpretations of those variants.
It follows that genetic testing should continue to improve, especially in the setting of rare disorders. Ten years ago, some of the earliest exome sequencing studies of Mendelian disorders showed that with a fairly straightforward filtering approach, it was possible to winnow the set of coding variants identified in a patient (usually in the tens of thousands) to just a handful of compelling candidate variants. And that was ten years ago.
Predictive Genomics for Rare Disorders
Recently, I began to wonder if we are approaching a GATTACA-like future of predictive genomics, at least for rare genetic conditions. If you obtained the genome sequence of a family trio and all you knew was that the proband has a genetic disorder, would it be possible to identify the most likely causal variant(s) and thus predict the rare disorder the patient has? The answer to this question is probably obvious from the theme of this post, but let’s consider it as a thought experiment. So you have trio WGS data from a family and all you know is that the proband is affected. Maybe it’s a severely ill baby in the NICU, or an as-yet-undiagnosed patient coming to Genetics clinic. In this scenario, you might:
- Run the trio WGS data through your existing pipeline to identify genomic variants (SNVs, indels, and CNVs).
- Annotate all variants with population frequency, in silico predictions, gene / disease associations, ClinVar status, etc.
- Identify variants that fit a Mendelian inheritance model (de novo, recessive, or X-linked).
- Remove variants that are too common in populations to cause the disease associated with their gene.
- Apply automated variant interpretation to determine which variants reach pathogenicity.
- Retain pathogenic variants in disease-associated genes that fit the inheritance for those genes.
With these fairly intuitive steps, you’ll likely get a rather short list of candidates and it would be straightforward to rank them so that the most probable diagnostic findings are at the top. This process would be very amenable to automation, so it could be done at speed and scale. Will that become the new paradigm for genetic testing in rare disorders?
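To make the thought experiment concrete, here is a minimal Python sketch of the filtering and ranking steps, assuming the variants have already been called and annotated. Everything in it is hypothetical: the `Variant` fields, the `MAX_AF` frequency ceilings, the `COMPATIBLE` table, and the placeholder gene names are illustrations, not values from any real pipeline. It also keeps "likely pathogenic" calls alongside "pathogenic" ones, which is an assumption on my part.

```python
from dataclasses import dataclass

# Hypothetical allele-frequency ceilings per gene inheritance mode. Real cutoffs
# depend on disease prevalence and penetrance; these numbers are placeholders.
MAX_AF = {"AD": 0.0001, "AR": 0.01, "XL": 0.001}

# Assumed mapping: which trio segregation patterns fit each gene inheritance mode.
COMPATIBLE = {"AD": {"de_novo"}, "AR": {"recessive"}, "XL": {"x_linked", "de_novo"}}

# Assumption: keep likely pathogenic calls as well as pathogenic ones.
REPORTABLE = ("pathogenic", "likely_pathogenic")


@dataclass
class Variant:
    gene: str
    segregation: str     # pattern seen in the trio: "de_novo", "recessive", "x_linked"
    pop_af: float        # population allele frequency from the annotation step
    classification: str  # result of automated variant interpretation


def candidates(variants, gene_modes):
    """Filter and rank variants; gene_modes maps disease gene -> "AD", "AR" or "XL"."""
    kept = []
    for v in variants:
        mode = gene_modes.get(v.gene)
        if mode is None:
            continue  # not a known disease gene
        if v.segregation not in COMPATIBLE[mode]:
            continue  # does not fit the inheritance for this gene
        if v.pop_af > MAX_AF[mode]:
            continue  # too common to cause the associated disease
        if v.classification not in REPORTABLE:
            continue  # does not reach pathogenicity
        kept.append(v)
    # Pathogenic before likely pathogenic, then rarer variants first.
    return sorted(kept, key=lambda v: (v.classification != "pathogenic", v.pop_af))


# Placeholder genes and annotations, for illustration only.
gene_modes = {"GENE_A": "AD", "GENE_B": "AR", "GENE_C": "AD", "GENE_D": "AR"}
variants = [
    Variant("GENE_D", "recessive", 0.001, "likely_pathogenic"),
    Variant("GENE_B", "recessive", 0.2, "pathogenic"),   # too common
    Variant("GENE_C", "de_novo", 0.0, "uncertain"),      # not pathogenic
    Variant("GENE_E", "de_novo", 0.0, "pathogenic"),     # not a disease gene
    Variant("GENE_A", "de_novo", 0.0, "pathogenic"),
]
shortlist = [v.gene for v in candidates(variants, gene_modes)]
```

Note that nothing in this sketch looks at the patient: every decision is made from the genome and the annotations alone.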
The Missing Component of Genome-Driven Analysis
The predictive genetics approach described above has a rational basis but does not account for some crucial information: the clinical phenotype and family history of the patient being tested. Clinical correlation — the overlap between patient symptoms and disease features — has an outsized influence on whether or not a result can be considered diagnostic. In our work, which is research, we encounter (at a surprising frequency) genetic variants that:
- Are in a known disease gene
- Segregate with the inheritance mode associated with that gene
- Reach pathogenicity under ACMG guidelines, but
- Are associated with a disease that is not clinically apparent in the proband.
In a world where the majority of tests are non-diagnostic and variants of uncertain significance (VUS) are increasingly prevalent, it is hard to ignore these variants. Naturally, we go back to the clinicians and/or medical records to verify that the patient does not have the disease. If there’s no clinical correlation, these are not considered diagnostic findings. No matter how compelling the variants are.
The Power of Phenotype
Admittedly, my perspective is biased: I work on translational research studies that primarily enroll undiagnosed patients. Often they have already undergone extensive genetic/molecular testing as part of their standard of care. When a clinician orders such testing, they provide patient clinical information. On the laboratory side, especially for comprehensive tests like exome/genome sequencing, patient clinical features are critical. The order forms collect extensive details about patient symptoms, which are converted into standardized disease terms (e.g. HPO terms) and used to identify/prioritize variants for interpretation.
Most rare diseases have genetic origins, and many of the genes responsible give rise to highly specific patterns of patient symptoms. Individually, a single patient symptom may not have significant diagnostic value, but the collective picture of patient clinical features can be very powerful. Especially when some of those features are specific and/or unusual. Even a rudimentary system that ranks a patient’s genetic variants based on clinical feature overlap (the number of features shared between the patient and the disease) helps put the most plausible genetic findings at the top.
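A rudimentary overlap ranking of this kind might look like the following Python sketch. It simply counts the features shared between the patient and each candidate disorder. The disorder names and feature lists are made up for illustration, and the plain-text labels stand in for standardized terms such as HPO identifiers. Real tools typically weight features by specificity, which this sketch does not attempt.

```python
def rank_by_overlap(patient_features, disease_features):
    """Rank disorders by the number of features shared with the patient.

    disease_features maps a disorder name to its set of clinical features.
    Returns (shared_count, disorder) pairs, highest overlap first; ties are
    broken alphabetically by disorder name.
    """
    patient = set(patient_features)
    scored = [
        (len(patient & set(features)), name)
        for name, features in disease_features.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical patient and disorders, for illustration only.
patient = {"global developmental delay", "seizures", "microcephaly"}
disorders = {
    "Disorder A": {"global developmental delay", "seizures", "microcephaly", "hearing loss"},
    "Disorder B": {"global developmental delay", "tall stature"},
    "Disorder C": {"cardiomyopathy"},
}
ranked = rank_by_overlap(patient, disorders)
```

Here the disorder sharing three features with the patient ranks first, and the one sharing only the nonspecific global delay falls well below it.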
Good clinical phenotyping also provides a powerful tool to exclude candidate findings. This is useful because some medical conditions that warrant genetic testing are associated with a wide range of disorders. In the pediatric setting, for example, global developmental delay is associated with thousands of genetic disorders and thus casts a very wide net. However, for many such disorders, global delays occur alongside a number of other distinctive clinical features. If these are not present in the proband, they can often be ruled out. This reduces the search space and interpretation burden for the laboratory.
Limitations of Phenotype-Driven Analysis
Despite these advantages, a phenotype-centric approach to genetic testing has some important limitations.
- Variable expressivity. Many genetic disorders have clinically significant features that can vary from one patient to the next, even within families.
- Phenocopies. I love this word, which refers to disorders that resemble one another clinically but have different underlying causes.
- Pleiotropy. On the other hand, some genes give rise to multiple disorders which can be clinically very distinct.
- Phenotype expansion. For many genetic disorders, our understanding of the full phenotypic spectrum changes over time. This is especially true for new/emerging rare disorders for which the clinical description is based on a small number of patients.
- Patient evolution. For many patients, the clinical picture changes over time. In the pediatric setting this is a major consideration, as lots of key diagnostic features take time to manifest or be clinically apparent.
- Blended phenotypes. At least 5-10% of patients suspected of having a monogenic disorder have multiple genetic conditions, and their presentation can thus be a confounding combination of the associated features.
The OMIM Curation Bottleneck
The Online Mendelian Inheritance in Man (OMIM) database is one of the most vital resources in human genetics. For many/most laboratories, OMIM is the primary and definitive source for the genes, inheritance patterns, and clinical manifestations associated with genetic disorders. The information in OMIM is curated from the peer-reviewed biomedical literature by trained experts at Johns Hopkins University. This manual curation is why the resource is so widely trusted by the community. However, it’s a double-edged sword because curation takes expertise, time, and funding. The latter two have been a challenge for OMIM, especially since the pace of genetic discovery has accelerated in the past decade. Simply put, there’s way too much literature for OMIM to curate it all.
This bottleneck has real consequences. We look to OMIM as our trusted source of information about disease genes, but that information is increasingly outdated or incomplete. Given the powerful influence that clinical correlation has over genetic testing results… well, it’s a problem. And not one that the OMIM curators will be able to solve on their own. The good news is that there are more sustainable efforts under way. ClinGen, for example, is both standardizing the way information is collected/curated and leveraging expert volunteers (i.e. crowdsourcing) from the community to manage the workload. We still have a long way to go because ClinGen is a relatively new endeavor. However, it’s a more sustainable model that we should continue to support with funding and volunteerism.
In other words, if you’re not part of a ClinGen working group or panel, please think about joining one.