The vast majority of the human genome (98%) does not encode proteins. Most of the ~4-6 million sequence variants in any individual’s genome thus lie within this ‘noncoding’ portion. The deep, dark secret of current genetic testing is that it primarily interrogates and reports the relatively tiny fraction of variants (~50,000 or so per individual) that are in or near protein-coding regions. Although there are several scientifically valid reasons for this, the main one is that the structure of genes and the triplet protein code make it possible to predict the effect of a DNA sequence change with reasonable accuracy. Yet, we know that noncoding variants are one of the reasons that >50% of tested patients fail to receive a diagnosis from comprehensive genetic testing.
After all, many types of regulatory elements in or near genes affect their protein production.
It should be noted that even as clinical genome sequencing replaces exome sequencing, the scope of reported variants remains focused on coding regions. The widely-used 2015 ACMG Variant Interpretation Guidelines were developed around protein-altering variants. The same holds true for the updated guidelines, which are already in widespread use by groups such as ClinGen despite not being fully published at the time of writing. Interpretations of noncoding variants are woefully under-represented in databases such as ClinVar. All of this needs to change.
I volunteer for a ClinGen Variant Curation Expert Panel (VCEP) which develops the gene-specific variant interpretation protocols for sets of genes and applies them to the ClinVar-submitted variants for those genes. My VCEP is for Leber Congenital Amaurosis / early-onset retinal disease, which are among the major causes of inherited blindness. Our first curated gene, RPE65, is primarily associated with recessive disease that now has a gene therapy treatment (Luxturna). Interestingly, we have incorporated “response to gene therapy” into our interpretation protocols, but that’s a story for another time. The development of interest here is that at least a couple of patients who received and responded to Luxturna are compound-heterozygous for a coding variant and a noncoding variant in RPE65. Thus, we have begun discussing how to adapt our rigorous, coding-centric variant interpretation protocol for noncoding variants.
The good news is that an expert panel has drafted, tested, and published recommendations for clinical interpretation of noncoding variants.
Their study, published toward the end of 2022 in Genome Medicine, provides guiding principles and numerous specific adaptations of ACMG evidence codes for noncoding variants. Here, I’ll review and summarize some of the key findings.
Variant Interpretation Requires A Gene-Disease Relationship
First, as noted by the authors, to avoid a substantial burden of analysis and reporting, clinical interpretation should only be applied to variants that:
- Map to cis-regulatory elements that have well-established, functionally-validated links to target genes, and
- Affect genes that have documented association with a phenotype (disease) relevant for the patient harboring the variant
In other words, as is true with coding variants, clinical variant interpretation should only be applied to variants in genes that have an established gene-disease association. The authors specifically suggest “definitive, strong, or moderate level using the ClinGen classification approach or green for the phenotype of interest in PanelApp” though in practice, many labs still rely on OMIM as the arbiter of gene-disease relationships.
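The two eligibility criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions: the class names, field names, and the lookup table are hypothetical and not taken from the paper, and a real pipeline would pull gene-disease validity from ClinGen or PanelApp rather than a hand-built dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Gene-disease validity levels the authors suggest accepting (ClinGen scale).
ACCEPTED_CLINGEN_LEVELS = {"definitive", "strong", "moderate"}


@dataclass
class NoncodingVariant:
    """Hypothetical minimal record for a noncoding variant."""
    variant_id: str
    target_gene: Optional[str]   # gene linked to the CRE, None if unknown
    cre_link_validated: bool     # functionally validated CRE-to-gene link?


@dataclass
class GeneDiseaseRecord:
    """Hypothetical summary of what is known about a gene."""
    clingen_level: Optional[str] = None     # e.g. "definitive", "limited"
    panelapp_rating: Optional[str] = None   # e.g. "green", "amber", "red"
    phenotype_relevant: bool = False        # matches the patient's phenotype?


def is_interpretable(variant: NoncodingVariant,
                     gene_disease: Dict[str, GeneDiseaseRecord]) -> bool:
    """Return True only if both criteria from the text are met."""
    # Criterion 1: the variant maps to a CRE with a validated target gene.
    if not variant.cre_link_validated or variant.target_gene is None:
        return False
    # Criterion 2: the gene has an established, patient-relevant association.
    record = gene_disease.get(variant.target_gene)
    if record is None or not record.phenotype_relevant:
        return False
    clingen_ok = (record.clingen_level or "").lower() in ACCEPTED_CLINGEN_LEVELS
    panelapp_ok = (record.panelapp_rating or "").lower() == "green"
    return clingen_ok or panelapp_ok
```

A lab that relies on OMIM instead would swap the last three lines for its own check; the overall structure (validated regulatory link first, gene-disease relationship second) stays the same.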
Defining Cis-Regulatory Elements for Genes
The expert panel defined several types of noncoding regions in which variants can be interpreted, including:
- Introns and UTRs as defined by well-validated transcripts, e.g. the canonical transcripts designated by the MANE database.
- Promoters, defined as the region of open chromatin surrounding the canonical transcription start site (TSS) as delineated by functional epigenetic data (e.g. ATAC-seq, DNase-seq) in a disease-relevant tissue or cell type. If such data are not available for a relevant tissue or cell type, consensus open chromatin regions from ENCODE can be used. If even that is not possible, one can use the TSS +/- 250 bp.
- Other cis-regulatory elements (e.g. transcription factor binding sites from ChIP-seq or H3K27ac/H3K4me1-marked active enhancers) provided that they have experimental evidence linking them to the gene of interest.
The most obvious evidence of a link between a CRE and a target gene would be something like Hi-C, which captures regions that physically interact with gene promoters. However, the authors would also accept regions whose functional perturbation has a demonstrated effect on expression, or which harbor one or more established expression quantitative trait loci (eQTLs) for the target gene.
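The promoter fallback order and the accepted kinds of CRE-to-gene evidence described above can be sketched as follows. The function names, the evidence labels, and the coordinate handling are my own assumptions for illustration; in particular, requiring that a peak actually contain the TSS before using it is an assumption, not something the authors spell out.

```python
from typing import Optional, Set, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), hypothetical format

# Evidence types the text lists as acceptable CRE-to-gene links (labels are mine).
ACCEPTED_LINK_EVIDENCE = {"hi-c", "functional_perturbation", "eqtl"}


def promoter_interval(chrom: str,
                      tss: int,
                      tissue_peak: Optional[Interval] = None,
                      encode_peak: Optional[Interval] = None,
                      flank: int = 250) -> Tuple[Interval, str]:
    """Pick a promoter region using the fallback order from the text.

    1. Open chromatin from a disease-relevant tissue or cell type.
    2. ENCODE consensus open chromatin.
    3. A fixed window around the TSS (+/- 250 bp by default).
    """
    candidates = (
        (tissue_peak, "tissue-specific open chromatin"),
        (encode_peak, "ENCODE consensus open chromatin"),
    )
    for peak, source in candidates:
        # Assumption: only use a peak that actually contains the TSS.
        if peak is not None and peak[0] == chrom and peak[1] <= tss <= peak[2]:
            return peak, source
    window = (chrom, max(1, tss - flank), tss + flank)
    return window, f"TSS +/- {flank} bp"


def has_target_gene_link(evidence: Set[str]) -> bool:
    """True if at least one accepted evidence type links the CRE to the gene."""
    return bool({e.lower() for e in evidence} & ACCEPTED_LINK_EVIDENCE)
```

Returning the source label alongside the interval keeps a record of how each promoter was defined, which seems worth reporting given how much the quality of the three options differs.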
Naturally, most of these strategies for CRE definition require access to rich epigenetic data, which not everyone currently enjoys. The authors recognize this and express their hope that the research community will make such datasets publicly available.
ACMG Evidence for Noncoding Variants
The authors next lay out some specific ACMG evidence codes, along with guidance on whether and how they can be adapted to apply to noncoding variants.
Evidence Codes That Already Apply
Of course, as noted by the authors, some types of evidence are agnostic to variant location and can be applied as-is, including:
- Allele frequency information (evidence codes PM2, BS1, BS2, and BA1)
- De novo variant evidence (codes PS2 and PM6)
- Co-segregation evidence (codes PP1 and BS4)
These types of evidence are the “gimmes” for most variant interpretation, but I should point out that under the currently modified guidance, most of them are either benign evidence or weak-to-moderate evidence of pathogenicity (e.g. PM2 should now only be applied at supporting level). So these do help, but only a little. The exception to that is a de novo variant, which in many cases could qualify for PS2 (possibly downgraded). De novo mutations, unlike inherited variants, are exceptionally rare — on average, 60-70 per individual genome-wide compared to 4-6 million — and I think that when tackling noncoding variants, these are a good place to start.
Adaptation of Evidence Codes for Noncoding Variants
Here are the main ACMG evidence codes discussed, their original definitions under 2015 guidelines, and the expert recommendation on if/when they can be applied to noncoding variants:
Evidence_Code | Original_Definition | Adapted_Definition |
PVS1 | Predicted null variant (nonsense, frameshift, start loss, essential splice site) in a gene where loss-of-function is a known mechanism of disease | (You wish). The authors state that PVS1 should not be used for noncoding variants. |
PM1 | Located in a mutational hotspot and/or well-established functional domain without benign variation | Disrupts a TF binding motif whose disruption is shown to be pathogenic, or maps to a cluster of pathogenic variants within a well-defined CRE |
PS1 | Same amino acid change as a well-established pathogenic variant, but caused by a different nucleotide change | At supporting level, for splicing variants where a different substitution has been classified as LP, provided this one has a similar predicted effect |
PM5 | Disrupts same residue as a well-established pathogenic variant, but changes it to a different amino acid | Predicted to have the same impact on the same gene as established pathogenic variants but not at the same base. |
PM3 | For recessive disorders, detected in trans with a known pathogenic variant | Same usage, provided it’s rare enough, but at supporting level for genes where this is more likely to occur by chance (large size / high mutation rate). |
PS3/BS3 | Well-established functional studies show deleterious effect (PS3) or demonstrate no effect (BS3) | RNA-seq: PS3 if aberrant splicing isoforms are detected, BS3 if not detected, in both cases provided that the expression profile is similar to controls and sufficient depth is achieved. Reporter gene assays: PS3 if significant differences are observed between variant and wild-type constructs, provided that the system is validated. MAVE assays should follow existing guidance (PMID: 31862013). |
PP3/BP4 | Computational tools predict a deleterious effect (PP3) or a benign impact (BP4), often based on REVEL score for missense variants. | Similar usage but computational tools designed for noncoding variants should be used. Examples: CADD, DANN, ReMM (Genomiser), FATHMM_MKL, GREEN-DB |
The authors offer numerous caveats and cautions for many of the above codes for the obvious reason that the study of regulatory regions (and the impact of variants therein) is a rapidly evolving field of research. Also, I find some of this guidance a bit confusing. For example, the authors note that PS1 should be downgraded to supporting level for a different nucleotide change at the same position as an established variant, but they do not mention downgrading PM5, which can be applied for variants at different positions expected to have the same regulatory impact. Of course, some areas of guidance are intentionally left vague, such as the choice of computational tools/thresholds to use for PP3/BP4.
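To make the PS1 versus PM5 distinction concrete, here is a minimal sketch of how I read the adapted definitions in the table. The function name and its boolean inputs are hypothetical, and the decision to return nothing for a same-position variant that does not affect splicing is my assumption, since the table only covers the splicing case.

```python
from typing import Optional, Tuple


def ps1_or_pm5(same_base_as_known: bool,
               affects_splicing: bool,
               same_predicted_impact: bool) -> Optional[Tuple[str, str]]:
    """Return (evidence_code, strength) for a noncoding variant, or None.

    same_base_as_known: a different substitution at this position has been
        classified as (likely) pathogenic.
    affects_splicing: the variant is a splicing variant.
    same_predicted_impact: the predicted effect matches the known variant(s).
    """
    if not same_predicted_impact:
        # Both adapted codes require a similar predicted effect.
        return None
    if same_base_as_known:
        if affects_splicing:
            # Adapted PS1: same position, different substitution, supporting level.
            return ("PS1", "supporting")
        # Assumption: the table gives no adapted PS1 outside splicing variants.
        return None
    # Adapted PM5: same impact on the same gene, but at a different base.
    # "moderate" is the default 2015 strength; no downgrade is mentioned.
    return ("PM5", "moderate")
```

Written out this way, the asymmetry is easy to see: a new substitution at a known pathogenic position earns only supporting evidence, while a variant at a different position with the same predicted impact keeps moderate strength.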
Noncoding Variants: The Time Is Now
We know noncoding variants are important, and these guidelines provide a reasonable starting point for selecting ones to report and interpreting them with clinical rigor. It’s time to get to work. In my next post, I highlight how we can narrow the search space for noncoding variants with inferences from an analysis of undiagnosed rare disease patients in Genomics England (GEL).