Rare Coding Variants Illuminate New Immune Pathways in Multiple Sclerosis: What Exome-Scale Genetics Adds Beyond GWAS
Multiple sclerosis (MS) is a prototypical complex autoimmune disease of the central nervous system. Hundreds of common variants contribute to risk, each with a small effect, and together they explain a meaningful but incomplete fraction of genetic liability. Large genome-wide association studies (GWAS) have identified more than 230 common-variant signals. Consortium-scale analyses estimate that all common variants across the autosomal genome account for roughly one-fifth of the heritability of MS risk, which leaves a large remainder unexplained by standard GWAS designs. The core question this paper tackles is whether some of that remainder lives in low-frequency and rare coding variation: alleles that are hard to tag through linkage disequilibrium (LD) and therefore tend to evade common-variant GWAS.
Why rare coding variants are a different kind of signal than GWAS hits
GWAS is excellent at finding common alleles (say, those with a minor allele frequency, or MAF, above 5%) because those variants are well represented on genotyping arrays or can be imputed accurately through LD. Low-frequency and rare alleles behave differently. They usually show weak LD with nearby common variants, so they are poorly imputed and effectively “invisible” to GWAS even when they have moderate biological effects. Coding variants are especially appealing because they are more interpretable and experimentally tractable than most non-coding signals. An amino acid substitution (or a stop-gain) in a protein is a concrete hypothesis you can test in cells, organoids, or animal models. This paper leans into that logic: if rare coding alleles carry risk, they can point to disease genes that common-variant studies might never nominate.
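One way to see why rare alleles evade LD-based tagging is the ceiling that allele frequencies place on r² between two variants. The sketch below uses a standard population-genetics identity, not a method from the paper. The function name `max_r2` and the example frequencies are illustrative choices, and both inputs are assumed to be minor allele frequencies (at most 0.5).

```python
def max_r2(p_rare: float, p_common: float) -> float:
    """Upper bound on r^2 between two biallelic variants, given their minor
    allele frequencies (both assumed to lie in (0, 0.5]).

    With p_lo <= p_hi, the bound is reached when every copy of the rarer
    allele sits on a haplotype that also carries the other minor allele:
    D_max = p_lo * (1 - p_hi), so
    r^2_max = p_lo * (1 - p_hi) / (p_hi * (1 - p_lo)).
    """
    if not (0.0 < p_rare <= 0.5 and 0.0 < p_common <= 0.5):
        raise ValueError("minor allele frequencies must be in (0, 0.5]")
    lo, hi = sorted((p_rare, p_common))
    return lo * (1.0 - hi) / (hi * (1.0 - lo))


if __name__ == "__main__":
    # Illustrative frequencies only (not taken from the study).
    for p in (0.0001, 0.001, 0.01, 0.05):
        print(f"MAF {p:<7} vs common MAF 0.30: r2_max = {max_r2(p, 0.30):.4f}")
```

For a 1% allele paired with a 30% allele, the bound is about 0.024. No common tag SNP can stand in well for such an allele, however close the two variants sit.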
Study design: exome-array genotyping at international scale
The International Multiple Sclerosis Genetics Consortium assembled a very large case-control dataset (32,367 MS cases and 36,012 controls) from Australia, 10 European countries, and multiple U.S. cohorts. It genotyped low-frequency coding variation using the Illumina HumanExome BeadChip (or a custom array incorporating exome-chip content). After stringent QC, the authors meta-analyzed 120,991 low-frequency coding variants across autosomal exons, including more than 104,000 rare non-synonymous and more than 2,000 rare nonsense variants. The exome array is not whole-exome sequencing, but it is a cost-efficient compromise. The paper notes that it captures the majority of low-frequency and rare coding variants present in large European reference datasets across the 0.0001–0.05 MAF range, while missing much of the ultra-rare tail.
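To make the 0.0001–0.05 MAF range concrete, the sketch below converts frequencies into expected allele copies and carriers in a pooled sample the size of this one. It assumes Hardy-Weinberg proportions. The frequency cut-offs follow the classes used in this post, and assigning exactly 5% to "common" is a choice made here. The helper names are hypothetical.

```python
# Frequency classes as used in this post: common (MAF >= 5% here),
# intermediate (1% <= MAF < 5%), rare (MAF < 1%).
def maf_class(maf: float) -> str:
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must be in [0, 0.5]")
    if maf >= 0.05:
        return "common"
    if maf >= 0.01:
        return "intermediate"
    return "rare"


def expected_minor_alleles(n_individuals: int, maf: float) -> float:
    """Expected number of minor-allele copies among 2N chromosomes."""
    return 2 * n_individuals * maf


def expected_carriers(n_individuals: int, maf: float) -> float:
    """Expected number of people carrying >= 1 minor allele, assuming
    Hardy-Weinberg proportions."""
    return n_individuals * (1.0 - (1.0 - maf) ** 2)


N_TOTAL = 32_367 + 36_012  # cases + controls reported for the study

if __name__ == "__main__":
    for maf in (0.0001, 0.001, 0.01, 0.05):
        print(
            f"MAF {maf:<7} ({maf_class(maf):>12}): "
            f"~{expected_minor_alleles(N_TOTAL, maf):,.0f} allele copies, "
            f"~{expected_carriers(N_TOTAL, maf):,.0f} carriers"
        )
```

At the bottom of the array's range (MAF 0.0001), 68,379 people are expected to carry only about 14 copies of the minor allele in total. This is why single-variant tests have little power at that end of the spectrum.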
Variant-level findings: seven genome-wide signals and four gene discoveries beyond GWAS
At the single-variant level, the authors performed association testing within each of 14 strata, using linear mixed models to control for population structure, and applied a stringent multiple-testing threshold (Bonferroni p < 3.5×10⁻⁷). They report seven significant coding variants in six genes outside the extended MHC region. Two signals (in TYK2 and GALC) fall within previously implicated GWAS regions and are in LD with known common-variant associations, reinforcing earlier biology. The key novelty lies in the associations that are neither in LD with nor close to known common-variant loci, so GWAS would not have reliably detected them. These are missense variants in PRF1 (p.A91V), HDAC7 (p.R166H), and NLRP8 (p.I942M), plus two tightly linked missense variants in PRKRA (p.D33G and p.P11L). Together they implicate four genes that appear to influence MS risk independently of common-variant signals.
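The post does not say exactly how the per-stratum results were pooled. A common choice is an inverse-variance-weighted fixed-effect meta-analysis, sketched below as an assumption rather than as the authors' pipeline. The per-stratum estimates in the demo are hypothetical. The threshold constant is the 3.5×10⁻⁷ value quoted above.

```python
import math
from statistics import NormalDist

GENOME_WIDE_P = 3.5e-7  # Bonferroni threshold quoted in the post


def ivw_fixed_effect(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    Each stratum contributes an effect estimate (beta) and its standard
    error (se); weights are 1 / se^2. Returns (pooled_beta, pooled_se, z, p),
    with p two-sided under a standard normal null.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty lists of betas and SEs")
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = 1.0 / math.sqrt(total_w)
    z = pooled_beta / pooled_se
    p = 2.0 * NormalDist().cdf(-abs(z))  # tail form avoids 1 - cdf rounding
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical estimates for one variant across 14 strata (not study
    # data): a consistent effect that no single stratum pushes past the
    # genome-wide threshold on its own.
    betas = [0.30] * 14
    ses = [0.12] * 14
    beta, se, z, p = ivw_fixed_effect(betas, ses)
    print(f"pooled beta={beta:.3f}, se={se:.4f}, z={z:.2f}, p={p:.2e}")
    print("passes threshold" if p < GENOME_WIDE_P else "does not pass threshold")
```

Pooling shrinks the standard error by roughly the square root of the number of strata. A modest, consistent effect can therefore clear a strict Bonferroni bar only in the combined analysis.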
The main quantitative punchline: low-frequency coding variation explains a measurable slice of MS liability
Because even a study this large is underpowered to detect all rare effects individually, the authors also asked a more global question: “How much variance in case-control status is attributable to low-frequency coding variants as a class?” They used restricted maximum likelihood (REML) heritability modeling, a framework extended from common-variant analyses to rare-variant contexts, to partition variance by allele frequency. Their meta-analysis across cohorts estimates that low-frequency coding variants (MAF < 5%) explain 11.34% of the observed case-control variance, corresponding to a mean of about 4.1% on the liability scale. When low-frequency variants are further split into intermediate (1–5%) and rare (< 1%) classes, rare coding variants alone explain about 9.0% of the observed variance (about 3.2% on the liability scale). Prior work attributes roughly 20% of MS heritability to common variants. Read alongside that figure, the study argues that a non-trivial additional component of risk resides in low-frequency coding alleles that standard GWAS pipelines largely miss.
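Observed-scale estimates from case-control data depend on how many cases were sampled, so they are usually converted to the liability scale. The sketch below applies the widely used transformation of Lee et al. (2011). The post does not state which conversion or inputs the authors used. The 0.1% MS prevalence and the pooled case fraction are assumptions made for illustration.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, prevalence: float, case_fraction: float) -> float:
    """Convert an observed-scale (0/1 case-control) variance estimate to the
    liability scale with the Lee et al. (2011) transformation:

        h2_L = h2_O * K^2 (1 - K)^2 / (z^2 * P (1 - P))

    K = population prevalence, P = fraction of cases in the sample,
    z = standard normal density at the threshold t = Phi^-1(1 - K).
    """
    if not (0 < prevalence < 1 and 0 < case_fraction < 1):
        raise ValueError("prevalence and case_fraction must be in (0, 1)")
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(t)
    k, p = prevalence, case_fraction
    return h2_obs * (k ** 2 * (1 - k) ** 2) / (z ** 2 * p * (1 - p))


if __name__ == "__main__":
    # Assumed inputs: MS prevalence of 0.1% and the pooled case fraction
    # 32,367 / 68,379. Per-cohort values are not given in this post.
    K = 0.001
    P = 32_367 / (32_367 + 36_012)
    for label, h2_obs in (("MAF < 5%", 0.1134), ("MAF < 1%", 0.090)):
        h2_liab = observed_to_liability(h2_obs, K, P)
        print(f"{label}: observed {h2_obs:.2%} -> liability {h2_liab:.2%}")
```

Under these assumptions the conversions come out at roughly 4.0% and 3.2%, close to the liability-scale values reported above. Liability-scale estimates are sensitive to the assumed prevalence, so treat this as a sanity check rather than a reproduction.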
Mechanistic convergence: Tregs, IFN-γ biology, NF-κB signaling, and innate immunity
One of the most valuable aspects of coding discoveries is how directly they map to immunological hypotheses. The paper emphasizes immune dysfunction as central to MS pathogenesis and discusses plausible mechanistic links for the newly implicated genes. PRF1 encodes perforin, which is critical for granzyme-mediated cytotoxicity. Prior functional evidence links the implicated p.A91V variant to altered killing efficiency and inflammatory cytokine output, which the authors connect to the regulatory T cell (Treg) phenotypes and IFN-γ dysregulation observed in MS. HDAC7 is positioned as a regulator of FOXP3-mediated repression and of T cell development and survival, tying genetic risk to Treg biology and thymic programming. PRKRA participates in antiviral response pathways that can amplify NF-κB signaling and interferon programs, aligning with a broader MS theme of inflammatory signal transduction. NLRP8, an innate immune sensor, underscores that MS genetic risk is not solely an adaptive-immune story: innate pathways, and their cross-talk with adaptive cells, remain in play.
What this changes for MS genetics and what should come next
Conceptually, this study shifts MS genetics from “hundreds of common variants plus a big unknown remainder” toward a frequency-stratified architecture where low-frequency coding alleles measurably contribute to risk and can reveal genes absent from GWAS catalogs. Practically, it motivates three next steps: (1) larger sample sizes and sequencing-based studies to capture ultra-rare alleles that exome arrays miss; (2) functional follow-up that is variant-specific (e.g., knock-in perturbations, cell-type-resolved assays in Tregs, NK cells, thymocytes, and innate immune compartments) because coding variants can have pleiotropic and context-dependent effects; and (3) integration with existing GWAS fine-mapping to connect common regulatory architecture with rarer protein-altering mechanisms in the same pathways. If successful, that pipeline doesn’t just “explain heritability”—it builds a more actionable causal map of MS immunopathology, where genes like PRF1, HDAC7, PRKRA, and NLRP8 become experimentally testable nodes rather than statistical abstractions.
Disclaimer: This blog post is based on the provided research article and is intended for informational purposes only. It is not intended to provide medical advice. Please consult with a healthcare professional for any health concerns.
References:
Mitrovič, M., Patsopoulos, N. A., Beecham, A. H., Dankowski, T., Goris, A., Dubois, B., ... & Cotsapas, C. (2018). Low-frequency and rare-coding variation contributes to multiple sclerosis risk. Cell, 175(6), 1679-1687.
