Integrated Multi-Omics and Machine Learning Reveal Novel Immune Gene Networks in Multiple Sclerosis

Multiple sclerosis (MS) remains one of the most challenging neuroinflammatory disorders to decode because its biological origins span immune dysfunction, neurodegeneration, and a highly polygenic genetic architecture. In this article, Chen and colleagues address a central problem in MS genetics: although genome-wide association studies (GWAS) have identified more than 200 susceptibility signals, many of these loci fall in noncoding regions and do not directly reveal which genes are functionally driving disease risk. The study therefore asks a more mechanistic question: do inherited variants shape MS by altering gene expression, RNA splicing, and ultimately protein abundance in the brain, and can those signals also support clinically useful risk prediction?

A layered design that moves beyond conventional GWAS
The strength of the paper lies in its multi-omics design. The authors integrated the largest available MS GWAS dataset, comprising 14,802 cases and 26,703 controls of European ancestry, with large brain cortex eQTL and sQTL resources derived from 2,865 RNA-sequenced samples from 2,443 individuals. They then combined these findings with weighted gene coexpression network analysis in peripheral blood mononuclear cells, machine-learning-based feature selection, immune infiltration analysis, and independent protein-level validation using brain pQTL data. Methodologically, this is important because it does not stop at statistical association; it tries to prioritize genes that are supported across transcriptional regulation, coexpression structure, and proteomic evidence, thereby increasing biological plausibility.

From broad association signals to biologically plausible candidate genes
Using summary-data-based Mendelian randomization and colocalization analysis, the investigators identified 28 significant sQTL loci corresponding to 18 unique splicing-associated genes and 66 eQTL-associated genes linked to MS risk; after stricter colocalization filtering, 15 sGenes and 51 eGenes remained supported as likely sharing causal variants with MS susceptibility. A notable insight is that roughly 72% of the prioritized splicing genes were distinct from the expression genes, suggesting that altered RNA splicing may explain a substantial fraction of genetic risk not captured by standard expression analyses alone. When these genetically supported candidates were intersected with MS-related coexpression modules, the list narrowed to 23 shared genes, including IL7, RGS1, SP140, TNFRSF1A, TRAF3, TSPAN31, ZC2HC1A, and others that collectively point toward immune regulation as a central axis of disease biology.
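For readers unfamiliar with summary-data-based Mendelian randomization (SMR), the core calculation can be sketched in a few lines. The example below is a minimal illustration of the standard SMR formulation, in which the effect of gene expression (or splicing) on disease is estimated as the ratio of the GWAS effect to the QTL effect at the top cis-variant, and tested with an approximate chi-square statistic on one degree of freedom. It is not the authors' pipeline: the function name and the input numbers are hypothetical, and the sketch omits the heterogeneity test and the colocalization filter applied in the study.

```python
import math


def smr_test(b_gwas, se_gwas, b_qtl, se_qtl):
    """Approximate SMR estimate and test for one gene (illustrative only).

    b_gwas, se_gwas: effect and standard error of the top cis-variant on disease.
    b_qtl, se_qtl:   effect and standard error of the same variant on expression.
    """
    z_gwas = b_gwas / se_gwas
    z_qtl = b_qtl / se_qtl

    # Ratio estimate: effect of expression on disease per unit of expression.
    b_smr = b_gwas / b_qtl

    # Approximate test statistic, chi-square with 1 degree of freedom.
    t_smr = (z_gwas ** 2 * z_qtl ** 2) / (z_gwas ** 2 + z_qtl ** 2)

    # Survival function of a 1-df chi-square: P(X > t) = erfc(sqrt(t / 2)).
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))

    # Standard error implied by the statistic (undefined when t_smr is 0).
    se_smr = abs(b_smr) / math.sqrt(t_smr) if t_smr > 0 else float("inf")

    return {"b_smr": b_smr, "se_smr": se_smr, "t_smr": t_smr, "p_smr": p_smr}


# Hypothetical summary statistics for a single variant-gene pair.
example = smr_test(b_gwas=0.10, se_gwas=0.02, b_qtl=0.50, se_qtl=0.05)
print(example)
```

Because the statistic combines both z-scores, a gene passes only when the variant is strongly associated with the molecular trait and with disease; a weak QTL cannot be rescued by a strong GWAS signal.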

Machine learning turns mechanistic biology into a predictive signature
One of the paper’s most translationally ambitious steps is the construction of a diagnostic gene signature. Starting from the shared genes, the authors applied LASSO regression to derive a 10-gene model consisting of ACP2, IL7, MYNN, RGS1, SAE1, SP140, TRAF3, TSPAN31, TYMP, and ZC2HC1A. This panel showed excellent discrimination in the discovery dataset, with an area under the ROC curve (AUC) of 1.0 in training and 0.983 in internal validation, and it retained AUCs above 0.70 across three independent external datasets. These results do not yet establish a clinical diagnostic test, but they do suggest that biologically anchored transcriptomic signatures may help stratify MS risk or support earlier recognition when combined with clinical and radiological data. The authors are appropriately cautious, noting that the model still requires testing against other neurological diseases that can mimic MS.
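To make the two ingredients of this step concrete, here is a small pure-Python sketch of LASSO feature selection and AUC scoring. It is an illustration under simplifying assumptions, not the authors' model: it fits a squared-error LASSO by cyclic coordinate descent (a diagnostic classifier would more typically use a penalized logistic model), it has no intercept, and the two-feature toy data, the penalty value, and all function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values inside the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n) * ||y - Xb||^2 + penalty * ||b||_1, one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # covariance of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return beta


def auc(labels, scores):
    """AUC as the share of case-control pairs ranked correctly (ties count half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data: feature 0 tracks the outcome, feature 1 is uninformative noise.
X = [[1.0, 0.1], [-1.0, -0.2], [1.0, -0.1], [-1.0, 0.2]]
y = [1.0, -1.0, 1.0, -1.0]   # outcome coded +1 / -1 for the regression
labels = [1, 0, 1, 0]        # same outcome coded 1 / 0 for the AUC

beta = lasso_coordinate_descent(X, y, penalty=0.1)
scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
print("coefficients:", beta)
print("training AUC:", auc(labels, scores))
```

The L1 penalty sets the coefficient of the uninformative feature to exactly zero, which is how LASSO reduces a candidate list to a compact panel. The sketch also scores the model on the same samples used to fit it, which is why training AUCs tend to be optimistic and why the external validation cohorts matter more.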

Immune dysregulation emerges as the dominant mechanistic theme
The biological interpretation of the results is remarkably coherent. Functional enrichment analyses converged on lymphocyte activation, regulation of the immune system, NF-κB signalling, and Epstein–Barr virus-related pathways, all of which are highly relevant to contemporary models of MS pathogenesis. The immune infiltration analysis added another layer by showing a consistent pattern: genes associated with increased MS risk tended to correlate positively with naïve CD4+ T cells and resting mast cells, but negatively with activated mast cells, whereas protective genes displayed the opposite profile. The authors interpret this as evidence for impaired peripheral immune tolerance and altered immune surveillance, with the possibility that disrupted mast-cell and T-cell crosstalk contributes to the transition from immune homeostasis to neuroinflammatory disease.
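The kind of gene-to-cell-type relationship described here can be illustrated with a rank correlation. The summary above does not state which correlation statistic the authors used, so the sketch below assumes Spearman's rho as one common choice; the expression values, the estimated cell fractions, and the function names are all invented for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values: expression of a risk gene and estimated
# immune-cell fractions from a deconvolution tool.
risk_gene_expr = [2.1, 3.4, 1.8, 4.0, 2.9, 3.7]
naive_cd4_frac = [0.10, 0.18, 0.08, 0.25, 0.15, 0.20]
activated_mast_frac = [0.12, 0.07, 0.14, 0.02, 0.09, 0.05]

print("risk gene vs naive CD4+ T cells:", spearman(risk_gene_expr, naive_cd4_frac))
print("risk gene vs activated mast cells:", spearman(risk_gene_expr, activated_mast_frac))
```

In this toy data the risk gene rises with the naive CD4+ T-cell fraction and falls with the activated mast-cell fraction, mirroring the direction of the pattern the study reports. A correlation of this kind describes co-variation across samples and does not by itself show that the gene drives the cell-type change.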

Why ZC2HC1A and TRAF3 are the study’s most compelling leads
Among all prioritized candidates, ZC2HC1A and TRAF3 emerged as the most persuasive because they were supported at both the transcript and protein levels. Integration of brain pQTL data with MS GWAS statistics showed that increased genetically predicted protein abundance of ZC2HC1A and TRAF3 was significantly associated with MS risk, with strong colocalization probabilities (PP4 = 0.987 and 0.991, respectively). The study further linked both genes to the Hedgehog signalling pathway, while TRAF3 was discussed as a potentially important regulator of B-cell survival, NF-κB signalling, metabolism, and antiviral immune homeostasis. ZC2HC1A is less well characterized functionally, but the paper places it in a credible immunoregulatory context, especially through links to T-cell activation and the neighboring IL7 locus. Taken together, these two genes represent the article’s most promising bridge between statistical genetics and experimentally testable MS biology.
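PP4 is the posterior probability that the molecular trait and the disease share a single causal variant at a locus. The sketch below shows, under stated assumptions, how such a probability can be computed from per-variant z-scores using Wakefield approximate Bayes factors and the five-hypothesis framework popularized by the coloc method. The prior values (1e-4, 1e-4, 1e-5) are the commonly used coloc defaults, the prior effect variance of 0.04 is an assumption, and the three-variant z-scores are invented; this is not a reconstruction of the authors' analysis.

```python
import math


def wakefield_abf(z, variance, prior_variance=0.04):
    """Approximate Bayes factor for association at one variant.

    variance is the squared standard error of the effect estimate;
    prior_variance is an assumed prior variance of the true effect.
    """
    r = prior_variance / (variance + prior_variance)
    return math.sqrt(1.0 - r) * math.exp(0.5 * r * z * z)


def coloc_posteriors(z1, z2, var1, var2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of hypotheses H0..H4 for two traits at one locus.

    H3: both traits associated, different causal variants.
    H4: both traits associated, one shared causal variant (reported as PP4).
    Plain sums are fine for this toy example; very large z-scores would
    call for a log-scale implementation.
    """
    abf1 = [wakefield_abf(z, v) for z, v in zip(z1, var1)]
    abf2 = [wakefield_abf(z, v) for z, v in zip(z2, var2)]
    s1, s2 = sum(abf1), sum(abf2)
    s12 = sum(a * b for a, b in zip(abf1, abf2))

    weights = {
        "H0": 1.0,
        "H1": p1 * s1,
        "H2": p2 * s2,
        "H3": p1 * p2 * (s1 * s2 - s12),  # pairs of distinct variants
        "H4": p12 * s12,                  # same variant for both traits
    }
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Three hypothetical variants, each with squared standard error 0.01.
var = [0.01, 0.01, 0.01]

shared = coloc_posteriors([6.0, 1.0, 0.0], [6.0, 1.0, 0.0], var, var)
distinct = coloc_posteriors([6.0, 0.0, 0.0], [0.0, 0.0, 6.0], var, var)

print("same top variant  -> PP4 =", round(shared["H4"], 4))
print("different variants -> PP3 =", round(distinct["H3"], 4))
```

When both traits peak at the same variant, nearly all posterior weight goes to H4; when they peak at different variants, the weight shifts to H3. Values close to 1, such as those reported for ZC2HC1A and TRAF3, therefore indicate that the protein-level and disease signals are unlikely to be two separate associations that merely sit near each other.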

Scientific significance, translational promise, and necessary caution
This article is a strong example of how modern disease genetics is moving from locus discovery toward causal prioritization. Its central contribution is not merely the identification of more MS-associated genes, but the assembly of convergent evidence that certain genes, especially TRAF3 and ZC2HC1A, may sit at important regulatory intersections linking gene regulation, immune-cell behavior, and disease susceptibility. At the same time, the study has clear limitations: the datasets are largely restricted to individuals of European ancestry, the Mendelian-randomization framework is more informative for common than rare variants, stringent multiple-testing correction may have removed true signals, and the predictive model has not yet been validated against non-MS neurological disorders. Even so, the paper offers a substantial advance by showing that integrated genomics, transcriptomics, proteomics, and machine learning can generate a more mechanistic and clinically relevant map of MS susceptibility than GWAS alone.

Disclaimer: This blog post is based on the provided research article and is intended for informational purposes only. It is not intended to provide medical advice. Please consult with a healthcare professional for any health concerns.

References:
Chen, M., Zhao, D., Fan, H. et al. Integrated multi-omics and machine learning prioritize key immune genes for multiple sclerosis risk prediction. Mamm Genome 37, 38 (2026). https://doi.org/10.1007/s00335-026-10207-6

Genetics

Alper Bülbül

Geneticist and Bioinformatician
