Polygenic Prediction Across Ancestries: Evaluating Multiple Sclerosis Risk Scores in South Asian Populations
Multiple sclerosis (MS) is a complex immune-mediated disease in which genetic susceptibility is distributed across many loci of small effect rather than driven by a single causal gene. The strongest genetic contribution arises from the major histocompatibility complex (MHC) region, particularly classical HLA variation, yet genome-wide association studies (GWAS) have also identified hundreds of additional risk variants outside this locus. Against this background, polygenic risk scores (PRS) have emerged as a practical way to aggregate the effects of numerous common variants into a single quantitative index of inherited risk. In principle, an MS PRS could support prevention research by enriching trial cohorts for individuals more likely to develop MS (for example, in studies of putative preventive strategies such as EBV vaccination or vitamin D supplementation), and may eventually inform risk stratification in preclinical or “prodromal” settings. However, the clinical promise of PRS is constrained by a widely observed problem: scores trained in European-ancestry datasets often transfer poorly to other ancestries, which raises scientific concerns and risks exacerbating health inequalities.
Study Aim and Design: Testing Cross-Ancestry Portability of MS PRS
The article directly addresses this portability gap by evaluating whether an MS PRS derived from a large European-ancestry GWAS predicts MS less accurately in individuals of South Asian ancestry than in a European-ancestry cohort. The authors analyze two longitudinal resources: (i) Genes & Health (G&H), comprising British–Bangladeshi and British–Pakistani participants, and (ii) UK Biobank (UKB), predominantly White British. Cases and controls were defined using linked electronic health records (EHR), with MS identified by ICD-10 code G35 after harmonization across coding systems in G&H. After quality control, the G&H analysis set included 40,532 individuals: 42 MS cases and 40,490 controls (as summarized in the cohort flow diagram on page 3 of the article). UKB analyses were restricted to genetically European-ancestry participants and followed comparable quality-control steps, so that differences in performance can be read primarily as differences between ancestries.
Genotyping and PRS Construction: Clumping, Thresholding, and the MHC Question
A central methodological choice was the use of a clumping-and-thresholding PRS pipeline (PRSice-2), which attempts to build a score from approximately independent association signals by pruning variants in linkage disequilibrium (LD) and selecting SNPs under various GWAS P-value thresholds. Discovery weights were taken from the International Multiple Sclerosis Genetics Consortium (IMSGC) 2019 GWAS meta-analysis (14,802 cases; 26,703 controls). In G&H, participants were genotyped on the Illumina Global Screening Array and imputed to the multi-ancestral TOPMed reference panel; variants were filtered for imputation quality, allele frequency, and other standard criteria. Importantly, the investigators constructed multiple families of PRS: scores including the MHC region, scores excluding the MHC (to focus on non-MHC polygenicity), and scores limited to MHC variants only. This design was intended to isolate whether the most influential MS locus behaves differently when European-derived tag SNPs are applied in a South Asian-ancestry cohort, where LD patterns and allele frequencies can diverge substantially.
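The clumping-and-thresholding idea described above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not PRSice-2 itself and not the authors' pipeline: the SNP identifiers, effect sizes, dosages, the 250 kb window, and the r² cutoff of 0.1 are all assumptions chosen for the example, and squared Pearson correlation of dosages in the toy sample stands in for LD.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    snp_id: str      # hypothetical identifier, not a real rsID
    position: int    # base-pair position (one chromosome, for simplicity)
    beta: float      # discovery GWAS effect size (log odds ratio)
    p_value: float   # discovery GWAS P value


def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors (LD proxy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def clump_and_threshold(variants, dosages, p_threshold,
                        r2_max=0.1, window_bp=250_000):
    """Keep SNPs passing the P threshold, then greedily drop any SNP that is
    in LD (r^2 > r2_max, within window_bp) with a more significant kept SNP."""
    passing = sorted((v for v in variants if v.p_value <= p_threshold),
                     key=lambda v: v.p_value)
    kept = []
    for v in passing:
        tagged = any(
            abs(v.position - k.position) <= window_bp
            and r_squared(dosages[v.snp_id], dosages[k.snp_id]) > r2_max
            for k in kept
        )
        if not tagged:
            kept.append(v)
    return kept


def polygenic_scores(kept, dosages, n_individuals):
    """Weighted sum of effect-allele dosages for each individual."""
    return [sum(v.beta * dosages[v.snp_id][i] for v in kept)
            for i in range(n_individuals)]


# Toy data (illustrative assumptions only): four SNPs, six individuals.
variants = [
    Variant("snp_a", 1_000, 0.30, 1e-8),
    Variant("snp_b", 1_500, 0.25, 1e-6),     # same dosages as snp_a
    Variant("snp_c", 900_000, -0.10, 1e-3),
    Variant("snp_d", 2_000_000, 0.05, 0.2),  # fails the P threshold
]
dosages = {
    "snp_a": [0, 1, 2, 1, 0, 2],
    "snp_b": [0, 1, 2, 1, 0, 2],
    "snp_c": [1, 0, 1, 2, 0, 1],
    "snp_d": [2, 2, 1, 0, 1, 0],
}

kept = clump_and_threshold(variants, dosages, p_threshold=0.05)
scores = polygenic_scores(kept, dosages, n_individuals=6)
```

In this toy example snp_b is dropped because it duplicates the more significant snp_a, and snp_d is dropped by the P-value threshold. The sketch also makes the portability concern concrete: the LD used for clumping and the weights used for scoring both come from somewhere, and if they reflect one population they may not describe another.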
Primary Results in South Asian Ancestry: Statistically Detectable but Limited Explained Liability
In the G&H cohort, European-derived PRS were associated with MS status, but the magnitude of explained liability was modest. The optimal PRS including the MHC explained ~1.1% of liability (adjusted Nagelkerke’s pseudo-R² = 0.011), while the optimal PRS excluding the MHC explained ~1.5% (pseudo-R² = 0.015). Notably, an MHC-only PRS did not show a statistically significant association with MS status in this cohort. The visualizations on page 5 of the article illustrate these patterns: density plots show only partial separation between cases and controls, and the ROC curves show moderate discrimination. Although the area under the curve (AUC) for PRS models in G&H is around 0.70–0.71, the paper emphasizes that a substantial fraction of this discrimination is already captured by non-genetic covariates (age, sex, and principal components), as reflected in the relatively high null-model AUC (page 6). Put differently, the PRS adds information but, at least in this dataset, not enough to support confident individual-level prediction.
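To make the pseudo-R² figures easier to interpret, here is a minimal sketch of Nagelkerke's pseudo-R² and of an incremental version (full model minus covariates-only model), which is one common way to isolate what a PRS adds beyond age, sex, and principal components. The outcome vector and fitted probabilities are invented for illustration, the function names are hypothetical, and the sketch does not reproduce whatever adjustment the article applied to its reported values.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y given predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))


def nagelkerke_r2(ll_model, ll_intercept_only, n):
    """Cox-Snell R^2 rescaled by its maximum attainable value."""
    cox_snell = 1 - math.exp((2 / n) * (ll_intercept_only - ll_model))
    max_cox_snell = 1 - math.exp((2 / n) * ll_intercept_only)
    return cox_snell / max_cox_snell


# Toy data (illustrative assumptions, not values from the study).
y = [1, 0, 0, 0, 1, 0, 0, 0]
n = len(y)
prevalence = sum(y) / n

p_intercept = [prevalence] * n                          # no predictors
p_covariates = [0.4, 0.2, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2]  # covariates only
p_full = [0.7, 0.1, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1]        # covariates + PRS

ll_intercept = log_likelihood(y, p_intercept)
r2_covariates = nagelkerke_r2(log_likelihood(y, p_covariates), ll_intercept, n)
r2_full = nagelkerke_r2(log_likelihood(y, p_full), ll_intercept, n)

# The share attributable to the PRS over and above the covariates.
r2_incremental = r2_full - r2_covariates
```

The same logic explains the AUC caveat in the paragraph above: a model's headline discrimination includes what the covariates already contribute, so the informative quantity for the PRS is the increment over the null model.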
Benchmarking Against European Ancestry: Clear Performance Gain in UK Biobank
To contextualize performance, the authors applied comparable PRS methods to UKB European-ancestry participants. Using the full European-ancestry UKB sample (2,091 MS cases; 374,866 controls), PRS performance was materially higher than in G&H, with liability explained of ~4.4% for the MHC-including PRS and ~2.3% for the non-MHC PRS (page 6). Because the far larger UKB sample could by itself make estimates look stronger and more stable than those from a cohort with only 42 cases, the authors also implemented a permutation-style subsampling exercise: they repeatedly drew UKB subsets matched to the G&H case-control counts (42 cases; 40,490 controls) and estimated PRS performance across 1,000 replicates. Even under matched sample size, the MHC-including PRS explained substantially more liability in European-ancestry UKB than in G&H, and the cross-cohort contrast is displayed in the comparative plot on page 6 (Figure 3). These results support the central claim: European-trained MS PRS are less accurate when applied to South Asian-ancestry individuals.
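The matched-subsampling logic can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the scores are simulated, the cohort sizes are scaled down so the example runs quickly, and a rank-based AUC is used as a stand-in metric where the article reports pseudo-R². The function names are hypothetical.

```python
import random


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control
    (ties count as half). Used here as a stand-in performance metric."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def matched_subsample_metric(case_scores, control_scores,
                             n_cases, n_controls, n_replicates,
                             metric, seed=0):
    """Repeatedly draw subsets of the larger cohort that match the smaller
    cohort's case and control counts, and compute the metric on each draw."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_replicates):
        cases = rng.sample(case_scores, n_cases)          # without replacement
        controls = rng.sample(control_scores, n_controls)
        results.append(metric(cases, controls))
    return results


# Simulated "large cohort" scores (illustrative assumptions only).
sim = random.Random(1)
large_cases = [sim.gauss(0.5, 1.0) for _ in range(200)]
large_controls = [sim.gauss(0.0, 1.0) for _ in range(5_000)]

# Scaled-down stand-ins for the smaller cohort's counts and replicate number.
replicate_aucs = matched_subsample_metric(
    large_cases, large_controls,
    n_cases=10, n_controls=500, n_replicates=100,
    metric=auc,
)
```

The spread of `replicate_aucs` shows how much a performance estimate can vary when only a handful of cases are available. Comparing the smaller cohort's single estimate against this distribution, rather than against the full-sample estimate, is what separates an ancestry effect from a sample-size effect.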
Interpretation: LD, Allele Frequencies, and Why the MHC May Behave Differently
The discussion offers a mechanistic explanation consistent with broader PRS literature: reduced portability is likely driven more by differences in LD structure and minor allele frequency than by a fundamentally different set of causal variants across populations. PRS often rely on tag SNPs that correlate with causal variation in the discovery population; when LD differs, the same tag SNPs may not capture the same causal signal in another ancestry, reducing predictive accuracy. The MHC findings are particularly instructive. Given that classical HLA risk alleles often show shared directionality across ancestries, the lack of improvement from adding MHC-tagging variants in G&H may reflect inadequate tagging by European GWAS SNPs in South Asian LD backgrounds, compounded by statistical imprecision from the small number of MS cases. The paper is careful to flag additional limitations: potential EHR misclassification or missed cases, lack of an external South Asian validation cohort, the possibility of overfitting because PRS selection and evaluation occur in the same dataset, and technical differences between cohorts (different genotyping arrays and imputation panels). Collectively, these factors mean the estimated effect sizes should be interpreted cautiously, but they do not undermine the overarching conclusion about reduced cross-ancestry performance.
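The tagging argument can be made concrete with a deliberately simplified single-variant derivation. It assumes standardized genotypes, one causal variant with the same effect in both populations, equal phenotypic variance, and noise independent of the tag; the symbols and the numbers in the closing example are illustrative and do not come from the article.

```latex
% c: causal genotype, t: tag genotype, both standardized (mean 0, variance 1)
% y = \beta c + \varepsilon, with \varepsilon independent of c and t
% r: correlation between tag and causal genotype (so r^2 is the LD r^2)
\begin{align}
  \operatorname{Cov}(t, y) &= \beta \operatorname{Cov}(t, c) = \beta r \\
  R^2_{\text{tag}} &= \frac{\operatorname{Cov}(t, y)^2}{\operatorname{Var}(t)\operatorname{Var}(y)}
                    = r^2 \, \frac{\beta^2}{\operatorname{Var}(y)}
                    = r^2 \, R^2_{\text{causal}} \\
  \frac{R^2_{\text{tag, target}}}{R^2_{\text{tag, discovery}}}
                   &= \frac{r^2_{\text{target}}}{r^2_{\text{discovery}}}
\end{align}
```

Under these assumptions, a tag SNP with r² = 0.9 to the causal variant in the discovery population and r² = 0.45 in the target population would capture half as much variance in the target, even though the causal effect is identical. Real multi-variant scores are more complicated, because weights and allele frequencies also matter, but the direction of the effect is the same.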
Implications and Next Steps: Avoiding Inequitable Genomic Translation
The primary scientific implication is straightforward: if PRS are to be used for MS risk stratification, whether for trial enrichment, prevention research, or eventual clinical decision support, then reliance on European-only GWAS will yield systematically weaker performance in underrepresented populations. The ethical and public health implication is equally clear: deploying poorly transferable PRS could inadvertently reinforce disparities by offering more accurate genomic tools to some ancestry groups than to others. The authors therefore argue for expanding ancestrally diverse MS genetic studies to build discovery datasets that better capture variation, LD, and allele frequencies across global populations. They also acknowledge that methodological innovations (e.g., multi-ancestry training strategies and fine-mapping-informed approaches) may improve transferability, but the persistent bottleneck is the scarcity of large, well-phenotyped non-European datasets. In this sense, the study functions both as an empirical evaluation of MS PRS portability and as a policy-relevant demonstration that equitable precision medicine requires equitable representation in genomic research.
Disclaimer: This blog post is based on the provided research article and is intended for informational purposes only. It is not intended to provide medical advice. Please consult with a healthcare professional for any health concerns.
References:
Breedon, J. R., Marshall, C. R., Giovannoni, G., van Heel, D. A., Dobson, R., & Jacobs, B. M. (2023). Polygenic risk score prediction of multiple sclerosis in individuals of South Asian ancestry. Brain Communications, 5(2), fcad041.
