TY - JOUR
T1 - Polygenic risk prediction
T2 - why and when out-of-sample prediction R2 can exceed SNP-based heritability
AU - Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium
AU - Wang, Xiaotong
AU - Walker, Alicia
AU - Revez, Joana A.
AU - Ni, Guiyan
AU - Adams, Mark J.
AU - McIntosh, Andrew M.
AU - Wray, Naomi R.
AU - Ripke, Stephan
AU - Mattheisen, Manuel
AU - Trzaskowski, Maciej
AU - Byrne, Enda M.
AU - Abdellaoui, Abdel
AU - Agerbo, Esben
AU - Air, Tracy M.
AU - Andlauer, Till F.M.
AU - Bacanu, Silviu Alin
AU - Bækvad-Hansen, Marie
AU - Beekman, Aartjan T.F.
AU - Bigdeli, Tim B.
AU - Binder, Elisabeth B.
AU - Bryois, Julien
AU - Buttenschøn, Henriette N.
AU - Bybjerg-Grauholm, Jonas
AU - Cai, Na
AU - Castelao, Enrique
AU - Christensen, Jane Hvarregaard
AU - Clarke, Toni Kim
AU - Coleman, Jonathan R.I.
AU - Colodro-Conde, Lucía
AU - Couvy-Duchesne, Baptiste
AU - Craddock, Nick
AU - Crawford, Gregory E.
AU - Davies, Gail
AU - Degenhardt, Franziska
AU - Derks, Eske M.
AU - Direk, Nese
AU - Dolan, Conor V.
AU - Dunn, Erin C.
AU - Eley, Thalia C.
AU - Escott-Price, Valentina
AU - Farhadi Hassan Kiadeh, Farnush
AU - Finucane, Hilary K.
AU - Foo, Jerome C.
AU - Forstner, Andreas J.
AU - Frank, Josef
AU - Gaspar, Héléna A.
AU - Gill, Michael
AU - Goes, Fernando S.
AU - Gordon, Scott D.
AU - Hottenga, Jouke Jan
N1 - Publisher Copyright: © 2023 American Society of Human Genetics
PY - 2023/7/6
Y1 - 2023/7/6
AB - In polygenic score (PGS) analysis, the coefficient of determination (R2) is a key statistic for evaluating efficacy. R2 is the proportion of phenotypic variance explained by the PGS, calculated in a cohort that is independent of the genome-wide association study (GWAS) that provided estimates of allelic effect sizes. The SNP-based heritability (h2SNP, the proportion of total phenotypic variance attributable to all common SNPs) is the theoretical upper limit of the out-of-sample prediction R2. However, in real data analyses R2 has been reported to exceed h2SNP, which occurs in parallel with the observation that h2SNP estimates tend to decline as the number of cohorts being meta-analyzed increases. Here, we quantify why and when these observations are expected. Using theory and simulation, we show that if cohort-specific h2SNP values are heterogeneous, or if genetic correlations between cohorts are less than one, h2SNP estimates can decrease as the number of cohorts being meta-analyzed increases. We derive the conditions under which the out-of-sample prediction R2 will be greater than h2SNP and show the validity of our derivations with real data from a binary trait (major depression) and a continuous trait (educational attainment). Our research calls for a better approach to integrating information from multiple cohorts to address issues of between-cohort heterogeneity.
KW - Explain
KW - Human height
KW - Large proportion
UR - https://www.scopus.com/pages/publications/85164270154
U2 - 10.1016/j.ajhg.2023.06.006
DO - 10.1016/j.ajhg.2023.06.006
M3 - Article
C2 - 37379836
AN - SCOPUS:85164270154
SN - 0002-9297
VL - 110
SP - 1207
EP - 1215
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 7
ER -