Incorporating Bioinformatics Data to Understand Alcohol-related GWAS Results

Abstract

PURPOSE. Genome-wide association studies (GWAS) of alcohol dependence and genetically correlated externalizing disorders have produced a number of significant and suggestive findings. Attention is now shifting from cataloging these genetic associations to understanding their functional role in disease pathogenesis. The purpose of these studies is to illustrate how bioinformatics information can elucidate the biology underlying alcohol-related GWAS findings, and to examine whether incorporating bioinformatics information improves predictive power in tests of polygenic association. METHODS. Single nucleotide variants (SNVs) identified in a GWAS of an externalizing factor score in the Collaborative Study on the Genetics of Alcoholism (COGA) case-control GWAS sample (n = 1905) were tested for overrepresentation in genomic regions with putative regulatory functions. Polygenic risk scores derived from a GWAS of an alcohol problems factor score in an independent discovery sample were used to predict alcohol use frequency and intoxication phenotypes in the FinnTwin12 sample (n = 1128). Epigenomic annotations came from the ENCODE and Roadmap Epigenomics public-access databases. RESULTS. Variants with smaller p-values for the COGA externalizing factor score were enriched for multiple histone modification marks and transcription factor binding sites across multiple cell types, with up to 20-fold enrichment. Additionally, two histone modification marks associated with gene activation showed brain-specific enrichment across multiple cell types (e.g., for H3K79me2, p-values ranged from 3.29e-03 to 7.83e-07).
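The enrichment analysis compares how often SNVs passing a p-value threshold fall inside an annotation (for example, a histone-mark peak or a transcription factor binding site) with how often all tested SNVs do. The abstract does not state which statistical test was used, so the following is only a minimal sketch under assumptions: the function name `enrichment_test` is hypothetical, overlap with the annotation is assumed to be precomputed as one boolean per SNV, and a one-sided Fisher's exact test (hypergeometric upper tail) stands in for whatever test the authors applied.

```python
from math import comb


def enrichment_test(pvals, in_annotation, threshold):
    """Hypothetical sketch: fold enrichment and one-sided Fisher's exact p-value.

    pvals: GWAS p-values, one per SNV.
    in_annotation: booleans, True if the SNV overlaps the annotation
        (assumed to be precomputed, e.g. by intersecting SNV positions
        with peak intervals).
    threshold: SNVs with p < threshold form the "associated" set.
    """
    n_total = len(pvals)
    n_annot = sum(in_annotation)  # annotated SNVs among all tested SNVs
    sig_flags = [a for p, a in zip(pvals, in_annotation) if p < threshold]
    n_sig = len(sig_flags)        # SNVs passing the threshold
    k = sum(sig_flags)            # ...of which lie in the annotation
    if n_sig == 0 or n_annot == 0:
        return float("nan"), float("nan")

    # Fold enrichment: annotated fraction among associated SNVs
    # divided by the annotated fraction among all SNVs.
    fold = (k / n_sig) / (n_annot / n_total)

    # One-sided Fisher's exact test: P(X >= k) for
    # X ~ Hypergeometric(n_total, n_annot, n_sig).
    upper = min(n_annot, n_sig)
    tail = sum(
        comb(n_annot, x) * comb(n_total - n_annot, n_sig - x)
        for x in range(k, upper + 1)
    )
    p_value = tail / comb(n_total, n_sig)
    return fold, p_value


# Toy data (invented for illustration only): 10 SNVs, 4 of them annotated.
pvals = [0.001, 0.003, 0.02, 0.2, 0.5, 0.7, 0.04, 0.9, 0.6, 0.3]
annot = [True, True, False, False, False, True, True, False, False, False]
fold, p = enrichment_test(pvals, annot, threshold=0.05)
print(fold, p)
```

Repeating the call at several thresholds, or per cell type and per annotation, gives the kind of threshold-by-annotation enrichment profile summarized above; real analyses would also need to account for linkage disequilibrium between SNVs, which this sketch ignores.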
In tests of polygenic association in FinnTwin12, preferentially weighting variants that met nominal discovery-sample p-value thresholds (p < 0.01 and p < 0.05) and were also located under a DNase I peak (i.e., in an open chromatin region and therefore likely to have a regulatory function) produced a stronger association signal than traditional polygenic scoring, which uses all variants meeting these nominal p-value thresholds. Additional analyses indicate that the per-SNV effect is larger for variants under a DNase I peak than for SNVs filtered by p-value only. CONCLUSIONS. The results add to emerging evidence that disease-associated variants are stratified by functional annotation, underscoring the importance of incorporating genomic annotation information when interpreting GWAS results. Discussion will focus on future directions in this area, including the use of bioinformatics information to improve polygenic risk prediction and identify new genetic signals, and the implications of this biological understanding for studies of gene-environment interplay in alcohol use disorder.
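A traditional polygenic score sums, over SNVs passing a discovery p-value threshold, each individual's allele dosage multiplied by the discovery effect size. The abstract does not specify how DNase I peak variants were "preferentially weighted", so the sketch below is a hypothetical illustration: the `SNV` record, the `polygenic_score` function, and the `peak_weight` and `nonpeak_weight` multipliers are all assumptions. With both multipliers at 1.0 it reduces to the traditional score; raising `peak_weight` upweights open-chromatin SNVs, and setting `nonpeak_weight` to 0.0 keeps only SNVs under a peak.

```python
from dataclasses import dataclass


@dataclass
class SNV:
    beta: float     # effect size from the discovery GWAS
    pval: float     # discovery p-value
    in_dnase: bool  # True if the SNV lies under a DNase I peak


def polygenic_score(dosages, snvs, p_threshold,
                    peak_weight=1.0, nonpeak_weight=1.0):
    """Hypothetical annotation-weighted polygenic score for one individual.

    dosages: allele dosages (0, 1 or 2) aligned with `snvs`.
    peak_weight / nonpeak_weight: ASSUMED multipliers, not taken from the
        study; 1.0 / 1.0 gives the traditional p-value-thresholded score.
    """
    score = 0.0
    for g, s in zip(dosages, snvs):
        if s.pval >= p_threshold:
            continue  # SNV does not pass the discovery p-value threshold
        w = peak_weight if s.in_dnase else nonpeak_weight
        score += w * s.beta * g
    return score


# Toy discovery results and one individual's dosages (invented values).
snvs = [SNV(0.2, 0.001, True), SNV(0.1, 0.03, False), SNV(-0.3, 0.2, True)]
dosages = [2, 1, 1]

traditional = polygenic_score(dosages, snvs, 0.05)
upweighted = polygenic_score(dosages, snvs, 0.05, peak_weight=2.0)
peak_only = polygenic_score(dosages, snvs, 0.05, nonpeak_weight=0.0)
print(traditional, upweighted, peak_only)
```

In a full analysis each scoring scheme would be computed for every FinnTwin12 participant and the phenotype regressed on the resulting scores, so that the strength of association can be compared across schemes; dividing a score's effect by the number of SNVs it includes is one way to approach the per-SNV comparison mentioned above.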