Abstract
The human major histocompatibility complex (MHC) region has been shown to be associated with numerous diseases. However, pinpointing the causal variants underlying these associations remains challenging because of the extreme complexity of the region. We therefore sequenced the entire 5-Mb MHC region in 20,635 individuals of Han Chinese ancestry (10,689 controls and 9,946 patients with psoriasis) and constructed a Han-MHC database that includes high-accuracy variant calls and HLA gene typing results. We further identified multiple independent new psoriasis susceptibility loci in HLA-C, HLA-B, HLA-DPB1 and BTNL2, as well as an associated intergenic variant, rs118179173, and confirmed the well-established risk allele HLA-C*06:02. We anticipate that our Han-MHC reference panel, built by deep sequencing of a large number of samples, will serve as a useful tool for investigating the role of the MHC region in a variety of diseases and thus advance understanding of their pathogenesis.
Accession codes
Acknowledgements
We thank the faculty and staff at Anhui Medical University and BGI-Shenzhen who contributed to the Han-MHC project. We acknowledge grant support from the Key Program of the National Natural Science Foundation of China (81130031), the National Science Fund for Excellent Young Scholars (81222022), the Top-Notch Young Talents Program of China, the Pre-National Basic Research Program of China (973 plan; 2012CB722404), the National Natural Science Foundation of China (81573035, 81273301, 81271747, 81370044, 8157120504 and 81502713), the Natural Science Foundation of Anhui Province (1508085JGD05), the Program of Outstanding Talents of Anhui Medical University and the Shenzhen municipal government of China (CXZZ20140904154910774).
Author information
Contributions
Xuejun Zhang and Jun Wang conceived the study and designed scientific objectives. Xuejun Zhang, Jun Wang, L.S., L.H., J. Liu and X. Zuo participated in the study design. L.S. and H.C. led the project and manuscript preparation. H.C., T.Z. and Xiaomin Liu managed the project. Xiaoguang Zhang, R.X., B.L., G.C., C.S., C. Zhu, X.F., M.Y., C. Zhang, L.Y., M.C., L.T., L.W., Y.X., S.Z., G.L., L.Z., Y. Wu, Z.Z., Y. Cui, Z.W., C. Yang, P.W., L.X., X.C., A.Z., X.G., F. Zhang, J.X., M.Z., J. Zheng, J. Zhang, X. Yu and S.Y. conducted sample selection and data management, undertook recruitment, collected phenotype data, undertook related data handling and calculations, managed recruitment and obtained biological samples. H.J., F.X., Xiao Liu, J. Wu and J. Li generated the sequence data. T.Z., Xiaomin Liu, Yuanwei Zhang, X.J., J.M., Q.L., J.S., X. Zhuang, H.S., Yijie Zhang, Y. Wang, H.X., M.B., Y. Chen, W.C., H.Y., Jian Wang and C. Ye performed polymorphism analysis and constructed the Han-MHC database. F. Zhou, H.C., T.Z., Xiaoguang Zhang, Xiaomin Liu, G.C., Yuanwei Zhang, X. Zheng, J.G., Y.S., X. Yin, Jianan Wang, T.K., X.X., Y.L. and L.H. conducted the association analysis. F. Zhou, H.C., X. Zuo, T.Z., Xiaoguang Zhang and Xiaomin Liu did most of the writing with contributions from all authors. All authors contributed to the final manuscript, with Xuejun Zhang, Jun Wang, L.S., L.H., F. Zhou, H.C., X. Zuo, T.Z., Xiaoguang Zhang and Xiaomin Liu having key roles.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Distribution of healthy control samples included in the MHC sequencing study.
In total, 10,689 healthy individuals were recruited at the sequencing stage.
Supplementary Figure 2 The average depth distribution along the MHC region of the sequenced samples.
Supplementary Figure 4 Distribution of the number of variants in 10,689 individuals.
The average number of SNPs per sample was close to 15,000, and the average number of indels was close to 2,000.
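The per-sample averages above can be illustrated with a small counting sketch. It assumes a hypothetical in-memory layout in place of a parsed VCF (biallelic sites only, genotypes stored as alt-allele counts) and counts a site for a sample when that sample carries at least one alternate allele; this is one plausible reading of "number of variants" per individual, not the authors' pipeline.

```python
from statistics import mean

# Hypothetical in-memory layout standing in for a parsed VCF: each site is
# (ref, alt, genotypes), where genotypes maps sample ID -> alt-allele count
# (0, 1 or 2). Only biallelic sites are handled.

def variant_kind(ref: str, alt: str) -> str:
    """Classify a site as 'snp' (both alleles one base) or, otherwise, 'indel'."""
    return "snp" if len(ref) == 1 and len(alt) == 1 else "indel"


def per_sample_counts(sites, samples):
    """For each sample, count SNP and indel sites where it carries >= 1 alt allele."""
    counts = {s: {"snp": 0, "indel": 0} for s in samples}
    for ref, alt, genotypes in sites:
        kind = variant_kind(ref, alt)
        for s in samples:
            if genotypes.get(s, 0) > 0:  # missing calls are treated as reference
                counts[s][kind] += 1
    return counts


def average_counts(counts):
    """Mean SNP and indel counts across samples."""
    return (mean(c["snp"] for c in counts.values()),
            mean(c["indel"] for c in counts.values()))
```

For example, with three sites (two SNPs and one deletion) and two samples, `average_counts(per_sample_counts(sites, ["s1", "s2"]))` returns the mean SNP and indel counts over those two samples.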
Supplementary Figure 5 Allelic frequencies of the eight most polymorphic genes.
Allelic frequencies in the Chinese population are given for HLA-B, HLA-DRB1, HLA-C, HLA-A, MICA, HLA-DQB1, HLA-DQA1 and HLA-DPB1, ordered from most to least diverse. Only the top 20 alleles for each gene are shown.
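The caption does not say how diversity was measured. The sketch below assumes expected heterozygosity (1 − Σp², also called gene diversity) as the ranking measure, computes allele frequencies from a list of allele calls per gene (two per individual) and keeps the most frequent `top_n` alleles, mirroring the top-20 display.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Frequency of each allele among a list of allele calls (two per individual)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}


def expected_heterozygosity(freqs):
    """Gene diversity 1 - sum(p^2); an assumed stand-in for 'diversity'."""
    return 1.0 - sum(p * p for p in freqs.values())


def summarize(calls_by_gene, top_n=20):
    """Return (gene, diversity, top alleles) tuples, most diverse gene first."""
    summary = []
    for gene, alleles in calls_by_gene.items():
        freqs = allele_frequencies(alleles)
        top = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        summary.append((gene, expected_heterozygosity(freqs), top))
    summary.sort(key=lambda row: row[1], reverse=True)
    return summary
```

Counting distinct alleles would be a simpler alternative ranking; expected heterozygosity is used here because it also reflects how evenly frequencies are spread.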
Supplementary Figure 6 MHC haplotype frequency distribution.
Haplotypes are given as HLA A-B-C-DRB1-DQB1. Of the extended haplotypes, HLA A30-B13-C06-DR07-DQ02 was the most common haplotype with a frequency of 3.89%.
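Assuming HLA types have already been phased into two haplotypes per individual (the phasing step is not shown), a haplotype frequency like the 3.89% above is the haplotype's count divided by twice the number of individuals. The sketch illustrates that counting step; the allele labels are illustrative, not study data.

```python
from collections import Counter

# Loci in the order used for the A-B-C-DRB1-DQB1 haplotype label.
LOCI = ("HLA-A", "HLA-B", "HLA-C", "HLA-DRB1", "HLA-DQB1")


def haplotype_key(hap):
    """Join the allele at each locus into one A-B-C-DRB1-DQB1 label."""
    return "-".join(hap[locus] for locus in LOCI)


def haplotype_frequencies(individuals):
    """individuals: list of (hap1, hap2) phased haplotypes, each a dict locus -> allele."""
    counts = Counter(haplotype_key(h) for pair in individuals for h in pair)
    total = 2 * len(individuals)  # two haplotypes per individual
    return {k: n / total for k, n in counts.items()}
```

The most common haplotype is then `max(freqs, key=freqs.get)`.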
Supplementary Figure 7 Pearson correlation coefficient (R) between different geographical regions.
(a–c) Results are shown for SNPs (a), amino acids and HLA types (b) and MHC haplotypes (c). Green, southern versus central China; black, northern versus central China; blue, southern versus northern China.
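A regional comparison of this kind can be illustrated with a plain Pearson correlation between the frequency vectors of two regions. In this sketch, a feature (SNP allele, HLA type or haplotype) observed in only one region is assigned frequency 0 in the other; that alignment rule is an assumption, not necessarily the authors' procedure.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def regional_r(freqs_a, freqs_b):
    """Correlate the frequencies of the same features in two regions.

    Features absent from one region are assumed to have frequency 0 there.
    """
    keys = sorted(set(freqs_a) | set(freqs_b))
    return pearson_r([freqs_a.get(k, 0.0) for k in keys],
                     [freqs_b.get(k, 0.0) for k in keys])
```

A value of R close to 1, as with proportional frequencies in two regions, indicates that the regions share a very similar frequency spectrum.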
Supplementary Figure 8 The frequency of MHC haplotypes.
Only haplotypes with a frequency of ≥0.5% in the Han Chinese population are shown. The red line represents the overall frequency in the Han Chinese population. The blue, green and brown lines represent the frequency in northern, central and southern Han Chinese populations, respectively.
Supplementary Figure 11 The Pearson r² value between imputed and standard alleles of the five classical HLA genes at different allele frequencies.
The mean r² value is 0.97 for common alleles, 0.93 for low-frequency alleles and 0.81 for rare alleles.
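One common way to obtain per-allele r² values like these is to square the Pearson correlation between imputed dosages and reference allele counts for each allele, then average within frequency classes. The caption does not state the class boundaries, so the cut-offs below (common ≥5%, low-frequency 1–5%, rare <1%) are assumptions.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def frequency_class(freq):
    """Assumed cut-offs: common >= 5%, low-frequency 1-5%, rare < 1%."""
    if freq >= 0.05:
        return "common"
    if freq >= 0.01:
        return "low-frequency"
    return "rare"


def mean_r2_by_class(alleles):
    """alleles: dict allele -> (imputed_dosages, true_counts), values 0..2 per individual.

    Returns the mean squared Pearson correlation within each frequency class.
    """
    by_class = {}
    for imputed, true in alleles.values():
        freq = sum(true) / (2 * len(true))  # allele frequency from reference calls
        r2 = pearson_r(imputed, true) ** 2
        by_class.setdefault(frequency_class(freq), []).append(r2)
    return {cls: mean(vals) for cls, vals in by_class.items()}
```

Rare alleles are where imputation errors weigh most heavily on r², which is consistent with the lower mean reported for that class.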
Supplementary Figure 12 Functional annotation of an identified intergenic variant (rs118179173).
The first six rows show H3K4me3 (green) data for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells, Treg primary cells and the GM12878 cell line (B lymphocyte, lymphoblastoid; International HapMap Project CEPH/Utah, European Caucasian, Epstein–Barr virus). The next five rows show H3K27ac (blue) data for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells and the GM12878 cell line. The following six rows show H3K36me3 (green) data for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells, Treg primary cells and the GM12878 cell line. Chromatin states are displayed for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells and the GM12878 cell line. The detailed color schemes for the chromatin states are listed below. Briefly, red corresponds to active transcriptional start sites (TSSs), yellow to enhancers, green to transcription and white to quiescent regions. DNase I hypersensitivity sites are shown for CD4+ primary cells, CD8+ primary cells, CD14+ monocytes, Treg, TH1 and TH2 cells, and the GM12878 cell line. All data are publicly available from ENCODE and NIH Roadmap. Raw data were plotted using the website http://epigenomegateway.wustl.edu/browser/.
Supplementary Figure 13 Three-dimensional ribbon model for HLA-B.
Key amino acid positions identified in psoriasis association analysis are highlighted.
Supplementary Figure 15 The effect of purifying selection on variants.
(a) The relationship between MAF and functional prediction scores from SIFT, PolyPhen-2, LRT and MutationTaster. Each prediction score was significantly negatively correlated with MAF in the studied samples. (b) The rare-variant excess in most categories of functional sequence, which varies systematically between categories (for example, transcription factor motif variants show a higher rare-variant excess than splicing variants). Interestingly, the least conserved nonsynonymous variants carry rare-variant loads similar to those of UTR and synonymous variants, suggesting that these alternative transcripts are under very weak selective constraint.
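The two panels can be illustrated with a short sketch. It assumes a Spearman rank correlation for panel (a) and defines rare-variant excess for panel (b) as the fraction of rare variants (MAF < 0.5%, an assumed threshold) in a functional category divided by the same fraction in a baseline set of variants. Neither the test nor the definition is taken from the source.

```python
from math import sqrt


def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman rank correlation (assumed test for MAF vs. prediction score)."""
    return pearson_r(ranks(x), ranks(y))


def rare_fraction(mafs, threshold=0.005):
    """Fraction of variants with MAF below the (assumed) rare threshold."""
    return sum(m < threshold for m in mafs) / len(mafs)


def rare_variant_excess(category_mafs, baseline_mafs, threshold=0.005):
    """Assumed definition: rare fraction in a category over that of a baseline set."""
    return rare_fraction(category_mafs, threshold) / rare_fraction(baseline_mafs, threshold)
```

SIFT assigns lower scores to more damaging substitutions, whereas PolyPhen-2 assigns higher scores, so scores may need to be oriented consistently before the signs of the correlations are compared.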
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–15, Supplementary Tables 2, 4, 6, 7 and 9 and Supplementary Note. (PDF 2188 kb)
Supplementary Table 1
HLA type frequency. (XLSX 31 kb)
Supplementary Table 3
Selected tagging SNPs. (XLSX 33 kb)
Supplementary Table 5
MHC haplotype frequency. (XLSX 214 kb)
Supplementary Table 8
HLA types from sequencing data and the 1000 Genomes Project. (XLSX 13 kb)
Cite this article
Zhou, F., Cao, H., Zuo, X. et al. Deep sequencing of the MHC region in the Chinese population contributes to studies of complex disease. Nat Genet 48, 740–746 (2016). https://doi.org/10.1038/ng.3576