
Deep sequencing of the MHC region in the Chinese population contributes to studies of complex disease

Abstract

The human major histocompatibility complex (MHC) region has been shown to be associated with numerous diseases. However, it remains a challenge to pinpoint the causal variants for these associations because of the extreme complexity of the region. We thus sequenced the entire 5-Mb MHC region in 20,635 individuals of Han Chinese ancestry (10,689 controls and 9,946 patients with psoriasis) and constructed a Han-MHC database that includes both variants and HLA gene typing results of high accuracy. We further identified multiple independent new susceptibility loci in HLA-C, HLA-B, HLA-DPB1 and BTNL2 and an intergenic variant, rs118179173, associated with psoriasis and confirmed the well-established risk allele HLA-C*06:02. We anticipate that our Han-MHC reference panel built by deep sequencing of a large number of samples will serve as a useful tool for investigating the role of the MHC region in a variety of diseases and thus advance understanding of the pathogenesis of these disorders.
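The abstract reports case-control associations, such as the one with HLA-C*06:02. The Python sketch below shows one simple way a single-allele association can be summarized: an allelic odds ratio plus a one-degree-of-freedom Pearson chi-square P value, both computed from allele counts in cases and controls. The function name and the example counts are hypothetical. The study's own association models, including the stepwise conditional analysis shown in Figure 2, may use different tests and covariates.

```python
import math


def allelic_association(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Allelic 2x2 test (hypothetical helper, not from the paper).

    Counts are allele counts (two per individual), not individual counts.
    Assumes all four cells are non-zero so the odds ratio is finite.
    Returns (odds_ratio, chi2, p_value).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square without continuity correction:
    # chi2 = n (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value
```

For example, `allelic_association(30, 70, 10, 90)` gives an odds ratio of about 3.86 and a chi-square of 12.5 on these made-up counts.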


Figure 1: Diversity of 29 MHC genes in the Han Chinese population.
Figure 2: Plots of stepwise conditional association of the variants for psoriasis in the MHC region.
Figure 3: HLA allele frequency in Han Chinese and European populations.
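Figure 2 plots a stepwise conditional association analysis. Forward selection is one common way to carry out such an analysis, and it is what the sketch below assumes. At each step, the procedure adds the variant with the smallest likelihood-ratio P value conditional on the variants already selected. It stops once no remaining variant passes a significance threshold. The Newton-Raphson logistic fitter, the 5 × 10^-8 threshold, the absence of covariates such as principal components, and the simulated data used for testing are all illustrative assumptions rather than details taken from the paper.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson and return the log-likelihood.

    X must already contain an intercept column. Hypothetical helper.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    # log L = sum(y * eta - log(1 + exp(eta))), computed stably
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def stepwise_conditional(G, y, threshold=5e-8, max_steps=10):
    """Forward stepwise selection over variants (columns of G, coded 0/1/2).

    Returns [(variant_index, conditional_p_value), ...] in selection order.
    """
    n, m = G.shape
    selected, results = [], []
    for _ in range(max_steps):
        base = np.column_stack([np.ones(n)] + [G[:, j] for j in selected])
        ll_base = fit_logistic(base, y)
        best_j, best_p = None, 1.0
        for j in range(m):
            if j in selected:
                continue
            ll_alt = fit_logistic(np.column_stack([base, G[:, j]]), y)
            lrt = max(2.0 * (ll_alt - ll_base), 0.0)
            p = math.erfc(math.sqrt(lrt / 2.0))  # chi-square with 1 df
            if p < best_p:
                best_j, best_p = j, p
        if best_j is None or best_p >= threshold:
            break  # no remaining variant is independently significant
        selected.append(best_j)
        results.append((best_j, best_p))
    return results
```

With simulated genotypes in which only two variants affect risk, the procedure should pick up those two and then stop.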


Accession codes

Primary accessions

Sequence Read Archive


Acknowledgements

We thank the faculty and staff at Anhui Medical University and BGI-Shenzhen who contributed to the Han-MHC project. We acknowledge grant support from the Key Program of the National Natural Science Foundation of China (81130031), the National Science Fund for Excellent Young Scholars (81222022), the Top-Notch Young Talents Program of China, the Pre-National Basic Research Program of China (973 plan; 2012CB722404), the National Natural Science Foundation of China (81573035, 81273301, 81271747, 81370044, 8157120504 and 81502713), the Natural Science Foundation of Anhui Province (1508085JGD05), the Program of Outstanding Talents of Anhui Medical University and the Shenzhen municipal government of China (CXZZ20140904154910774).

Author information


Contributions

Xuejun Zhang and Jun Wang conceived the study and designed scientific objectives. Xuejun Zhang, Jun Wang, L.S., L.H., J. Liu and X. Zuo participated in the study design. L.S. and H.C. led the project and manuscript preparation. H.C., T.Z. and Xiaomin Liu managed the project. Xiaoguang Zhang, R.X., B.L., G.C., C.S., C. Zhu, X.F., M.Y., C. Zhang, L.Y., M.C., L.T., L.W., Y.X., S.Z., G.L., L.Z., Y. Wu, Z.Z., Y. Cui, Z.W., C. Yang, P.W., L.X., X.C., A.Z., X.G., F. Zhang, J.X., M.Z., J. Zheng, J. Zhang, X. Yu and S.Y. conducted sample selection and data management, undertook recruitment, collected phenotype data, undertook related data handling and calculations, managed recruitment and obtained biological samples. H.J., F.X., Xiao Liu, J. Wu and J. Li generated the sequence data. T.Z., Xiaomin Liu, Yuanwei Zhang, X.J., J.M., Q.L., J.S., X. Zhuang, H.S., Yijie Zhang, Y. Wang, H.X., M.B., Y. Chen, W.C., H.Y., Jian Wang and C. Ye performed polymorphism analysis and constructed the Han-MHC database. F. Zhou, H.C., T.Z., Xiaoguang Zhang, Xiaomin Liu, G.C., Yuanwei Zhang, X. Zheng, J.G., Y.S., X. Yin, Jianan Wang, T.K., X.X., Y.L. and L.H. conducted the association analysis. F. Zhou, H.C., X. Zuo, T.Z., Xiaoguang Zhang and Xiaomin Liu did most of the writing with contributions from all authors. All authors contributed to the final manuscript, with Xuejun Zhang, Jun Wang, L.S., L.H., F. Zhou, H.C., X. Zuo, T.Z., Xiaoguang Zhang and Xiaomin Liu having key roles.

Corresponding authors

Correspondence to Lennart Hammarström, Liangdan Sun, Jun Wang or Xuejun Zhang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Distribution of healthy control samples included in the MHC sequencing study.

In total, 10,689 healthy individuals were recruited for the sequencing stage.

Supplementary Figure 2 The average depth distribution along the MHC region of the sequenced samples.

Supplementary Figure 3 Workflow for the basic analysis pipeline used in this study.

Supplementary Figure 4 Distribution of the number of variants in 10,689 individuals.

The average number of SNPs per sample was approximately 15,000, and the average number of indels was approximately 2,000.
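Per-sample SNP and indel counts like these could be tallied as in the minimal sketch below. Each biallelic site is classified as a SNP or an indel by comparing the lengths of its REF and ALT alleles, and a sample is counted at every site where it carries at least one non-reference allele. The simplified record layout (REF, ALT and per-sample non-reference allele counts of 0/1/2) and the function names are hypothetical. A real pipeline would parse VCF files and would also have to handle multiallelic sites and missing calls.

```python
from collections import Counter


def variant_type(ref: str, alt: str) -> str:
    """Classify a biallelic variant by allele lengths (hypothetical helper)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_per_sample(records, n_samples):
    """records: iterable of (ref, alt, genotypes), where genotypes holds one
    non-reference allele count (0/1/2) per sample. Returns a Counter per sample."""
    counts = [Counter() for _ in range(n_samples)]
    for ref, alt, genotypes in records:
        vtype = variant_type(ref, alt)
        for i, g in enumerate(genotypes):
            if g > 0:  # this sample carries the variant
                counts[i][vtype] += 1
    return counts
```

The average for a category is then `sum(c["SNP"] for c in counts) / len(counts)`.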

Supplementary Figure 5 Allelic frequencies of the eight most polymorphic genes.

Allelic frequencies for HLA-B, HLA-DRB1, HLA-C, HLA-A, MICA, HLA-DQB1, HLA-DQA1 and HLA-DPB1 (sorted on the basis of diversity and arranged from highest to lowest) in the Chinese population are given in each chart. Only the top 20 alleles for each gene are shown in the diagram.
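The charts rank genes by diversity and show the 20 most frequent alleles of each gene. The sketch below computes allele frequencies from typed genotypes (two alleles per individual), keeps the top k alleles, and scores diversity as expected heterozygosity, 1 − Σp². The caption does not say which diversity measure was used, so expected heterozygosity is an assumption here, as are the function names and the placeholder gene names in the test data.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: list of (allele1, allele2) tuples, one per individual."""
    counts = Counter(a for pair in genotypes for a in pair)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def top_alleles(freqs, k=20):
    """Return the k most frequent alleles as (allele, frequency), highest first."""
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def expected_heterozygosity(freqs):
    """Assumed diversity measure: probability that two random alleles differ."""
    return 1.0 - sum(p * p for p in freqs.values())


def rank_genes_by_diversity(typed):
    """typed: dict gene -> list of genotypes. Returns genes, most diverse first."""
    scores = {g: expected_heterozygosity(allele_frequencies(gt)) for g, gt in typed.items()}
    return sorted(scores, key=scores.get, reverse=True)
```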

Supplementary Figure 6 MHC haplotype frequency distribution.

Haplotypes are given as HLA A-B-C-DRB1-DQB1. Of the extended haplotypes, HLA A30-B13-C06-DR07-DQ02 was the most common haplotype with a frequency of 3.89%.
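A frequency such as the 3.89% reported for A30-B13-C06-DR07-DQ02 is, in essence, the count of a phased five-locus haplotype divided by the number of chromosomes. The sketch below assumes the haplotypes have already been phased (this caption does not describe the phasing method) and that each chromosome is supplied as a dictionary mapping locus to two-digit allele group. The locus labels, helper names and example data are hypothetical.

```python
from collections import Counter

# (locus, label) pairs in the caption's A-B-C-DRB1-DQB1 order
LOCI = (("A", "A"), ("B", "B"), ("C", "C"), ("DRB1", "DR"), ("DQB1", "DQ"))


def haplotype_key(chromosome):
    """chromosome: dict locus -> allele group, e.g. {"A": "30", "B": "13", ...}."""
    return "-".join(label + chromosome[locus] for locus, label in LOCI)


def haplotype_frequencies(phased_chromosomes, min_freq=0.0):
    """Return {haplotype: frequency}, most frequent first, keeping f >= min_freq."""
    counts = Counter(haplotype_key(ch) for ch in phased_chromosomes)
    total = sum(counts.values())
    ranked = sorted(((h, c / total) for h, c in counts.items()), key=lambda kv: -kv[1])
    return {h: f for h, f in ranked if f >= min_freq}
```

Setting `min_freq=0.005` would reproduce the ≥0.5% display cut-off used in Supplementary Figure 8.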

Supplementary Figure 7 The Pearson coefficient R between different geographical regions.

(a–c) Results are shown for SNPs (a), amino acids and HLA types (b) and MHC haplotypes (c). Green, southern versus central China; black, northern versus central China; blue, southern versus northern China.
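Each regional comparison correlates the same set of frequencies (SNP allele, amino acid or HLA type, or haplotype frequencies) between two subpopulations. A minimal pure-Python Pearson correlation is sketched below. The frequency vectors must be aligned to the same alleles in the same order; entering an allele missing from one region as 0 is one option, and that choice is an assumption. The function name is hypothetical, and the study may have computed R with a statistics package. Python 3.10+ also provides `statistics.correlation`.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two aligned frequency vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```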

Supplementary Figure 8 The frequency of MHC haplotypes.

Only haplotypes with a frequency of ≥0.5% in the Han Chinese population are shown. The red line represents the overall frequency in the Han Chinese population. The blue, green and brown lines represent the frequency in northern, central and southern Han Chinese populations, respectively.

Supplementary Figure 9 Linkage map of HLA-DRB3, HLA-DRB4 and HLA-DRB5 with HLA-DRB1.

Supplementary Figure 10 HLA type imputation accuracy.

Supplementary Figure 11 The Pearson r² value between imputed and standard alleles of the five classical HLA genes at different allele frequencies.

The mean r² value is 0.97 for common alleles, 0.93 for low-frequency alleles and 0.81 for rare alleles.
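This accuracy summary compares imputed allele dosages with the standard allele calls and averages the squared Pearson correlation within each allele-frequency class. The sketch below makes three assumptions that the caption does not state. First, it uses the conventional class boundaries: common ≥5%, low-frequency 1–5% and rare <1%. Second, it represents each allele by per-individual dosage (0–2) vectors. Third, the function names are hypothetical. Note that `numpy.corrcoef` returns NaN when either vector is constant.

```python
import numpy as np


def r_squared(imputed, truth):
    """Squared Pearson correlation between imputed dosages and true allele counts."""
    r = np.corrcoef(np.asarray(imputed, float), np.asarray(truth, float))[0, 1]
    return float(r * r)


def frequency_class(freq):
    # Assumed cut-offs; not given in the caption.
    if freq >= 0.05:
        return "common"
    if freq >= 0.01:
        return "low-frequency"
    return "rare"


def mean_r2_by_class(alleles):
    """alleles: iterable of (allele_frequency, imputed_dosages, true_counts)."""
    by_class = {}
    for freq, imputed, truth in alleles:
        by_class.setdefault(frequency_class(freq), []).append(r_squared(imputed, truth))
    return {cls: float(np.mean(vals)) for cls, vals in by_class.items()}
```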

Supplementary Figure 12 Functional annotation of an identified intergenic variant (rs118179173).

The first six rows show H3K4me3 (green) data for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells, Treg primary cells and the GM12878 cell line (B lymphocyte, lymphoblastoid; International HapMap Project CEPH/Utah, European Caucasian, Epstein–Barr virus). The next five rows show H3K27ac (blue) data for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells and the GM12878 cell line. The following six rows show H3K36me3 (green) data for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells, Treg primary cells and the GM12878 cell line. Chromatin states are displayed for CD4+ memory primary cells, CD4+ naive primary cells, CD8+ memory primary cells, CD8+ naive primary cells and the GM12878 cell line; the detailed color scheme for the chromatin states is given in the figure. Briefly, red corresponds to active transcriptional start sites (TSSs), yellow to enhancers, green to transcription and white to quiescent regions. DNase I hypersensitivity sites are shown for CD4+ primary cells, CD8+ primary cells, CD14+ monocytes, Treg, TH1 and TH2 cells, and the GM12878 cell line. All data are publicly available from ENCODE and the NIH Roadmap Epigenomics project. Raw data were plotted using the website http://epigenomegateway.wustl.edu/browser/.

Supplementary Figure 13 Three-dimensional ribbon model for HLA-B.

Key amino acid positions identified in psoriasis association analysis are highlighted.

Supplementary Figure 14 Example of a spurious variant site.

Supplementary Figure 15 The effect of purifying selection on variants.

(a) The relationship between MAFs and functional prediction scores from SIFT, PolyPhen-2, LRT and MutationTaster. Each prediction score showed a significant negative correlation with MAF in the studied samples. (b) The rare variant excess in most functional sequences, which varies systematically between types (for example, transcription factor motif variants have higher rare variant excess than splicing variants). Interestingly, the least conserved nonsynonymous variants show similar rare variant loads to UTR and synonymous regions, suggesting that these alternative transcripts are under very weak selective constraint.
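Panel (b) refers to rare variant excess by functional category, but the caption gives no formula. The sketch below assumes one common formulation: the fraction of a category's variants that are rare (MAF below a cut-off, 0.5% here) minus the same fraction in a baseline category such as intergenic variants. Under this formulation, a larger excess is usually read as stronger purifying selection. The category labels, the cut-off and the function name are hypothetical.

```python
from collections import defaultdict


def rare_variant_excess(variants, baseline="intergenic", rare_maf=0.005):
    """variants: iterable of (category, maf) pairs.

    Returns {category: rare_fraction(category) - rare_fraction(baseline)}.
    The 0.5% rare cut-off and the baseline category are assumptions.
    """
    totals = defaultdict(int)
    rare = defaultdict(int)
    for category, maf in variants:
        totals[category] += 1
        if maf < rare_maf:
            rare[category] += 1
    if totals.get(baseline, 0) == 0:
        raise ValueError(f"no variants in baseline category {baseline!r}")
    base_frac = rare[baseline] / totals[baseline]
    return {cat: rare[cat] / totals[cat] - base_frac for cat in totals}
```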

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15, Supplementary Tables 2, 4, 6, 7 and 9 and Supplementary Note. (PDF 2188 kb)

Supplementary Table 1

HLA type frequency. (XLSX 31 kb)

Supplementary Table 3

Selected tagging SNPs. (XLSX 33 kb)

Supplementary Table 5

MHC haplotype frequency. (XLSX 214 kb)

Supplementary Table 8

HLA types from sequencing data and the 1000 Genomes Project. (XLSX 13 kb)


About this article


Cite this article

Zhou, F., Cao, H., Zuo, X. et al. Deep sequencing of the MHC region in the Chinese population contributes to studies of complex disease. Nat Genet 48, 740–746 (2016). https://doi.org/10.1038/ng.3576



  • DOI: https://doi.org/10.1038/ng.3576
