Quantification and detection of genetic risk factors in the familial aggregation of cancer: Habilitation thesis

Research output: Thesis › Doctoral Thesis (compilation)


Cancer is a multifactorial inheritance disorder caused by a combination of small variations in genes, often acting together with environmental factors. Genome-wide association studies (GWASs) provide a potential method to investigate the genetic basis of such complex diseases, but they are limited to two main classes of cancer susceptibility variants with different levels of risk and prevalence in the general population: rare moderate-penetrance variants and common low-penetrance alleles. While a subgroup of risk alleles contributing to disease heritability has been detected, the underlying biological constraints are not known, and estimates are needed to predict the number of common variants not yet associated with the disease. The remaining loci may explain the majority of the familial recurrence of these cancers. Homozygosity mapping has the additional advantage that it can identify rare recessive variants that are potentially causal for different cancers. Functional inferences may help to ascertain whether causal genetic variants are also methylation quantitative trait loci (mQTLs) and may influence disease risk partly via epigenetic pathways.
This postdoctoral thesis aims to quantify and identify genetic risk factors in different cancers. Using genome-wide association data on Hodgkin's lymphoma (HL) and considering all single-nucleotide polymorphisms (SNPs) simultaneously, we estimated that heritability accounts for 35% of the total variance. A validation on Swedish population data from the multigenerational pedigree showed a heritability of 40%. The same methodological approach was applied to testicular germ cell tumors. A recent follow-up meta-analysis of two genome-wide association studies (GWASs) and a new HL study set then identified additional key regulators of disease susceptibility for HL. The homozygosity burden was used to investigate the genetic architecture of HL, thyroid cancer (TC) and breast cancer among specific populations, to assess inbreeding and to identify additional risk loci. A GWAS also identified 10 new risk loci for the development of monoclonal gammopathy of undetermined significance (MGUS). Finally, the complex connection between SNPs and the epigenetic regulation of gene expression was investigated to address whether methylation quantitative trait loci (mQTLs) exist in the vicinity of smoking-related CpG sites.
Common genetic variation in Hodgkin Lymphoma
GWASs have identified several risk loci for HL and demonstrated that common genetic variation contributes to this cancer. Evidence for inherited genetic risk is also provided by family history and the very high concordance between monozygotic twins. Little is known, however, about the relative genetic and environmental contributions. A common measure for describing the phenotypic variation due to genetics is the heritability. Using GWAS data on 906 HL cases and considering all typed SNPs simultaneously, we calculated that heritability accounts for ~35% of the total variation in HL (95% confidence interval 6–62%). These findings are consistent with a similar heritability estimate of ~40% (95% confidence interval 17–58%) based on Swedish population data. Our estimates support the underlying polygenic basis for susceptibility to HL and show that heritability based on the population data is somewhat larger than heritability based on the genomic data, possibly because some heritability is missing from the GWAS data. In addition, there is still strong evidence that multiple loci contributing to HL susceptibility on chromosomes other than chromosome 6 remain to be detected.
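Estimating heritability from all typed SNPs at once usually starts from a matrix of pairwise genomic similarity between individuals. The abstract does not name the software or estimator used, so the following is only a minimal sketch of one common first step, a genomic relationship matrix built from standardized genotypes. The function name, the 0/1/2 genotype coding and the toy data are assumptions made for illustration, and the subsequent variance-component fit is not shown.

```python
from math import sqrt
from typing import List


def genomic_relationship_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Build a genomic relationship matrix (illustrative sketch).

    genotypes[i][j] is the allele count (0, 1 or 2) of individual i at SNP j.
    Entry [i][k] is the mean over SNPs of the product of the standardized
    genotypes of individuals i and k.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each SNP, estimated from the sample itself.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Monomorphic SNPs have zero variance and cannot be standardized.
    used = [j for j in range(m) if 0.0 < freqs[j] < 1.0]
    if not used:
        raise ValueError("no polymorphic SNPs in the input")
    # Centre by 2p and scale by sqrt(2p(1-p)).
    z = [
        [(row[j] - 2 * freqs[j]) / sqrt(2 * freqs[j] * (1 - freqs[j])) for j in used]
        for row in genotypes
    ]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / len(used) for k in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy = [[0, 2], [2, 0]]  # two individuals, two SNPs (made-up data)
    for row in genomic_relationship_matrix(toy):
        print(row)
```

In a real analysis the matrix would be computed over hundreds of thousands of SNPs with a dedicated tool, and the heritability would then be estimated by relating phenotypic similarity to these pairwise relationships.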
Effects of homozygosity and inbreeding in different cancers
GWASs have also identified several SNPs associated with the risk of thyroid cancer (TC). Most cancer risk genes identified so far function in a co-dominant manner, and studies have not found evidence for recessively acting disease loci in TC. Data from a previously conducted GWAS were used to estimate the heritability, detect runs of homozygosity (ROHs) and determine the level of inbreeding in order to unravel their influence on TC. Inbreeding was significantly higher among cases than controls. The size, number and length of ROHs per person were also higher in cases than in controls. Sixteen recurrent ROHs were identified, several of which harbor genes associated with the risk of TC. The results support the existence of recessive alleles in TC susceptibility. The same approach was applied to HL and breast cancer.
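The idea behind ROH detection and an ROH-based inbreeding measure can be illustrated with a short sketch. It assumes genotypes coded as 0, 1 or 2 copies of an allele (1 meaning heterozygous), strictly consecutive homozygous calls, and hypothetical thresholds. The abstract does not describe the actual detection settings, and production tools typically also tolerate occasional heterozygous or missing calls.

```python
from typing import List, Tuple


def find_roh(
    positions: List[int],
    genotypes: List[int],
    min_snps: int = 3,
    min_length_bp: int = 200,
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) of each run of consecutive homozygous SNPs.

    positions must be sorted; genotypes[i] is 0, 1 or 2 (1 = heterozygous).
    The thresholds are illustrative defaults, not values from the thesis.
    """
    runs: List[Tuple[int, int]] = []
    start_idx = None
    last = len(genotypes) - 1
    for i, g in enumerate(genotypes):
        homozygous = g in (0, 2)
        if homozygous and start_idx is None:
            start_idx = i  # a new run begins here
        if start_idx is not None and (not homozygous or i == last):
            end_idx = i if homozygous else i - 1
            n_snps = end_idx - start_idx + 1
            length = positions[end_idx] - positions[start_idx]
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[start_idx], positions[end_idx]))
            start_idx = None  # the run is closed
    return runs


def f_roh(runs: List[Tuple[int, int]], genome_length_bp: int) -> float:
    """Fraction of the analysed genome covered by ROHs."""
    return sum(end - start for start, end in runs) / genome_length_bp


if __name__ == "__main__":
    pos = [100, 200, 300, 400, 500, 600, 700, 800]  # made-up coordinates
    geno = [0, 2, 2, 1, 0, 0, 0, 2]
    runs = find_roh(pos, geno)
    print(runs, f_roh(runs, genome_length_bp=1000))
```

Comparing the number, total length and genome fraction of such runs between cases and controls is the kind of contrast summarized in the paragraph above.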
The effect of methylation quantitative trait loci (mQTLs)
mQTLs are genetic variants affecting DNA methylation patterns of CpG sites. Their role in smoking-related epigenetic changes has not been well established. A study was conducted to address whether mQTLs exist in the vicinity of smoking-related CpG sites and to examine their associations with smoking exposure and all-cause mortality in older adults. We found that 70 of 151 previously reported smoking-related CpG sites were significantly associated with 192 SNPs within the 50 kb search window of each locus. The 192 mQTLs significantly influenced the active smoking-related DNA methylation changes, with percentage changes ranging from 0.01 to 18.96%. However, the identified mQTLs were not directly associated with active smoking exposure or all-cause mortality. Our findings indicate that, if not properly accounted for, mQTLs might impair the power of epigenetic-based models of smoking exposure to a certain extent. The genetic variants could be the key factor in distinguishing between heritable and smoking-induced effects on epigenome disparities.
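The candidate-selection step, collecting the SNPs that lie within the search window of each CpG site, can be sketched as follows. The sketch assumes a single chromosome, base-pair coordinates, and a window of 50 kb on either side of the CpG; the abstract does not say whether the window is one-sided or symmetric. The function name, CpG identifiers and positions are hypothetical, and the association test itself is not shown.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List


def snps_near_cpgs(
    cpg_positions: Dict[str, int],
    snp_positions: List[int],
    window_bp: int = 50_000,
) -> Dict[str, List[int]]:
    """Map each CpG id to the SNP positions within +/- window_bp of it.

    All positions are assumed to be on the same chromosome.
    """
    sorted_snps = sorted(snp_positions)
    result: Dict[str, List[int]] = {}
    for cpg_id, pos in cpg_positions.items():
        # Binary search for the inclusive window [pos - window, pos + window].
        lo = bisect_left(sorted_snps, pos - window_bp)
        hi = bisect_right(sorted_snps, pos + window_bp)
        result[cpg_id] = sorted_snps[lo:hi]
    return result


if __name__ == "__main__":
    cpgs = {"cpg_a": 100_000, "cpg_b": 500_000}  # made-up sites
    snps = [40_000, 50_000, 120_000, 150_000, 160_000, 460_000]
    print(snps_near_cpgs(cpgs, snps))
```

Each CpG and candidate SNP pair returned this way would then be tested for an association between genotype and methylation level.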
Multiple methods have been developed for the analysis of available genome-wide data, improving our understanding of cancer and explaining part of the missing heritability that we were not able to identify using simple association metrics. The merging of existing research teams and consortia has allowed us to identify novel variants and represents a significant advance in our understanding of genetic susceptibility to cancer.
Such multi-consortium efforts may become a necessity in the future to identify even more low-frequency and rare variants with small or even moderate effects. More empirical and simulation studies are needed to assess the advantages and disadvantages of some of the techniques presented here and to delineate their optimal application and the interpretation of the results they produce.
With the advent of next-generation sequencing (NGS) technology, sequencing costs are dropping continuously, making its use feasible in genetic research. Although it is a fairly new methodology, it has already been successful in elucidating the causal variants underlying many Mendelian diseases and will continue to do so. In addition, NGS is expected to enhance the study of complex diseases, as it holds the promise of filling the missing heritability gap by incorporating rare variants into the analysis. NGS also has the potential to serve as a short bridge between associated regions and causative polymorphisms, a path that has borne little fruit in GWASs.
Whole-genome and whole-exome sequencing are two different strategies that can be used in sequencing studies. Whole-genome association studies are not yet common, as the price of sequencing is still not low enough to enable them. One alternative is to restrict sequencing to the protein-coding regions, known as the exome, which accounts for only 1% of the genome and is suspected to harbor a high proportion of risk variants. Individuals can be preferentially chosen from families enriched for cases or from the extreme tails of the phenotype distribution, further enhancing the strategy. However, as prices continue to decrease, the tendency is to move to whole-genome sequencing, as it allows a more comprehensive analysis of the genome.
Although promising, the analysis of the large amount of data made available through NGS will not be a trivial task. Although sequencing costs are decreasing, the computational costs of storing, analyzing and maintaining these data are still substantial. Differentiating between pathogenic and non-pathogenic associations will be complicated by the large number of detected variants. Because of the possibility of sequencing errors, positive findings will benefit from confirmation with a different sequencing technology, especially when the analysis suggests putative de novo mutations. Sample size will continue to matter, as large data sets will be required to identify variants with small effects.
Original language: English
Awarding Institution
  • University Hospital Heidelberg
  • Hemminki, Kari, Supervisor
Award date: 2020 Jan 17
Publication status: Published - 2019 Dec 1
Externally published: Yes

Subject classification (UKÄ)

  • Public Health, Global Health, Social Medicine and Epidemiology

Free keywords

  • Epidemiology
  • genome-wide association studies
