• Transposon-derived repeats in the human genome and 5-methylcytosine-associated mutations in adjacent genes

      Bjornsson, Hans T; Ellingsen, Lotta M; Jonsson, Jon J (Elsevier/North-Holland, 2006-03-29)
      Transposon-derived repeats (TDR) represent approximately 50% of the human genome. A transposon suppression system has been proposed to explain why TDR seldom cause mutations in humans. If this system is based on DNA methylation, a correlation might exist between the amount of TDR adjacent to genes and the frequency of coding sequence mutations due to m5C deamination. To test this hypothesis, we selected 385 genes based on the availability of accurate information on their genome structure and mutation patterns (at least 10 mutations described in the Human Gene Mutation Database (HGMD)). The CENSOR program was used to estimate the amount and class of TDR for the gene region and an arbitrarily selected 1 kb from each end. We assumed all C --> T transitions to be possible 5-methylcytosine-associated mutations (MAM) and calculated the number and proportion of MAM in the 385 genes. If there is a strong correlation between methylation of certain CpX dinucleotides and TDR, we might be able to detect it despite the limitations of the data available for this analysis. We found statistically significant correlations between: i) TDR and number of MAM in genes (r = 0.118, p = 0.02), ii) SINE-TDR and proportion of CpG --> TpG (r = 0.11, p = 0.03), and, when limited to MIR elements only (r = 0.14, p = 0.006), and iii) LINE-TDR and proportion of CpT --> TpT (r = 0.166, p = 0.04). The group of genes with no TDR had a significantly lower proportion of MAM (184/479, 0.38 vs. 6466/14524, 0.46; p = 0.009), with differences noted for CpA --> TpA (35/479, 0.073 vs. 1380/11474; p = 0.003). In addition, CpT --> TpT mutations were least common in genes with no TDR (8/479, 0.017), intermediate in genes with TDR in genomic sequence but not mRNA (337/11474, 0.029), and most common in genes with TDR within mature mRNA (121/3050, 0.040; p for trend = 0.003).
Our data suggest that TDR adjacent to genes may sometimes influence methylation of cytosines in coding sequences to a degree that affects mutation patterns. These observations should be followed up with further database analysis and biochemical studies.
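
The group comparison reported above (MAM counts in genes without TDR versus genes with TDR) can be illustrated with a short calculation. The abstract does not say which statistical test the authors used, so the sketch below assumes a pooled two-proportion z-test with a two-sided p-value; the function name `two_proportion_z_test` is hypothetical, and only the counts 184/479 and 6466/14524 come from the text.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (an assumed choice of test, not
    necessarily the one used in the paper).

    Returns (p1, p2, z, two_sided_p).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return p1, p2, z, p_value


if __name__ == "__main__":
    # Counts quoted in the abstract: MAM / all mutations,
    # genes with no TDR vs. genes with TDR.
    p1, p2, z, p_value = two_proportion_z_test(184, 479, 6466, 14524)
    print(f"no TDR:   {p1:.3f}")
    print(f"with TDR: {p2:.3f}")
    print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

The same function can be applied to the other count pairs in the abstract. A chi-square test or Fisher's exact test would be equally reasonable choices and may give slightly different p-values.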