Friday, February 14, 2014

G-to-T transversions account for Szybalski's Rule

Previously, I described a data-mining experiment showing that the purine content of coding regions of DNA increases in concert with the total A+T content of DNA, not only for bacterial chromosomes (see graph below) but also for mitochondrial and eukaryotic DNA. The general finding of excess purines on message strands of DNA is known as Szybalski's Rule. What I found is that the accumulation of purines on message strands is not random; it depends, in a predictable way, on the overall A+T content of the genome. For genomes in which A+T content is below 35% (G+C above 65%), purines actually accumulate on the transcribed (or antisense) strand of DNA, rather than the message strand.
Figure: Purine content versus A+T content for coding-region DNA in 1363 bacterial species.

Codon analysis shows unambiguously that as genes become richer in A+T content (or as G+C content goes down), the excess of purines on the message strand becomes larger and larger.

The increase in purine content can be accounted for almost exactly by G-to-T transversions. That is to say, essentially all of the excess adenine on the message strand can be accounted for by loss of guanines on the transcribed strand.

An example will make this clear. Collectively, the coding regions of Streptomyces cattleya contain bases in the following relative amounts (percent of all coding-region bases, message strand):

A: 13.59
G: 35.30
C: 37.85
T: 13.23

S. cattleya happens to fall exactly on the regression line in the above graph, at A+T = 0.2682 and A+G = 0.4889. The spirochete Borrelia burgdorferi (strain 118a) also falls on the regression line, at A+T = 0.7117 and A+G = 0.5536. It has coding-region base contents of:

A: 38.71
G: 16.65
C: 12.17
T: 32.46
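
Since both organisms are said to sit on the regression line, the two quoted points are enough to sketch that line and estimate where the purine fraction crosses 50%. The following is a minimal sketch under that assumption; it uses only the two points given in the text, not the full 1363-species data set, and the function name `line_through` is purely illustrative.

```python
# Points (A+T fraction, A+G fraction) quoted in the text for two organisms
# that are described as lying on the regression line.
s_cattleya = (0.2682, 0.4889)
b_burgdorferi = (0.7117, 0.5536)

def line_through(p, q):
    """Return (slope, intercept) of the straight line through points p and q."""
    slope = (q[1] - p[1]) / (q[0] - p[0])
    intercept = p[1] - slope * p[0]
    return slope, intercept

slope, intercept = line_through(s_cattleya, b_burgdorferi)

# A+T fraction at which purines make up exactly half of the message strand.
crossover_at = (0.5 - intercept) / slope

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"A+G = 0.5 at A+T = {crossover_at:.3f}")
```

With these two points, the crossover lands near A+T = 0.34 (G+C near 0.66). Below that A+T content the message strand carries fewer purines than pyrimidines; above it, more.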

The transcribed strand of S. cattleya can be inferred to have an average guanine content of 37.85%, since the message strand has a cytosine content of 37.85%. In B. burgdorferi, transcribed-strand guanine has dropped to 12.17%, a difference of 25.68 percentage points. On the message strand, adenine content goes from 13.59% for S. cattleya to 38.71% for B. burgdorferi, a difference of 25.12 percentage points. The implication is that if organisms evolve along the general path of the regression line in the above graph, essentially all of the increase in message-strand adenine content can be accounted for by the loss in transcribed-strand guanine content: the two changes differ by only 0.56 percentage points, or about two parts per hundred. Other organisms show a similar pattern, with the change in transcribed-strand guanine closely matching the change in message-strand adenine.
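
The arithmetic in this comparison is easy to reproduce. The sketch below simply re-derives the quoted figures from the two base-composition tables; the variable and function names are illustrative, and the only biological assumption is the one already stated in the text, namely that message-strand cytosine equals transcribed-strand guanine.

```python
# Coding-region base composition (percent of message-strand bases),
# copied from the two tables above.
s_cattleya = {"A": 13.59, "G": 35.30, "C": 37.85, "T": 13.23}
b_burgdorferi = {"A": 38.71, "G": 16.65, "C": 12.17, "T": 32.46}

def at_fraction(comp):
    """A+T content as a fraction of all bases."""
    return (comp["A"] + comp["T"]) / 100

def purine_fraction(comp):
    """A+G (purine) content as a fraction of all bases."""
    return (comp["A"] + comp["G"]) / 100

# Message-strand C pairs with transcribed-strand G, so the drop in
# message-strand C is the drop in transcribed-strand G.
guanine_loss = s_cattleya["C"] - b_burgdorferi["C"]
# Gain in message-strand adenine between the two organisms.
adenine_gain = b_burgdorferi["A"] - s_cattleya["A"]
# How far apart the two changes are, relative to the guanine loss.
relative_gap = (guanine_loss - adenine_gain) / guanine_loss

for name, comp in [("S. cattleya", s_cattleya), ("B. burgdorferi", b_burgdorferi)]:
    print(f"{name}: A+T = {at_fraction(comp):.4f}, A+G = {purine_fraction(comp):.4f}")
print(f"transcribed-strand G loss: {guanine_loss:.2f} points")
print(f"message-strand A gain:     {adenine_gain:.2f} points")
print(f"relative gap:              {relative_gap:.1%}")
```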

These numbers imply that guanines on one strand can become adenines on the other strand, which is exactly what happens in G-to-T transversion mutations. These occur through a well-known mechanism: guanine is oxidized to 8-oxo-guanine, which pairs with adenine during replication, and the 8-oxo-guanine is subsequently replaced by thymine.

Of the four bases in DNA, guanine is well known to be the base most vulnerable to oxidation. Accordingly, oxidation-driven G-to-T transversions are the most common type of transversion. Mutations of this type can cause a shift in overall DNA G+C content (toward higher A+T content). If G-to-T mutations occur preferentially on one DNA strand, the result will be accumulation of adenine on the opposite strand. This is what happens in nature, apparently. Differential repair of DNA strands at transcription time drives the accumulation of purines on the message strand. (See this post for additional discussion, with data, of how the unique repairosome of obligate anaerobes affects differential strand buildup of purines.)
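
To make the strand bookkeeping concrete, here is a toy sketch of strand-biased G-to-T mutation. It is an illustration only, not a model of real mutation or repair rates: the sequence is made up, the rule "mutate every second G on the transcribed strand" is an arbitrary deterministic stand-in for a strand-specific bias, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def g_to_t_on_transcribed(message, every=2):
    """Change every `every`-th G on the transcribed strand to T, then
    return the message strand implied by the mutated transcribed strand
    (i.e. after the mutation is fixed in both strands)."""
    transcribed = list(reverse_complement(message))
    seen = 0
    for i, base in enumerate(transcribed):
        if base == "G":
            seen += 1
            if seen % every == 0:
                transcribed[i] = "T"
    return reverse_complement("".join(transcribed))

def purine_fraction(seq):
    """Fraction of bases that are A or G."""
    return (seq.count("A") + seq.count("G")) / len(seq)

message = "ATGCCGCTCGGCACCCGTCTGGCCTGA"  # hypothetical GC-rich toy sequence
mutated = g_to_t_on_transcribed(message)

print("before:", message, f"purines = {purine_fraction(message):.3f}")
print("after: ", mutated, f"purines = {purine_fraction(mutated):.3f}")
print("message-strand A gained:", mutated.count("A") - message.count("A"))
print("message-strand C lost:  ", message.count("C") - mutated.count("C"))
```

Each G-to-T change on the transcribed strand shows up as a C-to-A change on the message strand, so message-strand adenine rises by exactly the number of transcribed-strand guanines lost, while message-strand G and T are untouched.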

Substantial work has shown that an AT mutational bias (a tendency for G:C pairs to become A:T pairs) exists in bacteria, even for organisms at the extremes of genome G+C content. This is usually taken to mean that GC-to-AT transition mutations are more common than AT-to-GC transitions. Such discussions need to include GC-to-TA transversion mutations, as well. The most common form of DNA damage is oxidation of guanine to 8-oxo-guanine. This strongly suggests that G-to-T transversions are an important driver of changes in genomic G+C content; and combined with asymmetric strand repair, the predominance of such mutations provides a theoretical basis (which has heretofore been lacking) for Szybalski's Rule.