Prashanthi Govindarajan, Sathya Govindarajan & Ethirajan Govindarajan
Abstract-This paper proposes a novel concept called "Percentage Nucleotide Concentration of genomes" in terms of cellular automata evolutions of adjoints of Adenine, Thymine, Guanine, and Cytosine. The adjoints of a given genome sequence are its characteristic binary string sequences. For example, the adjoint of Adenine of a given genome sequence is a binary string of 0's and 1's in which each 1 marks the presence of Adenine at that position in the genome sequence. So, one can have four adjoint sequences, of Adenine, Thymine, Guanine, and Cytosine, corresponding to a given genome sequence. One-dimensional, three-neighborhood, binary-valued cellular automata rules can be applied to an adjoint sequence to obtain the desired number of evolutions. These rules are defined by Boolean functions of three variables, and there are 256 such functions. Nucleotide concentration is computed for an adjoint sequence, and its variation is evaluated over the successive evolutions generated by a cellular automaton rule.
The purpose of the research reported in this paper is to determine whether it is possible to categorize a set of genomes, such as the human genome repository. The concept of "% nucleotide concentration" introduced in this paper seems to show a way to accomplish this task. The concept is inspired by chemistry, where the pH value, a quantitative measure of the hydrogen ion concentration of a solution, is used to sort solutions into three classes: (i) neutral solutions such as pure water, whose pH value is 7, (ii) acidic solutions, whose pH values are less than 7, and (iii) alkaline solutions, whose pH values are greater than 7. Along the same lines, an effort was made to categorize genome sets based on four values: (i) % nucleotide concentration of Adenine (pA), (ii) % nucleotide concentration of Thymine (pT), (iii) % nucleotide concentration of Guanine (pG) and (iv) % nucleotide concentration of Cytosine (pC). It is reasonable to surmise that these values, and possibly their compositions, would categorize a given set of genomes. Section 2 of this paper describes the concept formulation.
Section 3 of this paper describes the fundamental notions of adjoints of a genome and their evolution using one-dimensional cellular automata rules defined by Boolean functions. Section 4 provides experimental results of a case study on evaluating the concentration of nucleotides in terms of the adjoints of the Brucella suis 1330 genome sequence.
Analogous to the pH value of a solution, the values of pA, pT, pG and pC of a genome sequence, and possibly compositions of these values such as the proportion pA:pT:pG:pC, seem to pave a way to classify and characterize genome sets. The definition of the "Percentage Nucleotide Concentration" of a genome sequence is given below.
Given a genome sequence, the occurrences of a particular nucleotide, say A, in that sequence are counted, and the count is divided by the total number of nucleotides in the sequence. This fraction, multiplied by 100, yields the "Percentage Concentration of Adenine", pA; that is, pA = 100 × (number of A) / (total number of nucleotides). Similarly, one can evaluate pT, pG and pC.
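The definition above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `percent_concentration` and the ten-letter toy sequence are assumptions made for the example and are not taken from the paper or from the Brucella suis 1330 data.

```python
from collections import Counter


def percent_concentration(sequence: str) -> dict[str, float]:
    """Return pA, pT, pG and pC as percentages of the sequence length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    # Count of each nucleotide divided by the total length, times 100.
    return {f"p{n}": 100.0 * counts[n] / len(seq) for n in "ATGC"}


# Hypothetical toy sequence, used only to illustrate the arithmetic.
sample = "ATGCGATACG"
print(percent_concentration(sample))
```

For a sequence made up only of A, T, G and C, the four values sum to 100.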
The adjoint of a particular nucleotide in a genome sequence is the binary sequence obtained by substituting every occurrence of that nucleotide in the genome sequence by 1 and every other nucleotide by 0. For example, let us consider a sample sequence of Brucella suis 1330 for a case study. The actual length of the genome sequence of Brucella suis 1330 is 5806.
A cellular automaton is an idealized parallel processing system consisting of an array of numbers (1-D, 2-D or higher) that is updated using rules based on a certain neighborhood. For example, a one-dimensional cellular automaton would consist of a finite-length array as shown below.
--- --- --- i-1 i i+1 --- --- ---
Consider the ith cell in the array. This cell has a neighbor i-1 on its left and another, i+1, on its right. The three together are called a three-neighborhood. One can assign site (cell) variables $\alpha_{i-1}$, $\alpha_i$ and $\alpha_{i+1}$ to the three neighborhood cells. At a particular instant of time t, these variables take on numerical values, say either 0 or 1, and are then denoted $\alpha^t_{i-1}$, $\alpha^t_i$ and $\alpha^t_{i+1}$. The value of the ith cell at the next instant of time is evaluated using an updating rule that involves the present values of the ith, (i-1)th and (i+1)th cells. This updating rule is essentially a Boolean function of three variables. There are $2^{2^3} = 256$ such functions, each of which can serve as the updating rule of a one-dimensional, three-neighborhood, binary-valued cellular automaton. Each rule defines an automaton by itself. So, one-dimensional, binary-valued, three-neighborhood cellular automata (123CA) rules could be used to model adjoints of a genome sequence. The first twenty Boolean functions of cellular automata 123CA are listed below with their decimal equivalents.
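As a worked illustration of how a rule gets its decimal equivalent, the following assumes the widely used Wolfram numbering, in which the output for each of the eight neighborhood patterns contributes one bit of the rule number. The symbols $f$, $a$, $b$, $c$ and $R$ are introduced here for the illustration only; the numbering is consistent with rule 90 being the exclusive-OR of the left and right neighbors.

```latex
% Decimal equivalent R of a rule f, where (a, b, c) are the values of
% the left, centre and right cells of the three-neighborhood.
R = \sum_{(a,b,c) \in \{0,1\}^3} f(a,b,c)\, 2^{\,4a + 2b + c}

% Rule 90 outputs 1 exactly for the patterns 110, 100, 011 and 001:
R = 2^{6} + 2^{4} + 2^{3} + 2^{1} = 64 + 16 + 8 + 2 = 90
```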
For the case study, rule 90 is applied to the adjoints of the Brucella suis 1330 genome sequence and 500 evolutions are generated. Rule 90 is shown below.
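$$\alpha_i^{t+1} = \left(\alpha_{i-1}^{t} \cdot \overline{\alpha_{i+1}^{t}}\right) + \left(\overline{\alpha_{i-1}^{t}} \cdot \alpha_{i+1}^{t}\right) \qquad \text{(rule 90)}$$
Since the image of the 500 evolutions of Brucella suis 1330 is large, only a small portion of it is presented in this paper.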
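The procedure of building an adjoint and evolving it under a 123CA rule can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the toy sequence is not Brucella suis data, the rule number is decoded using the standard Wolfram convention, and cells beyond either end of the array are assumed to be fixed at 0, since the treatment of the boundary is not stated in the text.

```python
def adjoint(sequence: str, nucleotide: str) -> list[int]:
    """Binary adjoint: 1 where `nucleotide` occurs, 0 elsewhere."""
    return [1 if base == nucleotide else 0 for base in sequence.upper()]


def step(cells: list[int], rule: int = 90) -> list[int]:
    """One synchronous update of a three-neighborhood binary automaton.

    Assumption: cells outside the array are held at 0 (null boundary).
    """
    padded = [0] + cells + [0]
    nxt = []
    for i in range(1, len(padded) - 1):
        # Encode (left, centre, right) as a 3-bit index into the rule number.
        index = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        nxt.append((rule >> index) & 1)
    return nxt


def evolve(cells: list[int], rule: int = 90, generations: int = 500) -> list[list[int]]:
    """Return the initial array followed by `generations` evolutions."""
    history = [cells]
    for _ in range(generations):
        history.append(step(history[-1], rule))
    return history


def percent_ones(cells: list[int]) -> float:
    """Percentage of 1's in a binary array."""
    return 100.0 * sum(cells) / len(cells)


# Hypothetical toy sequence; the paper uses Brucella suis 1330 instead.
a_n = adjoint("ATGCGATACG", "A")
series = [percent_ones(g) for g in evolve(a_n, rule=90, generations=3)]
print(a_n, series)
```

For rule 90 the table lookup reduces to the exclusive-OR of the left and right neighbors, which agrees with the expression given above.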
The values of pA, pT, pG and pC of the Brucella suis 1330 genome sequence are computed for the adjoints A(n), T(n), G(n) and C(n) and their 500 evolutions using 123CA rules. Fig. 1 shows the evolutions of the adjoints A(n), T(n), G(n) and C(n) under rule 90 of 123CA. The values are tabulated and the corresponding graphs are shown subsequently. Table 1 shows the pA values of A(n) of the Brucella suis 1330 genome sequence and of the 500 generations of A(n) under rule 90 of 123CA. Figs. 2 and 3 show the graphs of the variation of pA over all generations. Table 2 shows the pT values of T(n) of the Brucella suis 1330 genome sequence and of the 500 generations of T(n) under rule 90 of 123CA. Figs. 4 and 5 show the graphs of the variation of pT over all generations. Table 3 shows the pG values of G(n) of the Brucella suis 1330 genome sequence and of the 500 generations of G(n) under rule 90 of 123CA. Fig. 4 shows the graph of the variation of pG over all generations. Table 4 shows the pC values of C(n) of the Brucella suis 1330 genome sequence and of the 500 generations of C(n) under rule 90 of 123CA. Fig. 5 shows the graph of the variation of pC over all generations.
This paper proposes a novel concept called "Percentage Nucleotide Concentration of genomes" in terms of cellular automata evolutions of adjoints of Adenine, Thymine, Guanine, and Cytosine. The research reported in this paper indicates that it may be possible to categorize a set of genomes, such as the human genome repository. In short, the concept of "Percentage Nucleotide Concentration (PNC)" introduced in this paper seems to show a way to accomplish this task.
Rule 90 is applied to A(n) to produce its 500 generations. It is observed that the pA value becomes minimum at generations 1, 2, 4, 8, 16, 32, 64, 128 and 256, that is, at successive powers of two. This indicates a fractal behavior of the evolution. Min(A(n)) = 30.2965 and Max(A(n)) = 31.4502. The deviation is 1.15.
Rule 90 is applied to T(n) to produce its 500 generations. It is observed that the pT value becomes minimum at generations 1, 2, 4, 8, 16, 32, 64, 128 and 256, that is, at successive powers of two. This indicates a fractal behavior of the evolution. Min(T(n)) = 30.45126 and Max(T(n)) = 33.06924. The deviation is 2.61.
Rule 90 is applied to G(n) to produce its 500 generations. It is observed that the pG value becomes minimum at generations 1, 2, 4, 8, 16, 32, 64, 128 and 256, that is, at successive powers of two. This indicates a fractal behavior of the evolution. Min(G(n)) = 43.00723 and Max(G(n)) = 44.29900. The deviation is 1.46.
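The summary figures quoted in these remarks can be computed from a series of per-generation percentages along the following lines. This is a hedged sketch: it assumes that the "deviation" is the difference between the maximum and the minimum, which matches the values reported for A(n) (31.4502 - 30.2965 ≈ 1.15), and it reads "becomes minimum" as a local dip below both neighboring generations. The function names and the toy series are hypothetical.

```python
def summarize(values: list[float]) -> dict[str, float]:
    """Minimum, maximum and their difference for a series of percentages.

    Assumption: 'deviation' means max - min.
    """
    lo, hi = min(values), max(values)
    return {"min": lo, "max": hi, "deviation": hi - lo}


def local_minima(values: list[float]) -> list[int]:
    """Generation indices whose value lies strictly below both neighbors."""
    return [
        t
        for t in range(1, len(values) - 1)
        if values[t] < values[t - 1] and values[t] < values[t + 1]
    ]


def is_power_of_two(t: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return t > 0 and (t & (t - 1)) == 0


# Hypothetical toy series (index = generation); not data from the paper.
toy = [5.0, 3.0, 2.5, 4.0, 2.0, 6.0]
dips = local_minima(toy)
print(summarize(toy), dips, [t for t in dips if is_power_of_two(t)])
```

Applied to a series of pA, pT, pG or pC values over the generations, the last line lists the dips that fall on powers of two.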