One Dimensional Cellular Automata
Prashanthi Govindarajan, Sathya Govindarajan & Ethirajan Govindarajan

I. Introduction

The four nucleotides A, T, G, and C get connected by phosphodiester bonds to form strands. Strand formation depends on innumerable factors related to inter- and intracellular parameters and functions. One cannot say precisely that a particular strand gets formed according to a particular set of rules. The infinite possibilities of strand formation cannot be determined experimentally or in the framework of classical genetics. Alternatively, one can formulate a notion of a "Language of Genomes" in which infinitely many strands are finitely specified. Fig. 1 shows a finitely generated quaternary tree structure of strand formation of nucleic acids. To be precise, Fig. 1 shows three levels of nucleotides. One can generate 64 strands of length 3. As the length increases, the number of strands increases as per the formula $4^n$, where n is the length of the strand. Strands of length three are called triplet codons or 3-tuple codons. Similarly, one can think of n-tuple codons, where n is any positive integer.
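The strand count $4^n$ can be checked with a short Python sketch. The function name `strands` and the ordering of the alphabet are illustrative assumptions, not taken from the paper; the sketch simply enumerates every path of depth n in the quaternary tree.

```python
from itertools import product

NUCLEOTIDES = "ATGC"  # assumed ordering, for illustration only


def strands(n):
    """Return all strands of length n over the four nucleotides."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=n)]


codons = strands(3)
print(len(codons))  # 64 triplet codons

# The number of strands of length n follows 4**n.
for n in range(1, 6):
    assert len(strands(n)) == 4 ** n
```

Enumeration is only practical for small n, since the list grows exponentially with the strand length.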
A genome sequence is a chain of the four nucleotides A, T, G and C. The numerical representation of a genome sequence is a sequence over the four numbers 1, 2, 3 and 4. A strand could be predicted from a subsequence of length 8 using linear prediction algorithms. Alternatively, one can evolve generations of genome sequences from a given full-length genome sequence using one-dimensional cellular automata rules. Section 2 describes the notion of adjoints of nucleotides corresponding to a genome sequence. Section 3 describes the notions of cellular automata and linear Boolean functions. Section 4 provides the results of applying linear Boolean functions to adjoint strings of nucleotides. Section 5 demonstrates the results of combining evolution patterns of adjoint sequences dyadically. Section 6 presents various observations made from the study and proposes future perspectives of cellular automata-based genome analytics.
The adjoint of a particular nucleotide in a genome sequence is the binary sequence obtained by substituting 1 for every occurrence of that nucleotide in the genome sequence and 0 for every other nucleotide. For example, consider the sample sequence G, A, A, T, G, A, T, T, A, C, C, A, A, G, G, C of length 16. The adjoint of adenine (A) is the binary string A(n) = 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0. The adjoint of thymine (T) is the binary string T(n) = 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0. The adjoint of guanine (G) is the binary string G(n) = 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0. The adjoint of cytosine (C) is the binary string C(n) = 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1. The first segment of 40 nucleotides of the genome sequence of Brucella suis 1330 is considered here for a case study. The actual length of the genome sequence of Brucella suis 1330 is 5806. The adjoints of this segment are given below.
A(n) = 0110010010011000011000011000000000000000
T(n) = 0001001100000000000001000001010011001100
G(n) = 1000100000000110000100000100000000000011
C(n) = 0000000001100001100010100010101100110000
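The adjoint construction can be sketched in a few lines of Python. The function names `adjoint` and `adjoints` are hypothetical helpers introduced for illustration; the input is the 16-nucleotide sample sequence from the text.

```python
def adjoint(sequence, nucleotide):
    """Binary adjoint: 1 where `nucleotide` occurs in `sequence`, else 0."""
    return [1 if base == nucleotide else 0 for base in sequence]


def adjoints(sequence, alphabet="ATGC"):
    """Adjoints of all four nucleotides, keyed by nucleotide letter."""
    return {base: adjoint(sequence, base) for base in alphabet}


sample = "GAATGATTACCAAGGC"
result = adjoints(sample)
for base, bits in result.items():
    print(f"{base}(n) = {''.join(map(str, bits))}")
# A(n) = 0110010010011000
# T(n) = 0001001100000000
# G(n) = 1000100000000110
# C(n) = 0000000001100001

# Each position holds exactly one nucleotide, so the four adjoints
# have exactly one 1 in every column.
assert all(sum(column) == 1 for column in zip(*result.values()))
```

The final assertion is a useful sanity check on any set of adjoints: the four binary strings must partition the positions of the sequence.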
A cellular automaton is an idealized parallel processing system consisting of an array of numbers (1-D, 2-D or higher) that is updated using rules based on a certain neighborhood. For example, a one-dimensional cellular automaton would consist of a finite-length array as shown below.
... | i-1 | i | i+1 | ...

Consider the ith cell in the array. This cell has a neighbor i-1 on its left and another neighbor i+1 on its right. The three cells together are called a three-neighborhood. One can assign the site (cell) variables $\xi_{i-1}$, $\xi_i$ and $\xi_{i+1}$ to the three-neighborhood cells. At a particular instant of time t, these variables take on numerical values, say either 0 or 1. In such a case, the variables are denoted as $\xi^t_{i-1}$, $\xi^t_i$ and $\xi^t_{i+1}$. The value of the ith cell at the next instant of time is evaluated using an updating rule that involves the present values of the ith, (i-1)th and (i+1)th cells. This updating rule is essentially a linear Boolean function of three variables. One can construct 256 linear Boolean functions as updating rules of one-dimensional three-neighborhood binary-valued cellular automata. Each rule defines an automaton by itself. So, one-dimensional binary-valued three-neighborhood cellular automata (123CA) rules could be used to model adjoints of a genome sequence. The linear Boolean functions of 123CA with decimal equivalents 0 to 30 are listed below.
Decimal Equivalent 0 0
(?? ? ???1 ?? ? ?? ?? ? ??+1 ) 1 (?? ? ???1 ?? ? ?? ?? ??+1 ) 2 (?? ? ???1 ?? ? ?? ) 3 (?? ? ???1 ?? ?? ?? ? ??+1 ) 4 (?? ? ???1 ?? ? ??+1 ) 5 (?? ? ???1 ?? ?? ?? ? ??+1 )+(?? ? ???1 ?? ? ?? ?? ??+1 ) 6 (?? ? ???1 ?? ? ??+1 ) + (?? ? ???1 ?? ? ?? ) 7 (?? ? ???1 ?? ?? ?? ??+1 ) 8 (?? ? ???1 ?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ?? ?? ??+1 ) 9 (?? ? ???1 ?? ??+1 )10(?? ? ???1 ?? ? ?? ) + (?? ? ???1 ?? ??+1 ) (?? ? ???1 ?? ?? ) (?? ? ???1 ?? ? ??+1 ) + (?? ? ???1 ?? ?? ) (?? ? ???1 ?? ?? ) + (?? ? ???1 ?? ??+1 ) (?? ? ???1 ) (?? ???1 ?? ? ?? ?? ? ??+1 ) (?? ? ?? ?? ? ??+1 ) (?? ???1 ?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ? ?? ?? ??+1 ) (?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ? ?? ) (?? ???1 ?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ?? ?? ? ??+1 ) (?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ? ??+1 ) (?? ???1 ?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ? ?? ?? ??+1 ) (?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ? ??+1 ) + (?? ? ???1 ?? ? ?? ) (?? ???1 ?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ?? ?? ??+1 ) (?? ? ???1 ?? ?? ?? ??+1 ) + (?? ? ?? ?? ? ??+1 ) (?? ???1 ?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ??+1 ) (?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ??+1 ) (?? ???1 ?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ?? ) (?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ?? ) (?? ???1 ?? ? ?? ?? ? ??+1 ) + (?? ? ???1 ?? ?? ) + (?? ? ???1 ?? ??+1 )IV. Cellular Automata Evolutions of Genome Adjoints
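Before turning to the case study, the decimal equivalents of the rules can be checked mechanically. The following Python sketch assumes the common numbering convention in which the neighborhood values (left, centre, right), read as a 3-bit number k, select bit k of the decimal equivalent; this convention reproduces the tabulated values. It also reads "+" as logical OR and juxtaposition as logical AND. The helper names `decimal_equivalent`, `NOT`, `rule_6`, `rule_30` and `rule_137` are illustrative only.

```python
from itertools import product


def decimal_equivalent(rule):
    """Decimal equivalent of a Boolean function of (left, centre, right)."""
    number = 0
    for left, centre, right in product((0, 1), repeat=3):
        if rule(left, centre, right):
            # The neighborhood read as a 3-bit number selects the bit to set.
            number += 1 << (4 * left + 2 * centre + right)
    return number


def NOT(x):
    """Complement of a binary value."""
    return 1 - x


def rule_6(a, b, c):
    return (NOT(a) & b & NOT(c)) | (NOT(a) & NOT(b) & c)


def rule_30(a, b, c):
    return (a & NOT(b) & NOT(c)) | (NOT(a) & b) | (NOT(a) & c)


def rule_137(a, b, c):
    return (NOT(a) & NOT(b) & NOT(c)) | (b & c)


print(decimal_equivalent(rule_6))    # 6
print(decimal_equivalent(rule_30))   # 30
print(decimal_equivalent(rule_137))  # 137
```

Here `a`, `b` and `c` stand for the values of cells i-1, i and i+1 respectively.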
The genome sequence of Brucella suis 1330 is considered here for a case study. Due to space limitations, only a part of the genome sequence and its adjoints are shown below. As defined already, the adjoint of a genome sequence with respect to a particular nucleotide is the binary string obtained by marking a '1' in the place of that nucleotide and a '0' in the places of the other nucleotides. A segment consisting of 60 nucleotides of Brucella suis 1330 is shown below.
The adjoints of the genome sequence segment are given below.
[Panels: Adjoint A(n), Adjoint T(n), Adjoint G(n), Adjoint C(n)]

Cellular automata evolutions of the adjoints of a genome are carried out using the 256 rules of 123CA. As an example, rule number 137 of 123CA, that is, $(\bar{\xi}_{i-1}\,\bar{\xi}_{i}\,\bar{\xi}_{i+1}) + (\xi_{i}\,\xi_{i+1})$, is applied to the adjoints of the Brucella suis 1330 genome, and the results are shown in Fig. 2.
Fig. 2: Evolution of adjoints using rule 137 of 123CA (panels: Evolution of A(n), Evolution of T(n), Evolution of G(n), Evolution of C(n))
The size of the images shown in Fig. 2 is 500 x 500, though the actual size is 5806 x 500. The first 500 columns of the actual images are clipped and presented here for visual clarity. From Fig. 2, it is clear that the evolution pattern of each adjoint is different. One can observe certain fractal patterns in the evolutions, and these fractals are distributed very differently across the images. For instance, the zoomed-in versions of the evolution patterns of A(n), T(n), G(n) and C(n) using rule 137 are shown in Figs. 3, 4, 5 and 6 respectively.
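The evolution of an adjoint under a 123CA rule can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the paper does not state the boundary condition, so the sketch assumes a null boundary (cells outside the array are treated as 0), and the function names `step` and `evolve` are hypothetical. The rule number is decoded with the convention that the neighborhood, read as a 3-bit number, selects one bit of the decimal equivalent.

```python
def step(cells, rule_number):
    """Apply one generation of a three-neighborhood binary rule.

    Assumption: null boundary, i.e. missing neighbors count as 0.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        k = 4 * left + 2 * cells[i] + right  # neighborhood as a 3-bit number
        nxt.append((rule_number >> k) & 1)   # bit k of the rule number
    return nxt


def evolve(cells, rule_number, generations):
    """Return the initial row followed by `generations` evolved rows."""
    rows = [list(cells)]
    for _ in range(generations):
        rows.append(step(rows[-1], rule_number))
    return rows


# Adjoint A(n) of the 16-nucleotide sample sequence, evolved with rule 137.
a_n = [int(ch) for ch in "0110010010011000"]
for row in evolve(a_n, 137, 8):
    print("".join("#" if cell else "." for cell in row))
```

Stacking the rows as image lines gives an evolution pattern of the kind described above, with one column per nucleotide position and one row per generation.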
From the above empirical study, it is observed that cellular automata modeling and simulation of the evolutions of adjoints of a given genome sequence, together with the inter-pattern operations and relations, exhibit distinct patterns of fractals and fractal distributions. The technique and results presented in this paper are the outcome of prolonged research in the mathematical modeling of genomes and their evolutions. One can likewise look into the possibilities of genome editing using such cellular automata tools.
The authors express their profound gratitude to the management of Pentagram Research Centre Private Limited, Hyderabad for their boundless support in carrying out research in their premises. The authors further put on record the invaluable guidance of Prof. Dr E G Rajan, President of Pentagram Research and data scientists Mr. Rahul Sharma, Mr. Srikanth Maddikunta and Mr. Shubham Karande.