# Modeling and Simulation of Genome Evolution Using Linear Boolean Functions Associated with One Dimensional Cellular Automata

Prashanthi Govindarajan, Sathya Govindarajan & Ethirajan Govindarajan

# I. Introduction

The four nucleotides A, T, G, and C get connected by phosphodiester bonds to form strands. Strand formation depends on innumerable factors related to inter- and intracellular parameters and functions. One cannot say precisely that a particular strand is formed by a particular set of rules. The infinite possibilities of strand formation cannot be determined experimentally or in the framework of classical genetics. Alternatively, one can formulate a notion of a "Language of Genomes" in which infinitely many strands are finitely specified. Fig. 1 shows a finitely generated quaternary tree structure of strand formation of nucleic acids. To be precise, Fig. 1 shows three levels of nucleotides. One can generate 64 strands of length 3. As the length increases, the number of strands increases as per the formula $4^n$, where $n$ is the length of the strand. Strands of length three are called triplet codons or 3-tuple codons. Similarly, one can think of n-tuple codons where n is any number.

A genome sequence is a chain of the four nucleotides A, T, G and C. The numerical representation of a genome sequence is a sequence of the four numbers 1, 2, 3 and 4. Linear prediction of a strand could be carried out using linear prediction algorithms from a subsequence of length 8. Alternatively, one can evolve generations of genome sequences from a given full-length genome sequence using one-dimensional cellular automata rules.

Section 2 describes the notions of adjoints of nucleotides corresponding to a genome sequence. Section 3 describes the notions of cellular automata and linear Boolean functions. Section 4 provides the results of applying linear Boolean functions on adjoint strings of nucleotides.
Section 5 demonstrates the results of combining evolution patterns of adjoint sequences dyadically. Section 6 presents various observations made from the study and proposes future perspectives of cellular automata-based genome analytics.

# II. Adjoints of Nucleotides

The adjoint of a particular nucleotide in a genome sequence is the binary sequence obtained by substituting 1's for that nucleotide and 0's for all the others. For example, consider the sample sequence G, A, A, T, G, A, T, T, A, C, C, A, A, G, G, C of length 16.

- The adjoint of adenine (A) is the binary string A(n) = 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0.
- The adjoint of thymine (T) is the binary string T(n) = 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0.
- The adjoint of guanine (G) is the binary string G(n) = 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0.
- The adjoint of cytosine (C) is the binary string C(n) = 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1.

The first segment of 40 nucleotides of the genome sequence of Brucella suis 1330 is considered here for a case study. The actual length of the genome sequence of Brucella suis 1330 is 5806. The sample sequence is given below.

GAATGATTACCAAGGCCAAGCTCAAGCTCTCCTTCCTTGG

The adjoints of this sample sequence of length 40 are given below.

A(n) = 0110010010011000011000011000000000000000

T(n) = 0001001100000000000001000001010011001100

G(n) = 1000100000000110000100000100000000000011

C(n) = 0000000001100001100010100010101100110000

# III. Cellular Automata and Linear Boolean Functions

A cellular automaton is an idealized parallel processing system consisting of an array of numbers (1-D, 2-D or higher) that is updated using rules based on a certain neighborhood. For example, a one-dimensional cellular automaton would consist of a finite-length array as shown below.
... | i-1 | i | i+1 | ...

Consider the $i$th cell in the array. This cell has a neighbor $i-1$ on its left and another, $i+1$, on its right. The three cells together are called a three-neighborhood. One can assign site (cell) variables $x_{i-1}$, $x_i$, and $x_{i+1}$ to the three-neighborhood cells. At a particular instant of time $t$, these variables take on numerical values, say either 0 or 1. In such a case, the variables are denoted as $x^t_{i-1}$, $x^t_i$, and $x^t_{i+1}$. The value of the $i$th cell at the next instant of time is evaluated using an updating rule that involves the present values of the $i$th, $(i-1)$th and $(i+1)$th cells. This updating rule is essentially a linear Boolean function of three variables. Since a three-neighborhood of binary cells has $2^3 = 8$ configurations, each of which can map to 0 or 1, one can construct $2^8 = 256$ linear Boolean functions as updating rules of one-dimensional three-neighborhood binary-valued cellular automata. Each rule defines an automaton by itself. So, one-dimensional binary-valued three-neighborhood cellular automata (123CA) rules could be used to model adjoints of a genome sequence. The linear Boolean functions of 123CA with decimal equivalents 0 through 30 are listed below. In the table, all variables are taken at time $t$, $\bar{x}$ denotes the complement of $x$, juxtaposition denotes AND, and $+$ denotes OR.

| Linear Boolean Function | Decimal Equivalent |
|---|---|
| $0$ | 0 |
| $(\bar{x}_{i-1}\,\bar{x}_i\,\bar{x}_{i+1})$ | 1 |
| $(\bar{x}_{i-1}\,\bar{x}_i\,x_{i+1})$ | 2 |
| $(\bar{x}_{i-1}\,\bar{x}_i)$ | 3 |
| $(\bar{x}_{i-1}\,x_i\,\bar{x}_{i+1})$ | 4 |
| $(\bar{x}_{i-1}\,\bar{x}_{i+1})$ | 5 |
| $(\bar{x}_{i-1}\,x_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i\,x_{i+1})$ | 6 |
| $(\bar{x}_{i-1}\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i)$ | 7 |
| $(\bar{x}_{i-1}\,x_i\,x_{i+1})$ | 8 |
| $(\bar{x}_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i\,x_{i+1})$ | 9 |
| $(\bar{x}_{i-1}\,x_{i+1})$ | 10 |
| $(\bar{x}_{i-1}\,\bar{x}_i) + (\bar{x}_{i-1}\,x_{i+1})$ | 11 |
| $(\bar{x}_{i-1}\,x_i)$ | 12 |
| $(\bar{x}_{i-1}\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i)$ | 13 |
| $(\bar{x}_{i-1}\,x_i) + (\bar{x}_{i-1}\,x_{i+1})$ | 14 |
| $(\bar{x}_{i-1})$ | 15 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1})$ | 16 |
| $(\bar{x}_i\,\bar{x}_{i+1})$ | 17 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i\,x_{i+1})$ | 18 |
| $(\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i)$ | 19 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i\,\bar{x}_{i+1})$ | 20 |
| $(\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_{i+1})$ | 21 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i\,x_{i+1})$ | 22 |
| $(\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i)$ | 23 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i\,x_{i+1})$ | 24 |
| $(\bar{x}_{i-1}\,x_i\,x_{i+1}) + (\bar{x}_i\,\bar{x}_{i+1})$ | 25 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_{i+1})$ | 26 |
| $(\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_{i+1})$ | 27 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i)$ | 28 |
| $(\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i)$ | 29 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i) + (\bar{x}_{i-1}\,x_{i+1})$ | 30 |

# IV. Cellular Automata Evolutions of Genome Adjoints

The genome sequence of Brucella suis 1330 is considered here for a case study. Due to space limitations, only a part of the genome sequence and its adjoints are shown. As defined already, the adjoint of a genome sequence with respect to a particular nucleotide is the binary string obtained by marking a '1' in the place of that nucleotide and a '0' in the places of the other nucleotides. A segment consisting of 60 nucleotides of Brucella suis 1330 and the adjoints of that segment are shown below.

[Figure: 60-nucleotide segment of Brucella suis 1330 with Adjoint A(n), Adjoint T(n), Adjoint G(n) and Adjoint C(n)]

Cellular automata evolutions of the adjoints of a genome are carried out using the 256 rules of 123CA. As an example, rule number 137 of 123CA, that is, $(\bar{x}_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (x_i\,x_{i+1})$, is applied to the adjoints of the Brucella suis 1330 genome, and the results are shown in Fig. 2.

Fig. 2: Evolution of adjoints using rule 137 of 123CA (panels: Evolution of A(n), Evolution of T(n), Evolution of G(n), Evolution of C(n))

The size of each image shown in Fig. 2 is 500x500, though the actual size is 5806x500. Only the first 500 columns of the actual images are presented here for visual clarity. From Fig. 2, it is clear that the evolution pattern of each adjoint is different. One can observe certain fractal patterns in the evolutions, and these fractals are distributed very differently across the images.
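The procedure described above, forming the adjoints of a sequence and evolving each one under a 123CA rule, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation. The function names (`adjoints`, `step`, `evolve`, `rule_137`) are hypothetical. The paper does not state a boundary condition, so cells beyond either end of the array are assumed to be 0. The rule number is assumed to index the output bit by the neighborhood read as the 3-bit number $4x_{i-1} + 2x_i + x_{i+1}$, which agrees with the decimal equivalents tabulated in Section III and with the expression given for rule 137.

```python
from typing import Dict, List

NUCLEOTIDES = "ATGC"


def adjoints(sequence: str) -> Dict[str, List[int]]:
    """Return the binary adjoint of each nucleotide in `sequence`."""
    return {n: [1 if s == n else 0 for s in sequence] for n in NUCLEOTIDES}


def step(cells: List[int], rule: int) -> List[int]:
    """Apply one synchronous update of a three-neighborhood binary rule.

    Assumption (not from the paper): cells outside the array are 0.
    """
    padded = [0] + list(cells) + [0]
    nxt = []
    for i in range(1, len(padded) - 1):
        # Read (left, centre, right) as a 3-bit number between 0 and 7.
        k = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        # Bit k of the rule number is the next value of the centre cell.
        nxt.append((rule >> k) & 1)
    return nxt


def evolve(cells: List[int], rule: int, generations: int) -> List[List[int]]:
    """Return the initial row followed by `generations` successive updates."""
    rows = [list(cells)]
    for _ in range(generations):
        rows.append(step(rows[-1], rule))
    return rows


def rule_137(a: int, b: int, c: int) -> int:
    """Rule 137 written as in the text: (not a)(not b)(not c) + (b c)."""
    return int((not a and not b and not c) or (b and c))


if __name__ == "__main__":
    sample = "GAATGATTACCAAGGC"  # 16-nucleotide example from Section II
    for row in evolve(adjoints(sample)["A"], 137, 8):
        print("".join(str(v) for v in row))
```

Stacking the rows returned by `evolve` one below the other gives a binary image of the kind shown in Fig. 2, with one row per generation. A periodic boundary could be substituted for the zero boundary by changing how `padded` is built.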
Zoomed-in versions of the evolution patterns of A(n), T(n), G(n) and C(n) using rule 137 are shown in Figs. 3, 4, 5 and 6 respectively.

# VI. Observations and Conclusions

From the above empirical study, it is observed that cellular automata modeling and simulation of the evolutions of adjoints of a given genome sequence, together with the inter-pattern operations and relations, exhibit distinct patterns of fractals and fractal distributions. The technique and results presented in this paper are the outcome of prolonged research in the mathematical modeling of genomes and their evolutions. These results suggest that one could also explore the possibilities of genome editing using such cellular automata tools.

![Fig. 1: Quaternary tree structure for strand formation](image-2.png "Fig. 1")

![Fig. 3: Zoomed in version of evolution pattern of A(n)](image-4.png "Figs. 3-7")

![Fig. 8: Zoomed in version of addition of patterns of A and T](image-5.png "Figs. 8-10")

Computer Science and Technology, Volume XX, Issue I, Version I. © 2020 Global Journals

## Acknowledgement

The authors express their profound gratitude to the management of Pentagram Research Centre Private Limited, Hyderabad, for their boundless support in carrying out research on their premises. The authors further place on record the invaluable guidance of Prof. Dr E G Rajan, President of Pentagram Research, and of data scientists Mr. Rahul Sharma, Mr. Srikanth Maddikunta and Mr. Shubham Karande.