# On the Notion of Percentage Nucleotide Concentration of Genome Sequences in Terms of Cellular Automata Evolutions of Adjoints Sequences

Prashanthi Govindarajan, Sathya Govindarajan & Ethirajan Govindarajan

Abstract: This paper proposes a novel concept called "Percentage Nucleotide Concentration of genomes" in terms of cellular automata evolutions of the adjoints of Adenine, Thymine, Guanine, and Cytosine. The adjoints of a given genome sequence are its characteristic binary string sequences. For example, the adjoint of Adenine of a given genome sequence is a binary string of 0's and 1's in which the 1's mark the positions where Adenine occurs in the genome sequence. So, one can obtain four adjoint sequences, of Adenine, Thymine, Guanine, and Cytosine, corresponding to a given genome sequence. One-dimensional, three-neighborhood, binary-valued cellular automata rules can be applied to an adjoint sequence repeatedly to obtain the desired number of evolutions. These rules are defined by Boolean functions of three variables, of which there are 256. The nucleotide concentration is computed for an adjoint sequence, and its variation is evaluated over its successive evolutions under a cellular automaton rule.

# I. Introduction

The purpose of the research carried out and reported in this paper is to examine whether it is possible to categorize a set of genomes, such as the human genome repository. The concept of "% nucleotide concentration" introduced in this paper seems to show a way to accomplish this task. The concept is inspired by chemistry, where the pH value, a logarithmic measure of the hydrogen-ion concentration ($\mathrm{pH} = -\log_{10}[\mathrm{H}^+]$), is used to categorize solutions into three classes: (i) neutral solutions such as pure water, whose pH value is 7, (ii) acidic solutions, whose pH values are less than 7, and (iii) alkaline solutions, whose pH values are greater than 7.
On the same lines, an effort was made to categorize genome sets based on four values: (i) the % nucleotide concentration of Adenine (pA), (ii) the % nucleotide concentration of Thymine (pT), (iii) the % nucleotide concentration of Guanine (pG), and (iv) the % nucleotide concentration of Cytosine (pC). It is reasonable to surmise that these values, and possibly combinations of them, could categorize a given set of genomes. The formulation of the concept is briefly explained below. Section II of this paper describes the concept formulation. Section III describes the fundamental notions of the adjoints of a genome and their evolution using one-dimensional cellular automata rules defined by Boolean functions. Section IV provides experimental results of a case study evaluating the concentration of nucleotides in terms of the adjoints of the Brucella suis 1330 genome sequence.

# II. Concept Formulation

Analogous to the pH value of a solution, the values of pA, pT, pG and pC of a genome sequence, and possibly combinations of these values such as the proportion pA:pT:pG:pC, seem to pave a way to classify and characterize genome sets. The definition of the "Percentage Nucleotide Concentration" of a genome sequence is given below.

# Definition

Given a genome sequence, the number of occurrences of a particular nucleotide, say A, is counted and divided by the total number of nucleotides in that genome sequence. This fraction multiplied by 100 yields the "Percentage Concentration of Adenine", pA:

$$\mathrm{pA} = \frac{N_A}{N} \times 100,$$

where $N_A$ is the number of Adenines and $N$ is the total number of nucleotides in the sequence. Similarly, one can evaluate pT, pG and pC.

# III. One-Dimensional Three-Neighborhood Cellular Automata Evolutions of Adjoints of a Genome Sequence

The adjoint of a particular nucleotide in a genome sequence is the binary sequence obtained by replacing every occurrence of that nucleotide in the genome sequence by 1 and every other nucleotide by 0. For example, let us consider a sample sequence of Brucella suis 1330 for a case study.
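The adjoint construction and the definition of pA can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `adjoint` and `percentage_concentration` are hypothetical, the input is assumed to be a plain string over A, C, G and T, and any other symbol (for example N) is assumed to count toward the length while contributing 0 to every adjoint. The toy sequence is invented for illustration and is not Brucella suis data.

```python
# Sketch: adjoint sequences and percentage nucleotide concentration.
# Assumptions (not from the paper): the genome is a plain string; letters are
# upper-cased; symbols other than the requested nucleotide map to 0 but still
# count toward the total length. Function names are hypothetical.


def adjoint(genome: str, nucleotide: str) -> list[int]:
    """Binary adjoint: 1 where `nucleotide` occurs in `genome`, 0 elsewhere."""
    nucleotide = nucleotide.upper()
    return [1 if base == nucleotide else 0 for base in genome.upper()]


def percentage_concentration(bits: list[int]) -> float:
    """Percentage of 1's in a binary sequence (pA for the adjoint of A, etc.)."""
    if not bits:
        raise ValueError("empty sequence")
    return 100.0 * sum(bits) / len(bits)


if __name__ == "__main__":
    sample = "ATGCGATTAC"  # hypothetical toy sequence
    for base in "ATGC":
        a = adjoint(sample, base)
        print(base, a, percentage_concentration(a))
```

For the toy sequence this prints pA = 30.0, pT = 30.0, pG = 20.0 and pC = 20.0; when the sequence contains only A, C, G and T, the four values add up to 100.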
The actual length of the Brucella suis 1330 sequence considered here is 5806 nucleotides.

A cellular automaton is an idealized parallel processing system consisting of an array of cells (1-D, 2-D or higher-dimensional) whose values are updated by rules based on a certain neighborhood. For example, a one-dimensional cellular automaton consists of a finite-length array as shown below.

| ... | $i-1$ | $i$ | $i+1$ | ... |
|---|---|---|---|---|

Consider the $i$th cell in the array. This cell has a neighbor $i-1$ on its left and another, $i+1$, on its right; the three cells together form a three-neighborhood. One can assign site (cell) variables $x_{i-1}$, $x_i$ and $x_{i+1}$ to the three neighborhood cells. At a particular instant of time $t$, these variables take numerical values, either 0 or 1, denoted $x^t_{i-1}$, $x^t_i$ and $x^t_{i+1}$. The value of the $i$th cell at the next instant of time is evaluated using an updating rule that involves the present values of the $(i-1)$th, $i$th and $(i+1)$th cells:

$$x^{t+1}_i = f\left(x^t_{i-1}, x^t_i, x^t_{i+1}\right).$$

This updating rule $f$ is a Boolean function of three variables. Because such a function is fully specified by its output (0 or 1) for each of the $2^3 = 8$ possible neighborhood configurations, one can construct $2^8 = 256$ Boolean functions as updating rules of one-dimensional three-neighborhood binary-valued cellular automata. Each rule defines an automaton by itself. So, one-dimensional binary-valued three-neighborhood cellular automata (123CA) rules can be used to model the evolution of the adjoints of a genome sequence. The Boolean functions of rules 0 to 20 of 123CA are listed below with their decimal equivalents; $\bar{x}$ denotes the complement of $x$, and all variables on the right-hand side are taken at time $t$.

| Boolean function $x^{t+1}_i$ | Decimal equivalent (rule number) |
|---|---|
| $0$ | 0 |
| $\bar{x}_{i-1}\bar{x}_i\bar{x}_{i+1}$ | 1 |
| $\bar{x}_{i-1}\bar{x}_i x_{i+1}$ | 2 |
| $\bar{x}_{i-1}\bar{x}_i$ | 3 |
| $\bar{x}_{i-1} x_i \bar{x}_{i+1}$ | 4 |
| $\bar{x}_{i-1}\bar{x}_{i+1}$ | 5 |
| $\bar{x}_{i-1} x_i \bar{x}_{i+1} + \bar{x}_{i-1}\bar{x}_i x_{i+1}$ | 6 |
| $\bar{x}_{i-1}\bar{x}_{i+1} + \bar{x}_{i-1}\bar{x}_i$ | 7 |
| $\bar{x}_{i-1} x_i x_{i+1}$ | 8 |
| $\bar{x}_{i-1}\bar{x}_i\bar{x}_{i+1} + \bar{x}_{i-1} x_i x_{i+1}$ | 9 |
| $\bar{x}_{i-1} x_{i+1}$ | 10 |
| $\bar{x}_{i-1}\bar{x}_i + \bar{x}_{i-1} x_{i+1}$ | 11 |
| $\bar{x}_{i-1} x_i$ | 12 |
| $\bar{x}_{i-1}\bar{x}_{i+1} + \bar{x}_{i-1} x_i$ | 13 |
| $\bar{x}_{i-1} x_i + \bar{x}_{i-1} x_{i+1}$ | 14 |
| $\bar{x}_{i-1}$ | 15 |
| $x_{i-1}\bar{x}_i\bar{x}_{i+1}$ | 16 |
| $\bar{x}_i\bar{x}_{i+1}$ | 17 |
| $x_{i-1}\bar{x}_i\bar{x}_{i+1} + \bar{x}_{i-1}\bar{x}_i x_{i+1}$ | 18 |
| $\bar{x}_i\bar{x}_{i+1} + \bar{x}_{i-1}\bar{x}_i$ | 19 |
| $x_{i-1}\bar{x}_i\bar{x}_{i+1} + \bar{x}_{i-1} x_i \bar{x}_{i+1}$ | 20 |

For the case study, rule number 90 is applied to the adjoints of the Brucella suis 1330 genome sequence and 500 evolutions are generated. Rule 90 is

$$x^{t+1}_i = x^t_{i-1}\bar{x}^t_{i+1} + \bar{x}^t_{i-1}x^t_{i+1} = x^t_{i-1} \oplus x^t_{i+1},$$

that is, the next value of a cell is the exclusive OR of its two neighbors. Since the image of the 500 evolutions of Brucella suis 1330 is large, only a small portion of it is presented in this paper.

# IV. Concentration of Nucleotides in Adjoints of Brucella suis 1330 Genome Sequence

The values of pA, pT, pG and pC of the Brucella suis 1330 genome sequence are computed for the adjoints A(n), T(n), G(n) and C(n) and their 500 evolutions under rule 90 of 123CA. Fig. 1 shows the evolutions of the adjoints A(n), T(n), G(n) and C(n) under rule 90. The values are tabulated and the corresponding graphs are shown subsequently. Table 1 shows the pA values of A(n) and of its 500 generations under rule 90 of 123CA; Figs. 2 and 3 show the variation of pA over all generations. Table 2 shows the pT values of T(n) and of its 500 generations; Figs. 4 and 5 show the variation of pT over all generations. Table 3 shows the pG values of G(n) and of its 500 generations; Figs. 6 and 7 show the variation of pG over all generations. Table 4 shows the pC values of C(n) and of its 500 generations; Figs. 8 and 9 show the variation of pC over all generations.
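The procedure of Sections III and IV (apply a 123CA rule to an adjoint repeatedly and record the percentage of 1's in each generation) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. Rule numbers are assumed to follow the standard Wolfram encoding (bit $k$ of the rule number is the next value for the neighborhood whose bits $x_{i-1}x_ix_{i+1}$ spell $k$), which agrees with the table of rules 0 to 20 and with rule 90 above. The paper does not state its boundary condition, so a periodic (ring) boundary is assumed. All function names are hypothetical. The last function illustrates one possible explanation, offered here rather than by the paper, for the behavior at generations that are powers of two: rule 90 is linear over GF(2), so on a ring generation $2^k$ equals $x_{i-2^k} \oplus x_{i+2^k}$, and each cell then depends on only two cells of the original adjoint.

```python
import random

# Sketch of a 123CA (one-dimensional, three-neighborhood, binary) evolution of an
# adjoint. Assumptions: standard Wolfram rule encoding and a periodic (ring)
# boundary. All function names are hypothetical, not taken from the paper.


def minterms(rule: int) -> list[str]:
    """Neighborhoods x_{i-1} x_i x_{i+1} (as bit strings) that the rule maps to 1."""
    return [format(k, "03b") for k in range(8) if (rule >> k) & 1]


def step(cells: list[int], rule: int) -> list[int]:
    """One synchronous update of all cells; indices wrap around (ring boundary)."""
    if not 0 <= rule <= 255:
        raise ValueError("rule must be in 0..255")
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def evolve(cells: list[int], rule: int, generations: int) -> list[list[int]]:
    """Return generation 0 (the adjoint itself) through generation `generations`."""
    history = [list(cells)]
    for _ in range(generations):
        history.append(step(history[-1], rule))
    return history


def concentrations(history: list[list[int]]) -> list[float]:
    """Percentage of 1's in each generation (pA for the evolutions of A(n), etc.)."""
    return [100.0 * sum(row) / len(row) for row in history]


def rule90_shortcut(cells: list[int], k: int) -> list[int]:
    """Generation 2**k of rule 90 on a ring: x[i - 2**k] XOR x[i + 2**k]."""
    n, s = len(cells), 2 ** k
    return [cells[(i - s) % n] ^ cells[(i + s) % n] for i in range(n)]


if __name__ == "__main__":
    print(minterms(90))  # ['001', '011', '100', '110']
    rng = random.Random(0)
    toy = [1 if rng.random() < 0.3 else 0 for _ in range(200)]  # toy data, not a genome
    history = evolve(toy, 90, 256)
    pct = concentrations(history)
    for k in range(9):
        assert history[2 ** k] == rule90_shortcut(toy, k)
        print(f"e = {2 ** k:3d}  p = {pct[2 ** k]:.2f}%")
```

Each call to `step` costs time proportional to the sequence length, so 500 evolutions of a 5806-cell adjoint amount to about 2.9 million cell updates. Switching to a null boundary (cells outside the array held at 0) only changes the index handling in `step`, but `rule90_shortcut` would then no longer match the evolution near the ends of the array.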
![Fig. 1: Evolutions of the adjoints A(n), T(n), G(n) and C(n).](image-2.png)

![Fig. 2 and Fig. 3](image-3.png)

Fig. 2: pA values of A(n) and of its evolutions $A_e(n)$.

Fig. 3: Minimum pA values of A(n) and of its evolutions.

pA values of the evolutions $A_e(n)$ at generations that are powers of two:

| Generation $e$ | pA (%) |
|---|---|
| 1 | 30.50293 |
| 2 | 30.29625 |
| 4 | 30.38236 |
| 8 | 31.01963 |
| 16 | 31.34688 |
| 32 | 30.83018 |
| 64 | 30.89907 |
| 128 | 31.45022 |
| 256 | 30.96796 |

![Fig. 4 and Fig. 5](image-4.png)

Fig. 4: pT values of T(n) and of its evolutions $T_e(n)$.

Fig. 5: Minimum pT values of T(n) and of its evolutions.

pT values of the evolutions $T_e(n)$ at generations that are powers of two:

| Generation $e$ | pT (%) |
|---|---|
| 1 | 30.45126 |
| 2 | 32.15639 |
| 4 | 31.94971 |
| 8 | 32.3803 |
| 16 | 32.65587 |
| 32 | 32.19084 |
| 64 | 31.82914 |
| 128 | 31.82914 |
| 256 | 33.06924 |

![Fig. 6: pG values of G(n) and of its evolutions.](image-5.png)

![Fig. 7: Minimum pG values of G(n) and of its evolutions.](image-6.png)

Table 4: pC values of C(n) and of its 500 evolutions.

![Fig. 8: pC values of C(n) and of its evolutions.](image-7.png)

pC values of the evolutions $C_e(n)$ at generations that are powers of two:

| Generation $e$ | pC (%) |
|---|---|
| 1 | 40.7165 |
| 2 | 41.31932 |
| 4 | 39.37306 |
| 8 | 40.69928 |
| 16 | 39.32139 |
| 32 | 39.88977 |
| 64 | 40.47537 |
| 128 | 39.57975 |
| 256 | 40.95763 |

![Fig. 9: Minimum pC values of C(n) and of its evolutions.](image-8.png)

# V. Conclusions

This paper proposes a novel concept called "Percentage Nucleotide Concentration of genomes" in terms of cellular automata evolutions of the adjoints of Adenine, Thymine, Guanine, and Cytosine. The research carried out and reported in this paper suggests that it may be possible to categorize a set of genomes, such as the human genome repository. In short, the concept of "Percentage Nucleotide Concentration (PNC)" introduced in this paper seems to show a way to accomplish this task. The observations for rule 90 are as follows.

1. Rule number 90 is applied to A(n) and its 500 generations. It is observed that the pA value reaches a local minimum at generations 1, 2, 4, 8, 16, 32, 64, 128 and 256, that is, at powers of two. This indicates a fractal behavior of the evolution. Min(pA) = 30.29625 and Max(pA) = 31.45022; the deviation (Max - Min) is 1.15.
2. Rule number 90 is applied to T(n) and its 500 generations. It is observed that the pT value reaches a local minimum at generations 1, 2, 4, 8, 16, 32, 64, 128 and 256, that is, at powers of two. This indicates a fractal behavior of the evolution. Min(pT) = 30.45126 and Max(pT) = 33.06924; the deviation (Max - Min) is 2.62.
3. Rule number 90 is applied to G(n) and its 500 generations. It is observed that the pG value reaches a local minimum at generations 1, 2, 4, 8, 16, 32, 64, 128 and 256, that is, at powers of two. This indicates a fractal behavior of the evolution. Min(pG) = 43.00723 and Max(pG) = 44.29900; the deviation (Max - Min) is 1.29.

© 2020 Global Journals