On the Notion of Percentage Nucleotide Concentration of Genome Sequences in Terms of Cellular Automata Evolutions of Adjoints Sequences

1. On the Notion of Percentage Nucleotide Concentration of Genome Sequences in Terms of Cellular Automata Evolutions of Adjoints Sequences

Prashanthi Govindarajan, Sathya Govindarajan & Ethirajan Govindarajan

Abstract-This paper proposes a novel concept called "Percentage Nucleotide Concentration of genomes" in terms of cellular automata evolutions of adjoints of Adenine, Thymine, Guanine, and Cytosine. The adjoints of a given genome sequence are its characteristic binary string sequences. For example, the adjoint of Adenine of a given genome sequence is a binary string consisting of 0's and 1's, where the 1's correspond to the presence of Adenine in the genome sequence. So, one can have four adjoint sequences, of Adenine, Thymine, Guanine, and Cytosine, corresponding to a given genome sequence. One-dimensional three-neighborhood binary-valued cellular automata rules can be applied to an adjoint sequence and the desired number of evolutions obtained. These rules are defined by Boolean functions of three variables, referred to here as linear Boolean functions, of which there are 256. Nucleotide concentration is computed for an adjoint sequence and its variation is evaluated over its successive evolutions under a cellular automaton rule.

2. I. Introduction

The purpose of the research reported in this paper is to investigate whether it is possible to categorize a set of genomes such as the human genome repository. The concept of "% nucleotide concentration" introduced in this paper seems to show a way to accomplish this task. The formulation of this concept originates from chemistry, where the pH value, a quantitative measure of the hydrogen ion concentration of a solution, is used to categorize solutions into three classes: (i) water, whose pH value is 7, (ii) acidic solutions, whose pH values are less than 7, and (iii) alkaline solutions, whose pH values are greater than 7. Along the same lines, an effort was made to categorize genome sets based on four values: (i) % nucleotide concentration of Adenine (pA), (ii) % nucleotide concentration of Thymine (pT), (iii) % nucleotide concentration of Guanine (pG) and (iv) % nucleotide concentration of Cytosine (pC). It is reasonable to surmise that these values, and possibly their compositions, would categorize a given set of genomes. Section 2 of this paper describes the concept formulation.

Section 3 of this paper describes the fundamental notions of adjoints of a genome and their evolution using one-dimensional cellular automata rules defined by linear Boolean functions. Section 4 provides experimental results of a case study on the evaluation of the concentration of nucleotides in terms of adjoints of the Brucella suis 1330 genome sequence.

3. II. Concept Formulation

Analogous to the notion of the pH value of a solution, the values of pA, pT, pG and pC of a genome sequence, and possibly compositions of these values such as the proportion pA:pT:pG:pC, seem to pave a way to classify and characterize genome sets. The definition of "Percentage Nucleotide Concentration" of a genome sequence is given below.

4. Definition

Given a genome sequence, the number of occurrences of a particular nucleotide, say A, in that genome sequence is counted and the count is divided by the total number of nucleotides in that genome sequence. The fraction, when multiplied by 100, yields the "Percentage Concentration of Adenine pA". Similarly, one can evaluate pT, pG and pC.
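The definition above can be illustrated with a minimal Python sketch. The function name `percentage_concentration` and the toy sequence are hypothetical and are not taken from the paper; the sketch also assumes that every symbol in the input string counts towards the total.

```python
from collections import Counter
from typing import Dict


def percentage_concentration(sequence: str) -> Dict[str, float]:
    """Return pA, pT, pG and pC (in percent) for a nucleotide string."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)  # occurrences of each symbol
    total = len(seq)       # assumption: every symbol counts towards the total
    return {f"p{n}": 100.0 * counts[n] / total for n in "ATGC"}


# Hypothetical toy sequence, not Brucella suis data.
sample = "ATGCGATACG"
print(percentage_concentration(sample))
```

For a sequence made up only of A, T, G and C, the four values sum to 100.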

5. One-Dimensional Three Neighborhood Cellular Automata Evolutions of Adjoints of a Genome Sequence

The adjoint of a particular nucleotide in a genome sequence is the binary sequence obtained by replacing every occurrence of that nucleotide in the genome sequence with 1 and every other nucleotide with 0. The case study in this paper uses a sample sequence of Brucella suis 1330. The actual length of the genome sequence of Brucella suis 1330 is 5806. A cellular automaton is an idealized parallel processing system consisting of an array of cells (one-dimensional, two-dimensional or higher) whose numerical values are updated by rules based on a certain neighborhood. For example, a one-dimensional cellular automaton would consist of a finite-length array as shown below.

--- --- --- i-1 i i+1 --- --- ---

Consider the ith cell in the array. This cell has a neighbor i-1 on its left and another i+1 on its right. The three cells taken together are called a three neighborhood. One can assign site (cell) variables $x_{i-1}$, $x_i$ and $x_{i+1}$ to the three neighborhood cells. At a particular instant of time t, these variables take on numerical values, say either 0 or 1. In such a case, the variables are denoted as $x_{i-1}^{t}$, $x_i^{t}$ and $x_{i+1}^{t}$. The value of the ith cell at the next instant of time is evaluated using an updating rule that involves the present values of the ith, (i-1)th and (i+1)th cells. This updating rule is a Boolean function of three variables, referred to in this paper as a linear Boolean function. Since three binary variables have 8 possible value combinations, one can construct $2^8 = 256$ such functions as updating rules of one-dimensional three-neighborhood binary-valued cellular automata. Each rule defines an automaton by itself. So, one-dimensional binary-valued three-neighborhood cellular automata (123CA) rules could be used to model adjoints of a genome sequence. The linear Boolean functions of 123CA with decimal equivalents 0 to 20 are listed below.

6. Linear Boolean Function

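Here $x_{i-1}$, $x_i$ and $x_{i+1}$ denote the values of the three neighborhood cells, an overbar denotes the complement, juxtaposition denotes AND, and + denotes OR.

| Linear Boolean function | Decimal equivalent |
|---|---|
| $0$ | 0 |
| $\bar{x}_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}$ | 1 |
| $\bar{x}_{i-1}\,\bar{x}_i\,x_{i+1}$ | 2 |
| $\bar{x}_{i-1}\,\bar{x}_i$ | 3 |
| $\bar{x}_{i-1}\,x_i\,\bar{x}_{i+1}$ | 4 |
| $\bar{x}_{i-1}\,\bar{x}_{i+1}$ | 5 |
| $(\bar{x}_{i-1}\,x_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i\,x_{i+1})$ | 6 |
| $(\bar{x}_{i-1}\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i)$ | 7 |
| $\bar{x}_{i-1}\,x_i\,x_{i+1}$ | 8 |
| $(\bar{x}_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i\,x_{i+1})$ | 9 |
| $\bar{x}_{i-1}\,x_{i+1}$ | 10 |
| $(\bar{x}_{i-1}\,\bar{x}_i) + (\bar{x}_{i-1}\,x_{i+1})$ | 11 |
| $\bar{x}_{i-1}\,x_i$ | 12 |
| $(\bar{x}_{i-1}\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i)$ | 13 |
| $(\bar{x}_{i-1}\,x_i) + (\bar{x}_{i-1}\,x_{i+1})$ | 14 |
| $\bar{x}_{i-1}$ | 15 |
| $x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}$ | 16 |
| $\bar{x}_i\,\bar{x}_{i+1}$ | 17 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i\,x_{i+1})$ | 18 |
| $(\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,\bar{x}_i)$ | 19 |
| $(x_{i-1}\,\bar{x}_i\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_i\,\bar{x}_{i+1})$ | 20 |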
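The decimal equivalents listed above can be read as 8-bit truth tables: the neighborhood (left, centre, right) forms the index 4·left + 2·centre + right, and the bit of the rule number at that index gives the new cell value. The following Python sketch builds an adjoint and applies one update of an arbitrary rule on that reading. The function names `adjoint` and `ca_step` are hypothetical, and treating cells beyond either end of the array as 0 is an assumption, since the paper does not state its boundary condition.

```python
from typing import List


def adjoint(sequence: str, nucleotide: str) -> List[int]:
    """Binary adjoint: 1 where `nucleotide` occurs, 0 elsewhere."""
    return [1 if base == nucleotide else 0 for base in sequence.upper()]


def ca_step(cells: List[int], rule: int) -> List[int]:
    """Apply one synchronous update of a three-neighborhood binary rule.

    Bit (4*left + 2*centre + right) of `rule` is the new value of the cell.
    Assumption: cells beyond either end of the array are treated as 0.
    """
    if not 0 <= rule <= 255:
        raise ValueError("rule must be between 0 and 255")
    padded = [0] + list(cells) + [0]
    return [
        (rule >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]


a = adjoint("ATGCA", "A")  # hypothetical toy sequence
print(a)                   # [1, 0, 0, 0, 1]
print(ca_step(a, 15))      # rule 15 complements the left neighbor: [1, 0, 1, 1, 1]
```

A lookup on the rule number keeps one function valid for all 256 rules, so no Boolean expression has to be coded by hand.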

For the case study, rule number 90 is applied to the adjoints of the Brucella suis 1330 genome sequence and 500 evolutions are generated. Rule 90 is shown below.

$(x_{i-1}\,\bar{x}_{i+1}) + (\bar{x}_{i-1}\,x_{i+1})$ (decimal equivalent 90)
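As a check on this expression, the short Python sketch below enumerates all eight neighborhoods, confirms that the sum-of-products form coincides with the exclusive OR of the left and right neighbors, and rebuilds the decimal equivalent from the truth table. The helper name `rule90_sop` is hypothetical, and the bit ordering (index 4·left + 2·centre + right) is the reading assumed here for the decimal equivalents.

```python
from itertools import product


def rule90_sop(left: int, centre: int, right: int) -> int:
    """Sum-of-products form: (left AND NOT right) OR (NOT left AND right)."""
    return (left & (1 - right)) | ((1 - left) & right)


decimal = 0
for left, centre, right in product((0, 1), repeat=3):
    out = rule90_sop(left, centre, right)
    # The centre cell does not influence the result: rule 90 is left XOR right.
    assert out == left ^ right
    decimal |= out << (4 * left + 2 * centre + right)

print(decimal)  # 90
```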

Since the image of the 500 evolutions of Brucella suis 1330 is large, only a small portion of the image is presented in this paper.
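The evolve-and-measure procedure of the case study might be sketched in Python as follows. Rule 90 is implemented as the exclusive OR of the left and right neighbors. Three points are assumptions and are not stated in the paper: cells beyond either end of the array are taken as 0, the concentration of an evolved generation is taken as the percentage of 1's in it, and the toy sequence stands in for the real data. The function names are hypothetical.

```python
from typing import List


def rule90_step(cells: List[int]) -> List[int]:
    """One rule-90 update: each cell becomes left XOR right (0 beyond the ends)."""
    padded = [0] + list(cells) + [0]
    return [padded[i - 1] ^ padded[i + 1] for i in range(1, len(padded) - 1)]


def concentration_per_generation(
    sequence: str, nucleotide: str, generations: int
) -> List[float]:
    """Percentage of 1's in the adjoint (index 0) and in each of its evolutions."""
    cells = [1 if base == nucleotide else 0 for base in sequence.upper()]
    if not cells:
        raise ValueError("sequence must not be empty")
    values = []
    for _ in range(generations + 1):
        values.append(100.0 * sum(cells) / len(cells))
        cells = rule90_step(cells)
    return values


# Hypothetical toy sequence, not the Brucella suis 1330 data.
toy = "ATGCGATACGTTAGCAAGTC"
pa = concentration_per_generation(toy, "A", 8)
for e in (1, 2, 4, 8):
    print(f"e = {e}: pA = {pa[e]:.5f}")
```

Index 0 of the returned list holds the concentration of the adjoint itself, so `pa[e]` is the value after e evolutions.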

7. Concentration of Nucleotides in Adjoints of Brucella suis 1330 Genome Sequence

The values of pA, pT, pG and pC of the Brucella suis 1330 genome sequence are computed for the adjoints A(n), T(n), G(n) and C(n) and their 500 evolutions using 123CA rules based on linear Boolean functions. Fig. 1 shows the evolutions of the adjoints A(n), T(n), G(n) and C(n) using the linear Boolean function rule 90 of 123CA. The values are tabulated and the corresponding graphs are shown subsequently. Table 1 shows the pA values of A(n) of the Brucella suis 1330 genome sequence and of the 500 generations of A(n) using rule 90 of 123CA. Figs. 2 and 3 show the graphs of the variations of pA values over all generations. Table 2 shows the pT values of T(n) of the Brucella suis 1330 genome sequence and of the 500 generations of T(n) using rule 90 of 123CA. Figs. 4 and 5 show the graphs of the variations of pT values over all generations. Table 3 shows the pG values of G(n) of the Brucella suis 1330 genome sequence and of the 500 generations of G(n) using rule 90 of 123CA. Figs. 6 and 7 show the graphs of the variations of pG values over all generations. Table 4 shows the pC values of C(n) of the Brucella suis 1330 genome sequence and of the 500 generations of C(n) using rule 90 of 123CA. Figs. 8 and 9 show the graphs of the variations of pC values over all generations.

8. V. Conclusions

This paper proposes a novel concept called "Percentage Nucleotide Concentration of genomes" in terms of cellular automata evolutions of adjoints of Adenine, Thymine, Guanine, and Cytosine. The research reported in this paper indicates that it may be possible to categorize a set of genomes such as the human genome repository. In short, the concept of "Percentage Nucleotide Concentration (PNC)" introduced in this paper seems to show a way to accomplish this task.

Fig. 1: Evolutions of the adjoints A(n), T(n), G(n) and C(n).

Fig. 2: pA values of A(n) and of its evolutions.

| Ae(n) | pA |
|---|---|
| e = 1 | 30.50293 |
| e = 2 | 30.29625 |
| e = 4 | 30.38236 |
| e = 8 | 31.01963 |
| e = 16 | 31.34688 |
| e = 32 | 30.83018 |
| e = 64 | 30.89907 |
| e = 128 | 31.45022 |
| e = 256 | 30.96796 |

Fig. 3: Minimum pA values of A(n) and of its evolutions.

Fig. 4: pT values of T(n) and of its evolutions.

| Te(n) | pT |
|---|---|
| e = 1 | 30.45126 |
| e = 2 | 32.15639 |
| e = 4 | 31.94971 |
| e = 8 | 32.3803 |
| e = 16 | 32.65587 |
| e = 32 | 32.19084 |
| e = 64 | 31.82914 |
| e = 128 | 31.82914 |
| e = 256 | 33.06924 |

Fig. 5: Minimum pT values of T(n) and of its evolutions.

Fig. 6: pG values of G(n) and of its evolutions.

Fig. 7: Minimum pG values of G(n) and of its evolutions.

Table 4: pC values of C(n) and its 500 evolutions.

Fig. 8: pC values of C(n) and of its evolutions.

| Ce(n) | pC |
|---|---|
| e = 1 | 40.7165 |
| e = 2 | 41.31932 |
| e = 4 | 39.37306 |
| e = 8 | 40.69928 |
| e = 16 | 39.32139 |
| e = 32 | 39.88977 |
| e = 64 | 40.47537 |
| e = 128 | 39.57975 |
| e = 256 | 40.95763 |

Fig. 9: Minimum pC values of C(n) and of its evolutions.
Table 1: pA values of A(n) and its 500 generations.
Rule number 90 is applied to A(n) and its 500 generations. It is observed that the pA value becomes minimum at generations 1, 2, 4, 8, 16, 32, 64, 128 and 256, that is, at successive powers of two. This indicates a fractal behavior of the evolution. Min(A(n)) = 30.29625 and Max(A(n)) = 31.4502. The deviation is 1.15.

Table 2: pT values of T(n) and its 500 generations.
Rule number 90 is applied to T(n) and its 500 generations. It is observed that the pT value becomes minimum at generations 1, 2, 4, 8, 16, 32, 64, 128 and 256, that is, at successive powers of two. This indicates a fractal behavior of the evolution. Min(T(n)) = 30.45126 and Max(T(n)) = 33.06924. The deviation is 2.61.

Table 3: pG values of G(n) and its 500 generations.
Rule number 90 is applied to G(n) and its 500 generations. It is observed that the pG value becomes minimum at generations 1, 2, 4, 8, 16, 32, 64, 128 and 256, that is, at successive powers of two. This indicates a fractal behavior of the evolution. Min(G(n)) = 43.00723 and Max(G(n)) = 44.29900. The deviation is 1.46.

Notes
1
© 2020 Global Journals
Date: 2020-01-15