# 2011 December nference of gene regulation mechanism from time series of gene expression patterns are getting more important especially due to the invention of DNA microarray technology. Expression profiles of several thousands of genes are now being produced for further analyses. Some methods have been proposed for the inference of gene regulation mechanism from time series of gene expression patterns. A statistical method is proposed by Arkin, Shen and Ross. They used correlation matrices to infer chemical reaction networks from time series of measured concentration of species. Though they treated chemical reaction networks, they also suggested that their method might be applied to genetic networks. However, it seems difficult to apply their method to the inference of large scale networks. A metabolic pathway is suggested to be inferred by DeRisi, Iyer and Brown from gene expression patterns of Saccharomyces cerevisiae obtained by using DNA microarrays. A network model similar to the Boolean network model is constructed by Yuh, Bolouri and Davidson from time series of expression patterns I General Terms : Gene network, gene regulatory network, Genetic network, gene expression pattern, Boolean network, microarray data, Sum of product, Boolean algebra. Keywords : Generation, consistent network. relating to a sea urchin gene. But their inference methods are not systematic or automatic. Besides, some studies have been done on the inference of genetic networks from state transition data using the Boolean network . On the other hannd, Liang, Fuhrman and Somogyi proposed an algorithm named REVEAL for inference of Boolean networks (corresponding to genetic networks) from state transition tables (corresponding to time series of gene expression patterns). REVEAL used information theoretic principles in order to reduce the search space. They made some computational experiments on REVEAL. The results suggested that only a small number of state transition pairs (100 pairs from 1015) were sufficient for inferring Boolean networks with 50 nodes (genes) whose in degree (the number of input nodes to a node) was bounded by 3. Tatsuya AKUTSU, Satoru MIYANO and Satoru KUHARA gave a mathematical proof for their observation. We will extend their algorithm to identify genetic networks from gene expression patterns derived by gene disruptions and gene over expressions using a Boolean network-like model. They proved mathematically a lower bound and an upper bound of the number of expression patterns required to identify the network correctly. In this paper we will try to extend try to provide an algorithm to generate all the consistent genetic networks from a small number of gene expression patterns under the Boolean network model. a) Genetic network A gene regulatory network or genetic regulatory network (GRN) is a collection of DNA segments in a cell which interact with each other (indirectly through their RNA and protein expression products) and with other substances in the cell, thereby governing the rates at which genes in the network are transcribed into mRNA. # b) Boolean Network A Boolean network G (V;F) consists of a set V = {v 1 ,v 2 ,??,v n } of nodes representing genes and a list F = (f 1 ,f 2 ,??,f n ) of Boolean functions, where a Boolean function f i (v i1 ,v i2 ,?,v ik ) with inputs from specified nodes v i1 ,v i2 ?,v ik is assigned to each node v i . For a subset U ? ? E-mail : E-mail : E-mail : E-mail : , Md. Rokon Uddin ¥ c) Definition of Problem Let (I j ;O j ) be a pair of expression patterns of {v 1 ,v 2 ,?.,v n }, where I j corresponds to the INPUT and O j corresponds to the OUTPUT. We call the pair (Ij;Oj) an example. We say that a node vi in a Boolean network G(V;F) is consistent with an example (Ij;Oj), if Oj(vi)= f i (I j (v i1 ,?I j (v ik )) holds. We say that a Boolean network G(V;F) is consistent with (I j ;O j ) if all nodes are consistent with (Ij;Oj). There are 2 n minterms of n variables, since a variable in the minterm expression can be in either its direct or its complemented form--two choices per n variables. It is apparent that minterm n gives a true value (i.e., 1 ) for just one combination of the input variables. If one is given a truth table of a logical function, it is possible to write the function as a "sum of products". This is a special form of disjunctive normal form. For example, if given the truth table for the arithmetic sum bit u of one bit position's logic of an adder circuit, as a function of x and y from the addends and the carry in, ci: of V, an expression pattern of U is a function from U to {0,1}. An expression pattern of V is also called a state of a Boolean network. That is, represents the states of nodes (genes), where each node is assumed to take either 0 (not-express) or 1 (express) as its state value. © Figure 1(a) : A genetic network represented by Boolean network Suppose we have n number of nodes (v 1 , v 2 ,??..v n ) and a pair of expression pattern (I,O), where I correspond to input pattern and O corresponds to output pattern. V 1 to v n are assumed to take either 0 (not-express) or 1 (express) as its state value. First we generate a Boolean network consistent with (I,O) and then eliminate nodes which aren't essential to generate more networks. The idea is as follows:-1. Repeat step 2 for each node v i Ñ?" V. 1 v 2 v 3 v ' 1 v ' 2 v ' 3 v I 1 I 2 I 3 1 0 0 0 1 0 0 1 1 0 0 1 0 1 1 1 0 0 O 1 O 2 O 3 G 1 2 ' 1 v v = , 2 ' 2 v v = AND (NOT 3 v ) = ' 3 v NOT 3 v d) # Generate Boolean equation and assign in ?? ?? ? using Sum of Product (SOP) notation based on input and output expression patterns (I and O). As here are 3 genes we have got 3 Boolean expressions for them. They can be formulated according to previous discussion . They are listed below:- ?? 1 ? = ?(3,5), ?? 2 ? = ?(0,5), ?? 3 ? = ?(0,3,6,7). Using the above example first we'll calculate the running time of the propose 4d algorithm. To generate an Boolean equation for any state first we have to check how many 1s it has in output pattern. It can have maximum 2 n 1s. So O(n*2 n ) is the running time for generation of each equation. As there are n nodes and one variable is associated with each node so running time of step two of the algorithm is O(n 2 * 2 n ). Again for each node, there can be maximum 2 n minterms and step 4 eliminates one minterm in each iteration. Now two minterms can be chosen in ? ?? 2 ? ways (let m = 2 n ). If we reduce one minterm at a time then complexity of step 4 can be calculated as follows:- In first iteration there are m minterms. So we can choose two of them to reduce into one in ? ?? 2 ? ways. In second iteration there are m-1 minterms. So we can choose two of them to reduce into one in ? ?? ?1 2 ? ways and so on. Similarly at the end we should have two minterms and we can choose them in one way. So the complexity will be On the other hand if we assign serial numbers to each row of the input expression patterns from 0 to 2 n -1 then some of products can be expressed with summation notation (?). For example, for the above table as in output column the row 1,2,4 and 7 contain 1, so the output ex[ression can be formulated as u(ci, x, y) = ? (1,2,4,7). ð??"ð??"(??) = ? ?? 2 ? +? ?? ?1 2 ? +? ?? ?2 2 ? +?.???+ ? ?? ?(?? ?3) 2 ? + ? ?? ?(?? ?2) 2 ? = ? ?? +1 3 ? = (?? +1)! 3!(?? ?2)! = ?? 3 ??? 6 = (2 It is apparent that this notation can be simplified using the help of computer more easily and efficiently. Observing that the rows that have an output of 1 are the 2nd, 3rd, 5th, and 8th, we can write u as a sum of minterms m 1 ,m 2 ,m 4 , and m 7 . If we wish to verify this: u(ci, x, y) = m 1 + m 2 + m 4 + m 7 = (ci' x' y) + (ci' x y') + (ci x' y') + (ci x y) evaluated for all 8 combinations of the three variables will match the table. network take binary values which are updated synchronously, whereas quantities of gene expressions in real cells are not binary and are changing continuously in time. However, owing to its simplicity, the proposed algorithm can be extended in various way. This algorithm can be extended to identify, count and enumerate all the consistent networks also. Finally, we believe that our theoretical results, along with the mathematical computations encourage the attempts to discover the gene regulation mechanism from time series of gene expression patterns. ![For a set of examples EX = {(I 1 ;O 1 ),(I 2 ;O 2 ),??..(I m ;O m )}, we say that G(V;F) (resp. node vi) is consistent with EX if G(V;F) (resp. node vi) is consistent with all (Ij;Oj) for 1 <= j <= m. Then, the problems are defined as:-Consistency : Given n (the number of nodes) and EX, decide whether or not there exists a Boolean network consistent with EX and output one if it exists;Generation : Given n (the number of nodes) and EX, generate all the number of Boolean networks consistent with EX.](image-2.png "") 1![a) : A genetic network with its input/output patterns and a consistent Boolean network uncomplemented form) is called a minterm. Thus, a minterm is a logical expression of n variables that employs only the complement operator and the conjunction operator. For example, abc, ab'c and abc' are 3 examples of the 8 minterms for a Boolean function of the three variables a, b and c. The customary reading of the last of these is a AND b AND NOT-c.](image-3.png "Table 1 (") ![For example, minterm 5, a b' c, is true only when a and c both are true and b is false-the input arrangement where a = 1, b = 0, c = 1 results in 1.](image-4.png "") ![2011 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XI Issue XXI Version I 48 2011 December Generation of Genetic Networks from a Small Number of Gene Expression Patterns under the Boolean Network Model](image-5.png "") ![Process Development e) Formulation of minterms For a boolean function of n variables , a product term in which each of the n variables appears once (in either its complemented or](image-6.png "") 3![Repeat step 4 and 5 for each ?? ?? ? . 4. Eliminate variables v i from ?? ?? ? using Boolean algebra until elimination is impossible. 5. This is a new Boolean network consistent with input and output expression patterns (I and O), store it. g) Complexity Consider the following expression pattern:-Table 2(a) : Generation of Boolean network from input /output expression patterns](image-7.png "3 .") 3686![This function expresses the complexity of one simplifying one equation. There are at most n equations.So the overall complexity is given by8 ?? ?2 ?? 6 *n. So the running time of the algorithm is O( 8 ?? ?2 ?? 6 *n + n 2 * 2 n ) = O(n*8 n ). © 2011 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XI Issue XXI Version I 49 2011 December Generation of Genetic Networks from a Small Number of Gene Expression Patterns under the Boolean Network Model](image-8.png "?? ) 3 ?2 ?? 6 = 8 ?? ?2 ?? 6") 1(a) : Minterms formulation from a Boolean network © 2011 Global Journals Inc. (US) ## EXPERIMENTAL OUTCOMES a) Performance Analysis This paper improves the previous woks done by the authors worldwide. The improvements are listed below:-1) It generates only the consistent network. So overhead of checking inconsistent networks is eliminated. 2) Huge improvement is achieved in running time. 3) As all consistent networks are generated, so counting problem is solved simultaneously. 4) The in degree/out degree have no limit. A node (gene) can be expressed by any number of nodes(genes). IV. ## CONCLUSION We have proved mathematically that to generate all consistent Boolean networks it requires O(n*8 n ) times. For that purpose, we proposed a simple algorithm. Of course, real biological systems are different from Boolean networks nodes in a Boolean * Identification Of Genetic networks From A Small Number Of Gene Expression Patterns Under The Boolean Network Model AkutsuTatsuya MiyanoSatoru KuharaSatoru * REVEAL, a general reverse engineering algorithm for inference of genetic network architectures SLiang SFuhrman RSomogyi Paci_c Symposium on Biocomputing 3 18 1998 * GeneNetwork: an interactive tool for reconstruction of genetic networks using microarray data Chia-ChinWu Hsuan-ChengHuang Hsueh-FenJuan Shui-TeinChen * Inferring Gene Regulatory Networks fromRaw Data: A Molecular Epistemics Approach DAKightley NChandra KElliston Pacific Symposium on Biocomputing 2004 9 * Inference of Genetic Regulatory Networks with Recurrent Neural Network Models Using Particle Swarm Optimization RuiXu DonaldCWunsch II RonaldLFrank * Identification of gene regulatory networks by strategic gene disruptions and gene over expressions TAkutsu SKuhara OMaruyama SMiyano Proc. 9th ACM-IAM Symp. Discrete Algorithms 9th ACM-IAM Symp. Discrete Algorithms 1998 695 * A test case of correlation metric construction of a reaction pathway from measurements AArkin PShen JRoss Science 277 1275 1997 * Exploring the metabolic and genetic control of gene expression on a genomic scale JLDerisi VRLyer POBrown Science 278 680 1997 * An Introduction to Computational Learning Theory MJKearns UVVazirani 1994 The MIT Press * Circuit simulation of genetic networks HHMcadams LShapiro Science 269 650 1995 * Modeling the complexity of genetic networks: Understanding multigene and pleiotropic regulation RSomogyi CASniegoski Complexity 1 45 1996 * Dynamical behaviour of biological regulatory networks -I. Biological role of feedback loops and practical use of the concept of the loop-characteristic state RThomas DThie_Ry MKaufman Bulletin of Mathematical Biology 57 247 1995 * Wikipedia the free encyclopedia * DonaldEKnuth Concrete Mathematics 14 * Large-scale temporal gene expression mapping of central nervous system development XWen Proc. Natl. Acad. Sci. USA Natl. Acad. Sci. USA 1998 95 334 * Genomic regulation modeled as a network with basins of attraction AWuensche Paci_c Symposium on Biocomputing 3 89 1998 * Genomic Cis-regulatory logic: experimental and computational analysis of a sea urchin gene EHBolouri Davidson Science 279 1896 1998 * REFERENCES REFERENCES REFERENCIAS 17