# Introduction

Proteins are macromolecules responsible for a wide range of vital biochemical functions, including acting as oxygen carriers, cell signaling, antibody production, nutrient transport and building up muscle fibers. Specifically, proteins are chains of amino acids, of which there are 20 different types, coupled by peptide bonds [2]. The three-tiered structural hierarchy of proteins is typically referred to as primary, secondary and tertiary structure. The higher-level (secondary and tertiary) structures [1], [2] determine the function of a protein, so insight into its function can be inferred from them.

As genome sequencing projects multiply, the SWISS-PROT databases [3], [4] of primary protein structures are expanding tremendously. The Protein Data Bank is not growing at a comparable rate, because of the inherent difficulty of determining the higher levels of structure. Experimental structure determination [5], [6] is expensive, time consuming and labor intensive, and may not be applicable to all proteins. In view of these shortcomings of laboratory procedures, major research effort has been dedicated to predicting higher-level protein structure with computational techniques.

Anfinsen did pioneering work on predicting protein structure from amino acid sequences [6], [7]. The ability to predict the higher-level structures from the amino acid sequence is usually called the protein folding problem, and it is among the greatest challenges in bioinformatics. Predicting the structure of a protein makes it possible to describe the topology of the chain, while the tertiary structure describes the three-dimensional arrangement of the amino acid sequence. The two can be predicted independently of each other. The functionality of a protein can be affected by both its topology and its tertiary structure.
Structure aids in the identification of membrane proteins, the location of binding sites and the identification of homologous proteins [9], [10], [11], to list a few of the benefits, which highlights the importance of knowing this level of structure. This is why considerable effort has been devoted to structure prediction alone. Knowing the structure of a protein is extremely important and can also greatly enhance the accuracy of tertiary structure prediction. Furthermore, proteins can be classified according to their structural elements, specifically their alpha helix and beta sheet content.

# Related Works in Structure Prediction

The objective of structure prediction is to identify whether each amino acid residue of a protein is in a helix, a strand or any other shape. In 1960, as an initial step in structure prediction, the probability of each structure element was calculated for each amino acid by considering the properties of single amino acids [1], [3], [6]. This method is known as the first-generation technique. The work was later extended to consider the local environment of each amino acid, which is known as the second-generation technique: predicting the structure at a particular amino acid also requires information about the adjacent residues, and the local environment carries 65% of the structure information. This extension gives 60% accuracy. The third-generation technique combines machine learning, knowledge about proteins and several algorithms, and gives 70% accuracy. Neural networks [10], [11] are also useful in implementing structure prediction programs such as PHD and SAM-T99.

The evolution process is directed by the popular Genetic Algorithm (GA), with its underlying philosophy of survival of the fittest gene. This GA framework can be adopted to arrive at the cellular automata (CA) rule structure appropriate for modeling a physical system.
The goals of the GA formulation are to enhance the understanding of the ways CA perform computations, to learn how CA may be evolved to perform a specific computational task, and to understand how evolution creates complex global behavior in a locally interconnected system of simple cells.

# a) Artificial Immune Systems

Artificial immune systems are motivated by the theory of immunology. The biological immune system functions to protect the body against pathogens or antigens that could potentially cause harm. It works by producing antibodies that identify, bind to, and finally eliminate the pathogens. Even though the number of antigens is far larger than the number of antibodies, the biological immune system has evolved to allow it to deal with the antigens. The immune system learns the criteria of the antigens so that in future it can react both to antigens it has encountered before and to entirely new ones. In 2002, de Castro and Timmis [17] suggested that "for a system to be characterized as an artificial immune system, it has to embody at least a basic model of an immune component (e.g. cell, molecule, organ), it has to have been designed using the ideas from theoretical and/or experimental immunology".

# Design of MACA based Pattern Classifier with Artificial Immune System

Step 1: Generate an AIS-PSMACA with k attractor basins.
Step 2: Distribute S into k attractor basins (nodes).
Step 3: Evaluate the distribution of examples in each attractor basin.
Step 4: If all the examples (S') of an attractor basin (node) belong to only one class, then label the attractor basin (leaf node) for that class.
Step 5: If the examples (S') of an attractor basin belong to K' classes, then Partition (S', K').
Step 6: Stop.

A special class of non-linear CA, termed Multiple Attractor CA (SPECIAL MACA), has been proposed to develop the model.
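As an illustration of Steps 1 to 6, the following minimal Python sketch builds the classifier tree recursively. It is not the authors' implementation. It assumes that a single dependency vector per node (k = 2 attractor basins) can stand in for the evolved AIS-PSMACA, and that the basin of a pattern is the parity of the bits selected by that vector. The helper names `basin_of`, `pick_dv`, `partition` and `classify` are hypothetical.

```python
from collections import Counter
from itertools import product


def basin_of(pattern, dv):
    """Basin index (0 or 1): parity of the pattern bits selected by the DV (assumption)."""
    return sum(d & p for d, p in zip(dv, pattern)) % 2


def pick_dv(samples, n_bits):
    """Stand-in for Step 1: return the first DV that spreads the samples over both basins."""
    for dv in product((0, 1), repeat=n_bits):
        if len({basin_of(pattern, dv) for pattern, _ in samples}) == 2:
            return dv
    return None  # all patterns are identical, so no DV can separate them


def partition(samples, n_bits):
    """Recursively build the classifier tree from (pattern, class_label) pairs."""
    labels = Counter(label for _, label in samples)
    if len(labels) == 1:  # Step 4: a pure basin becomes a labelled leaf
        return {"label": next(iter(labels))}
    dv = pick_dv(samples, n_bits)
    if dv is None:  # inseparable examples: fall back to the majority class
        return {"label": labels.most_common(1)[0][0]}
    basins = {0: [], 1: []}
    for pattern, label in samples:  # Step 2: distribute S into the basins
        basins[basin_of(pattern, dv)].append((pattern, label))
    # Steps 3 and 5: evaluate every basin and partition the mixed ones again
    children = {b: partition(subset, n_bits) for b, subset in basins.items()}
    return {"dv": dv, "children": children}


def classify(tree, pattern):
    """Follow the basins from the root until a labelled leaf is reached."""
    while "label" not in tree:
        tree = tree["children"][basin_of(pattern, tree["dv"])]
    return tree["label"]


data = [((0, 0, 0, 1), "I"), ((0, 1, 1, 0), "II"), ((1, 0, 1, 1), "I"),
        ((1, 1, 0, 0), "II"), ((1, 1, 1, 1), "I")]
tree = partition(data, 4)
print(classify(tree, (1, 0, 1, 1)))  # prints: I
```

Because each split sends at least one example to each basin, every child node is strictly smaller than its parent and the recursion terminates. In the paper the MACA at each node is evolved rather than found by exhaustive search, so `pick_dv` should be read only as a placeholder.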
Theoretical analysis, reported in this chapter, provides an estimate of the noise accommodating capability of the proposed SPECIAL MACA based associative memory model. Characterization of the basins of attraction of the proposed model establishes the sparse network of nonlinear CA (SPECIAL MACA) as a powerful pattern recognizer for memorizing unbiased patterns. It provides an efficient and cost-effective alternative to the dense network of neural nets for pattern recognition. Detailed analysis of the SPECIAL MACA rule space establishes that the rule subspace of the pattern recognizing/classifying CA lies at the edge of chaos. Such a CA, as projected in [20], is capable of executing complex computation. The analysis and experimental results reported in the current and next chapters confirm this viewpoint: a SPECIAL MACA employing CA rules at the edge of chaos is capable of performing the complex computation associated with pattern recognition.

# c) Algorithm Single Point Crossover

Input: Two randomly selected rule vectors (Parent 1 and Parent 2).
Output: Resultant rule vectors (Offspring 1 and Offspring 2).
Step 1: Randomly generate a number q between 1 and n.
Step 2: Take the first q rules (symbols) from the first rule vector (Parent 1) and the last (n-q) rules of Parent 2. Form a new rule vector (Offspring 1) by concatenating these rules.
Step 3: Form Offspring 2 by concatenating the first q rules of Parent 2 and the last (n-q) rules of Parent 1.
Step 4: Stop.

# d) Random Generation of Initial Population

To form the initial population, it must be ensured that each randomly generated solution is a combination of an n-bit Dependency String (DS) with 2^m attractor basins (Classifier #1) and an m-bit Dependency Vector (DV) (Classifier #2). The chromosomes are randomly synthesized according to the following steps.

# Experimental Step

- Select the target CA protein (amino acid sequence) T, whose structure is to be predicted.
- Perform an AIS-PSMACA search, using the primary amino acid sequence Tp of the target CA protein T. The objective is to locate a set of CA proteins, S = {S1, S2, …}, of similar sequence.
- Select from S the primary structure Bp of a base CA protein with a significant match to the target CA protein. An AIS-PSMACA [16], [18] search produces a measure of similarity between each CA protein in S and the target CA protein T, so Bp can be chosen as the CA protein with the highest such value.
- Obtain the base CA protein's structure, Bs, from the PDB.
- Using Bp, create an input sequence Ib (corresponding to the base CA protein) by replacing each amino acid in the primary structure with its hydrophobicity value. The output sequence Ob is created by replacing the structural elements in Bs with the values 200, 600 and 800 for helix C, strand and coil respectively.
- Solve the system identification problem by performing CA deconvolution with the output sequence Ob and the input sequence Ib to obtain the sought-after CA response F.

# Experimental Results

In the experiments conducted, the base proteins are assigned the values 300, 700 and 900 for helix C, strand and coil respectively. We have found a structure numbering scheme, built on the Boolean characters of CA, which predicts the coils, strands and helices separately. The MACA based prediction procedure described in the previous section is then executed, and each occurrence of each sequence in the resulting output is predicted. The query sequence analyzer was designed, and the identification of the green terminals of the protein is simulated in Figure 4. The analysis of the sequence and the place of joining of the proteins are pointed out in Figure 5. The experimental results in Figures 7 and 8 include the similarity and accuracy graphs, with each of the components plotted separately.
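The encoding and system identification steps of the experimental procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure used in the experiments: the paper does not name a hydrophobicity scale, so the Kyte-Doolittle scale is assumed; the one-letter structure codes `H`, `E` and `C` are an assumed notation for helix, strand and coil; and plain time-domain deconvolution by forward substitution is swapped in for the CA deconvolution, whose details are not given. All function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the paper does not specify one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Numeric codes for helix, strand and coil as listed in the experimental steps.
STRUCTURE_CODE = {"H": 200.0, "E": 600.0, "C": 800.0}


def encode_sequence(sequence):
    """Build the input sequence Ib (or It) from one-letter amino acid codes."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence]


def encode_structure(structure):
    """Build the output sequence Ob from per-residue structure labels."""
    return [STRUCTURE_CODE[label] for label in structure]


def deconvolve(output, inputs):
    """Solve output = inputs * response for the response F (needs inputs[0] != 0)."""
    response = []
    for k in range(len(output)):
        known = sum(inputs[j] * response[k - j] for j in range(1, k + 1))
        response.append((output[k] - known) / inputs[0])
    return response


def convolve(inputs, response):
    """Causal convolution of inputs with response, truncated to len(inputs)."""
    return [
        sum(inputs[j] * response[k - j]
            for j in range(k + 1) if k - j < len(response))
        for k in range(len(inputs))
    ]


ib = encode_sequence("MKVLA")      # base protein fragment (made-up example)
ob = encode_structure("CHHEE")     # its structure labels (made-up example)
f = deconvolve(ob, ib)             # identified response F
ot = convolve(ib, f)               # reproduces Ob up to rounding error
```

Forward substitution is chosen here only because it needs no external library. It can amplify rounding errors on long sequences, so a frequency-domain or regularized method may be preferable in practice.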
# Conclusion

Existing structure-prediction methods can predict the structure of a protein with 75% accuracy. Our results indicate that such a level of accuracy is attainable, and can potentially be surpassed, with our method. AIS-AIS-PSMACA provides the best overall accuracy, which ranges between 80% and 89.8% depending on the dataset. To provide a more thorough analysis of the viability of the proposed technique, more experiments will be conducted.

# Global Journal of Computer Science and Technology Volume XIII Issue IV Version I

AIS-PSMACA: Towards Proposing an Artificial Immune System for Strengthening PSMACA: An Automated Protein Structure Prediction using Multiple Attractor Cellular Automata

© 2013 Global Journals Inc. (US)

Definition: CA is defined as a four-tuple (G, Z, N, F), where

- G -> Grid (set of cells)
- Z -> Set of possible cell states
- N -> Set which describes the cells' neighborhoods
- F -> Transition function (rules of the automaton)

The concept of the homogeneous structure of CA was initiated in the early 1950s by J. von Neumann. It was conceived as a general framework for modeling complex structures capable of self-reproduction and self-repair. Subsequent developments have taken place in several phases and in different directions.

An n-bit MACA with k attractor basins can be viewed as a natural classifier. It classifies a given set of patterns into k distinct classes, each class containing the set of states in one attractor basin. To enhance the classification accuracy of the machine, most works have employed MACA to classify patterns into two classes (say I and II). The following example illustrates a MACA based two-class pattern classifier.

![Figure 1: Example of MACA with basin 0000](image-4.png "Figure 1")

Random generation of the initial population:

1. Randomly partition n into m integers such that n1 + n2 + … + nm = n.
2. For each ni, randomly generate a valid Dependency Vector (DV).
3. Synthesize the Dependency String (DS) through concatenation of the m DVs for Classifier #1.
4. Randomly synthesize an m-bit Dependency Vector (DV) for Classifier #2.
5. Synthesize a chromosome through concatenation of Classifier #1 and Classifier #2.

Final experimental step: Transform the amino acid sequence Tp into a discrete time sequence It, and convolve it with F, thereby producing the predicted structure (Ot = It * F) of the target CA protein. The result of this calculation, Ot, is a vector of numerical values. For values between 0 and 200 a helix C is predicted, and for values between 600 and 800 a strand is predicted by CA. All other values are predicted as a coil by MACA. This produces a mapping for the required target structure Ts of the target CA protein T.

Prediction accuracy per experiment for each target:

| Target: 1PFC | Prediction Accuracy | Target: 1PP2 | Prediction Accuracy | Target: 1QL8 | Prediction Accuracy |
|---|---|---|---|---|---|
| Exp 1 | 65% | Exp 5 | 85% | Exp 9 | 85% |
| Exp 2 | 65% | Exp 6 | 90% | Exp 10 | 90% |
| Exp 3 | 69% | Exp 7 | 83% | Exp 11 | |
| Exp 4 | 71% | Exp 8 | 87% | Exp 12 | |

[Table: prediction accuracy by prediction method (DSP, PHD, SAM-T99, SS Pro, AIS-PSMACA, AIS-AIS-PSMACA) for targets 1PFC, 1PP2 and 1QL8. Values in source order, with cell positions not recoverable: 92%, 70%, 70%, 68%, 68%, 77%, 82%, 91%, 96%, 84%, 87%; then, for SS Pro, AIS-PSMACA and AIS-AIS-PSMACA: 70%, 90%, 92%, 73%, 85%, 83%, 81%, 97%, 96%.]
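The five steps for random generation of the initial population, together with the single point crossover algorithm, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code. It assumes that a "valid" DV simply means a non-zero bit vector, that chromosomes are plain bit lists, and that the crossover point q is drawn from 1 to n-1 so that both parents contribute. All function names are hypothetical.

```python
import random


def random_partition(n, m, rng):
    """Step 1: split n into m positive integers n1..nm that sum to n."""
    cuts = sorted(rng.sample(range(1, n), m - 1))
    bounds = [0] + cuts + [n]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def random_dv(length, rng):
    """Random non-zero bit vector (assumed meaning of a 'valid' DV)."""
    while True:
        dv = [rng.randint(0, 1) for _ in range(length)]
        if any(dv):
            return dv


def random_chromosome(n, m, rng):
    """Steps 2 to 5: n-bit DS (Classifier #1) followed by an m-bit DV (Classifier #2)."""
    sizes = random_partition(n, m, rng)
    ds = [bit for size in sizes for bit in random_dv(size, rng)]  # Steps 2 and 3
    dv2 = random_dv(m, rng)                                       # Step 4
    return ds + dv2                                               # Step 5


def single_point_crossover(parent1, parent2, rng):
    """Swap the tails of two equal-length rule vectors after a random point q."""
    q = rng.randint(1, len(parent1) - 1)
    offspring1 = parent1[:q] + parent2[q:]
    offspring2 = parent2[:q] + parent1[q:]
    return offspring1, offspring2


rng = random.Random(0)                      # seeded only to make runs repeatable
population = [random_chromosome(10, 3, rng) for _ in range(4)]
child1, child2 = single_point_crossover(population[0], population[1], rng)
```

Each chromosome here has n + m bits. Passing the random generator explicitly keeps the functions easy to test and makes a run reproducible when a seed is fixed.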
* Debasis Mitra, Michael Smith. Digital Signal Processing in Protein Secondary Structure Prediction. Innovations in Applied Artificial Intelligence, Lecture Notes in Computer Science 3029, 2004.
* P. Kiran Sree, Inampudi Ramesh Babu. PSMACA: An Automated Protein Structure Prediction using MACA (Multiple Attractor Cellular Automata). Journal of Bioinformatics and Intelligent Control (JBIC) 2(3), American Scientific Publications.
* Jadwiga Bienkowska, Rick Lathrop. Threading Algorithms.
* Abagyan. Homology Modeling With Internal Coordinate Mechanics: Deformation Zone Mapping and Improvements of Models via Conformational Search. PROTEINS: Structure, Function and Genetics 1, 1997.
* N. Alexandrov, V. Solovyev. Effect of secondary structure prediction on protein fold recognition and database search. Genome Informatics 7, 1996.
* C. B. Anfinsen. Principles that govern the folding of protein chains. Science 181, 1973.
* Baldi. Bidirectional Dynamics for Protein Secondary Structure Prediction. Sequence Learning: Paradigms, Algorithms and Applications, Springer, 2000.
* Boeckmann. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31, 2003.
* Bonneau. Rosetta in CASP4: Progress in Ab Initio Protein Structure Prediction. PROTEINS: Structure, Function and Genetics 5, 2001.
* Philip E. Bourne, Helge Weissig. Structural Bioinformatics. John Wiley & Sons, 2003.
* C. Brandon, J. Tooze. Introduction to Protein Structure. Garland Publishing, 1999.
* J. Chandonia, M. Karplus. New Methods for Accurate Prediction of Protein Secondary Structure. PROTEINS: Structure, Function and Genetics 35, 1999.
* P. Chou, G. Fasman. Prediction of the secondary structure of proteins from their amino acid sequence. Advanced Enzymology 47, 1978.
* R. Dunbrack. Comparative Modeling of CASP3 Targets Using PSI-BLAST and SCWRL. PROTEINS: Structure, Function and Genetics 3, 1999.
* H. Hirakawa, S. Kuhara. Prediction of Hydrophobic Cores of Proteins Using Wavelet Analysis. Genome Informatics 8, 1997.
* A. Irback, E. Sandelin. On Hydrophobicity Correlations in Protein Chains. Biophysical Journal 79, 2000.
* Irback. Evidence for nonrandom hydrophobicity structures in protein chains. 93, September 1996.
* Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers, 1983.
* Karplus. Hidden Markov Models for Detecting Remote Protein Homologies. Bioinformatics 14(10), 1998.
* J. Skolnick, A. Kolinski. Computational Studies of Protein Folding. Computing in Science and Engineering, 2001.
* Thiele. Protein Threading by Recursive Dynamic Programming. Journal of Molecular Biology 290, 1999.
* Veljkovic. Is It Possible To Analyze DNA and Protein Sequences by the Methods of Digital Signal Processing. IEEE Transactions on Biomedical Engineering 32(5), 1985.
* P. Sree, I. Ramesh Babu. Identification of Protein Coding Regions in Genomic DNA Using Unsupervised FMACA Based Pattern Classifier. International Journal of Computer Science & Network Security, ISSN 1738-7906, Number 1, 2008.
* Eric E. Snyder, Gary D. Stormo. Identification of Protein Coding Regions in Genomic DNA. ICCS Transactions, 2002.
* E. E. Snyder, G. D. Stormo. Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. Nucleic Acids Res 21(3), February 11, 1993.
* P. Flocchini, F. Geurts, A. Mingarelli, Santoro. Convergence and Aperiodicity in Fuzzy Cellular Automata: Revisiting Rule 90. 2000.
* P. Maji, P. P. Chaudhuri. FMACA: A Fuzzy Cellular Automata Based Pattern Classifier. Proceedings of 9th International Conference on Database Systems, Korea, 2004.
* P. Sree, G. V. S. Raju, I. Ramesh Babu, S. Viswanadha Raju. Improving Quality of Clustering using Cellular Automata for Information retrieval. International Journal of Computer Science, ISSN 1549-3636, 4(2), Science Publications-USA, 2008.
* P. Sree, I. Ramesh Babu. Face Detection from still and Video Images using Unsupervised Cellular Automata with K means clustering algorithm. Vision and Image Processing 8, Issue II, 2008.
* R. Lippmann. An introduction to computing with neural nets. IEEE ASSP Mag 4, 22, 2004.
* P. Maji, P. P. Chaudhuri. FMACA: A Fuzzy Cellular Automata Based Pattern Classifier. Proceedings of 9th International Conference on Database Systems, 2004.
* P. Maji, P. P. Chaudhuri. Fuzzy Cellular Automata for Modeling Pattern Classifier. IEICE, 2004.