# Introduction

Proteins are macromolecules responsible for a wide range of vital biochemical functions, including acting as oxygen carriers, cell signaling, antibody production, nutrient transport and building up muscle fibers. Specifically, proteins are chains of amino acids, of which there are 20 different types, coupled by peptide bonds [2]. The three-tiered structural hierarchy possessed by proteins is typically referred to as primary, secondary and tertiary structure. The higher-level secondary and tertiary structures [1], [2] determine the function of a protein, so insight into its function can be inferred from them.

As genome sequencing projects grow, databases of primary protein structures such as SWISS-PROT [3], [4] are expanding rapidly. The Protein Data Bank (PDB) is not growing at a comparable rate, owing to the innate difficulty of determining the higher levels of structure. Experimental structure determination [5], [6] is expensive, time-consuming and labor-intensive, and may not be applicable to all proteins. In view of these shortcomings of laboratory procedures, much research has been dedicated to predicting the higher-level structures of proteins with computational techniques. Anfinsen did pioneering work on relating protein structure to the amino acid sequence [6], [7]. Predicting the higher-level structures from the amino acid sequence is usually called the protein folding problem, and it is one of the greatest challenges in bioinformatics.

By predicting the secondary structure of a protein, the topology of the chain can be described. The three-dimensional arrangement of the amino acid sequence is described by the tertiary structure. The two levels can be predicted independently of each other. The functionality of a protein can be affected by its secondary structure, its topology and its tertiary structure. Knowledge of secondary structure aids in the identification of membrane proteins, the location of binding sites and the identification of homologous proteins [9], [10], [11], to list a few of its benefits. This highlights the importance of knowing this level of structure, and it is the reason why considerable effort has been devoted to predicting the secondary structure alone. Knowing the secondary structure of a protein is extremely important and can also greatly enhance the accuracy of tertiary structure prediction. Furthermore, proteins can be classified according to their secondary structural elements, specifically their alpha-helix and beta-sheet content.


# Related Works in Structure Prediction

The objective of secondary structure prediction is to identify whether each amino acid residue of a protein lies in a helix, a strand or some other conformation. In the 1960s, as an initial step, the probability of each structural element was calculated for each amino acid by considering only the properties of that single amino acid [1], [3], [6]; such methods are called first-generation techniques. This work was later extended, in second-generation techniques, to consider the local environment of each amino acid, because predicting the structure of a particular residue also requires information about its adjacent residues. The local environment carries about 65% of the structural information, and these extended methods reach about 60% accuracy. Third-generation techniques combine machine learning, prior knowledge about proteins and several algorithms, and reach about 70% accuracy. Neural networks [10], [11] have also been used to implement structure prediction programs such as PHD and SAM-T99.
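
Accuracy figures such as the 60%, 65% and 70% quoted above are commonly reported as per-residue three-state accuracy (Q3): the percentage of residues whose helix/strand/coil label is predicted correctly. Assuming that convention (the text does not define its accuracy measure), a minimal Python sketch, with H, E and C as illustrative labels:

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy (Q3), in percent.

    Both strings use one symbol per residue, e.g. H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical toy example: 14 of the 17 residues agree.
observed = "CCHHHHHHCCEEEEECC"
predicted = "CCHHHHCCCCEEEECCC"
print(f"Q3 = {q3_accuracy(predicted, observed):.1f}%")  # Q3 = 82.4%
```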

The evolution process is directed by the popular Genetic Algorithm (GA), whose underlying philosophy is survival of the fittest. This GA framework can be adopted to arrive at the CA rule structure appropriate for modeling a physical system. The goals of the GA formulation are to enhance understanding of the ways a CA performs computations, to learn how a CA may be evolved to perform a specific computational task, and to understand how evolution creates complex global behavior in a locally interconnected system of simple cells.

Artificial immune systems (AIS) are motivated by the theory of immunology. The biological immune system protects the body against pathogens or antigens that could potentially cause harm. It works by producing antibodies that identify, bind to and finally eliminate the pathogens. Even though the number of antigens is far larger than the number of antibodies, the biological immune system has evolved to deal with them: it learns the characteristics of the antigens so that in future it can react both to antigens it has encountered before and to entirely new ones. In 2002, de Castro and Timmis [17] suggested that "for a system to be characterized as an artificial immune system, it has to embody at least a basic model of an immune component (e.g. cell, molecule, organ), it has to have been designed using the ideas from theoretical and/or experimental immunology."
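
The immune mechanism described above is often turned into an algorithm through clonal selection: the antibodies that bind an antigen best are cloned and mutated, with better matches mutated less, while a few random newcomers keep the repertoire diverse. The sketch below is a generic clonal-selection loop under loudly stated assumptions (antibodies and the antigen are bit strings, affinity is the number of matching bits, and every function and parameter name is hypothetical). It is not the AIS procedure used in AIS-PSMACA, which the text does not spell out.

```python
import random


def affinity(antibody, antigen):
    """Toy affinity (assumption): number of positions where the bit strings agree."""
    return sum(a == b for a, b in zip(antibody, antigen))


def hypermutate(antibody, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in antibody]


def clonal_selection(antigen, pop_size=20, n_select=5, n_clones=4,
                     n_newcomers=2, generations=50, seed=0):
    """Generic clonal-selection loop (hypothetical parameters).

    Returns the best antibody and the best affinity recorded per generation.
    """
    rng = random.Random(seed)
    n = len(antigen)

    def random_antibody():
        return [rng.randint(0, 1) for _ in range(n)]

    def score(ab):
        return affinity(ab, antigen)

    population = [random_antibody() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        clones = []
        for ab in population[:n_select]:
            # Affinity maturation: better-matching antibodies mutate less.
            rate = max(1.0 / n, 1.0 - score(ab) / n)
            clones.extend(hypermutate(ab, rate, rng) for _ in range(n_clones))
        # Keep the best of parents and clones, so the best antibody is never lost.
        pool = sorted(population + clones, key=score, reverse=True)
        survivors = pool[:pop_size - n_newcomers]
        history.append(score(survivors[0]))
        # Random newcomers keep exploring patterns not seen so far.
        population = survivors + [random_antibody() for _ in range(n_newcomers)]
    return max(population, key=score), history


antigen = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
best, history = clonal_selection(antigen)
print(affinity(best, antigen), "of", len(antigen), "bits matched")
```

Because survivors are drawn from parents plus clones, the best affinity per generation can never decrease; that elitism plays the role of immune memory in this toy version.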

# Design of MACA based Pattern Classifier with Artificial Immune System

Step 1: Generate an AIS-PSMACA with k attractor basins.

Step 2: Distribute S into k attractor basins (nodes).

Step 3: Evaluate the distribution of examples in each attractor basin.

Step 4: If all the examples (S′) of an attractor basin (node) belong to only one class, then label the attractor basin (leaf node) with that class.

Step 5: If the examples (S′) of an attractor basin belong to K′ classes, then call Partition (S′, K′).

Step 6: Stop.
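
Steps 1 to 6 describe a recursive partitioning that grows a tree of attractor basins until each leaf basin holds a single class. The Python sketch below follows that control flow under stated assumptions: the AIS-PSMACA of Step 1 (which in the paper would be evolved) is replaced by a hypothetical random classifier that assigns each pattern to one of 2**m basins through the parities of per-slice dependency vectors, and a `max_depth` safeguard, which is not part of the original steps, stops the recursion if a random classifier fails to separate the classes.

```python
import random
from collections import Counter


def random_basin_classifier(n, m, rng):
    """Hypothetical stand-in for Step 1 (a classifier with up to k = 2**m basins).

    The n cells are cut into m contiguous slices, each with a non-zero
    dependency vector (DV); a pattern's basin is the tuple of m parity bits
    DV_i . slice_i (mod 2).
    """
    cuts = sorted(rng.sample(range(1, n), m - 1))
    bounds = list(zip([0] + cuts, cuts + [n]))
    dvs = []
    for lo, hi in bounds:
        dv = [0] * (hi - lo)
        while not any(dv):  # resample until the DV is non-zero
            dv = [rng.randint(0, 1) for _ in range(hi - lo)]
        dvs.append(dv)

    def basin(pattern):
        return tuple(sum(d & x for d, x in zip(dv, pattern[lo:hi])) % 2
                     for dv, (lo, hi) in zip(dvs, bounds))

    return basin


def partition(examples, m, rng, depth=0, max_depth=10):
    """Recursive Partition(S, K) over (pattern, label) pairs.

    Returns a class label (leaf) or (basin_function, {basin: subtree}).
    Labels must not themselves be tuples.
    """
    labels = Counter(label for _, label in examples)
    if len(labels) == 1:                     # Step 4: pure basin becomes a leaf
        return next(iter(labels))
    if depth == max_depth:                   # safeguard, not in the original steps
        return labels.most_common(1)[0][0]
    basin = random_basin_classifier(len(examples[0][0]), m, rng)  # Step 1
    groups = {}
    for pattern, label in examples:          # Steps 2-3: distribute and inspect
        groups.setdefault(basin(pattern), []).append((pattern, label))
    return basin, {b: partition(g, m, rng, depth + 1, max_depth)  # Step 5
                   for b, g in groups.items()}


def classify(tree, pattern):
    """Follow basins down to a leaf; returns None for a basin never seen in training."""
    while isinstance(tree, tuple):
        basin, children = tree
        tree = children.get(basin(pattern))
    return tree


rng = random.Random(42)
data = [((0, 0, 0, 1), "I"), ((0, 0, 1, 1), "I"), ((1, 1, 0, 0), "II"), ((1, 0, 1, 0), "II")]
tree = partition(data, m=2, rng=rng)
print([classify(tree, p) for p, _ in data])
```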

A special class of non-linear CA, termed Multiple Attractor CA (SPECIAL MACA), has been proposed to develop the model. Theoretical analysis, reported in this chapter, provides an estimate of the noise-accommodating capability of the proposed SPECIAL MACA based associative memory model. Characterization of the basins of attraction of the proposed model establishes the sparse network of non-linear CA (SPECIAL MACA) as a powerful pattern recognizer for memorizing unbiased patterns, providing an efficient and cost-effective alternative to the dense networks of neural nets for pattern recognition. Detailed analysis of the SPECIAL MACA rule space establishes that the rule subspace of the pattern recognizing/classifying CA lies at the edge of chaos. Such a CA, as projected in [20], is capable of executing complex computation, and the analysis and experimental results reported in the current and next chapters confirm this viewpoint. A SPECIAL MACA employing CA rules at the edge of chaos is therefore capable of performing the complex computation associated with pattern recognition.


# c) Algorithm: Single Point Crossover

Input: Two randomly selected rule vectors (Parent 1 and Parent 2).

Output: Resultant rule vectors (Offspring 1 and Offspring 2).

Step 1: Randomly generate an integer q between 1 and n.

Step 2: Take the first q rules (symbols) from the first rule vector (Parent 1) and the last (n-q) rules of Parent 2. Form a new rule vector (Offspring 1) by concatenating these rules.

Step 3: Form Offspring 2 by concatenating the first q rules of Parent 2 and the last (n-q) rules of Parent 1.

Step 4: Stop.
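
The crossover steps translate directly into code. A minimal Python sketch, with rule vectors represented as plain lists (for example of CA rule numbers, an illustrative choice) and a hypothetical function name:

```python
import random


def single_point_crossover(parent1, parent2, rng=random):
    """Single-point crossover of two equal-length rule vectors (Steps 1-3)."""
    n = len(parent1)
    if n != len(parent2) or n == 0:
        raise ValueError("parents must be non-empty and of equal length")
    q = rng.randint(1, n)                   # Step 1: 1 <= q <= n, as in the text
    offspring1 = parent1[:q] + parent2[q:]  # Step 2: first q of P1 + last n-q of P2
    offspring2 = parent2[:q] + parent1[q:]  # Step 3: first q of P2 + last n-q of P1
    return offspring1, offspring2, q


p1 = [90, 150, 90, 150, 90, 150]
p2 = [102, 60, 204, 170, 240, 15]
o1, o2, q = single_point_crossover(p1, p2, random.Random(5))
print(q, o1, o2)
```

Because Step 1 allows q = n, the offspring can be exact copies of the parents; many GA implementations draw q from 1 to n - 1 instead so that every crossover actually mixes the two rule vectors.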


# d) Random Generation of Initial Population

To form the initial population, it must be ensured that each randomly generated solution is a combination of an n-bit DS with 2^m attractor basins (Classifier #1) and an m-bit DV (Classifier #2). The chromosomes are randomly synthesized according to the following steps.
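
The chromosome synthesis steps (partition n into m parts, generate one DV per part, concatenate the DVs into the DS of Classifier #1, add an m-bit DV for Classifier #2, and concatenate both) can be sketched as below. The text does not say what makes a DV "valid"; purely as an assumption, the sketch accepts any non-zero bit vector. All function names are hypothetical.

```python
import random


def random_partition(n, m, rng):
    """Step 1: split n into m positive integers with n1 + ... + nm = n."""
    cuts = sorted(rng.sample(range(1, n), m - 1))
    return [hi - lo for lo, hi in zip([0] + cuts, cuts + [n])]


def random_dv(length, rng):
    """Steps 2 and 4 (assumption): any non-zero bit vector counts as a valid DV."""
    while True:
        dv = [rng.randint(0, 1) for _ in range(length)]
        if any(dv):
            return dv


def random_chromosome(n, m, rng):
    """Returns (segment sizes, chromosome), where chromosome = DS (n bits) + DV (m bits)."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    sizes = random_partition(n, m, rng)                            # Step 1
    ds = [bit for size in sizes for bit in random_dv(size, rng)]   # Steps 2-3: Classifier #1
    dv2 = random_dv(m, rng)                                        # Step 4: Classifier #2
    return sizes, ds + dv2                                         # Step 5


def initial_population(pop_size, n, m, seed=0):
    rng = random.Random(seed)
    return [random_chromosome(n, m, rng) for _ in range(pop_size)]


for sizes, chromosome in initial_population(3, n=12, m=3):
    print(sizes, chromosome)
```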


# Experimental Step

- Select the target CA protein (amino acid sequence) T, whose structure is to be predicted.

- Perform an AIS-PSMACA search using the primary amino acid sequence Tp of the target CA protein T. The objective is to locate a set of CA proteins, S = {S1, S2, …}, of similar sequence.

- Select from S the primary structure Bp of a base CA protein with a significant match to the target CA protein. An AIS-PSMACA [16], [18] search produces a measure of similarity between each CA protein in S and the target CA protein T; therefore, Bp can be chosen as the CA protein with the highest such value.

- Obtain the base CA protein's structure, Bs, from the PDB.

- Using Bp, create an input sequence Ib (corresponding to the base CA protein) by replacing each amino acid in the primary structure with its hydrophobicity value. The output sequence Ob is created by replacing the structural elements in Bs with the values 200, 600 and 800 for helix C, strand and coil, respectively.

- Solve the system identification problem by performing CA deconvolution with the output sequence Ob and the input sequence Ib to obtain the CA response F.

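
The encoding and system-identification steps can be prototyped with ordinary signal processing. The sketch below makes several assumptions that the text does not fix: it uses the Kyte-Doolittle hydropathy scale as the "hydrophobicity value", it writes helix/strand/coil as H/E/C mapped to the 200/600/800 codes listed above, and in place of the paper's CA deconvolution it estimates a short FIR response F by least squares (a standard DSP technique swapped in for illustration). The tap count of 5 and all function names are hypothetical.

```python
import numpy as np

# Assumption: Kyte-Doolittle hydropathy scale (the text only says "hydrophobicity value").
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Output encoding from the text: helix 200, strand 600, coil 800.
STRUCTURE_CODE = {"H": 200.0, "E": 600.0, "C": 800.0}


def encode_sequence(seq):
    """Input sequence Ib: one hydrophobicity value per residue."""
    return np.array([KYTE_DOOLITTLE[aa] for aa in seq])


def encode_structure(ss):
    """Output sequence Ob: one numeric code per structural element."""
    return np.array([STRUCTURE_CODE[s] for s in ss])


def identify_response(ib, ob, taps=5):
    """Least-squares FIR estimate of F such that (ib * F)[t] ~= ob[t] for every t.

    Column k of A is ib delayed by k samples, so A @ F equals the first
    len(ib) samples of np.convolve(ib, F).
    """
    n = len(ib)
    if taps > n:
        raise ValueError("need at least as many residues as filter taps")
    A = np.zeros((n, taps))
    for k in range(taps):
        A[k:, k] = ib[: n - k]
    f, *_ = np.linalg.lstsq(A, ob, rcond=None)
    return f


seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"                        # hypothetical base protein
ss = "CC" + "H" * 10 + "CCC" + "E" * 5 + "CC" + "H" * 7 + "CCCC"   # hypothetical structure Bs
F = identify_response(encode_sequence(seq), encode_structure(ss))
print(F)
```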


# Experimental Results

In the experiments conducted, the base proteins are assigned the values 300, 700 and 900 for helix C, strand and coil, respectively. We have found a structure numbering scheme, built on the Boolean characteristics of CA, which predicts coils, strands and helices separately. The MACA based prediction procedure described in the previous section is then executed, and each occurrence of each sequence in the resulting output is predicted. The query sequence analyzer was designed, and the identification of the green terminals of the protein is simulated in Figure 4. The analysis of the sequence and the places where the proteins join are pointed out in Figure 5. Figures 7 and 8 present the experimental results, plotting the similarity and accuracy graphs separately for each component.


# Conclusion

Existing structure prediction methods can predict the structure of a protein with 75% accuracy. Our results indicate that such a level of accuracy is attainable, and can potentially be surpassed, with our method: AIS-AIS-PSMACA provides the best overall accuracy, ranging between 80% and 89.8% depending on the dataset. More experiments will be conducted to provide a more thorough analysis of the viability of the proposed technique.


# Global Journal of Computer Science and Technology

Volume XIII Issue IV Version I  
Definition: A CA is defined as a four-tuple <G, Z, N, F>, where G is the grid (set of cells), Z is the set of possible cell states, N is the set that describes the cell neighborhoods, and F is the transition function (rules of the automaton). The concept of the homogeneous structure of CA was initiated in the early 1950s by J. von Neumann. It was conceived as a general framework for modeling complex structures capable of self-reproduction and self-repair. Subsequent developments have taken place in several phases and in different directions.
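
The four-tuple <G, Z, N, F> maps onto a few lines of code. As a minimal sketch, assuming a one-dimensional grid, binary states (Z = {0, 1}), a three-cell neighborhood with null boundaries, and elementary Wolfram-numbered rules for F (the function name is hypothetical):

```python
def ca_step(state, rules):
    """One synchronous update of a 1-D, two-state CA.

    G: the cells of `state`; Z = {0, 1}; N: (left, self, right) with null
    (always-0) boundaries; F: a Wolfram rule number, either one int for a
    uniform CA or a list with one rule per cell (hybrid CA).
    """
    n = len(state)
    if isinstance(rules, int):
        rules = [rules] * n
    padded = [0] + list(state) + [0]
    # Neighborhood (l, c, r) selects bit 4l + 2c + r of the cell's rule number.
    return [
        (rules[i] >> (padded[i] << 2 | padded[i + 1] << 1 | padded[i + 2])) & 1
        for i in range(n)
    ]


state = [0, 0, 1, 0, 0]
print(ca_step(state, 90))   # [0, 1, 0, 1, 0]  (left XOR right)
print(ca_step(state, 150))  # [0, 1, 1, 1, 0]  (left XOR self XOR right)
```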
An n-bit MACA with k attractor basins can be viewed as a natural classifier. It classifies a given set of patterns into k distinct classes, each class containing the set of states in one attractor basin. To enhance the classification accuracy of the machine, most works have employed MACA to classify patterns into two classes (say I and II). The following example illustrates an MACA based two-class pattern classifier.
![Figure 1: Example of MACA with basin 0000](image-4.png "Figure 1")
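
The idea of an MACA as a natural classifier can be illustrated by brute force for small n: run the CA from every state until it reaches a cycle, group states by the attractor they fall into, and label each basin with the majority class of the training patterns it contains. The sketch assumes null-boundary, three-neighborhood rules given as Wolfram numbers; the rule vector in the usage lines is hypothetical and is not claimed to be an MACA (for a true MACA every attractor would be a single fixed point). Exhaustive enumeration is only practical for small n, and all function names are illustrative.

```python
from collections import Counter
from itertools import product


def ca_step(state, rules):
    """Null-boundary, 3-neighborhood update; rules[i] is a Wolfram rule number."""
    padded = (0,) + state + (0,)
    return tuple((rules[i] >> (padded[i] << 2 | padded[i + 1] << 1 | padded[i + 2])) & 1
                 for i in range(len(state)))


def attractor_of(state, rules):
    """Iterate until a state repeats; name the attractor cycle by its smallest state."""
    first_seen, trajectory = {}, []
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = ca_step(state, rules)
    return min(trajectory[first_seen[state]:])


def basins(rules):
    """Group all 2**n states by the attractor they fall into."""
    groups = {}
    for state in product((0, 1), repeat=len(rules)):
        groups.setdefault(attractor_of(state, rules), []).append(state)
    return groups


def train(rules, patterns, labels):
    """Label each attractor basin with the majority class of its training patterns."""
    votes = {}
    for pattern, label in zip(patterns, labels):
        votes.setdefault(attractor_of(tuple(pattern), rules), Counter())[label] += 1
    return {a: c.most_common(1)[0][0] for a, c in votes.items()}


def predict(model, rules, pattern):
    """Class of the pattern's basin, or None if that basin held no training pattern."""
    return model.get(attractor_of(tuple(pattern), rules))


rules = [90, 150, 90, 150]  # hypothetical rule vector
for attractor, members in basins(rules).items():
    print(attractor, len(members))
```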
1. Randomly partition n into m integers such that n1 + n2 + … + nm = n.
2. For each ni, randomly generate a valid Dependency Vector (DV).
3. Synthesize the Dependency String (DS) for Classifier #1 by concatenating the m DVs.
4. Randomly synthesize an m-bit Dependency Vector (DV) for Classifier #2.
5. Synthesize a chromosome by concatenating Classifier #1 and Classifier #2.
AIS-PSMACA: Towards Proposing an Artificial Immune System for Strengthening PSMACA: An Automated Protein Structure Prediction using Multiple Attractor Cellular Automata

Transform the amino acid sequence Tp into a discrete-time sequence It, and convolve it with F, thereby producing the predicted structure (Ot = It * F) of the target CA protein. The result of this calculation, Ot, is a vector of numerical values. For values between 0 and 200 a helix C is predicted, and for values between 600 and 800 a strand is predicted by the CA; all other values are predicted as coil by the MACA. This produces the mapping for the required target structure Ts of the target CA protein T.
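
The final prediction step, convolving It with F and thresholding Ot, can be sketched with NumPy. Two details are assumptions rather than statements from the text: the full convolution is truncated to the first len(It) samples so that Ot aligns residue by residue, and both ranges are treated as inclusive. The thresholds are the ones given above; H, E and C are illustrative labels and the function name is hypothetical.

```python
import numpy as np


def predict_structure(it, f, helix=(0.0, 200.0), strand=(600.0, 800.0)):
    """Ot = It * F (truncated to len(It)), then map each value to H, E or C."""
    ot = np.convolve(it, f)[: len(it)]
    labels = []
    for value in ot:
        if helix[0] <= value <= helix[1]:
            labels.append("H")  # helix range from the text
        elif strand[0] <= value <= strand[1]:
            labels.append("E")  # strand range from the text
        else:
            labels.append("C")  # all other values are coil
    return ot, "".join(labels)


# A unit impulse as It makes Ot equal to F, which shows the thresholding clearly.
ot, ts = predict_structure(np.array([1.0, 0.0, 0.0, 0.0]), np.array([100.0, 700.0, 900.0, 50.0]))
print(ts)  # HECH
```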


| Target: 1PFC | Prediction Accuracy | Target: 1PP2 | Prediction Accuracy | Target: 1QL8 | Prediction Accuracy |
|---|---|---|---|---|---|
| Exp 1 | 65% | Exp 5 | 85% | Exp 9 | 85% |
| Exp 2 | 65% | Exp 6 | 90% | Exp 10 | 90% |
| Exp 3 | 69% | Exp 7 | 83% | Exp 11 | |
| Exp 4 | 71% | Exp 8 | 87% | Exp 12 | |

Comparison of prediction methods (prediction accuracy for 1PFC, 1PP2 and 1QL8): DSP, PHD, SAM-T99, SSPro, AIS-PSMACA and AIS-AIS-PSMACA. Reported values: 92%, 70%, 70%, 68%, 68%, 77%, 82%, 91%, 96%, 84%, 87%, 70%, 90%, 92%, 73%, 85%, 83%, 81%, 97%, 96%.
© 2013 Global Journals Inc. (US)
		
		
# References

* D. Mitra and M. Smith, "Digital Signal Processing in Protein Secondary Structure Prediction," Innovations in Applied Artificial Intelligence, Lecture Notes in Computer Science, vol. 3029, 2004.
* P. Kiran Sree and I. Ramesh Babu, "PSMACA: An Automated Protein Structure Prediction using MACA (Multiple Attractor Cellular Automata)," Journal of Bioinformatics and Intelligent Control (JBIC), vol. 2, no. 3, American Scientific Publications.
* J. Bienkowska and R. Lathrop, "Threading Algorithms."
* Abagyan, "Homology Modeling With Internal Coordinate Mechanics: Deformation Zone Mapping and Improvements of Models via Conformational Search," PROTEINS: Structure, Function and Genetics, vol. 1, 1997.
* N. Alexandrov and V. Solovyev, "Effect of secondary structure prediction on protein fold recognition and database search," Genome Informatics, vol. 7, 1996.
* C. B. Anfinsen, "Principles that govern the folding of protein chains," Science, vol. 181, 1973.
* Baldi, "Bidirectional Dynamics for Protein Secondary Structure Prediction," in Sequence Learning: Paradigms, Algorithms and Applications, Springer, 2000.
* Boeckmann, "The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003," Nucleic Acids Res., vol. 31, 2003.
* Bonneau, "Rosetta in CASP4: Progress in Ab Initio Protein Structure Prediction," PROTEINS: Structure, Function and Genetics, vol. 5, 2001.
* P. E. Bourne and H. Weissig, Structural Bioinformatics, John Wiley & Sons, 2003.
* C. Branden and J. Tooze, Introduction to Protein Structure, Garland Publishing, 1999.
* J. Chandonia and M. Karplus, "New Methods for Accurate Prediction of Protein Secondary Structure," PROTEINS: Structure, Function and Genetics, vol. 35, 1999.
* P. Chou and G. Fasman, "Prediction of the secondary structure of proteins from their amino acid sequence," Advances in Enzymology, vol. 47, 1978.
* R. Dunbrack, "Comparative Modeling of CASP3 Targets Using PSI-BLAST and SCWRL," PROTEINS: Structure, Function and Genetics, vol. 3, 1999.
* H. Hirakawa and S. Kuhara, "Prediction of Hydrophobic Cores of Proteins Using Wavelet Analysis," Genome Informatics, vol. 8, 1997.
* A. Irback and E. Sandelin, "On Hydrophobicity Correlations in Protein Chains," Biophysical Journal, vol. 79, 2000.
* Irback, "Evidence for nonrandom hydrophobicity structures in protein chains," vol. 93, 1996.
* "Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features," Biopolymers, September 1983.
* Karplus, "Hidden Markov Models for Detecting Remote Protein Homologies," Bioinformatics, vol. 14, no. 10, 1998.
* J. Skolnick and A. Kolinski, "Computational Studies of Protein Folding," Computing in Science and Engineering, 2001.
* Thiele, "Protein Threading by Recursive Dynamic Programming," Journal of Molecular Biology, vol. 290, 1999.
* Veljkovic, "Is It Possible To Analyze DNA and Protein Sequences by the Methods of Digital Signal Processing?," IEEE Transactions on Biomedical Engineering, vol. 32, no. 5, 1985.
* P. Sree and I. Ramesh Babu, "Identification of Protein Coding Regions in Genomic DNA Using Unsupervised FMACA Based Pattern Classifier," International Journal of Computer Science & Network Security, ISSN 1738-7906, no. 1, 2008.
* E. E. Snyder and G. D. Stormo, "Identification of Protein Coding Regions in Genomic DNA," ICCS Transactions, 2002.
* E. E. Snyder and G. D. Stormo, "Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks," Nucleic Acids Res., vol. 21, no. 3, February 11, 1993.
* P. Flocchini, F. Geurts, A. Mingarelli and Santoro, "Convergence and Aperiodicity in Fuzzy Cellular Automata: Revisiting Rule 90," 2000.
* P. Maji and P. P. Chaudhuri, "FMACA: A Fuzzy Cellular Automata Based Pattern Classifier," Proceedings of the 9th International Conference on Database Systems, Korea, 2004.
* P. Sree, G. V. S. Raju, I. Ramesh Babu and S. Viswanadha Raju, "Improving Quality of Clustering using Cellular Automata for Information Retrieval," International Journal of Computer Science, ISSN 1549-3636, vol. 4, no. 2, Science Publications-USA, 2008.
* P. Sree and I. Ramesh Babu, "Face Detection from still and Video Images using Unsupervised Cellular Automata with K means clustering algorithm," Vision and Image Processing, vol. 8, Issue II, 2008.
* R. Lippmann, "An introduction to computing with neural nets," IEEE ASSP Mag., vol. 4, 22, 2004.
* P. Maji and P. P. Chaudhuri, "FMACA: A Fuzzy Cellular Automata Based Pattern Classifier," Proceedings of the 9th International Conference on Database Systems, 2004.
* P. Maji and P. P. Chaudhuri, "Fuzzy Cellular Automata for Modeling Pattern Classifier," IEICE, 2004.