# Introduction

DNA contains a large amount of information. For a DNA sequence to be transcribed into RNA, which copies the required information, a promoter is needed, so the promoter plays a vital role in DNA transcription. It is defined as "the sequence in the region of the upstream of the transcriptional start site (TSS)". Identifying a new promoter in a DNA sequence can lead to the discovery of a new protein. Once the promoter region is identified, we can extract information about gene expression patterns, cell specificity and development. Promoters regulate gene expression. Genetic diseases associated with variations in promoters include asthma, beta thalassemia and Rubinstein-Taybi syndrome. Promoter sequences can be used to control how quickly the information in DNA is expressed as protein. They are also used in genetically modified foods.

# II. Literature Review

Steven Salzberg [7] used a decision tree algorithm for locating protein coding regions. The algorithm is adaptable and can handle DNA sequences of length 54, 108 and 162. P. Maji et al. [8] developed a neural network tree classifier for predicting splice junctions and coding regions in genomic DNA. The decision tree, named NNTree (Neural Network Tree), is constructed by recursively partitioning the training set together with its corresponding labels. Ying Xu et al. [9] developed an improved system, GRAIL II, a hybrid AI system that can predict the number of exons in a human DNA sequence and also supports gene modeling. It combines edge signal detection (acceptor, donor and translation start site) with coding feature analysis. Eric E. Snyder et al. [10] applied dynamic programming and neural networks to predict protein coding regions in genomic DNA. Their program, GeneParser, first scores the DNA sequence using exon-intron specific measures such as local compositional complexity, codon usage, length distribution, 6-tuple frequency and periodic asymmetry.
Edward C. Uberbacher et al. [11] proposed a method that combines a set of sensor algorithms with a neural network to predict protein coding regions in eukaryotes. Their programs calculate the values of the seven sensors considered by the authors: frame bias matrix, Fickett (three-periodicity), dinucleotide fractal dimension, coding six-tuple word preferences, coding six-tuple in-frame preferences, word commonality and repetitive six-tuple word preferences. Armando J. Pinho et al. [12] proposed a three-state model for protein coding region prediction based on the three-base periodicity property. M. Q. Zhang [13] used a quadratic discriminant analysis method, named MZEF, for identifying protein coding regions in human genomic DNA. David J. States et al. [14] proposed a computer program named BLASTC, which uses sequence similarity and codon utilization for predicting protein coding regions.

In vertebrates, only about five percent of a gene is made up of exons. Most genes have seven to eight exons with an average length of 145 bp, while introns have an average length of 3365 bp. Promoters make up a small percentage of the entire genome, and their features differ from those of other functional regions such as exons, introns and 3'UTRs. These facts make protein coding region prediction and promoter region prediction very difficult tasks.

Method [8] takes more time to construct a tree for sequences of length 162, and the height of the tree is a major concern when the algorithm is applied to longer DNA sequences. Method [9] suffers from lower accuracy because of a higher error rate at the classifier nodes. Methods [10], [11] and [12] depend heavily on statistical information. Following this survey, the goals for a new classifier are good classification accuracy and the ability to handle DNA sequences longer than 162 with fewer nodes.

Jia Zeng et al. [15] proposed a hierarchical promoter prediction system named SCS, which uses signal, structure and context features. Xiaomeng Li et al. [16] proposed PCA-HPR (Principal Component Analysis-Human Promoter Recognition) to predict promoters and transcription start sites (TSS). Sridhar Hannenhalli et al. [17] tried to improve the accuracy of promoter prediction by combining the CpG island feature with biologically motivated independent signals, which together cover most of the knowledge used to predict promoters in the human genome. Shuanhu Wu et al. [18] proposed a method for improving human promoter region identification by selecting the most important features of the DNA sequence for each functional region. Uwe Ohler et al. [19] proposed a model that integrates physical properties of DNA into a probabilistic eukaryotic promoter prediction system. J. Ramon Goñi et al. [20] proposed ProStar, a system that uses structural parameters for promoter region identification; the authors used only descriptors derived from physical first principles. Vladimir B. Bajic et al. [21] developed new software for identifying promoters in vertebrate DNA sequences. The program takes a DNA sequence as input and generates a list of predicted transcription start sites (TSS). Michael Q. Zhang [22] proposed CorePromoter, a program for predicting core promoters in human genes.

Following the literature survey on promoter prediction, the main goal of the proposed classifier is to reduce false prediction rates and improve specificity and sensitivity.

The human promoter data set was collected from the DBTSS database and consists of 30,966 sequences of length 251. We used 7,741 of them for constructing an IN-AIS-MACA tree and 7,741 for checking the accuracy of the tree. The remaining 15,483 promoter sequences are used for testing the proposed classifier.

# III.
# Data Sets and Methods

Human non-promoter data sets are collected from the EID and UTRdb databases. We extracted 75,438 exons from the EID database, of which 18,859 are used for constructing an IN-AIS-MACA tree and 18,860 for checking the accuracy of the constructed tree. The remaining 37,719 are used for testing the classifier. We extracted 53,684 introns from the EID database, of which 13,421 are used for constructing the tree, 13,421 for checking the accuracy of the constructed tree and the remaining 26,842 for testing the classifier. We extracted 80,538 3'UTRs from UTRdb. In that 21

# Design of IN-AIS-MACA

No information regarding the reading frame is used in our study; both regions are predicted where nothing is known in advance. Each window should belong to a single class (promoter/non-promoter, coding/non-

# V. Learning of IN-AIS-MACA

Step 6: Store the basins (Be, Bi, Bu, Bp, Bpr).
Step 7: Repeat steps 1 to 6 until the input is exhausted or the count of individual attractor basins reaches 6.
Step 8: Stop.

Here Be represents the exon basins, Bi the intron basins, Bu the 3'UTR basins, Bp the promoter basins and Bpr the protein coding region basins.

# VI. Testing of IN-AIS-MACA

The accuracy of protein coding region prediction with IN-AIS-MACA depends on the accuracy of exon prediction. Because the promoter prediction module reports 96.5% accuracy, the accuracy of protein coding region prediction improves. The main aim of this algorithm is to process the DNA sequence based on its features and assign it to one of the basins.

# Algorithm

Input: DNA sequence
Output: Class of the sequence
Step 1: Read the DNA sequence in multiples of three.
Step 2: Encode the sequence in multiples of three.
Step 3: Extract the features.
Step 4: Check whether the input belongs to the EXON class; if not, go to step 6. If it is an exon, report the corresponding class and boundary.
Step 5: (a) Read the encoded DNA sequence starting from the upper bound to the end of the string.
Step 6: Check whether the sequence belongs to the intron, 3'UTR or promoter class.
6a) Choose the best fitness rule to direct the sequence to the attractor basins Bi, Bu, Bp.
6c) Report the boundaries and the respective class.
Step 7: Stop.

Global Journal of Computer Science and Technology, Volume XIV, Issue II, Version I

# VII. Output & Experimental Results of IN-AIS-MACA

Output 1 shown below is for a DNA sequence of length 252 bp. The promoter prediction output indicates an initial exon at positions 30 to 64, so the protein coding interface starts its processing from 64 to 251. The subsequent internal and terminal exons are reported on both strands.

# VIII. Comparison of the Performance of IN-AIS-MACA

IN-AIS-MACA uses the strength of the existing AIS-PRMACA design to predict both promoter regions (PR) and protein coding regions (PCR). The accuracy, sensitivity (Se), specificity (Sp) and execution time of PR prediction with IN-AIS-MACA are the same as those of AIS-PRMACA reported in chapter 6, so we report here the accuracy, Se and Sp of predicting PCR with IN-AIS-MACA. An important challenge for IN-AIS-MACA is to reduce the total prediction time (TPT) of both PCR and PR, which is discussed in this section. The performance of IN-AIS-MACA is measured with Se, Sp and accuracy, as shown in Table 1. We extended the decision tree (DT) and NNTree to accommodate DNA sequences of length 252 and compared our results with them. IN-AIS-MACA reports high sensitivity, specificity and accuracy of 0.934, 0.925 and 0.93, respectively. This improvement over AIS-MACA prediction for 252 bp DNA sequences is due to the classifier accuracy of AIS-PRMACA.

# Table 1 : IN-AIS-MACA Performance in PCR prediction

The higher the accuracy of AIS-PRMACA in predicting the first exon, the higher the accuracy of predicting the PCR with IN-AIS-MACA.
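The three measures used in this comparison can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions of sensitivity, specificity and accuracy, not the authors' evaluation code. The class name `ConfusionCounts` and the counts below are hypothetical: they are chosen, assuming 1,000 coding and 1,000 non-coding test windows, only so that the results match the Se and Sp values reported for IN-AIS-MACA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion counts for coding vs. non-coding windows."""
    tp: int  # coding windows predicted as coding
    fn: int  # coding windows predicted as non-coding
    tn: int  # non-coding windows predicted as non-coding
    fp: int  # non-coding windows predicted as coding

    def sensitivity(self) -> float:
        # Se = TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Sp = TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Accuracy = (TP + TN) / all windows
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Hypothetical counts (assumption, not taken from the paper).
counts = ConfusionCounts(tp=934, fn=66, tn=925, fp=75)
se = counts.sensitivity()
sp = counts.specificity()
acc = counts.accuracy()
print(f"Se={se:.3f} Sp={sp:.3f} Se+Sp={se + sp:.3f} Accuracy={acc:.4f}")
```

Under this balanced-class assumption the accuracy equals the mean of Se and Sp (0.9295 here), which is consistent with the reported accuracy of 0.93.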
The accuracy of AIS-PRMACA in predicting exons is 94.5%, so PCR prediction with IN-AIS-MACA improves considerably, particularly for 252 bp DNA sequences. IN-AIS-MACA maintains a good balance between Se and Sp, with Se+Sp = 1.859. The decision tree performs poorly on sequences of length 252 bp because the tree built for predicting the PCR is taller; it reports an accuracy of 86.5%. NNTree performs better than the DT and reports 87.3% accuracy. The performance of both classifiers suffers when processing DNA sequences longer than 162.

Table 1: IN-AIS-MACA Performance in PCR prediction

| Method | Se | Sp | Se+Sp | Accuracy |
|---|---|---|---|---|
| IN-AIS-MACA | 0.934 | 0.925 | 1.859 | 0.93 |
| Decision Tree | 0.851 | 0.879 | 1.73 | 0.865 |
| Neural Network Tree | 0.876 | 0.87 | 1.746 | 0.873 |

Figure: IN-AIS-MACA Se, Sp, Accuracy Comparison

# IX. Execution Time Comparisons with IN-AIS-MACA

The aim of IN-AIS-MACA is to predict both PCR and PR in human DNA sequences of length 252 bp. Since this is the first algorithm to handle predictions of both regions, we combined the better existing algorithms in pairs and report the execution times of the individual predictions and the total prediction time. In the first three combinations we used AIS-MACA for PCR prediction together with AIS-PRMACA, SCS and McPromoter, respectively, for PR prediction; Table 2 lists their total prediction times as 1827 ms, 1917 ms and 1821 ms. In the fourth combination we used the decision tree and AIS-PRMACA, which gives a total prediction time of 1930 ms. In the fifth combination we used NNTree and AIS-PRMACA, which gives a total prediction time of 1897 ms. In the sixth combination we used dicodon usage and AIS-PRMACA, which gives a total prediction time of 1987 ms. The proposed classifier IN-AIS-MACA reports a total prediction time of 1031 ms, which is the best among all the classifiers reported in Table 2 and Figure 3. Identifying both PCR and PR with a minimum execution time leads to faster gene prediction.

Table 2: IN-AIS-MACA total prediction time comparison

| Method | Execution time to predict PCR (ms) | Execution time to predict PR (ms) | Total Prediction Time (TPT) (ms) |
|---|---|---|---|
| IN-AIS-MACA | 1031 | 1031 | 1031 |
| AIS-MACA & AIS-PRMACA | 796 | 1031 | 1827 |
| AIS-MACA & SCS | 796 | 1121 | 1917 |
| AIS-MACA & McPromoter | 796 | 1025 | 1821 |
| DT & AIS-PRMACA | 899 | 1031 | 1930 |
| NNTree & AIS-PRMACA | 866 | 1031 | 1897 |
| Dicodon Usage & AIS-PRMACA | 956 | 1031 | 1987 |

Figure 3: Total prediction time (TPT) comparison

# X. Parameters Manipulation for Higher Accuracies of IN-AIS-MACA

To achieve higher accuracies with IN-AIS-MACA in predicting protein coding regions and promoter regions, three important parameters have to be analyzed. The first parameter is the number of generations: higher accuracies should be reached with fewer generations. Figure 4 shows that the minimum number of generations required to achieve a higher accuracy for PCR prediction is 75.

# Conclusion

We have developed an integrated classifier that predicts both protein coding and promoter regions in human DNA of length 252 bp. IN-AIS-MACA reports a sensitivity (Se) of 0.934, a specificity (Sp) of 0.925 and an accuracy of 93%, the best values among the compared classifiers for predicting both PCR and PR. The important contribution of this classifier lies in predicting both regions with an execution time of 1031 ms, which speeds up gene prediction.

![IN-AIS-MACA design](image-2.png "")

![Figure 1 :](image-3.png "Figure 1 :")

Figure 1: IN-AIS-MACA design. The partial design of IN-AIS-MACA is shown in Fig. 1. IN-AIS-MACA takes a DNA sequence as input and extracts the features. It first checks whether the given sequence belongs to an exon. If it does, the exact boundaries are displayed with the non-promoter class. These boundaries are used to trace the protein coding region starting from that boundary. Since the first exon boundary, say (P, Q), is already predicted, the algorithm reads the encoded DNA sequence starting from Q to the end of the string, say R. The IN-AIS-MACA tree is built only for a length R-Q for PCR prediction. If the input does not belong to an exon, it is checked whether it is an intron, a 3'UTR or a promoter, and the corresponding class and boundary are displayed.

Output 1: DNA Sequence

GAATTCTTGTTGAGAAGGAATTGGGCTCAATGAAGTTCGGGGATATTCCAAGTGAATTATTCCAGTGAGTGTTATTCAG
CAATGGACGTGACTGTCGTTTGCCAGATCAGCAGAAGCCGAAAGGAATCCTTTCGGCTTCTGCTGATCTGGCAAAC
GACAGTCACGTCCATTGCTGAATAACACTCACTGGAATAATTCACTTGGAATATCCCCGAACTTCATTGAGCCCAATT
CCTTCTCAACAAGAATTC

Human Promoter Prediction

Sequence Kiran_63jntuh, Length = 252 bp

| Sequence | Start | End | Score | Sequence |
|---|---|---|---|---|
| Kiran_63jntuh | 30 | 64 | 0.61 | ATGAAGTTCGGGGATATTCCAAGTGAATTATTCC |

Non Promoter Sequence/Exon

| Sequence Name | Program | Type of Exon | Boundary | Strand |
|---|---|---|---|---|
| Kiran_63jntuh | IN-AIS-MACA | First | 82 189 | + |
| Kiran_63jntuh | IN-AIS-MACA | First | 82 207 | + |
| Kiran_63jntuh | IN-AIS-MACA | First | 82 222 | + |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 53 207 | + |
| Kiran_63jntuh | IN-AIS-MACA | First | 198 214 | + |
| Kiran_63jntuh | IN-AIS-MACA | First | 198 222 | + |
| Kiran_63jntuh | IN-AIS-MACA | First | 198 226 | + |
| Kiran_63jntuh | IN-AIS-MACA | First | 198 232 | + |
| Kiran_63jntuh | IN-AIS-MACA | First | 198 207 | + |
| Kiran_63jntuh | IN-AIS-MACA | First | 198 222 | + |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 66 87 | + |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 80 199 | + |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 80 207 | + |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 80 222 | + |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 106 132 | + |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 106 207 | + |
| Kiran_63jntuh | IN-AIS-MACA | Terminal | 106 136 | + |
| Kiran_63jntuh | IN-AIS-MACA | Terminal | 106 197 | + |
| Kiran_63jntuh | IN-AIS-MACA | Terminal | 111 136 | + |
| Kiran_63jntuh | IN-AIS-MACA | Terminal | 111 197 | + |
| Kiran_63jntuh | IN-AIS-MACA | Terminal | 167 193 | + |
| Kiran_63jntuh | IN-AIS-MACA | Terminal | 167 197 | + |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 151 249 | - |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 151 249 | - |
| Kiran_63jntuh | IN-AIS-MACA | Internal | 130 249 | - |
| Kiran_63jntuh | IN-AIS-MACA | Terminal | 194 249 | - |
| Kiran_63jntuh | IN-AIS-MACA | Terminal | 76 249 | - |
| Kiran_63jntuh | IN-AIS-MACA | Terminal | 72 249 | - |

© 2014 Global Journals Inc. (US)

# References

* Teresa K. Attwood. The Babel of bioinformatics. Science 290(5491), 2000.
* James W. Fickett, Chang-Shung Tung. Assessment of protein coding measures. Nucleic Acids Research 20(24), 1992.
* Riu Yamashita, Yutaka Suzuki, Hiroyuki Wakaguri, Katsuki Tsuritani, Kenta Nakai, Sumio Sugano. DBTSS: database of human transcription start sites, progress report 2006. Nucleic Acids Research 34(suppl 1), 2006.
* Serge Saxonov, Iraj Daizadeh, Alexei Fedorov, Walter Gilbert. EID: the Exon-Intron Database, an exhaustive database of protein-coding intron-containing genes. Nucleic Acids Research 28(1), 2000.
* Graziano Pesole, Sabino Liuni, Giorgio Grillo, Flavio Licciulli, Flavio Mignone, Carmela Gissi, Cecilia Saccone. UTRdb and UTRsite: specialized databases of sequences and functional elements of 5' and 3' untranslated regions of eukaryotic mRNAs. Nucleic Acids Research 30(1), 2002.
* Pokkuluri Kiran Sree, Inampudi Ramesh Babu. AIX-MACA-Y Multiple Attractor Cellular Automata Based Clonal Classifier for Promoter and Protein Coding Region Prediction. Journal of Bioinformatics and Intelligent Control 3(1), 2014.
* Steven Salzberg. Locating protein coding regions in human DNA using a decision tree algorithm. Journal of Computational Biology 2(3), 1995.
* Pradipta Maji, Sushmita Paul. Neural Network Tree for Identification of Splice Junction and Protein Coding Region in DNA. In: Scalable Pattern Recognition Algorithms. Springer International Publishing, 2014.
* Ying Xu, R. Mural, M. Shah, E. Uberbacher. Recognizing exons in genomic sequence using GRAIL II. Genetic Engineering 16, 253, 1993.
* Eric E. Snyder, Gary D. Stormo. Identification of protein coding regions in genomic DNA. Journal of Molecular Biology 248(1), 1995.
* Edward C. Uberbacher, Richard J. Mural. Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proceedings of the National Academy of Sciences 88(24), 1991.
* Armando J. Pinho, António J. R. Neves, Vera Afreixo, Carlos A. C. Bastos, Paulo Jorge S. G. Ferreira. A three-state model for DNA protein-coding regions. IEEE Transactions on Biomedical Engineering 53(11), 2006.
* M. Zhang. Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proceedings of the National Academy of Sciences 94(2), 373, 1997.
* Sridhar Hannenhalli, Samuel Levy. Promoter prediction in the human genome. Bioinformatics 17(1), 2001.
* Shuanhu Wu, Xudong Xie, Alan Wee-Chung Liew, Hong Yan. Eukaryotic promoter prediction based on relative entropy and positional information. Physical Review E 75(4), 041908, 2007.
* Uwe Ohler, Heinrich Niemann, Guo-Chun Liao, Gerald M. Rubin. Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics 17(1), 2001.
* J. Ramon Goñi, Alberto Pérez, David Torrents, Modesto Orozco. Determining promoter location based on DNA structure first-principles calculations. Genome Biol 8(12), R263, 2007.
* Vladimir B. Bajic, Seng Hong Seah, Allen Chong, Guanglan Zhang, Judice L. Y. Koh, Vladimir Brusic. Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters. Bioinformatics 18(1), 2002.
* Michael Q. Zhang. Identification of human gene core promoters in silico. Genome Research 8(3), 1998.
* Jia Zeng, Xiao-Yu Zhao, Xiao-Qin Cao, Hong Yan. SCS: Signal, context, and structure features for genome-wide human promoter recognition. IEEE/ACM Transactions on Computational Biology and Bioinformatics 7(3), 2010.
* Xiaomeng Li, Jia Zeng, Hong Yan. PCA-HPR: A principle component analysis model for human ... Bioinformation, 2008.
* Warren Gish, David J. States. Identification of protein coding regions by database similarity search. Nature Genetics 3(3), 1993.