# I. Introduction

Cardiovascular diseases are the leading cause of human deaths worldwide. Statistics indicate that they cause around 17.3 million deaths annually [1]. Inadequate blood supply to the heart causes necrosis of myocardial tissue, which is clinically referred to as Myocardial Infarction (MI). MI claimed 7.6 million of the 58 million deaths worldwide in 2005 [2]. Advances in clinical practice to diagnose and prevent MI have not proved significant, since the number of deaths due to MI remains high compared to the deaths caused by any other disease [1] [2]. The current diagnosis of MI is based on clinical symptoms, including chest pain and difficulty in breathing, on ECG pattern variants, and on the rise and fall of cardiac troponins (also referred to as cTns) in the blood [3]. Despite notable advances, current clinical diagnosis strategies still show substantial constraints. Advances in hs-cTnI assays [4] have yielded high detection of cardiovascular disease cases (increased true positive rate), but a significant number of normal cases have been labeled as cardiovascular disease prone (decreased true negative rate), which is a potential constraint. Another advanced diagnostic measure for cardiovascular disease uses cardiac miRNAs as biomarkers [5]. The prediction outcomes of this model are trivial due to limited size and tissue-specific expression. Hence there is a need for more significant and automated detection strategies that use cardiac miRNAs as primary input [6]. Serum inflammatory markers such as BNP and CRP are also considered cardiovascular biomarkers, but they have shown only a slight improvement in detection accuracy [7][8] [9]. Clinical pathology and biology, which are crucial to defining cardiac biomarkers, are expensive and less accurate.
In contrast, gene expression profiling quantifies the expression of a large number of genes in order to identify biomarkers, analogously and concurrently across multiple pathways. Hence gene expression profiling is a promising and feasible way to quantify biomarkers for diagnosing cardiovascular diseases [10]. The biomarkers defined by gene expression profiling are promising, and they are not revealed by pathology- and biology-based clinical processes. The rest of the manuscript describes related work in section 2 and the Metaheuristic Approach on Gene Expression Data (MAGED) in section 3, followed by section 4, which elaborates the experimental study of the proposal. Finally, section 5 concludes the contribution of the manuscript.

# II. Related Work

Gene expression analysis is a promising approach to discovering profound biomarkers of cardiovascular diseases. The contemporary literature contains significant contributions to defining biomarkers through gene expression analysis. Randi et al. [11] devised a gene expression analysis that identified 482 genes associated with the composition of plaques found in arteries. Many of these genes had not been considered for atherosclerosis in earlier diagnosis strategies. Archacki et al. [12] proposed a gene expression profiling strategy that yielded 56 genes differing between atherosclerosis-prone and salubrious human coronary arteries. Of these 56 genes, 49 had not previously been associated with coronary artery disease. The model devised in [13] discovered a set of genes that enables classification according to age and sex and that is strongly associated with obstructive CAD in patients who are not diagnosed as diabetic. The contributions in [14] and [15] profiled variant gene expressions to differentiate cardiomyopathies under ischemic and nonischemic conditions.
Min KD et al. [16] contributed profiling and analysis of gene expressions to identify the divergent genes associated with congestive heart failure. Suresh R et al. [17] studied salubrious and MI patients and discovered biomarkers and imbalanced pathways that significantly indicate the recurrence of MI in patients who have already had one MI. Liew et al. [18] defined sequence tags from gene expressions using microarray analysis that compares mRNA molecules found in the cellular components of blood with mRNA molecules found in 9 divergent human tissues, including the heart. The correlation observed in this comparison showed that 84% of the mRNA molecules overlapped with mRNA molecules of the heart and 80% overlapped with mRNA molecules of other tissues. mRNA molecules from the cellular components of blood are inexpensive and easy to access, and so can substitute for gene expression in other tissues.

The contributions found in the contemporary literature are specific to discovering the influential genes of Myocardial Infarction. None of them can determine whether a given gene expression is prone to CAD under MI and UA or is salubrious. This indicates the need for novel contributions that discover whether a given gene expression is prone to CAD under MI and UA or is salubrious. Such a contribution helps to deploy case-based reasoning to treat patients prone to CAD under MI and UA differently. In this regard, this manuscript defines a metaheuristic approach on gene expression data (MAGED) to discover whether a given gene expression is prone to CAD under MI and UA or is salubrious. MAGED is a machine learning strategy that learns from the labeled gene expression data of cardiovascular disease, unstable angina, myocardial infarction and salubrious cases.
To this end, the given gene expressions are partitioned into their respective categories: coronary artery disease (CAD), unstable angina (UA), Myocardial Infarction (MI) and salubrious (blood samples diagnosed as healthy). The data also includes gene expression data collected from blood samples of people clinically proven to be normal. The genes involved in each gene expression are considered features of the respective category. A gene expression contains a dense number of genes, and the majority of them may be insignificant to the respective disease category. Hence a feature optimization process (see sec 3.1) is carried out to eliminate these insignificant features. The gene range is then discretized in order to compare two genes through equality by approximation (see sec 3.2). Afterwards, the confidence of each feature towards all categories of gene expression data is assessed (see sec 3.3), followed by the assessment of each gene expression's confidence against the features of all categories (see sec 3.4). Finally, the confidence obtained for each feature and gene expression of the respective category is used as input to define the metaheuristic scales that estimate the scope of coronary artery disease, unstable angina and myocardial infarction.

# a) Feature Optimization

For each disease context considered, the gene expression dataset $D_i$ is used for training. The attribute sets

$$F_i = \{f_1(i), f_2(i), \ldots, f_{|F_i|}(i)\} \quad \text{and} \quad F_n = \{f_1(n), f_2(n), \ldots, f_{|F_n|}(n)\}$$

are the feature sets of $D_i$ and of the normal dataset $D_n$, respectively. Let

$$G_j(i) = \{g_1(ij), g_2(ij), \ldots, g_{|G_j(i)|}(ij)\}$$

be the set of genes observed as values for feature $f_j(i)$ of the gene expressions represented by $D_i$. Similarly, the attribute set

$$G_j(n) = \{g_1(nj), g_2(nj), \ldots, g_{|G_j(n)|}(nj)\}$$

is the corresponding set of genes for $D_n$.

# III.
Metaheuristic Approach on Gene Expression Data

The objective of MAGED is to define a metaheuristic scale from the knowledge gained from the given gene expression data. In the notation of the feature optimization step, $G_j(n)$ holds the genes observed as values for feature $f_j(n)$ of the gene expressions represented by $D_n$. Since a gene expression combines a large number of genes, the size of the feature set can lead to process complexity. To overcome this process complexity, the insignificant features should be identified and discarded. The feature $f_j(i)$ of $F_i$ is said to be insignificant if the genes $G_j(i)$ of $f_j(i)$ are almost the same as the genes $G_j(n)$ of feature $f_j(n)$ of $F_n$. Hence, to identify the insignificant features, we adopt the Hamming distance, applied to the genes of each feature taken as vectors from each disease dataset and from the normal cases. A Hamming distance of 0, or less than the given threshold, indicates that the respective feature is insignificant. The Hamming distance computation is described below.

Global Journal of Computer Science and Technology, Volume XVI, Issue IV, Version I, Year 2016

# i. Hamming Distance

The Hamming distance obtained here denotes the difference between the genes assigned to the same feature in the gene expression data of diseased and normal cases. It is one of the significant strategies in coding theory for assessing the difference between two elements. The Hamming distance between the given vectors $CX$ and $CY$ is computed as follows. Let $CZ \leftarrow \emptyset$ be a vector of size 0. For each $i \in \{1, 2, 3, \ldots, \max(n, m)\}$, with $n$ and $m$ the sizes of $CX$ and $CY$, the element $CZ_i$ is set to

$$CZ_i = \begin{cases} 0, & \text{if } cx_i \in CX,\ cy_i \in CY \text{ and } cx_i - cy_i = 0 \\ 1, & \text{otherwise} \end{cases}$$

and the Hamming distance is the sum of the elements of $CZ$:

$$hd_{CX \leftrightarrow CY} = \sum_{i=1}^{|CZ|} CZ_i$$

# b) Gene and Gene Expression Confidence Assessment

The genes found for each optimal feature of the respective gene expression dataset, together with the gene expressions of that dataset, are then used to assess the gene and gene expression confidence.
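Returning briefly to the Hamming distance test of the feature optimization step, the filter can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `hamming_distance` and `significant_features`, the dictionary layout (feature name mapped to a vector of gene values) and the default threshold of 1 are all assumptions made for the example.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for positions that exist in only one vector


def hamming_distance(cx, cy):
    """Count the positions at which two gene vectors differ.

    Positions present in only one vector (unequal lengths) count as
    differences, mirroring the loop up to max(n, m) in the text.
    """
    return sum(
        0 if a == b else 1
        for a, b in zip_longest(cx, cy, fillvalue=_MISSING)
    )


def significant_features(disease, normal, threshold=1):
    """Return the features whose disease and normal vectors differ enough.

    `disease` and `normal` map a feature name to its vector of gene values
    (hypothetical layout). A feature is discarded when its distance is 0
    or less than `threshold`.
    """
    kept = []
    for feature, genes in disease.items():
        hd = hamming_distance(genes, normal.get(feature, []))
        if hd != 0 and hd >= threshold:
            kept.append(feature)
    return kept
```

With a threshold of 2, a feature whose disease and normal vectors differ in two positions is kept, while a feature with identical vectors is discarded.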
To this end, gene pairs are first defined such that each pair contains two genes, each representing a different feature of the same dataset. We then assess the associativity support of each gene pair. The associativity support is the ratio of the number of gene expressions that contain the pair to the total number of gene expressions in the respective dataset. The process of assessing the associativity support of each gene pair is described in the following section (see sec 3.2.1).

# ii. Assessing gene pair correlation

Let $P_i$ be the set that contains all possible unique gene pairs from the respective dataset $D_i$. The support of a pair $p = \{g_k, g_l\} \in P_i$ is

$$s(p) = \frac{\left|\{\, e_v(i) \in D_i : \{g_k, g_l\} \subseteq e_v(i) \,\}\right|}{|D_i|}$$

# Assessing Gene and Gene Expression Confidence

In order to assess the confidence of the genes and gene expressions of the respective gene expression dataset $D_i$, a mutual relation graph is formed between the gene expressions and the genes of $D_i$. There is an edge between a gene and a gene expression if and only if the gene exists in that gene expression. Each edge between a gene and a gene expression is weighted as follows. For each gene $g_j \in G(i)$ and each gene expression $e_k(i) \in D_i$ that contains $g_j$, the weight $w(g_j)$ starts at 0 and accumulates the support of every pair that $g_j$ forms with another gene of $e_k(i)$:

$$w(g_j) = \sum_{\substack{m=1 \\ g_m \in e_k(i),\ g_m \neq g_j}}^{|e_k(i)|} s(p_m), \qquad p_m = \{g_j, g_m\}$$

The edge weight is this sum averaged over the other genes of the expression:

$$w_{g_j \leftrightarrow e_k(i)} = \frac{w(g_j)}{|e_k(i)| - 1}$$

The weights obtained for the edges between genes and gene expressions in the mutual graph are further used to assess the gene and gene expression confidence towards the respective CAD (coronary artery disease), UA (unstable angina), MI (myocardial infarction) and Normal datasets. We then measure the confidence of each feature towards gene expression dataset $D_i$ as follows. For each gene $g_j \in G(i)$:

$$c_{g_j \to D_i} = \sum_{\substack{k=1 \\ g_j \in e_k(i)}}^{|D_i|} w_{g_j \leftrightarrow e_k(i)}$$
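The pair support, edge weight and gene confidence steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: each gene expression is assumed to be a Python set of gene identifiers, the function names are hypothetical, and an expression holding a single gene is skipped because it forms no pairs (the text does not say how that case is handled).

```python
from collections import defaultdict
from itertools import combinations


def pair_support(dataset):
    """Map each unordered gene pair to the fraction of expressions containing it."""
    counts = defaultdict(int)
    for expr in dataset:
        for pair in combinations(sorted(expr), 2):
            counts[frozenset(pair)] += 1
    return {pair: n / len(dataset) for pair, n in counts.items()}


def edge_weights(dataset, support):
    """Weight of the edge (gene, expression index) in the mutual relation graph.

    The weight is the summed support of the pairs the gene forms with the
    other genes of the expression, divided by the number of other genes.
    """
    weights = {}
    for k, expr in enumerate(dataset):
        for g in expr:
            others = [m for m in expr if m != g]
            if not others:
                continue  # assumption: a single-gene expression adds no edge
            total = sum(support[frozenset((g, m))] for m in others)
            weights[(g, k)] = total / len(others)
    return weights


def gene_confidence(weights):
    """Aggregate each gene's edge weights over all expressions of the dataset."""
    confidence = defaultdict(float)
    for (g, _k), w in weights.items():
        confidence[g] += w
    return dict(confidence)
```

For a toy dataset such as `[{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]`, the pair `{"a", "b"}` occurs in two of four expressions, so its support is 0.5, and gene `"a"` accumulates an edge weight of 0.5 from each of the three expressions that contain it.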
In the support $s(p)$ of a gene pair, the numerator counts the gene expressions that contain both genes and the denominator is the total number of gene expressions. The correlation of each pair of genes found in the gene expressions of each respective gene expression dataset of coronary artery disease, unstable angina, myocardial infarction and normal cases should be estimated using the process described in sec 3.2.1. The gene confidence aggregates the weight of gene $g_j$ towards each gene expression $e_k(i)$ of the respective dataset $D_i$, and this aggregate is taken as the confidence of the gene towards $D_i$.

Similarly, the confidence of each gene expression towards gene expression dataset $D_i$ is measured as follows. For each $e_j(i) \in D_i$:

$$c_{e_j(i) \to D_i} = \sum_{\substack{k=1 \\ g_k \in e_j(i)}}^{|G(i)|} w_{g_k \leftrightarrow e_j(i)} \times c_{g_k \to D_i}$$

That is, the confidence of a gene expression is the sum, over the genes that exist in it, of the product of each gene weight and the respective gene confidence. The confidence of genes and gene expressions of each respective gene expression dataset of CAD, UA, MI and salubrious cases should be estimated using the process described in sec 3.2.2. These confidence values of a gene expression $e$ with respect to CAD, UA, MI and N are then used to estimate whether the given expression is salubrious or prone to coronary artery disease, unstable angina or myocardial infarction, according to the following conditions.

$$c_{e \to CAD} = \frac{\displaystyle\sum_{\substack{i=1 \\ g_i \in e}}^{|G(D_{CAD})|} c_{g_i \to CAD} \times w(g_i)}{\displaystyle\sum_{j=1}^{|G(D_{CAD})|} c_{g_j \to CAD} \times w(g_j)}$$

if $(\ldots < \ldots)\ (\ldots < \ldots)\ (\ldots < \ldots)\ (\ldots > \ldots)$ then the salubrious state is confirmed

if $(\ldots < \ldots)\ (\ldots < \ldots)\ (\ldots < \ldots)\ (\ldots)$ then the expression is prone to the salubrious state

# IV.
Experimental Study

The experimental study was carried out on a set of gene expressions taken from multiple benchmark datasets [19]. The CAD, UA and MI prone expression prediction value (also known as precision or positive prediction value) is 0.93, the salubrious gene expression prediction value is 0.79, the CAD, UA and MI gene expression detection rate (also known as sensitivity) is 0.93, the salubrious gene expression detection rate (also known as specificity) is 0.782, and the overall success rate (also known as accuracy, the ratio of correct predictions over all types of gene expressions to the total number of given gene expressions) is 0.90. These statistics indicate that MAGED is significant in identifying CAD, UA and MI prone gene expressions, with a success rate of 93% (since sensitivity is 0.93), whereas for the detection of salubrious cases the success rate is 78% (since specificity is 0.782). The computer aided medical diagnosis should […]

# V. Conclusion

This paper introduced a learning model that devises heuristics to scale whether a given patient record is disease prone or normal. The proposed learning model delivers two heuristics, called Scale to Diseased Health Scope and Scale to Normal Health Scope. In contrast to the existing benchmark models, these heuristics are used as scales to assess whether the given patient record is disease prone or normal. The medical records labeled as diseased and normal are used to devise the respective heuristics. To this end, all unique values of all the attributes are considered as features, and the influence weight of these features towards their respective datasets is then assessed. The influence weights are further used to assess the influence weight of each record in the dataset. These record-level influence weights of the respective dataset are then used to assess the proposed heuristics. The experimental results are encouraging and indicate prediction accuracy and robustness.
This work can be extended to identify the impact of feature correlation on minimizing the process and computational complexity of the learning process.

The gene expression datasets will be considered for training towards defining the metaheuristic scale. Each gene expression is represented by a sequence of genes for the set of features selected for the respective disease context. This description applies to all datasets of gene expressions representing coronary artery disease, unstable angina, and those from the blood samples of salubrious cases.

In the Hamming distance computation over $CX$ and $CY$, $CZ_i$ is the $i$-th element of the vector $CZ$ and $|CZ|$ is the size of the vector $CZ$.

The benchmark data cover coronary artery disease (286 expressions), unstable angina (275 expressions), myocardial infarction (277 expressions) and the salubrious condition (276 expressions). The gene expressions of each category are considered as separate datasets labeled $D_{CAD}$, $D_{UA}$, $D_{MI}$ and $D_N$. Each of the datasets $D_{CAD}$, $D_{UA}$, $D_{MI}$ and $D_N$ is partitioned into training and test sets: 75% of the gene expressions of each dataset form the training set and the remaining 25% form the test set.

© 2016 Global Journals Inc.
(US)

MAGED: Metaheuristic Approach on Gene Expression Data: Predicting the Coronary Artery Disease and The Scope of Unstable Angina and Myocardial Infarction

Further, the confidence of $e$ towards $D_{UA}$, $D_{MI}$ and $D_N$ is assessed in the same way: in each case it is the aggregate of the product of each gene confidence and weight over the genes that exist in both the gene set of that dataset (for example $G(D_{UA})$) and $e$, normalized by the aggregate taken over all genes of that gene set.

* Heart disease and stroke statistics-2015 update: a report from the American Heart Association. D Mozaffarian, EJ Benjamin, AS Go, DK Arnett, MJ Blaha, M Cushman. Circ 131, 2015.
* World Health Organization definition of myocardial infarction: 2008-09 revision. S Mendis, K Thygesen, K Kuulasmaa, S Giampaoli, M Mähönen, KN Blackett, L Lisheng. International Journal of Epidemiology 40(1), 2011.
* Universal definition of myocardial infarction. K Thygesen, JS Alpert, HD White. Europ Heart J 28, 2007.
* Will the universal definition of myocardial infarction criteria result in an over diagnosis of myocardial infarction? KM Eggers, L Lind, P Venge, B Lindahl. The Amer J of Card 103, 2009.
* miRNAs at the heart of the matter. Z Wang, X Luo, Y Lu, B Yang. J of Mol Med 86, 2008.
* Critical appraisal of CRP measurement for the prediction of coronary heart disease events: new data and systematic review of 31 prospective cohorts. M De Planell-Saguer, MC Rodicio, T Shah, JP Casas, JA Cooper, I Tzoulaki, R Sofat, V McCormack. Inter J of Epid 8, 2009.
* C-reactive protein and reclassification of cardiovascular risk in the Framingham Heart Study. PWF Wilson, M Pencina, P Jacques, J Selhub, R D'Agostino, CJ O'Donnell. Circ: Card Qual and Outc 2, 2008.
* Transcriptomic biomarkers of cardiovascular disease. DM Pedrotty, MP Morley, TP Cappola.
Prog in Card Dis 55, 2012.
* Identification of differentially expressed genes in coronary atherosclerotic plaques from patients with stable or unstable angina by cDNA array analysis. AM Randi, E Biguzzi, F Falciani, P Merlini, S Blakemore, E Bramucci. J of Throm and Haem 1, 2003.
* Identification of new genes differentially expressed in coronary artery disease by expression profiling. S Archacki, G Angheloiu, XL Tian, FL Tan, N Dipaola, GQ Shen. Phys Genom 15, 2003.
* Development of a blood-based gene expression algorithm for assessment of obstructive coronary artery disease in nondiabetic patients. MR Elashoff, JA Wingrove, P Beineke, SE Daniels, WG Tingley, S Rosenberg. BMC Med Genom 4, 2011.
* Identification of a gene expression profile that differentiates between ischemic and nonischemic cardiomyopathy. MM Kittleson, SQ Ye, RA Irizarry, KM Minhas, G Edness, JV Conte. Circ 110, 2004.
* Gene expression analysis of ischemic and nonischemic cardiomyopathy: shared and distinct genes in the development of heart failure. MM Kittleson, KM Minhas, RA Irizarry, SQ Ye, G Edness, E Breton. Phys Genom 21, 2005.
* Identification of genes related to heart failure using global gene expression profiling of human failing myocardium. KD Min, M Asakura, Y Liao, K Nakamaru, H Okazaki, T Takahashi. Bioch and Biophy Res Comm 393, 2010.
* Transcriptome from circulating cells suggests dysregulated pathways associated with long-term recurrent events following first-time myocardial infarction. R Suresh, X Li, A Chiriac, K Goel, A Terzic, C Perez-Terzic.
* 10.1016/j.clinbiochem.2013.02.017, 23499588. Clin Biochem 46, 2013.
* Novel and conventional biomarkers for prediction of incident myocardial infarction. O Melander, C Newton-Cheh, P Almgren, B Hedblad, G Berglund, G Engström. J of Mol and Cell Card 74, 2014.
* The peripheral blood transcriptome dynamically reflects system wide biology: a potential diagnostic tool. CC Liew, J Ma, HC Tang, R Zheng, AA Dempsey. The J. of Lab and Clin Med 147, 126, 2006.
* datasets cardiovascular events in the community. The J of the Amer Med Assoc 302, 2009.