# I. Introduction

Cardiovascular diseases are the leading cause of death worldwide. Statistics indicate that they cause around 17.3 million deaths annually [1]. An inadequate blood supply to the heart causes necrosis of myocardial tissue, which is clinically referred to as myocardial infarction (MI).

MI claimed 7.6 million of the 58 million deaths worldwide in 2005 [2]. Advances in clinical practice for diagnosing and preventing MI have not been sufficient, since the number of deaths due to MI remains high compared with the deaths caused by any other disease [1][2].

The current diagnosis of MI is based on clinical symptoms, including chest pain and difficulty breathing, on variations in the ECG pattern, and on the rise and fall of cardiac troponins (also referred to as cTns) in the blood [3]. Despite considerable advances in clinical diagnosis strategies, substantial constraints remain. Advances in hs-cTnI assays [4] have raised the detection of cardiovascular disease cases (increased true positive rate), but a significant number of normal cases have been labeled as prone to cardiovascular disease (decreased true negative rate), which is a serious constraint. Another advanced diagnostic measure for cardiovascular disease uses cardiac miRNAs as biomarkers [5]. The prediction outcomes of this model are of little value because of the limited size and tissue-specific expression. Hence there is a clear need for more significant, automated detection strategies that use cardiac miRNAs as primary input [6]. Serum inflammatory markers such as BNP and CRP are also considered cardiovascular biomarkers, but they yield only a slight improvement in detection accuracy [7][8][9].

Practices such as clinical pathology and biology are central to defining cardiac biomarkers, but they are expensive and less accurate. In contrast, gene expression profiling quantifies the expression of a large number of genes in order to identify biomarkers, and it does so concurrently across multiple pathways. Gene expression profiling is therefore a promising and feasible way to quantify biomarkers for diagnosing cardiovascular diseases [10]. It can also reveal biomarkers that pathology-based and biology-based clinical processes do not.

The rest of the manuscript is organized as follows. Section 2 describes the related work, Section 3 presents the Metaheuristic Approach on Gene Expression Data (MAGED), and Section 4 elaborates the experimental study of the proposal. Finally, Section 5 concludes the contribution of the manuscript.


# II. Related Work

Gene expression analysis is a promising approach to discovering biomarkers of cardiovascular diseases. The contemporary literature contains significant contributions in defining biomarkers through gene expression analysis. Randi et al. [11] devised a gene expression analysis that identified 482 genes associated with the composition of plaques found in arteries. Many of these genes had not been considered for atherosclerosis in earlier diagnosis strategies. Archacki et al. [12] proposed a gene expression profiling strategy that identified 56 genes differing between atherosclerosis-prone and salubrious (healthy) human coronary arteries. Of these 56 genes, 49 had not previously been associated with coronary artery disease (CAD). The model devised in [13] discovered a set of genes that enables classification according to age and sex and that is strongly associated with obstructive CAD in patients not diagnosed as diabetic. The contributions in [14] and [15] profiled gene expressions to differentiate cardiomyopathies under ischemic and nonischemic conditions. Min et al. [16] profiled and analyzed gene expressions to identify the divergent genes associated with congestive heart failure. Suresh et al. [17] studied salubrious subjects and MI patients and discovered biomarkers and dysregulated pathways that significantly indicate the recurrence of MI in patients who have already had one MI.

Liew et al. [18] defined sequence tags from gene expressions using microarray analysis, comparing the mRNA molecules found in the cellular components of blood with the mRNA molecules found in 9 different human tissues, including the heart. The comparison showed that 84% of the mRNA molecules overlapped with those of the heart and 80% overlapped with those of other tissues. The mRNA molecules of the cellular components of blood are inexpensive and easy to access, so they can substitute for gene expression in other tissues.

The contributions found in the contemporary literature focus on discovering the genes that influence myocardial infarction. None of them can determine whether a given gene expression is prone to CAD under MI and UA or is salubrious. This shows the need for novel contributions that determine this state of a given gene expression, which in turn helps to deploy case-based reasoning so that patients prone to CAD under MI and UA are treated differently. In this regard, this manuscript defines a metaheuristic approach on gene expression data (MAGED) to determine whether a given gene expression is prone to CAD under MI and UA or is salubrious. MAGED is a machine learning strategy that learns from labeled gene expression data of cardiovascular disease, unstable angina, myocardial infarction, and salubrious cases.

Global Journal of Computer Science and Technology, Volume XVI, Issue IV, Version I, Year 2016

# III. Metaheuristic Approach on Gene Expression Data

The objective of MAGED is to define a metaheuristic scale from the knowledge gained from the given gene expression data. To this end, the given gene expressions are partitioned into their respective categories: coronary artery disease (CAD), unstable angina (UA), myocardial infarction (MI), and salubrious (blood samples diagnosed as healthy). The data thus also include gene expressions collected from blood samples of people clinically proven to be normal.

The genes involved in each gene expression are considered features of the respective category. A gene expression contains a large number of genes, and the majority of them may be insignificant to the respective disease category. Hence a feature optimization process (see sec 3.1) is carried out to eliminate these insignificant features. The gene range is then discretized so that two genes can be compared through equality by approximation (see sec 3.2). Afterwards, the confidence of each feature towards all categories of gene expression data is assessed (see sec 3.3), followed by the confidence of each gene expression against the features of all categories (see sec 3.4). Finally, the confidence obtained for each feature and gene expression of the respective category is used as input to define the metaheuristic scales that estimate the scope of coronary artery disease, unstable angina, and myocardial infarction.

# a) Feature Optimization

For each disease context considered, the gene expression dataset $D_i$ is used for training, and each gene expression is represented by a sequence of genes, one for each selected feature. Let $D_n$ be the dataset of normal (salubrious) cases. The sets

$$F_i = \{f_1(i), f_2(i), \ldots, f_{|F_i|}(i)\} \quad \text{and} \quad F_n = \{f_1(n), f_2(n), \ldots, f_{|F_n|}(n)\}$$

are the feature sets of $D_i$ and $D_n$. Let

$$G_j(i) = \{g_1(ij), g_2(ij), \ldots, g_{|G_j(i)|}(ij)\}$$

be the set of genes observed as values of feature $f_j(i)$ in the gene expressions of $D_i$. Similarly, let

$$G_j(n) = \{g_1(nj), g_2(nj), \ldots, g_{|G_j(n)|}(nj)\}$$

be the set of genes observed as values of feature $f_j(n)$ in the gene expressions of $D_n$.

Since a gene expression combines a large number of genes, the size of the feature set can lead to process complexity. To overcome this, the insignificant features should be identified and discarded. The feature $f_j(i)$ of $F_i$ is said to be insignificant if the genes $G_j(i)$ of $f_j(i)$ are almost the same as the genes $G_j(n)$ of the feature $f_j(n)$ of $F_n$. To identify the insignificant features, we adopt the Hamming distance, applied to the genes of each feature, taken as vectors, from the disease and normal cases.

A Hamming distance of 0, or one below the given threshold, indicates that the respective feature is insignificant. The computation is described below.

# i. Hamming Distance

The Hamming distance here denotes the difference between the genes assigned to the same feature in the gene expression data of diseased and normal cases. It is a standard measure in coding theory of the difference between two elements.

The Hamming distance between two given vectors $CX$ and $CY$, of sizes $n$ and $m$, is obtained as follows. Let $CZ \leftarrow \emptyset$ be a vector of size 0. For each $i \in \{1, 2, 3, \ldots, \max(n, m)\}$, append to $CZ$ the element

$$\{CZ\}_i = \begin{cases} 0 & \text{if } cx_i - cy_i \equiv 0, \; cx_i \in CX, \; cy_i \in CY \\ 1 & \text{otherwise} \end{cases}$$

The distance is then

$$hd_{CX \leftrightarrow CY} = \sum_{i=1}^{|CZ|} \{CZ\}_i$$
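The feature pruning described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout (feature name mapped to a list of gene values), and the toy values are assumptions, and positions that exist in only one of two unequal-length vectors are counted as differences because the source loops up to the larger of the two sizes.

```python
from itertools import zip_longest


def hamming_distance(cx, cy):
    """Number of positions at which the two gene vectors differ."""
    missing = object()  # sentinel that never equals a real gene value
    return sum(1 for a, b in zip_longest(cx, cy, fillvalue=missing) if a != b)


def significant_features(disease, normal, threshold):
    """Keep features whose distance is neither 0 nor below the threshold."""
    kept = []
    for name, genes in disease.items():
        hd = hamming_distance(genes, normal.get(name, []))
        if hd > 0 and hd >= threshold:
            kept.append(name)
    return kept


# Toy data: hypothetical gene values, one list per feature.
disease = {"f1": ["a", "b", "c"], "f2": ["x", "y", "z"]}
normal = {"f1": ["a", "b", "c"], "f2": ["x", "q", "r"]}
kept = significant_features(disease, normal, threshold=2)
print(kept)
```

In this toy case feature `f1` is identical in both datasets and is discarded, while `f2` differs in two positions and is retained. The threshold itself is a tuning parameter that the source does not specify.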

# b) Gene and Gene Expression Confidence Assessment

The genes found for each optimal feature of the respective gene expression dataset, together with the gene expressions of that dataset, are then used to assess the gene and gene expression confidence.

To this end, gene pairs are first defined such that each pair contains two genes, each representing a different feature of the same dataset. We then assess the associativity support of each gene pair, which is the ratio of the number of gene expressions that contain the pair to the total number of gene expressions in the respective dataset. The process of assessing the associativity support of each gene pair is described in the following section (see sec 3.2.1).

# ii. Assessing gene pair correlation

Let $P_i$ be the set of all possible unique gene pairs from the respective dataset $D_i$. The associativity support of a pair $p = \{g_l, g_v\} \in P_i$ is

$$s(p) = \frac{\left|\{e_k(i) \in D_i : \{g_l, g_v\} \subseteq e_k(i)\}\right|}{|D_i|}$$

that is, the ratio of the number of gene expressions that contain both genes to the total number of gene expressions.

The correlation of each pair of genes found in the gene expressions of each dataset (coronary artery disease, unstable angina, myocardial infarction, and normal cases) should be estimated using the process described in sec 3.2.1.

# Assessing Gene and Gene Expression Confidence

To assess the confidence of the genes and gene expressions of the respective dataset $D_i$, a mutual relation graph is formed between the gene expressions and the genes of $D_i$. There is an edge between a gene and a gene expression if and only if the gene exists in that gene expression. Each such edge is weighted as follows.

For each gene $g_j \in G(i)$, $j = 1, \ldots, |G(i)|$, where $G(i)$ denotes the genes of $D_i$, and for each gene expression $e_k(i) \in D_i$, $k = 1, \ldots, |D_i|$, with $g_j \in e_k(i)$:

1. Set $w(g_j) = 0$.
2. For each gene $g_m \in e_k(i)$, $m = 1, \ldots, |e_k(i)|$, with $g_m \neq g_j$, form the pair $p_m = \{g_j, g_m\}$ and update $w(g_j) \leftarrow w(g_j) + s(p_m)$.
3. Set the edge weight to

$$w(g_j \leftrightarrow e_k(i)) = \frac{w(g_j)}{|e_k(i)| - 1}$$

The weights obtained for the edges between genes and gene expressions in the mutual graph are further used to assess the gene and gene expression confidence towards the respective CAD (coronary artery disease), UA (unstable angina), MI (myocardial infarction), and Normal datasets.

The confidence of each feature towards the gene expression dataset $D_i$ is then measured, for each gene $g_j \in G(i)$, as

$$c_{g_j \to D_i} = \sum_{\substack{k=1 \\ g_j \in e_k(i)}}^{|D_i|} w(g_j \leftrightarrow e_k(i))$$

that is, the weights of gene $g_j$ towards each gene expression $e_k(i)$ of $D_i$ are aggregated, and the aggregate is taken as the confidence of that gene towards $D_i$.

Similarly, the confidence of each gene expression $e_j(i) \in D_i$, $j = 1, \ldots, |D_i|$, towards the dataset $D_i$ is measured as

$$c_{e_j(i) \to D_i} = \sum_{\substack{k=1 \\ g_k \in e_j(i)}}^{|G(i)|} w(g_k \leftrightarrow e_j(i)) \times c_{g_k \to D_i}$$

that is, the confidence of a gene expression is the sum of the products of the weight and the confidence of each gene that exists in that gene expression.

The confidence of the genes and gene expressions of each dataset (CAD, UA, MI, and salubrious cases) should be estimated using the process described in sec 3.2.2. These confidence values of a gene expression $e$ with respect to CAD, UA, MI, and N are then used to estimate whether the state of the given expression is salubrious or prone to coronary artery disease, unstable angina, or myocardial infarction, according to the following conditions.
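The confidence of $e$ towards $D_{CAD}$ is

$$c_{e \to CAD} = \frac{\sum_{\substack{i=1 \\ g_i \in e}}^{|G(D_{CAD})|} c_{g_i \to CAD} \times w(g_i)}{\sum_{j=1}^{|G(D_{CAD})|} c_{g_j \to CAD}}$$

that is, the aggregate of the products of the confidence and the weight of each gene that exists in both $G(D_{CAD})$ and $e$, divided by the aggregate of the confidence of all genes in $G(D_{CAD})$.

[Conditions over the four confidence values $c_{e \to CAD}$, $c_{e \to UA}$, $c_{e \to MI}$, and $c_{e \to N}$; inequalities not recoverable] Then salubrious state confirmed.

[Conditions over the four confidence values; inequalities not recoverable] Then prone to salubrious state.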
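The training-side confidence assessment of Section b), that is pair support, edge weights, gene confidence, and gene expression confidence, can be sketched as below. This is a hedged illustration under stated assumptions: each gene expression is modeled as a `frozenset` of gene labels (so every pair of distinct labels in an expression is treated as a valid gene pair), the edge weight is normalized by the number of other genes in the expression, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pair_support(dataset):
    """s(p): fraction of expressions that contain both genes of pair p."""
    counts = defaultdict(int)
    for expr in dataset:
        for pair in combinations(sorted(expr), 2):
            counts[frozenset(pair)] += 1
    return {p: c / len(dataset) for p, c in counts.items()}


def edge_weights(dataset, support):
    """w(g <-> e_k): mean support of the pairs g forms with the other genes of e_k."""
    weights = {}
    for k, expr in enumerate(dataset):
        for g in expr:
            others = [m for m in expr if m != g]
            total = sum(support[frozenset((g, m))] for m in others)
            weights[(g, k)] = total / len(others) if others else 0.0
    return weights


def gene_confidence(weights):
    """c(g -> D): sum of the edge weights of g over all expressions containing g."""
    conf = defaultdict(float)
    for (g, _k), w in weights.items():
        conf[g] += w
    return dict(conf)


def expression_confidence(dataset, weights, gconf):
    """c(e_k -> D): sum over genes in e_k of edge weight times gene confidence."""
    return [sum(weights[(g, k)] * gconf[g] for g in expr)
            for k, expr in enumerate(dataset)]


# Toy dataset with hypothetical gene labels.
dataset = [frozenset("ab"), frozenset("ab"), frozenset("ac")]
support = pair_support(dataset)
weights = edge_weights(dataset, support)
gconf = gene_confidence(weights)
econf = expression_confidence(dataset, weights, gconf)
```

Keying the edge weights by `(gene, expression index)` mirrors the bipartite mutual relation graph: an entry exists only when the gene occurs in that expression.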

# IV. Experimental Study

The experimental study was carried out on a set of gene expressions taken from multiple benchmark datasets [19]. The prediction value for CAD, UA, and MI gene expressions (also known as precision or positive predictive value) is 0.93, the salubrious gene expression prediction value is 0.79, the CAD, UA, and MI gene expression detection rate (also known as sensitivity) is 0.93, the salubrious gene expression detection rate (also known as specificity) is 0.782, and the overall success rate (also known as accuracy, the ratio of correct predictions over all types of gene expressions to the total number of given gene expressions) is 0.90. These statistics indicate that MAGED is significant in identifying CAD-, UA-, and MI-prone gene expressions, with a success rate of 93% (since sensitivity is 0.93), whereas the success rate in detecting salubrious cases is 78% (since specificity is 0.782). The computer aided medical diagnosis should
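The relationship between the reported measures can be made concrete with a small sketch. The confusion-matrix counts below are hypothetical: the paper does not report them, and they are chosen only so that the resulting rates land near the reported values.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # diseased expressions detected
        "specificity": tn / (tn + fp),   # salubrious expressions detected
        "precision": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts (assumption): 210 diseased and 69 salubrious test cases.
m = diagnostic_metrics(tp=195, fn=15, tn=54, fp=15)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Here the diseased classes (CAD, UA, MI) are pooled as the positive class and the salubrious class is the negative class, which matches how the text pairs sensitivity with the diseased expressions and specificity with the salubrious ones.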


# V. Conclusion

This paper introduced a learning model that devises heuristics to scale whether a given patient record is disease prone or normal. The proposed model delivers two heuristics, called Scale to Diseased Health Scope and Scale to Normal Health Scope. In contrast to the existing benchmark models, these heuristics are used as scales to assess whether a given patient record is disease prone or normal. The medical records labeled as diseased and as normal are used to devise the two heuristics, respectively. To this end, all unique values of all the attributes are considered as features, and the influence weight of these features towards their respective datasets is assessed. These influence weights are then used to assess the influence weight of each record in the dataset, and the influence weights of the records of the respective dataset are in turn used to assess the proposed heuristics. The experimental results are encouraging with respect to prediction accuracy and robustness. This work can be extended to identify the impact of feature correlation on minimizing the process and computational complexity of the learning process.

![will be considered for training towards defining metaheuristic scale. Each gene expression is represented by sequence of genes for the set of features selected of respective diseases context. This description binds to all datasets of gene expressions representing coronary artery diseases, Unstable Angina, from the blood samples of salubrious cases. The sets](image-2.png "D")


In the Hamming distance between the vectors $CX$ and $CY$, $\{CZ\}_i$ is the $i$-th element of the vector $CZ$ and $|CZ|$ is the size of the vector $CZ$.
The gene expressions cover coronary artery disease (286 expressions), unstable angina (275 expressions), myocardial infarction (277 expressions), and salubrious condition (276 expressions). The gene expressions of each category are treated as separate datasets labeled $D_{CAD}$, $D_{UA}$, $D_{MI}$, and $D_N$. Each of the datasets $D_{CAD}$, $D_{UA}$, $D_{MI}$, and $D_N$ is partitioned into training and test sets: 75% of the gene expressions of each dataset form the training set and the remaining 25% form the test set.
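The 75/25 partition of each dataset can be sketched as below. The source does not say how expressions are assigned to the two sets, so the seeded random shuffle, the function name, and the use of integer placeholders for expressions are all assumptions made for illustration.

```python
import random


def split_dataset(expressions, train_fraction=0.75, seed=0):
    """Shuffle a dataset reproducibly and split it into (train, test)."""
    items = list(expressions)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


# Dataset sizes as reported; integers stand in for gene expressions.
sizes = {"CAD": 286, "UA": 275, "MI": 277, "N": 276}
splits = {name: split_dataset(range(n)) for name, n in sizes.items()}

for name, (train, test) in splits.items():
    print(name, len(train), len(test))
```

Splitting each category separately, as the text describes, keeps the class proportions of the training and test sets close to those of the full data.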
			© 2016 Global Journals Inc. (US)
			MAGED: Metaheuristic Approach on Gene Expression Data: Predicting the Coronary Artery Disease and The Scope of Unstable Angina and Myocardial Infarction
		
		
Further, the confidence of $e$ towards $D_{UA}$, $D_{MI}$, and $D_N$ is assessed in the same way. For $X \in \{UA, MI, N\}$:

$$c_{e \to X} = \frac{\sum_{\substack{i=1 \\ g_i \in e}}^{|G(D_X)|} c_{g_i \to X} \times w(g_i)}{\sum_{j=1}^{|G(D_X)|} c_{g_j \to X}}$$

that is, the aggregate of the products of the confidence and the weight of each gene that exists in both $G(D_X)$ and $e$, divided by the aggregate of the confidence of all genes in $G(D_X)$.
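Scoring a new gene expression against each trained dataset, as described above, can be sketched as follows. This is an illustration only: the source does not define how the per-gene weight $w(g_i)$ is obtained for an unseen expression, so the sketch assumes a precomputed per-gene weight table for each dataset, and the names and toy numbers are hypothetical. The decision conditions are not implemented because they are not recoverable from the source.

```python
def confidence_towards(e, gene_conf, gene_weight):
    """c(e -> X): confidence-times-weight over the genes shared by e and G(D_X),
    divided by the total confidence of all genes in G(D_X)."""
    total = sum(gene_conf.values())
    if total == 0:
        return 0.0
    shared = sum(gene_conf[g] * gene_weight.get(g, 0.0)
                 for g in e if g in gene_conf)
    return shared / total


# Hypothetical per-dataset models: (gene confidence, assumed per-gene weight).
models = {
    "CAD": ({"a": 2.0, "b": 1.0, "c": 1.0}, {"a": 0.5, "b": 0.5, "c": 0.25}),
    "N": ({"a": 1.0, "d": 3.0}, {"a": 0.5, "d": 1.0}),
}
e = {"a", "b"}  # genes of the expression to be assessed
scores = {name: confidence_towards(e, conf, weight)
          for name, (conf, weight) in models.items()}
print(scores)
```

With all four datasets scored this way, the resulting values are the inputs that the paper's conditions compare.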
			
			
# References

* [1] Mozaffarian D, Benjamin EJ, Go AS, Arnett DK, Blaha MJ, Cushman M. Heart disease and stroke statistics-2015 update: a report from the American Heart Association. Circ 131, 2015.
* [2] Mendis S, Thygesen K, Kuulasmaa K, Giampaoli S, Mähönen M, Blackett KN, Lisheng L. World Health Organization definition of myocardial infarction: 2008-09 revision. International Journal of Epidemiology 40(1), 2011.
* [3] Thygesen K, Alpert JS, White HD. Universal definition of myocardial infarction. Europ Heart J 28, 2007.
* [4] Eggers KM, Lind L, Venge P, Lindahl B. Will the universal definition of myocardial infarction criteria result in an over diagnosis of myocardial infarction? The Amer J of Card 103, 2009.
* [5] Wang Z, Luo X, Lu Y, Yang B. miRNAs at the heart of the matter. J of Mol Med 86, 2008.
* [6] De Planell-Saguer M, Rodicio MC. Clin Biochem 46, 2013. doi:10.1016/j.clinbiochem.2013.02.017. PMID: 23499588.
* [7] Shah T, Casas JP, Cooper JA, Tzoulaki I, Sofat R, McCormack V. Critical appraisal of CRP measurement for the prediction of coronary heart disease events: new data and systematic review of 31 prospective cohorts. Inter J of Epid 8, 2009.
* [8] Wilson PWF, Pencina M, Jacques P, Selhub J, D'Agostino R, O'Donnell CJ. C-reactive protein and reclassification of cardiovascular risk in the Framingham Heart Study. Circ: Card Qual and Outc 2, 2008.
* [9] Melander O, Newton-Cheh C, Almgren P, Hedblad B, Berglund G, Engström G. Novel and conventional biomarkers for prediction of incident cardiovascular events in the community. The J of the Amer Med Assoc 302, 2009.
* [10] Pedrotty DM, Morley MP, Cappola TP. Transcriptomic biomarkers of cardiovascular disease. Prog in Card Dis 55, 2012.
* [11] Randi AM, Biguzzi E, Falciani F, Merlini P, Blakemore S, Bramucci E. Identification of differentially expressed genes in coronary atherosclerotic plaques from patients with stable or unstable angina by cDNA array analysis. J of Throm and Haem 1, 2003.
* [12] Archacki S, Angheloiu G, Tian XL, Tan FL, Dipaola N, Shen GQ. Identification of new genes differentially expressed in coronary artery disease by expression profiling. Phys Genom 15, 2003.
* [13] Elashoff MR, Wingrove JA, Beineke P, Daniels SE, Tingley WG, Rosenberg S. Development of a blood-based gene expression algorithm for assessment of obstructive coronary artery disease in nondiabetic patients. BMC Med Genom 4, 2011.
* [14] Kittleson MM, Ye SQ, Irizarry RA, Minhas KM, Edness G, Conte JV. Identification of a gene expression profile that differentiates between ischemic and nonischemic cardiomyopathy. Circ 110, 2004.
* [15] Kittleson MM, Minhas KM, Irizarry RA, Ye SQ, Edness G, Breton E. Gene expression analysis of ischemic and nonischemic cardiomyopathy: shared and distinct genes in the development of heart failure. Phys Genom 21, 2005.
* [16] Min KD, Asakura M, Liao Y, Nakamaru K, Okazaki H, Takahashi T. Identification of genes related to heart failure using global gene expression profiling of human failing myocardium. Bioch and Biophy Res Comm 393, 2010.
* [17] Suresh R, Li X, Chiriac A, Goel K, Terzic A, Perez-Terzic C. Transcriptome from circulating cells suggests dysregulated pathways associated with long-term recurrent events following first-time myocardial infarction. J of Mol and Cell Card 74, 2014.
* [18] Liew CC, Ma J, Tang HC, Zheng R, Dempsey AA. The peripheral blood transcriptome dynamically reflects system wide biology: a potential diagnostic tool. The J. of Lab and Clin Med 147, 126, 2006.
* [19] Datasets.