# Introduction

n association rule is a general form of dependence where an element has some dependence with another element. Margaret H.Dunham and S.Sridhar [1] said Spatial Association Rule mining (SAR) is about generating association rules about spatial data objects. Either the antecedent or the consequent of the rule must contain some spatial predicates (such as near). Spatial association rules are implications of one set of data by another. Due to the relationships involved the spatial components; one entity can affect the behavior of other entity. Spatial data items are naturally linked to neighboring data elements (e.g., contiguous geographic positions), these data elements are not statistically independent. This makes the spatial data mining different from the normal transactional data mining.

Various activities involved in the SAR is computing the spatial relationships, generating the frequent sets and extracting the association rules. The existing approaches use quantitative reasoning, which computes distance relationships during the frequent set generation [3]- [4]. These approaches deal only with points, consider only quantitative relationships and do not consider non spatial attributes of geographic data, Author ? : Assistant Professor, Dept of MCA, Thiagarajar School of Management, Madurai,Tamilnadu, India. Author ? : Principal, Indra Ganesan College of Engineering, Tirchirapalli,Tamilnadu, India. E-mail-arunasethupathy@gmail.com which may be fundamental importance of knowledge discovery. Qualitative spatial reasoning [5]- [7] considers distance and topological relationships between a reference geographic object type and a set of relevant feature types represented by any geometric primitive (e.g. points, lines, and polygons). Bogorny.V [8] used qualitative spatial reasoning approach with prior knowledge and removes well known patterns completely by early pruning the input space and the frequent item sets. Salvatore Orlando [9] discussed about the various kinds of spatial predicates which can be involved in spatial association rules. In this paper, we are concentrating on the second and third step for the SAR.

A novel two step refinement algorithm based on Hybrid Evolutionary Algorithm (HEA) which uses genetic algorithm with ant colony optimization for generating the spatial association rules and clustering the generated rules for the required groups is developed and implemented. In the first step HEA algorithm is used to enhance the performance of Multi Objective Genetic Algorithm (MOGA) by incorporating local search with Ant colony optimization (ACO), for multi objective association rule mining. In the proposed HEA algorithm, MOGA is conducted to provide the diversity of associations thereafter; ant colony optimization is performed to come out of local optima. From the simulation results, it is shown that the proposed HEA algorithm has superior performance when compared to other existing algorithms. In the second step, we group the rules generated for finding the various target groups by clustering based on genetic algorithm. The proposed methodology is to use genetic algorithm over the existing rule cover algorithm for keeping nearest neighbors in common cluster and to be insensitive to data input order. Rules are grouped based on consequent information of the rules generated by step 1. Groups of rules are in the form X i -> Y for i=1,2,?,n. That is, different rule antecedents X i 's are collected into one group for a same rule consequent Y.

Moses santhakumar said [29] it is the second largest city in Tamilnadu state, having a very old history of about two thousand six hundred years and is often referred to as the Athens of East. Having different locations for residential, commercial, industrial, educational, public and semi-public habitats, we have chosen Madurai as the place of study and collected data from various parts of it.

Aditi pai and Deepika Khatri [30] said women are now a critical consumer segment for marketers to tap, not only for household and conventional women's products but also for services. Yet all women do not behave the same, their habits differ according to their status such as marital and working status. In our paper we have taken women in city Madurai for the target segmentation.

The paper is organized as follows: Section 2 deals with the background concepts of SAR, MOGA and the ACO applied for the optimization of the rule generation, method of clustering the rules. Section 3 deals first step of the algorithm developed, Section 4 explains the second step of the algorithm. Section 5 deals with approach followed in this paper and discuss the results obtained and Section 6 gives the conclusion of the paper.


# II.


# Background Study

This section is divided into three parts. Section a.) discusses the need for considering SAR as a multi objective problem, section b.) discusses the use of genetic algorithm and the ACO for the optimizing SAR and section c.) says about the Clustering of association rules.


# a) Multi objective view on SAR

Existing algorithms for the SAR try to measure the quality of generated rule by considering only one evaluation criterion. Due to the growing need of the knowledge from the spatial data, we can consider the problem as a multi objective one rather than the single objective. Multi-objective optimization deals with solving optimization problems which involve multiple objectives. Most real-world search and optimization problems involve multiple objectives (such as minimizing fabrication cost and maximize product reliability and others) and should be ideally formulated and solved as a multi-objective optimization problem.


# b) Evolutionary algorithms for optimization

Over the past decade, population-based evolutionary algorithms (EAs) have been found to be quite useful in solving multi-objective optimization problems, simply because of their ability to find multiple optimal solutions in a single simulation run. Alex A. Freitas [10] explained the motivation for using genetic algorithms in the discovery of high-level prediction rules. GA perform a global search and cope better with attribute interaction than the greedy rule induction algorithms often used in data mining. According to Dehuri. S et al [12] genetic algorithms for rule discovery can be divided into two broad approaches, the Michigan approach and the Pittsburgh approach. Peter P. Wakabi-Waiswa and Venansius Baryamureeba [13] said the biggest distinguishing feature between the two is that in the mapping of chromosome to the rule.

ACO is a paradigm for designing meta heuristic algorithms for combinatorial optimization problems. The ACO algorithm was first introduced by Colorni, Dorigo and Maniezzo [13]- [14] and the first Ant System (AS) was discussed by Dorigo [15] in his Ph.D. thesis. The ACO is a meta-heuristic algorithm, which utilizes the inspiration from real ant colonies behaviours to find a shortest path from a food source to the nest without using visual cues by exploiting pheromone information [16]- [18].

In order to reduce the total number of rules generated we used evolutionary algorithm. We tend to use the evolutionary algorithms since the solution space and the number of rules generated will be exponentially growing. We use the HEA, which inturn is using MOGA and ACO.


# c) Clustering of the association rules

Clustering association rules is one of the meaningful ways of grouping association rules into different clusters. When the spatial association rules are generated in order to identify the group of targets we are using the clustering approach. Waler A.Kosters et al, [19] selected highly ranked (based on confidence) association rules one by one and formed cluster of objects covered by each rule until all the objects in the database are covered. Toivonen.H et al [20] formed cluster of rules of the form X i -> Y, that is, rules with different antecedent but with same consequent Y and they extracted representative rules for each cluster as knowledge for the cluster. Pi Dechang and Qin Xiaolin [21] formed cluster of rules based on structure distance of antecedent. Alipio Jorge [22] formed hierarchical clustering of rules based on different distance methods used for rules. G. Li, and H.J.Hamilton [23] discussed different ways of pruning redundant rules including rule cover method. All Associative Classifier (AC) CBA, CMAR proposed by W. Li et. al [24]. RMR proposed by A. Thabtah, and P. I. Cowling [25] and MCAR proposed by Adriano Veloso et. al [26] generate cluster of rules called class-association rule (CAR) with class label as same consequent and they use database (rule) cover to select potential rules to build (AC) classifier model. In most of the association rule mining, confidence measure is used to rank association rules. Also, other measures such as chi-square, laplace-accuracy is used to select highly ranked rules.


# III.

Application of HEA for Spatial Association rule mining SAR uses apriori algorithm for the generation of the rules. Here the rules generated by apriori using the hybrid algorithm. The possible rules are represented as chromosomes and a suitable encoding/decoding scheme has been defined, it also provides the diversity of associations among the rules generated by elitism.We follow the Michigan approach for the optimization. To increase the efficiency of the MOGA, we are using the ACO, which limits the algorithm from falling to the local optimal solution.

The procedures of HEA are as follows. First, MOGA searches the solution space and generates association lists to provide the initial population for ACO. Next, ACO is executed, when ACO terminates, the crossover and mutation operations of MOGA generate new population. ACO and GA search alternately and cooperatively in the solution space. Then the rules are clustered using the rule cover based on the consequent information.


# a) String Representation

Chromosomes are encoded as real numbers the number of genes in each chromosome is equal to the number of item sets considered. Each gene will have 4 digits for vector index. A sample chromosome may look like as follows: 0001 0102 0204 0302 0401 0500 0601 0702 0802 0901 1002 1101 1201

Here, the first two numbers in each gene represents the attribute and the next two denotes the value , fourth gene has the value 0302 where 03 refers to the age group and 02 refers to the third age group ranges from 23 to 25. Like wise all the gene has been encoded, once the initial population is generated now we are ready to apply genetic operators.


# b) Fitness

The fitness function is calculated as the arithmetic weighted average confidence, comprehensibility and J-Measure. The fitness function is given by f(x) = (w1 * Comprehensibility) + (w2 * J -Measure) + (w3 * Confidence) _____________________________________________________ w1+w2+w3

where w1,w2,w3 are used defined weights.


# d)

Reproduction (Selection)

The selection process selects chromosomes from the mating pool directed by the survival of the fittest concept of natural genetic systems. In the proportional selection strategy adopted in this paper, a chromosome is assigned a number of copies, which is proportional to its fitness in the population, go into the mating pool for further genetic operations. Roulette wheel selection is used for the proportional selection strategy.


# e) Crossover

Crossover is a probabilistic process that exchanges information between two parent chromosomes for generating two child chromosomes. In this paper, single point crossover with a fixed crossover probability of C is used. For chromosomes of length l, a random integer, called the crossover point, is generated in the range [1, l -1]. The portions of the chromosomes lying to the right of the crossover point are exchanged to produce two offspring.


# Mutation

Each chromosome undergoes mutation with a fixed probability M. For binary representation of chromosomes, a bit position (or gene) is mutated by simply flipping its value. Since we are considering real numbers in this paper, a random position is chosen in the chromosome and replace by a random number between 0-9.

Step 1: Pseudo code for optimization of rule generation 1. while (t <= no_of_gen) 2. M_Selection(Population(t)) 


# Clustering the rules

We are using the classifier model which uses the consequent information for grouping. The clusters will be formed who are having their consequent as similar pattern. We have first grouped based on the attributes; it may be homogeneous like urban core, suburbs, rural or hierarchical groups like metropolitan area, major cities, and neighborhoods. Further group based on the purpose like segmenting the population by consumer behavior. S.Kannan and R.Bhaskaran [27] proposed an algorithm for clustering the rules.

The key factors to be considered in the spatial clustering algorithm is keeping nearest neighbors in common cluster and the cluster to be insensitive to the data input order. In our novel approach we achieve this objective by with a Pareto based multiple-objective genetic evolutionary algorithm.The MOGA is used to achieve the multi Let R y ={ X i -> Y | i=1,2,?,n } be a set of n rules for some item-set Y and m(X i Y ) be rule cover, which is the set of tuples/records covered by the rule X i -> Y in the dataset D.

Let C y be the cluster rule cover for a group or cluster of rules R y . i.e., C y = m(R y ) = U i=1,2,?n m(X i Y)

From cluster rule set Ry, find a small set of k rules r y called representative rule set such that m(r y ) is almost equal to m(R y ). i.e., m(r y ) ? m(R y ), or U i=1,2,?k m(X i Y ) ?U i=1,2,?n m(X i Y ), where k<< n

Step 2 : Pseudo code for clustering the rules generated Input : set of rules generated by the HEA R y ={ X i -> Y | i=1,2,?,n } and the rule cover. Apply GA for rearranging the rules in various orders based on the fitness preferred by the user.

1. Generate the cluster rule cover 2. count = number of records in the cluster cover 3. while(no of records in the cluster cover > 2% of count) Sort all the rules in the R y in the descending order of the rule cover. Take the first rule r with highest rule cover If the no of records in the rule cover is <= 2% of count Exit while loop End if. 4. r y = r y U r 5. Delete the highest rule cover from the cluster cover 6. End While Output : the representative rule set.

Apply GA for retaining nearest neighbours in common cluster.The optimized representative rule set is used for the segmentation of the consequent.

GA is applied at the first stage for the arrangement of the rules based on the fitness; this is to help the clustering for not suffering from the order of the input. In the second.


# Results and discussions

We have used the synthesized dataset for our research. The general procedure of data mining is:

? Question raise ? Data preparation (including data selection, data pre treatment and data transformation) ? Data arrangement ? Model building/data mining ? Result evaluation and explanation.

The specific procedure is as following.

(1) As done by Xinqi Zheng and Lu Zhao [28] we take advantage of "import wizard" in Matlab to accomplish the import of data file. Until now, the data fields and character fields are saved separately.

(2) Run algorithm step 1 to generate the rules.

(3) Run algorithm step 2 to generate the target group.

The environmental setup are Population size = 100, Mutation rate (M) = 0.5, Crossover rate ( C )=0.8, Elitism = 0.5. The stopping criterion used is the non evolution of the archive during 100 generations, once the minimal number of generations has been over passed.

Keeping the fitness as 50% we have computed the results. In Fig 1 the comparison has been done for the number of rules generated to the support count given with the apriori algorithm, apriori algorithm optimized with the MOGA and the apriori algorithm optimized with HEA proposed in Step 1. the nearest neighbors based on the metrics given by the user. Application of GA is to increase the quality of 1. When the Support is increased the numbers of rules generated are decreasing and the use of HEA also performs a significant change in the number of rules fitness, this rectify the sensitiveness to the data input order. After applying the rule cover, GA is used to retain better compared to the other two algorithms. HEA outperforms MOGA by an average of 8% for the four support levels ( 5%,10%,15% and 20%). It shows the good improvement in the 20% support level.

In  Lift ratio says us how much better the rule is better as predicting the result than just assuming the result in the first place. It is defined as the ratio of the records that support the entire rule to the number that would be expected, assuming there was no relationship between the items. From Fig 2 we can have the following observation, Lift ratio for the HEA is better than the other two algorithms. This shows the efficiency of the HEA to identify the rules for predicting the result by improving the information about the increase in probability of the consequent to the given antecedent part.

In  By the refinement of the rules generated HEA algorithm in step 2 by the cluster concept is useful in narrowing the segmentation. From the synthetic dataset we have used the representative rules to find the preference based segmentation for the women based on 7 preferences and the distance they can maximum travel to get it. We have segmented the women into five segments and the representative rules have minimum support 20% fitness as 50% and the lift ratio as 38 %.

By the application of genetic algorithm for the keeping nearest neighbors in common cluster and to be insensitive to data input order, the time of generation of the clusters has been significantly reduced. This is depicted in Fig 4 and 1. On an average 11 % decrease in the time of forming the clusters is performed by the proposed approach. This is due to the reason of applying GA for the rearrangement of rules before clustering. 2. The proposed approach works well when the number of rules applied for clustering is 1000 and the time reduction is about 16.6% 3. The average increase in the Dunn's Validity Index is 6.91% The students and the just married women are willing to relocate more for the preference. 2. Young mothers are limited to relocate 3. Most of the target group is willing to relocate less for the bus stop 4. The preference of the students and the just married women were alike.


# Comparision based on the Lift ratio generated for various support levels

We can generate number of observations and this can be used to the planning of the city based on the preferences, marketing and the logistics can be planned.


# VI.


# Conclusion

This paper proposed a methodology for the optimization of SAR using the Hybrid Evolutionary Algorithm in the first phase and finding the target groups by the refinement of the rules by clustering. The results for the first phase is promising and the second phase also lay a opening for the identification of target groups which can be further extended to the preferential based mining. In this paper the case taken for the phase two is the women target groups with various categories and their preferences. The work can be extended to the classification of the rule generated and the phase two can be also optimized with the help of the evolutionary algorithms. 
1![Fig 1 : Comparison of the three algorithms based on the number of rules generated From Fig 1 we can have the following observations.](image-2.png "Fig 1 :")
![Fig 2 the comparison has been done for the lift ratio for the top 500 rules generated to the support count given with the apriori algorithm, apriori algorithm optimized with the MOGA and the apriori algorithm optimized with HEA proposed in Step 1.](image-3.png "")


A Two Step Optimized Spatial Association Rule Mining Algorithm By Hybrid Evolutionary AlgorithmAnd Cluster Segmentation© 2011 Global Journals Inc. (US) c) using the application of genetic algorithms to the rules before applying the rule cover algorithm. GA is used to rearrange the rules in various order based on the the clusters.
			© 2011 Global Journals Inc. (US) July
			Global Journal of Computer Science and TechnologyVolume XI Issue XII Version I July
			© 2011 Global Journals Inc. (US)
			© 2011 Global Journals Inc. (US)
			Global Journal of Computer Science and TechnologyVolume XI Issue XII Version I July
			© 2011 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XI Issue XII Version I July (1989), pp.
			Global Journal of Computer Science and TechnologyVolume XI Issue XII Version I July
			© 2011 Global Journals Inc. (US)
		
		
* 
	
		
			RAgarwal
		
		
			RSrikant
		
		fast Algorithms for mining association rules in large databases, International conference on very large databases, VLDB, 20
				San Fransico. Proceedings? California
		
			Morgan Kaufmann
			1994. 1994
			
		
* 
	
		Data Mining Introductory and Advanced Topics
		
			HMargaret
		
		
			SDunham
		
		
			Sridhar
		
		
			2006
			Pearson Education
		
	
* 
	
		A Joinless Approach for Co-location Pattern Mining: A Summary of Results
		
			JSYoo
		
		
			SShekhar
		
		
			MCelik
		
	
		5th IEEE-ICDM
				Houston
		
			IEEE Computer Society
			2005. 2005
			
		
* 
	
		A partial join approach for mining co-location patterns
		
			JSYoo
		
		
			SShekhar
		
	
		12th ACM-GIS
				Washington
		
			ACM Press
			2004
			
		
* 
	
		Mining and Filtering Multi-level Spatial Association Rules with ARES
		
			MAppice
		
		
			MBerardi
		
		
			MCeci
		
		
			DMalerba
		
	
		15th ISMIS
				New York
		
			Springer
			2005
			
		
* 
	
		Mining Association Rules in Spatio-Temporal Data: An Analysis of Urban Socioeconomic and Land Cover Change
		
			JMennis
		
		
			JWLiu
		
	
		Transactions in GIS
		
			9
			1
			
			2005. January
		
	
* 
	
		Discovery of spatial association rules in geographic information databases
		
			KKoperski
		
		
			JHan
		
	
		4th SSD
				Portland
		
			Springer
			1995
			
		
* 
	
		Enchancing spatial association rule mining in geographic databases
		
			VBogorny
		
	
		Instituto de Informatica da UFRGS
		
			October 2006
		
	
	PhD Thesis


* 
	
		Postgraduate Program in Computer Science, Pontificia Universidade Catolica do Parana Rua Imaculada Conceicao
		
			AlexAFreitas
		
		1155. Curitiba -PR. 80215- 901
		
			Brazil
		
	
	A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery


* 
	
		Multi-objective Genetic Algorithm for Association Rule Mining Using a Homogeneous Dedicated Cluster of Workstations
		
			SDehuri
		
		
			AKJagadev
		
		
			AGhosh
		
	
		American Journal of Applied Sciences
		1546-9239
		
			3
			11
			
			2006. 2006
		
	
	And Mall R


* 
	
		Extraction of Interesting Association Rules Using Genetic Algorithms
		
			PPeter
		
		
			VenansiusWakabi-Waiswa
		
		
			Baryamureeba
		
		
		International Journal of Computing and ICT Research
		
			2
			1
			
		
* 
	
		Positive feedback as a search strategy
		
			AColorni
		
		
			MDorigo
		
		
			VManiezzo
		
		No. 91-016
		
			1991
			Milano, Italy
		
	
	Technical Report
	Politecnico di


* 
	
		The ant system: an autocatatlytic process
		
			AColorni
		
		
			MDorigo
		
		
			VManiezzo
		
		No. 91-016
		
			1991
			Milano, Italy
		
	
	Technical Report
	Politecnico di


* 
	
		Optimization, Learning and Natural Algorithms
		
			MDorigo
		
		
			1992
			Italy
		
		
			Politecnico di Milano
		
	
	Ph.D. Thesis


* 
	
		Trails and Uturns in the selection of the shortest path by the ant lasius niger
		
			RBeckers
		
		
			JLDeneubourg
		
		
			SGoss
		
	
		Journal of Theoretical Biology
		
			159
			
			1992
		
	
* 
	
		Self-organized shortcuts in the argentine ant
		
			SGoss
		
		
			SAron
		
		
			JLDeneubourg
		
		
			JMPasteels
		
	
		Naturwissenschaften
		
			76
			12
			
		
* 
	
		The Ants
		
			BHolldobler
		
		
			EOWilson
		
		
			1990
			Springer-Verlag
			Berlin
		
	
* 
	
		Mining Clusters with Association browsing and summarization of large sets of Association Rules
		
			AWaler
		
		
			ElenaKosters
		
		
			Marchiori
		
		
			AJArd
		
		
			Oerlemans
		
	
		Proceedings of the 4th SIAM International Conference on Data Mining
				the 4th SIAM International Conference on Data MiningOrlando, FL
		
			2004
			
		
* 
	
		Basic association rules
		
			GLi
		
		
			HJHamilton
		
	
		Proceedings of the 4th SIAM International Conference on Data Mining
				the 4th SIAM International Conference on Data MiningOrlando, FL
		
			2004
			
		
* 
	
		CMAR: accurate and efficient classification based on multiple classassociation rules
		
			WLi
		
		
			JHan
		
		
			JPei
		
	
		Proceedings of IEEE International Conference on Data Mining (ICDM2001)
				IEEE International Conference on Data Mining (ICDM2001)
		
			2001
			
		
* 
	
		A greedy classification algorithm based on association rule
		
			AThabtah
		
		
			PICowling
		
	
		Appl. Soft Comput
		
			7
			3
			
			June 2007
		
	
* 
	
		Multi-label Lazy Associative Classification
		
			AdrianoVeloso
		
		
			WagnerMeira
		
		
			MarcosGonçalves
		
		
			MohammedZaki
		
		
			2007. 2007
			
		
	Knowledge Discovery in Databases: PKDD


* 
	
		Association Rule Pruning based on Interestingness Measures with Clustering
		
			SKannan
		
		
			RBhaskaran
		
	
		IJCSI International Journal of Computer Science Issues
		
			6
			1
			
			2009
		
	
* 
	
		Association Rule Analysis of Spatial Data Mining Based on Matlab
		
			XinqiZheng
		
		
			LuZhao
		
		IEEEDOI10.1109/WKDD.2008.21
	
	
		Workshop on Knowledge Discovery and Data Mining
				
			2008
		
	
* 
	
		Transportation system management for Madurai city using GIS
		
			MosesSanthakumar
		
	
		Map India
				
			2003
		
	
* 
	
		She buys to conquer
		
			AditiPai
		
		
			DeepikaKhatri
		
		
			April 25, 2008
			India Today
		
	
* 
	
		Well separated clusters and optimal fuzzy partitions
		
			JCDunn
		
	
		J. Cybern
		
			4
			
			1974
		
	
* 
	
		Pruning and grouping discovered association rules
		
			HToivonen
		
		
			MKlemettinen
		
		
			PRonkainen
		
		
			KHatonen
		
		
			HMannila
		
	
		Proc. ECML-95 Workshop on Statistics, Machine Learning, and Knowledge Discovery in Database
				ECML-95 Workshop on Statistics, Machine Learning, and Knowledge Discovery in Database
		
			April 1995
			
		
* 
	
		A new Fuzzy Clustering algorithm on Association rules for knowledge management
		
			PiDechang
		
		
			QinXiaolin
		
	
		Information Technology Journal
		
			7
			1
			
			2008
		
	
* 
	
		Hierarchical Clustering for thematic Rules
		
			AlipioJorge
		
	
		Advances in Intelligent Data Analysis
		Lecture Notes in Computer Science
		
			JIda-99) (d
			JNHand
			MRKok
			Berthold
		
		
			Springer
			1999
			1642