# I. Introduction

Data mining is a technique that helps extract important data from a large database. It is the process of sorting through large amounts of data and picking out relevant information through the use of sophisticated algorithms. As more data is gathered, with the amount of data doubling every three years, data mining is becoming an increasingly important tool for transforming this data into information. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. The capabilities made possible by data mining include automated prediction of trends and behaviors and automated discovery of previously unknown patterns.

# II. Analysis of Apriori Algorithm

Apriori was proposed by Agrawal and Srikant in 1994. The algorithm finds the frequent itemsets L in the database D. It makes use of the downward closure property: every subset of a frequent itemset must itself be frequent. The algorithm performs a bottom-up, level-wise search, and prunes many of the sets which are unlikely to be frequent, thus saving extra effort.

Apriori is an association rule mining algorithm and an important data mining model studied extensively by the database and data mining community. It assumes all data are categorical. It was initially used for market basket analysis, to find how items purchased by customers are related. The problem of finding association rules can be stated as follows: given a database of sales transactions, discover the important associations among different items, such that the presence of some items in a transaction implies the presence of other items in the same transaction. An example of an association rule is:

Contains(T, "baby food") ⇒ Contains(T, "diapers") [Support = 4%, Confidence = 40%]

The interpretation of such a rule is as follows:

- 40% of transactions that contain baby food also contain diapers;
- 4% of all transactions contain both of these items.

The calculations of support (S) and confidence (C) are very simple:

- S(A) = (number of transactions containing item A) / (total number of transactions in the database)
- SUPP(A ⇒ B) = (number of transactions containing both A and B) / (total number of transactions in the database)
- CONF(A ⇒ B) = SUPP(A ∪ B) / SUPP(A)

The above association rule is called single-dimensional because it involves a single attribute or predicate (Contains). The main problem is to find all association rules that satisfy the minimum support and minimum confidence thresholds, which are provided by the user and/or domain experts. A rule is frequent if its support is greater than the minimum support threshold, and strong if its confidence is more than the minimum confidence threshold.

Discovering all association rules is a two-phase process. In the first phase, we find all frequent itemsets having minimum support; the search space for enumerating all frequent itemsets has magnitude 2^n, where n is the number of items. In the second phase, we generate strong rules: any frequent itemset that satisfies the confidence threshold is used to generate an association rule. The first phase is considered the most important one, because it is time consuming due to the huge search space (the power set of the set of all items), whereas the second phase can be accomplished in a straightforward manner.

# III. Algorithm for Apriori

The algorithm proceeds level by level over a transaction database T, with a support threshold of ε. Usual set-theoretic notation is employed, though note that T is a multiset. C_k denotes the candidate set for level k. The generate() step produces the candidate sets from the large itemsets of the preceding level, heeding the downward closure lemma, and count[c] accesses a field of the data structure that represents candidate set c, which is initially assumed to be zero. Many details are omitted here; usually the most important part of the implementation is the data structure used for storing the candidate sets and counting their frequencies.
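The following is a minimal, self-contained Java sketch of this level-wise loop. The class and helper names (AprioriSketch, generate, prune) are illustrative, and a plain hash map stands in for the candidate-counting data structure discussed above.

```java
import java.util.*;

public class AprioriSketch {

    /** Returns all itemsets whose support count in the database is at least minSupport. */
    public static Set<Set<Integer>> apriori(List<Set<Integer>> transactions, int minSupport) {
        Set<Set<Integer>> allFrequent = new HashSet<>();

        // Level 1: count each single item and keep the frequent ones (L1)
        Map<Set<Integer>, Integer> counts = new HashMap<>();
        for (Set<Integer> t : transactions)
            for (Integer item : t)
                counts.merge(Set.of(item), 1, Integer::sum);
        Set<Set<Integer>> current = prune(counts, minSupport);

        // Level-wise search: build C_k from L_{k-1}, count supports, prune to L_k
        int k = 2;
        while (!current.isEmpty()) {
            allFrequent.addAll(current);
            Set<Set<Integer>> candidates = generate(current, k);
            counts = new HashMap<>();
            for (Set<Integer> t : transactions)
                for (Set<Integer> c : candidates)
                    if (t.containsAll(c))
                        counts.merge(c, 1, Integer::sum);
            current = prune(counts, minSupport);
            k++;
        }
        return allFrequent;
    }

    /** Joins (k-1)-itemsets into k-item candidates whose subsets are all frequent. */
    private static Set<Set<Integer>> generate(Set<Set<Integer>> prev, int k) {
        Set<Set<Integer>> candidates = new HashSet<>();
        for (Set<Integer> a : prev)
            for (Set<Integer> b : prev) {
                Set<Integer> union = new HashSet<>(a);
                union.addAll(b);
                if (union.size() == k && allSubsetsFrequent(union, prev))
                    candidates.add(union);
            }
        return candidates;
    }

    /** Downward closure check: every (k-1)-subset of the candidate must be frequent. */
    private static boolean allSubsetsFrequent(Set<Integer> cand, Set<Set<Integer>> prev) {
        for (Integer item : cand) {
            Set<Integer> subset = new HashSet<>(cand);
            subset.remove(item);
            if (!prev.contains(subset)) return false;
        }
        return true;
    }

    /** Keeps only the itemsets whose count reaches the support threshold. */
    private static Set<Set<Integer>> prune(Map<Set<Integer>, Integer> counts, int minSupport) {
        Set<Set<Integer>> frequent = new HashSet<>();
        for (Map.Entry<Set<Integer>, Integer> e : counts.entrySet())
            if (e.getValue() >= minSupport) frequent.add(e.getKey());
        return frequent;
    }

    public static void main(String[] args) {
        // The example transaction database used in the next section, with ε = 3
        List<Set<Integer>> db = List.of(
            Set.of(1, 2, 3, 4), Set.of(1, 2), Set.of(2, 3, 4), Set.of(2, 3),
            Set.of(1, 2, 4), Set.of(3, 4), Set.of(2, 4));
        System.out.println(apriori(db, 3));
    }
}
```

Run on the transaction database of the next section with ε = 3, this prints, in no particular order, the eight frequent itemsets {1}, {2}, {3}, {4}, {1,2}, {2,3}, {2,4}, and {3,4}.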
# IV. Steps in finding the association rules using Apriori

A large supermarket tracks sales data by stock-keeping unit (SKU) for each item, and thus is able to know what items are typically purchased together. Apriori is a moderately efficient way to build a list of frequently purchased item pairs from this data. Let the database of transactions consist of the sets {1,2,3,4}, {1,2}, {2,3,4}, {2,3}, {1,2,4}, {3,4}, and {2,4}. Each number corresponds to a product such as "butter" or "bread". The first step of Apriori is to count up the frequencies, called the supports, of each member item separately, as shown in Table 1.

Table 1: Support of each single item

| Item | Support |
|------|---------|
| {1} | 3 |
| {2} | 6 |
| {3} | 4 |
| {4} | 5 |

We can define a minimum support level to qualify as "frequent", which depends on the context; for this case, let min support = 3. Therefore, all four items are frequent. The next step is to generate a list of all 2-pairs of the frequent items (Table 2). Had any of the above items not been frequent, it would not have been included as a possible member of a 2-item pair; in this way, Apriori prunes the tree of all possible sets.

Table 2: Support of each 2-item pair

| Pair | Support |
|------|---------|
| {1,2} | 3 |
| {1,3} | 1 |
| {1,4} | 2 |
| {2,3} | 3 |
| {2,4} | 4 |
| {3,4} | 3 |

In the next step, we again select only those itemsets which are frequent (now the pairs {1,2}, {2,3}, {2,4}, and {3,4}) and generate a list of all 3-triples of the frequent items, by joining frequent pairs with frequent single items. In this example there are no frequent 3-triples: the most common 3-triples are {1,2,4} and {2,3,4}, but their support is 2, which is smaller than our minimum support.

# V. Implementing Apriori algorithm in WEKA

WEKA is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. WEKA contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes. WEKA is open source software issued under the GNU General Public License.

# VI. Data set features

A closed questionnaire of 56 questions, labeled A, B, C, ..., BB, was prepared and circulated among 56 students. Most questions had four options to answer, coded as a1, a2, a3, a4 ("a" meaning answer). The questionnaires were circulated randomly to avoid mass copying of answers and collected after one hour. Out of the 56 returned questionnaires, only 43 were correct in all respects; the remaining 13 required follow-up with the corresponding students because a few questions had been left unanswered, and since those 13 students refused to re-answer them, their questionnaires were rejected. Microsoft Excel was used to tabulate the data from the questionnaires, producing 43 rows. A CSV (Comma Separated Values) sheet was made from this table and fed as input to the WEKA Apriori algorithm.

# VII. Rules generated

1. G=g1 34 ==> K=k1 34 conf:(1) [Those who join Facebook groups also have knowledge of security settings; accuracy 34%]
2. Ae=ae1 33 ==> Af=af1 33 conf:(1) [Those who use the internet for preparing projects also use the internet for preparing seminars; accuracy 33%]
3. D=d1 Ac=ac1 33 ==> Af=af1 33 conf:(1) [Those who have an active account on Facebook and download lecture notes also use the internet for preparing seminars; accuracy 33%]
4. G=g1 Af=af1 33 ==> K=k1 33 conf:(1) [Those who joined groups on Facebook and download seminars from the internet also have knowledge of the security settings of Facebook; accuracy 33%]
5. K=k1 Ae=ae1 32 ==> Af=af1 32 conf:(1) [Those who download lecture notes also use the internet for preparing projects and seminars; accuracy 32%]
6. D=d1 G=g1 31 ==> K=k1 31 conf:(1) [Those who have an active account on Facebook and join groups on Facebook also have knowledge of security settings; accuracy 31%]
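Rules in this format can be obtained by calling WEKA's Apriori associator directly from Java rather than through the GUI. The sketch below is illustrative only: the file name survey.csv and the threshold values (10 rules, minimum confidence 0.9) are assumptions, not the exact settings used for the run above.

```java
import java.io.File;

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class SurveyAssociationRules {
    public static void main(String[] args) throws Exception {
        // Load the tabulated questionnaire data; non-numeric CSV columns
        // are read as nominal attributes, which Apriori requires
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("survey.csv")); // file name is an assumption
        Instances data = loader.getDataSet();

        // Configure the associator; both thresholds are illustrative
        Apriori apriori = new Apriori();
        apriori.setNumRules(10);   // report at most the 10 best rules
        apriori.setMinMetric(0.9); // minimum confidence of 0.9

        // Mine and print the rules in WEKA's "lhs ==> rhs conf:(...)" format
        apriori.buildAssociations(data);
        System.out.println(apriori);
    }
}
```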
# VIII. Conclusions

In this paper, we have studied association rule mining over a survey dataset. Our study shows that mining multiple-level association rules from databases has wide applications, and that efficient algorithms can be developed for the discovery of interesting and strong rules in such databases. The larger the set of frequent itemsets, the larger the number of rules presented to the user, many of which are redundant.

# References

1. A. K. Pujari, "Data Mining Techniques", 14th impression, 2008.
2. R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules Between Sets of Items in Large Databases", Proc. SIGMOD Conference, 1993.
3. R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules in Large Databases", in J. B. Bocca, M. Jarke, and C. Zaniolo (eds.), Proc. 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, September 1994.
4. M. Klemettinen et al., "Finding Interesting Rules from Large Sets of Discovered Association Rules", Proceedings of the CIKM, 1994.