Extended Apriori for association rule mining: Diminution based utility weightage measuring approach

Table of contents

1. INTRODUCTION

ince accessibility of huge amounts of data and knowledge is on the rise, data mining has occupied a significant place in the field of information industry [1]. Data mining is a vital part of the process of Knowledge Discovery in Databases (KDD) [22] which is a non-trivial excavation of hidden, implied, never revealed data and comparatively has a large usage [6].

In broad, data mining methods are categorized into two ways:

2. Descriptive mining

It is an account of putting forward a group of data and its characteristics in a succinct and summarized way. One of the most significant of the descriptive kind mining is Association Rule Mining (ARM) which was introduced by Agarwal et.al. [2].

3. Predictive mining

This method makes to surmise the outline of the information to give some assumptions [13].

Progressively many network side scholars and researchers of computer science, particularly those who are dedicatedly working in the area of Knowledge Discovery in Data (KDD), mainly concentrate and accentuate on Association Rule Mining (ARM) [14]. Association rule mining [2,3] has been extensively utilized to detect and unravel data mining complicated issues which involve financial troubles, and business dealings [4].

The difficulties of using the mining association rules are divided into two steps.

1. First step is to evaluate the item sets that often occur in the databases. 2. The second one is to produce the association rules.

Just the once if the some item sets are found occurring often, then the production of the association rules is uncomplicated and can be accomplished in a specified time [5].

Conventional ARM algorithms judge all the data items in the similar passion by assuming that their weight-age as always 1 if they are identified or 0 if they are not identified which intangibly drives to miss some of the very functional outlines of the data [7]. In order to trounce these drawbacks of the traditional way of mining, utility mining method [9] [10] and weighted association rule mining [16] has come into existence.

Utility data mining is a latest field of development entranced in every type of utility factors in data mining procedures and accentuated to assimilate the services also called as utilities in the data mining methods [15]. Utility of any particular data or item is a reliant on the individuals and is measured in terms of aesthetic values or other expressions of individual inclination [7]. When an action in the database and its related minimum utility threshold value and a utility chart are monitored then the objective of the utility mining is to identify and determine each and every high utility item set [12]. In general pros and cons of the item set in the business is not possible to be derived by the utilization of the values that shore up the rules. So this rationally proves that utility mining can be more advantageous than the conventional association rule mining [12].

But while considering the weighted association rule mining (WARM) [16], there has been a modification of not counting the item sets that occurred in an action of the database which made a compulsion to acclimatize the conventional help to the weighted one [23]. This method also segments the consumers on the basis of their reliable nature and impending count of procurements [8]. For an illustration, one consumer may buy 15 items and some other may have 5 items at a time but the conventional association rule method considers these couple of actions in a similar way. Hence the procedure of considering the actions in a same conduct in the conventional association rule method mislays some of the vital information [8]. As a result, Weighted ARM deals mainly with the magnitude of individual substances in a database [17,18,19]. For a case in point, goods that has more advantages or which are under the process of endorsement are given prior able constraints in comparison to rest [20].

Incorporating the mutual characteristics (weightage and utility) for excavation of rules is treated as an addition to the weighted association rule mining which means that the data weights are most important in a particular set of actions besides it also deals with the number of possible appearances of the data in those actions. This has made a concern in categorizing the data appearances and their weights and also in detecting the prior able data which put in more to the benefits of the business [21]. By considering this Sandhu et al [28] proposed a model that identifying association rules based on UW-Score, which calculated based on characteristics weightage and utility. In their proposal they were not considering diminution occurred in contextual factors Here, we put forward an efficient method that makes the utilization of the traditional Apriori algorithm to engender a group of association rules from a database out of which a pooled Exact Utility Weighted Score (EUW-Score) is calculated. In due course, sub values of the priority given weighted, utility and diminution considered constraints are derived on the source of the EUW-score and the tentative outcome exhibit the effectiveness.

The later part of the paper is structured as given: The section 2 explains the methods involved. The proposed methodology based on usage of mining utility-oriented association rules is explained in section 3. Section 4 concludes the paper.

4. II.

5. METHODS INVOLVED

Apriori, a notable algorithm for ARM, is one of the frequently used processes of discovering an assortment of data properties which functionally is associated to bring about the data and are chiefly based on range of occurrences. But the extraction through the number of occurrences does not bring in the attention of the scholars, and to overcome this some more measures are included in the Apriori algorithm for an efficient mining of association rules. They are: a) Weightage Unlike the general transaction database which projects the total amount of characteristics by some number, the traditional algorithms like Apriori mine association rules utilizes a binary mapped database that depicts the occurrence of the data or an item in one course of an action, thus allowing to gather and verify some good number of information related to the characteristics of the data, that results in recurrent but few number of weight-age rules. Even in an ordinary user transaction, some times the data that has a good weight-age occurs rarely, but it must also be involved in the recurrent item set. This procedure is followed in our approach, for mining a subset that has high significance.

6. b) Utility

The individual utility (Gain) of the characteristics is the subsequent measure involved in the approach to give a good standard to the ARM.

7. c) Diminution

The individual Diminution that occurs when item failed to raise the utility (Gain), which balance the utility and provide actual gain of the characteristics is the subsequent measure involved to the approach to give a good standard to the ARM. Some of the service standards in the business would be neglected in a process of mining. As these rules, when mined without these service standards will lead to a plausible loss. Those standards are attained by this method through utility measure (U-gain). Weight-age and utility measures are individually incorporated in copious researches [21,24,25,26] so as to make their methods more efficient but those procedures need high capability. These procedures are effectively utilized in this methodology to extract the association rules from a database.

8. III.

9. PROPOSED METHODOLOGY

Assuming D as a database having n number of transactions T and m number of attributes I= [i1,i2 ,.....,im] with positive real number weights Wi. Ui specifies the profit associated with the i attribute of utility table U with m count of utility values.

The methodology based on weight-age and utility involves some key steps like:

Step1: Extraction of the association rules from D by utilizing Apriori.

Step 2: Generation of W-gain value.

Step 3: Generation of U-gain value.

Step 4: Generation of D-sum value.

Step 5: Generation of DUW-score through W-gain and Ugain.

Step 6: Deriving the vital association rules by taking UWscore into consideration.

10. a) Extraction of the Association Rules through Apriori

To begin with, association rules are excavated from a transaction database D with n transactions. We represent the database D as:

(1) Each transaction T in D encompasses with 'm' number of attributes I= [i1,i2 ,.....,im] related to it and each attribute i is symbolized by weights Wi.

To extract the rules, a typical Apriori algorithm is used in our methodology. A binary mapped database BT is applied for extracting the association rules in conventional Apriori process by which the initial database D is converted to binary mapped database BT such that it comprises of binary 0 and 1 denoting the non-existence and existence of attributes. By the succeeding equation the weights Wi are mapped onto the binary values.

(

Consequently, an input to the Apriori algorithm [2] is produced by the binary mapped database BT for extraction of association rules which are processed in two steps of Apriori as follows:

? Recurrent Item set Generation: Produces minsupport which signifies that each and every feasible set of attributes that comprise support value higher than a predefined threshold.

? Association Rule Generation: Produces minconfidence which signifies that association rules from the item sets that comprise confidence higher than a predefined threshold.

The composition of a typical association constraint is: A ? B, where A symbolizes the antecedent and B symbolizes the consequent and these are subset of the items in the binary mapped database, such that A ? I, B ? I and A ? B= ? and is construed as B co-existing if A exists. The support S and confidence C clasps the constraint A ? B in the transaction database D, if item sets A and B are contained in S% and C % of the transactions. Therefore:

Support (3) Confidence (A ? B) = P(B|A)= support (A ? B)/ support (A)(4)

The pseudo code for the Apriori algorithm is:

A k amount of association rules R=[R1,R2 ,.....,Rk] are produced with apriori algorithm and is sent as input to the successive part in the methodology for weight-age and utility calculation. Each attribute of k association rules of R the is determined with some measures, that is for an association rule R i of the form, [A, B] => C, where, A, B and C are considered as the attributes, the derivations U-gain, W-gain and UW-score are evaluated for each attribute A, B and C independently.

At first, a decremented arrangement is done for the produced k association rules considering their respected confidence level. The listing of rearranged association rules is specified by b) Generating the value of W-gain At the start the initial rule R1' is chosen from the rearranged list S and the independent attributes of R1' are derived followed by the computation of W-gain.

Definition 1: Item weight (Wi): Item weight value Wi, is a nonnegative integer which is termed as the total magnitude evaluation of the attribute present in the transaction database D.

Definition 2: Weighted Gain (W-gain): W-gain is termed as the summation of weights of each item W i of an attribute that is involved in each and every transaction of the database D as referred in the given equation: Correspondingly the initial rule R1' is preferred from the rearranged list S and the independent attributes of R1' are derived. By considering the U-factor and the utility value Ui ,the value of U-gain for each character attribute is determined.

T T D T n ? ? ? ? ? ? ? ? = ? ? ? ? ? ? ? ? 0 0 1 0 { if W T i k B r if W T i k = ? = ? ? (A ? B) = P (A ? B) 1 1 1 { arg 1 };( 2; 0;

Definition 3: Item Utility (Ui): In general every character has a precincts of the gain related to that particular character or attribute and is delineated as the Item utility Ui.

Definition 4: Utility table U: A quantity of 'm' utility values Ui are encompassed in the utility table U with the attributes related in the transaction database D. We represent the utility table as: (6) Definition 5: Utility factor (U-factor): The constant value of utility factor (U-factor) is derived by the addition of every utility items (Ui) of the utility table U .We define it as: (7) Consider, m is the amount of attributes involved in the transaction database.

Definition 6: Utility Gain (U-gain): The calculation of an attribute's authentic utility by considering its U-factor is referred as the Utility Gain and we define it as follows: (8) For every attribute in the association rule R1' the value of U-gain is calculated.

Definition 7: Diminution table: A quantity of 'm' diminution values DMi are encompassed in the diminution table DM with the attributes related in the transaction database D. We represent the diminution table as: Definition 8: Diminution factor (D-factor): The constant value of diminution factor (D-factor) is derived by the addition of every diminution items ( i DM ) of the diminution table DM. We define it as: (10) Consider, m is the no of attributes involved in the transaction database.

Definition 9: Diminution Sum (D-sum): The calculation of an attribute's authentic utility by considering its U-factor is referred as the Utility Gain and we define it as follows: (11) For every attribute in the association rule R1' the value of D-sum is calculated.

11. d) Generation of EUW-score through W-gain, U-gain and D-sum

From the values derived by calculating W-gain, U-gain and D-sum for the each attribute, they are merged together into one value named as UW-score for every independent association rule.

Definition 10: Exact Utility Weighted Score (EUW-score): EUW-score is derived by computing the proportion between the addition of products of W-gain, U-gain and D-sum for each attribute in the association rule to the total amount of attributes present in the rule. (12) Here, | R | denotes the amount of attributes present in the association rule.

The equations ( 5),( 8) and ( 11) and ( 12) aimed to determine the W-gain, U-gain, D-sum and EUWscore These equations are looped for the remaining association rules R2 ' to R k' involved in the rearranged list S. And for total 'k' number of association rules in the rearranged list S will be calculated with a EUW-Score related to it and the association rules in the rearranged list S are consequently rearranged by taking EUW-score into consideration to get

Figure 1. ( 5 )
5Extended Apriori for association rule mining: Diminution based utility weightage measuring approach © 2011 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XI Issue XXII Version I
Figure 2.
R1', R2 ',.., Rk ' }, S R ? , where conf (R1 ') ? conf (R2 ') ? ?conf (R3 ')?.. ? ?conf (Rk ').
Figure 3.
Here we term, i w as the weight of item in an attribute and | | T as the amount of transactions in the database D.c) Generating the value of U-gain
1
2
3
4
5

Appendix A

Appendix A.1

consequential values of the weighted and utility related association rules is given by , where and The significant improvement in minimizing number of rules can be observable in following graphs.

Appendix A.2 CONCLUSION

By considering the weight factor, utility and diminution, the methodology used by us has given a chance to provide a proficient high utility association rules. At the outset, the planned methodology has enabled to make utilization of the conventional Apriori algorithm to create a group of association rules from a database. Depending on weightage (W-gain), utility (Ugain) and diminution (D-sum) complications a joint Exact Utility Weighted Score (EUW-Score) is generated for each association rule extracted. Considering the EUW-Score generated, eventually a subset of notable association rules are derived at.

Appendix B

  1. Recognizing & rioritizing Of Critical Success Factors (CSFs) On Data Mining Algorithm's Implementation In Banking Industry: Evidence From Banking Business System. Ali Rajabzadeh Ghatari , Nasibeh Mohamadi , Aida Honarmand , Parviz Ahmadi , Nasibeh Mohamadi . Proceedings of EABR & TLC, (EABR & TLCPrague, Czech Republic
    ) 2009.
  2. Weighted Support Association Rule Mining using Closed Itemset Lattices in Parallel. A M J Md , P Zubair Rahman , Balasubram . International Journal of Computer Science and Network security March 2009. 9 (3) p. .
  3. Visually Aided Exploration of Interesting Association Rules. Bing Liu , Wynne Hsu , Ke Wang , Shu Chen . Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining, (the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining) 1999. p. .
  4. Mining Association Rules with Weighted Items. C H Cai , A W C Fu , C Cheng , W Kwong . Proceedings of the International Symposium on Database Engineering and Applications, (the International Symposium on Database Engineering and ApplicationsCardiff, Wales, UK
    ) July 1998. p. .
  5. Mining temporal rare utility Itemsets in large databases using relative utility thresholds. Chun-Jung Chu , Vincent S Tseng , Tyne Liang . International Journal of Innovative Computing, Information and Control November 2008. 4 (11) .
  6. An efficient algorithm for mining high utility itemsets with negative item values in large databases. Chun-Jung Chua , Vincent S Tsengb , Tyne Liang . Applied Mathematics and Computation 2009. 215 (2) p. .
  7. Weighted Association Rule Mining using Weighted Support and Significance Framework. Feng Tao , Fionn Murtagh , Mohsen Farid . Proceedings of the International Conference on Knowledge Discovery and Data Mining, (the International Conference on Knowledge Discovery and Data MiningWashington
    ) 2003. p. .
  8. Mining Long High Utility Itemsets in Transaction Databases. Guangzhu Yu , Shihuang Shao , Xianhui Zeng . WSEAS Transactions On Information Science & Applications February 2008. 5 (2) p. .
  9. A Foundational Approach to Mining Itemset Utilities from Databases. Hong Yao , Howard J Hamilton , Cory J Butz . Proceedings of the Third SIAM International Conference on Data Mining, (the Third SIAM International Conference on Data MiningOrlando, Florida
    ) 2004. p. .
  10. Mining itemset utilities from transaction databases. Hong Yaoa , Howard J Hamilton . Data & Knowledge Engineering December 2006. 59 (3) p. .
  11. Attribute-Oriented Induction in Data Mining. J Han , Y Fu . Advances in Knowledge Discovery and Data Mining, 1996. AAAI Press/The MIT Press. p. .
  12. High-utility pattern mining: A method for discovery of high-utility item sets. Jianying Hu , Aleksandra Mojsilovic . Pattern Recognition November 2007. 40 (11) p. .
  13. Pushing Frequency Constraint to Utility Mining Model. Jing Wang , Ying Liu , Lin Zhou , Yong Shi , Xingquan Zhu . Proceedings of the 7th international conference on Computational Science, (the 7th international conference on Computational ScienceBeijing, China
    ) 2007. p. .
  14. Mining Weighted Association Rules without Preassigned Weights. Ke Sun , Fengshan Bai . IEEE Transactions on Knowledge and Data Engineering April 2008. 20 (4) .
  15. Data mining: an overview from a database perspective. M S Chen , J Han , P S Yu . IEEE Transactions on Knowledge and Data Engineering 1996. 8 (6) p. .
  16. Fuzzy Weighted Association Rule Mining with Weighted Support and Confidence Framework. M Sulaiman Khan , Maybin Muyeba , Frans Coenen . International Workshops on New Frontiers in Applied Data Mining, (Osaka, Japan
    ) May 20-23, 2009. p. .
  17. A Weighted Utility Framework for Mining Association Rules. M Sulaiman Khan , Maybin Muyeba , Frans Coenen . proceedings of European Symposium on Computer Modeling and Simulation, (European Symposium on Computer Modeling and SimulationLiverpool
    ) September 2008. p. .
  18. An Improvement in Apriori Algorithm Using Profit and Quantity. P S Sandhu , D S Dhaliwal , S N Panda , A Bisht . Computer and Network Technology (ICCNT), 2010 Second International Conference on, April 2010. p. .
  19. Mining association rules between sets of items in large databases. R Agrawal , T Imielinski , A Swami . proceedings of the international Conference on Management of Data, ACM SIGMOD, (the international Conference on Management of Data, ACM SIGMODWashington, DC
    ) May 1993. p. .
  20. Fast algorithms for mining association rules. R Agrawal , R Srikant . Proceedings of 20th International Conference on Very Large Data Bases, (20th International Conference on Very Large Data BasesSantiago, Chile
    ) September 1994. p. .
  21. Mining Weighted Association Rules. S Lu , H Hu , F Li . Intelligent Data Analysis, August 2001. 5 p. .
  22. A Fast Algorithm for Mining High Utility Itemsets. S Shankar , T Babu Nishanth , Jayanthi S Purusothaman . proceedings of IEEE International Advance Computing Conference, (IEEE International Advance Computing ConferencePatiala,India
    ) 2009.
  23. Efficient mining of weighted interesting patterns with a strong weight and/or support affinity. Unil Yuna . Information Sciences September 2007. 177 (17) p. .
  24. Knowledge Discovery in Databases: An Overview, W Frawley , G Piatetsky-Shapiro , C ; Matheus , Ai Magazine . 1992. 1992. p. .
  25. Efficient Mining of Weighted Association Rules (WAR). W Wang , Yu P Yang . Proceedings of the KDD, (the KDDBoston, MA
    ) August 2000. p. .
  26. Isolated items discarding strategy for discovering high utility itemsets. Yu-Chiang Li , Jieh-Shan Yeh , Chin-Chen Chang . Data & Knowledge Engineering January 2008. 64 (1) p. .
  27. Decomposition Methodology for Knowledge Discovery and Data Mining: Theory and Applications, Z Oded , Lior Maimon , Rokach . May 2005. World Scientific Publishing Company.
Notes
1
December
2
© 2011 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XI Issue XXII Version I 26 2011 December
3
© 2011 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XI Issue XXII Version I 28 2011 December
4
© 2011 Global Journals Inc. (US) Global Journal of Computer Science and Technology Volume XI Issue XXII Version I 29 2011 December
5
© 2011 Global Journals Inc. (US)
Date: 2011-12-02