Journal of Software, Vol 6, No 2 (2011), 306-313, Feb 2011
doi:10.4304/jsw.6.2.306-313

A DIC-based Distributed Algorithm for Frequent Itemset Generation

Preeti Paranjape-Voditel, Umesh Deshpande

Abstract


 

A distributed algorithm based on Dynamic Itemset Counting (DIC) for generation of frequent itemsets is presented by us. DIC represents a paradigm shift from Apriori-based algorithms in the number of passes of the database hence reducing the total time taken to obtain the frequent itemsets. We exploit the advantage of Dynamic Itemset Counting in our algorithm- that of starting the counting of an itemset as early as possible at the different site as soon as they become frequent at atleast one site. Hence, our algorithm shows remarkable improvement in the amount of time taken because of reduction in the number of passes of the database and comparatively lesser number of candidates generated. Distributed frequent itemset counting and association rule generation have basically used algorithms based on Apriori or Sampling. This is the first algorithm which is based on DIC.

 

 



Keywords


Distributed Association Rule Mining, Dynamic Itemset Counting (DIC), Optimistic Messaging DIC

References


[1] S.Brin and R.Motwani and J.Ullman and Shalom Tsur, Dynamic Itemset Counting and Implication Rules for Market Basket Data,SIGMOD Record, volume 6,number 2,pages 255-264,June 1997.
doi:10.1145/253262.253325

[2] R.Agrawal and T.Imilienski and A.Swami,Mining Association Rules between Sets of items in large Databases,Proc. of the ACM SIGMOD int’l Conf. on Management of Data,May,1993,pages 207-216.

[3] R.Agrawal and R.Srikant,Fast Algorithms for Mining Association Rules, Proceedings of the 20th VLDB Conference, Santiago, Chile,1993,

[4] R.Agrawal and J.Schafer,Parallel Mining of Association Rules,IEEE Transactions on Knowledge and Data Engineering”, volume 8,number 6, pages 962-969,1996.
doi:10.1109/69.553164

[5] M.Ashrafi and D.Taniar and K.Smith,ODAM:An Optimised Distributed Association Rule Mining Algorithm,IEEE Distributed Systems Online,5,3, March,2004,1541-4922.

[6] Cheung.D and Han.J and Ng.V and Fu.A and Fu.Y,A Fast Distributed Algorithm for Mining Association Rules, Proc of Tnt’l Conf on Parallel and Distributed Information Systems, Miami Beach Florida, 31-44.

[7] Assaf Schuster and Ran Wolff,Communication-Efficient Distributed Mining of Association Rules,ACM SIGMOD, Boston,4,May,2001.

[8] Assaf Schuster and Ran Wolff and Dan Trock, A high Performance Distributed Algorithm for Mining Association Rules

[9] Toivonen.H, Sampling Large Databases for association Rules, VLDB Journal,pages = 134-145.

[10] Lin.D.I and Kedem.Z.M,Pincer Search: A new algorithm for discovering the maximum frequent set,Extending Database Technology,October-December, 1999,14-25

[11] Mohammed Zaki,Parallel and Distributed Association Mining: A Survey, IEEE Concurrency,October- December,1999,14-25.
doi:10.1109/4434.806975

[12] UCI Machine Learning Repository, http://archive.ics.uci.edu/beta/datasets/Mushroom.

[13] Srikant.R,Synthetic Data Generation Code for association and sequential Patterns,IBM Quest website at http://www.almaden.ibm.com/cs/quest.

[14] Margaret Dunham,Data Mining, Introductory and Advanced Topics, Pearson Education,2006.

[15] J.Han and M.Kamber,Data Mining, Concepts and Techniques, Morgan Kaufmann Elsevier Science India.


Full Text: PDF


Journal of Software (JSW, ISSN 1796-217X)

Copyright @ 2006-2012 by ACADEMY PUBLISHER – All rights reserved.