Journal of Computers, Vol 7, No 7 (2012), 1564-1573, Jul 2012
doi:10.4304/jcp.7.7.1564-1573

A Novel Strategy for Mining Frequent Closed Itemsets in Data Streams

Keming Tang, Caiyan Dai, Ling Chen

Abstract


Mining frequent itemsets from data stream is an important task in stream data mining. This paper presents an algorithm Stream_FCI for mining the frequent closed itemsets from data streams in the model of sliding window. The algorithm detects the frequent closed itemsets in each sliding window using a DFP-tree with a head table. In processing each new transaction, the algorithm changes the head table and modifies the DFP-tree according to the changed items in the head table. The algorithm also adopts a table to store the frequent closed itemsets so as to avoid the time-consuming operations of searching in the whole DFP-tree for adding or deleting transactions. Our experimental results show that our algorithm is more efficient and has lower time and memory complexity than the similar algorithms Moment and FPCFI-DS.


Keywords


Stream data; mining closed frequent data itemsets; sliding window

References


 

[1] N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal. Discovering Frequent Closed Itemsets for Association Rules [C], Springer, Volume 1540, pp.398-416, 1999.

[2] R. Agrawal, R. Srikant. Fast algorithms for mining association rules in large databases [A]. Proceedings of the 20th International Conference on Very Large Data Bases [C]. San Francis2 co: Morgan Kaufmann, pp.487-499, 1994.

[3] H. Jiawei, P. Jian, and Y.Yiwen. Mining frequent patterns without candidate generation [C], Proc. ACM SIGMOD 2000, pp.1(R) C12.

[4] Y. Chi, H. Wang, P. Yu, R. Muntz. MOMENT: Maintaining closed frequent Itemsets over a stream sliding window [C]. Proceedings of the 2004 IEEE International Conference on Data Mining. Brighton, UK: IEEE Computer Society Press, pp.59-66, 2004.

[5] Z. Yunyue, D. Shasha. StatStream: statistical monitoring of thousands of data streams in real time. Proceedings of the 20th International Conference on Very Large Data Bases. Hong Kong, China: Morgan Kaufmann, pp.358-369, 2002.

[6] G. Manku, R. Motwani. Approximate frequency counts over data stream. Proceedings of the 28th International Conference on Very Large Data Bases. Hong Kong, China: Morgan Kaufmann, pp.346-357, 2002.
http://dx.doi.org/10.1016/B978-155860869-6/50038-X

[7] Joong Hyuk Chang, Won Suk Lee. Finding recent frequent Itemsets adaptively over online data streams [C]. Proceedings of the 9th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, USA: ACM Press, pp.487-492, 2003.
http://dx.doi.org/10.1145/956750.956807

[8] C. Gannell, J. Han, E. Robertson, C. Liu. Mining frequent Itemsets over arbitrary time intervals in data streams [C]. Bloomington: Indiana University, 2003.

[9] T. Wei-Guang, C. Ming-Syan, Y. Philip S. A regression based temporal pattern mining scheme for data streams [A]. Proceedings of the 29th International Conference on Very Large Data Bases[C]. Berlin, Germany: Morgan Kaufmann, pp.607-617, 2003.

[10] Fujiang Ao, Jing Du, Yuejin Yan, Baohong Liu, Kedi Huang An Efficient Algorithm for Mining Closed Frequent Itemsets in Data Streams, IEEE 8th International Conference on Computer and Information Technology Workshops, pp.37-42, 2008.

[11] Dataset available at http://fimi.cs.helsinki.fi/.

[12] H. Li, C. Ho, ET al. A New Algorithm for Maintaining Closed Frequent Itemsets in Data Streams by Incremental Updates. In Proc. Of IWMESD’06, Hong Kong, pp.672-676, 2006.


Full Text: PDF


Journal of Computers (JCP, ISSN 1796-203X)

Copyright @ 2006-2013 by ACADEMY PUBLISHER – All rights reserved.