Journal of Software, Vol 6, No 11 (2011), 2121-2128, Nov 2011
doi:10.4304/jsw.6.11.2121-2128

An Efficient Mining Algorithm by Bit Vector Table for Frequent Closed Itemsets

Keming Tang, Caiyan Dai, Ling Chen

Abstract


Mining frequent closed itemsets in data streams is an important task in stream data mining. In this paper, an efficient mining algorithm (denoted as EMAFCI) for frequent closed itemsets in data stream is proposed. The algorithm is based on the sliding window model, and uses a Bit Vector Table (denoted as BVTable) where the transactions and itemsets are recorded by the column and row vectors respectively. The algorithm first builds the BVTable for the first sliding window. Frequent closed itemsets can be detected by pair-test operations on the binary numbers in the table. After building the first BVTable, the algorithm updates the BVTable for each sliding window. The frequent closed itemsets in the sliding window can be identified from the BVTable. Algorithms are also proposed to modify BVTable when adding and deleting a transaction. The experimental results on synthetic and real data sets indicate that the proposed algorithm needs less CPU time and memory than other similar methods.


Keywords


data mining; frequent closed itemsets; bit vector table; data stream; sliding window

References


J.W.Han, J.Pei, Y.W.Yin, R.Y.Mao. Mining frequent patterns without candidate generation: frequent-pattern tree approach, Data Mining and Knowledge Discovery, No.8, pp.53-87, 2004.
http://dx.doi.org/10.1023/B:DAMI.0000005258.31418.83

K.T.Chuang, H.L.Chen, M.S.Chen. Feature-preserved sampling over streaming data. ACM Transactions on Knowledge discovery from data, Vol.2, No.4, Article 15, 2009.

Y.Y.Zhu, D.Shasha. StatStream: Statistical monitoring of thousands of data streams in real time. Proceedings of the 28th International Conference on VLDB, Hong Kong, China, pp.358-369, 2002.

H.F.Li, C.C.Ho, S.Y.Lee. Incremental updates of closed frequent itemsets over continuous data streams. Expert Systems with Applications, Vol.36, pp.2451-2458, 2009.
http://dx.doi.org/10.1016/j.eswa.2007.12.054

Y.Chi, H.Wang, P.S.Yu, R.R.Muntz. Moment: Maintaining closed frequent itemsets over a stream sliding window. Proceedings of the 2004 IEEE International Conference on Data Mining. Brighton, UK, pp.59-66,2004.

N.Jiang, L.Gruenwald. Research issues in data stream association rule mining. SIGMOD Record 35 (1), pp.14-19, 2006.
http://dx.doi.org/10.1145/1121995.1121998

Y.Chi, H.Wang, P.S.Yu, R.R.Muntz. Catch the moment: Maintaining closed frequent itemsets over a data stream sliding window. Knowledge and Information Systems, 10 (3), pp.265-294, 2006.
http://dx.doi.org/10.1007/s10115-006-0003-0

F.J.Ao, J.Du, Y.J.Yan, B.H.Liu, K.D.Huang. An efficient algorithm for mining closed frequent itemsets in data Streams. Proceedings of the IEEE 8th International Conference on Computer and Information Technology, pp.37-42, 2008.

J.Y.Wang, J.W.Han, Y.Lu, P.Tzvetkov. TFP: An efficient algorithm for mining Top-K frequent closed itemsets. IEEE Transaction on knowledge and Engineering, Vol.17, No.5, pp.652-664, 2005.

J.Dong, M.Han. BitTableFI: An efficient mining frequent itemsets algorithm. Knowledge-Based Systems, Vol.20, pp.329-335, 2007.
http://dx.doi.org/10.1016/j.knosys.2006.08.005

Dataset available at http://fimi.cs.helsinki.fi/.

H.F.Li, H.Chen. Improve frequent closed itemsets mining over data stream with BitMap. Ninth ACIS international Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, pp.399-404, 2008.

L.Chen, L.J.Zou, L.Tu. Stream data classification using improved fisher discriminate analysis. Journal of Computers. Vol.4, No.3, pp.208-214, 2009.
http://dx.doi.org/10.4304/jcp.4.3.208-214


Full Text: PDF


Journal of Software (JSW, ISSN 1796-217X)

Copyright @ 2006-2012 by ACADEMY PUBLISHER – All rights reserved.