Journal of Computers, Vol 6, No 3 (2011), 418-425, Mar 2011
doi:10.4304/jcp.6.3.418-425

KNN-DTW Based Missing Value Imputation for Microarray Time Series Data

Hui-Huang Hsu, Andy C. Yang, Ming-Da Lu

Abstract


Microarray technology provides an opportunity for scientists to analyze thousands of gene expression profiles simultaneously. However, microarray gene expression data often contain multiple missing expression values due to many reasons. Effective methods for missing value imputation in gene expression data are needed since many algorithms for gene analysis require a complete matrix of gene array values. Several algorithms are proposed to handle this problem, but they have various limitations. In this paper, we develop a novel method to impute missing values in microarray time-series data combining k-nearest neighbor (KNN) and dynamic time warping (DTW). We also analyze and implement several variants of DTW to further improve the efficiency and accuracy of our method. Experimental results show that our method is more accurate compared with existing missing value imputation methods on real microarray time series datasets.


Keywords


microarray time series data; missing value imputation; dynamic time warping; k-nearest neighbor

References


[1]J. DeRisi, R. Iyer, and Brown P, “Exploring the metabolic and genetic control of gene expression on a genomic scale,” Science, vol.278, pp.680-686, 1997.
doi:10.1126/science.278.5338.680
PMid:9381177

[2]P.T. Spellman, G. Sherlock, M.Q. Zhang, V.R. Iyer, K. Anders, M.B. Eisen, P.O. Brown, D. Botstein, and B. Futcher, “Comprehensive identification of cell cycle-regulated genes of the yeast saccharomyces cerevisiae by microarray hybridization,” Mol. Biol. Cell, vol.9, pp.3273-3297, 1998.
PMid:9843569    PMCid:25624

[3]D.S.V. Wong, F.K. Wong, and G.R. Wood, “A multi-stage approach to clustering and imputation of gene expression profiles,” Bioinformatics, vol.23, pp.998-1005, 2007.
doi:10.1093/bioinformatics/btm053
PMid:17308340

[4]E. Acuna and C. Rodriguez, “The treatment of missing values and its effect in the classifier accuracy,” Classification, Clustering and Data Mining Applications, pp.639-648, 2004.

[5]O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R.B. Altman, “Missing value estimation methods for DNA microarrays,” Bioinformatics, vol.17, pp.520-525, 2001.
doi:10.1093/bioinformatics/17.6.520
PMid:11395428

[6]H. Kim, G.H. Golub, and H. Park, “Imputation of missing values in DNA microarray gene expression data,” in: Proc. of the IEEE Computational Syst. Bioinformatics. Conf., pp.572-573, 2004.

[7]S. Oba, M. Sato, I. Takemasa, M. Monden, K. Matsubara, and S. Ishii, “A Bayesian missing value estimation method for gene expression profile data,” Bioinformatics, vol.19, pp.2088-2096, 2003.
doi:10.1093/bioinformatics/btg287
PMid:14594714

[8]H. Kim, G. H. Golub, and H. Park, “Missing value estimation for DNA microarray gene expression data: local least squares imputation,” Bioinformatics, vol.21, pp.187-198, 2005.
doi:10.1093/bioinformatics/bth499
PMid:15333461

[9]X. Wang, A. Li, Z. Jiang, and H. Feng, “Missing value estimation for DNA microarray gene expression data by support vector regression imputation and orthogonal coding scheme,” BMC Bioinformatics, vol.7, pp.1-10, 2006.
doi:10.1186/1471-2105-7-1
PMid:16393334    PMCid:1360682

[10]A.C. Yang, H.H. Hsu, and M.D. Lu, “Outlier filtering for identification of gene regulations in microarray time-series data,” in: Proc. of the 3rd Intl. Conf. on Complex, Intelligent and Software Intensive Syst., pp.854-859, 2009.
doi:10.1109/CISIS.2009.70

[11]V.S. Tseng, L.C. Chen, and J.J. Chen, “Gene relation discovery by mining similar subsequences in time-series microarray data,” in: Proc. of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biol., pp.106-112, 2007.

[12]C. Furlanello, S. Merler, and G. Jurman, “Combining feature selection and DTW for time-varying functional genomics,” IEEE Trans. on Sig. Processing, vol.54, pp.2436-2443, 2006.

[13]H.M. Yu, W.H. Tsai, and H.M. Wang, “Query-by-singing system for retrieving karaoke music,” IEEE Trans. on Multimedia, vol.10, pp.1626-1637, 2008.
doi:10.1109/TMM.2008.2007345

[14]C. Myers, L. Rabiner, and A. Roseneberg, “Performance tradeoffs in dynamic time warping algorithms for isolated word recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol.ASSP-28, pp.623-635, 1980.
doi:10.1109/TASSP.1980.1163491

[15]L. Rabiner, A. Rosenberg, and S. Levinson, “Considerations in dynamic time warping algorithms for discrete word recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol.ASSP-26, pp.575-582, 1978.
doi:10.1109/TASSP.1978.1163164

[16]S. Salvador and P. Chan, “Toward accurate dynamic time warping in linear time and space,” Intelligent Data Analysis, vol.11, pp.561-580, 2007.

[17]H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol.ASSP-26, pp.43-49, 1978.
doi:10.1109/TASSP.1978.1163055

[18]D. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time Series,” in: Proc. of the Workshop on Knowledge Discovery in Databases, pp.359-370, 1994.

[19]J.B. Kruskall and M. Liberman, “The symmetric time warping algorithm: from continuous to discrete,” in: Time Warps, String Edits, and Macromolecules: The theory and Practice of String Comparison, 1983.

[20]F. Itakura, “Minimum prediction residual principle applied to speech recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol.ASSP-23, pp.52-72, 1975.

[21]E. Keogh and M. Pazzani, “Derivative dynamic time warping,” in: Proc. of the 1st SIAM Intl. Conf. on Data Mining, pp.1-11, 2001.

[22]R. Cho, M. Campbell, E. Winzeler, L. Steinmetz, A. Conway, L. Wodicka, T. Wolfsberg, A. Gabrielian, D. Landsman, and D. Lockhart, “A genome-wide transcriptional analysis of the mitotic cell cycle,” Mol. Cell, vol.2, pp.65-73, 1998.
doi:10.1016/S1097-2765(00)80114-8


Full Text: PDF


Journal of Computers (JCP, ISSN 1796-203X)

Copyright @ 2006-2013 by ACADEMY PUBLISHER – All rights reserved.