A Translation Framework for Executing the Sequential Binary Code on CPU/GPU Based Architectures
Abstract
Keywords
References
[1] K. Fatahalian, J. Sugerman, and P. Hanrahan, “Understanding the efficiency of GPU algorithms for matrix-matrix multiplication”, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, 2004, pp.133-137.
[2] N. K. Govindaraju, S. Larsen, J. Gray, and D. Manocha, “A memory model for scientific algorithms on graphics processors”, Proceedings of the 2006 ACM/IEEE conference on Supercomputing, 2006.
http://dx.doi.org/10.1145/1188455.1188549
[3] General-Purpose Computation Using Graphics Hardware. http://www.gpgpu.org/.
[4] Cristina Cifuentes and K. John Gough, “Decompilation of Binary Programs”, Software: Practice and Experience, vol. 25(7), 1995, pp.811-829.
http://dx.doi.org/10.1002/spe.4380250706
[5] Tipp Moseley, Daniel A. Connors, Dirk Grunwald, Ramesh Peri, “Identifying potential parallelism via loop-centric profiling”, Proceedings of the 2007 International Conference on Computing Frontiers, 2007, pp.143-152.
http://dx.doi.org/10.1145/1242531.1242554
[6] C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, Vijay Janapa Reddi and K. Hazelwood, “Pin: building customized program analysis tools with dynamic instrumentation”, Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2005, pp.190-200.
http://dx.doi.org/10.1145/1065010.1065034
[7] Nicholas Nethercote and Julian Seward, “Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation”, Proceedings of ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007, pp.89-100.
[8] Derek Bruening, Timothy Garnett, Saman Amarasinghe, “An Infrastructure for Adaptive Dynamic Optimization”, International Symposium on Code Generation and Optimization, 2003, pp.265-275.
[9] C. Ancourt and F. Irigoin, “Scanning polyhedra with DO loops”, Symposium on Principles and Practice of Parallel Programming, 1991, pp.39-50.
[10] U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, “A practical automatic polyhedral parallelizer and locality optimizer”, Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, 2008, pp.101-113.
[11] Nathan Clark, “Why Should I Rewrite My Software When Dynamic Compilation Can Be Good Enough”, Workshop on Software Tools for Multi-Core Systems, 2008.
[12] Nathan Clark, Jason Bolme, Michael Chu, Scott Mahlke, Stuart Biles, and Krisztian Flautner, “An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors”, International Symposium on Computer Architecture, 2005, pp.272-283.
[13] “PTX: Parallel Thread Execution ISA”, Version 2.1, 2010.
[14] “NVIDIA CUDA Programming Guide”, Version 3.1, 2010.
[15] Muthu Manikandan Baskaran, J. Ramanujam, Sriram Krishnamoorthy, “Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories”, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, 2008, pp.1-10.
[16] Victor Podlozhnyuk, “FFT-based 2D convolution”, NVIDIA CUDA Sample Documentation, 2007.
[17] “Parboil Benchmark Suite”, http://impact.crhc.illinois.edu/parboil.php.
[18] Muthu Manikandan Baskaran, J. Ramanujam, “A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs”, Proceedings of the 22nd annual international conference on Supercomputing, 2008, pp.225-234.
[19] Seyong Lee, Seung-Jai Min, Rudolf Eigenmann, “OpenMP to GPGPU: a compiler framework for automatic translation and optimization”, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, 2009, pp.101-110.
[20] Par4All, http://www.par4all.org/
[21] Uday Bondhugula, J. Ramanujam, P. Sadayappan, “PLuTo: A polyhedral automatic parallelizer and locality optimizer for multicores”, http://pluto-compiler.sourceforge.net.
[22] Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro and Wen-mei Hwu, “An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems”, Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, 2010, pp.347-358.
[23] Bratin Saha, Xiaocheng Zhou, Hu Chen, Ying Gao, Shoumeng Yan, Mohan Rajagopalan, Jesse Fang, Peinan Zhang, Ronny Ronen, Avi Mendelson, “Programming Model for a Heterogeneous x86 Platform”, Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation, 2009, pp.431-440.
[24] Nathan Clark, Amir Hormati, and Scott Mahlke, “VEAL: Virtualized Execution Accelerator for Loops”, 35th International Symposium on Computer Architecture (ISCA), 2008, pp.389-400.
[25] Muthu Manikandan Baskaran, J. Ramanujam and P. Sadayappan, “Automatic C-to-CUDA Code Generation for Affine Programs”, 19th International Conference on Compiler Construction, 2010, pp.244-263.
[26] Hyunchul Park, Yongjun Park, and Scott Mahlke, “Polymorphic Pipeline Array: A Flexible Multicore Accelerator with Virtualized Execution for Mobile Multimedia Applications”, Proceedings of the 42nd International Symposium on Microarchitecture, 2009, pp.370-380.
[27] Yi Yang, Ping Xiang, “A GPGPU Compiler for Memory Optimization and Parallelism Management”, Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation, 2010, pp.86-97.
[28] Yindong Yang, Haibing Guan, Erzhou Zhu, Hongbo Yang, Bo Liu, “CrossBit: A Multi-Sources and Multi-Targets DBT”, The First International Conference on Cloud Computing, GRIDs, and Virtualization, 2010, pp.41-47.
[29] Guoxing Dong, Kai Chen, Erzhou Zhu, Yichao Zhang, Zhengwei Qi and Haibing Guan, “A Translation Framework for Virtual Execution Environment on CPU/GPU Architecture”, the Third International Symposium on Parallel Architectures, Algorithms and Programming, 2010, pp.130-137.