Journal of Computers, Vol 7, No 5 (2012), 1236-1243, May 2012
doi:10.4304/jcp.7.5.1236-1243

Adaptive Capacity Sharing through Probabilistic Controlled Placement

Xianju Yang, Peixiang Yan, Jiang Jiang, Minxuan Zhang

Abstract


Because capacity demands vary among threads executing simultaneously on a chip multiprocessor (CMP), dynamically managing cache resources according to run-time demand is an effective way to improve L2 cache performance. Unlike existing dynamic cache management schemes, which are based on the LRU replacement policy, we propose an adaptive capacity sharing mechanism based on a global reuse replacement policy. The mechanism adopts decoupled tag and data arrays and partitions the data arrays into private and shared regions. Capacity sharing is accomplished by deciding, under the control of probabilities, whether to place incoming data into the private data region or the shared data region. Our mechanism includes: (1) a VMON monitor to predict run-time capacity demands; (2) a PCS algorithm to determine the probabilities; and (3) a probabilistic controlled placement scheme to enforce capacity sharing. We evaluated the mechanism with a full-system simulation of an 8-core CMP running parallel programs from the PARSEC benchmark suite. With the same total L2 cache capacity, our mechanism outperforms a conventional private cache managed by the LRU policy, a private cache without sharing managed by the reuse replacement policy, and an existing adaptive sharing scheme based on the LRU policy.
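The placement decision described above can be illustrated with a minimal sketch. It assumes each core already holds a single probability, here called p_shared, and does not attempt to reproduce the VMON monitor or the PCS algorithm, whose details the abstract does not give. The names choose_region, simulate_placements, and p_shared are hypothetical and chosen only for illustration.

```python
import random


def choose_region(p_shared, rng=random):
    """Pick the data region for one incoming block.

    p_shared is an assumed per-core probability of placing the block in the
    shared data region; otherwise the block goes to the private data region.
    """
    if not 0.0 <= p_shared <= 1.0:
        raise ValueError("p_shared must be in [0, 1]")
    # rng.random() returns a float in [0.0, 1.0), so p_shared == 0.0 never
    # selects the shared region and p_shared == 1.0 always does.
    return "shared" if rng.random() < p_shared else "private"


def simulate_placements(p_shared, n_fills, seed=0):
    """Count how many of n_fills incoming blocks land in each region."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    counts = {"private": 0, "shared": 0}
    for _ in range(n_fills):
        counts[choose_region(p_shared, rng)] += 1
    return counts
```

Calling simulate_placements(0.25, 10000) should send roughly a quarter of the fills to the shared region, which is how a probability can steer the fraction of a core's incoming data that consumes shared capacity.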


Keywords


Chip Multiprocessors; Capacity Sharing; Reuse Replacement





Journal of Computers (JCP, ISSN 1796-203X)

Copyright © 2006-2013 by ACADEMY PUBLISHER. All rights reserved.