Journal of Zhejiang University (Engineering Science)
Computer Technology, Telecommunication Technology
Cache coherence protocol and implementation for multiprocessors with no-write-allocate caches
XIU Si-wen (修思文)1, HUANG Kai (黄凯)1, YU Min (余慜)1, XIE Tian-yi (谢天艺)1, GE Hai-tong (葛海通)2, YAN Xiao-lang (严晓浪)1
1. Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China; 2. Hangzhou C-SKY Microsystems Co., Ltd., Hangzhou 310027, China
Abstract:

To address the shortcomings of existing cache coherence protocols when they are applied to multicore processors with write-back, no-write-allocate caches, a novel write-intervention-based coherence protocol is proposed and implemented in hardware. Under this protocol, when a processor incurs a write miss, the data can be written directly into a valid copy of that cache line held by another processor in the system. The protocol also supports delayed write-back of dirty data and cache-to-cache data copy. In addition, the requested data can be supplied as long as at least one valid copy of the requested cache line exists in the system, which avoids unnecessary accesses to the shared memory. Experimental results show that, compared with the MOESI protocol, the proposed write-intervention protocol significantly reduces accesses to the shared memory, improves the performance of the whole system, and substantially lowers dynamic power consumption.
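
The behaviour summarised in the abstract can be illustrated with a small toy model. The sketch below is not the protocol or the hardware described in the paper: the class names (`Line`, `ToySystem`), the one-value-per-line simplification, the invalidate-on-write-hit rule, and the access counters are all assumptions made for illustration. It only shows the three ideas the abstract names: a write miss that is absorbed by a peer cache instead of going to shared memory, cache-to-cache copy on a read miss, and write-back of dirty data deferred until eviction.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One cache line, simplified to a single value plus a dirty flag."""
    data: int
    dirty: bool = False


class ToySystem:
    """Hypothetical model: private no-write-allocate caches over one shared memory."""

    def __init__(self, n_caches: int) -> None:
        self.caches = [dict() for _ in range(n_caches)]  # addr -> Line
        self.memory = {}                                 # addr -> value
        self.mem_reads = 0
        self.mem_writes = 0

    def _peers_holding(self, cpu: int, addr: int) -> list:
        # Every other cache that currently holds a valid copy of addr.
        return [c for i, c in enumerate(self.caches) if i != cpu and addr in c]

    def read(self, cpu: int, addr: int) -> int:
        cache = self.caches[cpu]
        if addr in cache:                      # read hit
            return cache[addr].data
        peers = self._peers_holding(cpu, addr)
        if peers:                              # cache-to-cache copy
            value = peers[0][addr].data
        else:                                  # only now touch shared memory
            self.mem_reads += 1
            value = self.memory.get(addr, 0)
        cache[addr] = Line(value)              # reads allocate a clean copy
        return value

    def write(self, cpu: int, addr: int, value: int) -> None:
        cache = self.caches[cpu]
        peers = self._peers_holding(cpu, addr)
        if addr in cache:                      # write hit: update locally
            cache[addr] = Line(value, dirty=True)
            for peer in peers:                 # assumption: invalidate other copies
                del peer[addr]
        elif peers:                            # write miss absorbed by peer caches
            for peer in peers:
                peer[addr].data = value
            if not any(peer[addr].dirty for peer in peers):
                peers[0][addr].dirty = True    # one holder owes the write-back
        else:                                  # write miss, no copy anywhere:
            self.mem_writes += 1               # no allocation, write to memory
            self.memory[addr] = value

    def evict(self, cpu: int, addr: int) -> None:
        line = self.caches[cpu].pop(addr, None)
        if line is not None and line.dirty:    # delayed write-back of dirty data
            self.mem_writes += 1
            self.memory[addr] = line.data
```

In this toy model, if cache 0 has read a line and cache 1 then writes to it, the write lands in cache 0's copy and the shared memory is touched only when that dirty copy is finally evicted.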

Published: 2015-02-01
CLC number: TN 47
Funding:

National Science and Technology Major Project of China (2009ZX01030-001-002); National Natural Science Foundation of China (61100074)

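Corresponding author: HUANG Kai (黄凯), male, associate professor. E-mail: huangk@vlsi.zju.edu.cn
About the author: XIU Si-wen (修思文), born 1985, male, Ph.D.; main research interest: multicore processor design. E-mail: xiusw@vlsi.zju.edu.cn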

Cite this article:

XIU Si-wen, HUANG Kai, YU Min, XIE Tian-yi, GE Hai-tong, YAN Xiao-lang. Cache coherence protocol and implementation for multiprocessors with no-write-allocate caches [J]. Journal of Zhejiang University (Engineering Science), 10.3785/j.issn.1008-973X.2015.02.023.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2015.02.023
http://www.zjujournals.com/eng/CN/Y2015/V49/I2/351

