Cache coherence protocol and implementation for multiprocessors with no-write-allocate caches

doi:10.3785/j.issn.1008-973X.2015.02.023

JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)

XIU Si-wen1, HUANG Kai1, YU Min1, XIE Tian-yi1, GE Hai-tong2, YAN Xiao-lang1

1. Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China; 2. Hangzhou C-SKY Microsystems Co., Ltd, Hangzhou 310027, China


Abstract

To address the shortcomings of existing cache coherence protocols when applied to multiprocessors with write-back, no-write-allocate caches, a novel write-intervention-based protocol was proposed and implemented in hardware. With this protocol, when a write miss occurs, the data can in some cases be written directly to peer caches that hold a valid copy of the line. Both delayed write-back of dirty data and cache-to-cache copy are supported, and requested data can be supplied as long as at least one valid copy of the cache line exists in the system, which avoids unnecessary accesses to the shared memory. Experimental results show that, compared with the MOESI protocol, the proposed protocol significantly reduces accesses to the shared memory, lowers dynamic power consumption, and improves the performance of the whole system.
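The behaviour summarized in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design or state machine, which this page does not describe; it is a toy model under stated assumptions: each cache line holds a single value, a write hit invalidates peer copies, and a write miss updates every peer that holds the line. The names `ToySystem`, `read`, `write`, `evict`, and `mem_accesses` are hypothetical and chosen only for illustration.

```python
class ToySystem:
    """Toy model (assumption, not the paper's design) of write-back,
    no-write-allocate caches with write intervention."""

    def __init__(self, n_cpus):
        # One dict per CPU: address -> [value, dirty_flag]
        self.caches = [dict() for _ in range(n_cpus)]
        self.memory = {}
        self.mem_accesses = 0  # counts shared-memory reads and writes

    def _peers_holding(self, cpu, addr):
        # Caches of the other CPUs that hold a valid copy of addr
        return [c for i, c in enumerate(self.caches) if i != cpu and addr in c]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr in cache:                      # read hit
            return cache[addr][0]
        peers = self._peers_holding(cpu, addr)
        if peers:
            value = peers[0][addr][0]          # cache-to-cache copy
        else:
            self.mem_accesses += 1             # fall back to shared memory
            value = self.memory.get(addr, 0)
        cache[addr] = [value, False]           # requester gets a clean copy
        return value

    def write(self, cpu, addr, value):
        cache = self.caches[cpu]
        peers = self._peers_holding(cpu, addr)
        if addr in cache:                      # write hit: update locally
            cache[addr] = [value, True]
            for p in peers:                    # assumed policy: invalidate peers
                del p[addr]
            return
        if peers:                              # write miss with a valid peer copy
            # Write intervention: update the peers, keep one dirty owner,
            # and do not touch shared memory.
            owner = next((p for p in peers if p[addr][1]), peers[0])
            for p in peers:
                p[addr][0] = value
            owner[addr][1] = True
            return
        self.mem_accesses += 1                 # no-write-allocate: write memory only
        self.memory[addr] = value

    def evict(self, cpu, addr):
        line = self.caches[cpu].pop(addr, None)
        if line is not None and line[1]:       # delayed write-back of dirty data
            self.mem_accesses += 1
            self.memory[addr] = line[0]


system = ToySystem(n_cpus=2)
system.memory[0x10] = 1
system.read(0, 0x10)         # miss everywhere: one shared-memory access
system.write(1, 0x10, 7)     # write miss on CPU 1, served by CPU 0's line
print(system.read(1, 0x10), system.mem_accesses)  # prints "7 1"
```

In this model the write miss and the following read are both served by the peer cache, so the shared memory is accessed only once; the dirty value reaches memory later, when the owning line is evicted.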

Published: 01 February 2015

CLC:

TN 47


Cite this article:

XIU Si-wen, HUANG Kai, YU Min, XIE Tian-yi, GE Hai-tong, YAN Xiao-lang. Cache coherence protocol and implementation for multiprocessors with no-write-allocate caches. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2015, 49(2): 351-359.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2015.02.023 OR http://www.zjujournals.com/eng/Y2015/V49/I2/351

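Coherence protocol and implementation for no-write-allocate caches

To address the drawbacks of applying existing cache coherence protocols to multi-core processors based on write-back, no-write-allocate caches, a novel coherence protocol based on write intervention is proposed and implemented in hardware. Under the write-intervention protocol, when a processor incurs a write miss, the data can be written directly into the valid copy of that cache line held by other processors in the system; delayed write-back of "dirty data" and data copying between caches are supported; and as long as a valid copy of the requested cache line exists in the system, it can supply the data, avoiding unnecessary accesses to the shared memory. Experimental results show that, compared with the MOESI protocol, the proposed write-intervention protocol markedly reduces accesses to the shared memory, improves overall system performance, and substantially lowers dynamic power consumption.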


