J4  2014, Vol. 48 Issue (2): 268-278    DOI: 10.3785/j.issn.1008-973X.2014.02.013
Stream Prefetcher based on MediaDSP
YE Xia1,2, XIN Yuan1, LIU Yong1, LIU Peng1
1. Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China;
2. Qianjiang College, Hangzhou Normal University, Hangzhou 310036, China



To reduce the latency caused by data cache misses, an optimization strategy for the level-1 data cache of the media digital signal processor MediaDSP64 was proposed: variable-stride, minimum-delta prefetching based on a stream table. A detailed data analysis was also given of how the prefetch distance (depth), the number of stream table entries and the history table length affect prefetching performance, from which the optimal prefetch parameter configuration was derived. Simulation results show that, under the optimal parameter configuration, the proposed prefetching scheme improves the performance of H.264, the DSP kernels and the EEMBC Consumer suite by 6%, 32% and 39%, respectively, and reduces the processor's average memory access time by 32%, 56% and 65%, respectively.
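To make the abstract's terms concrete, here is a minimal Python sketch of a stream-table, minimum-delta prefetcher. It is a toy model under stated assumptions, not the authors' MediaDSP64 hardware: the class and method names, training on every demand address, allocating a stream entry from the first nonzero minimum delta, and LRU replacement of stream-table entries are all hypothetical. The three constructor arguments stand in for the parameters the study tunes: history table length, number of stream table entries and prefetch distance.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class StreamEntry:
    """Hypothetical stream-table entry: next expected address and learned stride."""
    next_addr: int
    stride: int


class MinDeltaStreamPrefetcher:
    """Toy stream prefetcher that learns a per-stream (variable) stride by minimum delta.

    history_len   -> length of the miss-history table (assumed FIFO)
    table_entries -> number of stream-table entries (assumed LRU replacement)
    distance      -> prefetch distance: lines issued ahead of a confirmed stream
    """

    def __init__(self, history_len: int = 8, table_entries: int = 4, distance: int = 2):
        self.history: deque[int] = deque(maxlen=history_len)  # oldest entries drop off
        self.table: list[StreamEntry] = []  # index 0 = LRU, last = MRU
        self.table_entries = table_entries
        self.distance = distance

    def _min_delta(self, addr: int) -> int | None:
        # Signed delta to the closest distinct address in the history, or None.
        deltas = [addr - h for h in self.history if h != addr]
        return min(deltas, key=abs) if deltas else None

    def train(self, addr: int) -> list[int]:
        """Observe one demand address; return the addresses to prefetch (may be empty)."""
        prefetches: list[int] = []
        for i, entry in enumerate(self.table):
            if entry.next_addr == addr:
                # Stream confirmed: advance it, mark it MRU, and run `distance` strides ahead.
                entry.next_addr = addr + entry.stride
                self.table.append(self.table.pop(i))
                prefetches = [addr + k * entry.stride for k in range(1, self.distance + 1)]
                break
        else:
            # No stream predicted this address: allocate one using the minimum delta.
            stride = self._min_delta(addr)
            if stride is not None:
                if len(self.table) >= self.table_entries:
                    self.table.pop(0)  # evict the LRU entry
                self.table.append(StreamEntry(next_addr=addr + stride, stride=stride))
        self.history.append(addr)
        return prefetches


if __name__ == "__main__":
    # Two interleaved streams: stride 0x40 from 0x1000 and stride 0x80 from 0x8000.
    pf = MinDeltaStreamPrefetcher(history_len=8, table_entries=4, distance=2)
    for a in [0x1000, 0x8000, 0x1040, 0x8080, 0x1080, 0x8100]:
        print(hex(a), [hex(p) for p in pf.train(a)])
```

In this sketch the minimum delta is what separates interleaved streams: 0x1040 is paired with 0x1000 (delta 0x40) rather than with 0x8000, so each stream learns its own stride, and the fifth and sixth accesses each trigger two prefetches. A longer history table makes it more likely that a stream's predecessor is still present, more table entries track more concurrent streams, and a larger distance hides more latency at the risk of cache pollution, which is why all three parameters are worth sweeping.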

Publication date: 2014-02-01
CLC number: TP 302

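Supported by the National Natural Science Foundation of China (60873112, 61028004) and the National High Technology Research and Development Program of China ("863" Program) (2009AA01Z109).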

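Corresponding author: LIU Peng, male, associate professor. E-mail:
About the author: YE Xia (born 1981), female, Ph.D. candidate, whose research interests are media digital signal processors and media processing algorithms. E-mail: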



YE Xia, XIN Yuan, LIU Yong, LIU Peng. Stream prefetcher based on MediaDSP[J]. J4, 2014, 48(2): 268-278.


