J4  2010, Vol. 44 Issue (1): 75-80    DOI: 10.3785/j.issn.1008-973X.2010.01.014
    
Optimization of data forwarding based on early write-back strategy
CAI Wei-guang, YAO Qing-dong, LIU Peng, ZHANG Qi, ZHANG Yi-xiong
(Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China)

Abstract  

For deeply pipelined processors with complex instruction sets, a data forwarding model based on operand access order was introduced; the model uses five parameters to describe the instruction execution process. Using this model, MediaDSP64, a prototype processor with a RISC/DSP architecture, was analyzed, and a data forwarding optimization method built on distributed bypassing units was proposed. By writing the auxiliary-register results of DSP instructions back earlier, the method reduces the number of data forwarding sources without degrading instruction execution efficiency. Because early write-back causes out-of-order execution, a shadow register structure was designed to preserve precise exception handling. Experimental results showed that the hardware resources of the data forwarding circuit were reduced by 43.8% and the delay of the critical path was reduced by 19.8%.
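The interaction between early write-back and precise exceptions can be illustrated with a small software model. The following is a minimal sketch under stated assumptions, not the MediaDSP64 circuit: the abstract does not describe the shadow register structure in detail, so the class name, the method names, and the undo-log organization are all hypothetical. It also assumes that instruction ids increase in program order and that instructions commit in order.

```python
class ShadowedRegisterFile:
    """Toy register file with an undo log standing in for shadow registers.

    Hypothetical illustration only; not the structure used in the paper.
    """

    def __init__(self, num_regs):
        self.regs = [0] * num_regs
        # Each entry is (instr_id, reg_index, old_value), oldest first.
        self.log = []

    def early_write(self, instr_id, reg, value):
        # Save the value being overwritten, then write back early.
        self.log.append((instr_id, reg, self.regs[reg]))
        self.regs[reg] = value

    def commit(self, instr_id):
        # The instruction finished without an exception: drop its saved values.
        self.log = [entry for entry in self.log if entry[0] != instr_id]

    def rollback_from(self, instr_id):
        # An exception hit instr_id: undo its early writes and those of all
        # younger instructions, youngest first, to restore a precise state.
        while self.log and self.log[-1][0] >= instr_id:
            _, reg, old_value = self.log.pop()
            self.regs[reg] = old_value


rf = ShadowedRegisterFile(4)
rf.early_write(1, 2, 5)   # instruction 1 writes r2 early
rf.early_write(2, 2, 9)   # instruction 2 overwrites r2 early
rf.commit(1)              # instruction 1 completes normally
rf.rollback_from(2)       # instruction 2 raises an exception
print(rf.regs[2])
```

After the rollback, r2 holds the value committed by instruction 1, which is the state a precise exception handler expects to see. A hardware design would bound the number of saved values; the unbounded Python list here is a simplification.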



Published: 26 February 2010
CLC:  TP 302  
  TP 37  
Cite this article:

CAI Wei-guang, YAO Qing-dong, LIU Peng, et al. Optimization of data forwarding based on early write-back strategy. J4, 2010, 44(1): 75-80.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2010.01.014     OR     http://www.zjujournals.com/eng/Y2010/V44/I1/75


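Data forwarding optimization method based on an early write-back strategy

For deep pipelines and complex instruction set architectures, a data forwarding model based on operand access timing is presented. The model uses five parameters to describe the instruction execution process and is applied to analyze MediaDSP64, a prototype processor with a RISC/DSP architecture. Building on distributed forwarding circuits, a forwarding optimization method based on an early write-back strategy is proposed. Without affecting instruction execution efficiency, the strategy reduces the number of forwarding sources by writing the auxiliary-register results of DSP instructions back to the register file early. To handle the out-of-order execution that this causes, a shadow register structure is designed to guarantee precise exception handling. Experimental results show that the hardware resources used by the forwarding circuit decrease by 43.8% and the critical-path delay drops by 19.8%.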

