J4  2011, Vol. 45 Issue (9): 1587-1592    DOI: 10.3785/j.issn.1008-973X.2011.09.013
    
High performance hardware stack for seamless context switching
CHEN Zhi-jian, MENG Jian-yi, GE Hai-tong, YAN Xiao-lang
Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China

Abstract  

A new hardware stack for embedded processors is proposed to support seamless context switching and remove the performance loss incurred during function calls. The high-performance hardware stack consists of a data stack (DS) and a return stack (RS), both designed as reconfigurable two-level buffers to eliminate the overhead of process switching. The DS alternates between two general-purpose registers (GPRs) to construct a virtual GPR, which pushes or pops multiple data items in one cycle and switches automatically, hiding the performance cost of stack operations during program switching. The RS stores the function return address and the corresponding instruction when a function is called, eliminating pipeline bubbles when the function returns. Both the DS and the RS reuse part of the scratchpad memory (SPM) as second-level buffers, which supports user reconfiguration and provides sufficient buffer space for specific embedded software. Experimental results show that the new hardware stack improves performance by over 10% while reducing power consumption by 2%.
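The two-level behaviour of the RS described above can be illustrated with a small software model. This is only a behavioural sketch under stated assumptions, not the authors' hardware design: the class name `TwoLevelReturnStack`, the capacities `l1_entries` and `spm_entries`, and the spill/refill policy (oldest entry spills to the SPM on overflow, most recently spilled entry refills on return) are all hypothetical choices made for illustration.

```python
from collections import deque


class TwoLevelReturnStack:
    """Toy behavioural model of a two-level return stack (assumed policy)."""

    def __init__(self, l1_entries=4, spm_entries=16):
        self.l1_entries = l1_entries    # hypothetical first-level capacity
        self.spm_entries = spm_entries  # hypothetical SPM region used as level 2
        self.l1 = deque()               # fast buffer, newest entry on the right
        self.spm = []                   # spilled entries, newest last

    def call(self, return_addr, return_instr):
        """Record the return address and the instruction located at it."""
        if len(self.l1) == self.l1_entries:
            if len(self.spm) == self.spm_entries:
                raise OverflowError("second-level buffer full")
            # Level 1 is full: spill its oldest entry into the SPM region.
            self.spm.append(self.l1.popleft())
        self.l1.append((return_addr, return_instr))

    def ret(self):
        """Pop the newest entry, then refill level 1 from the SPM if possible."""
        if not self.l1:
            raise IndexError("return stack empty")
        entry = self.l1.pop()
        if self.spm:
            # Bring back the most recently spilled entry as the new oldest one.
            self.l1.appendleft(self.spm.pop())
        return entry


# Usage: five nested calls with a two-entry first level.
rs = TwoLevelReturnStack(l1_entries=2, spm_entries=4)
for i in range(5):
    rs.call(0x100 + 4 * i, f"instr{i}")
returned = [rs.ret() for _ in range(5)]
```

In this model the entry popped on return already carries the instruction at the return address, which is the property the abstract credits with removing pipeline bubbles; the timing itself is not modelled here.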



Published: 01 September 2011
CLC: TN 332; TN 47
Cite this article:

CHEN Zhi-jian, MENG Jian-yi, GE Hai-tong, YAN Xiao-lang. High performance hardware stack for seamless context switching. J4, 2011, 45(9): 1587-1592.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2011.09.013     OR     https://www.zjujournals.com/eng/Y2011/V45/I9/1587


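High-performance hardware stack supporting seamless program switching

To address the performance loss caused by context switching during function calls, a high-performance hardware stack for embedded processors that supports seamless program switching is proposed. The hardware stack comprises a data stack and a return stack and adopts a dynamically reconfigurable two-level buffering mechanism to eliminate the performance overhead of program switching. The data stack pushes and pops multiple data items in a single cycle, hiding the stack operations during program switching; the return stack prefetches instructions ahead of time, eliminating pipeline bubbles on function return. The data stack and the return stack reuse the data and instruction scratchpad memories, respectively, to implement a user-reconfigurable second-level buffer. Experimental results show that the method improves performance by more than 10% on average and reduces power consumption by 2%.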


[1] XIANG Xiao-yan, CHEN Zhi-jian, MENG Jian-yi, YAN Xiao-lang. Low power instruction cache based on adjacent line linking access[J]. J4, 2013, 47(7): 1213-1217.
[2] CHEN Zhi-jian, MENG Jian-yi, GE Hai-tong, YAN Xiao-lang. Translation lookaside buffer design based on dynamic memory page merging [J]. J4, 2012, 46(1): 118-122.