High-performance hardware stack for seamless context switching
CHEN Zhi-jian, MENG Jian-yi, GE Hai-tong, YAN Xiao-lang
Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China
Abstract A new hardware stack for embedded processors is proposed to support seamless context switching and remove the performance loss incurred during function calls. The high-performance hardware stack is composed of a data stack (DS) and a return stack (RS), both of which use a reconfigurable two-level buffer scheme to eliminate the overhead of process switching. The DS uses two alternate general-purpose registers (GPRs) to construct a new virtual GPR, which pushes or pops multiple data items in one cycle and switches automatically, hiding the performance cost of stack operations during program switching. The RS saves the function return address and the corresponding instruction when a function is called, eliminating pipeline bubbles when the function returns. Both the DS and the RS reuse part of the scratchpad memory (SPM) as their second-level buffers, which supports user reconfiguration and provides sufficient buffer space for specific embedded software. Experimental results show that the new hardware stack improves performance by over 10% while reducing power consumption by 2%.
Published: 01 September 2011
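High-performance hardware stack supporting seamless program switching
To address the performance loss caused by context switching during function calls, a high-performance hardware stack for embedded processors that supports seamless program switching is proposed. The hardware stack comprises a data stack and a return stack and uses a dynamically reconfigurable two-level buffer mechanism to eliminate the performance overhead of program switching. The data stack pushes or pops multiple data items in a single cycle, hiding the stack operations involved in program switching; the return stack prefetches instructions ahead of time, eliminating pipeline bubbles on function return. The data stack and the return stack reuse the data and instruction scratchpad memories, respectively, to implement user-reconfigurable second-level buffers. Experimental results show that the method improves performance by more than 10% on average and reduces power consumption by 2%.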
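The two-level buffering idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `TwoLevelStack`, the capacities, and the spill/refill policy (move the oldest first-level entry to the second level when the first level is full, and pull one entry back when it is empty) are all hypothetical choices made for illustration. The second-level list stands in for a region of scratchpad memory whose size the user chooses.

```python
from collections import deque


class TwoLevelStack:
    """Toy model of a stack with a small first-level buffer backed by a
    larger second-level buffer (standing in for an SPM region).
    Names, sizes, and the spill policy are illustrative assumptions."""

    def __init__(self, l1_capacity, l2_capacity):
        if l1_capacity < 1 or l2_capacity < 0:
            raise ValueError("l1_capacity must be >= 1 and l2_capacity >= 0")
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity  # user-chosen second-level size
        self.l1 = deque()  # newest entries; right end is the top of stack
        self.l2 = []       # older entries spilled from the first level
        self.spills = 0    # first-level -> second-level transfers
        self.refills = 0   # second-level -> first-level transfers

    def push(self, value):
        if len(self.l1) == self.l1_capacity:
            if len(self.l2) == self.l2_capacity:
                raise OverflowError("second-level buffer is full")
            # Spill the oldest first-level entry to make room.
            self.l2.append(self.l1.popleft())
            self.spills += 1
        self.l1.append(value)

    def pop(self):
        if not self.l1:
            if not self.l2:
                raise IndexError("pop from empty stack")
            # Refill from the most recently spilled entry.
            self.l1.append(self.l2.pop())
            self.refills += 1
        return self.l1.pop()

    def __len__(self):
        return len(self.l1) + len(self.l2)


if __name__ == "__main__":
    ds = TwoLevelStack(l1_capacity=2, l2_capacity=4)
    for word in (1, 2, 3, 4):
        ds.push(word)
    print([ds.pop() for _ in range(4)], ds.spills, ds.refills)
```

In this model, pushes and pops that stay within the first-level capacity never touch the second level, which is the case a hardware design would want to make common. A return stack could be modeled the same way by pushing (return address, instruction) pairs instead of data words.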