JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
    
Embedded Flash data fetching acceleration techniques and implementation
WANG Yu-bo1, HUANG Kai1, CHEN Chen1, FENG Jiong2, GE Hai-tong2, YAN Xiao-lang1
1. Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China; 2. Hangzhou C-SKY Micro-system Company, Hangzhou 310027, China

Abstract  

Several cache-based techniques for accelerating embedded Flash data fetching were proposed and implemented for low-cost, low-power applications: low-frequency fast access, backfill hiding with a modified critical-word-first strategy, cache locking with adaptive prefetching, and pre-lookup. Combined, these techniques improve Flash data fetching performance while keeping power dissipation low. Simulations show that when on-chip resources (cache size) are limited and the system frequency is low (as in some low-power applications), an embedded Flash accelerator using these techniques achieves higher performance (20%-40% higher) and lower dynamic power consumption (about 40% lower) than a conventional two-way set-associative cache.
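As a rough illustration of the idea behind critical-word-first backfill, the sketch below models the textbook wrap-around fill order, in which the word that caused the miss is fetched first so the processor can resume before the whole line is filled. It is not the modified strategy of the paper, whose details the abstract does not give; the function names, the line size, and the per-word Flash latency are all hypothetical assumptions.

```python
def critical_word_first_order(miss_word: int, words_per_line: int) -> list[int]:
    """Return the word indices of a cache line in wrap-around fill order.

    The missed word is fetched first, then the rest of the line, wrapping
    back to word 0. This is the generic textbook scheme (an assumption),
    not the paper's modified variant.
    """
    if not 0 <= miss_word < words_per_line:
        raise ValueError("miss_word must lie inside the cache line")
    return [(miss_word + i) % words_per_line for i in range(words_per_line)]


def cycles_until_requested_word(
    miss_word: int,
    words_per_line: int,
    cycles_per_word: int,
    critical_first: bool,
) -> int:
    """Cycles the processor waits for the missed word under a simple model.

    Assumes (hypothetically) that each word costs a fixed number of cycles
    and that the processor resumes as soon as the missed word arrives.
    """
    if critical_first:
        order = critical_word_first_order(miss_word, words_per_line)
    else:
        # Plain in-order fill starting from word 0 of the line.
        order = list(range(words_per_line))
    return (order.index(miss_word) + 1) * cycles_per_word


if __name__ == "__main__":
    # Hypothetical 4-word line, miss on word 2, 3 cycles per Flash word.
    print(critical_word_first_order(2, 4))
    print(cycles_until_requested_word(2, 4, 3, critical_first=True))
    print(cycles_until_requested_word(2, 4, 3, critical_first=False))
```

Under this simplified model, the wait for the missed word no longer depends on its position in the line; the remaining words are backfilled afterwards.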



Published: 01 September 2014
CLC:  TN 47  
Cite this article:

WANG Yu-bo, HUANG Kai, CHEN Chen, FENG Jiong, GE Hai-tong, YAN Xiao-lang. Embedded Flash data fetching acceleration techniques and implementation. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2014, 48(9): 1570-1579.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2014.09.005     OR     http://www.zjujournals.com/eng/Y2014/V48/I9/1570


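Embedded Flash read acceleration techniques and implementation

To address the read-speed problem of embedded Flash in low-cost, low-power applications, several cache-based embedded Flash read acceleration techniques and their implementation are proposed, including low-frequency fast access, backfill hiding with a modified critical-word-first prefetch strategy, cache locking with adaptive prefetching, and pre-lookup. Combining these techniques improves Flash read performance while keeping power consumption low. Simulation experiments show that, when few resources are occupied (small cache capacity) and the frequency is low (as in some low-power applications), these techniques clearly improve the acceleration performance of the accelerator controller over a conventional two-way set-associative cache (by 20%-40%), while the dynamic power of the read acceleration unit in the accelerator controller is about 40% lower than that of a conventional two-way set-associative cache.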

