J4  2012, Vol. 46 Issue (1): 118-122    DOI: 10.3785/j.issn.1008-973X.2012.01.19
    
Translation lookaside buffer design based on dynamic memory page merging
CHEN Zhi-jian, MENG Jian-yi, GE Hai-tong, YAN Xiao-lang
Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China

Abstract  

Traditional memory management algorithms often allocate virtual memory pages and physical memory pages sequentially. Exploiting this property, a translation lookaside buffer (TLB) design method is proposed that merges two sequential small memory pages into one large page while the processor is executing. Because merging is applied recursively, the address range mapped by a single TLB entry grows automatically, which improves TLB utilization and reduces the TLB miss rate. A new uTLB architecture composed of a fast uTLB (fuTLB) and a shadow uTLB (suTLB) is also proposed. The fuTLB and suTLB serve both as the first-level address translation buffer of a two-level TLB architecture and as temporary buffers for hardware-based dynamic page merging. Page merging is performed entirely in hardware and is transparent to software. Experimental results on MiBench show that the new TLB design method reduces the TLB miss rate by 27% on average compared with the filter-TLB design method.
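The recursive merging idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design described in the paper: the names `TlbEntry`, `try_merge`, `insert_with_merging` and `translate` are hypothetical, the 4 KiB base page size is assumed, and the rule that a merged page must be naturally aligned in both address spaces is an assumption borrowed from conventional large-page schemes rather than something the abstract states.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SHIFT = 12  # assumption: 4 KiB base pages


@dataclass(frozen=True)
class TlbEntry:
    vpn: int        # virtual page number of the first base page covered
    ppn: int        # physical page number of the first base page covered
    order: int = 0  # the entry maps 2**order contiguous base pages

    @property
    def pages(self) -> int:
        return 1 << self.order


def try_merge(a: TlbEntry, b: TlbEntry) -> Optional[TlbEntry]:
    """Return one entry covering both a and b, or None if they cannot merge."""
    if a.order != b.order:
        return None
    lo, hi = (a, b) if a.vpn < b.vpn else (b, a)
    n = lo.pages
    # The two pages must be sequential in both virtual and physical space.
    if hi.vpn != lo.vpn + n or hi.ppn != lo.ppn + n:
        return None
    # Assumption: the merged page must be naturally aligned to its own size.
    if lo.vpn % (2 * n) != 0 or lo.ppn % (2 * n) != 0:
        return None
    return TlbEntry(lo.vpn, lo.ppn, lo.order + 1)


def insert_with_merging(entries: List[TlbEntry], new: TlbEntry) -> List[TlbEntry]:
    """Insert a new mapping, merging repeatedly until no partner is found."""
    entries = list(entries)
    current = new
    merged = True
    while merged:
        merged = False
        for i, existing in enumerate(entries):
            candidate = try_merge(current, existing)
            if candidate is not None:
                del entries[i]       # the partner is absorbed into the larger entry
                current = candidate  # retry with the enlarged entry
                merged = True
                break
    entries.append(current)
    return entries


def translate(entries: List[TlbEntry], vaddr: int) -> Optional[int]:
    """Translate a virtual address, or return None on a TLB miss."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for e in entries:
        if e.vpn <= vpn < e.vpn + e.pages:
            return ((e.ppn + (vpn - e.vpn)) << PAGE_SHIFT) | offset
    return None


# Four sequentially allocated pages end up in a single entry.
tlb: List[TlbEntry] = []
for k in range(4):
    tlb = insert_with_merging(tlb, TlbEntry(vpn=8 + k, ppn=16 + k))
```

In this model, four sequential mappings that would otherwise occupy four entries are held in one entry covering four base pages, which is the effect the abstract credits for the better entry utilization and lower miss rate.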



Published: 22 February 2012
CLC:  TN 332  
  TN 47  
Cite this article:

CHEN Zhi-jian, MENG Jian-yi, GE Hai-tong, YAN Xiao-lang. Translation lookaside buffer design based on dynamic memory page merging. J4, 2012, 46(1): 118-122.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2012.01.19     OR     http://www.zjujournals.com/eng/Y2012/V46/I1/118


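Translation lookaside buffer design based on dynamic merging of memory pages

Exploiting the fact that memory management tends to allocate virtual pages and physical pages contiguously, a translation lookaside buffer (TLB) design method that dynamically merges adjacent pages is proposed. The core idea is to merge adjacent pages recursively while the processor is running, dynamically extending the address range mapped by a single TLB entry, which raises TLB entry utilization and lowers the TLB miss rate. Within a two-level TLB architecture, a new uTLB structure based on dynamic switching between a fast uTLB (fuTLB) and a shadow uTLB (suTLB) is proposed. It serves as the first-level buffer of the two-level TLB and provides the working context and storage for dynamic page merging, and the merging process is transparent to software. Experimental results on the MiBench benchmark suite show that, compared with the filter-TLB architecture, the dynamic page merging method reduces the TLB miss rate by 27% on average.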

