High efficient pipeline design and implementation for sub-pixel interpolation process in H.264/AVC

doi:10.3785/j.issn.1008-973X.2011.07.008

2011, Vol. 45, Issue (7): 1187-1193


LI Chun-shu1, HUANG Kai1, XIU Si-wen1, MA De1, GE Hai-tong2, YAN Xiao-lang1

1. Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China;
2. Hangzhou csky Microsystem Corporation, Hangzhou 310027, China


Abstract

A twolevel pipeline architecture was proposed in order to decrease the high complexity of subpixel interpolation process in H.264/AVC decoding system. The first level pipeline scheme was utilized to explore the parallelism for the interpolation processes of different 4×4 blocks with two stages of fetching 4×4 block’s reference pixels and interpolation computation operation when the four 4×4 blocks inside one 8×8 block share the same motion information. The second level pipeline scheme was used to accelerate the subpixel interpolation computation operation of different pixels by using the independence of adjacent halfpixels and the symmetry between horizontal and vertical interpolation computation processes. The kernel interpolation computation unit was implemented with 13 sixtap filters, 4 bilinear interpolation filters and 4 chroma interpolation filters. The pipelining and parallelism in interpolation computation process can reduce computation time by at least 75%. Experimental results show that the proposed architecture design can reduce the external memory bandwidth by 47% and improve the performance of subpixel interpolation by 30% at a lower hardware cost compared with other designs．

Published: 01 July 2011

CLC: TN 919.8


Cite this article:

LI Chun-shu, HUANG Kai, XIU Si-wen, MA De, GE Hai-tong, YAN Xiao-lang. High efficient pipeline design and implementation for sub-pixel interpolation process in H.264/AVC. J4, 2011, 45(7): 1187-1193.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2011.07.008 OR https://www.zjujournals.com/eng/Y2011/V45/I7/1187

H.264/AVC子像素插值的高性能流水线设计及实现


［1］ ITUT Rec. H.264 and ISO/IEC 1448610 AVC. Draft ITUT recommendation and final draft international standard of joint video specification ［S］. ［S. l.］: JVT, 2003．
［2］ LIN Chienchang, CHEN Jiawei, CHANG Hsiucheng, et al. A 160K gates/45 kB SRAM H.264 video decoder for HDTV applications ［J］. IEEE Journal of SolidState Circuits, 2007, 42(1): 170-182．
［3］ XU Ke, CHOY Chiusing. A powerefficient and selfadaptive prediction engine for H.264/AVC decoding ［J］. IEEE Transactions on Very Large Scale Integration Systems, 2008, 16(3): 302-313．
［4］ LEI Yu, LI Hui, HUANG Kai, et al. A H.264 video decoder with scheme of efficient bandwidth optimization for motion compensation ［C］∥ International Symposium on Communications and Information Technologies. Sydney, Australia: IEEE, 2007: 531-534．
［5］ YANG Kun, ZHANG Chun, DU Guoze, et al. A hardwaresoftware codesign for H.264/AVC decoder ［C］∥ Asia SolidState Circuit Conference. Hangzhou, China: IEEE, 2006: 119-122．
［6］戴郁,李冬晓,郑伟,等. H.264/AVC运动补偿的高效插值结构设计［J］. 浙江大学学报：工学版, 2009, 43(2): 255-260．
DAI Yu, LI Dongxiao, ZHENG Wei, et al. Efficient interpolation architecture design for motion compensation in H.264/AVC ［J］. Journal of Zhejiang University: Engineering Science, 2009, 43(2): 255-260．
［7］ WANG Ronggang, LI Mo, LI Jintao, et al. High throughput and low memory access subpixel interpolation architecture for H.264/AVC HDTV decoder ［J］. IEEE Transactions on Consumer Electronics, 2005, 51(3): 1006-1013．
［8］ CHUANG Tzuder, CHANG Lomei, CHIU Tsaiwei, et al. Bandwidthefficient cachebased motion compensation architecture with dramfriendly data access control ［C］∥ Acoustics, Speech and Signal Processing. Taipei, Taiwan, China: IEEE, 2009: 2009-2012．
［9］姚栋,虞露. MPEG4运动补偿的亚像素内插过程及其硬件实现［J］. 浙江大学学报:工学版, 2005, 39(11): 1703-1707．
YAO Dong, YU Lu. Subpixel interpolation of MPEG4 motion compensation and its hardware implementation ［J］. Journal of Zhejiang University: Engineering Science, 2005, 39(11): 1703-1707．
［10］ KIM J H, HYUN G H, LEE H J. Cache organization for H.264/AVC motion compensation ［C］∥ Embedded and RealTime Computing Systems and Applications. Daegu, Korea: IEEE, 2007: 534-541．
［11］ FINCHELSTEIN D F, SZE V, CHANDRAKASAN A P. Multicore processing and efficient onchip caching for H.264 and future video decoders ［J］. IEEE Transactions on Circuits and Systems for Video Technology, 2009, 19(11): 1704-1722.

