Highly efficient pipeline design and implementation for the sub-pixel interpolation process in H.264/AVC
LI Chun-shu (1), HUANG Kai (1), XIU Si-wen (1), MA De (1), GE Hai-tong (2), YAN Xiao-lang (1)
1. Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China; 2. Hangzhou csky Microsystem Corporation, Hangzhou 310027, China
Abstract: A two-level pipeline architecture is proposed to reduce the high complexity of the sub-pixel interpolation process in H.264/AVC decoding systems. The first-level pipeline applies when the four 4×4 blocks inside one 8×8 block share the same motion information: it splits the interpolation of each 4×4 block into two stages, fetching the block's reference pixels and performing the interpolation computation, so that the interpolation of different 4×4 blocks proceeds in parallel. The second-level pipeline accelerates the interpolation computation for different pixels by exploiting the independence of adjacent half-pixels and the symmetry between the horizontal and vertical interpolation computations. The kernel interpolation computation unit is implemented with 13 six-tap filters, 4 bilinear interpolation filters, and 4 chroma interpolation filters. Pipelining and parallelism in the interpolation computation reduce computation time by at least 75%. Experimental results show that, compared with other designs, the proposed architecture reduces external memory bandwidth by 47% and improves sub-pixel interpolation performance by 30% at a lower hardware cost.
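For readers unfamiliar with the three filter types named in the abstract, the following is a minimal software sketch of the arithmetic as defined by the H.264/AVC standard for 8-bit video: the six-tap filter with taps (1, -5, 20, 20, -5, 1) for luma half-pixels, the rounded two-sample average (bilinear) for luma quarter-pixels, and the weighted four-sample average for chroma. It is not the authors' hardware design; the function names (`half_pixel`, `quarter_pixel`, `chroma_pixel`, `half_pixel_row`) are hypothetical and chosen only for illustration.

```python
def clip255(x):
    """Clip a value to the 8-bit sample range [0, 255]."""
    return max(0, min(255, x))


def half_pixel(e, f, g, h, i, j):
    """Luma half-pixel between g and h from six integer samples in a line.

    Uses the H.264 six-tap kernel (1, -5, 20, 20, -5, 1), then rounds,
    divides by 32 and clips.
    """
    total = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip255((total + 16) >> 5)


def quarter_pixel(a, b):
    """Luma quarter-pixel: rounded average of two neighbouring samples."""
    return (a + b + 1) >> 1


def chroma_pixel(a, b, c, d, dx, dy):
    """Chroma sample from four integer neighbours.

    a, b are the top-left and top-right samples; c, d the bottom-left and
    bottom-right. dx, dy are the fractional offsets in eighths (0..7).
    """
    return ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
            + (8 - dx) * dy * c + dx * dy * d + 32) >> 6


def half_pixel_row(row):
    """All horizontal half-pixels obtainable from one row of integer samples.

    Each output depends only on integer samples, never on another
    half-pixel, so the outputs could be computed in parallel.
    """
    return [half_pixel(*row[k:k + 6]) for k in range(len(row) - 5)]


if __name__ == "__main__":
    # A 9-sample reference row yields the 4 half-pixels of one 4x4 block row.
    row = [100, 100, 100, 100, 100, 100, 100, 100, 100]
    print(half_pixel_row(row))
    print(quarter_pixel(100, 103))
    print(chroma_pixel(10, 20, 30, 40, 4, 4))
```

Because every value returned by `half_pixel_row` is computed from integer samples alone, the outputs have no mutual dependence, which is the kind of independence between adjacent half-pixels that the second-level pipeline is described as exploiting. The same `half_pixel` function serves both the horizontal and the vertical direction, since only the choice of input samples differs.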
Published: 01 July 2011