Journal of Zhejiang University (Engineering Science)  2023, Vol. 57 Issue (9): 1894-1902    DOI: 10.3785/j.issn.1008-973X.2023.09.021
Electronics, Communication and Automatic Control Technology
Unified hardware architecture for 2D transform in H.266/VVC
Jun-yu CHEN1, Bin SUN1,*, Xiao-feng HUANG2, Qing-hua SHENG3, Chang-cai LAI3, Xin-yu JIN1
1. Polytechnic Institute, Zhejiang University, Hangzhou 310015, China
2. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
3. School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China
Abstract:

A unified hardware architecture was proposed to reduce the area and power consumption of the 2D transform in H.266/VVC. The architecture supported the discrete cosine transforms (DCT-II, DCT-VIII) and the discrete sine transform (DST-VII) at all block sizes. It consisted of two parallel 1D transform modules and one transpose memory. The 1D transform modules were based on multiple constant multiplication (MCM), with a reusable MCM computing unit designed for all transform types and sizes. A transpose memory with pipelined processing was designed to support the pipelined input of mixed-size blocks. The transpose memory was implemented in static random-access memory (SRAM), used a diagonal storage scheme operated through read and write pointers, and cached block information in a first-in first-out (FIFO) queue. Experimental results showed that the unified computing unit reduced the area of the transform architecture by 1.3% and its power consumption by 49.5%, and that the transpose memory, by exploiting the high-frequency zeroing feature of VVC, halved the required SRAM storage space.
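The MCM idea named in the abstract can be illustrated with a small sketch: several constant multiples of one input are built from shifts and adds, and partial products are shared between constants. The example below uses the 4-point DCT-II coefficients 64, 83 and 36 from HEVC/VVC. The particular decomposition (sharing 9x) and all function names are assumptions chosen for illustration; they are not taken from the paper's adder graph.

```python
# 4-point DCT-II integer matrix used in HEVC/VVC (coefficients 64, 83, 36).
DCT2_4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def mcm_64_83_36(x):
    """Return (64x, 83x, 36x) using shifts and adds only.

    Hypothetical decomposition for illustration: the partial
    product 9x is shared between 83x and 36x.
    """
    t9 = x + (x << 3)            # 9x  = x + 8x
    p64 = x << 6                 # 64x
    p36 = t9 << 2                # 36x = 4 * 9x
    p83 = p64 + (t9 << 1) + x    # 83x = 64x + 18x + x
    return p64, p83, p36


def dct2_4_mcm(x):
    """4-point DCT-II: butterfly stage followed by the shared MCM block."""
    e0, e1 = x[0] + x[3], x[1] + x[2]    # even part (sums)
    o0, o1 = x[0] - x[3], x[1] - x[2]    # odd part (differences)
    _, o0_83, o0_36 = mcm_64_83_36(o0)
    _, o1_83, o1_36 = mcm_64_83_36(o1)
    return [
        (e0 + e1) << 6,     # row 0: 64 * (e0 + e1)
        o0_83 + o1_36,      # row 1: 83 * o0 + 36 * o1
        (e0 - e1) << 6,     # row 2: 64 * (e0 - e1)
        o0_36 - o1_83,      # row 3: 36 * o0 - 83 * o1
    ]


def dct2_4_reference(x):
    """Plain matrix-vector product, used only to check the MCM version."""
    return [sum(c * v for c, v in zip(row, x)) for row in DCT2_4]
```

In hardware, each shift is wiring and only the additions cost logic, which is why sharing a term such as 9x across constants saves adders.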

Key words: H.266/VVC    discrete cosine transform (DCT)    discrete sine transform (DST)    hardware architecture    application-specific integrated circuit (ASIC)    pipeline
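Received: 2022-11-21    Published: 2023-10-16
CLC:  TN 919.81
Funding: National Natural Science Foundation of China (61901150); Zhejiang Provincial Science and Technology Program (LGG18F010004); Ministry of Science and Technology, Science and Technology Innovation 2030 Major Project (2021ZD0109802)
Corresponding author: Bin SUN     E-mail: chenjunyu@zju.edu.cn;shg@zju.edu.cn
About the author: Jun-yu CHEN (born 1999), male, master's student, working on mobile intelligent Internet of Things. orcid.org/0000-0002-2206-5564. E-mail: chenjunyu@zju.edu.cn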

Cite this article:

Jun-yu CHEN, Bin SUN, Xiao-feng HUANG, Qing-hua SHENG, Chang-cai LAI, Xin-yu JIN. Unified hardware architecture for 2D transform in H.266/VVC. Journal of Zhejiang University (Engineering Science), 2023, 57(9): 1894-1902.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.09.021        https://www.zjujournals.com/eng/CN/Y2023/V57/I9/1894

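Fig. 1  Pipeline architecture of the 2D transform
Fig. 2  Two transform matrices of DCT-II
Fig. 3  Transform relationship between 4-point DST-VII and 4-point DCT-VIII
Fig. 4  Unified MCM-based architecture of the 1D transform
Fig. 5  Butterfly operation structure
Fig. 6  Unified structure of DST-VII and DCT-VIII
Fig. 7  Shift-add units for different input data
Fig. 8  Transpose memory architecture
Fig. 9  Input matrix and storage structure
Fig. 10  Timing diagram of the 2D transform pipeline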
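The relationship captioned in Fig. 3 can be checked numerically. The sketch below assumes the 4-point integer DST-VII and DCT-VIII kernels of the VVC specification and the commonly cited relation that DCT-VIII equals DST-VII applied to the reversed input, with the sign of every odd-indexed output flipped. The function names are hypothetical, and whether the paper's unified structure uses exactly this formulation is not confirmed by the text above.

```python
# 4-point integer kernels (assumed to match the VVC specification).
DST7_4 = [
    [29,  55,  74,  84],
    [74,  74,   0, -74],
    [84, -29, -74,  55],
    [55, -84,  74, -29],
]
DCT8_4 = [
    [84,  74,  55,  29],
    [74,   0, -74, -74],
    [55, -74, -29,  84],
    [29, -74,  84, -55],
]


def matvec(m, x):
    """Integer matrix-vector product."""
    return [sum(c * v for c, v in zip(row, x)) for row in m]


def dct8_via_dst7(x):
    """Compute the 4-point DCT-VIII using only the DST-VII kernel.

    Step 1: reverse the input order.
    Step 2: apply DST-VII.
    Step 3: negate the odd-indexed outputs.
    """
    y = matvec(DST7_4, x[::-1])
    return [-v if k % 2 else v for k, v in enumerate(y)]
```

Because only an input reordering and sign flips separate the two transforms, one set of constant multipliers can serve both, which is the motivation for a unified DST-VII/DCT-VIII datapath.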
Method  Standard  Process  F/MHz  H/(pixel·cycle⁻¹)  N_GC/10³  P/mW  N  Transform type (DCT-II, DST-VII, DCT-VIII)
Ref. [10]  VVC  65 nm  250  32  496.4  62.6  4~32  ×
Ref. [13]  VVC  28 nm  600  1  89.1  4~64
Ref. [14]  VVC  90 nm  160  8  416.0  4~32
Ref. [17]  HEVC  90 nm  187  32  347.0  67.6  4~32  × ×
Ref. [21]  HEVC  90 nm  300  8,8,4,2  166.0  23.2  4~32  × ×
This work  VVC  28 nm  724  32  1121.5  48.2  4~64
Table 1  Comparison of ASIC-based 2D transform hardware designs
Method  Standard  H/(pixel·cycle⁻¹)  N_B  D  W  N  Transform type (DCT-II, DST-VII, DCT-VIII)
Ref. [8]  VVC  4  16  512  16  4~32
Ref. [10]  VVC  32  32  64  16  4~32  ×
This work  VVC  32  32  128  16  4~64
Table 2  Comparison of transpose memory hardware designs
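A diagonal storage scheme of the kind compared in Table 2 can be sketched in software. The model below assumes one common mapping, in which element (r, c) of an n×n block goes to bank (r + c) mod n at address r, so that a row write and a column read each touch every bank exactly once. The class and method names are hypothetical, and the paper's read/write pointers, FIFO block-information cache and high-frequency zeroing are not modeled.

```python
class DiagonalTransposeMemory:
    """Software model of an n-bank memory with diagonal data mapping."""

    def __init__(self, n):
        self.n = n
        # banks[b][addr]: n single-port banks, each n words deep.
        self.banks = [[None] * n for _ in range(n)]

    def bank_of(self, r, c):
        """Bank holding element (r, c) under the assumed diagonal mapping."""
        return (r + c) % self.n

    def write_row(self, r, row):
        """Write one row; each element lands in a different bank at address r."""
        for c, v in enumerate(row):
            self.banks[self.bank_of(r, c)][r] = v

    def read_col(self, c):
        """Read one column; element (r, c) sits in a different bank for each r."""
        return [self.banks[self.bank_of(r, c)][r] for r in range(self.n)]


def transpose_through_memory(matrix):
    """Write a square matrix row by row, then read it back column by column."""
    n = len(matrix)
    mem = DiagonalTransposeMemory(n)
    for r, row in enumerate(matrix):
        mem.write_row(r, row)
    return [mem.read_col(c) for c in range(n)]
```

With a plain row-major layout, a column read would hit one bank n times. The diagonal offset spreads those n accesses over n banks, so the first 1D transform can write rows and the second can read columns at full throughput.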
1 BROSS B, CHEN J, LIU S, et al. Versatile video coding editorial refinements on draft 10 [EB/OL]. (2020-11-24) [2022-11-15]. https://jvet-experts.org/doc_end_user/documents/20_Teleconference/wg11/JVET-T2001-v2.zip.
2 SULLIVAN G J, OHM J R, HAN W J, et al. Overview of the high efficiency video coding (HEVC) standard [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2012, 22(12): 1649-1668.
doi: 10.1109/TCSVT.2012.2221191
3 BROSS B, CHEN J, OHM J R, et al. Developments in international video coding standardization after AVC, with an overview of versatile video coding (VVC) [J]. Proceedings of the IEEE, 2021, 109(9): 1463-1493.
doi: 10.1109/JPROC.2020.3043399
4 BOSSEN F, SÜHRING K, WIECKOWSKI F, et al. VVC complexity and software implementation analysis [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(10): 3765-3778.
doi: 10.1109/TCSVT.2021.3072204
5 ZHAO X, KIM S H, ZHAO Y, et al. Transform coding in the VVC standard [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(10): 3878-3890.
doi: 10.1109/TCSVT.2021.3087706
6 CHEN J, YE Y, KIM S. Algorithm description for versatile video coding and test model 11 (VTM 11) [EB/OL]. (2021-01-15) [2022-11-15]. https://jvet-experts.org/doc_end_user/documents/20_Teleconference/wg11/JVET-T2002-v5.zip.
7 GARRIDO M J, PESCADOR F, CHAVARRÍAS M, et al. A high performance FPGA-based architecture for the future video coding adaptive multiple core transform [J]. IEEE Transactions on Consumer Electronics, 2018, 64(1): 53-60.
doi: 10.1109/TCE.2018.2812459
8 GARRIDO M J, PESCADOR F, CHAVARRÍAS M, et al. A 2-D multiple transform processor for the versatile video coding standard [J]. IEEE Transactions on Consumer Electronics, 2019, 65(3): 274-283.
doi: 10.1109/TCE.2019.2913327
9 GARRIDO M J, PESCADOR F, CHAVARRÍAS M, et al. An FPGA-based architecture for the versatile video coding multiple transform selection core [J]. IEEE Access, 2020, 8: 81887-81903.
doi: 10.1109/ACCESS.2020.2991299
10 FAN Y B, ZENG Y X, SUN H M, et al. A pipelined 2D transform architecture supporting mixed block sizes for the VVC standard [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(9): 3289-3295.
doi: 10.1109/TCSVT.2019.2934752
11 DEMPSTER A G, MACLEOD M D. Use of minimum-adder multiplier blocks in FIR digital filters [J]. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 1995, 42(9): 569-577.
doi: 10.1109/82.466647
12 FARHAT I, HAMIDOUCHE W, GRILL A, et al. Lightweight hardware implementation of VVC transform block for ASIC decoder [C]// ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona: IEEE, 2020: 1663-1667.
13 FARHAT I, HAMIDOUCHE W, GRILL A, et al. Lightweight hardware transform design for the versatile video coding 4K ASIC decoders [J]. IEEE Transactions on Consumer Electronics, 2021, 67(4): 329-340.
doi: 10.1109/TCE.2021.3126549
14 MERT A C, KALALI E, HAMZAOGLU I. High performance 2D transform hardware for future video coding [J]. IEEE Transactions on Consumer Electronics, 2017, 63(2): 117-125.
doi: 10.1109/TCE.2017.014862
15 HAO Z J, XU F, XIANG G Q, et al. A multiplier-less transform architecture with the diagonal data mapping transpose memory for the AVS3 standard [C]// 2021 IEEE 14th International Conference on ASIC (ASICON). Kunming: IEEE, 2021: 1-4.
16 HAO Z J, ZHENG Q, FAN Y B, et al. An area-efficient unified transform architecture for VVC [C]// 2022 IEEE International Symposium on Circuits and Systems (ISCAS). Austin: IEEE, 2022: 2012-2016.
17 MEHER P K, PARK S Y, MOHANTY B K, et al. Efficient integer DCT architectures for HEVC [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(1): 168-178.
doi: 10.1109/TCSVT.2013.2276862
18 ZHENG M K, ZHENG J Y, CHEN Z F, et al. A reconfigurable architecture for discrete cosine transform in video coding [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(3): 810-821.
doi: 10.1109/TCSVT.2019.2896294
19 VORONENKO Y, PÜSCHEL M. Multiplierless multiple constant multiplication [J]. ACM Transactions on Algorithms, 2007, 3(2): 11.
doi: 10.1145/1240233.1240234
20 CHEN W H, SMITH C, FRALICK S. A fast computational algorithm for the discrete cosine transform [J]. IEEE Transactions on Communications, 1977, 25(9): 1004-1009.
doi: 10.1109/TCOM.1977.1093941