Please wait a minute...
浙江大学学报(工学版)  2021, Vol. 55 Issue (6): 1199-1207    DOI: 10.3785/j.issn.1008-973X.2021.06.021
信息与电子工程     
基于改进旋转因子的高性能FFT硬件设计
骆阳(),张为*()
天津大学 微电子学院,天津 300072
Hardware efficient FFT design based on improved rotation factor
Yang LUO(),Wei ZHANG*()
School of Microelectronic, Tianjin University, Tianjin 300072, China
 全文: PDF(1304 KB)   HTML
摘要:

针对FFT硬件实现中旋转因子模块占用资源较多的问题,设计高性能单路延时反馈结构的基22快速傅里叶变换. 采用CORDIC与MCM混合的方法设计旋转因子模块,实现了无需常规乘法器的FFT架构,不必占用DSP48E资源. 对于旋转角度数量较少的W16旋转因子模块,采用基于三输入加法器的MCM方法设计,将加法器数量降到最低. 对于旋转角度数量较多的W64W256W1 024模块,采用CORDIC方法设计. 依据旋转角度的数学规律,设计旋转角度实时生成模块,与传统的CORDIC方法相比,不需要占用ROM资源,避免了复杂的寻址逻辑和时序控制. 与其他构架相比,设计的16 bit 64点快速傅里叶变换在Xilinx Virtex-7上将单位slice吞吐率提高了35.20%,256点FFT在Virtex-5上提高了30.37%,1 024点FFT在Virtex-7上提高了25.38%.

关键词: 快速傅里叶变换单路延迟反馈架构常数乘法器坐标旋转数字计算方法    
Abstract:

A hardware efficient Radix-22 fast Fourier transform based on a single-path delay feedback architecture was designed aiming at the problem that the rotation factor module takes up more resources in FFT hardware implementation. The method of mixing CORDIC and MCM was adopted to design the rotation factor module to realize FFT architecture without conventional multiplier and DSP48E resource. MCM method based on ternary adders was used to minimize the number of adders for the W16 rotation factor module with less rotation angles. CORDIC method was adopted for the rotation factor modules of W64, W256 and W1 024 with more rotation angles. The real-time generation module of rotation angles was designed according to the mathematical law of rotation angles. The method does not need to occupy ROM resources and avoids complex addressing logic and timing control compared with the traditional CORDIC method. The designed 16 bit 64-point FFT improves the throughput per slice by 35.20% on Xilinx Virtex-7, the 256-point FFT improves by 30.37% on Virtex-5, and the 1 024-point FFT improves by 25.38% on Virtex-7 compared with other architectures.

Key words: fast Fourier transform    single-path delay feedback architecture    constant multiplier    coordinate rotation digital computer method
收稿日期: 2020-06-07 出版日期: 2021-07-30
CLC:  TN 47  
基金资助: 光电信息控制和安全技术重点实验室资助项目(JCKY2019210C053)
通讯作者: 张为     E-mail: 2018232032@tju.edu.cn;tjuzhangwei@tju.edu.cn
作者简介: 骆阳(1995—),女,硕士生,从事数字集成电路的研究. orcid.org/0000-0002-0691-9478. E-mail: 2018232032@tju.edu.cn
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  
骆阳
张为

引用本文:

骆阳,张为. 基于改进旋转因子的高性能FFT硬件设计[J]. 浙江大学学报(工学版), 2021, 55(6): 1199-1207.

Yang LUO,Wei ZHANG. Hardware efficient FFT design based on improved rotation factor. Journal of ZheJiang University (Engineering Science), 2021, 55(6): 1199-1207.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2021.06.021        https://www.zjujournals.com/eng/CN/Y2021/V55/I6/1199

图 1  1 024点基22 SDF FFT结构图
计数器 计数延迟 计数器位 控制信号 控制模块
Counter_1 0clk Counter_1[9] Control_10 BF10、TR10
Counter_1 0clk Counter_1[8] Control_9 BF9、W1 024
Counter_2 24clk Counter_2[7] Control_8 BF8、TR8
Counter_2 24clk Counter_2[6] Control_7 BF7、W256
Counter_3 24clk Counter_3[5] Control_6 BF6、TR6
Counter_3 24clk Counter_3[4] Control_5 BF5、W64
Counter_4 24clk Counter_4[3] Control_4 BF4、TR4
Counter_4 24clk Counter_4[2] Control_3 BF3、W16
Counter_5 3clk Counter_5[1] Control_2 BF2、TR2
Counter_5 3clk Counter_5[0] Control_1 BF1
表 1  控制信号表
图 2  1 024点基22 SDF FFT时序图
图 3  蝶形单元结构图
图 4  蝶形单元时序图
图 5  简单旋转因子(-j)乘法器
Counter_4 ${ {\rm{exp} }\;({ {\rm{ - j} }2{\text{π}} nk/N}) }$ n θ 实部 虚部
0、4、8、12~15 ${{\rm{e}}^0}$ 0 0 1 0
5 ${ {\rm{exp} }\;({ {\rm{ - j} }{\text{π}} k/8}) }$ 1 π/8 0.923 9 ?0.382 7
1、6 ${ {\rm{exp} }\;({ {\rm{ - j} }{\text{π}} k/4} )}$ 2 π/4 0.707 1 ?0.707 1
7、9 ${ {\rm{exp} }\;({ {\rm{ - j3} }{\text{π}} k/8}) }$ 3 3π/8 0.382 7 ?0.923 9
2 ${ {\rm{exp} }\;({ {\rm{ - j} }{\text{π}} k/2}) }$ 4 π/2 0 ?1
3、10 ${ {\rm{exp} }\;({ {\rm{ - j3} }{\text{π}} k/4}) }$ 6 3π/4 ?0.707 1 ?0.707 1
11 ${ {\rm{exp} }\;({ {\rm{ - j9} }{\text{π}} k/8}) }$ 9 9π/8 ?0.923 9 0.382 7
表 2  W16的旋转因子系数
图 6  236、97、181的MCM 流水线加法器图
图 7  W16旋转因子乘法器结构图
图 8  流水线型CORDIC算法结构图
旋转角度Z0 角度预处理 X0 Y0
[0, π/2] Z Data_im Data_re
[π/2, π] Z ? π/2 ?Data_re Data_im
[π, 3π/2] Z ? π ?Data_im ?Data_re
表 3  旋转角度预处理表格
图 9  旋转角度生成模块
k1 k2 n3 $W_N^{{n_3}\left( {{k_1} + 2{k_2}} \right)}$ 旋转角度
0 0 0, 1,···, N/4?1 $W_N^0$ 0
0 1 0, 1,···, N/4?1 $W_N^{2{n_3}}$ $0,\dfrac{ { {\rm{4} }{\text{π} } } }{ {{N} } },\dfrac{ { {\rm{8} }{\text{π} } } }{ {{N} } }, \ldots ,\dfrac{ { {\rm{4} }{\text{π} } } }{ {{N} } }\left( {\dfrac{ {{N} } }{ {\rm{4} } } - 1} \right)$
1 0 0, 1,···, N/4?1 $W_N^{{n_3}}$ $0,\dfrac{ { {\rm{2} }{\text{π} } } }{ {{N} } },\dfrac{ {4{\text{π} } } }{ {{N} } }, \ldots ,\dfrac{ { {\rm{2} }{\text{π} } } }{ {{N} } }\left( {\dfrac{ {{N} } }{ {\rm{4} } } - 1} \right)$
1 1 0, 1,···, N/4?1 $W_N^{3{n_3}}$ $0,\dfrac{ { {\rm{6} }{\text{π} } } }{ {{N} } },\dfrac{ { {\rm{12} }{\text{π} } } }{ {{N} } }, \ldots ,\dfrac{ { {\rm{6} }{\text{π} } } }{ {{N} } }\left( {\dfrac{ {{N} } }{ {\rm{4} } } - 1} \right)$
表 4  旋转角度规律表
图 10  增益因子M的SCM流水线加法器图
图 11  直流分量与余弦分量叠加信号的FFT结果图
FFT结果 结果幅值 信号幅值 相对误差/% 相位 相对误差/%
第0点 (64, 0) 64 0.062 5 0 ? ?
第100点 (55.418, ?32.031) 64.008 9 0.125 02 0.013 9 ?30.027 4 0.091 5
第300点 (0.031, 127.949) 127.949 0 0.249 9 0.039 8 89.986 1 0.015 4
第724点 (0.059, ?127.910) 127.910 0 0.249 8 0.070 3 ?89.973 6 0.02936
第924点 (55.523, 32.082) 64.125 3 0.125 24 0.195 8 30.019 9 0.066 5
表 5  信号频域特性测试数据表
方法 FPGA型号 N Slices LUTs Regs DSP48Es slice总量 f /MHz Tp /(MS·s?1 Tp,u /(kS·s?1
实际数 等效slice数
文献[21]方法 V7 64 ? 848 626 8 4 000 5 474 335 335 61.20
本文方法 V7 64 667 2 423 1 924 0 0 4 347 359.69 359.69 82.74
文献[22]方法 V5 256 ? 1 733 1 073 12 6 000 8 806 297 297 33.72
本文方法 V5 256 1 093 3 925 3 258 0 0 7 183 315.77 315.77 43.96
文献[23]方法 V7 1 024 ? 11 865 1 393 0 0 13 258 200 200 15.08
文献[21]方法 V7 1 024 ? 1 671 2 065 25 12 500 16 236 317 317 19.52
文献[20]方法 V7 1 024 ? 12 737 2 715 0 0 15 452 200 400 25.89
本文方法 V7 1 024 1 645 6 098 4 785 0 0 10 883 353.21 353.21 32.46
表 6  FPGA实现的电路性能和硬件资源比较表
理想系数 扩大取整后系数 实际旋转因子系数 相对误差/%
0.923 9 236 0.921 8 0.227
0.382 7 98 0.382 8 0.026
0.707 1 181 0.707 0 0.014
表 7  W16旋转角度误差表
1 GROGINSKY H L, WORKS G A A pipeline fast Fourier transform[J]. IEEE Transactions on Computers, 1970, 19 (11): 1015- 1019
2 HE S, TORKELSON M. A new approach to pipeline FFT processor[C]// International Parallel Processing Symposium. Hawaii: IEEE, 1996: 766-770.
3 GARRIDO M, QURESHI F, TAKALA J, et al. Hardware architectures for the fast Fourier transform[M]//SHUVRA S, LEUPERS R, TAKALA J, et al. Handbook of signal processing systems. Switzerland: Springer, 2018: 613-648.
4 QURESHI F. Optimization of rotations in FFTs[D]. Linköping: Linköping University, 2012.
5 GARRIDO M, HUANG S, CHEN S, et al The serial commutator FFT[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2016, 63 (10): 974- 978
doi: 10.1109/TCSII.2016.2538119
6 INGEMARSSON C, KALLSTROM P, GUSTAFSSON O. Using DSP block pre-adders in pipeline SDF FFT implementations in contemporary FPGAs[C]// 22nd International Conference on Field Programmable Logic and Applications. Oslo: IEEE, 2012: 71-74.
7 INGEMARSSON C, GUSTAFSSON O SFF: the single-stream FPGA-optimized feedforward FFT hardware architecture[J]. Journal of Signal Processing Systems, 2018, 90 (11): 1583- 1592
doi: 10.1007/s11265-018-1370-y
8 MA Z G, YIN X B, YU F A novel memory-based FFT architecture for real-valued signals based on a radix-2 decimation-in-frequency algorithm[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2015, 62 (9): 876- 880
doi: 10.1109/TCSII.2015.2435522
9 TANG A M, YU L, HAN F J, et al. CORDIC-based FFT real-time processing design and FPGA implementation[C]// 12th International Colloquium on Signal Processing and its Applications. Malacca: IEEE, 2016: 233-236.
10 SHI J Y, TIAN Y H, WANG M X, et al. A novel design of 1024-point pipelined FFT processor based on Cordic algorithm[C]// 2nd International Conference on Intelligent System Design and Engineering Application. Sanya: IEEE, 2012: 80-83.
11 MANKAR A, PRASAD N, DAS A D, et al Multiplier: less VLSI architectures for radix‐22 folded pipelined complex FFT core [J]. International Journal of Circuit Theory and Applications, 2015, 43 (11): 1743- 1758
doi: 10.1002/cta.2038
12 ZHANG J F, LIU H Z, CHEN T, et al Enhanced hardware efficient FFT processor based on adaptive recoding CORDIC[J]. Electronics and Electrical Engineering, 2013, 19 (4): 97- 103
13 MAHDAVI H, TIMARCHI S Area-time-power efficient FFT architectures based on binary-signed-digit CORDIC[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2019, 66 (10): 3874- 3881
doi: 10.1109/TCSI.2019.2922988
14 MEYER-BASE U, MEYER-BASE A, HILBERG W. Coordinate rotation digital computer (CORDIC) synthesis for FPGA[C]// International Workshop on Field-programmable Logic. Berlin: Springer, 1994: 397–408.
15 VORONENKO Y, PUSCHEL M Multiplierless multiple constant multiplication[J]. ACM Transactions on Algorithms, 2007, 3 (2): 11- 49
doi: 10.1145/1240233.1240234
16 KUMM M, HARDIECK M, WILLKOMM J, et al. Multiple constant multiplication with ternary adders[C]// 23rd International Conference on Field Programmable Logic and Applications. Porto: IEEE, 2013: 1-8.
17 KUMM M. Multiple constant multiplication optimizations for programmable gate arrays [M]. Wiesbaden: Springer, 2016.
18 EHLIAR A. Optimizing Xilinx designs through primitive instantiation[C]// 7th FPGA World Conference. Copenhagen: ACM, 2010: 20-27.
19 MA Y K, LIANG H H. Implementation of a pipeline large-FFT processor based on the FPGA[C]// International Conference in Communications, Signal Processing and Systems. Harbin: Springer, 2017: 638-644.
20 NGUYEN H N, KHAN S A, KIM C H, et al A high-performance, resource-efficient, reconfigurable parallel-pipelined FFT processor for FPGA platforms[J]. Microprocessors and Microsystems, 2018, 60: 96- 106
doi: 10.1016/j.micpro.2018.04.003
21 VALENCIA D, ALIMOHAMMAD A Compact and high-throughput parameterizable architectures for memory-based FFT algorithms[J]. IET Circuits, Devices and Systems, 2019, 13 (5): 696- 703
doi: 10.1049/iet-cds.2018.5556
22 WANG Z, LIU X, HE B, et al A combined SDC-SDF architecture for normal I/O pipelined radix-2 FFT[J]. IEEE Transactions on Very Large Scale Integration Systems, 2015, 23 (5): 973- 977
doi: 10.1109/TVLSI.2014.2319335
23 NGUYEN H N, KHAN S A, KIM C H, et al A pipelined FFT processor using an optimal hybrid rotation scheme for complex multiplication: design, FPGA implementation and analysis[J]. Electronics, 2018, 7 (8): 137
doi: 10.3390/electronics7080137
[1] 孙曙光,田朋,杜太行,王景芹. 基于apFFT-AMD的密集频率谐波/间谐波检测[J]. 浙江大学学报(工学版), 2020, 54(1): 178-188.
[2] 张心怡, 杨家强, 张晓军. 基于机会约束的含多风电场动态经济调度[J]. 浙江大学学报(工学版), 2017, 51(5): 976-983.
[3] 楼文娟,王嘉伟,杨伦,陈勇. 雷暴风三维脉动风速场数值模拟[J]. 浙江大学学报(工学版), 2014, 48(7): 1162-1169.
[4] 张进高,卢琴芬,王利,叶云岳, 洪伟明. 基于FFT的直线电机地铁牵引电机功率因数角测量[J]. J4, 2012, 46(11): 1981-1984.
[5] 梅德庆 刚宪约 陈子辰. 一种快速高精度Duhamel积分算法[J]. J4, 2005, 39(8): 1152-1155.