Please wait a minute...
Journal of ZheJiang University (Engineering Science)  2021, Vol. 55 Issue (6): 1199-1207    DOI: 10.3785/j.issn.1008-973X.2021.06.021
    
Hardware efficient FFT design based on improved rotation factor
Yang LUO(),Wei ZHANG*()
School of Microelectronic, Tianjin University, Tianjin 300072, China
Download: HTML     PDF(1304KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

A hardware efficient Radix-22 fast Fourier transform based on a single-path delay feedback architecture was designed aiming at the problem that the rotation factor module takes up more resources in FFT hardware implementation. The method of mixing CORDIC and MCM was adopted to design the rotation factor module to realize FFT architecture without conventional multiplier and DSP48E resource. MCM method based on ternary adders was used to minimize the number of adders for the W16 rotation factor module with less rotation angles. CORDIC method was adopted for the rotation factor modules of W64, W256 and W1 024 with more rotation angles. The real-time generation module of rotation angles was designed according to the mathematical law of rotation angles. The method does not need to occupy ROM resources and avoids complex addressing logic and timing control compared with the traditional CORDIC method. The designed 16 bit 64-point FFT improves the throughput per slice by 35.20% on Xilinx Virtex-7, the 256-point FFT improves by 30.37% on Virtex-5, and the 1 024-point FFT improves by 25.38% on Virtex-7 compared with other architectures.



Key wordsfast Fourier transform      single-path delay feedback architecture      constant multiplier      coordinate rotation digital computer method     
Received: 07 June 2020      Published: 30 July 2021
CLC:  TN 47  
Fund:  光电信息控制和安全技术重点实验室资助项目(JCKY2019210C053)
Corresponding Authors: Wei ZHANG     E-mail: 2018232032@tju.edu.cn;tjuzhangwei@tju.edu.cn
Cite this article:

Yang LUO,Wei ZHANG. Hardware efficient FFT design based on improved rotation factor. Journal of ZheJiang University (Engineering Science), 2021, 55(6): 1199-1207.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2021.06.021     OR     https://www.zjujournals.com/eng/Y2021/V55/I6/1199


基于改进旋转因子的高性能FFT硬件设计

针对FFT硬件实现中旋转因子模块占用资源较多的问题,设计高性能单路延时反馈结构的基22快速傅里叶变换. 采用CORDIC与MCM混合的方法设计旋转因子模块,实现了无需常规乘法器的FFT架构,不必占用DSP48E资源. 对于旋转角度数量较少的W16旋转因子模块,采用基于三输入加法器的MCM方法设计,将加法器数量降到最低. 对于旋转角度数量较多的W64W256W1 024模块,采用CORDIC方法设计. 依据旋转角度的数学规律,设计旋转角度实时生成模块,与传统的CORDIC方法相比,不需要占用ROM资源,避免了复杂的寻址逻辑和时序控制. 与其他构架相比,设计的16 bit 64点快速傅里叶变换在Xilinx Virtex-7上将单位slice吞吐率提高了35.20%,256点FFT在Virtex-5上提高了30.37%,1 024点FFT在Virtex-7上提高了25.38%.


关键词: 快速傅里叶变换,  单路延迟反馈架构,  常数乘法器,  坐标旋转数字计算方法 
Fig.1 Radix 22 SDF FFT architecture for 1 024-Point
计数器 计数延迟 计数器位 控制信号 控制模块
Counter_1 0clk Counter_1[9] Control_10 BF10、TR10
Counter_1 0clk Counter_1[8] Control_9 BF9、W1 024
Counter_2 24clk Counter_2[7] Control_8 BF8、TR8
Counter_2 24clk Counter_2[6] Control_7 BF7、W256
Counter_3 24clk Counter_3[5] Control_6 BF6、TR6
Counter_3 24clk Counter_3[4] Control_5 BF5、W64
Counter_4 24clk Counter_4[3] Control_4 BF4、TR4
Counter_4 24clk Counter_4[2] Control_3 BF3、W16
Counter_5 3clk Counter_5[1] Control_2 BF2、TR2
Counter_5 3clk Counter_5[0] Control_1 BF1
Tab.1 Table of control signal
Fig.2 Radix 22 SDF FFT sequence diagram for 1 024-point
Fig.3 Structure of butterfly elements
Fig.4 Butterfly element sequence diagram
Fig.5 Multiplier of trivial rotation (-j)
Counter_4 ${ {\rm{exp} }\;({ {\rm{ - j} }2{\text{π}} nk/N}) }$ n θ 实部 虚部
0、4、8、12~15 ${{\rm{e}}^0}$ 0 0 1 0
5 ${ {\rm{exp} }\;({ {\rm{ - j} }{\text{π}} k/8}) }$ 1 π/8 0.923 9 ?0.382 7
1、6 ${ {\rm{exp} }\;({ {\rm{ - j} }{\text{π}} k/4} )}$ 2 π/4 0.707 1 ?0.707 1
7、9 ${ {\rm{exp} }\;({ {\rm{ - j3} }{\text{π}} k/8}) }$ 3 3π/8 0.382 7 ?0.923 9
2 ${ {\rm{exp} }\;({ {\rm{ - j} }{\text{π}} k/2}) }$ 4 π/2 0 ?1
3、10 ${ {\rm{exp} }\;({ {\rm{ - j3} }{\text{π}} k/4}) }$ 6 3π/4 ?0.707 1 ?0.707 1
11 ${ {\rm{exp} }\;({ {\rm{ - j9} }{\text{π}} k/8}) }$ 9 9π/8 ?0.923 9 0.382 7
Tab.2 Coefficients of rotation W16
Fig.6 Pipelined adder graph of MCM with coefficients {236、97、181}
Fig.7 Multiplier of general rotation W16
Fig.8 Structure of pipelined CORDIC unit
旋转角度Z0 角度预处理 X0 Y0
[0, π/2] Z Data_im Data_re
[π/2, π] Z ? π/2 ?Data_re Data_im
[π, 3π/2] Z ? π ?Data_im ?Data_re
Tab.3 Pretreatment of rotation angle
Fig.9 Angle generator module
k1 k2 n3 $W_N^{{n_3}\left( {{k_1} + 2{k_2}} \right)}$ 旋转角度
0 0 0, 1,···, N/4?1 $W_N^0$ 0
0 1 0, 1,···, N/4?1 $W_N^{2{n_3}}$ $0,\dfrac{ { {\rm{4} }{\text{π} } } }{ {{N} } },\dfrac{ { {\rm{8} }{\text{π} } } }{ {{N} } }, \ldots ,\dfrac{ { {\rm{4} }{\text{π} } } }{ {{N} } }\left( {\dfrac{ {{N} } }{ {\rm{4} } } - 1} \right)$
1 0 0, 1,···, N/4?1 $W_N^{{n_3}}$ $0,\dfrac{ { {\rm{2} }{\text{π} } } }{ {{N} } },\dfrac{ {4{\text{π} } } }{ {{N} } }, \ldots ,\dfrac{ { {\rm{2} }{\text{π} } } }{ {{N} } }\left( {\dfrac{ {{N} } }{ {\rm{4} } } - 1} \right)$
1 1 0, 1,···, N/4?1 $W_N^{3{n_3}}$ $0,\dfrac{ { {\rm{6} }{\text{π} } } }{ {{N} } },\dfrac{ { {\rm{12} }{\text{π} } } }{ {{N} } }, \ldots ,\dfrac{ { {\rm{6} }{\text{π} } } }{ {{N} } }\left( {\dfrac{ {{N} } }{ {\rm{4} } } - 1} \right)$
Tab.4 Mathematical law of rotation angles
Fig.10 Pipelined adder graph of SCM with coefficient M
Fig.11 FFT result of signal superimposed by DC component and cosine component
FFT结果 结果幅值 信号幅值 相对误差/% 相位 相对误差/%
第0点 (64, 0) 64 0.062 5 0 ? ?
第100点 (55.418, ?32.031) 64.008 9 0.125 02 0.013 9 ?30.027 4 0.091 5
第300点 (0.031, 127.949) 127.949 0 0.249 9 0.039 8 89.986 1 0.015 4
第724点 (0.059, ?127.910) 127.910 0 0.249 8 0.070 3 ?89.973 6 0.02936
第924点 (55.523, 32.082) 64.125 3 0.125 24 0.195 8 30.019 9 0.066 5
Tab.5 Table of signal frequency domain characteristic test data
方法 FPGA型号 N Slices LUTs Regs DSP48Es slice总量 f /MHz Tp /(MS·s?1 Tp,u /(kS·s?1
实际数 等效slice数
文献[21]方法 V7 64 ? 848 626 8 4 000 5 474 335 335 61.20
本文方法 V7 64 667 2 423 1 924 0 0 4 347 359.69 359.69 82.74
文献[22]方法 V5 256 ? 1 733 1 073 12 6 000 8 806 297 297 33.72
本文方法 V5 256 1 093 3 925 3 258 0 0 7 183 315.77 315.77 43.96
文献[23]方法 V7 1 024 ? 11 865 1 393 0 0 13 258 200 200 15.08
文献[21]方法 V7 1 024 ? 1 671 2 065 25 12 500 16 236 317 317 19.52
文献[20]方法 V7 1 024 ? 12 737 2 715 0 0 15 452 200 400 25.89
本文方法 V7 1 024 1 645 6 098 4 785 0 0 10 883 353.21 353.21 32.46
Tab.6 FPGA implementation results of circuit performance and hardware resources for different architectures
理想系数 扩大取整后系数 实际旋转因子系数 相对误差/%
0.923 9 236 0.921 8 0.227
0.382 7 98 0.382 8 0.026
0.707 1 181 0.707 0 0.014
Tab.7 Rotation angle error of W16
[1]   GROGINSKY H L, WORKS G A A pipeline fast Fourier transform[J]. IEEE Transactions on Computers, 1970, 19 (11): 1015- 1019
[2]   HE S, TORKELSON M. A new approach to pipeline FFT processor[C]// International Parallel Processing Symposium. Hawaii: IEEE, 1996: 766-770.
[3]   GARRIDO M, QURESHI F, TAKALA J, et al. Hardware architectures for the fast Fourier transform[M]//SHUVRA S, LEUPERS R, TAKALA J, et al. Handbook of signal processing systems. Switzerland: Springer, 2018: 613-648.
[4]   QURESHI F. Optimization of rotations in FFTs[D]. Linköping: Linköping University, 2012.
[5]   GARRIDO M, HUANG S, CHEN S, et al The serial commutator FFT[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2016, 63 (10): 974- 978
doi: 10.1109/TCSII.2016.2538119
[6]   INGEMARSSON C, KALLSTROM P, GUSTAFSSON O. Using DSP block pre-adders in pipeline SDF FFT implementations in contemporary FPGAs[C]// 22nd International Conference on Field Programmable Logic and Applications. Oslo: IEEE, 2012: 71-74.
[7]   INGEMARSSON C, GUSTAFSSON O SFF: the single-stream FPGA-optimized feedforward FFT hardware architecture[J]. Journal of Signal Processing Systems, 2018, 90 (11): 1583- 1592
doi: 10.1007/s11265-018-1370-y
[8]   MA Z G, YIN X B, YU F A novel memory-based FFT architecture for real-valued signals based on a radix-2 decimation-in-frequency algorithm[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2015, 62 (9): 876- 880
doi: 10.1109/TCSII.2015.2435522
[9]   TANG A M, YU L, HAN F J, et al. CORDIC-based FFT real-time processing design and FPGA implementation[C]// 12th International Colloquium on Signal Processing and its Applications. Malacca: IEEE, 2016: 233-236.
[10]   SHI J Y, TIAN Y H, WANG M X, et al. A novel design of 1024-point pipelined FFT processor based on Cordic algorithm[C]// 2nd International Conference on Intelligent System Design and Engineering Application. Sanya: IEEE, 2012: 80-83.
[11]   MANKAR A, PRASAD N, DAS A D, et al Multiplier: less VLSI architectures for radix‐22 folded pipelined complex FFT core [J]. International Journal of Circuit Theory and Applications, 2015, 43 (11): 1743- 1758
doi: 10.1002/cta.2038
[12]   ZHANG J F, LIU H Z, CHEN T, et al Enhanced hardware efficient FFT processor based on adaptive recoding CORDIC[J]. Electronics and Electrical Engineering, 2013, 19 (4): 97- 103
[13]   MAHDAVI H, TIMARCHI S Area-time-power efficient FFT architectures based on binary-signed-digit CORDIC[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2019, 66 (10): 3874- 3881
doi: 10.1109/TCSI.2019.2922988
[14]   MEYER-BASE U, MEYER-BASE A, HILBERG W. Coordinate rotation digital computer (CORDIC) synthesis for FPGA[C]// International Workshop on Field-programmable Logic. Berlin: Springer, 1994: 397–408.
[15]   VORONENKO Y, PUSCHEL M Multiplierless multiple constant multiplication[J]. ACM Transactions on Algorithms, 2007, 3 (2): 11- 49
doi: 10.1145/1240233.1240234
[16]   KUMM M, HARDIECK M, WILLKOMM J, et al. Multiple constant multiplication with ternary adders[C]// 23rd International Conference on Field Programmable Logic and Applications. Porto: IEEE, 2013: 1-8.
[17]   KUMM M. Multiple constant multiplication optimizations for programmable gate arrays [M]. Wiesbaden: Springer, 2016.
[18]   EHLIAR A. Optimizing Xilinx designs through primitive instantiation[C]// 7th FPGA World Conference. Copenhagen: ACM, 2010: 20-27.
[19]   MA Y K, LIANG H H. Implementation of a pipeline large-FFT processor based on the FPGA[C]// International Conference in Communications, Signal Processing and Systems. Harbin: Springer, 2017: 638-644.
[20]   NGUYEN H N, KHAN S A, KIM C H, et al A high-performance, resource-efficient, reconfigurable parallel-pipelined FFT processor for FPGA platforms[J]. Microprocessors and Microsystems, 2018, 60: 96- 106
doi: 10.1016/j.micpro.2018.04.003
[21]   VALENCIA D, ALIMOHAMMAD A Compact and high-throughput parameterizable architectures for memory-based FFT algorithms[J]. IET Circuits, Devices and Systems, 2019, 13 (5): 696- 703
doi: 10.1049/iet-cds.2018.5556
[22]   WANG Z, LIU X, HE B, et al A combined SDC-SDF architecture for normal I/O pipelined radix-2 FFT[J]. IEEE Transactions on Very Large Scale Integration Systems, 2015, 23 (5): 973- 977
doi: 10.1109/TVLSI.2014.2319335
[23]   NGUYEN H N, KHAN S A, KIM C H, et al A pipelined FFT processor using an optimal hybrid rotation scheme for complex multiplication: design, FPGA implementation and analysis[J]. Electronics, 2018, 7 (8): 137
doi: 10.3390/electronics7080137
[1] Shu-guang SUN,Peng TIAN,Tai-hang DU,Jing-qin WANG. Dense frequency harmonic/interharmonic detection based on apFFT-AMD[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(1): 178-188.
[2] ZHANG Xin-yi, YANG Jia-qiang, ZHANG Xiao-jun. Dynamic economic dispatch incorporating multiple wind farms based on FFT simplified chance constrained programming[J]. Journal of ZheJiang University (Engineering Science), 2017, 51(5): 976-983.
[3] LOU Wen-juan, WANG Jia-wei, YANG Lun, CHEN Yong. Simulation of three-dimensional fluctuating wind velocity field #br# upon thunderstorm downburst[J]. Journal of ZheJiang University (Engineering Science), 2014, 48(7): 1162-1169.
[4] ZHANG Jin-gao, LU Qin-fen, WANG Li, YE Yun-yue, Hong Wei-ming. Measurement of power factor angle of linear metro’s traction
motor based on FFT
[J]. Journal of ZheJiang University (Engineering Science), 2012, 46(11): 1981-1984.