JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
Telecommunication Technology
Memory-based FFT processor for arbitrary 2^k-point FFT computations
XIA Kai feng, ZHOU Xiao ping, WU Bin
Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China

Abstract  

An efficient memory-based fast Fourier transform (FFT) processor with a parallel, conflict-free addressing scheme was designed for arbitrary 2^k-point FFT computations. The addressing scheme supports in-place computation, continuous-flow operation, and variable transform sizes for FFTs of arbitrary 2^k length. Under this scheme, the achievable throughput of the processor is determined by the number of data-decimation restriction sets at each stage, so the throughput can be adjusted by changing the computation radix and the parallelism of the arithmetic processing units. To verify the feasibility of the scheme, a configurable 128- to 2048-point FFT processor for LTE systems was designed. Implemented in SMIC 55 nm technology, the processor occupies a core area of 0.615 mm² and consumes 32.4 mW at 122.88 MHz. The ASIC results show that the proposed addressing scheme offers excellent flexibility in transform length and good hardware efficiency, and can support almost any 2^k-point FFT implementation.
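The idea of conflict-free addressing can be illustrated with the classical parity-based bank assignment for radix-2 in-place FFTs, in the spirit of the schemes of Cohen and Johnson that this line of work builds on. The sketch below is not the scheme proposed in the article; it only assumes two memory banks and one radix-2 butterfly per cycle, and the function names are hypothetical. It checks that the two operands of every butterfly fall into different banks, so both can be read and written back in the same cycle.

```python
def bank_of(addr: int) -> int:
    """Parity of the address bits; selects one of two memory banks."""
    return bin(addr).count("1") & 1


def butterfly_pairs(k: int, stage: int):
    """Yield the operand addresses (a, b) of every radix-2 butterfly in one
    stage of an in-place 2**k-point FFT; a and b differ only in bit `stage`."""
    n = 1 << k
    span = 1 << stage
    for a in range(n):
        if a & span == 0:
            yield a, a | span


def is_conflict_free(k: int) -> bool:
    """True if no butterfly in any stage needs two words from the same bank."""
    return all(
        bank_of(a) != bank_of(b)
        for stage in range(k)
        for a, b in butterfly_pairs(k, stage)
    )


if __name__ == "__main__":
    # 128- to 2048-point transforms, the sizes named in the abstract
    for k in range(7, 12):
        print(2 ** k, is_conflict_free(k))
```

Because the two operands of a radix-2 butterfly differ in exactly one address bit, their parities always differ, which is why the check passes for every k. Higher radices and several parallel butterflies need more banks and a richer mapping, which is the problem the article addresses.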



Published: 01 November 2016
CLC: TN 47; TN 914.3
Cite this article:

XIA Kai feng, ZHOU Xiao ping, WU Bin. Memory-based FFT processor for arbitrary 2^k-point FFT computations. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2016, 50(11): 2239-2244.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2016.11.027     OR     http://www.zjujournals.com/eng/Y2016/V50/I11/2239


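Memory-structured Fourier transform processor for arbitrary 2^k points

For fast Fourier transform (FFT) operations of arbitrary 2^k points, a memory-based FFT processor with a parallel conflict-free addressing strategy is designed and implemented. The strategy supports in-place write-back, a continuous-frame computation mode, multiple variable point counts, and FFTs of arbitrary 2^k length. With this addressing strategy, the throughput the FFT processor can achieve is determined by the number of restriction-condition sets at each decimation stage. The strategy can therefore adjust the throughput in a controlled way by changing the radix of the computing units and their degree of parallelism. To verify the feasibility of the strategy, a configurable 128- to 2048-point FFT processor for Long Term Evolution (LTE) systems is designed. The processor is implemented in the SMIC 55 nm CMOS process; at an operating frequency of 122.88 MHz, its core area is 0.615 mm² and its power consumption is 32.4 mW. The ASIC results show that the proposed strategy offers excellent flexibility in computation length and good hardware efficiency, and can support FFTs of arbitrary 2^k length.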
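How radix and parallelism trade against throughput can be seen from a rough cycle count. The sketch below is an idealized textbook estimate, not the article's restriction-set analysis: it assumes the transform length is an exact power of the radix, that `units` butterflies are issued every cycle without memory conflicts, and it ignores pipeline latency. The function name and parameters are hypothetical.

```python
def ideal_cycles(n_points: int, radix: int, units: int) -> int:
    """Butterfly cycles for one in-place FFT under idealized assumptions.

    Each of the log_radix(n_points) stages contains n_points / radix
    butterflies, and `units` of them are processed per cycle.
    """
    stages = 0
    size = 1
    while size < n_points:
        size *= radix
        stages += 1
    if size != n_points:
        raise ValueError("n_points must be an exact power of radix")
    per_stage = n_points // radix
    cycles_per_stage = -(-per_stage // units)  # ceiling division
    return stages * cycles_per_stage


if __name__ == "__main__":
    print(ideal_cycles(2048, 2, 1))  # 11 stages * 1024 butterflies = 11264
    print(ideal_cycles(2048, 2, 4))  # 11 stages * 256 cycles = 2816
    print(ideal_cycles(1024, 4, 1))  # 5 stages * 256 butterflies = 1280
```

Under these assumptions, doubling the number of butterfly units halves the cycle count, and moving to a higher radix reduces both the number of stages and the butterflies per stage, at the cost of a larger butterfly and more simultaneous memory accesses.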

