Front. Inform. Technol. Electron. Eng.  2011, Vol. 12 Issue (12): 976-989    DOI: 10.1631/jzus.C1100027
    
Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture
Dan Wu, Xue-cheng Zou, Kui Dai*, Jin-li Rao, Pan Chen, Zhao-xia Zheng
Department of Electronic Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Abstract: The fast Fourier transform (FFT) is a fundamental kernel of many computation-intensive scientific applications. This paper presents an implementation of the FFT on the Engineering and Scientific Computation Accelerator (ESCA), a heterogeneous multi-core architecture designed to accelerate computation-intensive parallel computing in scientific and engineering applications. ESCA consists of a control unit and a single instruction multiple data (SIMD) processing element (PE) array, in which the PEs communicate with each other via a hierarchical two-level network-on-chip (NoC) with high bandwidth and low latency. We exploit the architectural features of ESCA to implement a parallel FFT algorithm efficiently. Experimental results show that both the proposed parallel FFT algorithm and the ESCA architecture are scalable. The 16-bit fixed-point parallel FFT performance of ESCA is compared with that of a published work to show the advantage of the mapping algorithm and the hardware architecture. The floating-point parallel FFT performance of ESCA is evaluated and compared with that of the IBM Cell processor and GPU to demonstrate the computing power of the ESCA system for high-performance applications.

Key words: Fast Fourier transform (FFT), Multi-core, Parallel computing, SIMD
Received: 26 January 2011      Published: 30 November 2011
CLC:  TP302.7  
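
The abstract refers to mapping a parallel FFT onto a SIMD PE array. As a rough illustration of why the FFT suits that style of hardware, the following is a minimal sketch of a generic iterative radix-2 decimation-in-time FFT in Python. It is not the mapping algorithm proposed in the paper, and it does not model ESCA's control unit, PE array, or NoC; the function names `bit_reverse_permute` and `fft_radix2` are hypothetical and chosen only for this example.

```python
import cmath


def bit_reverse_permute(values):
    """Return a copy of values reordered by bit-reversed index."""
    n = len(values)
    bits = n.bit_length() - 1  # log2(n) for a power-of-two length
    out = [0j] * n
    for i, v in enumerate(values):
        rev = 0
        for b in range(bits):
            if (i >> b) & 1:
                rev |= 1 << (bits - 1 - b)
        out[rev] = v
    return out


def fft_radix2(values):
    """Generic iterative radix-2 FFT (illustrative only, not the paper's method)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = bit_reverse_permute([complex(v) for v in values])
    half = 1
    while half < n:  # one pass per stage, log2(n) stages in total
        # Within a stage, each butterfly touches its own disjoint pair of
        # elements, so all n/2 butterflies are independent of one another.
        # That independence is what a SIMD array could exploit in principle.
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-1j * cmath.pi * k / half)  # twiddle factor
                a = data[start + k]
                b = data[start + k + half] * w
                data[start + k] = a + b
                data[start + k + half] = a - b
        half *= 2
    return data
```

Calling `fft_radix2([1, 2, 3, 4, 0, -1, -2, -3])` returns the eight complex DFT coefficients of that sequence. The distance between butterfly partners (`half`) doubles at every stage, so on a real PE array the later stages would require data exchange between elements; how ESCA handles that exchange is described in the paper itself and is not reproduced here.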
Cite this article:

Dan Wu, Xue-cheng Zou, Kui Dai, Jin-li Rao, Pan Chen, Zhao-xia Zheng. Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture. Front. Inform. Technol. Electron. Eng., 2011, 12(12): 976-989.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/jzus.C1100027     OR     http://www.zjujournals.com/xueshu/fitee/Y2011/V12/I12/976

