Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture

doi:10.1631/jzus.C1100027

Front. Inform. Technol. Electron. Eng., 2011, Vol. 12, Issue 12: 976-989. DOI: 10.1631/jzus.C1100027


Dan Wu, Xue-cheng Zou, Kui Dai^*, Jin-li Rao, Pan Chen, Zhao-xia Zheng

Department of Electronic Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China


Abstract: The fast Fourier transform (FFT) is a fundamental kernel of many computation-intensive scientific applications. This paper presents an implementation of the FFT on the Engineering and Scientific Computation Accelerator (ESCA), a heterogeneous multi-core architecture designed to accelerate computation-intensive parallel computing in scientific and engineering applications. ESCA consists of a control unit and a single instruction multiple data (SIMD) processing element (PE) array, in which the PEs communicate via a hierarchical two-level network-on-chip (NoC) with high bandwidth and low latency. We exploit these architectural features to implement a parallel FFT algorithm efficiently. Experimental results show that both the proposed parallel FFT algorithm and the ESCA architecture are scalable. The 16-bit fixed-point parallel FFT performance of ESCA is compared with that of a published work to demonstrate the advantages of the mapping algorithm and the hardware architecture. The floating-point parallel FFT performance of ESCA is evaluated and compared with that of the IBM Cell processor and GPU to demonstrate the computing power of the ESCA system for high-performance applications.

Key words: Fast Fourier transform (FFT), Multi-core, Parallel computing, SIMD

Received: 26 January 2011 Published: 30 November 2011

CLC: TP302.7


Cite this article:

Dan Wu, Xue-cheng Zou, Kui Dai, Jin-li Rao, Pan Chen, Zhao-xia Zheng. Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture. Front. Inform. Technol. Electron. Eng., 2011, 12(12): 976-989.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/jzus.C1100027 OR http://www.zjujournals.com/xueshu/fitee/Y2011/V12/I12/976
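The abstract describes running a parallel FFT on a SIMD PE array. As a rough illustration only, the sketch below shows why the FFT suits that style of hardware: in a textbook radix-2 Cooley-Tukey FFT, each stage consists of N/2 mutually independent butterflies that could be executed in lockstep. This is not the paper's mapping algorithm and does not model the NoC; the function names (`fft_lockstep`, `dft_reference`) and the use of a loop index `pe` to stand in for a processing element are assumptions made for this example.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder x so that element i moves to the bit-reversed index of i."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = x[i]
    return out


def fft_lockstep(x):
    """Radix-2 decimation-in-time FFT written stage by stage.

    Within one stage, every butterfly reads only from the previous
    stage's data, so all n/2 butterflies are independent. The inner
    loop over `pe` is a serial stand-in for hypothetical PEs that
    would execute the same instruction on different data.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = bit_reverse_permute([complex(v) for v in x])
    half = 1  # half the butterfly group size in the current stage
    while half < n:
        nxt = list(data)
        for pe in range(n // 2):  # one butterfly per hypothetical PE
            group, offset = divmod(pe, half)
            top = group * 2 * half + offset
            bot = top + half
            w = cmath.exp(-2j * cmath.pi * offset / (2 * half))  # twiddle
            t = w * data[bot]
            nxt[top] = data[top] + t
            nxt[bot] = data[top] - t
        data = nxt
        half *= 2
    return data


def dft_reference(x):
    """Direct O(n^2) DFT, used only to check the FFT result."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 0, -1, -2, -3]
    fast = fft_lockstep(signal)
    slow = dft_reference(signal)
    print(max(abs(a - b) for a, b in zip(fast, slow)) < 1e-9)
```

On real hardware the data exchanged between butterflies in later stages would have to move between PEs, which is where an interconnect such as the NoC mentioned in the abstract would matter; the sketch sidesteps this by keeping all data in one list.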


