Front. Inform. Technol. Electron. Eng.  2011, Vol. 12 Issue (12): 976-989    DOI: 10.1631/jzus.C1100027
    
Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture
Dan Wu, Xue-cheng Zou, Kui Dai*, Jin-li Rao, Pan Chen, Zhao-xia Zheng
Department of Electronic Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Abstract: The fast Fourier transform (FFT) is a fundamental kernel of many computation-intensive scientific applications. This paper presents an implementation of the FFT on the Engineering and Scientific Computation Accelerator (ESCA), a heterogeneous multi-core architecture designed to accelerate computation-intensive parallel computing in scientific and engineering applications. ESCA consists of a control unit and a single instruction multiple data (SIMD) processing element (PE) array, in which the PEs communicate with each other via a hierarchical two-level network-on-chip (NoC) with high bandwidth and low latency. We exploit the architectural features of ESCA to implement a parallel FFT algorithm efficiently. Experimental results show that both the proposed parallel FFT algorithm and the ESCA architecture are scalable. The 16-bit fixed-point parallel FFT performance of ESCA is compared with that of a published work to demonstrate the advantages of the mapping algorithm and the hardware architecture. The floating-point parallel FFT performance of ESCA is evaluated and compared with that of the IBM Cell processor and a GPU to demonstrate the computing power of the ESCA system for high-performance applications.
Key words: Fast Fourier transform (FFT); Multi-core; Parallel computing; SIMD
Received: 2011-01-26    Published: 2011-11-30
CLC:  TP302.7  
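
The abstract does not spell out the mapping algorithm, so the following is only a generic illustration of how an N-point FFT can be partitioned across P processing elements. Each hypothetical PE runs the same radix-2 FFT on its own decimated slice of the input (the SIMD-style local step), and the partial results are then combined with twiddle factors (a stand-in for the inter-PE exchange over the NoC). The function names, the data layout, and the direct combine step are assumptions made for this sketch; they are not the ESCA mapping described in the paper.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def parallel_fft(x, num_pes):
    """N-point FFT split across num_pes hypothetical PEs (illustrative only).

    Local step: PE p transforms the decimated subsequence x[p::num_pes].
    Combine step: X[k] = sum_p W_N^(p*k) * local[p][k mod M], with M = N / P.
    """
    n = len(x)
    if num_pes <= 0 or n % num_pes != 0:
        raise ValueError("num_pes must be positive and divide len(x)")
    m = n // num_pes
    if m & (m - 1) != 0:
        raise ValueError("len(x) / num_pes must be a power of two")

    # Every PE executes the same routine on different data.
    local = [fft_radix2(x[p::num_pes]) for p in range(num_pes)]

    # Combine partial spectra; on real hardware this needs inter-PE traffic.
    out = [0j] * n
    for k in range(n):
        acc = 0j
        for p in range(num_pes):
            acc += cmath.exp(-2j * cmath.pi * p * k / n) * local[p][k % m]
        out[k] = acc
    return out


def dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking results."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Calling `parallel_fft(samples, 4)` on a 16-point input should agree with `dft(samples)` up to floating-point rounding. The combine step here costs O(N·P) and is written for clarity; a practical implementation would replace it with further butterfly stages.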

Cite this article:

Dan Wu, Xue-cheng Zou, Kui Dai, Jin-li Rao, Pan Chen, Zhao-xia Zheng. Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture. Front. Inform. Technol. Electron. Eng., 2011, 12(12): 976-989.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1100027
http://www.zjujournals.com/xueshu/fitee/CN/Y2011/V12/I12/976
