Front. Inform. Technol. Electron. Eng.  2012, Vol. 13 Issue (9): 711-718    DOI: 10.1631/jzus.C1200043
    
A floating point conversion algorithm for mixed precision computations
Choon Lih Hoo, Sallehuddin Mohamed Haris, Nik Abdullah Nik Mohamed
Department of Mechanical and Materials Engineering, Universiti Kebangsaan Malaysia, UKM Bangi 43600, Malaysia
Abstract: The floating point number is the most commonly used real number representation for digital computation due to its high precision. It is used on computers and in single-chip applications such as DSP chips. Double precision (64-bit) representations allow a wider range of real numbers to be denoted, but single precision (32-bit) operations are more efficient. Recently, there has been increasing interest in mixed precision computations, which exploit the efficiency of single precision arithmetic while working with 64-bit numbers. This calls for the ability to interchange between the two formats. In this paper, an algorithm that converts floating point numbers from the 64-bit to the 32-bit representation is presented. The algorithm was implemented in Verilog and tested on a field programmable gate array (FPGA) using the Quartus II DE2 board and an Agilent 16821A portable logic analyzer. Results indicate that the algorithm performs the conversion reliably and accurately within a constant execution time of 25 ns at a 20 MHz clock frequency, regardless of the number being converted.
Key words: Double precision; Single precision; FPGA; Verilog; HooHar algorithm
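
For orientation, the sketch below shows a minimal combinational Verilog module that performs a generic IEEE 754 double-to-single conversion by field extraction, exponent re-biasing (1023 to 127), and mantissa truncation (52 to 23 bits). It is not the HooHar algorithm described in the paper; the module name fp64_to_fp32, the truncating (round-toward-zero) mantissa handling, and the flush-to-zero treatment of underflow are assumptions made for this illustration only.

// Minimal combinational sketch of generic IEEE 754 double-to-single conversion.
// Not the paper's HooHar algorithm; rounding is plain truncation and
// subnormal results are flushed to zero.
module fp64_to_fp32 (
    input  wire [63:0] d,   // IEEE 754 double-precision input
    output reg  [31:0] s    // IEEE 754 single-precision output
);
    wire        sign = d[63];
    wire [10:0] exp  = d[62:52];      // exponent field, bias 1023
    wire [51:0] frac = d[51:0];       // 52-bit mantissa field

    // Re-bias the exponent from 1023 to 127; a wider signed value is used
    // so that overflow (>= 255) and underflow (<= 0) can be detected.
    wire signed [12:0] new_exp = $signed({2'b00, exp}) - 13'sd1023 + 13'sd127;

    always @* begin
        if (exp == 11'h7FF)                           // Inf or NaN
            s = {sign, 8'hFF,
                 (frac != 52'd0 && frac[51:29] == 23'd0) ? 23'd1 : frac[51:29]};
        else if (new_exp >= 13'sd255)                 // overflow -> +/- infinity
            s = {sign, 8'hFF, 23'd0};
        else if (new_exp <= 13'sd0)                   // underflow -> +/- 0 (flush)
            s = {sign, 31'd0};
        else                                          // normal case: truncate 52 -> 23 bits
            s = {sign, new_exp[7:0], frac[51:29]};
    end
endmodule

The sketch is purely combinational and carries no clocking, rounding logic, or board-specific interfacing; the paper's reported 25 ns constant execution time refers to its own Verilog implementation exercised on the DE2 board and logic analyzer setup described in the abstract.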
Received: 2012-02-21    Published: 2012-09-05
CLC:  TN402  

Cite this article:

Choon Lih Hoo, Sallehuddin Mohamed Haris, Nik Abdullah Nik Mohamed. A floating point conversion algorithm for mixed precision computations. Front. Inform. Technol. Electron. Eng., 2012, 13(9): 711-718.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1200043        http://www.zjujournals.com/xueshu/fitee/CN/Y2012/V13/I9/711
