JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
Electrical Engineering     
Optimizing implementation of Gaussian filter based on field programmable gate array
CHEN Chao, LUO Xiao-hua, CHEN Shu-qun, YU Guo-jun
1. College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China;
2. Spreadtrum Technology(Hangzhou) limited company, Hangzhou 310052, China

Abstract  

An optimized hardware implementation based on addend compression was proposed to address the long critical path and large logic delay of traditional hardware implementations of the Gaussian filter. In the optimized implementation, shift operations were used to perform multiplication and division, which avoided the use of multipliers and dividers. Three circuit structures, i.e. the carry-save adder (CSA), the 4-2 compressor based on two multiplexers (MUX), and a tree structure for addend compression, were used to compress the nine addends in three levels. After optimization, only one full adder was needed to obtain the final sum. The experimental results show that the optimized hardware implementation of the Gaussian filter with addend compression shortens the critical path and reduces the logic delay. Compared with the traditional implementation, the proposed method reduces the logic delay by 32.48% and uses fewer adder macro cells. The adder macro cells saved provide greater design freedom for the subsequent image processing units.
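The data flow described in the abstract can be modeled in software as an illustration. The Python sketch below is not the authors' circuit. It assumes the common 3×3 kernel with weights 1-2-1, 2-4-2, 1-2-1 and a divisor of 16, since the abstract does not state the kernel. It models the 4-2 compressor as two chained carry-save adders rather than the MUX-based cell, and it groups the nine addends in one plausible three-level arrangement. All function names are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save adder: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted up one place
    return s, carry


def compress_4_2(a, b, c, d):
    """4:2 compressor, modeled here as two chained CSAs (assumption:
    the paper uses a MUX-based cell, which this sketch does not reproduce)."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def gaussian3x3(window):
    """Filter one 3x3 window of non-negative integer pixels.

    Assumed kernel: [[1, 2, 1], [2, 4, 2], [1, 2, 1]] / 16.
    Weights 2 and 4 become left shifts; the division by 16 becomes a right shift.
    """
    (p00, p01, p02), (p10, p11, p12), (p20, p21, p22) = window
    addends = [p00, p01 << 1, p02,
               p10 << 1, p11 << 2, p12 << 1,
               p20, p21 << 1, p22]

    # Level 1: three CSAs reduce 9 addends to 6.
    s0, c0 = csa(*addends[0:3])
    s1, c1 = csa(*addends[3:6])
    s2, c2 = csa(*addends[6:9])

    # Level 2: one 4-2 compressor reduces 6 values to 4.
    s3, c3 = compress_4_2(s0, c0, s1, c1)

    # Level 3: one 4-2 compressor reduces 4 values to 2.
    s4, c4 = compress_4_2(s3, c3, s2, c2)

    # Final stage: a single addition produces the weighted sum.
    total = s4 + c4
    return total >> 4  # divide by 16 (truncating)


if __name__ == "__main__":
    print(gaussian3x3([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Because each compressor stage preserves the sum of its inputs, carries propagate only in the final addition, which is why this arrangement shortens the critical path compared with a chain of ordinary adders.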



Published: 01 May 2017
CLC:  TN 47  
Cite this article:

CHEN Chao, LUO Xiao-hua, CHEN Shu-qun, YU Guo-jun. Optimizing implementation of Gaussian filter based on field programmable gate array. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(5): 969-975.


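Optimized implementation of a Gaussian filtering algorithm based on a field programmable gate array

To address the long critical path and large logic delay of traditional hardware designs of the Gaussian filtering algorithm, an optimized hardware implementation based on addend compression is proposed. In the optimized implementation, shift operations are used for multiplication and division, which avoids multipliers and dividers. A carry-save adder (CSA), a multiplexer (MUX)-based 4-2 compressor, and a tree structure for addend compression are introduced to compress the nine addends in three levels. After optimization, a single full adder is enough to obtain the sum. The results show that the addend-compression design shortens the critical path and reduces the logic delay by 32.48%, while also greatly reducing the number of adder macro cells required, which gives subsequent image processing modules greater design freedom.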
