Computer and Communication Technology

Parallel computing method for two-dimensional matrix convolution
ZHANG Jun-yang, GUO Yang, HU Xiao
College of Computer, National University of Defense Technology, Changsha 410073, China