JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
Information Engineering     
GPU acceleration for network-on-chip yield evaluation
LAN Fan, PAN Yun, YAN Xiao lang, HUAN Ruo hong, CHENG Kwang ting
1. College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China;
2. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China;
3. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China;
4. Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106, USA

Abstract  

A GPU-based acceleration method was presented to improve the efficiency of the time-consuming network-on-chip (NoC) yield evaluation algorithm. The sample analysis algorithm of the yield evaluation was ported to the GPU platform. A comparison of random sample generation on the GPU and CPU platforms showed that the GPU was not suitable for generating samples. The sample generation algorithm on the CPU was therefore optimized so that it could run in parallel with the GPU. A heterogeneous parallel algorithm was proposed, in which the CPU generates the random samples and the GPU analyzes them. The proposed algorithm achieved a 10x speedup over the algorithm running purely on the CPU.
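The division of labor described in the abstract can be illustrated with a minimal producer/consumer sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the yield criterion (a chip counts as good when every router of an n x n mesh stays connected), the independent per-link failure probability `p_fail`, and the function names `mesh_links`, `generate_batch`, `is_connected`, and `estimate_yield`. The analysis step, which the paper runs on the GPU, is plain Python here so that the example needs only the standard library.

```python
import queue
import random
import threading


def mesh_links(n):
    """Return the links of an n x n mesh as pairs of router indices."""
    links = []
    for r in range(n):
        for c in range(n):
            node = r * n + c
            if c + 1 < n:
                links.append((node, node + 1))  # link to the right neighbor
            if r + 1 < n:
                links.append((node, node + n))  # link to the neighbor below
    return links


def generate_batch(rng, num_links, p_fail, batch_size):
    """Producer step (CPU role): draw fault maps; True marks a faulty link."""
    return [[rng.random() < p_fail for _ in range(num_links)]
            for _ in range(batch_size)]


def is_connected(n, links, faulty):
    """Analysis step (GPU role in the paper, plain Python in this sketch).

    Assumed criterion: the sample is good if all routers are still
    connected through fault-free links (checked with union-find).
    """
    parent = list(range(n * n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n * n
    for (a, b), bad in zip(links, faulty):
        if bad:
            continue
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components == 1


def estimate_yield(n, p_fail, num_batches, batch_size, seed=0):
    """Monte Carlo yield estimate with generation and analysis overlapped."""
    links = mesh_links(n)
    rng = random.Random(seed)
    batches = queue.Queue(maxsize=4)  # bounded buffer between the two stages

    def producer():
        for _ in range(num_batches):
            batches.put(generate_batch(rng, len(links), p_fail, batch_size))
        batches.put(None)  # sentinel: no more samples

    threading.Thread(target=producer, daemon=True).start()

    good = total = 0
    while True:
        batch = batches.get()
        if batch is None:
            break
        good += sum(is_connected(n, links, sample) for sample in batch)
        total += len(batch)
    return good / total
```

Calling `estimate_yield(4, 0.05, 50, 100)` would return the fraction of 5000 sampled 4 x 4 meshes that remain connected. The bounded queue is the point of the sketch: sample generation and sample analysis proceed concurrently, and neither stage runs far ahead of the other.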



Published: 01 January 2017
CLC:  TN 47  
Cite this article:

LAN Fan, PAN Yun, YAN Xiao lang, HUAN Ruo hong, CHENG Kwang ting. GPU acceleration for network-on-chip yield evaluation. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(1): 160-167.



