GPU acceleration for network-on-chip yield evaluation

doi:10.3785/j.issn.1008-973X.2017.01.020

Journal of Zhejiang University (Engineering Science)

Information Engineering

LAN Fan, PAN Yun, YAN Xiao-lang, HUAN Ruo-hong, CHENG Kwang-ting

1. College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China;
2. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China;
3. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China;
4. Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106, USA

Abstract:

A GPU-based speedup method was presented to improve the efficiency of the time-consuming network-on-chip (NoC) yield evaluation algorithm. The sample analysis algorithm used in yield evaluation was ported to the GPU platform. A comparison of random sample generation on the GPU and CPU platforms showed that the GPU is not well suited to generating samples, so the sample generation algorithm was further optimized on the CPU to make it cooperate efficiently with the GPU. A heterogeneous parallel algorithm was then proposed in which the CPU generates the random samples and the GPU analyzes them. Compared with the CPU-only evaluation algorithm, the proposed algorithm achieved a 10x speedup.
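The abstract describes a sampling-based flow: random defect samples are generated, each sample is analyzed, and the yield is the fraction of acceptable samples, with generation placed on the CPU and analysis on the GPU. The Python sketch below illustrates only that producer/consumer split and is not the authors' implementation. Everything concrete in it is a hypothetical assumption: the defect model (each router of a k x k mesh fails independently with probability `p_fail`), the acceptance criterion (the largest connected region of fault-free routers must contain at least `min_good` routers), and all function names. The consumer loop merely stands in for a GPU kernel.

```python
import queue
import random
import threading
from collections import deque


def generate_batch(rng, k, p_fail, batch_size):
    """CPU stage: draw batch_size random defect maps for a k x k mesh.

    Hypothetical defect model: each router is faulty (True) independently
    with probability p_fail.
    """
    return [[rng.random() < p_fail for _ in range(k * k)]
            for _ in range(batch_size)]


def is_acceptable(faulty, k, min_good):
    """Analyze one sample: accept the chip if its largest connected region of
    fault-free routers (4-neighbour mesh links) has at least min_good routers.
    This acceptance criterion is an assumption for illustration only."""
    seen = [False] * (k * k)
    best = 0
    for start in range(k * k):
        if faulty[start] or seen[start]:
            continue
        seen[start] = True
        size, frontier = 0, deque([start])
        while frontier:  # breadth-first search over healthy neighbours
            node = frontier.popleft()
            size += 1
            r, c = divmod(node, k)
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < k and 0 <= cc < k:
                    nb = rr * k + cc
                    if not faulty[nb] and not seen[nb]:
                        seen[nb] = True
                        frontier.append(nb)
        best = max(best, size)
    return best >= min_good


def analyze_batch(batch, k, min_good):
    """Stand-in for the GPU stage: samples are independent, so a GPU kernel
    could assign one thread (or block) per sample. Returns the number of
    acceptable samples in the batch."""
    return sum(is_acceptable(s, k, min_good) for s in batch)


def estimate_yield(k=8, p_fail=0.02, min_good=56,
                   n_batches=50, batch_size=200, seed=1):
    """Monte Carlo yield estimate = accepted samples / total samples."""
    q = queue.Queue(maxsize=4)  # bounded buffer between the two stages

    def producer():
        rng = random.Random(seed)
        for _ in range(n_batches):
            q.put(generate_batch(rng, k, p_fail, batch_size))
        q.put(None)  # sentinel: no more batches

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    good = total = 0
    while (batch := q.get()) is not None:  # consumer: analyze batches as they arrive
        good += analyze_batch(batch, k, min_good)
        total += len(batch)
    worker.join()
    return good / total


if __name__ == "__main__":
    print(f"estimated yield: {estimate_yield():.4f}")
```

The bounded queue lets the CPU generate the next batch while the current one is being analyzed, without letting generation run arbitrarily far ahead. Because of Python's GIL the two threads only interleave rather than run truly in parallel, so the sketch shows the structure of the CPU/GPU split, not its speedup; a CUDA version would typically rely on page-locked host buffers, `cudaMemcpyAsync`, and streams to overlap transfers with analysis.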

Publication date: 2017-01-01

CLC: TN 47

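Funding: Natural Science Foundation of Zhejiang Province (LY15F020008); National Natural Science Foundation of China (61204030, 61302129); Public Welfare Technology Application Research Program of the Science and Technology Department of Zhejiang Province (2014C31045).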

Corresponding author: PAN Yun, male, associate professor. ORCID: 0000-0002-9335-4291. E-mail: panyun@vlsi.zju.edu.cn

About the author: LAN Fan (born 1989), male, Ph.D. candidate, engaged in research on networks-on-chip. ORCID: 0000-0002-3299-9635. E-mail: lanfan@vlsi.zju.edu.cn

Cite this article:

LAN Fan, PAN Yun, YAN Xiao-lang, HUAN Ruo-hong, CHENG Kwang-ting. GPU acceleration for network-on-chip yield evaluation [J]. Journal of Zhejiang University (Engineering Science), 2017. doi:10.3785/j.issn.1008-973X.2017.01.020.
