Information Engineering
GPU acceleration for network-on-chip yield evaluation
LAN Fan, PAN Yun, YAN Xiao-lang, HUAN Ruo-hong, CHENG Kwang-ting
1. College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China;
2. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China;
3. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China;
4. Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106, USA
Abstract A GPU-based acceleration method was presented to improve the runtime efficiency of the time-consuming network-on-chip (NoC) yield evaluation algorithm. The sample analysis part of the evaluation algorithm was ported to the GPU platform. A comparison of random sample generation on the GPU and CPU platforms showed that the GPU is not well suited to generating samples. The sample generation algorithm was therefore optimized on the CPU so that it can cooperate with the GPU. A heterogeneous parallel algorithm was proposed in which the CPU generates the random samples and the GPU analyzes them. The proposed algorithm achieved a 10x speedup over the CPU-only algorithm.
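The division of labor described in the abstract can be illustrated with a minimal producer/consumer sketch. It is not the authors' implementation: the mesh size, the fault probability, the pass criterion (enough fault-free routers that remain mutually reachable), and all function names are hypothetical choices made for illustration, and the "GPU" stage is an ordinary Python function standing in for a device kernel.

```python
import queue
import random
import threading

# All constants below are hypothetical illustration values, not from the paper.
ROWS, COLS = 4, 4      # mesh dimensions
P_FAULT = 0.02         # assumed independent per-router fault probability
MIN_GOOD = 14          # assumed minimum number of working routers


def generate_batch(rng, batch_size):
    """CPU stage: draw a batch of fault maps (True marks a faulty router)."""
    n = ROWS * COLS
    return [[rng.random() < P_FAULT for _ in range(n)] for _ in range(batch_size)]


def chip_passes(faults):
    """Assumed pass criterion: at least MIN_GOOD fault-free routers,
    all of them connected to each other through fault-free mesh neighbors."""
    good = [i for i, faulty in enumerate(faults) if not faulty]
    if len(good) < MIN_GOOD:
        return False
    seen = {good[0]}
    stack = [good[0]]
    while stack:  # depth-first search over fault-free routers
        r, c = divmod(stack.pop(), COLS)
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < ROWS and 0 <= nc < COLS:
                nxt = nr * COLS + nc
                if not faults[nxt] and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return len(seen) == len(good)


def analyze_batch(batch):
    """Stand-in for the GPU stage: count passing samples in one batch.
    On a real device, each sample would typically map to a thread or block."""
    return sum(chip_passes(sample) for sample in batch)


def estimate_yield(n_batches=40, batch_size=256, seed=0):
    """Overlap sample generation and analysis through a bounded queue."""
    batches = queue.Queue(maxsize=4)  # bounded so the producer cannot run far ahead

    def producer():
        rng = random.Random(seed)
        for _ in range(n_batches):
            batches.put(generate_batch(rng, batch_size))
        batches.put(None)  # sentinel: no more batches

    worker = threading.Thread(target=producer)
    worker.start()
    passed = total = 0
    while True:
        batch = batches.get()
        if batch is None:
            break
        passed += analyze_batch(batch)
        total += len(batch)
    worker.join()
    return passed / total


if __name__ == "__main__":
    print(f"estimated yield: {estimate_yield():.3f}")
```

The bounded queue is the point of the sketch: the generator keeps a few batches in flight while the analyzer drains them, so neither stage waits for the other to finish the whole run. In a real CPU/GPU setup, the same role would be played by asynchronous host-to-device transfers.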
Published: 01 January 2017
Cite this article:
LAN Fan, PAN Yun, YAN Xiao-lang, HUAN Ruo-hong, CHENG Kwang-ting. GPU acceleration for network-on-chip yield evaluation. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(1): 160-167.
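Chinese abstract (translated): GPU acceleration for network-on-chip yield evaluation
To address the slow speed and low efficiency of network-on-chip yield evaluation, GPU acceleration of the evaluation was studied in order to improve the execution efficiency of the evaluation algorithm. The sample analysis algorithm used in yield evaluation was ported to the GPU platform. An analysis and comparison of random sample generation algorithms on different platforms showed that the GPU platform is not suitable for generating samples. The sample generation algorithm on the CPU platform was further optimized so that it can run in heterogeneous parallel with the GPU. A heterogeneous parallel scheme was proposed in which the CPU generates the samples and the GPU performs the sample analysis. Compared with the CPU-only evaluation algorithm, the proposed heterogeneous parallel algorithm achieved a 10-fold improvement in runtime efficiency.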