Front. Inform. Technol. Electron. Eng.  2013, Vol. 14 Issue (11): 859-872    DOI: 10.1631/jzus.C1300078
    
Efficient fine-grained shared buffer management for multiple OpenCL devices
Chang-qing Xun, Dong Chen, Qiang Lan, Chun-yuan Zhang
College of Computer, National University of Defense Technology, Changsha 410073, China; State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China
Abstract: OpenCL provides full code portability between different hardware platforms, making it a good programming candidate for heterogeneous systems, which typically consist of a host processor and several accelerators. However, to make full use of the computing capacity of such a system, programmers are required to manage diverse OpenCL-enabled devices explicitly, including distributing the workload among devices and managing data transfer between them. These tedious tasks pose a considerable challenge for programmers. In this paper, we present a distributed shared OpenCL memory (DSOM), which relieves users of having to manage data transfer explicitly by supporting shared buffers across devices. DSOM allocates shared buffers in system memory and treats on-device memory as a software-managed virtual cache buffer. To support fine-grained shared buffer management, we designed a kernel parser in DSOM for buffer access range analysis. A basic modified, shared, invalid (MSI) cache coherence protocol is implemented in DSOM to keep the cache buffers coherent. In addition, we propose a novel strategy to minimize communication cost between devices by launching each necessary data transfer as early as possible, which allows data transfer to overlap with kernel execution. Our experimental results show that our method for buffer access range analysis has good applicability and that DSOM is efficient.
Key words: Shared buffer    OpenCL    Heterogeneous programming    Fine grained
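The coherence idea named in the abstract (a home copy in system memory, on-device memory acting as a cache, and modified/shared/invalid states) can be illustrated with a small model. The following is only a sketch under stated assumptions, not the DSOM implementation: the class `SharedBuffer`, its methods, and the device names are hypothetical, data movement is simulated with Python lists instead of OpenCL calls, and coherence is tracked per whole buffer rather than per access range as the paper's fine-grained scheme would do.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


class SharedBuffer:
    """Hypothetical model: one home copy in system memory, one cache copy per device."""

    def __init__(self, data, devices):
        self.home = list(data)                       # copy held in system memory
        self.cache = {d: None for d in devices}      # simulated on-device cache copies
        self.state = {d: State.INVALID for d in devices}
        self.transfers = []                          # log of (source, destination)

    def _write_back(self):
        # If a device holds a MODIFIED copy, flush it to system memory.
        for dev, st in self.state.items():
            if st is State.MODIFIED:
                self.home = list(self.cache[dev])
                self.state[dev] = State.SHARED
                self.transfers.append((dev, "host"))

    def _fetch(self, dev):
        # Bring an up-to-date copy into the cache of `dev` if it has none.
        if self.state[dev] is State.INVALID:
            self._write_back()
            self.cache[dev] = list(self.home)
            self.state[dev] = State.SHARED
            self.transfers.append(("host", dev))

    def read(self, dev):
        self._fetch(dev)
        return list(self.cache[dev])

    def write(self, dev, index, value):
        self._fetch(dev)
        # A write invalidates every other cached copy.
        for other in self.state:
            if other != dev:
                self.state[other] = State.INVALID
        self.cache[dev][index] = value
        self.state[dev] = State.MODIFIED


buf = SharedBuffer([0, 0, 0, 0], ["gpu0", "gpu1"])
buf.write("gpu0", 1, 7)      # gpu0 becomes MODIFIED, gpu1 stays INVALID
print(buf.read("gpu1"))      # forces a write-back from gpu0, then a fetch to gpu1
print(buf.transfers)
```

In this model the caller never issues a transfer explicitly; transfers happen only when a device touches a buffer it does not hold a valid copy of, which is the user-facing behaviour the abstract describes. Tracking states per access range instead of per buffer would reduce the amount of data moved on each miss.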
Received: 2013-04-02 Published: 2013-11-06
CLC:  TP393  
Cite this article:

Chang-qing Xun, Dong Chen, Qiang Lan, Chun-yuan Zhang. Efficient fine-grained shared buffer management for multiple OpenCL devices. Front. Inform. Technol. Electron. Eng., 2013, 14(11): 859-872.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/jzus.C1300078
http://www.zjujournals.com/xueshu/fitee/CN/Y2013/V14/I11/859
