Current Issue

2011, Issue 12    Published: 2011-12-01
Optimizing storage performance in public cloud platforms
Jian-zong Wang, Peter Varman, Chang-sheng Xie
Front. Inform. Technol. Electron. Eng., 2011, 12(12): 951-964.   https://doi.org/10.1631/jzus.C1100097
Abstract( 2363 )     PDF(0KB)( 2357 )
Cloud computing is an elastic computing model where users can lease computing and storage resources on demand from a remote infrastructure. It is gaining popularity due to its low cost, high reliability, and wide availability. With the emergence of public cloud storage platforms from providers like Amazon, Microsoft, and Google, individual applications and enterprise storage are being deployed on clouds. However, a serious impediment to wider deployment is the relative lack of effective data management services. Our experiments, as well as industry reports, have shown that performance and service-level agreements (SLAs) cannot be guaranteed when data is served over public clouds. The relatively slow access to persistent data and the large variability in cloud storage I/O performance can significantly degrade the performance of data-intensive applications. This paper addresses the issue of I/O performance fluctuation over public cloud platforms and proposes a middleware called CloudMW, placed between the cloud storage and clients, to provide storage services with better performance and SLA satisfaction. Several technologies, including data virtualization, data chunking, caching, and replication, are integrated into CloudMW to achieve more stable and predictable performance and to permit flexible sharing of storage among virtual machines (VMs). Experimental results based on Amazon Web Services (AWS) show that CloudMW is able to improve stability and help provide better SLAs and data sharing for cloud storage.
A hybrid genetic algorithm to optimize device allocation in industrial Ethernet networks with real-time constraints
Lei Zhang, Mattias Lampe, Zhi Wang
Front. Inform. Technol. Electron. Eng., 2011, 12(12): 965-975.   https://doi.org/10.1631/jzus.C1100045
Abstract( 2394 )     PDF(0KB)( 1627 )
With the advance of automation technology, the scale of industrial communication networks at field level is growing. Guaranteeing real-time performance of these networks is therefore becoming an increasingly difficult task. This paper addresses the optimization of device allocation in industrial Ethernet networks with real-time constraints (DAIEN-RC). Considering the inherent diversity of real-time requirements of typical industrial applications, a novel optimization criterion based on relative delay is proposed. A hybrid genetic algorithm incorporating a reduced variable neighborhood search (GA-rVNS) is developed for DAIEN-RC. Experimental results show that the proposed novel scheme achieves a superior performance compared to existing schemes, especially for large scale industrial networks.
Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture
Dan Wu, Xue-cheng Zou, Kui Dai, Jin-li Rao, Pan Chen, Zhao-xia Zheng
Front. Inform. Technol. Electron. Eng., 2011, 12(12): 976-989.   https://doi.org/10.1631/jzus.C1100027
Abstract( 2602 )     PDF(0KB)( 2297 )
The fast Fourier transform (FFT) is a fundamental kernel of many computation-intensive scientific applications. This paper deals with an implementation of the FFT on the accelerator system, a heterogeneous multi-core architecture to accelerate computation-intensive parallel computing in scientific and engineering applications. The Engineering and Scientific Computation Accelerator (ESCA) consists of a control unit and a single instruction multiple data (SIMD) processing element (PE) array, in which PEs communicate with each other via a hierarchical two-level network-on-chip (NoC) with high bandwidth and low latency. We exploit the architecture features of ESCA to implement a parallel FFT algorithm efficiently. Experimental results show that both the proposed parallel FFT algorithm and the ESCA architecture are scalable. The 16-bit fixed-point parallel FFT performance of ESCA is compared with a published work to prove the superiority of the mapping algorithm and the hardware architecture. The floating-point parallel FFT performances of ESCA are evaluated and compared with those of the IBM Cell processor and GPU to demonstrate the computing power of the ESCA system for high performance applications.
Accelerating geospatial analysis on GPUs using CUDA
Ying-jie Xia, Li Kuang, Xiu-mei Li
Front. Inform. Technol. Electron. Eng., 2011, 12(12): 990-999.   https://doi.org/10.1631/jzus.C1100051
Abstract( 2746 )     PDF(0KB)( 3223 )
Inverse distance weighting (IDW) interpolation and viewshed are two popular algorithms for geospatial analysis. IDW interpolation assigns geographical values to unknown spatial points using values from a usually scattered set of known points, and viewshed identifies the cells in a spatial raster that can be seen by observers. Although implementations of both algorithms are available for different scales of input data, the computation for a large-scale domain requires a massive number of cycles, which limits their usage. Due to the growing popularity of the graphics processing unit (GPU) for general purpose applications, we aim to accelerate geospatial analysis via a GPU-based parallel computing approach. In this paper, we propose a generic methodological framework for geospatial analysis based on GPU and its programming model Compute Unified Device Architecture (CUDA), and explore how to map the inherent degrees of parallelism of IDW interpolation and viewshed to the framework, which gives rise to a high computational throughput. The CUDA-based implementations of IDW interpolation and viewshed indicate that the architecture of GPU is suitable for parallelizing the algorithms of geospatial analysis. Experimental results show that the CUDA-based implementations running on GPU can lead to dataset-dependent speedups in the range of 13–33-fold for IDW interpolation and 28–925-fold for viewshed analysis. Their computation time can be reduced by an order of magnitude compared to classical sequential versions, without losing the accuracy of interpolation and visibility judgment.
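As an illustration of the IDW interpolation described in the abstract above, here is a minimal sequential sketch in Python. It is not the authors' CUDA implementation; the function name `idw_interpolate`, the `(x, y, value)` point layout, and the default exponent `power=2.0` are assumptions made for this example.

```python
import math

def idw_interpolate(known_points, query, power=2.0):
    """Estimate the value at `query` (x, y) from scattered known points.

    known_points: iterable of (x, y, value) tuples (hypothetical layout).
    power: distance exponent; 2.0 is a common choice, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known_points:
        dist = math.hypot(query[0] - x, query[1] - y)
        if dist == 0.0:
            # The query coincides with a known point: return its value directly.
            return value
        weight = 1.0 / dist ** power  # nearer points get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Two known points; the query sits exactly halfway between them.
points = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_interpolate(points, (1.0, 0.0)))  # 15.0
```

Each query point is computed independently of the others, which is the kind of data parallelism that a GPU mapping could exploit.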
Comprehensive and efficient discovery of time series motifs
Lian-hua Chi, He-hua Chi, Yu-cai Feng, Shu-liang Wang, Zhong-sheng Cao
Front. Inform. Technol. Electron. Eng., 2011, 12(12): 1000-1009.   https://doi.org/10.1631/jzus.C1100037
Abstract( 2851 )     PDF(0KB)( 1408 )
Time series motifs are previously unknown, frequently occurring patterns in time series or approximately repeated subsequences that are very similar to each other. There are two issues in time series motif discovery: the deficiency of the definition of K-motifs given by Lin et al. (2002) and the large computation time for extracting motifs. In this paper, we propose a relatively comprehensive definition of K-motifs to obtain more valuable motifs. To minimize the computation time as much as possible, we extend the triangular inequality pruning method to avoid unnecessary operations and calculations, and propose an optimized matrix structure to produce the candidate motifs almost immediately. Results of two experiments on three time series datasets show that our motif discovery algorithm is feasible and efficient.
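The general idea behind triangular inequality pruning can be sketched as follows. This is a generic illustration, not the extended method or the matrix structure proposed in the paper; the function names, the choice of the first subsequence as reference, and the `radius` parameter are all assumptions. For a metric d and a reference r, |d(a, r) - d(b, r)| is a lower bound on d(a, b), so a pair whose bound already exceeds the radius can be skipped without computing its distance.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def close_pairs_with_pruning(subsequences, radius):
    """Return (pairs within `radius`, number of distance computations skipped)."""
    reference = subsequences[0]  # assumed reference choice
    ref_dist = [euclidean(s, reference) for s in subsequences]
    pairs = []
    skipped = 0
    for i in range(len(subsequences)):
        for j in range(i + 1, len(subsequences)):
            # Lower bound from the triangle inequality; no full distance needed.
            if abs(ref_dist[i] - ref_dist[j]) > radius:
                skipped += 1
                continue
            if euclidean(subsequences[i], subsequences[j]) <= radius:
                pairs.append((i, j))
    return pairs, skipped

data = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(close_pairs_with_pruning(data, radius=1.5))  # ([(0, 1), (2, 3)], 4)
```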
Robust optical flow estimation based on brightness correction fields
Wei Wang, Zhi-xun Su, Jin-shan Pan, Ye Wang, Ri-ming Sun
Front. Inform. Technol. Electron. Eng., 2011, 12(12): 1010-1020.   https://doi.org/10.1631/jzus.C1100062
Abstract( 2246 )     PDF(0KB)( 1689 )
Optical flow estimation is still an important task in computer vision with many interesting applications. However, the results obtained by most optical flow techniques are affected by motion discontinuities or illumination changes. In this paper, we introduce a brightness correction field combined with a gradient constancy constraint to reduce the sensitivity to brightness changes between the images to be estimated. The advantage of this brightness correction field is its simplicity in terms of computational complexity and implementation. By analyzing the deficiencies of the traditional total variation regularization term in weakly textured areas, we also adopt a structure-adaptive regularization based on the robust Huber norm to preserve motion discontinuities. Finally, the proposed energy functional is minimized by solving its corresponding Euler-Lagrange equation in a more effective multi-resolution scheme, which integrates the twice downsampling strategy with a support-weight median filter. Numerous experiments show that our method is more effective and produces more accurate results for optical flow estimation.
A novel 3780-point FFT processor scheme for the time domain synchronous OFDM system
Ji-nan Leng, Lei Xie, Hui-fang Chen, Kuang Wang
Front. Inform. Technol. Electron. Eng., 2011, 12(12): 1021-1030.   https://doi.org/10.1631/jzus.C1100071
Abstract( 2227 )     PDF(0KB)( 1994 )
The 3780-point FFT is a main component of the time domain synchronous OFDM (TDS-OFDM) system and the key technology in the Chinese Digital Multimedia/TV Broadcasting-Terrestrial (DMB-T) national standard. Since 3780 is not a power of 2, the classical radix-2 or radix-4 FFT algorithm cannot be applied directly. Hence, the Winograd Fourier transform algorithm (WFTA) and the Good-Thomas prime factor algorithm (PFA) are used to implement the 3780-point FFT processor. However, the structure based on WFTA and PFA has a large computational complexity and requires many DSPs in hardware implementation. In this paper, a novel 3780-point FFT processor scheme is proposed, in which a 60×63 iterative WFTA architecture with different mapping methods is introduced to replace the PFA architecture, and an optimized CoOrdinate Rotation DIgital Computer (CORDIC) module is used for the twiddle factor multiplications. Compared to the traditional scheme, our proposed 3780-point FFT processor scheme reduces the number of multiplications by 45% at the cost of 1% increase in the number of additions. All DSPs are replaced by the optimized CORDIC module and ROM. Simulation results show that the proposed 3780-point FFT processing scheme satisfies the requirement of the DMB-T standard, and is an efficient architecture for the TDS-OFDM system.
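To make the role of the twiddle factors concrete: 3780 = 60 × 63, and 60 and 63 share the factor 3, so a 60×63 split is a common-factor decomposition that needs twiddle multiplications between its two stages, unlike a prime factor mapping over coprime lengths. The sketch below shows the textbook two-stage (Cooley-Tukey style) index mapping in Python and checks it on a small non-coprime size. It is not the paper's WFTA/CORDIC architecture, whose mapping methods are not detailed in the abstract; the names `direct_dft` and `two_stage_dft` are hypothetical.

```python
import cmath
import math

def direct_dft(x):
    """Reference O(N^2) DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def two_stage_dft(x, len1, len2):
    """DFT of length len1*len2 via input index len2*n1 + n2, output index k1 + len1*k2."""
    n = len1 * len2
    assert len(x) == n
    # Stage 1: len2 short DFTs of length len1 over the decimated inputs.
    inner = [direct_dft([x[len2 * n1 + n2] for n1 in range(len1)])
             for n2 in range(len2)]
    # Twiddle factors W_N^(n2*k1): required whenever the split is not a coprime PFA mapping.
    for n2 in range(len2):
        for k1 in range(len1):
            inner[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / n)
    # Stage 2: len1 short DFTs of length len2.
    out = [0j] * n
    for k1 in range(len1):
        column = direct_dft([inner[n2][k1] for n2 in range(len2)])
        for k2 in range(len2):
            out[k1 + len1 * k2] = column[k2]
    return out

print(60 * 63, math.gcd(60, 63))  # 3780 3

# Small check on a non-coprime split, 24 = 4 x 6 with gcd(4, 6) = 2.
signal = [complex(i % 7, i % 5) for i in range(24)]
error = max(abs(a - b) for a, b in zip(two_stage_dft(signal, 4, 6), direct_dft(signal)))
print(error < 1e-9)  # True
```

The same function accepts `len1=60, len2=63`, although the pure-Python short DFTs used here are only meant for checking the index arithmetic, not for speed.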
7 articles
