|
Asymmetry-aware load balancing for parallel applications in single-ISA multi-core systems
Eunsung Kim, Hyeonsang Eom, Heon Y. Yeom
Front. Inform. Technol. Electron. Eng., 2012, 13(6): 413-427.
https://doi.org/10.1631/jzus.C1100198
Contemporary operating systems for single-ISA (instruction set architecture) multi-core systems attempt to distribute tasks equally among all the CPUs. This approach works relatively well when there is no difference in CPU capability. However, there are cases in which the capabilities of the CPUs differ. For instance, static capability asymmetry results from the advent of new asymmetric hardware, and dynamic capability asymmetry comes from operating system (OS) noise caused by networking or I/O handling. These asymmetries can make it hard for the OS scheduler to evenly distribute the tasks, resulting in less efficient load balancing. In this paper, we propose a user-level load balancer for parallel applications, called the ‘capability balancer’, which recognizes differences in CPU capability and makes subtasks share the entire CPU capability fairly. The balancer can coexist with the existing kernel-level load balancer without degrading the behavior of the kernel balancer. The capability balancer can fairly distribute CPU capability to tasks with very little overhead. For real workloads like the NAS Parallel Benchmark (NPB), we have accomplished speedups of up to 9.8% and 8.5% in dynamic and static asymmetries, respectively. We have also observed speedups of 13.3% for dynamic asymmetry and 24.1% for static asymmetry in a competitive environment. The impacts of our task selection policies, FIFO (first in, first out) and cache, were compared. The use of the cache policy led to a speedup of 5.3% in overall execution time and a decrease of 4.7% in the overall cache miss count, compared with the FIFO policy, which is used by default.
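To make the idea of sharing CPU capability fairly more concrete, the following is a minimal sketch and not the authors' balancer. It assumes each CPU has a known relative capability score and that all subtasks are equal in size, and it splits the subtasks in proportion to capability using largest-remainder rounding. The function name `assign_by_capability` and the capability values are hypothetical.

```python
def assign_by_capability(num_tasks, capabilities):
    """Split num_tasks across CPUs in proportion to their capability scores.

    Illustrative assumption: capabilities are relative, positive weights
    (e.g. 2.0 for a CPU that is twice as fast as a 1.0 CPU).
    """
    total = sum(capabilities)
    if total <= 0:
        raise ValueError("total capability must be positive")

    # Ideal (fractional) share of tasks for each CPU.
    ideal = [num_tasks * c / total for c in capabilities]

    # Start from the rounded-down shares.
    counts = [int(share) for share in ideal]

    # Hand the remaining tasks to the CPUs with the largest fractional parts.
    leftover = num_tasks - sum(counts)
    by_remainder = sorted(
        range(len(capabilities)),
        key=lambda i: ideal[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # One CPU with twice the capability of the other two.
    print(assign_by_capability(12, [2.0, 1.0, 1.0]))
```

An equal split of 12 subtasks over these three CPUs would leave the fast CPU idle while the slow ones finish; the proportional split gives it half of the work instead.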
|
|
Feature detection of triangular meshes via neighbor supporting
Xiao-chao Wang, Jun-jie Cao, Xiu-ping Liu, Bao-jun Li, Xi-quan Shi, Yi-zhen Sun
Front. Inform. Technol. Electron. Eng., 2012, 13(6): 440-451.
https://doi.org/10.1631/jzus.C1100324
We propose a robust method for detecting features on triangular meshes by combining normal tensor voting with neighbor supporting. Our method contains two stages: feature detection and feature refinement. First, the normal tensor voting method is modified to detect the initial features, which may include some pseudo features. Then, at the feature refinement stage, a novel salient measure deriving from the idea of neighbor supporting is developed. Benefiting from the integrated reliable salient measure feature, pseudo features can be effectively discriminated from the initially detected features and removed. Compared to previous methods based on the differential geometric property, the main advantage of our method is that it can detect both sharp and weak features. Numerical experiments show that our algorithm is robust, effective, and can produce more accurate results. We also discuss how detected features are incorporated into applications, such as feature-preserving mesh denoising and hole-filling, and present visually appealing results by integrating feature information.
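As a rough illustration of the first stage, the sketch below builds a basic, unweighted normal voting tensor from the unit normals of the faces around a vertex and classifies the vertex by how many eigenvalues are significant. This is the textbook form of normal tensor voting, not the modified version or the neighbor-supporting refinement described above. The function names, the tolerance `tol`, and the omission of area or distance weights are all assumptions.

```python
import math


def normal_voting_tensor(normals):
    """Sum of outer products n * n^T over the given unit face normals."""
    t = [[0.0] * 3 for _ in range(3)]
    for n in normals:
        for i in range(3):
            for j in range(3):
                t[i][j] += n[i] * n[j]
    return t


def sym3_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix, largest first (closed form)."""
    off = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * off
    p = math.sqrt(p2 / 6.0)
    if p == 0.0:
        return [q, q, q]  # matrix is a multiple of the identity
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]


def classify_vertex(normals, tol=0.1):
    """Label a vertex by the number of significant tensor eigenvalues."""
    lam = sym3_eigenvalues(normal_voting_tensor(normals))
    scale = lam[0] if lam[0] > 0 else 1.0
    significant = sum(1 for v in lam if v / scale > tol)
    return {1: "face", 2: "sharp edge", 3: "corner"}.get(significant, "face")
```

A fixed threshold such as `tol` is exactly where weak features and pseudo features become hard to separate, which is the gap the refinement stage is meant to address.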
|
|
A submatrix-based P300 brain-computer interface stimulus presentation paradigm
Jin-he Shi, Ji-zhong Shen, Yu Ji, Feng-lei Du
Front. Inform. Technol. Electron. Eng., 2012, 13(6): 452-459.
https://doi.org/10.1631/jzus.C1100328
The P300 event-related potential (ERP), with advantages of high stability and no need for initial training, is one of the most commonly used responses in brain-computer interface (BCI) applications. The row/column paradigm (RCP) that flashes an entire column or row of a visual matrix has been used successfully to help patients to spell words. However, RCP remains subject to errors that slow down communication, such as adjacency-distraction and double-flash errors. In this paper, a new visual stimulus presentation paradigm called the submatrix-based paradigm (SBP) is proposed. SBP divides a 6×6 matrix into several submatrices. Each submatrix flashes in single cell paradigm (SCP) mode and separately performs an ensemble averaging method according to the sequences. The sequence number parameter is used to further improve the accuracy and information transfer rate (ITR). SBP has advantages of flexibility in division of the matrix and better expansion capability, which were confirmed with different divisions of the 6×6 matrix and expansion to a 6×9 matrix. Stimulation results show that SBP is superior to RCP in performance and user acceptability.
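The abstract does not say which ITR definition is used. As a hedged illustration, the sketch below computes the widely used Wolpaw bits-per-selection measure for a speller with `n_targets` equally likely symbols and selection accuracy `accuracy`. The function names and the example selection rate are assumptions, not values from the paper.

```python
import math


def wolpaw_bits_per_selection(n_targets, accuracy):
    """Wolpaw information per selection, in bits.

    B = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if n_targets < 2:
        raise ValueError("need at least two targets")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")

    p = accuracy
    bits = math.log2(n_targets)
    if 0.0 < p < 1.0:
        bits += p * math.log2(p)
        bits += (1.0 - p) * math.log2((1.0 - p) / (n_targets - 1))
    elif p == 0.0:
        # Limit of the formula as P -> 0 (the P*log2(P) term vanishes).
        bits += math.log2(1.0 / (n_targets - 1))
    return bits


def itr_bits_per_minute(n_targets, accuracy, selections_per_minute):
    """Scale bits per selection by the selection rate."""
    return wolpaw_bits_per_selection(n_targets, accuracy) * selections_per_minute


if __name__ == "__main__":
    # Hypothetical numbers: a 6x6 speller (36 symbols), 90% accuracy,
    # 4 selections per minute.
    print(itr_bits_per_minute(36, 0.90, 4.0))
```

Under this measure, increasing the number of sequences trades a lower selection rate for higher accuracy, so the sequence number acts as a tuning knob for ITR.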
|
|
High-performance low-leakage regions of nano-scaled CMOS digital gates under variations of threshold voltage and mobility
Hossein Aghababa, Behjat Forouzandeh, Ali Afzali-Kusha
Front. Inform. Technol. Electron. Eng., 2012, 13(6): 460-471.
https://doi.org/10.1631/jzus.C1100273
We propose a modeling methodology for both leakage power consumption and delay of basic CMOS digital gates in the presence of threshold voltage and mobility variations. The key parameters in determining the leakage and delay are OFF and ON currents, respectively, which are both affected by the variation of the threshold voltage. Additionally, the current is a strong function of mobility. The proposed methodology relies on a proper modeling of the threshold voltage and mobility variations, which may be induced by any source. Using this model, in the plane of threshold voltage and mobility, we determine regions for different combinations of performance (speed) and leakage. Based on these regions, we discuss the trade-off between leakage and delay where the leakage-delay-product is the optimization objective. To assess the accuracy of the proposed model, we compare its predictions with those of HSPICE simulations for both basic digital gates and ISCAS85 benchmark circuits in 45-, 65-, and 90-nm technologies.
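The notion of regions in the plane of threshold voltage and mobility can be illustrated with a toy model. The sketch below is not the paper's model: it assumes an alpha-power law for the ON current and an exponential subthreshold law for the OFF current, both scaled linearly by mobility, and it labels a point relative to a nominal device. All constants (`alpha`, `slope_factor`, `thermal_voltage`, the nominal values) and the function names are illustrative assumptions.

```python
import math


def on_current(vth, mobility, vdd=1.0, alpha=1.3):
    """Assumed alpha-power law: I_on ~ mobility * (Vdd - Vth)^alpha."""
    if vth >= vdd:
        raise ValueError("threshold voltage must be below the supply voltage")
    return mobility * (vdd - vth) ** alpha


def off_current(vth, mobility, slope_factor=1.5, thermal_voltage=0.026):
    """Assumed subthreshold law: I_off ~ mobility * exp(-Vth / (n * vT))."""
    return mobility * math.exp(-vth / (slope_factor * thermal_voltage))


def classify_point(vth, mobility, nominal_vth=0.3, nominal_mobility=1.0):
    """Label a (Vth, mobility) point relative to the nominal device.

    Delay falls as ON current rises, and leakage rises with OFF current.
    """
    faster = on_current(vth, mobility) >= on_current(nominal_vth, nominal_mobility)
    less_leaky = off_current(vth, mobility) <= off_current(nominal_vth, nominal_mobility)
    speed = "high-performance" if faster else "low-performance"
    leakage = "low-leakage" if less_leaky else "high-leakage"
    return speed, leakage


if __name__ == "__main__":
    for point in [(0.35, 1.0), (0.25, 1.0), (0.35, 1.2)]:
        print(point, classify_point(*point))
```

In this toy model, raising the threshold voltage alone cuts leakage but slows the gate, while a simultaneous increase in mobility can restore speed; points of that kind fall in the high-performance, low-leakage region.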
|
|
Accurate real-time stereo correspondence using intra- and inter-scanline optimization
Li Yao, Dong-xiao Li, Jing Zhang, Liang-hao Wang, Ming Zhang
Front. Inform. Technol. Electron. Eng., 2012, 13(6): 472-482.
https://doi.org/10.1631/jzus.C1100311
This paper deals with a novel stereo algorithm that can generate accurate dense disparity maps in real time. The algorithm employs an effective cross-based variable support aggregation strategy within a scanline optimization framework. Rather than matching intensities directly, the use of adaptive support aggregation allows for precisely handling the weak textured regions as well as depth discontinuities. To improve the disparity results with global reasoning, we reformulate the energy function on a tree structure over the whole 2D image area, as opposed to dynamic programming of individual scanlines. By applying both intra- and inter-scanline optimizations, the algorithm reduces the typical ‘streaking’ artifact while maintaining high computational efficiency. The experimental results are evaluated on the Middlebury stereo dataset, showing that our approach is among the best of all real-time approaches. We implement the algorithm on a commodity graphics card with CUDA architecture, running at about 35 frames/s for a typical stereo pair with a resolution of 384×288 and 16 disparity levels.
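For context, the sketch below shows the baseline that the abstract contrasts against: dynamic programming over a single scanline. It is not the tree-based formulation or the cross-based aggregation of the paper. It assumes a precomputed cost table `costs[x][d]` (matching cost of pixel `x` at disparity `d`) and a linear smoothness penalty; the function name and the penalty value are hypothetical.

```python
def scanline_dp(costs, smooth_penalty=1.0):
    """Pick one disparity per pixel along a scanline by dynamic programming.

    Minimizes the sum of matching costs plus
    smooth_penalty * |d(x) - d(x-1)| between neighboring pixels.
    """
    width = len(costs)
    levels = len(costs[0])

    total = list(costs[0])  # best accumulated cost ending at each disparity
    back = []               # back[x-1][d] = best previous disparity for (x, d)

    for x in range(1, width):
        row, ptr = [], []
        for d in range(levels):
            # Best predecessor disparity for pixel x at disparity d.
            best_prev = min(
                range(levels),
                key=lambda p: total[p] + smooth_penalty * abs(d - p),
            )
            row.append(costs[x][d] + total[best_prev]
                       + smooth_penalty * abs(d - best_prev))
            ptr.append(best_prev)
        total = row
        back.append(ptr)

    # Trace the optimal path back from the cheapest final disparity.
    d = min(range(levels), key=lambda k: total[k])
    path = [d]
    for ptr in reversed(back):
        d = ptr[d]
        path.append(d)
    path.reverse()
    return path


if __name__ == "__main__":
    # Two disparity levels; the cheapest level switches halfway along the line.
    print(scanline_dp([[0, 5], [0, 5], [5, 0], [5, 0]]))
```

Because each scanline is solved independently, neighboring rows can disagree, which produces the streaking artifact; adding inter-scanline terms is what addresses this.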
|
7 articles
|