Most Downloaded Articles

Published in last 1 year
Compound fault decoupling diagnosis method based on improved Transformer
Yu-xiang WANG,Zhi-wei ZHONG,Peng-cheng XIA,Yi-xiang HUANG,Cheng-liang LIU
Journal of ZheJiang University (Engineering Science)    2023, 57 (5): 855-864.   DOI: 10.3785/j.issn.1008-973X.2023.05.001
Abstract   HTML PDF (2584KB) ( 631 )  

Most compound fault diagnosis methods treat a compound fault as a new single fault type and ignore the interaction of its constituent single faults, so the fault analysis is coarse-grained and hard to interpret. An improved Transformer-based compound fault decoupling diagnosis method was proposed for industrial environments with very little compound fault data. The diagnosis process included pre-processing, feature extraction and fault decoupling. By introducing the decoder of the Transformer, the cross-attention mechanism enabled each single fault label to adaptively focus on the discriminative region of the extracted feature layer that corresponds to its fault feature and to predict an output probability, thereby achieving compound fault decoupling. Compound fault tests were designed to verify the effectiveness of the method against advanced algorithms. The results showed that the proposed method had high diagnostic accuracy with a small number of single fault training samples and a very small number of compound fault training samples. The compound fault diagnosis accuracy reached 88.29% when the training set contained only 5 compound fault samples. Thus the new method has a significant advantage over other methods.

Multi-agent pursuit and evasion games based on improved reinforcement learning
Ya-li XUE,Jin-ze YE,Han-yan LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (8): 1479-1486.   DOI: 10.3785/j.issn.1008-973X.2023.08.001
Abstract   HTML PDF (1158KB) ( 422 )  

A multi-agent reinforcement learning algorithm based on priority experience replay and a decomposed reward function was proposed for multi-agent pursuit and evasion games. Firstly, the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm was proposed based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm and the twin delayed deep deterministic policy gradient (TD3) algorithm. Secondly, priority experience replay was proposed to determine the priority of experience and sample the experience with high reward, aiming at the problem that the reward function is almost sparse in the multi-agent pursuit and evasion problem. In addition, a decomposed reward function was designed to divide multi-agent rewards into individual rewards and joint rewards to maximize the global and local rewards. Finally, a simulation experiment was designed based on DEPER-MATD3. Comparison with other algorithms showed that the DEPER-MATD3 algorithm solved the over-estimation problem, and the time consumption was improved compared with the MATD3 algorithm. In the decomposed reward function environment, the global mean rewards of the pursuers were improved, and the pursuers had a greater probability of catching the evader.
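The reward-driven sampling idea in this abstract can be illustrated with a minimal sketch. The priority formula below (shifted reward raised to a power `alpha`) and the function names are assumptions chosen for illustration; the abstract does not state the exact priority rule used in DEPER-MATD3.

```python
import random

def sampling_probabilities(rewards, alpha=0.6, eps=1e-3):
    """Map per-transition rewards to sampling probabilities.

    Assumed priority form: (reward - min_reward + eps) ** alpha,
    so higher-reward transitions are sampled more often.
    """
    lowest = min(rewards)
    priorities = [((r - lowest) + eps) ** alpha for r in rewards]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_batch(buffer, rewards, batch_size, rng=random):
    """Draw a batch (with replacement) weighted by reward-based priority."""
    probs = sampling_probabilities(rewards)
    return rng.choices(buffer, weights=probs, k=batch_size)
```

With sparse rewards, most transitions share the lowest priority, so the few rewarded transitions are revisited far more often than under uniform sampling.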

Structured image super-resolution network based on improved Transformer
Xin-dong LV,Jiao LI,Zhen-nan DENG,Hao FENG,Xin-tong CUI,Hong-xia DENG
Journal of ZheJiang University (Engineering Science)    2023, 57 (5): 865-874.   DOI: 10.3785/j.issn.1008-973X.2023.05.002
Abstract   HTML PDF (1744KB) ( 412 )  

Most existing structural image super-resolution reconstruction algorithms can only solve a single specific type of structural image super-resolution problem. A structural image super-resolution network based on improved Transformer (TransSRNet) was proposed. The network used the self-attention mechanism of Transformer to mine a wide range of global information in spatial sequences. A spatial attention unit was built by using the hourglass block structure. The mapping relationship between the low-resolution space and the high-resolution space in the local area was attended to, and the structured information in the image mapping process was extracted. The channel attention module was used to fuse the features of the self-attention module and the spatial attention module. The TransSRNet was evaluated on the highly-structured CelebA, Helen, TCGA-ESCA and TCGA-COAD datasets. The evaluation results showed that the TransSRNet model had better overall performance than other super-resolution algorithms. With an upscale factor of 8, the PSNR on the face dataset and the medical image dataset reached 28.726 and 26.392 dB respectively, and the SSIM reached 0.844 and 0.881 respectively.
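For readers unfamiliar with the PSNR figures quoted above, the standard definition is PSNR = 10 log10(MAX² / MSE). The sketch below computes it over flat lists of pixel values; the function name and the 8-bit default peak value of 255 are assumptions for illustration, not details taken from the paper.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```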

Improved method for blockchain Kademlia network based on small world theory
Yue ZHAO,He ZHAO,Haibo TAN,Bin YU,Wangnian YU,Zhiyu MA
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 1-9.   DOI: 10.3785/j.issn.1008-973X.2024.01.001
Abstract   HTML PDF (1194KB) ( 384 )  

An improved method for the blockchain Kademlia network based on small world theory was proposed aiming at the issue of sacrificing security to improve scalability in the current research of the blockchain Kademlia network. The idea of the small world theory was followed, and a probability formula for replacing expansion nodes was proposed. The probability was inversely proportional to the distance between nodes. The number of node replacements and additional nodes could be flexibly adjusted according to actual conditions. The theoretical analysis and experimental verification demonstrate that the network transformed by this method can reach a stable state. The experimental results showed that the transmission hierarchy required for broadcasting transaction messages throughout the network was reduced by 15.0% to 30.8% and the rate of locating nodes was increased. The level of network structure was reduced and network security was enhanced compared to other optimization algorithms that modify the network structure.
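A probability that is inversely proportional to node distance can be sketched as follows. The XOR metric is the standard Kademlia distance; the normalization over a candidate set and the function names are assumptions, since the abstract does not give the paper's exact formula.

```python
def xor_distance(a, b):
    """Kademlia distance between two integer node IDs."""
    return a ^ b

def replacement_probabilities(node_id, candidates):
    """Assign each candidate a probability proportional to 1 / distance.

    Hypothetical helper: nearer candidates are favored, while distant
    ones keep a small but non-zero chance (the small-world long links).
    """
    weights = {
        c: 1.0 / xor_distance(node_id, c)
        for c in candidates
        if c != node_id  # distance 0 to itself is excluded
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}
```

For node 0 and candidates 1, 2 and 4, the weights are 1, 1/2 and 1/4, giving probabilities 4/7, 2/7 and 1/7.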

Driver fatigue state detection method based on multi-feature fusion
Hao-jie FANG,Hong-zhao DONG,Shao-xuan LIN,Jian-yu LUO,Yong FANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1287-1296.   DOI: 10.3785/j.issn.1008-973X.2023.07.003
Abstract   HTML PDF (1481KB) ( 349 )  

The improved YOLOv5 object detection algorithm was used to detect the facial region of the driver, and a multi-feature fusion fatigue state detection method was established, aiming at the problem that existing fatigue state detection methods cannot be applied to drivers under epidemic prevention and control. Image label data covering drivers both with and without masks were established according to the characteristics of bus driving. The detection accuracy of eyes, mouth and face regions was improved by increasing the feature sampling times of the YOLOv5 model. The BiFPN network structure was used to retain multi-scale feature information, which made the prediction network more sensitive to targets of different sizes and improved the detection ability of the overall model. A parameter compensation mechanism combined with a face keypoint algorithm was proposed in order to improve the accuracy of the blink and yawn frame counts. A variety of fatigue parameters were fused and normalized to conduct fatigue classification. The results on the public dataset NTHU and the self-made dataset show that the proposed method can recognize the blinks and yawns of drivers both with and without masks, and can accurately judge the fatigue state of drivers.

Adaptive salp swarm algorithm for solving flexible job shop scheduling problem with transportation time
Hao-yi NIU,Wei-min WU,Ting-qi ZHANG,Wei SHEN,Tao ZHANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1267-1277.   DOI: 10.3785/j.issn.1008-973X.2023.07.001
Abstract   HTML PDF (1024KB) ( 344 )  

An adaptive salp swarm algorithm was proposed by minimizing the makespan in order to solve the flexible job shop scheduling problem with transportation time. A three-layer coding scheme was designed based on random key in order to make the discrete solution space continuous. The inertia weight was introduced to evaluate the influence among followers in order to enhance the global exploration and local search performance of the algorithm. An adaptive leader-follower population update strategy was proposed, and the number of leaders and followers was adjusted by the population status. The tabu search strategy was combined with the neighborhood search in order to prevent the algorithm from falling into local optimum. The benchmark instances verified the effectiveness and superiority of the proposed algorithm. The influence of the number of AGVs on the makespan conforms to the law of diminishing marginal effect.
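Random-key coding, mentioned above as the way to make the discrete solution space continuous, can be sketched for a single layer: a vector of real-valued keys is decoded into a processing order by sorting. The function name and the single-layer scope are assumptions; the paper's three-layer scheme is not detailed in the abstract.

```python
def decode_random_keys(keys):
    """Decode continuous keys into a permutation of item indices.

    The item with the smallest key comes first, so any real-valued
    vector maps to a valid ordering and continuous search operators
    can be applied to the keys directly.
    """
    return sorted(range(len(keys)), key=keys.__getitem__)
```

For example, the keys `[0.7, 0.1, 0.4]` decode to the order `[1, 2, 0]`.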

Video object detection algorithm based on multi-level feature aggregation under mixed sampler
Siyi QIN,Shaoyan GAI,Feipeng DA
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 10-19.   DOI: 10.3785/j.issn.1008-973X.2024.01.002
Abstract   HTML PDF (2492KB) ( 280 )  

A video object detection algorithm which was built upon the YOLOX-S single-stage detector based on mixed weighted reference-frame sampler and multi-level feature aggregation attention was proposed aiming at the problems of existing deep learning-based video object detection algorithms failing to simultaneously meet accuracy and efficiency requirements. Mixed weighted reference-frame sampler (MWRS) included weighted random sampling and local consecutive sampling to fully utilize effective global information and inter-frame local information. Multi-level feature aggregation attention (MFAA) module refined the classification features extracted by YOLOX-S based on self-attention mechanism, encouraging the network to learn richer feature information from multi-level features. The experimental results demonstrated that the proposed algorithm achieved an average precision AP50 of 77.8% on the ImageNet VID dataset with an average detection speed of 11.5 milliseconds per frame. The object classification and location performance are significantly better than that of YOLOX-S, indicating that the proposed algorithm achieves higher accuracy and faster detection speed.

Optimal design of long span steel-concrete composite floor system
Yi-fan WU,Wen-hao PAN,Yao-zhi LUO
Journal of ZheJiang University (Engineering Science)    2023, 57 (5): 988-996.   DOI: 10.3785/j.issn.1008-973X.2023.05.015
Abstract   HTML PDF (1380KB) ( 278 )  

The optimal design of a long span steel-concrete composite floor system was investigated based on important parameters in order to identify the economical and applicable conditions and the optimization directions of long span steel-concrete composite floors. The objective function was set as economical equivalent steel consumption, and the variables contained eight parameters including dimensions of the steel section, intermediate distance between steel sections and thickness of concrete slab. The optimization was constrained by plastic theory, standards and construction experience. The generalized reduced gradient method (GRG) was used to generate optimal sections with minimum economical equivalent steel consumption under different spans and live loads. According to the optimization results, the composite floor within a span of 60 m and a variable load of 6 kN/m² could make efficient use of the composite effect of the composite structure and had good economic benefits. For super-long span composite floors where the traditional I-section floor system is not economically suitable, the composite floor with corrugated web is recommended for improving the bearing efficiency of the steel web. A new cable-supported composite floor system was proposed based on the cable-supported beam for its high mechanical efficiency.

Image super-resolution reconstruction based on dynamic attention network
Xiao-qiang ZHAO,Ze WANG,Zhao-yang SONG,Hong-mei JIANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (8): 1487-1494.   DOI: 10.3785/j.issn.1008-973X.2023.08.002
Abstract   HTML PDF (1196KB) ( 269 )  

Image super-resolution algorithms adopt the same processing mode for channels and spatial domains of different importance, so computing resources cannot be concentrated on important features. Aiming at this problem, an image super-resolution algorithm based on dynamic attention network was proposed. Firstly, the existing practice of weighting attention mechanisms equally was changed, and dynamically learned weights were assigned to different attention mechanisms by the constructed dynamic attention modules, by which the high-frequency information most needed by the network was obtained and high-quality pictures were reconstructed. Secondly, the double butterfly structure was constructed through feature reuse, which fully integrated the information from the two branches of attention and compensated for the missing feature information between the different attention mechanisms. Finally, model evaluation was conducted on the Set5, Set14, BSD100, Urban100 and Manga109 datasets. Results show that the proposed algorithm has better overall performance than other mainstream super-resolution algorithms. When the amplification factor was 4, compared with the sub-optimal algorithm, the peak signal-to-noise ratio values were improved by 0.06, 0.07, 0.04, 0.15 and 0.15 dB, respectively, on the above five public test sets.

Continual learning framework of named entity recognition in aviation assembly domain
Pei-feng LIU,Lu QIAN,Xing-wei ZHAO,Bo TAO
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1186-1194.   DOI: 10.3785/j.issn.1008-973X.2023.06.014
Abstract   HTML PDF (1091KB) ( 260 )  

In order to build an aviation assembly knowledge graph composed of assembly process information, assembly technology knowledge, related industry standards and the internal connections of the three, a named entity recognition technology framework based on continual learning was proposed. The characteristic of the proposed framework was that it maintained high recognition performance throughout the progressive learning process from zero corpus to large-scale corpus, without relying on manual feature setting. A comparative performance experiment of the proposed framework was carried out in practical industrial scenarios. The experiment proceeded from general assembly and component assembly, and the manipulations of the pull rod and cable installation were regarded as a specific experimental case. Experimental results show that the proposed framework is significantly better in accuracy, recall, and F1 value than previous algorithms while handling corpus environments of different scales. The proposed framework can consistently provide credible results for named entity recognition tasks in the aviation assembly domain.

Efficient and adaptive semantic segmentation network based on Transformer
Hai-bo ZHANG,Lei CAI,Jun-ping REN,Ru-yan WANG,Fu LIU
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1205-1214.   DOI: 10.3785/j.issn.1008-973X.2023.06.016
Abstract   HTML PDF (1465KB) ( 258 )  

Semantic segmentation networks based on Transformer have two problems: a significant drop in segmentation accuracy caused by resolution variation and the high computational complexity of self-attention. An adaptive convolutional positional encoding module was proposed, using a property of zero-padding convolution to retain positional information. A joint resampling self-attention module was proposed to reduce the computational burden, using the property that the dimensions of specific matrices can cancel each other in the self-attention computation. A decoder was designed to fuse feature maps from different stages, resulting in the construction of an efficient segmentation network, EA-Former, which was capable of adapting to inputs of different resolutions. The mean intersection over union of EA-Former was 51.0% on ADE20K and 83.9% on Cityscapes. Compared with the mainstream segmentation methods, the proposed network achieved competitive accuracy with lower computational complexity, and the degradation of the segmentation performance caused by the variation of the input resolution was alleviated.

Dynamic multi-objective optimization algorithm based on individual prediction
Wan-liang WANG,Zhong-kui CHEN,Fei WU,Zheng WANG,Meng-jiao YU
Journal of ZheJiang University (Engineering Science)    2023, 57 (11): 2133-2146.   DOI: 10.3785/j.issn.1008-973X.2023.11.001
Abstract   HTML PDF (1723KB) ( 253 )  

A dynamic multi-objective optimization algorithm based on individual prediction (IPS) was proposed to quickly track the Pareto optimal front of the dynamic multi-objective optimization problem that changed with the environment. Firstly, the special points with good convergence and diversity were selected by the reference point relation algorithm, and environment changes could be quickly responded to by predicting the special point set. Secondly, a feedback correction mechanism for population center point prediction was proposed, and the prediction step size was corrected in the process of predicting the non-dominated solution set to make the prediction more accurate. Finally, to prevent the algorithm from falling into local optima, a hybrid diversity maintenance mechanism was proposed, which introduced random individuals generated by Latin hypercube sampling and a precision controllable mutation strategy to improve the diversity of the population. The proposed algorithm was compared with four other dynamic multi-objective optimization algorithms. Experimental results show that IPS can balance the diversity and convergence of the population, and that its results are better than those of the other four algorithms on the FDA, DMOP, and F5~F10 test suites.
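Latin hypercube sampling, which the abstract names as the source of random individuals, can be sketched as below. This is a generic textbook version on the unit hypercube with an assumed function name; how IPS scales or injects the samples into the population is not described in the abstract.

```python
import random

def latin_hypercube(n_samples, n_dims, rng=None):
    """Return n_samples points in [0, 1)^n_dims.

    Each dimension is split into n_samples equal strata, and every
    stratum is hit exactly once, which spreads points more evenly
    than independent uniform sampling.
    """
    rng = rng or random.Random()
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent stratum order per dimension
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]
```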

Binocular vision object 6D pose estimation based on circulatory neural network
Heng YANG,Zhuo LI,Zhong-yuan KANG,Bing TIAN,Qing DONG
Journal of ZheJiang University (Engineering Science)    2023, 57 (11): 2179-2187.   DOI: 10.3785/j.issn.1008-973X.2023.11.005
Abstract   HTML PDF (1068KB) ( 252 )  

A method for creating a binocular dataset and a 6D pose estimation network called Binocular-RNN were proposed in response to the problem of low accuracy in the current task of 6D pose estimation for objects. The existing images in the YCB-Video Dataset were used as the content captured by the left camera of the binocular system. The corresponding 3D object models in the YCB-Video Dataset were imported using OpenGL, and the parameters related to each object were input to generate synthetic images captured by the virtual right camera of the binocular system. A monocular prediction network was utilized in the Binocular-RNN to extract geometric features from the left and right images in the binocular dataset, and a recurrent neural network was used to fuse these geometric features and predict the 6D pose of the objects. The evaluation of Binocular-RNN and other pose estimation methods was based on the average distance of model points (ADD), average nearest point distance (ADDS), translation error and angle error. The results show that when the network was trained on a single object, the ADD or ADDS score of Binocular-RNN was 2.66 times that of PoseCNN and 1.15 times that of GDR-Net. Furthermore, the Binocular-RNN trained by the physics-based real-time rendering (Real+PBR) outperformed the DeepIM method based on deep neural network iterative 6D pose matching.

Prediction model of axial bearing capacity of concrete-filled steel tube columns based on XGBoost-SHAP
Xi-ze CHEN,Jun-feng JIA,Yu-lei BAI,Tong GUO,Xiu-li DU
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1061-1070.   DOI: 10.3785/j.issn.1008-973X.2023.06.001
Abstract   HTML PDF (2896KB) ( 239 )  

To reliably and accurately predict the axial bearing capacity of concrete-filled steel tube (CFST) columns, a prediction model of CFST column axial bearing capacity with ensemble machine learning was developed and explained. The quality of the CFST column database was evaluated using the Mahalanobis distance, the prediction model of CFST column axial bearing capacity was established by the extreme gradient boosting (XGBoost) algorithm, and the optimal hyperparameter combination of the model was found using the K-Fold cross-validation (K-Fold CV) and the tree-structured Parzen estimator (TPE) algorithms. The predicted values of the optimized XGBoost model were compared with the calculated values of the existing methods and the unoptimized XGBoost model using different evaluation metrics. The Shapley additive explanations (SHAP) approach was used to produce both global and local explanations for the predictions of XGBoost model. Results show that, after hyperparameter tuning, the XGBoost model’s performance surpasses performance of relevant standards and empirical formulas, and the SHAP approach can effectively explain the XGBoost model’s output.

Lightweight semantic segmentation network for underwater image
Hao-ran GUO,Ji-chang GUO,Yu-dong WANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1278-1286.   DOI: 10.3785/j.issn.1008-973X.2023.07.002
Abstract   HTML PDF (2385KB) ( 238 )  

A semantic segmentation network was designed for underwater images. A lightweight and efficient encoder-decoder architecture was used by considering the trade-off between speed and accuracy. Inverted bottleneck layer and pyramid pooling module were designed in the encoder part to efficiently extract features. Feature fusion module was constructed in the decoder part in order to fuse multi-level features, which improved the segmentation accuracy. Auxiliary edge loss function was used to train the network better aiming at the problem of fuzzy edges of underwater images, and the edges of segmentation were refined through the supervision of semantic boundaries. The experimental data on the underwater semantic segmentation dataset SUIM show that the network achieves 53.55% mean IoU with an inference speed of 258.94 frames per second on one NVIDIA GeForce GTX 1080 Ti card for the input image of pixel 320×256, which can achieve real-time processing speed while maintaining high accuracy.

Dynamic knowledge graph inference method combining static facts and repeated historical facts
Dong LIN,Yong-qiang LI,Xiang QIU,Yuan-jing FENG,Bi-feng XIE
Journal of ZheJiang University (Engineering Science)    2023, 57 (10): 1915-1922.   DOI: 10.3785/j.issn.1008-973X.2023.10.001
Abstract   HTML PDF (856KB) ( 236 )  

A static-historical network (Sta-HisNet) method combining static facts and repeated historical facts was proposed, aiming at the problem that existing dynamic knowledge graph reasoning methods tend to overlook the vast amount of static information and repeated historical facts present in dynamic knowledge graphs. The hidden static connections between entities in the dynamic knowledge graph were used to form static facts, assisting in the inference of the dynamic knowledge graph. Historical facts were employed to construct a historical vocabulary, and the historical vocabulary was queried when predicting the future. Facts that had not occurred in history were penalized, and the probability of predicting repeated historical facts was increased. Experiments were conducted on two public datasets for dynamic knowledge graph reasoning. Comparative experiments were performed using five mainstream models as baselines. In entity prediction experiments, the mean reciprocal rank (MRR) was 0.4891 and 0.5303, and Hits@10 reached 0.5887 and 0.6165 respectively, demonstrating the effectiveness of the proposed method.
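The two metrics reported above follow standard definitions: MRR is the mean of 1/rank of the correct entity over all queries, and Hits@k is the fraction of queries whose correct entity is ranked within the top k. A minimal sketch, with assumed function names and 1-based ranks as input:

```python
def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k=10):
    """Fraction of queries whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For the ranks `[1, 2, 4, 20]`, MRR is (1 + 0.5 + 0.25 + 0.05) / 4 = 0.45 and Hits@10 is 0.75.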

Design and verification of autonomous docking guidance system for modular flying vehicle
Chen WANG,Wei LIN,Liang-peng HU,Jun-ming ZHANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (12): 2345-2355.   DOI: 10.3785/j.issn.1008-973X.2023.12.001
Abstract   HTML PDF (3722KB) ( 235 )  

The process architecture, software and hardware systems, core algorithms, and the validation of the autonomous docking guidance system for a modular flying vehicle were investigated. The remote, medium range, and short range multi segment fusion guidance was adopted based on the transition of guidance methods. The point density clustering algorithm and the kernel correlation filter algorithm were used to provide smooth fusion information in response to the false detections and missed detections in the actual use of YOLOv4-tiny. A correction factor method was proposed to achieve fusion correction of AprilTag measurement data in the short range guidance stage, and the pose compensation algorithm was used to solve the camera pose problem of fixed connection between the camera and the drone. The dark light image enhancement algorithm was introduced and combined with the visual guidance algorithm to meet the docking requirements in low-light environment. A simulation platform and an engineering application platform were built, and the process, the system architecture and the algorithms were verified step by step. Experimental results showed that the engineering application flight platform could safely, stably and accurately guide the landing into a conical docking mechanism with an allowable error of only 6 cm and an angle error of 5°. The results prove that the developed autonomous docking technology has good accuracy and reliability.

Daily water supply prediction method based on integrated learning and deep learning
Xin-lei ZHOU,Hai-ting GU,Jing LIU,Yue-ping XU,Fang GENG,Chong WANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1120-1127.   DOI: 10.3785/j.issn.1008-973X.2023.06.007
Abstract   HTML PDF (1780KB) ( 231 )  

Making use of the historical daily water supply data of four water plants in Yiwu city, a new water supply prediction model based on long short term memory (LSTM) neural network improved by integrated learning algorithm was proposed, in order to effectively resolve the problems of low accuracy and insufficient generalization ability of the daily water supply prediction. In the model, a historical daily water supply after pre-processing by Pauta criterion was taken as the data input, the LSTM neural network with long-term temporal information memory was applied as the weak predictor of integrated learning, the grid search method was utilized for network hyperparameter tuning, and the AdaBoost integrated learning algorithm was used to weight the combination of the weak predictors to obtain the strong predictor. Results show that the improved LSTM neural network based on integrated learning algorithm has the highest Nash efficiency coefficient (NSE) with the lowest root mean square error (RMSE) and mean absolute error (MAE), the best fitting effect on the change trend and the peak value of daily water supply data, compared with the random forest (RF), AdaBoost and LSTM neural network. The time series prediction accuracy of the improved LSTM water supply forecasting model is significantly improved, with good generalization ability and stable prediction performance. The results can provide an important reference for the rational allocation of urban water resources planning and integrated intelligent water supply scheduling.
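The Pauta criterion used for pre-processing is the 3σ rule: a value is treated as an outlier when it deviates from the mean by more than three standard deviations. The sketch below only separates flagged values from the rest; the function name, the use of the sample standard deviation, and what the paper does with flagged values afterwards are assumptions.

```python
import statistics

def pauta_filter(values):
    """Split values into (kept, flagged) using the 3-sigma rule."""
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (assumed)
    kept, flagged = [], []
    for v in values:
        if abs(v - mean) > 3 * sigma:
            flagged.append(v)
        else:
            kept.append(v)
    return kept, flagged
```

Note that a single extreme value also inflates the standard deviation, so the rule only becomes effective once the series is reasonably long.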

UAV dense small target detection algorithm based on YOLOv5s
Jun HAN,Xiao-ping YUAN,Zhun WANG,Ye CHEN
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1224-1233.   DOI: 10.3785/j.issn.1008-973X.2023.06.018
Abstract   HTML PDF (2789KB) ( 221 )  

A dense small target detection algorithm, LSA_YOLO, based on YOLOv5s was proposed for UAV images with complex backgrounds and many densely distributed small targets. A multi-scale feature extraction module LM-fem was constructed to enhance the feature extraction capability of the network. A new hybrid domain attention module S-ECA relying on multi-scale contextual information was put forward, and an algorithm that focuses on target information was established in order to suppress the interference of complex backgrounds. The adaptive weight dynamic fusion structure AFF was designed to assign reasonable fusion weights to both shallow and deep features. The capability of the algorithm in detecting dense small targets in complex backgrounds was improved by applying S-ECA and AFF in the structure of PANet. The loss function Focal-EIOU was utilized instead of the loss function CIOU to improve model detection efficiency. Experimental results on the public dataset VisDrone2021 show that the average detection accuracy for all target classes improves from 51.5% for YOLOv5s to 57.6% for LSA_YOLO when the input resolution is set to 1504 × 1504.

Multi branch Siamese network target tracking based on double attention mechanism
Xiao-yan LI,Peng WANG,Jia GUO,Xue LI,Meng-yu SUN
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1307-1316.   DOI: 10.3785/j.issn.1008-973X.2023.07.005
Abstract   HTML PDF (2692KB) ( 221 )  

A multi branch Siamese network target tracking algorithm based on dual attention mechanism was proposed in order to solve the problem of inaccurate positioning in the SiamRPN++ single target tracking algorithm when the target was briefly occluded and the appearance drastically changed. SiamRPN++ with lightweight backbone network was adopted as the basic algorithm. The algorithm was combined with lightweight channel and spatial attention mechanism in order to improve the anti-interference ability when dealing with occlusion challenges during the tracking process. A template branch was added from the previous frame, and the appearance changes of the target were dynamically updated. The ability to distinguish between foreground and background was enhanced during the tracking process using triplet loss. Local expansion search was conducted based on the speed of the target’s movement in order to enable timely and accurate tracking of the target even after short-term occlusion. The experimental results showed that the improved algorithm improved the success rate and precision of the OTB100 dataset by 2.4% and 1.6%, respectively, compared to the original algorithm. The average center position error decreased by 28.97 pixels, and the average overlap rate increased by 14.5%.
