Most Downloaded Articles


In last 2 years
Survey of deep learning based EEG data analysis technology
Bo ZHONG,Pengfei WANG,Yiqiao WANG,Xiaoling WANG
Journal of ZheJiang University (Engineering Science)    2024, 58 (5): 879-890.   DOI: 10.3785/j.issn.1008-973X.2024.05.001
Abstract   HTML PDF (690KB) ( 9047 )  

Recent relevant works were thoroughly analyzed and cross-compared, and a closed-loop process for EEG data analysis based on deep learning was outlined. EEG data were introduced, and the application of deep learning in three key stages (preprocessing, feature extraction, and model generalization) was described. The research ideas and solutions provided by deep learning algorithms in each stage were delineated, together with the challenges and issues encountered. The main contributions and limitations of different algorithms were summarized. The challenges and future directions of deep learning technology in handling EEG data at each stage were discussed.

Multimodal sentiment analysis model based on multi-task learning and stacked cross-modal Transformer
Qiao-hong CHEN,Jia-jin SUN,Yang-bo LOU,Zhi-jian FANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (12): 2421-2429.   DOI: 10.3785/j.issn.1008-973X.2023.12.009
Abstract   HTML PDF (1171KB) ( 2622 )  

A new multimodal sentiment analysis model (MTSA) based on the cross-modal Transformer was proposed to address two problems: modal feature heterogeneity is difficult to retain during single-modal feature extraction, and cross-modal feature fusion introduces feature redundancy. Long short-term memory (LSTM) and a multi-task learning framework were used to extract single-modal contextual semantic information. Noise was removed and modal feature heterogeneity was preserved by summing the auxiliary modal task losses. A multi-task gating mechanism was used to adjust cross-modal feature fusion. Text, audio and visual modal features were fused in a stacked cross-modal Transformer structure to improve fusion depth and avoid feature redundancy. MTSA was evaluated on the MOSEI and SIMS datasets. Results show that MTSA has better overall performance than other advanced models, with binary classification accuracy reaching 83.51% and 84.18%, respectively.

Multi-behavior aware service recommendation based on hypergraph graph convolution neural network
Jia-wei LU,Duan-ni LI,Ce-ce WANG,Jun XU,Gang XIAO
Journal of ZheJiang University (Engineering Science)    2023, 57 (10): 1977-1986.   DOI: 10.3785/j.issn.1008-973X.2023.10.007
Abstract   HTML PDF (1380KB) ( 1940 )  

A multi-behavior aware service recommendation method based on hypergraph graph convolutional neural network (MBSRHGNN) was proposed to resolve the problem of insufficient high-order service feature extraction in existing service recommendation methods. A multi-hypergraph was constructed according to user-service interaction types and service mashups. A dual-channel hypergraph convolutional network was designed based on spectral decomposition theory and the functional and structural properties of the multi-hypergraph. Chebyshev polynomials were used to approximate the hypergraph convolution kernel in order to reduce computational complexity. A self-attention mechanism and multi-behavior recommendation methods were combined to measure the difference in importance between multi-behavior interactions during hypergraph convolution. A hypergraph pooling method named HG-DiffPool was proposed to reduce the feature dimensionality. The probability distribution for recommending different services was learned by integrating the service embedding vector and hypergraph signals. Real service data were obtained with a crawler and used to construct datasets of different sparsity for experiments. Experimental results showed that the MBSRHGNN method could adapt to recommendation scenarios with highly sparse data and was superior to the existing baseline methods in accuracy and relevance.
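
As a rough illustration of the Chebyshev approximation mentioned in this abstract, the Python sketch below applies a truncated Chebyshev expansion of a spectral filter to a node signal, using the recurrence T_k = 2 L T_(k-1) - T_(k-2). It is not the MBSRHGNN implementation: the names `matvec` and `chebyshev_filter`, the dense list-of-rows matrix format, and the assumption that the operator `lap` has already been rescaled so that its eigenvalues lie in [-1, 1] are all illustrative choices.

```python
def matvec(m, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def chebyshev_filter(lap, x, theta):
    """Return sum_k theta[k] * T_k(lap) x without forming T_k(lap) explicitly.

    lap   : rescaled (hyper)graph Laplacian, eigenvalues assumed in [-1, 1]
    x     : signal on the nodes
    theta : filter coefficients (hypothetical here; normally learned)
    """
    if not theta:
        return [0.0] * len(x)
    t_prev = list(x)                        # T_0(lap) x = x
    out = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return out
    t_curr = matvec(lap, x)                 # T_1(lap) x = lap x
    out = [o + theta[1] * v for o, v in zip(out, t_curr)]
    for coeff in theta[2:]:
        lap_t = matvec(lap, t_curr)
        # Chebyshev recurrence: T_k x = 2 * lap * T_(k-1) x - T_(k-2) x
        t_next = [2.0 * a - b for a, b in zip(lap_t, t_prev)]
        out = [o + coeff * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out
```

Each additional term costs one more matrix-vector product, so no eigendecomposition of the Laplacian is needed.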

Research overview on touchdown detection methods for footed robots
Xiaoyong JIANG,Kaijian YING,Qiwei WU,Xuan WEI
Journal of ZheJiang University (Engineering Science)    2024, 58 (2): 334-348.   DOI: 10.3785/j.issn.1008-973X.2024.02.012
Abstract   HTML PDF (1751KB) ( 1771 )  

The effects of leg structure design, foot-end design and sensor design on touchdown detection were discussed by analyzing existing touchdown detection methods for legged robots. Three classes of methods were summarized: direct detection with external sensors, detection based on kinematics and dynamics, and learning-based detection. Touchdown detection methods were also summarized for three special scenarios: slippery ground, soft ground, and non-foot-end contact. The application scenarios of touchdown detection technology were analyzed, including motion control requirements, navigation applications, and terrain and geological sensing. Four development trends were pointed out: hardware improvement and integration, multi-mode touchdown detection, multi-sensor fusion touchdown detection, and intelligent touchdown detection. The specific relationships between the various touchdown detection algorithms were summarized, providing guidance for the follow-up development and specific applications of touchdown detection technology.

Spatial-temporal multi-graph convolution for traffic flow prediction by integrating knowledge graphs
Jinye LI,Yongqiang LI
Journal of ZheJiang University (Engineering Science)    2024, 58 (7): 1366-1376.   DOI: 10.3785/j.issn.1008-973X.2024.07.006
Abstract   HTML PDF (1616KB) ( 1656 )  

A spatial-temporal multi-graph convolution traffic flow prediction model integrating static and dynamic knowledge graphs was proposed, because current traffic flow prediction methods focus on the spatial-temporal correlation of traffic information and fail to fully take into account the influence of external factors on traffic. An urban traffic knowledge graph and four road network topological graphs with distinct semantics were systematically constructed, drawing upon the road traffic information and the external factors. The urban traffic knowledge graph was fed into the relational evolution graph convolutional neural network to realize the knowledge embedding. The traffic flow matrix and the knowledge embedding were integrated using the knowledge fusion module. The four road network topology graphs and the traffic flow matrix with fused knowledge were fed into the spatial-temporal multi-graph convolution module to extract spatiotemporal features, and the predicted traffic flow was output through the fully connected layer. The model performance was evaluated on a Hangzhou traffic data set. Compared with the advanced baseline, the performance of the proposed model improved by 5.76%-10.71%. Robustness experiment results show that the proposed model has a strong ability to resist interference.

Fact-based similar case retrieval methods based on statutory knowledge
Linrui LI,Dongsheng WANG,Hongjie FAN
Journal of ZheJiang University (Engineering Science)    2024, 58 (7): 1357-1365.   DOI: 10.3785/j.issn.1008-973X.2024.07.005
Abstract   HTML PDF (814KB) ( 1224 )  

Existing research on similar case retrieval ignores the legal logic that the model should embody and cannot adapt to the case similarity criteria required in practical applications. The few Chinese datasets available for case retrieval tasks can hardly meet research needs. A similar case retrieval model with legal logic and strong interpretability was proposed, and a case event logic graph was constructed based on predicate verbs. The statutory knowledge corresponding to various crimes was integrated into the proposed model, and the extracted elements were input to a neural network-based scorer to perform case retrieval accurately and efficiently. A Confusing-LeCaRD dataset was built for the case retrieval task, with a group of easily confused charges as the main retrieval causes. Experiments show that the normalized discounted cumulative gain of the proposed model on the LeCaRD and Confusing-LeCaRD datasets was 90.95% and 94.64%, respectively, and that the model was superior to TF-IDF, BM25 and BERT-PLI in all indicators.
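
For readers unfamiliar with the metric reported in this abstract, the Python sketch below shows one common way to compute normalized discounted cumulative gain (NDCG). It is an illustration only: the abstract does not state the gain function or the cutoff used in the paper, so the linear gain, the log2 position discount, the function names `dcg` and `ndcg_at_k`, and the choice to derive the ideal ordering from the same list of relevance labels are assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the top-k results divided by the DCG of the ideal ordering.

    ranked_relevances: graded relevance labels in the order the system ranked them.
    The ideal ordering is taken from these same labels (an assumption; in practice
    it is usually built from all labeled candidates for the query).
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_relevances[:k]) / ideal_dcg
```

A perfectly ordered list scores 1.0, and any ordering that places less relevant cases above more relevant ones scores lower.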

Structure and property of 2219 aluminum alloy fabricated by droplet+arc additive manufacturing
Yongchao WANG,Zhengying WEI,Pengfei HE
Journal of ZheJiang University (Engineering Science)    2024, 58 (8): 1585-1595.   DOI: 10.3785/j.issn.1008-973X.2024.08.006
Abstract   HTML PDF (7116KB) ( 1133 )  

A new arc additive manufacturing process, droplet+arc additive manufacturing (DAAM), was applied to manufacture aluminum alloy samples in order to improve the quality and efficiency of aluminum alloy additive manufacturing. A new droplet generation system (DGS) was used instead of the conventional wire feeding system, which makes the material addition and the arc energy independent of each other. The formed material was 2219 aluminum alloy, and a trace amount of Mg was added through the DGS. A thin-walled structure was deposited using the DAAM system at a significantly higher deposition rate (160 $\mathrm{mm}^{3}/\mathrm{s}$) than conventional wire and arc additive manufacturing techniques. The microstructure of the cross section of the thin-walled structure was observed and analyzed. Results showed that the grain morphology of the thin-walled structure was dominated by columnar crystals and exhibited a periodic distribution of inner-layer columnar crystals and inter-layer equiaxed crystals. The average tensile strengths in the horizontal and vertical directions were 455.4 MPa and 417.0 MPa after T6 heat treatment, while the yield strengths were 342.2 MPa and 316.4 MPa, respectively. Comparison with previous studies shows that the addition of Mg increases the yield strength of 2219 aluminum alloy but leads to a corresponding decrease in elongation.

Survey of multi-objective particle swarm optimization algorithms and their applications
Qianlin YE,Wanliang WANG,Zheng WANG
Journal of ZheJiang University (Engineering Science)    2024, 58 (6): 1107-1120.   DOI: 10.3785/j.issn.1008-973X.2024.06.002
Abstract   HTML PDF (1559KB) ( 1109 )  

Few existing studies cover the state-of-the-art multi-objective particle swarm optimization (MOPSO) algorithms. To fill this gap, the research background of multi-objective optimization problems (MOPs) was introduced, and the fundamental theories of MOPSO were described. The MOPSO algorithms were divided into three categories according to their features: Pareto-dominance-based MOPSO, decomposition-based MOPSO, and indicator-based MOPSO, and the classical algorithms in each category were described in detail. Next, relevant evaluation indicators were described, and seven representative algorithms were selected for performance analysis. The experimental results demonstrated the strengths and weaknesses of the traditional MOPSO and of the three categories of improved MOPSO algorithms. Among them, the indicator-based MOPSO performed better in terms of convergence and diversity. Then, the applications of MOPSO algorithms in production scheduling, image processing, and power systems were briefly introduced. Finally, the limitations and future research directions of the MOPSO algorithm for solving complex optimization problems were discussed.

Compound operation scheduling optimization in four-way shuttle warehouse system
Li-li XU,Yan ZHAN,Jian-sha LU,Yi-ding LANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (11): 2188-2199.   DOI: 10.3785/j.issn.1008-973X.2023.11.006
Abstract   HTML PDF (1485KB) ( 1074 )  

The compound operation scheduling optimization in four-way shuttle warehouse system was studied to improve the efficiency of storage system operations. A mathematical model was established with the goal of minimizing inbound and outbound operation times to optimize the scheduling problem of the system. This model was based on the combined operation of a four-way shuttle and an elevator, and the collaborative operation characteristics in both horizontal and vertical directions were considered. Furthermore, the model was analyzed under various operating modes by examining the connection between the start and end operation times of the four-way shuttle and the elevator, as well as the starting operation tiers. The method based on the task classification was proposed to initialize the population of the genetic algorithm. The crossover and the mutation of the population were completed to solve the model, and then the task allocation and sequence of the system were optimized. Some experiments were conducted to verify the effectiveness of the improved genetic algorithm. The influence of the number of four-way shuttles on the operation time and system cost was analyzed, and the operation efficiencies of single and double elevators in the system were compared. The effectiveness of the genetic algorithm based on the task classification was verified, and the results showed that the operation efficiency was improved by at least 10.3%, by using the proposed algorithm.

Solution approach of Burgers-Fisher equation based on physics-informed neural networks
Jian XU,Hai-long ZHU,Jiang-le ZHU,Chun-zhong LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (11): 2160-2169.   DOI: 10.3785/j.issn.1008-973X.2023.11.003
Abstract   HTML PDF (1371KB) ( 1054 )  

Physical information was divided into rule information and numerical information in order to explore its role in training the neural network when differential equations are solved with a physics-informed neural network (PINN). The logic of PINN for solving differential equations was explained, as well as the data-driven approach of physical information and neural network interpretability. A synthetic loss function was designed based on the two types of information, and the training balance degree was established from the aspects of training sampling and training intensity. The experiment of solving the Burgers-Fisher equation with PINN showed that PINN can obtain good solution accuracy and stability. During training, the numerical information of the Burgers-Fisher equation promoted the approximation of the equation solution by the neural network better than the rule information did. The training effect improved as the training sampling, the number of training epochs, and the balance between the two types of information increased. In addition, the solution accuracy improved as the scale of the neural network increased, but the training time of each epoch also increased. Within a fixed training time, a larger neural network does not necessarily give a better result.

Improved method for blockchain Kademlia network based on small world theory
Yue ZHAO,He ZHAO,Haibo TAN,Bin YU,Wangnian YU,Zhiyu MA
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 1-9.   DOI: 10.3785/j.issn.1008-973X.2024.01.001
Abstract   HTML PDF (1194KB) ( 1038 )  

An improved method for the blockchain Kademlia network based on small world theory was proposed to address the problem that current research on the blockchain Kademlia network sacrifices security to improve scalability. Following the idea of small world theory, a probability formula for replacing expansion nodes was proposed, in which the probability is inversely proportional to the distance between nodes. The number of node replacements and additional nodes can be flexibly adjusted according to actual conditions. Theoretical analysis and experimental verification demonstrated that the network transformed by this method can reach a stable state. Experimental results showed that the transmission hierarchy required for broadcasting transaction messages throughout the network was reduced by 15.0% to 30.8% and the rate of locating nodes was increased. Compared with other optimization algorithms that modify the network structure, the level of network structure was reduced and network security was enhanced.
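
The idea of choosing nodes with a probability inversely proportional to distance can be sketched as follows. This is a minimal illustration under stated assumptions, not the formula from the paper, which the abstract does not give: it uses the XOR metric that Kademlia defines between node IDs and a plain normalized 1/d weighting. The names `xor_distance`, `replacement_probabilities` and `pick_replacement` are hypothetical.

```python
import random


def xor_distance(a, b):
    """Kademlia distance between two integer node IDs."""
    return a ^ b


def replacement_probabilities(node_id, candidate_ids):
    """Map each candidate to a probability proportional to 1 / XOR distance.

    Assumption: a plain normalized 1/d weighting; the node itself (d == 0) is skipped.
    """
    weights = {}
    for cand in candidate_ids:
        d = xor_distance(node_id, cand)
        if d > 0:
            weights[cand] = 1.0 / d
    total = sum(weights.values())
    if total == 0:
        return {}
    return {cand: w / total for cand, w in weights.items()}


def pick_replacement(node_id, candidate_ids, rng=random):
    """Draw one candidate according to the probabilities above, or None if empty."""
    probs = replacement_probabilities(node_id, candidate_ids)
    if not probs:
        return None
    cands = list(probs)
    return rng.choices(cands, weights=[probs[c] for c in cands], k=1)[0]
```

Nearby nodes are chosen most often, while distant nodes are still selected occasionally, which is what gives a small-world network its long-range shortcuts.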

Open-set 3D model retrieval algorithm based on multi-modal fusion
Fuxin MAO,Xu YANG,Jiaqiang CHENG,Tao PENG
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 61-70.   DOI: 10.3785/j.issn.1008-973X.2024.01.007
Abstract   HTML PDF (993KB) ( 980 )  

An open-domain 3D model retrieval algorithm that makes effective use of the semantic consistency of multi-modal information was proposed in order to meet the requirements of managing and retrieving massive new model data in the open domain. The category information among unknown samples was explored with the help of an unsupervised algorithm. The unknown class information was then introduced into the parameter optimization process of the network model, so that the network model achieved better representation and retrieval performance under open-domain conditions. A hierarchical multi-modal information fusion model based on a Transformer structure was proposed, which could effectively remove the redundant information among the modalities and obtain a more robust model representation vector. Experiments were conducted on the ModelNet40 dataset, and the results were compared with those of other typical algorithms. The proposed method outperformed all comparative methods in terms of mAP, which verified its effectiveness in improving retrieval performance.

Structural design and experimental analysis of new UHPC-NC composite bent cap
Cijun LIU,Lifeng LI,Xudong SHAO,Tao CHEN,Guanhua ZHANG,Jiawei WANG,Huazhen YANG,Yalong ZHAO
Journal of ZheJiang University (Engineering Science)    2024, 58 (11): 2355-2363.   DOI: 10.3785/j.issn.1008-973X.2024.11.017
Abstract   HTML PDF (2785KB) ( 942 )  

A new composite bent cap consisting of a shell made of steel plate and ultra-high-performance concrete (UHPC) and a cast-in-place core of normal concrete (NC) was proposed in order to realize the assembly and rapid construction of ultra-large-scale bent caps for urban viaducts or highway reconstruction and expansion projects. Parametric analysis with different UHPC and steel plate thicknesses was conducted in order to analyze the influence of the thickness of the UHPC and the steel mold plate on the stress performance of the shell. Results showed that, under self-weight, the stiffness of the shell was affected jointly by the thickness of the UHPC, the thickness of the steel plate, and their ratio. The thicker the UHPC and steel plate are, the better the stress performance of the shell is, but the economy is reduced when tensioning prestress and casting concrete. A UHPC thickness of 70 mm and a steel plate thickness of 6 mm are recommended. A 1:2.5 scaled-down model was designed and a static loading test was conducted in order to verify the feasibility and safety of this scheme. Results show that the new UHPC-NC composite bent cap has good force performance and a high safety reserve, which can provide a reference for the assembly construction of bent caps.

Pavement distress situation prediction method based on graph neural network
Zechao MA,Xiaoming LIU,Hanqing XIA,Weiqiang WANG,Jiuzeng WANG,Haitao SHEN
Journal of ZheJiang University (Engineering Science)    2024, 58 (12): 2596-2608.   DOI: 10.3785/j.issn.1008-973X.2024.12.019
Abstract   HTML PDF (1111KB) ( 858 )  

A road pavement distress situation forecasting method employing graph convolutional networks was introduced to address the problem of predicting the generation and deterioration of road pavement distress. Firstly, a topological network was established through clustering algorithms, selecting the main factors that influence the target pavement distress during its evolution. Subsequently, to enhance the expressive capability of the graph neural network for distress information, a graph topology enhancement method was employed, constructing views related to distress information from both static and dynamic aspects. Finally, an enhanced graph neural network (GNN) architecture was applied, incorporating attention mechanisms in the view dimension to adjust the influence of different views and using Transformer and GRU modules in the temporal dimension to enhance the predictive performance of the model for pavement distress states over extended time sequences. The internal calibration tests of the model, including ablation studies, multi-sample testing, and hyperparameter control group validation, demonstrated the applicability and stability of the proposed model. For the large and sparse pavement distress dataset, the mean absolute error of this model converged within 4.0, which was better than the results of the traditional prediction algorithms in terms of comprehensive performance.

Parallel optimization of large-point FFT on Sunway 26010
Jun GUO,Peng LIU,Xinyao YANG,Lufei ZHANG,Dong WU
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 78-86.   DOI: 10.3785/j.issn.1008-973X.2024.01.009
Abstract   HTML PDF (1231KB) ( 857 )  

A many-core parallel optimization scheme for large-point FFT was proposed according to the structural characteristics and programming specifications of the domestic Sunway 26010 processor, which is used in the Sunway TaihuLight supercomputer. The scheme was derived from the classic Cooley-Tukey FFT algorithm and was accelerated in parallel by iteratively decomposing the one-dimensional large-point data into two-dimensional small-scale matrices. A "column-sharing, row-continuity" strategy was proposed in order to solve the problem of reading, writing, transposing and calculating the "column FFT" of the matrix. The computing resources and transmission bandwidth of the many-core processor were fully utilized through reasonable data allocation, rearrangement and exchange, combined with other optimization methods such as SIMD vectorization, twiddle factor optimization, double-buffering, register communication and stride transmission. The experimental results show that a single core group of 64 slave cores running the parallel program can achieve a maximum speed-up of 65x and an average speed-up of more than 48x compared with the main core running the FFTW library.
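
The decomposition of a one-dimensional transform into a two-dimensional matrix of smaller transforms can be illustrated with a single-threaded Python sketch of the Cooley-Tukey factorization N = N1 × N2: column transforms, a twiddle-factor multiplication, then row transforms. This is only the textbook mathematics, not the Sunway implementation; the names `dft` and `fft_2d_decomposed` are hypothetical, a naive DFT stands in for the small-size kernels, and none of the optimizations from the abstract (SIMD, double-buffering, register communication) are represented.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used as the small-size kernel and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_2d_decomposed(x, n1, n2):
    """DFT of length n1 * n2 computed through an n1-by-n2 matrix view.

    Input element x[n2 * r + c] sits at row r, column c of the matrix.
    """
    n = n1 * n2
    assert len(x) == n
    # Step 1: "column FFT" of length n1 down each of the n2 columns.
    cols = [dft([x[n2 * r + c] for r in range(n1)]) for c in range(n2)]
    # Step 2: multiply entry (k1, c) by the twiddle factor exp(-2*pi*i*c*k1/n).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)
    # Step 3: "row FFT" of length n2 along each of the n1 rows.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]   # output index is transposed
    return out
```

The column and row transforms are independent of one another, which is what makes this form attractive for distribution across many cores. The strided column access in step 1 and the transposed output in step 3 are the kind of data movement the "column-sharing, row-continuity" strategy is concerned with.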

Research progress of YOLO detection technology for traffic object
Hongzhao DONG,Shaoxuan LIN,Yini SHE
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 249-260.   DOI: 10.3785/j.issn.1008-973X.2025.02.003
Abstract   HTML PDF (3207KB) ( 834 )  

The development and research status of the YOLO (You Only Look Once) algorithm in traffic object detection were systematically summarized from the perspective of the three core elements 'people-vehicle-road' in order to comprehensively analyze the important role of the YOLO algorithm in improving traffic safety and efficiency. The commonly used evaluation indexes of the YOLO algorithm were outlined, and the practical significance of these indexes in traffic scenarios was explained in detail. An overview of the core architecture of the YOLO algorithm was provided, its development process was traced, and the optimization and improvement measures in each version iteration were analyzed. The research status and application scenarios of the YOLO algorithm for traffic object detection were sorted out and discussed for each of the three traffic objects 'people-vehicle-road'. The limitations and challenges of the YOLO algorithm in traffic object detection were analyzed, and corresponding improvement methods were proposed. Future research focuses were anticipated, providing a research reference for the intelligent development of road traffic.

Obstacle recognition of unmanned rail electric locomotive in underground coal mine
Tun YANG,Yongcun GUO,Shuang WANG,Xin MA
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 29-39.   DOI: 10.3785/j.issn.1008-973X.2024.01.004
Abstract   HTML PDF (2463KB) ( 829 )  

The PDM-YOLO model for accurate real-time obstacle detection in unmanned electric locomotives was proposed in order to address the problem of low accuracy of obstacle recognition in existing coal mine underground unmanned electric locomotives due to poor roadway environments. The ordinary convolution in the C3 module of the conventional YOLOv5 model was replaced with partial convolution to construct the C3_P feature extraction module, which effectively reduced the floating-point operations (FLOPs) and computational delay of the model. The improved decoupled head was used to decouple the prediction head of the conventional YOLOv5 model in order to improve the convergence speed of the model and the accuracy of obstacle recognition. The Mosaic data augmentation method was optimized to enrich the feature information of the training images and enhance the generalizability and robustness of the model. The experimental results showed that the mean average precision (mAP) of the PDM-YOLO model reached 96.3% and the average detection speed reached 109.2 frames per second on the self-built dataset. The detection accuracy of the PDM-YOLO model on the PASCAL VOC public dataset is higher than that of the existing mainstream YOLO series models.

Three-dimensional sector automatic design based on improved NSGA-II algorithm
Yingfei ZHANG,Xiaobing HU,Hang ZHOU,Xuzeng FENG
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 413-422.   DOI: 10.3785/j.issn.1008-973X.2025.02.019
Abstract   HTML PDF (1634KB) ( 825 )  

An improved non-dominated sorting genetic algorithm II (NSGA-II) was proposed in order to address the challenges of time-consuming manual airspace sectorization and the difficulty in comparing the quality of different sectorization schemes. A three-dimensional multi-objective optimization model for sectorization was established by using a grid-region-sector hierarchy in order to balance controllers’ workload within sectors and reduce workload differences between sectors. A fitness evaluation operator, a probability-adaptive combination crossover operator and a dynamic mutation operator were incorporated in the NSGA-II algorithm in order to enhance the number of feasible solutions, solution diversity and computational efficiency. A simulation was conducted for the automatic 3D sectorization of Xi'an high-altitude airspace. Results showed that the optimized scheme improved workload balance within sectors by 37% and reduced inter-sector workload by 24% compared with the current sectorization configuration. The proposed improved NSGA-II provided a broader range of options for decision-makers with varying preferences compared with traditional weighted multi-objective optimization algorithms.
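
The non-dominated sorting step that gives NSGA-II its name can be sketched in Python as follows. This is a simple front-peeling version for illustration, assuming that every objective is to be minimized; it produces the same fronts as the bookkeeping-based fast non-dominated sort of NSGA-II but is less efficient, and it does not include the modified operators described in the abstract. The function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated_sort(points):
    """Split objective vectors into fronts of indices; front 0 is the Pareto set."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts
```

In a sectorization setting, each point could hold two objective values such as within-sector workload imbalance and between-sector workload difference (hypothetical encodings). Front 0 is then the set of trade-off schemes offered to decision-makers.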

Lightweight object detection scheme for garbage classification scenario
Jiansong CHEN,Yijun CAI
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 71-77.   DOI: 10.3785/j.issn.1008-973X.2024.01.008
Abstract   HTML PDF (1542KB) ( 802 )  

A lightweight YOLOv5 garbage detection solution was proposed to address the poor real-time performance of garbage detection and classification on edge devices. The Stem module was introduced to enhance the model’s ability to extract features from input images. The C3 module of the backbone was improved to increase feature extraction capabilities. Depthwise separable convolution was used to replace the 3×3 downsampling convolutions in the network, making the model more lightweight. The K-means++ algorithm was employed to recompute anchor box values for objects, enabling the model to better predict target box sizes during training. Experimental comparisons show that, relative to the YOLOv5s model, the improved model achieves a 0.8% increase in mAP_0.5 and a 3% increase in mAP_0.5:0.95 while reducing model parameters by 77.9% and improving inference speed by 21.9%, significantly enhancing the detection performance of the model.
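
To see why replacing a standard convolution with a depthwise separable one makes a model lighter, the weight counts of the two layer types can be compared directly. The Python sketch below uses the standard formulas and ignores bias terms; the function names and the channel sizes in the note that follows are hypothetical and are not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise
```

For a 3×3 layer with 64 input and 128 output channels, the counts are 73,728 and 8,768 weights respectively. The stride does not change either count, so the same saving applies to downsampling layers.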

UAV small target detection algorithm based on improved YOLOv5s
Yaolian SONG,Can WANG,Dayan LI,Xinyi LIU
Journal of ZheJiang University (Engineering Science)    2024, 58 (12): 2417-2426.   DOI: 10.3785/j.issn.1008-973X.2024.12.001
Abstract   HTML PDF (708KB) ( 787 )  

An unmanned aerial vehicle (UAV) small target detection algorithm based on YOLOv5, termed FDB-YOLO, was proposed to address the frequent misidentification and omission of small targets when traditional target detection algorithms are applied to UAV aerial photography. Initially, a small target detection layer was added on the basis of YOLOv5, and the feature fusion network was optimized to fully leverage the fine-grained information of small targets in shallow layers, thereby enhancing the network’s perceptual capabilities. Subsequently, a novel loss function, FPIoU, was introduced, which capitalized on the geometric properties of anchor boxes and used a four-point positional bias constraint function to optimize the anchor box positioning and accelerate the convergence of the loss function. Furthermore, a dynamic target detection head (DyHead) incorporating an attention mechanism was employed to enhance the algorithm’s detection capabilities through increased awareness of scale, space, and task. Finally, a bi-level routing attention mechanism (BRA) was integrated into the feature extraction phase, selectively computing relevant areas to filter out irrelevant regions, thereby improving the model’s detection accuracy. Experimental validation on the VisDrone2019 dataset demonstrated that the proposed algorithm improved on the YOLOv5s baseline by 3.7 percentage points in Precision, 5.1 percentage points in Recall, 5.8 percentage points in mAP50, and 3.4 percentage points in mAP50:95, showing superior performance compared with current mainstream algorithms.
