Most Read Articles

UAV dense small target detection algorithm based on YOLOv5s
Jun HAN,Xiao-ping YUAN,Zhun WANG,Ye CHEN
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1224-1233.   DOI: 10.3785/j.issn.1008-973X.2023.06.018

A dense small target detection algorithm, LSA_YOLO, based on YOLOv5s was proposed for UAV images with complex backgrounds and many densely distributed small targets. A multi-scale feature extraction module, LM-fem, was constructed to enhance the feature extraction capability of the network. A new hybrid-domain attention module, S-ECA, which relies on multi-scale contextual information, was put forward so that the algorithm focuses on target information and suppresses the interference of complex backgrounds. The adaptive weight dynamic fusion structure AFF was designed to assign reasonable fusion weights to shallow and deep features. Applying S-ECA and AFF in the PANet structure improved the ability of the algorithm to detect dense small targets in complex backgrounds. The Focal-EIOU loss function was used instead of the CIOU loss function to improve model detection efficiency. Experimental results on the public dataset VisDrone2021 show that the average detection accuracy over all target classes improves from 51.5% for YOLOv5s to 57.6% for LSA_YOLO when the input resolution is set to 1 504 × 1 504.
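
The abstract names the Focal-EIOU loss but does not give its formula. The sketch below follows the commonly cited definition (EIOU = 1 − IoU plus normalized centre, width and height penalties, weighted by IoU raised to a power gamma). The function name `focal_eiou_loss`, the `(x1, y1, x2, y2)` box layout and the default `gamma=0.5` are assumptions for illustration, not details taken from the paper.

```python
def focal_eiou_loss(box, gt, gamma=0.5, eps=1e-9):
    """Focal-EIOU loss for one predicted box and one ground-truth box.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed layout).
    """
    # Intersection-over-union of the two boxes.
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    union = w * h + wg * hg - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])

    # Squared distance between box centres, normalized by the
    # squared diagonal of the enclosing box.
    dx = (box[0] + box[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (box[1] + box[3]) / 2 - (gt[1] + gt[3]) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, normalized by the enclosing box.
    width_term = (w - wg) ** 2 / (cw * cw + eps)
    height_term = (h - hg) ** 2 / (ch * ch + eps)

    eiou = 1.0 - iou + center_term + width_term + height_term
    # Focal weighting: pairs with higher IoU contribute more.
    return (iou ** gamma) * eiou


if __name__ == "__main__":
    print(focal_eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

With this weighting, a pair of boxes that does not overlap at all gets zero weight, so in practice the loss is applied to predictions that already overlap their targets.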

Multi-agent pursuit and evasion games based on improved reinforcement learning
Ya-li XUE,Jin-ze YE,Han-yan LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (8): 1479-1486.   DOI: 10.3785/j.issn.1008-973X.2023.08.001

A multi-agent reinforcement learning algorithm based on prioritized experience replay and a decomposed reward function was proposed for multi-agent pursuit and evasion games. Firstly, the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm was proposed based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm and the twin delayed deep deterministic policy gradient (TD3) algorithm. Secondly, prioritized experience replay was introduced to determine the priority of each experience and to sample experiences with high reward, addressing the problem that the reward function is nearly sparse in multi-agent pursuit and evasion. In addition, a decomposed reward function was designed to divide multi-agent rewards into individual rewards and joint rewards in order to maximize both the global and local rewards. Finally, a simulation experiment was designed based on DEPER-MATD3. Comparison with other algorithms showed that the DEPER-MATD3 algorithm solved the over-estimation problem, and the time consumption was improved compared with the MATD3 algorithm. In the decomposed reward function environment, the global mean rewards of the pursuers were improved, and the pursuers had a greater probability of catching the evader.
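
The two ideas in this abstract, sampling experiences by priority and splitting rewards into individual and joint parts, can be illustrated with a small sketch. Everything concrete here is an assumption rather than the paper's design: the class name `PrioritizedReplayBuffer`, the exponent `alpha`, the use of a caller-supplied priority value, and the weighted sum in `decomposed_reward`.

```python
import random


class PrioritizedReplayBuffer:
    """Replay buffer that samples transitions in proportion to priority**alpha."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha
        self.items = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, transition, priority):
        # Drop the oldest transition once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        # Clamp so that every transition keeps a small chance of being drawn.
        self.priorities.append(max(priority, 1e-6))

    def sample(self, batch_size):
        # Sampling is with replacement, weighted by priority**alpha.
        weights = [p ** self.alpha for p in self.priorities]
        return self.rng.choices(self.items, weights=weights, k=batch_size)


def decomposed_reward(individual_rewards, joint_reward, w_joint=0.5):
    """Blend each agent's individual reward with the shared joint reward."""
    return [(1.0 - w_joint) * r + w_joint * joint_reward
            for r in individual_rewards]


if __name__ == "__main__":
    buf = PrioritizedReplayBuffer(capacity=100)
    buf.add("low-reward step", priority=0.0)
    buf.add("high-reward step", priority=10.0)
    print(buf.sample(5))
    print(decomposed_reward([1.0, 3.0], joint_reward=2.0))
```

Using the reward magnitude as the priority matches the abstract's wording about sampling high-reward experience; other priority signals, such as the temporal-difference error, are common in the wider literature.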

Driver fatigue state detection method based on multi-feature fusion
Hao-jie FANG,Hong-zhao DONG,Shao-xuan LIN,Jian-yu LUO,Yong FANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1287-1296.   DOI: 10.3785/j.issn.1008-973X.2023.07.003

An improved YOLOv5 object detection algorithm was used to detect the facial region of the driver, and a multi-feature fusion fatigue state detection method was established, addressing the problem that existing fatigue state detection methods cannot be applied to drivers under epidemic prevention and control. Image label data covering drivers with and without masks were established according to the characteristics of bus driving. The detection accuracy for the eye, mouth and face regions was improved by increasing the number of feature sampling operations in the YOLOv5 model. The BiFPN network structure was used to retain multi-scale feature information, which makes the prediction network more sensitive to targets of different sizes and improves the detection ability of the overall model. A parameter compensation mechanism combined with a face keypoint algorithm was proposed in order to improve the accuracy of the blink and yawn frame counts. A variety of fatigue parameters were fused and normalized to conduct fatigue classification. Results on the public dataset NTHU and a self-made dataset show that the proposed method can recognize the blinks and yawns of drivers both with and without masks, and can accurately judge the fatigue state of drivers.

Survey of text-to-image synthesis
Yin CAO,Junping QIN,Qianli MA,Hao SUN,Kai YAN,Lei WANG,Jiaqi REN
Journal of ZheJiang University (Engineering Science)    2024, 58 (2): 219-238.   DOI: 10.3785/j.issn.1008-973X.2024.02.001

A comprehensive evaluation and categorization of text-to-image generation tasks were conducted. Text-to-image generation tasks were classified into three major categories based on the principles of image generation: text-to-image generation based on the generative adversarial network architecture, text-to-image generation based on the autoregressive model architecture, and text-to-image generation based on the diffusion model architecture. Improvements in different aspects were categorized into six subcategories for text-to-image generation methods based on the generative adversarial network architecture: adoption of multi-level hierarchical architectures, application of attention mechanisms, utilization of siamese networks, incorporation of cycle-consistency methods, deep fusion of text features, and enhancement of unconditional models. The general evaluation indicators and datasets of existing text-to-image methods were summarized and discussed through the analysis of different methods.

Prediction model of axial bearing capacity of concrete-filled steel tube columns based on XGBoost-SHAP
Xi-ze CHEN,Jun-feng JIA,Yu-lei BAI,Tong GUO,Xiu-li DU
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1061-1070.   DOI: 10.3785/j.issn.1008-973X.2023.06.001

To reliably and accurately predict the axial bearing capacity of concrete-filled steel tube (CFST) columns, a prediction model of CFST column axial bearing capacity with ensemble machine learning was developed and explained. The quality of the CFST column database was evaluated using the Mahalanobis distance, the prediction model of CFST column axial bearing capacity was established by the extreme gradient boosting (XGBoost) algorithm, and the optimal hyperparameter combination of the model was found using the K-Fold cross-validation (K-Fold CV) and the tree-structured Parzen estimator (TPE) algorithms. The predicted values of the optimized XGBoost model were compared with the calculated values of the existing methods and the unoptimized XGBoost model using different evaluation metrics. The Shapley additive explanations (SHAP) approach was used to produce both global and local explanations for the predictions of XGBoost model. Results show that, after hyperparameter tuning, the XGBoost model’s performance surpasses performance of relevant standards and empirical formulas, and the SHAP approach can effectively explain the XGBoost model’s output.
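
The abstract says the database quality was evaluated using the Mahalanobis distance. As a rough illustration of that kind of screening, the sketch below computes the Mahalanobis distance of each sample from the sample mean for two features only, using the closed-form inverse of a 2×2 covariance matrix. The function name `mahalanobis_2d`, the restriction to two features and the example data are assumptions; the paper's database has more input variables and its screening threshold is not stated here.

```python
import math


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) sample from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n

    # Unbiased sample covariance entries.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy  # must be non-zero

    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (d^T S^-1 d) with S^-1 = [[syy, -sxy], [-sxy, sxx]] / det
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


if __name__ == "__main__":
    # Hypothetical two-feature samples; the last one lies off the main trend.
    data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2),
            (4.0, 3.8), (5.0, 5.1), (3.0, 9.0)]
    for point, dist in zip(data, mahalanobis_2d(data)):
        print(point, round(dist, 3))
```

Unlike a plain Euclidean distance, this measure accounts for the correlation between features, so a sample can be flagged even when each of its values is individually within range.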

Lightweight semantic segmentation network for underwater image
Hao-ran GUO,Ji-chang GUO,Yu-dong WANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1278-1286.   DOI: 10.3785/j.issn.1008-973X.2023.07.002

A semantic segmentation network was designed for underwater images. A lightweight and efficient encoder-decoder architecture was adopted in consideration of the trade-off between speed and accuracy. An inverted bottleneck layer and a pyramid pooling module were designed in the encoder to extract features efficiently. A feature fusion module was constructed in the decoder to fuse multi-level features, which improved the segmentation accuracy. To address the fuzzy edges of underwater images, an auxiliary edge loss function was used to train the network better, and the segmentation edges were refined through the supervision of semantic boundaries. Experiments on the underwater semantic segmentation dataset SUIM show that the network achieves 53.55% mean IoU with an inference speed of 258.94 frames per second on one NVIDIA GeForce GTX 1080 Ti card for input images of 320×256 pixels, which gives real-time processing speed while maintaining high accuracy.
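
The reported metric, mean IoU, averages the per-class intersection over union between predicted and ground-truth labels. The sketch below shows that computation on flat lists of class labels. The function name `mean_iou` and the choice to skip classes that appear in neither prediction nor ground truth are assumptions; evaluation scripts differ on that point.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes for flat label lists."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both (assumed convention)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    print(mean_iou(predicted, ground_truth, num_classes=2))
```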

Continual learning framework of named entity recognition in aviation assembly domain
Pei-feng LIU,Lu QIAN,Xing-wei ZHAO,Bo TAO
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1186-1194.   DOI: 10.3785/j.issn.1008-973X.2023.06.014

In order to build an aviation assembly knowledge graph composed of assembly process information, assembly technology knowledge, related industry standards and the internal connections among the three, a named entity recognition framework based on continual learning was proposed. The framework maintains high recognition performance throughout the progressive learning process from zero corpus to large-scale corpus, without relying on manual feature setting. A comparative performance experiment was carried out in practical industrial scenarios. The experiment covered general assembly and component assembly, and the pull rod and cable installation operations were taken as a specific experimental case. Experimental results show that the proposed framework is significantly better than previous algorithms in accuracy, recall and F1 value across corpus environments of different scales. The proposed framework can consistently provide credible results for named entity recognition tasks in the aviation assembly domain.

New method and application of inverse kinematic solution for spherical wrist rehabilitation mechanism
Wen-jie JIAO,Shuai-xu JI,Hui-min HAO,Jia-hai HUANG,Li-na LI,Shi-yu LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1365-1373.   DOI: 10.3785/j.issn.1008-973X.2023.07.011

An inverse kinematic step-by-step solution method based on Euler angles was proposed to address the problem of incomplete or missing analytical solutions for the coaxial 3RRR spherical parallel mechanism (CSPM), which is the end-effector of the spherical wrist rehabilitation robot. Based on the characteristics of the coaxial spherical parallel mechanism, the CSPM posture Euler angles can be decomposed into two sub-postures: a rotation around the Z-axis and a rotation around the X-axis and Y-axis. The set of inverse kinematic solutions for the sub-posture rotating around the X-axis and Y-axis was solved. The smaller value in the set of inverse kinematic solutions for each joint was selected and added to the angle of rotation around the Z-axis to give the CSPM inverse kinematic solution. The correctness of the proposed method was verified by using CSPM forward kinematics. The actual attitude space of the wrist rehabilitation device was solved by using the proposed method under the constraints of no linkage collision and no singular configuration, based on the real wrist motion range. The solutions of the proposed inverse kinematic method were converted to and from unit quaternions in the actual posture space, and unit quaternion interpolation was applied to CSPM motion planning. The theoretical and experimental results were both smooth trajectory curves, and the maximum errors of both did not exceed 2.5°.
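
The abstract mentions unit quaternion interpolation for motion planning without naming the scheme. Spherical linear interpolation (slerp) is one common choice, sketched below as an assumption rather than as the authors' method. Quaternions are written as `(w, x, y, z)` tuples, and the helper name `slerp` is illustrative.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to take the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical orientations: interpolate linearly, then renormalize.
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(dot)  # angle between the two quaternions
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    # 90-degree rotation about the Z-axis.
    quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    for step in range(5):
        print(slerp(identity, quarter_turn_z, step / 4))
```

Interpolating on the unit sphere keeps every intermediate orientation a valid rotation and gives a constant angular rate, which is one reason quaternion interpolation is attractive for smooth trajectories.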

EEG and fNIRS emotion recognition based on modality attention graph convolution feature fusion
Qing ZHAO,Xue-ying ZHANG,Gui-jun CHEN,Jing ZHANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (10): 1987-1997.   DOI: 10.3785/j.issn.1008-973X.2023.10.008

A feature fusion emotion recognition method based on modality attention multi-path convolutional neural network was proposed, extracting the connection between the signals of each channel from the electroencephalogram (EEG) and functional near infrared spectroscopy (fNIRS) data induced by emotional video to improve the accuracy of emotion recognition. The EEG and fNIRS data were constructed as graph structure data, and the feature of each mode signal was extracted by multi-path graph convolution. The information of connection between different modal channels was fused by modality attention graph convolution. The modality attention mechanism can give different weights to different modal nodes, thus the graph convolution layer can more fully extract the connection relationship between different modal nodes. Experimental tests were carried out on four types of emotional data collected from 30 subjects. Compared with the results of EEG only and fNIRS only, the recognition accuracy of the proposed graph convolution fusion method was higher, which increased by 8.06% and 22.90% respectively. Compared with the current commonly used EEG and fNIRS fusion method, the average recognition accuracy of the proposed graph convolution fusion method was improved by 2.76%~7.36%. The recognition rate of graph convolution fusion method increased by 1.68% after adding modality attention.

Review on droplets impact process on moving and rotating surfaces
Yi ZHOU,Zhe-yan JIN,Zhi-gang YANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (10): 2060-2076.   DOI: 10.3785/j.issn.1008-973X.2023.10.015

Existing research on droplet impact on moving and rotating surfaces was briefly summarized. Moving surfaces can be divided into three forms: translating solid surfaces, rotating solid surfaces, and moving liquid films. Studies of droplet impact on moving surfaces were comprehensively reviewed from three perspectives: experimental systems, model establishment and numerical simulation. Research on droplet impact on moving and rotating surfaces has a certain foundation, while research on high impact velocities, small droplets, rotating surfaces and other situations remains scarce. The theoretical and experimental results of rotating surface wave propulsion have also not yet been supplemented by numerical simulation. Based on the above situation, the research prospects of droplet impact on moving and rotating surfaces were proposed.

Research progress of recommendation system based on knowledge graph
Hui-xin WANG,Xiang-rong TONG
Journal of ZheJiang University (Engineering Science)    2023, 57 (8): 1527-1540.   DOI: 10.3785/j.issn.1008-973X.2023.08.006

The integration of knowledge graphs into recommender systems was analyzed to address the problems of data sparsity, cold start, low interpretability of recommendations, and insufficient personalization. The problems of current recommender systems and the solutions offered by integrating a knowledge graph were summarized from three aspects: the demands of recommender systems, the concept of the knowledge graph, and the approaches to integrating the two. Recent work that combines attention mechanisms, neural networks and reinforcement learning was reviewed; these methods use the principles of node trade-off, node integration and path exploration to make full use of the complex structural information in the knowledge graph and thereby improve user satisfaction with the recommender system. The challenges and possible future development directions of recommender systems integrating the knowledge graph were put forward in terms of knowledge graph completeness, dynamics, availability of higher-order relationships, and recommendation performance.

Factor extraction and SOH estimation of lithium-ion battery based on temperature and SOC
Hao DONG,Ling MAO,Ke-qing QU,Jin-bin ZHAO,Fen LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1470-1478.   DOI: 10.3785/j.issn.1008-973X.2023.07.022

The changing curve of the state of charge (SOC) and charging voltage of lithium-ion batteries (LIB) at different temperatures was analyzed in order to solve the problems of insufficient data acquisition and difficulty in extracting health factors (HFs) during the daily use of LIB. A method for LIB HFs extraction and online estimation of state of health (SOH) considering temperature and SOC was proposed. The charging voltage and current were selected as HFs according to the ambient temperature difference during the actual charging process of the battery. Then the network parameters of the extreme learning machine were optimized by the genetic-hill climbing algorithm, and the mapping relationship between the HFs and the SOH was established to realize the online SOH estimation. Nine groups of NASA LIB aging data were used for verification. Results show that the proposed method has the advantages of high estimation accuracy and strong adaptability for ambient temperature.

Adaptive salp swarm algorithm for solving flexible job shop scheduling problem with transportation time
Hao-yi NIU,Wei-min WU,Ting-qi ZHANG,Wei SHEN,Tao ZHANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1267-1277.   DOI: 10.3785/j.issn.1008-973X.2023.07.001

An adaptive salp swarm algorithm was proposed by minimizing the makespan in order to solve the flexible job shop scheduling problem with transportation time. A three-layer coding scheme was designed based on random key in order to make the discrete solution space continuous. The inertia weight was introduced to evaluate the influence among followers in order to enhance the global exploration and local search performance of the algorithm. An adaptive leader-follower population update strategy was proposed, and the number of leaders and followers was adjusted by the population status. The tabu search strategy was combined with the neighborhood search in order to prevent the algorithm from falling into local optimum. The benchmark instances verified the effectiveness and superiority of the proposed algorithm. The influence of the number of AGVs on the makespan conforms to the law of diminishing marginal effect.
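
Random-key coding, which the abstract uses to make the discrete solution space continuous, maps a vector of real numbers to a discrete schedule. The sketch below shows the two usual decoding steps: sorting keys to obtain an ordering, and scaling a key to pick one of several options. The function names and the idea that one layer orders operations while the others pick a machine or a vehicle are assumptions for illustration; the abstract does not spell out what the three layers contain.

```python
def decode_sequence(keys):
    """Turn real-valued keys into an ordering: smallest key goes first."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def decode_choice(key, num_options):
    """Map a key in [0, 1] to an option index in 0 .. num_options - 1."""
    return min(int(key * num_options), num_options - 1)


if __name__ == "__main__":
    # Hypothetical individual: one key per operation in each layer.
    order_keys = [0.7, 0.1, 0.4]
    machine_keys = [0.05, 0.60, 0.99]
    print(decode_sequence(order_keys))                      # processing order
    print([decode_choice(k, 3) for k in machine_keys])      # machine per operation
```

Because any real vector decodes to a feasible ordering and assignment, a continuous optimizer such as a swarm algorithm can update positions freely without a repair step for the encoding itself.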

Daily water supply prediction method based on integrated learning and deep learning
Xin-lei ZHOU,Hai-ting GU,Jing LIU,Yue-ping XU,Fang GENG,Chong WANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1120-1127.   DOI: 10.3785/j.issn.1008-973X.2023.06.007

Making use of the historical daily water supply data of four water plants in Yiwu city, a new water supply prediction model based on long short term memory (LSTM) neural network improved by integrated learning algorithm was proposed, in order to effectively resolve the problems of low accuracy and insufficient generalization ability of the daily water supply prediction. In the model, a historical daily water supply after pre-processing by Pauta criterion was taken as the data input, the LSTM neural network with long-term temporal information memory was applied as the weak predictor of integrated learning, the grid search method was utilized for network hyperparameter tuning, and the AdaBoost integrated learning algorithm was used to weight the combination of the weak predictors to obtain the strong predictor. Results show that the improved LSTM neural network based on integrated learning algorithm has the highest Nash efficiency coefficient (NSE) with the lowest root mean square error (RMSE) and mean absolute error (MAE), the best fitting effect on the change trend and the peak value of daily water supply data, compared with the random forest (RF), AdaBoost and LSTM neural network. The time series prediction accuracy of the improved LSTM water supply forecasting model is significantly improved, with good generalization ability and stable prediction performance. The results can provide an important reference for the rational allocation of urban water resources planning and integrated intelligent water supply scheduling.
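
Two pieces of this abstract have simple closed forms: the Pauta criterion (the three-sigma rule) used for pre-processing, and the NSE, RMSE and MAE metrics used for evaluation. The sketch below implements them with the standard definitions. Dropping flagged values in `pauta_filter` is an assumption, since the abstract does not say whether outliers were removed or replaced, and the sample numbers are made up.

```python
import math


def pauta_filter(values):
    """Keep values within three sample standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [v for v in values if abs(v - mean) <= 3.0 * std]


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def nse(observed, predicted):
    """Nash efficiency coefficient: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


if __name__ == "__main__":
    supply = [10.0] * 19 + [100.0]  # hypothetical series with one spike
    print(len(pauta_filter(supply)))
    print(nse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

The three-sigma rule needs a reasonably long series to flag anything: with very few samples a single spike inflates the standard deviation enough to hide itself.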

Solution approach of Burgers-Fisher equation based on physics-informed neural networks
Jian XU,Hai-long ZHU,Jiang-le ZHU,Chun-zhong LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (11): 2160-2169.   DOI: 10.3785/j.issn.1008-973X.2023.11.003

Physical information was divided into rule information and numerical information in order to explore the role of physical information in training the neural network when solving differential equations with a physics-informed neural network (PINN). The logic of PINN for solving differential equations was explained, as well as the data-driven approach of physical information and neural network interpretability. A synthetic loss function for the neural network was designed based on the two types of information, and a training balance degree was established in terms of training sampling and training intensity. The experiment of solving the Burgers-Fisher equation by PINN showed that PINN can obtain good solution accuracy and stability. In training neural networks to solve the equation, the numerical information of the Burgers-Fisher equation promoted the approximation of the equation solution better than the rule information. The training effect of the neural network improved with increases in training sampling, training epochs, and the balance between the two types of information. In addition, the solution accuracy improved as the scale of the neural network increased, but the training time of each epoch also increased. Within a fixed training time, a larger neural network does not necessarily give a better result.

Devices’ optimal deployment of roadside sensing system for expressway driving risk
Li LI,Zhen-dong PING,Zhi-gang XU,Gui-ping WANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1137-1146.   DOI: 10.3785/j.issn.1008-973X.2023.06.009

The driving risk measurement indexes were selected based on the correlation test results of the maximum information coefficient. A multi-indicator calculation and fusion method for driving risk was designed based on information entropy theory. An optimal deployment model of roadside sensing devices was developed, which aims to maximize the value of driving risk entropy collected by the devices and takes construction cost and device detection range as constraints. Taking multi-lane expressway driving trajectories as data, the optimal deployment schemes of roadside sensing devices under different budget constraints were calculated. The factors affecting the capacity of the roadside sensing system to capture the roadway driving risk were analyzed, such as the device type selection, the traditional scheme of uniformly-spaced device deployment, and the original data noise. Results show that, as the construction budget of the roadside sensing system increases, the improvement in the system’s ability to sense driving risk follows the law of diminishing marginal effect. Compared with situations where there are too many or too few devices, a moderate number of sensing devices has a higher cost-effectiveness ratio. The cost-effectiveness ratio of the optimal device deployment scheme is higher than that of the uniformly-spaced device scheme. A detection error of less than 10% in the original data does not affect the calculation results of the optimal device deployment scheme.

Numerical simulation and experimental study on forming of overhang structure by laser powder bed fusion of In718 alloy
Cai-hua WANG,Xu-hui LAI,Huan-qing YANG,Zheng-ying WEI
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1175-1185.   DOI: 10.3785/j.issn.1008-973X.2023.06.013

A three-dimensional mesoscopic numerical model of the In718 overhanging fusion channel was developed to address the problem of overhanging print quality of lattice tilting struts in laser powder bed fusion (LPBF). The powder bed was established in EDEM based on the discrete element method. The LPBF channel forming process was implemented in Flow-3D based on the finite volume method, and the flow, heat transfer, melting and solidification processes of the laser-powder particle interaction were analysed by numerical simulation. Results show that the solid-powder interface region is prone to discontinuous fusion channel, and improving the process parameters can improve the continuity of fusion channel forming in the region. The high laser power (300 W) applied at low energy density (44.19 J/mm3) not only does not produce keyhole defects, but also results in stronger Marangoni flow and faster melt pool flow than the low power group (87.5 W) to fill the discontinuities, and improves the continuity of the fusion channel in the solid-powder interface region.

Dynamic response characteristics of wind turbine drivetrain and influence of support system
Cong-er BAI,Zhe-jie SUN,Mei-juan QIN,Xiao WANG,Yong LIU
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1165-1174.   DOI: 10.3785/j.issn.1008-973X.2023.06.012

To investigate the dynamic response of the wind turbine drivetrain, a multi-body dynamic simulation model with a rigid-flexible coupled drivetrain was established for a MW-class wind turbine, which was taken as the research object. The influence of frame flexibility and of the isolator stiffness of the gearbox and generator on the dynamic response characteristics of the drivetrain, including the drivetrain modes, the resonance and the dynamic response under different wind conditions, was analyzed. The validity of the model was verified through an in-plant vibration test in the time domain and frequency domain. Results show that the modes in which the vibration energy is mainly distributed in the generator shell and the gearbox housing are affected most by the support system. Reasonable stiffness design of the support system can effectively reduce the resonance risk of the drivetrain. The time domain analysis results show that the vibration velocity deviation of components caused by resonance can reach up to 120%. Increasing the isolator stiffness of the gearbox and decreasing the isolator stiffness of the generator are conducive to reducing the vibration level of the drivetrain.

Rapid prediction of unsteady aerodynamic characteristics of flapping wing based on GRU
Jia-chi ZHAO,Tian-qi WANG,Li-fang ZENG,Xue-ming SHAO
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1251-1256.   DOI: 10.3785/j.issn.1008-973X.2023.06.021

Traditional computational fluid dynamics surrogate models cannot effectively simulate highly nonlinear flows, and existing deep learning-based surrogate models have difficulty handling temporal sequence information effectively. Based on gated recurrent units (GRU) and the multilayer perceptron, a two-dimensional airfoil of a flapping-wing aircraft was studied to establish a model for rapidly predicting the unsteady aerodynamic parameters of the flapping wing. Real-time prediction of the highly unsteady and nonlinear aerodynamic parameters of the flapping wing was realized. The computational fluid dynamics method was used to obtain the aerodynamic parameters of the flapping two-dimensional airfoil, and the parameters were used as samples to train the prediction model. The flapping amplitude, the frequency, the swing angle and the motion time of the flapping wing were fed into the prediction model, and the lift, the drag and the moment in the relevant condition could be quickly output. Experimental results showed that the established prediction model has high accuracy and fast calculation speed. The prediction model could realize real-time high-precision prediction of the unsteady aerodynamic parameters of flapping wings.

Lightweight object detection based on split attention and linear transformation
Yan ZHANG,Jing-xue SUN,Ye-mei SUN,Shu-dong LIU,Chuan-qi WANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1195-1204.   DOI: 10.3785/j.issn.1008-973X.2023.06.015

To meet the real-time and model lightweight requirements of target detection and improve the accuracy of object detection, a lightweight target detection algorithm PG-YOLOv5 based on pyramid split attention and linear transformation was proposed. The feature fusion module in YOLOv5 was optimized by PG-YOLOv5. First, the pyramid split attention module was used to capture the spatial information of feature maps at different scales to enrich the feature space, thus the multi-scale feature representation ability of the network and the accuracy of object detection were improved. Then, the GhostBottleNeck module based on linear transformation was used to combine a small amount of original feature maps with those obtained from linear transformation, which reduced the number of model parameters effectively. The mean average precision of the algorithm increased from 81.2% of YOLOv5L to 85.7% of PG-YOLOv5, and the number of parameters of PG-YOLOv5 was 36% lower than that of YOLOv5L. The PG-YOLOv5 was deployed on Jetson TX2 and an object detection software was designed. Experimental results showed that the image processing speed of the target detection system based on Jetson TX2 was 262.1 ms/frame, and the mean average precision of PG-YOLOv5 was 85.2%. Compared with the YOLOv5L original model, PG-YOLOv5 is more suitable for edge deployment.
