Most Read Articles

Light-weight algorithm for real-time robotic grasp detection
Mingjun SONG,Wen YAN,Yizhao DENG,Junran ZHANG,Haiyan TU
Journal of ZheJiang University (Engineering Science)    2024, 58 (3): 599-610.   DOI: 10.3785/j.issn.1008-973X.2024.03.017
Abstract   HTML PDF (4675KB) ( 254 )  

A light-weight, real-time approach named RTGN (real-time grasp net) was proposed to improve the accuracy and speed of robotic grasp detection for novel objects of diverse shapes, types and sizes. Firstly, a multi-scale dilated convolution module was designed to construct a light-weight feature extraction backbone. Secondly, a mixed attention module was designed to help the network focus more on meaningful features. Finally, a pyramid pooling module was deployed to fuse the multi-level features extracted by the network, thereby improving the grasp perception of the object. On the Cornell grasping dataset, RTGN generated grasps at a speed of 142 frames per second and attained accuracy rates of 98.26% and 97.65% on image-wise and object-wise splits, respectively. In real-world robotic grasping experiments, RTGN obtained a success rate of 96.0% in 400 grasping attempts across 20 novel objects. Experimental results demonstrate that RTGN outperforms existing methods in both detection accuracy and detection speed. Furthermore, RTGN shows strong adaptability to variations in the position and pose of grasped objects, effectively generalizing to novel objects of diverse shapes, types and sizes.

UAV dense small target detection algorithm based on YOLOv5s
Jun HAN,Xiao-ping YUAN,Zhun WANG,Ye CHEN
Journal of ZheJiang University (Engineering Science)    2023, 57 (6): 1224-1233.   DOI: 10.3785/j.issn.1008-973X.2023.06.018
Abstract   HTML PDF (2789KB) ( 360 )  

A dense small target detection algorithm, LSA_YOLO, based on YOLOv5s was proposed for UAV images with complex backgrounds and large numbers of densely distributed small targets. A multi-scale feature extraction module, LM-fem, was constructed to enhance the feature extraction capability of the network. A new hybrid domain attention module, S-ECA, relying on multi-scale contextual information was put forward so that the algorithm focuses on target information and suppresses the interference of complex backgrounds. The adaptive weight dynamic fusion structure AFF was designed to assign reasonable fusion weights to both shallow and deep features. Applying S-ECA and AFF in the PANet structure improved the capability of the algorithm to detect dense small targets in complex backgrounds. The loss function Focal-EIOU was used instead of the loss function CIOU to improve model detection efficiency. Experimental results on the public dataset VisDrone2021 show that the average detection accuracy for all target classes improves from 51.5% for YOLOv5s to 57.6% for LSA_YOLO when the input resolution is set to 1 504 × 1 504.
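
The abstract does not give the loss formula, so the following is only a minimal sketch of the commonly cited Focal-EIoU formulation for a single pair of axis-aligned boxes. The function name, the `(x1, y1, x2, y2)` box layout, and the default `gamma=0.5` are assumptions for illustration, not details taken from the paper.

```python
def focal_eiou_loss(box, gt, gamma=0.5):
    """Focal-EIoU loss for two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes. gamma is an assumed default.
    """
    # Intersection over union.
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w * h + wg * hg - inter)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])

    # Squared distance between the two box centres.
    dx = (box[0] + box[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (box[1] + box[3]) / 2 - (gt[1] + gt[3]) / 2

    # EIoU = IoU term + centre-distance term + width term + height term.
    eiou = (1.0 - iou
            + (dx * dx + dy * dy) / (cw * cw + ch * ch)
            + (w - wg) ** 2 / (cw * cw)
            + (h - hg) ** 2 / (ch * ch))

    # Focal weighting: scale the EIoU loss by IoU ** gamma.
    return (iou ** gamma) * eiou
```

In this formulation the width and height errors are penalized separately, whereas CIoU penalizes only an aspect-ratio term.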

Research progress of recommendation system based on knowledge graph
Hui-xin WANG,Xiang-rong TONG
Journal of ZheJiang University (Engineering Science)    2023, 57 (8): 1527-1540.   DOI: 10.3785/j.issn.1008-973X.2023.08.006
Abstract   HTML PDF (1419KB) ( 243 )  

To address the problems of data sparsity, cold start, low interpretability of recommendations, and insufficient personalization in recommender systems, the integration of knowledge graphs into recommender systems was analyzed. Starting from the demands of recommender systems, the concept of the knowledge graph, and the approaches for integrating the two, the problems of current recommender systems and the solutions offered by integrating knowledge graphs were summarized. Recent work that combines attention mechanisms, neural networks and reinforcement learning was reviewed, in which the principles of node trade-off, node integration, and path exploration are used to make full use of the complex structural information in knowledge graphs, so as to improve satisfaction with the recommender system. The challenges and possible future development directions of recommender systems integrating knowledge graphs were put forward in terms of knowledge graph completeness, dynamics, availability of higher-order relationships, and recommendation performance.

Intelligent connected vehicle motion planning at unsignalized intersections based on deep reinforcement learning
Mingfang ZHANG,Jian MA,Nale ZHAO,Li WANG,Ying LIU
Journal of ZheJiang University (Engineering Science)    2024, 58 (9): 1923-1934.   DOI: 10.3785/j.issn.1008-973X.2024.09.017
Abstract   HTML PDF (2586KB) ( 90 )  

A vehicle motion planning algorithm based on deep reinforcement learning was proposed to satisfy the efficiency and comfort requirements of intelligent connected vehicles at unsignalized intersections. Temporal convolutional network (TCN) and Transformer algorithms were combined to construct the intention prediction model for surrounding vehicles. Multi-layer convolution and self-attention mechanisms were used to improve the capability of capturing vehicle motion features. The twin delayed deep deterministic policy gradient (TD3) reinforcement learning algorithm was employed to build the vehicle motion planning model. The state space and reward functions were designed by jointly considering the driving intention of surrounding vehicles, driving style, interaction risk, and the comfort of the ego vehicle, in order to enhance understanding of the dynamic environment. Policy updates were delayed and target policies were smoothed to improve the stability of the proposed algorithm, and the desired acceleration was output in real time. Experimental results demonstrated that the proposed motion planning algorithm can perceive the real-time potential interaction risk based on the driving intention of surrounding vehicles. The generated motion planning strategy met the requirements of efficiency, safety and comfort. It showed excellent adaptability to different styles of surrounding vehicles and dense interaction scenarios, and the success rates exceeded 92.1% in various scenarios.
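
The delayed policy updates and target policy smoothing mentioned above are standard parts of TD3. They can be sketched as below for a scalar action such as a desired acceleration. All function names, the noise parameters, the action bounds, and the update interval are illustrative assumptions rather than values from the paper.

```python
import random


def smoothed_target_action(target_policy, next_state, sigma=0.2,
                           noise_clip=0.5, a_min=-1.0, a_max=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
    action = target_policy(next_state) + noise
    # Keep the perturbed action inside the valid action range.
    return max(a_min, min(a_max, action))


def td3_target(reward, done, q1_next, q2_next, discount=0.99):
    """Clipped double-Q target: use the smaller of the two target critics."""
    if done:
        return reward
    return reward + discount * min(q1_next, q2_next)


def should_update_policy(step, policy_delay=2):
    """Delayed policy update: the actor is updated once every policy_delay critic updates."""
    return step % policy_delay == 0
```

In a training loop, the critics would be regressed toward `td3_target(...)` at every step, while the actor and target networks would be updated only when `should_update_policy(step)` is true.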

Traffic scene perception algorithm with joint semantic segmentation and depth estimation
Kang FAN,Ming’en ZHONG,Jiawei TAN,Zehui ZHAN,Yan FENG
Journal of ZheJiang University (Engineering Science)    2024, 58 (4): 684-695.   DOI: 10.3785/j.issn.1008-973X.2024.04.004
Abstract   HTML PDF (2815KB) ( 206 )  

Inspired by the idea that feature information between different pixel-level visual tasks can guide and optimize each other, a traffic scene perception algorithm based on multi-task learning theory was proposed for joint semantic segmentation and depth estimation. A bidirectional cross-task attention mechanism was proposed to achieve explicit modeling of global correlation between tasks, guiding the network to fully explore and utilize complementary pattern information between tasks. A multi-task Transformer was constructed to enhance the spatial global representation of specific task features, implicitly model the cross-task global context relationship, and promote the fusion of complementary pattern information between tasks. An encoder-decoder fusion upsampling module was designed to effectively fuse the spatial details contained in the encoder to generate fine-grained high-resolution specific task features. The experimental results on the Cityscapes dataset showed that the mean IoU of semantic segmentation of the proposed algorithm reached 79.2%, the root mean square error of depth estimation was 4.485, and the mean relative error of distance estimation for five typical traffic participants was 6.1%. Compared with the mainstream algorithms, the proposed algorithm can achieve better comprehensive performance with lower computational complexity.

Ankle flexible exoskeleton based on force feedback admittance control
Dong CHEN,Weida LI,Hongmiao ZHANG,Juan LI
Journal of ZheJiang University (Engineering Science)    2024, 58 (4): 772-778.   DOI: 10.3785/j.issn.1008-973X.2024.04.012
Abstract   HTML PDF (2648KB) ( 165 )  

In response to the need for ankle rehabilitation training, a lightweight, easy-to-wear flexible ankle exoskeleton robot was designed using modular drive units and Bowden cables through analysis of ankle joint mechanics. The robot can provide assistance for ankle plantarflexion/dorsiflexion and inversion/eversion movements. Position control and torque control are used for the flexible exoskeleton during the dorsiflexion and plantarflexion stages, respectively. Position control is mainly based on traditional proportional integral derivative (PID) control, while torque control uses force as a feedback signal to establish an admittance model between the interaction force difference and the Bowden cable core displacement compensation. The admittance parameters are dynamically adjusted through the Sigmoid deformation function to meet the requirements of assistive torque output and human-machine interaction compliance. Experimental data showed that the position tracking error was stable within 0.46 cm, and the force output error was stable within −1.5 to 1.5 N, meeting the needs of human rehabilitation training.
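
As a rough illustration of the admittance idea, the sketch below maps a force error to a displacement compensation through a discrete mass-damper-spring model whose damping is scheduled by a sigmoid. The abstract does not state the model order, the parameter values, or how the Sigmoid function adjusts the parameters, so the scheduling rule here (less damping for larger force errors) and every number are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def damping(force_error, b_min=5.0, b_max=50.0, slope=2.0, midpoint=1.0):
    """Assumed schedule: damping falls from b_max toward b_min as |force_error| grows."""
    s = sigmoid(slope * (abs(force_error) - midpoint))
    return b_max - (b_max - b_min) * s


def admittance_step(x, v, force_error, dt=0.001, mass=1.0, stiffness=100.0):
    """One semi-implicit Euler step of mass*a + b*v + stiffness*x = force_error.

    x is the displacement compensation and v is its rate of change.
    """
    b = damping(force_error)
    a = (force_error - b * v - stiffness * x) / mass
    v_new = v + a * dt
    x_new = x + v_new * dt
    return x_new, v_new
```

With a constant force error the compensation settles at `force_error / stiffness`, and the damping schedule only changes how quickly it gets there.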

Multi-behavior aware service recommendation based on hypergraph graph convolution neural network
Jia-wei LU,Duan-ni LI,Ce-ce WANG,Jun XU,Gang XIAO
Journal of ZheJiang University (Engineering Science)    2023, 57 (10): 1977-1986.   DOI: 10.3785/j.issn.1008-973X.2023.10.007
Abstract   HTML PDF (1380KB) ( 650 )  

A multi-behavior aware service recommendation method based on hypergraph graph convolutional neural network (MBSRHGNN) was proposed to resolve the problem of insufficient high-order service feature extraction in existing service recommendation methods. A multi-hypergraph was constructed according to user-service interaction types and service mashups. A dual-channel hypergraph convolutional network was designed based on the spectral decomposition theory with the functional and structural properties of the multi-hypergraph. Chebyshev polynomials were used to approximate the hypergraph convolution kernel to reduce computational complexity. Self-attention mechanism and multi-behavior recommendation methods were combined to measure the importance difference between multi-behavior interactions during the hypergraph convolution process. A hypergraph pooling method named HG-DiffPool was proposed to reduce the feature dimensionality. The probability distribution for recommending different services was learned by integrating service embedding vectors and hypergraph signals. Real service data was obtained by a crawler and used to construct datasets with different sparsity for experiments. Experimental results showed that the MBSRHGNN method could adapt to recommendation scenarios with highly sparse data, and was superior to the existing baseline methods in accuracy and relevance.
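
The Chebyshev approximation of a spectral convolution kernel can be illustrated with the three-term recurrence T_0(L)x = x, T_1(L)x = Lx, T_k(L)x = 2L T_{k-1}(L)x - T_{k-2}(L)x. The sketch below applies it to a small dense matrix in plain Python. It assumes `L` is already a rescaled Laplacian with eigenvalues in [-1, 1], and it does not show how the paper builds the hypergraph Laplacian. The function name and coefficient list `theta` are hypothetical.

```python
def chebyshev_filter(L, x, theta):
    """Return sum_k theta[k] * T_k(L) x using the Chebyshev recurrence.

    L: square matrix as a list of rows (assumed rescaled to spectrum [-1, 1]).
    x: input signal as a list of floats.
    theta: filter coefficients, one per polynomial order.
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    t_prev = list(x)                      # T_0(L) x = x
    out = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return out

    t_cur = matvec(L, x)                  # T_1(L) x = L x
    out = [o + theta[1] * v for o, v in zip(out, t_cur)]

    for k in range(2, len(theta)):
        # T_k(L) x = 2 L T_{k-1}(L) x - T_{k-2}(L) x
        t_next = [2.0 * a - b for a, b in zip(matvec(L, t_cur), t_prev)]
        out = [o + theta[k] * v for o, v in zip(out, t_next)]
        t_prev, t_cur = t_cur, t_next
    return out
```

Only matrix-vector products are needed, so no eigendecomposition of the Laplacian is computed, which is where the reduction in computational complexity comes from.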

Path planning based on fusion of improved A* and ROA-DWA for robot
Yuting LIU,Shijie GUO,Shufeng TANG,Xuewei ZHANG,Tiantian LI
Journal of ZheJiang University (Engineering Science)    2024, 58 (2): 360-369.   DOI: 10.3785/j.issn.1008-973X.2024.02.014
Abstract   HTML PDF (2840KB) ( 296 )  

A path planning algorithm based on the fusion of the improved A* algorithm and the random obstacle avoidance dynamic window method (ROA-DWA) was proposed in order to address the issues of excessive traversal nodes, redundant points, non-smooth paths, lack of global guidance, susceptibility to local optima, and low safety in the traditional A* algorithm and dynamic window approach (DWA) for robot path planning. The search efficiency was improved by adjusting the weights of heuristic functions, Floyd’s algorithm, redundant point deletion strategy, static and dynamic obstacle classification, and speed adaptive factor. The length of the path and the number of inflection points were reduced, and the influence of known obstacles on the path was minimized to improve the efficiency of dynamic obstacle avoidance. This enabled the robot to arrive at the target point smoothly, improved its safety, and allowed it to adapt better to complex dynamic and static environments. The experimental results show that the algorithm has better global optimality and local obstacle avoidance ability, and shows greater advantages on large maps.
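
One of the listed improvements, adjusting the weight of the heuristic function, can be sketched with a generic weighted A* search on a 4-connected grid. This is not the authors' improved A*: the fixed `weight` parameter, the Manhattan heuristic, and the grid encoding (0 = free, 1 = obstacle) are assumptions, and the Floyd smoothing, redundant point deletion, and ROA-DWA stages are not shown.

```python
import heapq


def weighted_a_star(grid, start, goal, weight=1.0):
    """Search a 4-connected grid with priority f = g + weight * h.

    grid: list of rows, 0 = free cell, 1 = obstacle.
    start, goal: (row, col) tuples. Returns a list of cells or None.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance to the goal.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(weight * h(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    closed = set()

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue
        closed.add(cell)
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] != 0 or nxt in closed:
                continue
            ng = g + 1
            if ng < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(open_heap, (ng + weight * h(nxt), ng, nxt))
    return None
```

A weight above 1 makes the search greedier and usually expands fewer nodes, at the cost of no longer guaranteeing the shortest path. A weight of 1 gives ordinary A*.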

Interactive visualization generation method for time series data based on transfer learning
Zihan ZHOU,Xumeng WANG,Wei CHEN
Journal of ZheJiang University (Engineering Science)    2024, 58 (2): 239-246.   DOI: 10.3785/j.issn.1008-973X.2024.02.002
Abstract   HTML PDF (2239KB) ( 241 )  

An interactive visualization generation method for time series data based on transfer learning was proposed in order to address the inconsistency in data distribution across time-series data and facilitate the application of pattern analysis to other data. Transfer component analysis was applied to transfer features extracted from each time series data. The user’s analysis on one of the time series data served as labels. The classifier was trained on the source domain and applied to multiple target domains in order to achieve pattern recommendations. Two case studies and expert interviews with real-world weather data and bearing signal data were conducted to verify the effectiveness and practicality of the method by improving the efficiency of temporal data exploration and reducing the impact of inconsistent data distribution.

Survey of embodied agent in context of foundation model
Songyuan LI,Xiangwei ZHU,Xi LI
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 213-226.   DOI: 10.3785/j.issn.1008-973X.2025.02.001
Abstract   HTML PDF (841KB) ( 342 )  

Foundational models in natural language processing, computer vision and multimodal learning have achieved significant breakthroughs in recent years, showcasing the potential of general artificial intelligence. However, these models still fall short of human or animal intelligence in areas such as causal reasoning and understanding physical commonsense. This is because these models primarily rely on vast amounts of data and computational power, lacking direct interaction with and experiential learning from the real world. Many researchers are beginning to question whether merely scaling up model size is sufficient to address these fundamental issues. This has led the academic community to reevaluate the nature of intelligence, suggesting that intelligence arises not just from enhanced computational capabilities but from interactions with the environment. Embodied intelligence is gaining attention as it emphasizes that intelligent agents learn and adapt through direct interactions with the physical world, exhibiting characteristics closer to biological intelligence. A comprehensive survey of embodied artificial intelligence was provided in the context of foundational models. The underlying technical ideas, benchmarks, and applications of current embodied agents were discussed. A forward-looking analysis of future trends and challenges in embodied AI was offered.

Occluded human pose estimation network based on knowledge sharing
Jiahong JIANG,Nan XIA,Changwu LI,Xinmiao YU
Journal of ZheJiang University (Engineering Science)    2024, 58 (10): 2001-2010.   DOI: 10.3785/j.issn.1008-973X.2024.10.003
Abstract   HTML PDF (1801KB) ( 129 )  

A new estimation network was proposed for improving the insufficient occlusion handling ability of existing human pose estimation methods. An occluded parts enhanced convolutional network (OCNN) and an occluded features compensation graph convolutional network (OGCN) were included in the proposed network. A high-low order feature matching attention was designed to strengthen the occlusion area features, and high-adaptation weights were extracted by OCNN, achieving enhanced detection of the occluded parts with a small amount of occlusion data. OGCN strengthened the shared and private attribute compensation node features by eliminating the obstacle features. The adjacency matrix was importance-weighted to enhance the quality of the occlusion area features and to improve the detection accuracy. The proposed network achieved detection accuracy of 78.5%, 67.1%, and 77.8% in the datasets COCO2017, COCO-Wholebody, and CrowdPose, respectively, outperforming the comparative algorithms. The proposed network saved 75% of the training data usage in the self-built occlusion dataset.

Shared lane-keeping control based on non-cooperative game theory
Junhui ZHANG,Xiaoman GUO,Jingxian WANG,Zongjie FU,Yuxi LIU
Journal of ZheJiang University (Engineering Science)    2024, 58 (5): 1001-1008.   DOI: 10.3785/j.issn.1008-973X.2024.05.013
Abstract   HTML PDF (1298KB) ( 179 )  

A driver-automation shared control strategy based on non-cooperative game (NCG) theory was proposed in order to reduce conflicting operations between the driver and the intelligent system during co-driving. The lane-keeping shared control problem was mathematically described by the first-order differential equation based on the linear two degree-of-freedom vehicle model. The NCG theory was employed to resolve the weight allocation problem of the shared control system, where the decision makers would act on the same dynamic system. The driving control authority was designed. Then the smooth transition of driving control authority between the driver and the intelligent system was achieved by utilizing the preview offset distance (POD) to update the confidence matrix. The desired front wheel angle of lane-keeping shared control was transformed into an online quadratic programming problem formulated as a quadratic cost function with linear inequality constraints based on the model predictive control (MPC) framework. The shared control strategy was validated on the driver-in-the-loop CarSim/Simulink platform. Results demonstrate that the strategy can effectively guarantee lateral tracking accuracy and the priority of the driver’s control authority.

UAV detection algorithm based on spatial correlation enhancement
Huijuan ZHANG,Kunpeng LI,Miaoxin JI,Zhenjiang LIU,Jianjuan LIU,Chi ZHANG
Journal of ZheJiang University (Engineering Science)    2024, 58 (3): 468-479.   DOI: 10.3785/j.issn.1008-973X.2024.03.004
Abstract   HTML PDF (3100KB) ( 144 )  

A small target detection method for unmanned aerial vehicles (UAVs) based on adaptive up-sampling and spatial correlation enhancement was proposed to resolve the problem of false detection and missed detection caused by the small size of UAVs and the difficulty of feature extraction under complex backgrounds. Firstly, the important contextual information was obtained by multi-scale dilated convolution, and then the attention feature fusion module was used to suppress the information conflict of multi-scale feature fusion. Secondly, a new up-sampling method that adaptively fuses sub-pixel convolution and bilinear interpolation was adopted to balance the computation and to fuse more UAV feature information. Finally, spatial correlation enhancement strategies for local and global spatial features were performed on deep features to improve the sensitivity to foreground targets in complex backgrounds and enhance target expression to suppress background noise. Ablation experiments and comparative experiments were implemented on the self-made UAV dataset. The mAP0.5 and mAP0.5:0.95 of the proposed algorithm were increased by 2.4% and 2.7% respectively, compared with those of the original YOLOv5 algorithm. Furthermore, the detection speed reached 58.5 frames per second. The performance of the proposed algorithm was also verified on the VisDrone2019 dataset, and its mAP0.5 and mAP0.5:0.95 were respectively higher than those of the YOLOv5 algorithm by 4.6% and 1.3%.

Area coverage path planning for tilt-rotor unmanned aerial vehicle based on enhanced genetic algorithm
Yue’an WU,Changping DU,Rui YANG,Jiahao YU,Tianrui FANG,Yao ZHENG
Journal of ZheJiang University (Engineering Science)    2024, 58 (10): 2031-2039.   DOI: 10.3785/j.issn.1008-973X.2024.10.006
Abstract   HTML PDF (2211KB) ( 93 )  

An enhanced genetic algorithm was proposed to address the challenge of area coverage path planning for a tilt-rotor unmanned aerial vehicle (TRUAV) amidst multiple obstacles. A preliminary coverage path plan for the designated task area was devised, utilizing the minimum spanning and back-and-forth path generation algorithms. The area coverage problem was transformed into a traveling salesman problem to optimize the sequence of the coverage path. A fishtail-shaped obstacle avoidance strategy was proposed to circumvent obstacles within the region. The nearest neighbor algorithm was introduced to generate an initial population superior to that of the standard genetic algorithm. A three-point crossover operator and a dynamic interval mutation operator were adopted in the genetic processes to improve the proposed algorithm's global search capacity and prevent the algorithm from falling into local optima. The efficacy of the proposed algorithm was tested through simulations in polygonal areas with multiple obstacles. Results showed that, compared to the sequential path coverage algorithm and the genetic algorithm, the proposed algorithm reduced the length of the coverage path by 7.80%, significantly enhancing the coverage efficiency of TRUAV in the given task areas.
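
Seeding a genetic algorithm with nearest neighbor tours can be sketched as follows for a traveling salesman formulation over 2D waypoints. The function names are hypothetical, and the way the population is varied (one tour per starting waypoint) is an assumption, since the abstract does not describe how the authors diversify the initial individuals.

```python
import math


def nearest_neighbor_tour(points, start=0):
    """Greedy tour: from the current point, always visit the closest unvisited one."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def initial_population(points, size):
    """Build `size` tours, cycling through the waypoints as starting points."""
    return [nearest_neighbor_tour(points, start=k % len(points))
            for k in range(size)]
```

Each individual is a permutation of waypoint indices, so it can be passed directly to crossover and mutation operators that act on visiting sequences.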

Multi-modal information augmented model for micro-video recommendation
Yufu HUO,Beihong JIN,Zhaoyi LIAO
Journal of ZheJiang University (Engineering Science)    2024, 58 (6): 1142-1152.   DOI: 10.3785/j.issn.1008-973X.2024.06.005
Abstract   HTML PDF (906KB) ( 256 )  

A multi-modal augmented model for click through rate (MMa4CTR) tailored for micro-video recommendation was proposed. Multi-modal data derived from user interactions with micro-videos were effectively leveraged to construct embedded user representations and capture diverse user interests across modalities. The aim was to reveal the latent semantic commonalities by combining and crossing features across modalities. The overall recommendation performance was boosted via two training strategies, automatic learning rate adjustment and validation interruption. A computationally efficient multi-layer perceptron architecture was employed in order to address the computational demands brought on by the vast amount of multi-modal data. Performance comparison experiments and hyperparameter sensitivity analyses on WeChat Video Channel and TikTok datasets demonstrated that MMa4CTR outperformed baseline models, delivering superior recommendation results with minimal computational resources. Additionally, ablation studies performed on both datasets further validated the significance and efficacy of the micro-video modality cross module, the user multi-modal embedding layer, and the strategies for automatic learning rate adjustment and validation interruption in enhancing recommendation performance.

Structure design and motion analysis of bionic hexapod origami robot
Dongxing CAO,Yanchao JIA,Xiangying GUO,Jiajia MAO
Journal of ZheJiang University (Engineering Science)    2024, 58 (8): 1543-1555.   DOI: 10.3785/j.issn.1008-973X.2024.08.002
Abstract   HTML PDF (3603KB) ( 210 )  

A new design scheme for a crab-like hexapod origami robot was proposed to address the single structure and insufficient movement flexibility of existing origami robots, by combining the origami structure with multi-legged robot design and coupling Miura origami with six-fold origami. The motion configuration of the origami robot was expanded, and the motion flexibility of the origami robot was improved. Each leg of the robot has two degrees of freedom under the symmetry hypothesis. The vertices of the robot legs were treated as joints, and the crease lines were regarded as links. A planar link equivalent model of the robot legs was established with the folding angle as the motion variable. The theoretical range of motion for the robot’s foot was determined through simulation calculations. The tapered panel technique was then utilized to thicken the folding surfaces and prevent physical interference between adjacent folding surfaces. A three-dimensional model of the origami crab-like hexapod robot was constructed. The relationship between the folding angle and foot motion was analyzed based on the equivalent model of planar links, and the foot motion trajectory and gait of the robot were designed. The experimental prototype of the origami bionic hexapod robot was designed and manufactured by using 3D printing technology, and the lateral movement of the robot was realized based on STM32 microcontroller control. Results show that the origami bio-inspired robot can realize the conversion from a plane configuration to a crab-like configuration. The robot can move smoothly left and right under the coordinated movement of six legs.

Cross-domain recommendation model based on source domain data augmentation and multi-interest refinement transfer
Yabo YIN,Xiaofei ZHU,Yidan LIU
Journal of ZheJiang University (Engineering Science)    2024, 58 (8): 1717-1727.   DOI: 10.3785/j.issn.1008-973X.2024.08.018
Abstract   HTML PDF (2143KB) ( 146 )  

A cross-domain recommendation model that utilizes source domain data augmentation and multi-interest refinement transfer was proposed in order to address the difficulty of modeling interest preferences in cross-domain recommendation tasks caused by the lack of user interaction data in the source domain, as well as the problem of ignored associations between multiple interests. A source-domain data augmentation strategy was introduced, generating a denoised auxiliary sequence for each user in the source domain. The sparsity of user interaction data in the source domain was thereby alleviated, and enriched user interest preferences were obtained. The interest extraction and multi-interest refinement transfer were implemented by utilizing the dual sequence multi-interest extraction module and the multi-interest refinement transfer module. Three public cross-domain recommendation evaluation tasks were conducted. The proposed model achieved the best performance compared with the best baseline, reducing the average MAE by 22.86% and the average RMSE by 19.65%, which verified the effectiveness of the method.

Empty-load charging strategy for autonomous vehicle parking based on multi-agent system
Wenhao LI,Yanjie JI,Hao WU,Yewen JIA,Shuichao ZHANG
Journal of ZheJiang University (Engineering Science)    2024, 58 (8): 1659-1670.   DOI: 10.3785/j.issn.1008-973X.2024.08.013
Abstract   HTML PDF (2970KB) ( 128 )  

A multi-agent parking simulation framework was constructed in order to formulate autonomous vehicle (AV) parking demand management strategies. Two charging strategies for empty-load driving were proposed: a static charge based on driving distance and a dynamic charge based on road congestion levels. The rate calculation method was analyzed. Cost functions for parking lots, residential parking, and continuous empty cruising were established under these charging policies. A logit model was used to describe the choice behavior under different parking modes. The simulation of urban mobility (SUMO) was used to conduct a large-scale road network simulation experiment in Nanning’s main urban area. AV parking behavior and road network operation under both strategies were analyzed. The simulation results showed that the empty-load driving mileage of AVs decreased by 20.16% and 10.85% under the static and dynamic charging strategies, respectively. Total vehicle delay decreased by 39.80% and 43.52%, respectively. The dynamic charging strategy was adjustable in real time based on road conditions, and the operational efficiency of the road network was significantly enhanced.
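
The logit choice step can be illustrated with a standard multinomial logit that turns the generalized cost of each parking mode into a choice probability. The abstract does not give the cost functions or the scale parameter, so the function name, the mode labels, the `scale` value, and the example costs below are placeholders.

```python
import math


def logit_choice_probabilities(costs, scale=1.0):
    """Multinomial logit: P(mode) is proportional to exp(-scale * cost).

    costs: dict mapping a mode name to its generalized cost.
    scale: assumed sensitivity parameter (larger means more cost-sensitive).
    """
    utilities = {mode: -scale * cost for mode, cost in costs.items()}
    # Subtract the maximum utility for numerical stability.
    u_max = max(utilities.values())
    weights = {mode: math.exp(u - u_max) for mode, u in utilities.items()}
    total = sum(weights.values())
    return {mode: w / total for mode, w in weights.items()}


# Hypothetical costs for the three parking modes named in the abstract.
example = logit_choice_probabilities(
    {"parking_lot": 10.0, "residential": 12.0, "empty_cruising": 15.0},
    scale=0.5,
)
```

Under this model, an empty-load charge raises the cost of cruising and of distant parking, which lowers their choice probabilities without excluding them outright.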

Obstacle recognition of unmanned rail electric locomotive in underground coal mine
Tun YANG,Yongcun GUO,Shuang WANG,Xin MA
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 29-39.   DOI: 10.3785/j.issn.1008-973X.2024.01.004
Abstract   HTML PDF (2463KB) ( 445 )  

The PDM-YOLO model for accurate real-time obstacle detection in unmanned electric locomotives was proposed in order to address the problem of low accuracy of obstacle recognition in existing coal mine underground unmanned electric locomotives due to poor roadway environments. The ordinary convolution in the C3 module of the conventional YOLOv5 model was replaced with partial convolution to construct the C3_P feature extraction module, which effectively reduced the floating-point operations (FLOPs) and computational delay of the model. The improved decoupled head was used to decouple the prediction head of the conventional YOLOv5 model in order to improve the convergence speed of the model and the accuracy of obstacle recognition. The Mosaic data augmentation method was optimized to enrich the feature information of the training images and enhance the generalizability and robustness of the model. The experimental results showed that the mean average precision (mAP) of the PDM-YOLO model reached 96.3% and the average detection speed reached 109.2 frames per second on the self-built dataset. The detection accuracy of the PDM-YOLO model on the PASCAL VOC public dataset is higher than that of the existing mainstream YOLO series models.

Compound fault decoupling diagnosis method based on improved Transformer
Yu-xiang WANG,Zhi-wei ZHONG,Peng-cheng XIA,Yi-xiang HUANG,Cheng-liang LIU
Journal of ZheJiang University (Engineering Science)    2023, 57 (5): 855-864.   DOI: 10.3785/j.issn.1008-973X.2023.05.001
Abstract   HTML PDF (2584KB) ( 835 )  

Most compound fault diagnosis methods regard the compound fault as a new single fault type, ignoring the interaction of internal single faults, so the fault analysis is coarse in granularity and poor in interpretability. An improved Transformer-based compound fault decoupling diagnosis method was proposed for industrial environments with very little compound fault data. The diagnosis process included pre-processing, feature extraction and fault decoupling. By introducing the decoder of the Transformer, the cross-attention mechanism enables each single fault label to adaptively focus on the discriminative feature region of the extracted feature layer that corresponds to the fault feature, and to predict the output probability to achieve compound fault decoupling. Compound fault tests were designed to verify the effectiveness of the method compared with the advanced algorithms. The results showed that the proposed method had high diagnostic accuracy with a small number of single fault training samples and a very small number of compound fault training samples. The compound fault diagnosis accuracy reached 88.29% when the training set contained only 5 compound fault samples. Thus the new method has a significant advantage over other methods.
