Most Read Articles

Survey of deep learning based EEG data analysis technology
Bo ZHONG,Pengfei WANG,Yiqiao WANG,Xiaoling WANG
Journal of ZheJiang University (Engineering Science)    2024, 58 (5): 879-890.   DOI: 10.3785/j.issn.1008-973X.2024.05.001
Abstract   HTML PDF (690KB) ( 13369 )  

A thorough analysis and cross-comparison of recent relevant works was provided, outlining a closed-loop process for EEG data analysis based on deep learning. EEG data were introduced, and the application of deep learning in three key stages (preprocessing, feature extraction, and model generalization) was reviewed. The research ideas and solutions that deep learning algorithms offer at each stage were described, together with the challenges and issues encountered. The main contributions and limitations of different algorithms were comprehensively summarized. The remaining challenges and future directions of deep learning technology in handling EEG data at each stage were discussed.

Wolfberry pest detection based on improved YOLOv5
Dingjian DU,Zunhai GAO,Zhuo CHEN
Journal of ZheJiang University (Engineering Science)    2024, 58 (10): 1992-2000.   DOI: 10.3785/j.issn.1008-973X.2024.10.002
Abstract   HTML PDF (3603KB) ( 613 )  

A model based on improved YOLOv5m was proposed for wolfberry pest detection in a complex environment. The next generation vision transformer (Next-ViT) was used as the backbone network to improve the feature extraction ability of the model, so that the model paid more attention to the key target features. An adaptive fusion context enhancement module was added to the neck to enhance the model’s ability to understand and process contextual information, which improved the detection precision for small objects (aphids). The C3 module in the neck network was replaced with the C3_Faster module to reduce the model footprint and further improve the model precision. Experimental results showed that the proposed model achieved a precision of 97.0% and a recall of 92.1%. The mean average precision (mAP50) was 94.7%, which was 1.9 percentage points higher than that of YOLOv5m, and the average precision of aphid detection was improved by 9.4 percentage points. In a comparison across models, the mAP50 of the proposed model was 1.6, 1.6, 2.8, 3.5, and 1.0 percentage points higher than those of the mainstream models YOLOv7, YOLOX, DETR, EfficientDet-D1, and Cascade R-CNN, respectively. The proposed model improves the detection performance while maintaining a reasonable model footprint.

Path planning of agricultural robots based on improved deep reinforcement learning algorithm
Wei ZHAO,Wanzhi ZHANG,Jialin HOU,Rui HOU,Yuhua LI,Lejun ZHAO,Jin CHENG
Journal of ZheJiang University (Engineering Science)    2025, 59 (7): 1492-1503.   DOI: 10.3785/j.issn.1008-973X.2025.07.017
Abstract   HTML PDF (2200KB) ( 200 )  

In order to solve the problems of difficulty in finding target points, sparse rewards, and slow convergence when using deep reinforcement learning algorithms for path planning of agricultural robots, a path-planning method based on an improved deep Q-network algorithm integrated with multi-target-point navigation (MPN-DQN) was proposed. Laser simultaneous localization and mapping (SLAM) was used to scan the global environment, construct a prior map, and divide the walking row and crop row areas, and the map boundary was expanded and fitted to form a forward bow-shaped operation corridor. Intermediate target points were used to segment the global environment, and the complex environment was divided into a multi-stage short-range navigation environment to simplify the target point search process. The deep Q-network algorithm was improved in three aspects (action space, exploration strategy, and reward function) to alleviate the reward sparsity problem, accelerate the convergence of the algorithm, and improve the navigation success rate. Experimental results showed that the total number of collisions of agricultural robots equipped with the MPN-DQN algorithm was 1, the average navigation time was 104.27 s, the average navigation distance was 16.58 m, and the average navigation success rate was 95%.
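
As background for the deep Q-network mentioned above, the following is a minimal sketch of two generic building blocks: the one-step temporal-difference target and epsilon-greedy exploration. It is not the MPN-DQN method itself; the function names, the discount factor, and the list-of-floats representation of Q-values are assumptions made for illustration.

```python
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step TD target y = r + gamma * max_a' Q(s', a').

    next_q_values: Q-value estimates for every action in the next state
    (in a DQN these would come from the target network).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    print(td_target(1.0, [0.5, 2.0], done=False, gamma=0.9))
    print(epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0, rng=rng))
```

A sparse reward makes the `reward` term zero for most transitions, which is why reward shaping and shorter sub-goals can speed up learning.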

Area coverage path planning for tilt-rotor unmanned aerial vehicle based on enhanced genetic algorithm
Yue’an WU,Changping DU,Rui YANG,Jiahao YU,Tianrui FANG,Yao ZHENG
Journal of ZheJiang University (Engineering Science)    2024, 58 (10): 2031-2039.   DOI: 10.3785/j.issn.1008-973X.2024.10.006
Abstract   HTML PDF (2211KB) ( 790 )  

An enhanced genetic algorithm was proposed to address the challenge of area coverage path planning for a tilt-rotor unmanned aerial vehicle (TRUAV) amidst multiple obstacles. A preliminary coverage path plan for the designated task area was devised, utilizing the minimum spanning and back-and-forth path generation algorithms. The area coverage problem was transformed into a traveling salesman problem to optimize the sequence of the coverage path. A fishtail-shaped obstacle avoidance strategy was proposed to circumvent obstacles within the region. The nearest neighbor algorithm was introduced to generate a better initial population than that of the conventional genetic algorithm. A three-point crossover operator and a dynamic interval mutation operator were adopted in the genetic processes to improve the proposed algorithm's global search capacity and prevent the algorithm from falling into local optima. The efficacy of the proposed algorithm was tested through simulations in polygonal areas with multiple obstacles. Results showed that, compared to the sequential path coverage algorithm and the genetic algorithm, the proposed algorithm reduced the length of the coverage path by 7.80%, significantly enhancing the coverage efficiency of TRUAV in the given task areas.
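
The nearest neighbor idea mentioned above can be sketched as follows: starting from one point, repeatedly visit the closest unvisited point, and use the resulting tour as a seed individual for a genetic algorithm. This is a generic illustration, not the authors' implementation; the function names and the 2D point representation are assumptions.

```python
import math


def nearest_neighbor_tour(points, start=0):
    """Greedy TSP tour: always move to the closest unvisited point."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Ties are broken by index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_length(points, tour):
    """Length of the closed tour (returns to the starting point)."""
    return sum(
        math.dist(points[a], points[b])
        for a, b in zip(tour, tour[1:] + tour[:1])
    )


if __name__ == "__main__":
    pts = [(0, 0), (3, 0), (1, 0), (2, 0)]
    tour = nearest_neighbor_tour(pts)
    print(tour, tour_length(pts, tour))
```

Seeding a population with such tours (for example, one per starting point) gives the genetic operators a reasonable starting point instead of purely random permutations.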

Structure design and motion analysis of bionic hexapod origami robot
Dongxing CAO,Yanchao JIA,Xiangying GUO,Jiajia MAO
Journal of ZheJiang University (Engineering Science)    2024, 58 (8): 1543-1555.   DOI: 10.3785/j.issn.1008-973X.2024.08.002
Abstract   HTML PDF (3603KB) ( 345 )  

To address the problems that existing origami robots have a single structure and insufficient flexibility of movement, a new design scheme for a crab-like hexapod origami robot was proposed by combining the origami structure with multi-legged robot design and coupling Miura origami with six-fold origami. The motion configuration of the origami robot was expanded, and the motion flexibility of the origami robot was improved. Each leg of the robot has two degrees of freedom under the symmetry hypothesis. The vertices of the robot legs were treated as joints, and the crease lines were regarded as links. A planar link equivalent model of the robot legs was established with the folding angle as the motion variable. The theoretical range of motion for the robot’s foot was determined through simulation calculations. Then the tapered panel technique was utilized to thicken the folding surfaces and prevent physical interference between adjacent folding surfaces. A three-dimensional model of the origami crab-like hexapod robot was constructed. The relationship between the folding angle and foot motion was analyzed based on the equivalent model of planar links, and the foot motion trajectory and gait of the robot were designed. The experimental prototype of the origami bionic hexapod robot was designed and manufactured by using 3D printing technology, and the lateral movement of the robot was realized based on STM32 microcontroller control. Results show that the origami bio-inspired robot can realize the conversion from a plane configuration to a crab-like configuration. The robot can move smoothly left and right under the coordinated movement of six legs.

Improved YOLOv7 based apple target detection in complex environment
Henghui MO,Linjing WEI
Journal of ZheJiang University (Engineering Science)    2024, 58 (12): 2447-2458.   DOI: 10.3785/j.issn.1008-973X.2024.12.004
Abstract   HTML PDF (1927KB) ( 399 )  

Robotic harvesters face challenges in identifying apples under complex natural conditions such as unstable lighting, high fruit diversity, and severe leaf occlusion, which impede the capture of key features and reduce harvesting efficiency and accuracy. An enhanced apple detection algorithm based on the YOLOv7 model for complex scenarios was proposed. The contrast limited adaptive histogram equalization (CLAHE) technique was employed to enhance the contrast of apple images, reducing the background interference and clarifying the target contours. A multi-scale hybrid adaptive attention mechanism was introduced. The features were decomposed and reconstructed, and the spatial and channel attention directives were synergistically integrated to optimize multi-layer feature modeling over various distances, thereby boosting the model’s capability to extract apple features and resist background noise. Full-dimensional dynamic convolution was implemented to refine the feature selection process through a fine-grained attention mechanism. The number of detection heads was increased to address the challenges of detecting small targets. The Meta-ACON activation function was used to optimize the attention allocation during the feature extraction process. Experimental results demonstrated that the improved YOLOv7 model achieved average accuracy and recall rates of 85.7% and 87.0%, respectively. Compared to Faster R-CNN, SSD, YOLOv5, and the original YOLOv7, the average detection precision was improved by 15.2, 7.5, 4.5, and 2.5 percentage points, and the average recall was improved by 13.7, 6.5, 3.6, and 1.3 percentage points, respectively. The model performs well, providing technical support for apple growth monitoring and mechanical harvesting research.

Evaluation of generator side inertia based on electromechanical oscillation of power system
Zhiqiang REN,Mingxing TIAN,Yu JIANG,Dongfeng XING
Journal of ZheJiang University (Engineering Science)    2025, 59 (4): 870-878.   DOI: 10.3785/j.issn.1008-973X.2025.04.023
Abstract   HTML PDF (1102KB) ( 242 )  

The connection of new energy power generation equipment to the power generation side leads to the emergence of “weak inertia” characteristics on the power generation side, which affects the safe and stable operation of the system. The synchronized phasor measurement unit (PMU) was used to measure the electromechanical oscillation response, and an inertia assessment method for the power generation side was proposed based on the electromechanical oscillation parameters under small perturbation. Based on the characteristics of the inertia response process, the unbalanced power allocation equation related to the inertia of each generator was derived. Based on the relationship between the small-signal state equation and the characteristic roots of the multi-machine system, the formula for calculating the inertia of the generation side of a multi-machine system was derived. The inertia calculation of the generation side of a single-machine system was introduced, and the measurement methods of the inertia ratio and the intrinsic oscillation frequency in the inertia calculation formula were described. The correctness of the proposed method was verified by simulation examples of a single-machine system, a dual-machine interconnection system, a WSCC 3-machine 9-node system, and a 10-machine 39-node system. Results show that the generation side inertia evaluation values obtained with the proposed method in several systems are close to the actual values and have good adaptability. The method can be used for power system generation side inertia evaluation.

Empty-load charging strategy for autonomous vehicle parking based on multi-agent system
Wenhao LI,Yanjie JI,Hao WU,Yewen JIA,Shuichao ZHANG
Journal of ZheJiang University (Engineering Science)    2024, 58 (8): 1659-1670.   DOI: 10.3785/j.issn.1008-973X.2024.08.013
Abstract   HTML PDF (2970KB) ( 214 )  

A multi-agent parking simulation framework was constructed in order to formulate autonomous vehicle (AV) parking demand management strategies. Two charging strategies for empty-load driving were proposed: a static charge based on driving distance and a dynamic charge based on road congestion levels. The rate calculation method was analyzed. Cost functions for parking lots, residential parking, and continuous empty cruising were established under these charging policies. A logit model was used to describe the choice behavior under different parking modes. The simulation of urban mobility (SUMO) was used to conduct a large-scale road network simulation experiment in Nanning’s main urban area. AV parking behavior and road network operation under both strategies were analyzed. The simulation results showed that the empty-load driving mileage of AVs decreased by 20.16% and 10.85% under the static and dynamic charging strategies, respectively. Total vehicle delay decreased by 39.80% and 43.52%, respectively. The dynamic charging strategy was adjustable in real time based on road conditions, and the operational efficiency of the road network was significantly enhanced.
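
To illustrate how a logit model turns mode costs into choice probabilities, here is a minimal multinomial logit sketch. The mode names, cost values, and the `scale` sensitivity parameter are hypothetical and are not taken from the article.

```python
import math


def logit_choice_probabilities(costs, scale=1.0):
    """Multinomial logit: P(m) = exp(-scale * cost_m) / sum_k exp(-scale * cost_k).

    costs: dict mapping a parking mode to its generalized cost.
    scale: assumed sensitivity to cost (larger means more deterministic).
    """
    utilities = [-scale * c for c in costs.values()]
    # Subtract the maximum utility for numerical stability.
    u_max = max(utilities)
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return {mode: w / total for mode, w in zip(costs, weights)}


if __name__ == "__main__":
    # Hypothetical generalized costs for three parking modes.
    example = {"parking_lot": 12.0, "residential": 10.0, "empty_cruising": 15.0}
    for mode, p in logit_choice_probabilities(example, scale=0.5).items():
        print(f"{mode}: {p:.3f}")
```

Raising the charge on empty-load driving increases the cost of the cruising option, which lowers its choice probability relative to the other modes.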

Optimization of 3D multi-UAVs low altitude penetration based on bald eagle search algorithm
Xialu WEN,He HUANG,Huifeng WANG,Lan YANG,Tao GAO
Journal of ZheJiang University (Engineering Science)    2024, 58 (10): 2020-2030.   DOI: 10.3785/j.issn.1008-973X.2024.10.005
Abstract   HTML PDF (2152KB) ( 431 )  

Low altitude penetration path planning for multi-UAVs involves a complex three-dimensional space environment and high computational complexity, and the existing multi-objective bald eagle search algorithm tends to converge toward the center point and has low accuracy. To address these problems, a 3D multi-UAVs low altitude penetration method based on the improved multi-objective bald eagle search algorithm (IMBES) was proposed. Models for the 3D environment, threat sources, UAV physical constraints, multi-UAVs cooperative constraints, and path smoothness were constructed to define a multi-objective cost function. A coupling chaotic mapping initialization was designed to enhance the quality of the initial population. An adaptive Gauss walk strategy based on the “scout eagle” was devised to balance exploitation and exploration capabilities. Fast non-dominated sorting was introduced to further enhance algorithm efficiency. By leveraging the correspondence between the bald eagle position and UAV speed, turning angle, and climbing angle, the IMBES efficiently explored the UAV configuration space to identify the optimal Pareto front. Experimental results showed that the success rate of the IMBES was 70.5%. Compared with existing path planning methods, the proposed method demonstrates strong optimization capabilities and low energy consumption, making it suitable for collaborative low-altitude penetration by multiple UAVs.
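
The fast non-dominated sorting step mentioned above can be sketched as follows, in the style popularized by NSGA-II. This is a generic illustration for minimization problems, not the IMBES code; the function names and the tuple-of-costs representation are assumptions.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(costs):
    """Split solutions (tuples of costs to minimize) into Pareto fronts.

    Returns a list of fronts; each front is a list of indices into costs.
    """
    n = len(costs)
    dominated = [[] for _ in range(n)]  # indices that solution i dominates
    counts = [0] * n                    # how many solutions dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(costs[i], costs[j]):
                dominated[i].append(j)
            elif dominates(costs[j], costs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
        k += 1
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    print(non_dominated_sort([(1, 5), (2, 2), (5, 1), (3, 3), (6, 6)]))
```

The first front returned is the current approximation of the Pareto front; later fronts rank the remaining candidates.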

3D path planning of plant protection UAVs in hilly mountainous orchards
Shaomeng YU,Ming YAN,Pengfei WANG,Jianxi ZHU,Xin YANG
Journal of ZheJiang University (Engineering Science)    2025, 59 (3): 635-642.   DOI: 10.3785/j.issn.1008-973X.2025.03.021
Abstract   HTML PDF (2985KB) ( 214 )  

A full-coverage 3D path planning method for mountainous orchard plant protection UAVs was proposed to address the challenges of manual control and the lack of 3D path planning for plant protection drones operating in hilly orchards. 3D coordinates of the operation area were obtained from a real scene 3D model of the area. Comprehensive 3D path planning for plant protection UAVs was carried out based on the reciprocating ox-plowing (boustrophedon) method and the real scene 3D model of the hilly orchard. An energy consumption model for the UAV was constructed, considering its movement status and load changes. The operating heading angle (ranging from 1° to 180°) was optimized to determine the path with minimal energy consumption. Results of field experiments showed that the path with the minimal energy consumption (heading angle of 91°) reduced the total energy consumption by 20.88% and the time required to complete the plant protection operation by 16.31%, compared to the path with the maximum energy consumption (heading angle of 147°). The fluctuation in canopy droplet deposition at each sampling point within the operation area was minimal. This method not only optimizes the energy consumption and improves the operational efficiency, but also ensures full coverage of plant protection within the working area.
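
A reciprocating (back-and-forth) coverage pattern can be sketched in its simplest form as below. The sketch assumes a flat, axis-aligned rectangular area and a fixed swath spacing; the article's method works on a real scene 3D model and sweeps the heading angle, neither of which is reproduced here. All names and parameters are hypothetical.

```python
import math


def boustrophedon_waypoints(width, height, spacing):
    """Back-and-forth waypoints covering a width x height rectangle.

    Rows are parallel to the x axis and `spacing` apart along y
    (for a sprayer this would be the effective swath width).
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    n_rows = int(math.floor(height / spacing + 1e-9)) + 1
    waypoints = []
    for row in range(n_rows):
        y = float(row * spacing)
        ends = [(0.0, y), (float(width), y)]
        if row % 2 == 1:
            # Odd rows are flown in the opposite direction.
            ends.reverse()
        waypoints.extend(ends)
    return waypoints


if __name__ == "__main__":
    for wp in boustrophedon_waypoints(10, 4, 2):
        print(wp)
```

Rotating the area by a candidate heading angle before generating the rows, then scoring each resulting path with an energy model, is one way a heading sweep could be organized.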

Low-carbon optimal scheduling of integrated energy system considering multiple flexible resources
Haijun XING,Yujing YE,Zheyuan LIU,Weijian JIANG,Wenbo ZHANG,Shuxin TIAN
Journal of ZheJiang University (Engineering Science)    2024, 58 (6): 1243-1254.   DOI: 10.3785/j.issn.1008-973X.2024.06.014
Abstract   HTML PDF (1651KB) ( 363 )  

A distributed low-carbon economic dispatch model for integrated energy that considered multiple flexible resources was proposed to address the insufficient flexibility and low-carbon performance of integrated energy systems (IES) with multiple parks. First, the flexibility requirements of the system were analyzed, the IES flexibility margin constraints were proposed, and multiple flexibility resource models including carbon capture plants were constructed to make full use of the flexible operation mode of carbon capture plants. Second, ladder-type carbon trading was introduced to establish a two-tier scheduling model for the integrated energy system. The upper layer of the model aimed to minimize the cost of energy supply by energy suppliers, and the lower layer aimed to minimize the operating cost of energy operators consisting of energy hubs (EH). The model was solved by using the analytical target cascading method to achieve collaborative scheduling between the upper and lower layers of the energy supplier and energy service provider with respect to the characteristics of the multi-subject operation. Finally, the positive effect of the proposed model on enhancing the system flexibility and low-carbon performance was verified through a case study consisting of the IEEE 30-node network, the Belgium 20-node gas network, and multiple energy hubs.

Intelligent connected vehicle motion planning at unsignalized intersections based on deep reinforcement learning
Mingfang ZHANG,Jian MA,Nale ZHAO,Li WANG,Ying LIU
Journal of ZheJiang University (Engineering Science)    2024, 58 (9): 1923-1934.   DOI: 10.3785/j.issn.1008-973X.2024.09.017
Abstract   HTML PDF (2586KB) ( 216 )  

A vehicle motion planning algorithm based on deep reinforcement learning was proposed to satisfy the efficiency and comfort requirements of intelligent connected vehicles at unsignalized intersections. Temporal convolutional network (TCN) and Transformer algorithms were combined to construct the intention prediction model for surrounding vehicles. The multi-layer convolution and self-attention mechanisms were used to improve the capability of capturing vehicle motion features. The twin delayed deep deterministic policy gradient (TD3) reinforcement learning algorithm was employed to build the vehicle motion planning model. Taking the driving intention of surrounding vehicles, driving style, interaction risk, and the comfort of the ego vehicle into consideration comprehensively, the state space and reward functions were designed to enhance understanding of the dynamic environment. Delayed policy updates and target policy smoothing were used to improve the stability of the proposed algorithm, and the desired acceleration was output in real time. Experimental results demonstrated that the proposed motion planning algorithm can perceive the real-time potential interaction risk based on the driving intention of surrounding vehicles. The generated motion planning strategy met the requirements of efficiency, safety, and comfort. It showed excellent adaptability to different styles of surrounding vehicles and dense interaction scenarios, and the success rates exceeded 92.1% in various scenarios.
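
The two TD3 stabilization techniques named above (target policy smoothing and delayed policy updates), together with the clipped double-Q target, can be sketched for a scalar action such as an acceleration command. The callables `target_actor`, `target_q1`, and `target_q2`, and all default hyperparameters, are placeholders assumed for illustration; a real implementation would use neural networks and batched tensors.

```python
import random


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Critic target for one transition with a scalar action."""
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, -max_action, max_action)
    # Clipped double-Q: take the smaller of the two target critics.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    return reward + (0.0 if done else gamma * q_next)


def should_update_policy(step, policy_delay=2):
    """Delayed policy update: the actor is updated every policy_delay steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    rng = random.Random(0)
    y = td3_target(
        reward=1.0, next_state=None, done=False,
        target_actor=lambda s: 0.5,
        target_q1=lambda s, a: a + 1.0,
        target_q2=lambda s, a: a + 2.0,
        rng=rng, gamma=0.9,
    )
    print(y, [should_update_policy(t) for t in range(4)])
```

Taking the minimum of two critics counteracts value overestimation, while the smoothing noise keeps the critic from exploiting sharp peaks in the value estimate.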

Survey of embodied agent in context of foundation model
Songyuan LI,Xiangwei ZHU,Xi LI
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 213-226.   DOI: 10.3785/j.issn.1008-973X.2025.02.001
Abstract   HTML PDF (841KB) ( 768 )  

Foundational models in natural language processing, computer vision and multimodal learning have achieved significant breakthroughs in recent years, showcasing the potential of general artificial intelligence. However, these models still fall short of human or animal intelligence in areas such as causal reasoning and understanding physical commonsense. This is because these models primarily rely on vast amounts of data and computational power, lacking direct interaction with and experiential learning from the real world. Many researchers are beginning to question whether merely scaling up model size is sufficient to address these fundamental issues. This has led the academic community to reevaluate the nature of intelligence, suggesting that intelligence arises not just from enhanced computational capabilities but from interactions with the environment. Embodied intelligence is gaining attention as it emphasizes that intelligent agents learn and adapt through direct interactions with the physical world, exhibiting characteristics closer to biological intelligence. A comprehensive survey of embodied artificial intelligence was provided in the context of foundational models. The underlying technical ideas, benchmarks, and applications of current embodied agents were discussed. A forward-looking analysis of future trends and challenges in embodied AI was offered.

Identification of apple leaf diseases based on MA-ConvNext network and stepwise relational knowledge distillation
Huan LIU,Yunhong LI,Leitao ZHANG,Yue GUO,Xueping SU,Yaolin ZHU,Lele HOU
Journal of ZheJiang University (Engineering Science)    2024, 58 (9): 1757-1767.   DOI: 10.3785/j.issn.1008-973X.2024.09.001
Abstract   HTML PDF (5637KB) ( 810 )  

In complex environments, backgrounds are cluttered and apple leaf disease spots vary in size, while existing models have many parameters and a high computational cost. Thus, an apple leaf disease recognition network, ConvNext network based on attention and multiscale feature fusion (MA-ConvNext), was proposed. A multiscale spatial reconstruction and channel reconstruction block (MSCB) and a feature extraction block with triplet attention fusion (TAFB) were utilized to effectively extract the features at different scales and enhance the focus on leaf disease spots. Additionally, a stepwise relational knowledge distillation method was employed to fuse the "teacher" network (MA-ConvNext) with an "intermediate" network (DenseNet121) to guide the training of the "student" network (EfficientNet-B0) and make the model lightweight. Experimental results showed that MA-ConvNext achieved a recognition accuracy of 99.38%, improving by 3.98 percentage points, 7.55 percentage points and 4.27 percentage points compared to ResNet50, MobileNet-V3, and EfficientNet-V2 networks, respectively. After the stepwise relational knowledge distillation, the recognition accuracy further improved by 1.76 percentage points, with a smaller network size and parameters of 1.56×10^7 and 5.29×10^6, respectively. The proposed method offers new insights and technical support for the precise detection of pests and diseases in agriculture.

Attention-fused filter bank dual-view graph convolution motor imagery EEG classification
Shuhan WU,Dan WANG,Yuanfang CHEN,Ziyu JIA,Yueqi ZHANG,Meng XU
Journal of ZheJiang University (Engineering Science)    2024, 58 (7): 1326-1335.   DOI: 10.3785/j.issn.1008-973X.2024.07.002
Abstract   HTML PDF (2102KB) ( 338 )  

In motor imagery tasks, multiple brain regions are often activated simultaneously, and traditional convolutional neural networks struggle to accurately represent the coordinated neural activity across these regions. A graph convolutional network (GCN) is suitable for representing the collaborative tasks of different brain regions by considering the connections and relationships between nodes (brain regions) in graph data. An attention-fused filter bank dual-view GCN (AFB-DVGCN) was proposed. A dual-branch network was constructed using filter banks to extract temporal and spatial information from different frequency bands. Information complementarity was achieved by a convolutional spatial feature extraction method for dual-view graphs. In order to improve the classification accuracy, the effective channel attention mechanism was utilized to enhance features and capture the interaction information between different feature maps. Validation results on the publicly available datasets BCI Competition IV-2a and OpenBMI show that AFB-DVGCN has achieved good classification performance, and the classification accuracy is significantly higher than that of the comparison networks.

Lightweight road extraction model based on multi-scale feature fusion
Yi LIU,Yidan CHEN,Lin GAO,Jiao HONG
Journal of ZheJiang University (Engineering Science)    2024, 58 (5): 951-959.   DOI: 10.3785/j.issn.1008-973X.2024.05.008
Abstract   HTML PDF (1551KB) ( 758 )  

A road extraction model based on multi-scale feature fusion lightweight DeepLab V3+ (MFL-DeepLab V3+) was proposed to address the high computational complexity and poor road extraction effect of the current semantic segmentation models used in the field of remote sensing image road extraction. The lightweight MobileNet V2 network was used to replace the original model’s Xception network as the backbone network in order to reduce the parameters and the computational complexity of the model. Depthwise separable convolution was introduced into the atrous spatial pyramid pooling (ASPP) module. A multi-scale feature fusion with attention (MFFA) module was proposed in the decoding area in order to enhance the road extraction ability of the model and optimize the extraction effect on small road segments. Experiments based on the Massachusetts roads dataset showed that the parameter size of the MFL-DeepLab V3+ model was significantly reduced, with a parameter compression of 88.67% compared to the original model. The road extraction image had clear edges, and its accuracy, recall, and F1-score were 88.45%, 86.41% and 87.42%, achieving better extraction performance compared to other models.
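
To show why depthwise separable convolution reduces the parameter count, the sketch below compares the weight counts of a standard convolution and its depthwise separable counterpart. Biases are ignored, and the channel sizes in the example are arbitrary assumptions, not values from the article.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 256, 256, 3  # arbitrary example layer
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(std, sep, f"ratio = {sep / std:.3f}")
```

The ratio equals 1/c_out + 1/k^2, so for a 3 x 3 kernel with many output channels the separable layer needs roughly one ninth of the weights.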

Control strategy of power conversion system based on sliding mode active disturbance rejection control
Jinfeng HUANG,Jie ZHOU,Hongjie HUANG
Journal of ZheJiang University (Engineering Science)    2024, 58 (10): 2171-2181.   DOI: 10.3785/j.issn.1008-973X.2024.10.021
Abstract   HTML PDF (1993KB) ( 160 )  

In order to improve the dynamic performance of the power conversion system (PCS), an improved active disturbance rejection control (ADRC) strategy based on reduced-order cascaded extended state observer (ESO) and complementary sliding mode control (CSMC) was designed and applied to the voltage outer loop of the bidirectional DC/AC converter in the PCS. The ESO was modified to a reduced-order cascaded ESO to improve the estimation speed of the state variables and the overall disturbance, enhancing the disturbance estimation capability. The PD control was replaced with CSMC to design a state error feedback law to enhance the robustness of the system, and an improved exponential reaching law was designed to suppress the chattering phenomenon. A simulation model was established and a related experimental platform was built to demonstrate the superiority of the improved ADRC strategy compared to PI control and traditional ADRC. The simulation and experimental results show that the improved ADRC strategy reduces the fluctuation of the DC bus voltage during the transient operation of the PCS, improves the power response speed on the AC side of the PCS, and enhances the output power quality on the AC side.

Super-resolution reconstruction of remote sensing image based on CNN and Transformer aggregation
Mingzhi HU,Jun SUN,Biao YANG,Kairong CHANG,Junlong YANG
Journal of ZheJiang University (Engineering Science)    2025, 59 (5): 938-946.   DOI: 10.3785/j.issn.1008-973X.2025.05.007
Abstract   HTML PDF (12404KB) ( 299 )  

A multi-layer degradation module was proposed to address the problem that most remote sensing image super-resolution models rarely consider the impact of noise, blur, JPEG compression, and other factors on image reconstruction, as well as the limitations of Transformer modules in capturing high-frequency information. A CNN-Transformer hybrid network was designed, where the CNN captures high-frequency details and the Transformer extracts global information. These two components were combined by an attention-based aggregation module, enhancing local high-frequency detail reconstruction while maintaining global structural coherence. The model was tested on six random scenes from the AID dataset and compared with the MM-realSR model in PSNR and SSIM. Results show an average PSNR improvement of 1.61 dB and an SSIM increase of 0.023 over MM-realSR.
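
For reference, the PSNR metric used in the comparison above can be computed as in this minimal sketch. It assumes images are given as equal-length flat sequences of pixel values with a known maximum (255 for 8-bit images); the function name is hypothetical.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [10, 20, 30, 40]
    rec = [11, 19, 31, 39]  # every pixel is off by one, so MSE = 1
    print(f"{psnr(ref, rec):.2f} dB")
```

Because PSNR is logarithmic, a gain of about 1.6 dB corresponds to a reduction of the mean squared error by roughly 30%.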

Oriented ship detection algorithm in SAR image based on improved YOLOv5
Yali XUE,Yiming HE,Shan CUI,Quan OUYANG
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 261-268.   DOI: 10.3785/j.issn.1008-973X.2025.02.004
Abstract   HTML PDF (2761KB) ( 628 )  

A novel detection algorithm for small ship targets in SAR scenes, ES-YOLOv5 (efficient multi-scale attention (EMA) and small object detection based on YOLOv5), was proposed to address the inconspicuous imaging features and low detection accuracy caused by the arbitrary orientation of small targets in synthetic aperture radar (SAR) imaging. A small target detection layer was added to adjust the receptive field size, making it more suitable for capturing small target scale features and facilitating multi-scale fusion. An EMA mechanism was introduced to focus on key target information and enhance feature representation capability. The circular smooth label (CSL) technique was utilized to adapt to the periodicity of angles, achieving high-precision angle classification. The experimental results demonstrate that the proposed method achieves an average detection accuracy of 90.9% at an intersection over union (IoU) threshold of 0.5 on the RSDD-SAR dataset. The algorithm improves the precision of detecting small SAR ship targets by 6% over the baseline YOLOv5, significantly enhancing the model’s detection performance.
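
The circular smooth label idea can be sketched as follows: the angle is discretized into bins, and the one-hot target is replaced by a window that wraps around the ends of the angle range, so that neighboring angles (including 179° and 0°) receive similar labels. The Gaussian window, the 180 one-degree bins, and the `radius` and `sigma` values are assumptions for illustration, not settings reported in the article.

```python
import math


def circular_smooth_label(angle_bin, num_bins=180, radius=6, sigma=2.0):
    """Soft classification target for an angle bin with circular wrap-around.

    angle_bin: index of the ground-truth bin, 0 <= angle_bin < num_bins.
    radius:    bins farther than this (circularly) get a label of 0.
    sigma:     width of the Gaussian window inside the radius.
    """
    label = [0.0] * num_bins
    for i in range(num_bins):
        d = abs(i - angle_bin)
        d = min(d, num_bins - d)  # circular distance between bins
        if d <= radius:
            label[i] = math.exp(-(d * d) / (2.0 * sigma * sigma))
    return label


if __name__ == "__main__":
    label = circular_smooth_label(0)
    # Bins 1 and 179 are both one step away from bin 0.
    print(label[0], label[1], label[179], label[90])
```

With a plain one-hot label, predicting bin 179 for a true bin of 0 would be penalized as heavily as predicting bin 90, even though the angular error is only one degree.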

Multi-modal information augmented model for micro-video recommendation
Yufu HUO,Beihong JIN,Zhaoyi LIAO
Journal of ZheJiang University (Engineering Science)    2024, 58 (6): 1142-1152.   DOI: 10.3785/j.issn.1008-973X.2024.06.005
Abstract   HTML PDF (906KB) ( 629 )  

A multi-modal augmented model for click-through rate (MMa4CTR) tailored for micro-video recommendation was proposed. Multi-modal data derived from user interactions with micro-videos were leveraged to construct embedded user representations and capture diverse user interests across modalities. Features were combined and crossed across modalities to reveal latent semantic commonalities. The overall recommendation performance was boosted via two training strategies, automatic learning rate adjustment and validation interruption. A computationally efficient multi-layer perceptron architecture was employed in order to address the computational demands brought on by the vast amount of multi-modal data. Performance comparison experiments and hyperparameter sensitivity analyses on the WeChat Video Channel and TikTok datasets demonstrated that MMa4CTR outperformed baseline models, delivering superior recommendation results with minimal computational resources. Additionally, ablation studies performed on both datasets further validated the significance and efficacy of the micro-video modality cross module, the user multi-modal embedding layer, and the strategies for automatic learning rate adjustment and validation interruption in enhancing recommendation performance.
