Most Downloaded Articles

Published in last 1 year
Dynamic 3D reconstruction method using binocular vision and improved YOLOv8
Jingyao HE,Pengfei LI,Chengzhi WANG,Zhenming LV,Ping MU
Journal of ZheJiang University (Engineering Science)    2025, 59 (7): 1443-1450.   DOI: 10.3785/j.issn.1008-973X.2025.07.012
Abstract   HTML PDF (7266KB) ( 1556 )  

A dynamic 3D reconstruction technology for construction sites was proposed to ensure safety and efficiency in the construction process. A binocular camera was deployed to scan the reconstruction site in 3D to obtain the base model and the activity trajectories of targets. The YOLOv8 model was enhanced with an attentional scale sequence fusion (ASF) module to form the YOLOv8-ASF framework, which improved the accuracy and performance of the model and addressed problems such as target occlusion and target loss. An improved semi-global block matching (SGBM) algorithm was integrated with YOLOv8-ASF to form the YOLOv8-ASF-SGBM algorithm, which achieved near-real-time target recognition and localization based on 2D images. The obtained depth information was used to project the behavior trajectories of dynamic elements in 3D onto the base model, realizing near-real-time, full-view monitoring of the real construction site. Experimental results show that the proposed technology reproduces the movement trajectories of dynamic construction elements in three dimensions with high precision, and the relative error with respect to the real motion trajectories is less than 5%. The technology realizes high-precision, full-view 3D monitoring based on 2D image and video information, and has good application prospects and engineering value.
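
As a rough illustration of how depth can be recovered from a binocular pair, the sketch below uses the standard rectified-stereo relation Z = f·B/d and a pinhole back-projection. The abstract does not give the authors' formulas, so the function names, the camera parameters (focal length, baseline, principal point) and the relative-error definition are all assumptions made for this example.

```python
import math

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters from rectified stereo: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def backproject(u, v, depth_m, focal_px, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a known depth to camera coordinates."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)

def relative_error(estimated, reference):
    """Distance between two 3D points divided by the norm of the reference point."""
    return math.dist(estimated, reference) / math.hypot(*reference)

# Hypothetical camera: 800 px focal length, 0.12 m baseline, principal point (640, 360).
z = depth_from_disparity(16.0, focal_px=800.0, baseline_m=0.12)
point = backproject(720, 400, z, focal_px=800.0, cx=640, cy=360)
```

In a detection pipeline, (u, v) would typically be the center of a detected bounding box and the disparity would come from the matching stage; both are stand-in values here.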

Research progress of YOLO detection technology for traffic object
Hongzhao DONG,Shaoxuan LIN,Yini SHE
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 249-260.   DOI: 10.3785/j.issn.1008-973X.2025.02.003
Abstract   HTML PDF (3207KB) ( 1189 )  

The development and research status of the YOLO (You Only Look Once) algorithm in traffic object detection were systematically summarized from the perspective of the three core elements 'people-vehicle-road' in order to comprehensively analyze the important role of the algorithm in improving traffic safety and efficiency. The commonly used evaluation indexes of the YOLO algorithm were outlined, and the practical significance of these indexes in traffic scenarios was explained in detail. An overview of the core architecture of the YOLO algorithm was provided, its development process was traced, and the optimization and improvement measures in each version iteration were analyzed. The research status and application scenarios of the YOLO algorithm for traffic object detection were sorted out and discussed from the perspective of the three traffic objects 'people-vehicle-road'. The limitations and challenges of the YOLO algorithm in traffic object detection were analyzed, and corresponding improvement methods were proposed. Future research directions were anticipated, providing a research reference for the intelligent development of road traffic.

Three-dimensional sector automatic design based on improved NSGA-II algorithm
Yingfei ZHANG,Xiaobing HU,Hang ZHOU,Xuzeng FENG
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 413-422.   DOI: 10.3785/j.issn.1008-973X.2025.02.019
Abstract   HTML PDF (1634KB) ( 985 )  

An improved non-dominated sorting genetic algorithm II (NSGA-II) was proposed in order to address the challenges of time-consuming manual airspace sectorization and the difficulty in comparing the quality of different sectorization schemes. A three-dimensional multi-objective optimization model for sectorization was established by using a grid-region-sector hierarchy in order to balance controllers’ workload within sectors and reduce workload differences between sectors. A fitness evaluation operator, a probability-adaptive combination crossover operator and a dynamic mutation operator were incorporated in the NSGA-II algorithm in order to enhance the number of feasible solutions, solution diversity and computational efficiency. A simulation was conducted for the automatic 3D sectorization of Xi'an high-altitude airspace. Results showed that the optimized scheme improved workload balance within sectors by 37% and reduced inter-sector workload by 24% compared with the current sectorization configuration. The proposed improved NSGA-II provided a broader range of options for decision-makers with varying preferences compared with traditional weighted multi-objective optimization algorithms.
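
The non-dominated sorting step that gives NSGA-II its name can be sketched as follows. This is a generic illustration of Pareto ranking with all objectives minimized, not the improved algorithm from the paper. The two objectives in the sample data (within-sector imbalance and inter-sector difference) and their values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Return fronts as lists of indices; the first front is the non-dominated set."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]  # dominated_by[i]: indices that i dominates
    counts = [0] * n                       # counts[i]: how many points dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1
    fronts = [[i for i in range(n) if counts[i] == 0]]
    while fronts[-1]:
        next_front = []
        for i in fronts[-1]:
            for j in dominated_by[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front

# Hypothetical (within-sector imbalance, inter-sector difference) scores for five schemes.
schemes = [(1, 5), (2, 2), (3, 4), (4, 1), (5, 5)]
fronts = non_dominated_sort(schemes)
```

Every scheme in the first front is a defensible trade-off, which is why a Pareto-based method can offer decision-makers a set of options rather than one weighted optimum.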

Review on computational intelligence based on parallel computing
Fei WU,Jiacheng CHEN,Wanliang WANG
Journal of ZheJiang University (Engineering Science)    2025, 59 (1): 27-38.   DOI: 10.3785/j.issn.1008-973X.2025.01.003
Abstract   HTML PDF (760KB) ( 823 )  

Traditional computational intelligence technology lacks real-time capability and adaptability, whereas computational intelligence based on parallel computing improves computational efficiency and addresses the compatible processing of multimodal information. The current state of the integration of computational intelligence with big data parallel computing was reviewed across three branches of computational intelligence: neural networks, evolutionary algorithms, and swarm intelligence algorithms. Problems in parallel computational intelligence were summarized, and some thoughts were given on the development direction of related studies.

Multimodal emotional feature analysis based on short video resources of traffic incidents
Zhentao DONG,Kaimin XU,Qingying WAN,Xiaofei LIU,Hao SHEN,Shuhan LI,Geqi QI
Journal of ZheJiang University (Engineering Science)    2025, 59 (4): 661-668.   DOI: 10.3785/j.issn.1008-973X.2025.04.001
Abstract   HTML PDF (3695KB) ( 767 )  

In order to portray the public emotion orientation caused by the public opinion on traffic incidents disseminated in short videos, a physiological feature graph was constructed by the text sentiment analysis and the multimodal physiological signal feature extraction. This work collected 136 highly-liked videos with 38 805 comments on TikTok. Considering all videos as a document set, with each video treated as a document and comments as words, the latent Dirichlet allocation topic model was adopted to obtain the distribution of comments under different topics and the distribution of topics under different videos. Naive Bayes-based SnowNLP was utilized to calculate the sentiment scores of comments and analyze the sentiment tendencies expressed by different opinion topics. Neuroscience experiments were carried out to collect multimodal physiological signals such as EEG, eye movement, ECG, and respiration as well as emotion ratings. Statistical test results show that videos with different sentiment tendencies induce different emotions, and the multimodal physiological features such as the relative spectral power of EEG, blinking frequency, respiration standard deviation, and the very low-frequency power of ECG are specific under different emotions. The emotional semantics embedded in the comments influence public emotion in various ways beyond that evoked by videos.

Survey of embodied agent in context of foundation model
Songyuan LI,Xiangwei ZHU,Xi LI
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 213-226.   DOI: 10.3785/j.issn.1008-973X.2025.02.001
Abstract   HTML PDF (841KB) ( 697 )  

Foundational models in natural language processing, computer vision and multimodal learning have achieved significant breakthroughs in recent years, showcasing the potential of general artificial intelligence. However, these models still fall short of human or animal intelligence in areas such as causal reasoning and understanding physical commonsense. This is because these models primarily rely on vast amounts of data and computational power, lacking direct interaction with and experiential learning from the real world. Many researchers are beginning to question whether merely scaling up model size is sufficient to address these fundamental issues. This has led the academic community to reevaluate the nature of intelligence, suggesting that intelligence arises not just from enhanced computational capabilities but from interactions with the environment. Embodied intelligence is gaining attention as it emphasizes that intelligent agents learn and adapt through direct interactions with the physical world, exhibiting characteristics closer to biological intelligence. A comprehensive survey of embodied artificial intelligence was provided in the context of foundational models. The underlying technical ideas, benchmarks, and applications of current embodied agents were discussed. A forward-looking analysis of future trends and challenges in embodied AI was offered.

Characteristic of stress concentration distribution in layered rock of tunnel under dynamic and static load
Yumin YANG,Nan JIANG,Yingkang YAO,Chuanbo ZHOU,Xianzhong MENG,Moxi ZHAO
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 319-331.   DOI: 10.3785/j.issn.1008-973X.2025.02.010
Abstract   HTML PDF (3623KB) ( 694 )  

A physical similarity model test was designed for the diversion tunnel project in layered surrounding rock at the San Gavan Hydropower Station. LS-DYNA was used to analyze the propagation and distribution characteristics of stress waves in the layered rock mass, considering the static load, the dynamic load and the dip angle. The sensitivity of the peak stress and secondary equilibrium stress of the surrounding rock to different factors was analyzed by orthogonal tests. A stress prediction model under the influence of multiple factors was established based on dimensional analysis in order to determine the safe load control range of the surrounding rock. Results showed that initial stress concentration existed in the surrounding rock of the tunnel under high ground stress. The dynamic load had a significant impact on the value of the stress concentration. The stress wave front was discontinuously distributed due to the influence of bedding. The dynamic and static loads were positively linearly correlated with the peak stress and secondary equilibrium stress. The peak stress and secondary equilibrium stress showed a '∧'-shaped distribution as the dip angle increased. The sensitivity order of the factors affecting the peak stress and secondary equilibrium stress was dynamic load > static load > dip angle. The static load limit values were 0.731, 0.555, 0.479 and 0.456 MPa respectively, and the dynamic load limit values were 0.624, 0.523, 0.477 and 0.463 MPa respectively when the dip angle was 90°(0°), 75°(15°), 60°(30°) and 45°.

Multi-distortion type underwater image enhancement based on improved CycleGAN
Zhenming LV,Shaojiang DONG,Zongyou XIA,Xiaoyan MOU,Mingquan WANG
Journal of ZheJiang University (Engineering Science)    2025, 59 (6): 1148-1158.   DOI: 10.3785/j.issn.1008-973X.2025.06.006
Abstract   HTML PDF (4587KB) ( 693 )  

A multi-distortion type underwater image enhancement algorithm based on improved CycleGAN was proposed to address underwater image blurring, low contrast and the difficulty of recognizing distorted images caused by factors such as scattering, absorption and color deviation. Firstly, in order to improve the image enhancement effect, an Auto-Encoder+Skip-connection network structure was used in the generator of CycleGAN, and a global color correction structure was added for global enhancement in terms of both pixels and color, so as to better capture the color information in underwater images. Secondly, a multidimensional perceptual discriminator was designed to learn the global and local features of the image. This discriminator paid more attention to the local details of the image, effectively targeted scattering and color noise, perceived the image from a multidimensional space, and had a stronger feature extraction ability, thereby enhancing the accuracy of image discrimination. Finally, the experimental results on the EUVP, UIEB and U45 datasets showed that the proposed method achieved better results than other algorithms. In processing multi-distortion types of underwater images, the algorithm’s SSIM indicator was higher than that of the second-best method by an average of 1.57%, the PSNR indicator was higher by 1.836%, the UIQM indicator was higher by 1.324%, and the UCIQE indicator was higher by 1.086%. The proposed method performed well in processing color and noise details.
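
For readers unfamiliar with the PSNR indicator cited above, a minimal sketch of its standard definition, PSNR = 10·log10(MAX² / MSE), is given here. It operates on plain nested lists of 8-bit pixel values. The function name and the sample images are assumptions for illustration and are unrelated to the paper's evaluation code.

```python
import math

def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    ref = [p for row in reference for p in row]
    out = [p for row in enhanced for p in row]
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 2x2 images: a reference and a slightly perturbed version.
reference = [[52, 60], [70, 81]]
enhanced = [[50, 61], [70, 84]]
score = psnr(reference, enhanced)
```

Higher values indicate that the enhanced image is closer to the reference, so a relative gain in PSNR corresponds to lower mean squared error.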

Effect of segregated pit construction on displacement of adjacent strata and tunnel
Dingwen ZHOU,Lei HAN,Hongwei YING,Chengwei ZHU,Huihui LI
Journal of ZheJiang University (Engineering Science)    2025, 59 (5): 1072-1082.   DOI: 10.3785/j.issn.1008-973X.2025.05.020
Abstract   HTML PDF (1813KB) ( 684 )  

A finite element numerical model of the segregated foundation pit was established based on the case of a deep foundation pit in Hangzhou adjacent to an operating underground shield tunnel in order to analyze the influence of the construction sequence, the separation wall location and other factors on the deformation of deep and large foundation pits and adjacent facilities caused by segregated-pit construction. The reasonableness of the parameters of the HSS model was verified against the measured data. The influence of the construction sequence of the "platform" type segregated pit on the displacements of the strata outside the pit and of existing adjacent tunnels was analyzed by using a simplified model based on the case. Results show that the displacements of strata and tunnels caused by the excavation of the segregated pit in Hangzhou soft soil are related to the construction sequence, the location of the separation wall, the thickness of the soft clay, and the relative position of the tunnel and the pit. When the close sub-pit is constructed first, the deformation of the close pit retaining wall, the surface settlement and the tunnel displacement are greater with a wider far sub-pit. The opposite is observed if the far sub-pit is excavated first, and the optimal control effect on the deformation of the retaining wall and adjacent tunnels is achieved when the ratio of the far sub-pit width to the close sub-pit width is 3.0 to 4.0 and the width of the close sub-pit is 15 m to 20 m. The deformation of the close pit retaining wall, the surface settlement and the tunnel displacement caused by both sub-pit construction sequences increase as the thickness of the soft clay layer increases. The concept of the displacement impact zone resulting from different sub-pit construction sequences was proposed, and the demarcation line of the zone can be simplified as a straight line at an angle of 45° to the wall of the pit. The displacement impact zone, defined as the region in which the strata displacement caused by the close-first-then-far construction sequence is smaller than that caused by the far-first-then-close construction sequence, gradually shrinks as the width of the far sub-pit and the thickness of the soft clay layer increase. A parametric analysis was conducted to propose a formula for fitting the demarcation line of the impact zones in terms of the location of the separation wall and the thickness of the soft soil layer.

Review of data-driven intelligent computation and its application
Rui DAI,Jing JIE,Wanliang WANG,Qianlin YE,Fei WU
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 227-248.   DOI: 10.3785/j.issn.1008-973X.2025.02.002
Abstract   HTML PDF (1476KB) ( 624 )  

State-of-the-art data-driven intelligent computations (DDICs), which can effectively reduce computing costs and improve solutions, were comprehensively reviewed in order to solve the increasingly complex and expensive optimization problems (EOPs) emerging in real-world applications. The latest research achievements of DDICs were outlined from both algorithm and application perspectives. Various technical points in generalized DDICs and adaptive DDICs were summarized and categorized. The challenges and opportunities faced by DDICs in solving EOPs were analyzed. Potential future research trends were proposed, such as conducting deeper theoretical analyses, exploring novel learning paradigms, and applying these methods in various practical fields. The review aims to provide targeted references and directions for researchers, stimulating innovative ideas to more effectively address the complex EOPs encountered in real-world applications.

Channel-weighted multimodal feature fusion for EEG-based fatigue driving detection
Wenxin CHENG,Guanghui YAN,Wenwen CHANG,Baijing WU,Yaning HUANG
Journal of ZheJiang University (Engineering Science)    2025, 59 (9): 1775-1783.   DOI: 10.3785/j.issn.1008-973X.2025.09.001
Abstract   HTML PDF (1789KB) ( 587 )  

A multimodal feature fusion model based on non-smooth non-negative matrix factorization (nsNMF-PCNN-GRU-MSA) was proposed to address the problems of poor generalisation ability, single feature extraction mode and model uninterpretability in the fatigue driving detection methods. This model detected the level of driver fatigue by analyzing electroencephalogram (EEG) signals. A channel weighting module was designed in the shallow layer of the network, and the non-smooth non-negative matrix factorization (nsNMF) algorithm was introduced to compute the contribution of the electrode channels. A multimodal feature fusion module was designed in the middle layer of the network, where the Gramian angular field imaging method was introduced to map the 1D EEG data into a 2D image, and the spatio-temporal features of different modes were fused in parallel with the PCNN-GRU module. The multi-head self-attention (MSA) mechanism was fused in the deep layer of the network to complete the task of fatigue driving state classification. The experimental results showed that the fatigue detection accuracies of the model on the mixed samples of the SEED-VIG and SAD datasets were 93.37% and 90.78%, respectively, and the lowest accuracies for single-subject data were 86.60% and 85.59%, respectively, which were higher than those of the state-of-the-art models. The analysis method of mapping the feature activation values onto the brain topology map not only improves the interpretability of the model, but also provides a new perspective on fatigue driving detection.
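
The Gramian angular field step, which maps a 1D signal to a 2D image, can be sketched as below. The abstract does not say which variant the authors use, so this example assumes the summation form (GASF): the signal is rescaled to [-1, 1], each value is encoded as an angle φ = arccos(x), and the image entry (i, j) is cos(φ_i + φ_j). The function name and the sample signal are hypothetical.

```python
import math

def gramian_angular_summation_field(signal):
    """Map a 1D sequence to a 2D image with entries cos(phi_i + phi_j)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("a constant signal cannot be rescaled to [-1, 1]")
    # Min-max rescaling to [-1, 1]; clamping guards against rounding error.
    scaled = [((x - hi) + (x - lo)) / (hi - lo) for x in signal]
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    phi = [math.acos(s) for s in scaled]  # angular encoding of each sample
    return [[math.cos(a + b) for b in phi] for a in phi]

# Hypothetical three-sample EEG segment (arbitrary units).
image = gramian_angular_summation_field([0.0, 1.0, 2.0])
```

The resulting matrix is symmetric, and its main diagonal carries the rescaled signal values as cos(2φ_i), so temporal order is preserved along the diagonal of the image.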

Oriented ship detection algorithm in SAR image based on improved YOLOv5
Yali XUE,Yiming HE,Shan CUI,Quan OUYANG
Journal of ZheJiang University (Engineering Science)    2025, 59 (2): 261-268.   DOI: 10.3785/j.issn.1008-973X.2025.02.004
Abstract   HTML PDF (2761KB) ( 522 )  

A novel detection algorithm for small ship targets in SAR scenes, ES-YOLOv5 (efficient multi-scale attention (EMA) and small object detection based on YOLOv5), was proposed to address the inconspicuous imaging features and low detection accuracy caused by the arbitrary orientation of small targets in synthetic aperture radar (SAR) imaging. A small target detection layer was added to adjust the receptive field size, making it more suitable for capturing small target scale features and facilitating multi-scale fusion. An EMA mechanism was introduced to focus on key target information and enhance feature representation capability. The circular smooth label (CSL) technique was utilized to adapt to the periodicity of angles, achieving high-precision angle classification. The experimental results demonstrate that the proposed method achieves an average detection accuracy of 90.9% at an intersection over union (IoU) threshold of 0.5 on the RSDD-SAR dataset. The algorithm improves the precision of detecting small SAR ship targets by 6% over the baseline YOLOv5, significantly enhancing the model’s detection performance.
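
The idea behind the circular smooth label can be illustrated with a short sketch: the angle is discretized into bins, and the one-hot target is replaced by a window that wraps around the ends of the angle range, so that 179° and 0° are treated as neighbors. The Gaussian window, the 180 one-degree bins, the window radius and the sigma value below are assumptions for illustration, not settings reported in the abstract.

```python
import math

def circular_smooth_label(angle_deg, num_bins=180, radius=6, sigma=2.0):
    """Soft classification target over angle bins with wrap-around distance."""
    bin_width = 180.0 / num_bins  # assumes angles are defined on [0, 180)
    center = int(round(angle_deg / bin_width)) % num_bins
    label = []
    for k in range(num_bins):
        d = abs(k - center)
        d = min(d, num_bins - d)  # circular distance between bins
        if d <= radius:
            label.append(math.exp(-(d * d) / (2.0 * sigma ** 2)))
        else:
            label.append(0.0)
    return label

# A box oriented at 179 degrees: bins 178 and 0 receive the same weight.
label = circular_smooth_label(179.0)
```

With a plain one-hot target, predicting bin 0 for a 179° box would be penalized as heavily as predicting bin 90, even though the angular error is only one degree.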

Mechanical and electrochemical characteristic of LiFePO4 battery under multi-temperature and electric field condition
Hongru ZHU,Ziqiang CHEN,Ping YI
Journal of ZheJiang University (Engineering Science)    2025, 59 (11): 2300-2308.   DOI: 10.3785/j.issn.1008-973X.2025.11.009
Abstract   HTML PDF (1740KB) ( 483 )  

The mechanical and electrochemical characteristics of the LiFePO4 battery under different temperatures and electric fields were analyzed in order to introduce the in-situ surface expansion force as an additional input variable for the estimation of state of charge (SOC) and thus improve the estimation accuracy. A multi-physics signal acquisition platform was designed and constructed. Open-circuit voltage (OCV) tests, hybrid pulse power characterization (HPPC) tests, and in-situ surface expansion force measurements were conducted at different temperatures. The mechanical and electrochemical characteristics of the battery and its multi-physics responses under various operating conditions were analyzed. Results show that the in-situ surface expansion force first increases, then decreases, and then increases again as SOC rises, and it is more sensitive to SOC than OCV is. The extrema of the expansion force curves are slightly affected by temperature, showing small delays with increasing temperature. They are strongly affected by current, occurring earlier and gradually disappearing as the current increases. The internal resistance decreases significantly with increasing temperature. The OCV curves exhibit high consistency across different temperatures. The experimental results demonstrate that the expansion force signal has potential in SOC estimation and provide a theoretical foundation and data support for SOC estimation methods based on expansion force signals.

Multi-scale parallel magnetic resonance imaging reconstruction based on variational model and Transformer
Jizhong DUAN,Haiyuan LI
Journal of ZheJiang University (Engineering Science)    2025, 59 (9): 1826-1837.   DOI: 10.3785/j.issn.1008-973X.2025.09.006
Abstract   HTML PDF (7078KB) ( 481 )  

A multi-scale parallel MRI reconstruction model based on a variational model and Transformer (VNTM) was proposed, to enhance the quality of reconstructed MR images from undersampled multi-coil MR data. First, undersampled multi-coil k-space data were used to estimate sensitivity maps, with an intermediate-stage enhancement strategy applied to improve the accuracy of these maps. Next, the undersampled multi-coil k-space data and estimated sensitivity maps were input into a variational model for reconstruction. In the variational model, resolution was reduced through a pre-processing module to reduce computational load; multi-scale features were then effectively fused through a multi-scale U-shaped network with the Transformer. Finally, a post-processing module was applied to restore resolution, and data consistency operations were performed on the output to ensure fidelity. Extensive quantitative and qualitative experiments were conducted on publicly available datasets to validate the effectiveness of the proposed method. The experimental results indicate that the proposed reconstruction model achieves superior reconstruction quality and more stable performance in terms of peak signal-to-noise ratio, structural similarity, and visual effects. In addition, a series of ablation studies and robustness evaluations with varying auto-calibration signal (ACS) region sizes were carried out, confirming that VNTM maintained consistently high reconstruction performance under diverse conditions.

LLC resonant three port DC-DC converter and its decoupling control
Ziyu WANG,Jianjiang SHI
Journal of ZheJiang University (Engineering Science)    2025, 59 (6): 1322-1332.   DOI: 10.3785/j.issn.1008-973X.2025.06.023
Abstract   HTML PDF (7603KB) ( 479 )  

An LLC resonant three-port DC-DC converter with an integrated photovoltaic and storage design and its advanced control strategy were proposed to meet the application requirements of the energy manager of solar-powered UAVs. Firstly, time-domain analysis was used to analyze the multiple operating modes of the resonant tank of the three-port converter under different power transmission modes. Phase shift control was used to achieve flexible power control among the three ports. Secondly, polynomial approximation was used to fit the gain surface obtained from time-domain analysis to obtain an accurate mathematical expression for the gain characteristics of the converter. On this basis, a decoupling control strategy was proposed. The design of the decoupling loop could effectively reduce the degree of power coupling between the multiple control loops of the three-port converter and optimize its dynamic performance. Finally, a 500 W experimental prototype was built to verify the steady-state operating characteristics, dynamic mode switching process, and decoupling loop design of the three-port topology. The experimental results verified that the time-domain analysis method could accurately describe the circuit characteristics, and the decoupling loop could effectively reduce the degree of power coupling between control loops and improve the dynamic response performance of the system.

Vehicle multimodal trajectory prediction model based on spatio-temporal graph attention network
Wenqiang CHEN,Dongdan WANG,Wenying ZHU,Yongjie WANG,Tao WANG
Journal of ZheJiang University (Engineering Science)    2025, 59 (3): 443-450.   DOI: 10.3785/j.issn.1008-973X.2025.03.001
Abstract   HTML PDF (1204KB) ( 472 )  

A spatio-temporal graph attention network for vehicle multimodal trajectory prediction (STGAMT) was proposed to address the challenges of predicting manually-driven vehicle trajectories and investigating their impact on autonomous driving decisions. The temporal and spatial characteristics were modeled based on the historical information about the vehicle. A two-dimensional convolutional neural network was employed to identify transverse and longitudinal lane change states, which were then combined with the output from the spatio-temporal dynamic interaction module to form transverse and longitudinal motion characteristics. The Softmax function was used to determine the vehicle’s driving intention. The multi-mode trajectory output was achieved by using a GRU network based on Gaussian conditional distribution. Experimental results showed that, in short-term predictions, the STGAMT model reduced the average error by 63.8% and 41.0% compared to the other five classic models on HighD and NGSIM datasets, respectively. In long-term predictions, the STGAMT model reduced the RMSE by 62.5% and 19.1% compared to the average RMSE of the other five classic models on HighD and NGSIM datasets, respectively. Results indicated that the STGAMT model could effectively improve the accuracy of manually-driven vehicle trajectory prediction.

Usage prediction of shared bike based on multi-channel graph aggregation attention mechanism
Fujian WANG,Zetian ZHANG,Xiqun CHEN,Dianhai WANG
Journal of ZheJiang University (Engineering Science)    2025, 59 (9): 1986-1995.   DOI: 10.3785/j.issn.1008-973X.2025.09.022
Abstract   HTML PDF (2518KB) ( 470 )  

A prediction method based on the multi-channel graph aggregated attention mechanism was proposed, to address the challenges of limited spatial scope, insufficient spatiotemporal information capture, and low accuracy in short-term bike-sharing demand prediction. Firstly, the city was divided into multiple bike-sharing virtual stations using a flow-adjusted virtual station partitioning method according to bike flows in different areas. A dynamic adjacency matrix was constructed using the origin-destination (OD) matrix between stations to form a bike-sharing graph network structure. Next, spatial information of stations across different time periods was captured via a multi-channel graph aggregation module, which was combined with a multi-head self-attention module to capture temporal correlations. Finally, a cross-attention mechanism, along with exogenous variables, was introduced to uncover potential relationships among various variables. Experiments conducted in Shenzhen and New York demonstrated that the model significantly outperformed other deep learning methods across various time periods and regions, maintaining stable and low prediction errors. The results confirmed that the dynamic adjacency matrix and the cross-attention mechanism integrating external features could effectively enhance the prediction accuracy of shared bike usage.

Visual induced motion sickness estimation model based on attention mechanism
Yongqing CAI,Cheng HAN,Wei QUAN,Wudi CHEN
Journal of ZheJiang University (Engineering Science)    2025, 59 (6): 1110-1118.   DOI: 10.3785/j.issn.1008-973X.2025.06.002
Abstract   HTML PDF (1295KB) ( 461 )  

A visual induced motion sickness (VIMS) estimation model based on attention mechanism was proposed to accurately assess the degree of VIMS experienced by users when interacting with virtual products. The model was constructed upon Transformer architecture, incorporating the self-attention mechanism within temporal and spatial sequences to capture the complex interactions between temporal and spatial features. By utilizing the optical flow information and user attention information, two sub-networks of motion flow and attention flow were designed to form a dual-flow network structure. The motion flow sub-network was responsible for capturing the motion features in the visual content, and the attention flow sub-network focused on extracting critical information, such as objects, textures, and other key elements within the user’s attention area. A late fusion strategy was employed to effectively combine the outputs of the dual-flow network. Experimental validation conducted on public video datasets demonstrated that the synergistic interaction between the attention flow sub-network and the Transformer architecture significantly enhanced the model accuracy. The VIMS model achieved optimal results in terms of the F1 score, accuracy and precision with values of 0.8468, 89.19% and 92.28%, respectively, representing a notable advancement over existing approaches.

Multi-goal multi-agent path finding algorithm
Jing ZHANG,Yi WANG,Zilong CHEN,Yunsong LI
Journal of ZheJiang University (Engineering Science)    2025, 59 (8): 1689-1697.   DOI: 10.3785/j.issn.1008-973X.2025.08.016
Abstract   HTML PDF (1143KB) ( 457 )  

A multi-goal multi-agent path planning algorithm was proposed to assign tasks to agents efficiently and to plan paths that are as short as possible without collisions between agents. Traditional path planning algorithms use discrete time, which leads to a low success rate. To address this problem, conflicts between agents in continuous time and the way to resolve them were defined, and the concepts of safety interval and labeling were introduced into the A* algorithm, so that the A* algorithm can plan optimal paths that satisfy continuous-time constraints. Collision detection and conflict avoidance cause a large amount of computation in the multi-agent path planning problem. To address this problem, a conflict hierarchical strategy was proposed to reduce the number of nodes expanded during the solving process. The experimental results show that the proposed algorithm finds better solutions and has better applicability, with lower total path cost and higher success rate in scenarios with densely distributed agents.
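
The notion of a safety interval in continuous time can be illustrated with a small sketch: for one location, the periods during which other agents occupy it are removed from the timeline, and what remains are the intervals in which an agent may be present. This is a generic illustration in the spirit of safe-interval path planning and is not the authors' algorithm. The function names, the half-open interval convention and the sample occupancy times are assumptions.

```python
import math

def safe_intervals(blocked, start=0.0, end=math.inf):
    """Complement of the union of blocked [t0, t1) intervals within [start, end)."""
    result = []
    cursor = start
    for t0, t1 in sorted(blocked):
        if t0 > cursor:
            result.append((cursor, min(t0, end)))
        cursor = max(cursor, t1)  # skip past this blocked period
        if cursor >= end:
            break
    if cursor < end:
        result.append((cursor, end))
    return result

def earliest_entry(intervals, earliest, dwell):
    """Earliest time >= earliest at which an agent can stay for `dwell` inside one safe interval."""
    for lo, hi in intervals:
        t = max(lo, earliest)
        if t + dwell <= hi:
            return t
    return None

# Hypothetical occupancy of one cell by other agents (seconds).
occupied = [(2.0, 3.5), (1.0, 2.5), (6.0, 7.0)]
free = safe_intervals(occupied)
entry = earliest_entry(free, earliest=0.5, dwell=1.0)
```

A search node can then be a pair of location and safe interval rather than a pair of location and discrete time step, which keeps the state space finite even though time is continuous.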

Lightweight YOLOv5s-OCG rail sleeper crack detection algorithm
Chaoqun DONG,Zhan WANG,Ping LIAO,Shuai XIE,Yujie RONG,Jingsong ZHOU
Journal of ZheJiang University (Engineering Science)    2025, 59 (9): 1838-1845.   DOI: 10.3785/j.issn.1008-973X.2025.09.007
Abstract   HTML PDF (1980KB) ( 411 )  

An improved YOLOv5s sleeper crack target detection algorithm was proposed, in response to the safety hazards posed by the increasing number of crack defects in high-speed rail sleepers due to extended service life, as well as the issues of missed and false detections of surface fine cracks in high-speed rail sleepers. In the backbone network of the YOLOv5s algorithm, the full-dimensional dynamic convolution based on the multi-dimensional attention mechanism was used instead of the traditional convolution to enhance the overall feature extraction ability of the network and improve the detection accuracy of fine cracks. An improved lightweight C3 structure was proposed based on the ConvNeXt module and depth-separable convolution to compress the model volume and accelerate the convergence of the network to improve the detection efficiency. The scale-optimized weighted GFPN feature fusion network was used to solve the problem of detail feature loss in the sampling process of small targets at multiple scales. The improved YOLOv5s sleeper crack target detection algorithm could solve the problem of missed detection of fine cracks on the sleeper surface effectively. The experimental results showed that the parameter count of the improved algorithm model was decreased by 19.7%, the accuracy rate, recall rate and mean average precision were increased by 1.8, 2.4 and 4.2 percentage points respectively, and the detection speed was up to 96 frames per second. The results verify that the proposed lightweight YOLOv5s-OCG algorithm model provides an effective solution for the real-time detection of surface cracks on sleepers.
