Most Downloaded Articles


In last 2 years
Code development and verification for weak coupling of seepage-stress based on TOUGH2 and FLAC3D
Xia-lin LIU,Sheng-bin ZHANG,Quan CHEN,Heng SHU,Shang-ge LIU
Journal of ZheJiang University (Engineering Science)    2022, 56 (8): 1485-1494.   DOI: 10.3785/j.issn.1008-973X.2022.08.002
PDF (1589 KB) | Downloads: 713

Traditional and emerging geotechnical engineering problems, such as compressed air energy storage, water interception with compressed air, carbon dioxide sequestration and underground oil and gas storage projects, all involve coupled air-water two-phase flow and stress. To address this engineering reality, an air-water two-phase seepage-stress coupling program that couples TOUGH2 and FLAC3D was developed based on the weak coupling theory of gas-water two-phase seepage and stress in unsaturated soil. The program can simulate realistic air-water two-phase flow and investigate the gas-water interaction during seepage. It accounts for the direct interaction between gas-water two-phase seepage and soil skeleton deformation, captures the changes in porosity, permeability, capillary pressure and the physical and mechanical parameters of the soil, and achieves a more complete gas-water two-phase seepage-stress coupling analysis. Comparisons with a classical drainage test and a model test verified that the program can accurately simulate the gas-water two-phase flow-stress interaction.
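The abstract does not give the quantities exchanged between TOUGH2 and FLAC3D. As a minimal sketch of what one weak (sequential) coupling exchange might pass from the mechanical side back to the flow side, the Python below updates porosity from volumetric strain, assuming incompressible grains, and permeability with a Kozeny-Carman type relation. Both relations and all function names are assumptions for illustration, not the program described above.

```python
def update_porosity(phi0: float, eps_v: float) -> float:
    """Porosity after volumetric strain eps_v (positive = expansion).

    Assumes incompressible grains, so the solid volume is conserved:
    (1 - phi0) * V0 = (1 - phi) * V0 * (1 + eps_v).
    """
    return 1.0 - (1.0 - phi0) / (1.0 + eps_v)


def update_permeability(k0: float, phi0: float, phi: float) -> float:
    """Kozeny-Carman type permeability update (an assumed relation)."""
    return k0 * (phi / phi0) ** 3 * ((1.0 - phi0) / (1.0 - phi)) ** 2


def weak_coupling(phi0: float, k0: float, strains):
    """Sequential (weak) coupling: for each volumetric strain returned by the
    mechanical step, compute the porosity and permeability that the next flow
    step would use. The two solvers themselves are not modelled here."""
    history = []
    for eps_v in strains:
        phi = update_porosity(phi0, eps_v)
        history.append((phi, update_permeability(k0, phi0, phi)))
    return history
```

With this sign convention, compression (negative strain) lowers both porosity and permeability, and expansion raises them.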

Compound fault decoupling diagnosis method based on improved Transformer
Yu-xiang WANG,Zhi-wei ZHONG,Peng-cheng XIA,Yi-xiang HUANG,Cheng-liang LIU
Journal of ZheJiang University (Engineering Science)    2023, 57 (5): 855-864.   DOI: 10.3785/j.issn.1008-973X.2023.05.001
PDF (2584 KB) | Downloads: 631

Most compound fault diagnosis methods treat a compound fault as a new single fault type and ignore the interaction of the single faults within it; as a result, the fault analysis is coarse-grained and poorly interpretable. An improved Transformer-based compound fault decoupling diagnosis method was proposed for industrial environments where compound fault data are very scarce. The diagnosis process consisted of pre-processing, feature extraction and fault decoupling. By introducing the Transformer decoder, the cross-attention mechanism enables each single fault label to adaptively focus on the discriminative region of the extracted feature layer that corresponds to that fault, and to predict an output probability, thereby achieving compound fault decoupling. Compound fault tests were designed to verify the effectiveness of the method against advanced algorithms. The results showed that the proposed method achieved high diagnostic accuracy with a small number of single fault training samples and a very small number of compound fault training samples: the compound fault diagnosis accuracy reached 88.29% when the training set contained only 5 compound fault samples. The new method thus has a significant advantage over the other methods.
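A rough illustration of label-wise cross-attention for decoupling: each single-fault label owns a query vector that attends over the extracted feature vectors, and an independent sigmoid per label lets several single faults be active at once, which is what turns a compound fault into a set of single-fault predictions. The sketch omits learned projections, multiple heads and the rest of the decoder; all query, feature and classifier values are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decouple(label_queries, features, classifiers, threshold=0.5):
    """label_queries: one query vector per single-fault label.
    features: extracted feature vectors (used as both keys and values).
    classifiers: one (weights, bias) pair per label.
    Returns per-label probabilities and the indices of active faults."""
    d = len(features[0])
    probs = []
    for q, (w, b) in zip(label_queries, classifiers):
        # Scaled dot-product attention of this label's query over the features.
        weights = softmax([dot(q, f) / math.sqrt(d) for f in features])
        attended = [sum(a * f[i] for a, f in zip(weights, features)) for i in range(d)]
        # Independent sigmoid per label, so labels do not compete.
        probs.append(1.0 / (1.0 + math.exp(-(dot(w, attended) + b))))
    active = [i for i, p in enumerate(probs) if p >= threshold]
    return probs, active
```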

Surface defect detection algorithm of electronic components based on improved YOLOv5
Yao ZENG,Fa-qin GAO
Journal of ZheJiang University (Engineering Science)    2023, 57 (3): 455-465.   DOI: 10.3785/j.issn.1008-973X.2023.03.003
PDF (1697 KB) | Downloads: 464

To address the poor real-time detection capability of current object detection models in the production environment of electronic components, GhostNet was used to replace the backbone network of YOLOv5. Because the surface defects of electronic components include small objects and objects with large scale variations, a coordinate attention module was added to the YOLOv5 backbone network, which enlarged the receptive field while avoiding a large computational cost. Coordinate information was embedded into the channel attention to improve the object localization of the model. The feature pyramid network (FPN) structure in the YOLOv5 feature fusion module was replaced with a weighted bi-directional feature pyramid network (BiFPN) structure to enhance the fusion of multi-scale weighted features. Experimental results on a self-made dataset of defective electronic components showed that the improved GCB-YOLOv5 model achieved an average accuracy of 93% and an average detection time of 33.2 ms; compared with the original YOLOv5 model, the average accuracy was improved by 15.0% and the average detection time was reduced by 7 ms. The improved model can thus meet both the accuracy and the speed requirements of electronic component surface defect detection.
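In the original BiFPN (EfficientDet), multi-scale inputs are combined by fast normalized fusion, O = Σ w_i·I_i / (ε + Σ w_j), with each learnable weight kept non-negative by a ReLU. Assuming the improved model keeps this rule, which the abstract does not state, a plain-Python sketch over flattened, equally sized feature maps is:

```python
def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """inputs: equally sized feature maps, flattened to lists.
    weights: one learnable scalar per input; ReLU keeps them non-negative."""
    w = [max(0.0, x) for x in weights]
    norm = sum(w) + eps
    n = len(inputs[0])
    # Weighted average of the inputs, element by element.
    return [sum(wi * f[j] for wi, f in zip(w, inputs)) / norm for j in range(n)]
```

A negative weight is clipped to zero, so that input is effectively switched off rather than subtracted.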

New method for news recommendation based on Transformer and knowledge graph
Li-zhou FENG,Yang YANG,You-wei WANG,Gui-jun YANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (1): 133-143.   DOI: 10.3785/j.issn.1008-973X.2023.01.014
PDF (1590 KB) | Downloads: 457

A news recommendation method based on Transformer and knowledge graph was proposed to add auxiliary information and improve prediction accuracy. A self-attention mechanism was used to capture the connections between news words and news entities, combining news semantic information with entity information. An additive attention mechanism was employed to capture the influence of words and entities on the news representation. Considering the time-series characteristics of user preferences for news, a Transformer was introduced to capture the correlations among the news items a user clicked and the change of user interest over time. High-order structural information in the knowledge graph was used to fuse the adjacent entities of the candidate news and enhance the completeness of the information contained in the candidate news embedding vector. Comparison experiments with five typical recommendation methods on two versions of the MIND news dataset show that introducing the attention mechanism, Transformer and knowledge graph improves the performance of the algorithm on news recommendation.
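Additive attention, as commonly used in news encoders, scores each vector h_i with a_i = qᵀ tanh(W h_i + b), normalizes the scores with softmax and returns the weighted sum of the vectors. The sketch below assumes this standard form; W, b and q are hypothetical stand-ins for learned parameters.

```python
import math


def additive_attention(vectors, W, b, q):
    """vectors: word or entity vectors h_i; W (matrix), b, q: parameters.
    Returns the pooled representation and the attention weights."""
    scores = []
    for h in vectors:
        hidden = [math.tanh(sum(W[r][c] * h[c] for c in range(len(h))) + b[r])
                  for r in range(len(W))]
        scores.append(sum(qi * hi for qi, hi in zip(q, hidden)))
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(a * h[i] for a, h in zip(alphas, vectors)) for i in range(dim)]
    return pooled, alphas
```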

Multi-agent pursuit and evasion games based on improved reinforcement learning
Ya-li XUE,Jin-ze YE,Han-yan LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (8): 1479-1486.   DOI: 10.3785/j.issn.1008-973X.2023.08.001
PDF (1158 KB) | Downloads: 423

A multi-agent reinforcement learning algorithm based on prioritized experience replay and a decomposed reward function was proposed for multi-agent pursuit and evasion games. Firstly, the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm was proposed, building on the multi-agent deep deterministic policy gradient (MADDPG) and twin delayed deep deterministic policy gradient (TD3) algorithms. Secondly, because the reward function is nearly sparse in the multi-agent pursuit and evasion problem, prioritized experience replay was proposed to determine the priority of each experience and preferentially sample high-reward experiences. In addition, a decomposed reward function was designed to divide multi-agent rewards into individual rewards and joint rewards so as to maximize both global and local rewards. Finally, a simulation experiment was designed based on DEPER-MATD3. Comparison with other algorithms showed that DEPER-MATD3 solved the over-estimation problem and reduced the time consumption compared with MATD3. With the decomposed reward function, the global mean reward of the pursuers improved, and the pursuers were more likely to catch the evader.
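Prioritized experience replay is usually implemented by sampling transition i with probability P(i) = p_i^α / Σ_k p_k^α and correcting the resulting bias with importance-sampling weights (N·P(i))^(-β), normalized by their maximum. The abstract says priorities favour high-reward experiences; the sketch below leaves the priority definition to the caller (for example |reward| plus a small constant, an assumption) and shows only the sampling step.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """P(i) = p_i^alpha / sum_k p_k^alpha; priorities must be positive."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """(N * P(i))^(-beta), normalized so the largest weight is 1."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    max_w = max(raw)
    return [w / max_w for w in raw]


def sample_batch(buffer, priorities, batch_size, alpha=0.6, rng=random):
    """Draw a batch (with replacement) according to the priorities."""
    probs = sampling_probabilities(priorities, alpha)
    idx = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    return [buffer[i] for i in idx], idx
```

For example, `sampling_probabilities([1, 1, 2], alpha=1.0)` gives `[0.25, 0.25, 0.5]`.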

Calculation and prediction of flue gas residence time from CFB municipal solid waste incinerator
Xiao-qing LIN,Yu-xuan YING,Hong YU,Xiao-dong LI,Jian-hua YAN
Journal of ZheJiang University (Engineering Science)    2022, 56 (8): 1578-1587.   DOI: 10.3785/j.issn.1008-973X.2022.08.012
PDF (1591 KB) | Downloads: 414

Keeping the flue gas in the furnace at no less than 850 ℃ for at least 2 s contributes to stable municipal solid waste (MSW) incineration and to the reduction of secondary pollution. However, the residence time of flue gas in the high-temperature zone is currently difficult to calculate and predict quantitatively, because thermocouples provide only a qualitative evaluation. Taking a typical MSW circulating fluidized bed (CFB) boiler in China as the object, the residence time of flue gas in the high-temperature zone (>850 ℃) was calculated by thermodynamic calculation, the key operating parameters were identified by correlation analysis of practical operating data, and a residence time prediction model was constructed with several machine learning algorithms (backpropagation neural network, recurrent neural network and random forest regression). Results revealed that 10 key operating parameters, e.g. the section temperatures of the furnace and the temperature and pressure of the primary and secondary air, were strongly correlated with and predictive of the high-temperature flue gas residence time. The recurrent neural network model performed best, with a higher fitting degree and accuracy: the mean square error (MSE) was 0.11626, and the average absolute error between predicted and real values was 1.174%. The research enables prediction of flue gas temperature variation in the high-temperature zone, helps optimize MSW incineration, and supports advanced control for reducing pollutant emissions.
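A minimal worked sketch of a residence time calculation, assuming ideal-gas behaviour, near-atmospheric pressure and plug flow through furnace sections: each section above 850 ℃ contributes V_i / Q_i, where the normal-condition flue gas flow is scaled to the section temperature. The section volumes, temperatures and flow are hypothetical, and the paper's thermodynamic calculation may differ.

```python
T_REF_K = 273.15       # normal-condition temperature, K
P_REF_KPA = 101.325    # normal-condition pressure, kPa


def actual_flow(q_normal_m3_s, temp_c, pressure_kpa=P_REF_KPA):
    """Convert a normal-condition volume flow to actual conditions (ideal gas)."""
    return q_normal_m3_s * (temp_c + T_REF_K) / T_REF_K * P_REF_KPA / pressure_kpa


def residence_time_above(sections, q_normal_m3_s, threshold_c=850.0):
    """sections: (volume_m3, mean_temp_c) per furnace section.
    Sums V_i / Q_i over the sections hotter than the threshold."""
    total = 0.0
    for volume, temp in sections:
        if temp > threshold_c:
            total += volume / actual_flow(q_normal_m3_s, temp)
    return total
```

For instance, a 100 m³ section at 1092.6 ℃ (five times 273.15 K) with 10 m³/s of normal-condition gas carries 50 m³/s of actual flow, giving about 2 s.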

Structured image super-resolution network based on improved Transformer
Xin-dong LV,Jiao LI,Zhen-nan DENG,Hao FENG,Xin-tong CUI,Hong-xia DENG
Journal of ZheJiang University (Engineering Science)    2023, 57 (5): 865-874.   DOI: 10.3785/j.issn.1008-973X.2023.05.002
PDF (1744 KB) | Downloads: 412

Most existing structured image super-resolution reconstruction algorithms can only solve the super-resolution problem for one specific type of structured image. A structured image super-resolution network based on an improved Transformer (TransSRNet) was proposed. The network used the self-attention mechanism of the Transformer to mine wide-range global information in spatial sequences. A spatial attention unit was built using the hourglass block structure; it focused on the mapping relationship between the low-resolution space and the high-resolution space in local areas and extracted the structured information in the image mapping process. A channel attention module was used to fuse the features of the self-attention module and the spatial attention module. TransSRNet was evaluated on the highly structured CelebA, Helen, TCGA-ESCA and TCGA-COAD datasets. The evaluation results showed that TransSRNet had better overall performance than the compared super-resolution algorithms. With an upscale factor of 8, the PSNR on the face dataset and the medical image dataset reached 28.726 dB and 26.392 dB respectively, and the SSIM reached 0.844 and 0.881 respectively.
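PSNR, one of the two metrics reported above, is 10·log10(MAX² / MSE) for pixel values with peak value MAX. A plain-Python sketch over flattened images, assuming 8-bit data (MAX = 255):

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """reference, estimate: equally sized lists of pixel values (flattened)."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

An MSE of 1 on 8-bit data gives 20·log10(255), about 48.13 dB.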

Survey on program representation learning
Jun-chi MA,Xiao-xin DI,Zong-tao DUAN,Lei TANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (1): 155-169.   DOI: 10.3785/j.issn.1008-973X.2023.01.016
PDF (1100 KB) | Downloads: 403

Intelligent software development using artificial intelligence technology has become a trend for improving the efficiency of software development, and understanding program semantics is essential to support it. A series of research work on program representation learning has emerged to address this problem. Program representation learning automatically learns useful features from programs and represents them as low-dimensional dense vectors, so that program semantics can be extracted efficiently and applied to downstream tasks. A comprehensive review that categorizes and analyzes existing research on program representation learning was provided. The mainstream models for program representation learning were introduced, including frameworks based on graph structures and on token sequences. The applications of program representation learning in defect detection, defect localization, code completion and other tasks were then described. Common toolsets and benchmarks for program representation learning were summarized, and future challenges for program representation learning were analyzed.
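To make the two mainstream input views concrete, the sketch below uses Python's standard `ast` and `tokenize` modules to turn a snippet into a token sequence and into abstract syntax tree parent-child edges. It shows only the raw program views that representation models consume, not any learned model from the surveyed work.

```python
import ast
import io
import tokenize


def token_sequence(source: str):
    """Token-sequence view: names, operators and literals in source order."""
    keep = {tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type in keep]


def ast_edges(source: str):
    """Graph view: parent-child edges of the AST, labelled by node type."""
    edges = []
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges
```

For `"x = 1\n"`, the token view is `['x', '=', '1']`, while the graph view includes edges such as `('Module', 'Assign')` and `('Assign', 'Name')`.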

Improved method for blockchain Kademlia network based on small world theory
Yue ZHAO,He ZHAO,Haibo TAN,Bin YU,Wangnian YU,Zhiyu MA
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 1-9.   DOI: 10.3785/j.issn.1008-973X.2024.01.001
PDF (1194 KB) | Downloads: 384

An improved method for the blockchain Kademlia network based on small world theory was proposed, aiming at the issue that current research on the blockchain Kademlia network sacrifices security to improve scalability. Following the idea of small world theory, a probability formula for node replacement and expansion was proposed, in which the probability is inversely proportional to the distance between nodes. The numbers of replaced nodes and additional nodes can be flexibly adjusted according to actual conditions. Theoretical analysis and experimental verification demonstrate that the network transformed by this method can reach a stable state. The experimental results showed that the number of transmission layers required to broadcast transaction messages throughout the network was reduced by 15.0% to 30.8%, and the speed of locating nodes was increased. Compared with other optimization algorithms that modify the network structure, the method reduced the number of levels in the network structure and enhanced network security.
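Kademlia measures the distance between node IDs with XOR. Assuming the replacement probability is simply the normalized inverse XOR distance (the abstract gives only the proportionality, not the exact formula), candidate selection could be sketched as follows; the function names are hypothetical.

```python
import random


def xor_distance(a: int, b: int) -> int:
    return a ^ b


def replacement_probabilities(node_id, candidate_ids):
    """Probability of choosing each candidate, proportional to 1 / XOR distance.
    Candidates must differ from node_id (distance 0 is undefined here)."""
    inv = [1.0 / xor_distance(node_id, c) for c in candidate_ids]
    total = sum(inv)
    return [x / total for x in inv]


def pick_contact(node_id, candidate_ids, rng=random):
    """Draw one candidate according to the inverse-distance probabilities."""
    probs = replacement_probabilities(node_id, candidate_ids)
    return rng.choices(candidate_ids, weights=probs, k=1)[0]
```

For node 0 and candidates 1, 2 and 4, the probabilities are 4/7, 2/7 and 1/7, so nearby nodes are favoured while distant ones keep a nonzero chance, which is the small-world idea.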

Improved YOLOv3-based defect detection algorithm for printed circuit board
Bai-cheng BIAN,Tian CHEN,Ru-jun WU,Jun LIU
Journal of ZheJiang University (Engineering Science)    2023, 57 (4): 735-743.   DOI: 10.3785/j.issn.1008-973X.2023.04.011
PDF (1420 KB) | Downloads: 370

An AT-YOLO algorithm based on an improved YOLOv3 was proposed, aiming at the problem that existing deep learning-based defect detection algorithms for printed circuit boards (PCB) cannot meet accuracy and efficiency requirements at the same time. The backbone was replaced with ResNeSt50 to improve feature extraction and reduce the number of parameters. An SPP module was added to integrate features from different receptive fields and enrich feature representation. An improved PANet structure replaced FPN, and SE modules were inserted to enhance the expression of effective feature maps. A set of high-resolution feature maps was added to the input and output to improve sensitivity to small objects, increasing the number of detection scales from three to four. The K-means algorithm was used to regenerate the anchor sizes to improve object detection accuracy. Experimental results on the PCB defect detection dataset showed that AT-YOLO achieved an AP@0.5 of 98.42% with 3.523×10^7 parameters and an average detection speed of 36 frames per second, meeting the requirements of both accuracy and efficiency.
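K-means anchor generation for YOLO is commonly run on box widths and heights with the distance d = 1 - IoU, so large and small boxes are compared fairly. The sketch below follows that common recipe with a deterministic initialization and mean centroids; the exact settings used for AT-YOLO are not given in the abstract, and the example boxes are made up.

```python
def iou_wh(box, anchor):
    """IoU of two boxes given only (width, height), both centred at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (w, h) pairs; minimizing 1 - IoU equals maximizing IoU."""
    # Deterministic initialization: k boxes spread through the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    centroids = [ordered[i * len(ordered) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda c: iou_wh(b, centroids[c]))
            clusters[best].append(b)
        new = []
        for c, members in enumerate(clusters):
            if members:
                new.append((sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid
        if new == centroids:
            break
        centroids = new
    return sorted(centroids, key=lambda a: a[0] * a[1])
```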

Ship detection algorithm in complex backgrounds via multi-head self-attention
Nan-jing YU,Xiao-biao FAN,Tian-min DENG,Guo-tao MAO
Journal of ZheJiang University (Engineering Science)    2022, 56 (12): 2392-2402.   DOI: 10.3785/j.issn.1008-973X.2022.12.008
PDF (1335 KB) | Downloads: 352

A ship object detection algorithm based on a multi-head self-attention (MHSA) mechanism and the YOLO network (MHSA-YOLO) was proposed for inland rivers and ports, whose images feature complex backgrounds, large inter-class scale differences and many small objects. In the feature extraction process, a parallel self-attention residual module (PARM) based on MHSA was designed to weaken the interference of complex background information and strengthen the feature information of ship objects. In the feature fusion process, a simplified two-way feature pyramid was developed to strengthen feature fusion and representation. Experimental results on the Seaships dataset showed that MHSA-YOLO had better learning ability, achieved a mean average precision of 97.59% for object detection, and was more effective than state-of-the-art object detection methods. Experimental results on a self-made dataset showed that MHSA-YOLO generalized well.
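Multi-head self-attention splits each token vector into heads, applies scaled dot-product attention softmax(QKᵀ/√d_h)V within each head, and concatenates the results. For readability the sketch below uses identity Q/K/V projections and no output projection, all of which a real layer would learn; it is not the paper's PARM module.

```python
import math


def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]


def multi_head_self_attention(tokens, num_heads):
    """tokens: n vectors of dimension d, with d divisible by num_heads.
    Identity projections stand in for the learned W_q, W_k, W_v, W_o."""
    n, d = len(tokens), len(tokens[0])
    if d % num_heads:
        raise ValueError("dimension must be divisible by num_heads")
    dh = d // num_heads
    out = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        sub = [t[h * dh:(h + 1) * dh] for t in tokens]  # this head's slice
        for i in range(n):
            scores = [sum(a * b for a, b in zip(sub[i], sub[j])) / math.sqrt(dh)
                      for j in range(n)]
            weights = softmax(scores)
            for c in range(dh):
                out[i][h * dh + c] = sum(w * sub[j][c] for j, w in enumerate(weights))
    return out
```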

Bearing life prediction based on multi-scale features and attention mechanism
Ren-peng MO,Xiao-sheng SI,Tian-mei LI,Xu ZHU
Journal of ZheJiang University (Engineering Science)    2022, 56 (7): 1447-1456.   DOI: 10.3785/j.issn.1008-973X.2022.07.020
PDF (1709 KB) | Downloads: 351

A bearing remaining useful life (RUL) prediction method based on multi-scale features and an attention mechanism was proposed, aiming at the problem that previous RUL prediction methods mine bearing degradation information insufficiently and ignore the differences in the contributions of different features, which affects prediction accuracy. Several time-domain and frequency-domain features of the original bearing vibration signal were calculated at multiple scales as the input feature set. The multi-scale feature set was fed into the network, and an attention module adaptively assigned weights to the different features. A convolutional neural network (CNN) module was then used for deep feature extraction and multi-scale feature fusion, and the RUL prediction was obtained through a feedforward neural network (FNN) mapping module. Comparative studies on public bearing datasets showed that the proposed method achieved superior prediction performance.
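The abstract does not list the exact features or scales. As an assumed example, the sketch below coarse-grains a vibration signal by averaging non-overlapping windows and computes a few common time-domain features at each scale; frequency-domain features and the attention, CNN and FNN stages are omitted.

```python
import math


def coarse_grain(signal, scale):
    """Average non-overlapping windows of length `scale`."""
    n = len(signal) // scale
    return [sum(signal[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def time_features(x):
    """RMS, kurtosis (fourth moment over squared variance) and peak."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / n
    kurtosis = (sum((v - mean) ** 4 for v in x) / n) / var ** 2 if var > 0 else 0.0
    peak = max(abs(v) for v in x)
    return {"rms": rms, "kurtosis": kurtosis, "peak": peak}


def multiscale_features(signal, scales=(1, 2, 4)):
    """Feature dictionary for each coarse-graining scale."""
    return {s: time_features(coarse_grain(signal, s)) for s in scales}
```

Coarse-graining acts as a low-pass view: an alternating ±1 signal has RMS 1 at scale 1 but averages to zero at scale 2.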

Driver fatigue state detection method based on multi-feature fusion
Hao-jie FANG,Hong-zhao DONG,Shao-xuan LIN,Jian-yu LUO,Yong FANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1287-1296.   DOI: 10.3785/j.issn.1008-973X.2023.07.003
PDF (1481 KB) | Downloads: 350

To address the problem that existing fatigue state detection methods cannot be applied to drivers wearing masks under epidemic prevention and control, an improved YOLOv5 object detection algorithm was used to detect the driver's facial regions, and a multi-feature fusion fatigue state detection method was established. According to the characteristics of bus driving, labelled image data were built covering drivers both with and without masks. The detection accuracy for the eye, mouth and face regions was improved by increasing the number of feature sampling operations in the YOLOv5 model. The BiFPN network structure was used to retain multi-scale feature information, which makes the prediction network more sensitive to targets of different sizes and improves the detection ability of the overall model. A parameter compensation mechanism combined with a face keypoint algorithm was proposed to improve the accuracy of blink and yawn frame counts. Multiple fatigue parameters were fused and normalized for fatigue classification. Results on the public NTHU dataset and a self-made dataset show that the proposed method can recognize blinks and yawns of drivers with and without masks and accurately judge the driver's fatigue state.
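The abstract does not define its fatigue parameters. A common keypoint-based choice, assumed here, is the eye aspect ratio (EAR) computed from six eye landmarks, with PERCLOS (the fraction of frames with eyes closed) and blink counting built on top. The 0.2 threshold and the 2-frame minimum are illustrative values, not the paper's.

```python
import math


def eye_aspect_ratio(p):
    """p: six (x, y) eye landmarks p1..p6.
    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)."""
    d = math.dist
    return (d(p[1], p[5]) + d(p[2], p[4])) / (2.0 * d(p[0], p[3]))


def perclos(ear_series, closed_threshold=0.2):
    """Fraction of frames in which the eye is considered closed."""
    closed = sum(1 for e in ear_series if e < closed_threshold)
    return closed / len(ear_series)


def count_blinks(ear_series, closed_threshold=0.2, min_frames=2):
    """Count runs of at least `min_frames` consecutive closed-eye frames."""
    blinks, run = 0, 0
    for e in ear_series:
        if e < closed_threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks
```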

Fire detection algorithm based on improved GhostNet-FCOS
Rong ZHANG,Wei ZHANG
Journal of ZheJiang University (Engineering Science)    2022, 56 (10): 1891-1899.   DOI: 10.3785/j.issn.1008-973X.2022.10.001
PDF (2209 KB) | Downloads: 348

A fire detection algorithm based on an improved GhostNet-FCOS was proposed in view of the low detection accuracy and high complexity of existing fire detection algorithms. The algorithm was built on FCOS with reduced channel dimensions, and GhostNet was selected as the feature extraction network to obtain a lightweight fire detection algorithm. Dynamic convolution was introduced to optimize the basic modules of the backbone without increasing width or depth, improving the ability to extract features of variable flames. A spatial attention module was introduced into the backbone network to optimize the expression of spatial features. The definition of positive and negative samples and the regression loss function were improved to optimize the network's attention to different areas within the ground truth box during training. Experimental results on a self-built fire dataset and a public dataset show that the algorithm has advantages in both detection accuracy and model complexity. On the self-built fire dataset, the detection accuracy was 90.9%, the number of parameters was 4.58×10^6, and the number of floating point operations was 31.45×10^9.
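Much of GhostNet's lightness comes from its Ghost module: a primary convolution produces only c_out/s intrinsic feature maps, and cheap depthwise operations generate the remaining ghost maps. The parameter arithmetic below (biases ignored, ratio s = 2 and a 3×3 cheap kernel assumed) shows why this roughly halves the weights of a standard convolution.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Ghost module: a primary k x k conv yields c_out // s intrinsic maps;
    (s - 1) cheap d x d depthwise ops per intrinsic map yield the rest."""
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k
    cheap = (s - 1) * intrinsic * d * d
    return primary + cheap
```

For a 3×3 layer mapping 64 to 128 channels, `conv_params(64, 128, 3)` is 73728 while `ghost_params(64, 128, 3)` is 36864 + 576 = 37440.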

Adaptive salp swarm algorithm for solving flexible job shop scheduling problem with transportation time
Hao-yi NIU,Wei-min WU,Ting-qi ZHANG,Wei SHEN,Tao ZHANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1267-1277.   DOI: 10.3785/j.issn.1008-973X.2023.07.001
PDF (1024 KB) | Downloads: 344

An adaptive salp swarm algorithm with the objective of minimizing the makespan was proposed to solve the flexible job shop scheduling problem with transportation time. A three-layer coding scheme based on random keys was designed to make the discrete solution space continuous. An inertia weight was introduced to evaluate the influence among followers, enhancing the global exploration and local search performance of the algorithm. An adaptive leader-follower population update strategy was proposed, in which the numbers of leaders and followers were adjusted according to the population status. Tabu search was combined with neighborhood search to prevent the algorithm from falling into local optima. Benchmark instances verified the effectiveness and superiority of the proposed algorithm. The influence of the number of automated guided vehicles (AGVs) on the makespan follows the law of diminishing marginal returns.
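The abstract mentions a three-layer random-key encoding but does not describe the layers. Assuming one layer is an operation-sequence key vector, a common random-key decoding sorts the continuous keys and maps each position to a job, with the k-th occurrence of a job denoting its k-th operation. The machine and AGV layers are not shown.

```python
def decode_operation_sequence(keys, ops_per_job):
    """keys: one continuous value per operation; ops_per_job: e.g. [2, 3].
    Returns (job, operation_index) pairs in processing order."""
    # Position i of the key vector belongs to job_list[i].
    job_list = [job for job, count in enumerate(ops_per_job) for _ in range(count)]
    if len(keys) != len(job_list):
        raise ValueError("need exactly one key per operation")
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # ascending keys
    next_op = [0] * len(ops_per_job)
    sequence = []
    for i in order:
        job = job_list[i]
        sequence.append((job, next_op[job]))
        next_op[job] += 1
    return sequence
```

For example, `decode_operation_sequence([0.7, 0.1, 0.5, 0.3], [2, 2])` yields `[(0, 0), (1, 0), (1, 1), (0, 1)]`. Any real-valued key vector decodes to a feasible order, which is what lets a continuous swarm search a discrete space.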

SQL generation from natural language queries with complex calculations on financial data
Jia-hao HE,Xi-ping LIU,Qing SHU,Chang-xuan WAN,De-xi LIU,Guo-qiong LIAO
Journal of ZheJiang University (Engineering Science)    2023, 57 (2): 277-286.   DOI: 10.3785/j.issn.1008-973X.2023.02.008
PDF (739 KB) | Downloads: 329

The problem of generating structured query language (SQL) from natural language queries (Text-to-SQL) in the financial domain was investigated. First, SOFT, a Text-to-SQL dataset for the financial domain, was constructed. The dataset covers common financial queries with distinctive features and presents new challenges to Text-to-SQL research. Then FinSQL, a Text-to-SQL model with optimized support for complex financial queries, was proposed. In particular, based on an analysis of the characteristics of row calculation queries, a class of queries involving complex numerical calculations, a divide-and-conquer method was proposed. A row calculation query was divided into several subqueries, an SQL statement was generated for each subquery, and these statements were finally combined into the SQL statement for the original query. Experimental results on the SOFT dataset show that FinSQL outperforms existing methods on hard queries and performs well on row calculation queries.
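A hypothetical sketch of the final combination step: two scalar subqueries, each answering one part of a row calculation query, are joined by an arithmetic operator into one SQL statement and run with Python's built-in `sqlite3`. The table, columns and figures are invented for illustration, and generating each subquery (the model's job) is not shown.

```python
import sqlite3


def combine_subqueries(subqueries, operator):
    """Join scalar subqueries with one arithmetic operator."""
    if operator not in {"+", "-", "*", "/"}:
        raise ValueError("unsupported operator")
    return "SELECT " + f" {operator} ".join(f"({q})" for q in subqueries)


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income (company TEXT, year INTEGER, net_profit REAL)")
conn.executemany("INSERT INTO income VALUES (?, ?, ?)",
                 [("A", 2022, 120.0), ("B", 2022, 80.0)])

# "How much higher was A's 2022 net profit than B's?" split into two subqueries.
sub_a = "SELECT net_profit FROM income WHERE company = 'A' AND year = 2022"
sub_b = "SELECT net_profit FROM income WHERE company = 'B' AND year = 2022"
sql = combine_subqueries([sub_a, sub_b], "-")
difference = conn.execute(sql).fetchone()[0]
```

Here `difference` evaluates to 40.0.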

Bearing performance of integrated cutter holder structure suitable for robot cutter change
Yi-min XIA,Yu-hang LANG,Zhi-yong JI,Yong REN
Journal of ZheJiang University (Engineering Science)    2023, 57 (2): 392-403.   DOI: 10.3785/j.issn.1008-973X.2023.02.018
PDF (1536 KB) | Downloads: 319

The loading state of the integrated cutter holder system was analyzed by combining numerical simulation with experiments, in order to improve its bearing performance. Considering the automatic cutter assembly process, the linkage between the structural parameters of the integrated cutter holder system and the influence of each parameter on its bearing performance were studied. The structural parameters that significantly affect bearing performance were optimized using the weight matrix method of orthogonal test analysis. Results showed that the test factors, in descending order of influence on the bearing performance, were the neck fillet radius of the rotating block, the width of the rotating block, and the vertical distance between the cutter shaft and the rotating block shaft. The optimal scheme for comprehensive performance was: rotating block width 107.5 mm, rotating block neck fillet radius 60.0 mm, and vertical distance between cutter shaft and rotating block shaft 97.5 mm. Compared with the original scheme, the overall maximum deformation was reduced by 11.31%, the maximum stress of the end cap by 34.07%, and the maximum stress of the rotating block by 41.01%.
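The weight matrix method itself is not reproduced here. As a simpler, related illustration, the sketch below performs range analysis on a standard L9(3^4) orthogonal array: for each factor it averages the response at each level (K) and ranks the factors by the range R = max K - min K. The responses in the test are synthetic, not the paper's data.

```python
# Standard L9(3^4) orthogonal array, first three columns, levels coded 0..2.
L9 = [
    (0, 0, 0), (0, 1, 1), (0, 2, 2),
    (1, 0, 1), (1, 1, 2), (1, 2, 0),
    (2, 0, 2), (2, 1, 0), (2, 2, 1),
]


def range_analysis(design, responses, levels=3):
    """K[f][l]: mean response at level l of factor f; R[f]: max K - min K."""
    K, R = [], []
    for f in range(len(design[0])):
        means = []
        for level in range(levels):
            vals = [y for row, y in zip(design, responses) if row[f] == level]
            means.append(sum(vals) / len(vals))
        K.append(means)
        R.append(max(means) - min(means))
    return K, R


def rank_factors(R, names):
    """Factor names in descending order of range (influence)."""
    return [names[i] for i in sorted(range(len(R)), key=lambda i: R[i], reverse=True)]
```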

Task allocation method for Internet of vehicles spatial crowdsourcing with privacy protection
Xue-jiao LIU,Hui-min WANG,Ying-jie XIA,Si-wei ZHAO
Journal of ZheJiang University (Engineering Science)    2022, 56 (7): 1267-1275.   DOI: 10.3785/j.issn.1008-973X.2022.07.001
PDF (1158 KB) | Downloads: 317

A privacy-preserving task allocation method for Internet of vehicles spatial crowdsourcing under a blockchain architecture was proposed to solve the problem that the centralized spatial crowdsourcing server in traditional Internet of vehicles spatial crowdsourcing is untrusted and vulnerable to malicious attacks, which poses a great threat to users' privacy. A distributed and trusted Internet of vehicles spatial crowdsourcing system was designed based on blockchain technology. A multi-key homomorphic encryption algorithm was adopted for task distribution, supporting task allocation over location ciphertext data from different vehicle users (encrypted under different keys) and thereby reducing the possibility of privacy disclosure for vehicle users. Experimental results show that the proposed method effectively protects users' private information, reduces the computing overhead of task allocation by 34.3% compared with existing methods, and improves task allocation efficiency.
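The multi-key homomorphic scheme is not specified in the abstract. To show only the underlying idea of computing on encrypted locations, the sketch below uses single-key Paillier encryption, which is additively homomorphic, with deliberately tiny primes (insecure, illustration only) to evaluate a squared distance without decrypting the worker's coordinates. Treating the task location as plaintext is an assumption of this sketch.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (insecure; illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (n, lam, mu)


def encrypt(n, m, rng=random):
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(sk, c):
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    m = ((u - 1) // n) * mu % n
    return m if m <= n // 2 else m - n  # map back to signed integers


def add(n, c1, c2):
    """E(a) * E(b) = E(a + b)."""
    return (c1 * c2) % (n * n)


def scale(n, c, k):
    """E(m)^k = E(k * m); k is reduced mod n so negative k works."""
    return pow(c, k % n, n * n)


def encrypted_sq_distance(n, enc_norm, enc_x, enc_y, task):
    """Worker sends E(x^2 + y^2), E(x), E(y); computes E((x-a)^2 + (y-b)^2)."""
    a, b = task
    c = add(n, enc_norm, scale(n, enc_x, -2 * a))
    c = add(n, c, scale(n, enc_y, -2 * b))
    return add(n, c, encrypt(n, a * a + b * b))
```

A worker at (3, 4) and a task at (1, 1) give an encrypted value that decrypts to 13.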

Optimization method of CNC milling parameters based on deep reinforcement learning
Qi-lin DENG,Juan LU,Yong-hui CHEN,Jian FENG,Xiao-ping LIAO,Jun-yan MA
Journal of ZheJiang University (Engineering Science)    2022, 56 (11): 2145-2155.   DOI: 10.3785/j.issn.1008-973X.2022.11.005
PDF (1928 KB) | Downloads: 311

A deep reinforcement learning-based optimization method for CNC milling parameters was proposed to improve machine tool effectiveness and machining efficiency in CNC machining, and the applicability of deep reinforcement learning to machining parameter optimization was explored. The combined cutting force and the material removal rate were selected as the optimization objectives for effectiveness and efficiency, respectively. The function relating the combined cutting force to the milling parameters was constructed using a back propagation neural network optimized by a genetic algorithm (GA-BPNN), and the material removal rate function was established using empirical formulas. The dueling network architecture (Dueling DQN) algorithm was applied to obtain the Pareto front of the multi-objective optimization of combined cutting force and material removal rate, and the decision solution was selected from the Pareto front by combining the technique for order preference by similarity to ideal solution (TOPSIS) with the entropy weight method. The effectiveness of Dueling DQN for machining parameter optimization was verified by milling tests on 45 steel. Compared with empirically selected machining parameters, the solution obtained by Dueling DQN optimization reduced the combined cutting force by 8.29% and improved machining efficiency by 4.95%, providing guidance for the multi-objective optimization and selection of machining parameters.
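The last step of the pipeline, picking one solution from the Pareto front, can be sketched with the entropy weight method followed by TOPSIS. The version below uses min-max normalization for the entropy weights and vector normalization for TOPSIS, both common choices but not necessarily the paper's; the candidate solutions in the test are made up, and at least two candidates are assumed.

```python
import math


def entropy_weights(matrix, benefit):
    """matrix: candidates x criteria; benefit[j] is True if larger is better."""
    m, k = len(matrix), len(matrix[0])
    raw = []
    for j in range(k):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        if hi == lo:  # a constant criterion carries no information
            raw.append(0.0)
            continue
        norm = [(v - lo) / (hi - lo) if benefit[j] else (hi - v) / (hi - lo) for v in col]
        total = sum(norm)
        p = [v / total for v in norm]
        e = -sum(x * math.log(x) for x in p if x > 0) / math.log(m)
        raw.append(1.0 - e)  # degree of diversification
    s = sum(raw)
    return [w / s for w in raw] if s > 0 else [1.0 / k] * k


def topsis(matrix, weights, benefit):
    """Closeness of each candidate to the ideal solution (1 = ideal)."""
    k = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(k)]
    v = [[weights[j] * row[j] / norms[j] for j in range(k)] for row in matrix]
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_plus, d_minus = math.dist(row, best), math.dist(row, worst)
        scores.append(d_minus / (d_plus + d_minus) if d_plus + d_minus > 0 else 0.0)
    return scores
```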

Multimodal image retrieval model based on semantic-enhanced feature fusion
Fan YANG,Bo NING,Huai-qing LI,Xin ZHOU,Guan-yu LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (2): 252-258.   DOI: 10.3785/j.issn.1008-973X.2023.02.005
PDF (928 KB) | Downloads: 308

A multimodal image retrieval model based on semantic-enhanced feature fusion (SEFM) was proposed to establish the correlation between text features and image features in multimodal image retrieval tasks. During feature fusion, semantic enhancement was applied to the combined features by two proposed modules: a text semantic enhancement module and an image semantic enhancement module. First, to enhance text semantics, a multimodal dual attention mechanism was established in the text semantic enhancement module to associate text and image across modalities. Second, to enhance image semantics, a retain intensity and an update intensity were introduced in the image semantic enhancement module to control how much of the query image features is retained and updated in the combined features. With these two modules, the combined features are optimized to lie closer to the target image features. SEFM was evaluated on the MIT-States and Fashion IQ datasets, and experimental results show that the proposed model outperforms existing work on recall and precision metrics.
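Recall@K, a retrieval metric commonly reported on MIT-States and Fashion IQ, counts a query as a hit when its target image is among the K gallery images most similar to the fused query feature. A minimal sketch with cosine similarity; the embeddings in the test are hypothetical.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def recall_at_k(queries, gallery, targets, k):
    """queries: fused query embeddings; gallery: candidate image embeddings;
    targets[i]: gallery index of the correct image for query i."""
    hits = 0
    for q, t in zip(queries, targets):
        ranked = sorted(range(len(gallery)), key=lambda g: cosine(q, gallery[g]), reverse=True)
        if t in ranked[:k]:
            hits += 1
    return hits / len(queries)
```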
