Most Downloaded Articles


In last 2 years
Survey of deep learning based EEG data analysis technology
Bo ZHONG,Pengfei WANG,Yiqiao WANG,Xiaoling WANG
Journal of ZheJiang University (Engineering Science)    2024, 58 (5): 879-890.   DOI: 10.3785/j.issn.1008-973X.2024.05.001
Abstract   HTML PDF (690KB) ( 858 )  

A thorough analysis and cross-comparison of recent relevant works was provided, outlining a closed-loop process for EEG data analysis based on deep learning. EEG data were introduced, and the application of deep learning in three key stages (preprocessing, feature extraction, and model generalization) was described. The research ideas and solutions provided by deep learning algorithms in each stage were delineated, together with the challenges and issues encountered at each stage. The main contributions and limitations of different algorithms were comprehensively summarized. The challenges faced and future directions of deep learning technology in handling EEG data at each stage were discussed.

Compound fault decoupling diagnosis method based on improved Transformer
Yu-xiang WANG,Zhi-wei ZHONG,Peng-cheng XIA,Yi-xiang HUANG,Cheng-liang LIU
Journal of ZheJiang University (Engineering Science)    2023, 57 (5): 855-864.   DOI: 10.3785/j.issn.1008-973X.2023.05.001
Abstract   HTML PDF (2584KB) ( 741 )  

Most compound fault diagnosis methods regard a compound fault as a new single fault type, ignoring the interaction of its internal single faults, so the fault analysis is coarse in granularity and poor in interpretability. An improved Transformer-based compound fault decoupling diagnosis method was proposed for industrial environments with very little compound fault data. The diagnosis process included pre-processing, feature extraction and fault decoupling. By introducing the decoder of the Transformer, the cross-attention mechanism enabled each single fault label to adaptively focus, in the extracted feature layer, on the discriminative feature region corresponding to that fault, and the output probability was predicted to achieve compound fault decoupling. Compound fault tests were designed to verify the effectiveness of the method in comparison with advanced algorithms. The results showed that the proposed method had high diagnostic accuracy with a small number of single fault training samples and a very small number of compound fault training samples. The compound fault diagnosis accuracy reached 88.29% when the training set contained only 5 compound fault samples. Thus the new method has a significant advantage over other methods.
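
The cross-attention step described in this abstract can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention in which each single-fault label acts as a query over the extracted feature positions; the toy vectors and the function names are hypothetical, and the paper's actual decoder (learned projections, multiple heads, stacked layers) is not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(label_queries, features):
    """Each label query attends over the feature vectors (used as keys and values)."""
    dim = len(features[0])
    outputs, weights = [], []
    for query in label_queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        w = softmax(scores)
        attended = [sum(wj * f[d] for wj, f in zip(w, features)) for d in range(dim)]
        outputs.append(attended)
        weights.append(w)
    return outputs, weights


# Hypothetical toy data: two fault-label queries, three feature positions.
label_queries = [[1.0, 0.0], [0.0, 1.0]]
features = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
outputs, weights = cross_attention(label_queries, features)
```

In this toy case the first label places most of its attention on the first feature position and the second label on the second, which is the kind of label-specific focus the abstract refers to.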

Multi-agent pursuit and evasion games based on improved reinforcement learning
Ya-li XUE,Jin-ze YE,Han-yan LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (8): 1479-1486.   DOI: 10.3785/j.issn.1008-973X.2023.08.001
Abstract   HTML PDF (1158KB) ( 701 )  

A multi-agent reinforcement learning algorithm based on priority experience replay and a decomposed reward function was proposed for multi-agent pursuit and evasion games. Firstly, the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm was proposed based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm and the twin delayed deep deterministic policy gradient (TD3) algorithm. Secondly, priority experience replay was proposed to determine the priority of experience and sample the experience with high reward, aiming at the problem that the reward function is almost sparse in the multi-agent pursuit and evasion problem. In addition, a decomposed reward function was designed to divide multi-agent rewards into individual rewards and joint rewards to maximize the global and local rewards. Finally, a simulation experiment was designed based on DEPER-MATD3. Comparison with other algorithms showed that the DEPER-MATD3 algorithm solved the over-estimation problem, and the time consumption was improved compared with the MATD3 algorithm. In the decomposed reward function environment, the global mean rewards of the pursuers were improved, and the pursuers had a greater probability of catching the evader.
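
As a rough illustration of the two ideas in this abstract, the sketch below samples experiences with probability proportional to a priority and combines an individual and a joint reward. The exponent `alpha`, the mixing `weight`, and all function names are assumptions made for illustration; the abstract does not give the paper's exact priority rule or reward decomposition.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """Turn non-negative priorities into sampling probabilities (alpha is assumed)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_batch(buffer, priorities, batch_size, rng, alpha=0.6):
    """Draw a batch with replacement, favouring high-priority experiences."""
    probs = sampling_probabilities(priorities, alpha)
    return rng.choices(buffer, weights=probs, k=batch_size)


def decomposed_reward(individual, joint, weight=0.5):
    """Blend an agent's individual reward with the team's joint reward (weight is assumed)."""
    return weight * individual + (1.0 - weight) * joint


# Hypothetical buffer: priorities derived from reward magnitude plus a small constant.
buffer = ["exp_a", "exp_b", "exp_c"]
priorities = [0.01, 1.0, 4.0]
probs = sampling_probabilities(priorities)
batch = sample_batch(buffer, priorities, batch_size=2, rng=random.Random(0))
reward = decomposed_reward(individual=1.0, joint=3.0)
```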

Surface defect detection algorithm of electronic components based on improved YOLOv5
Yao ZENG,Fa-qin GAO
Journal of ZheJiang University (Engineering Science)    2023, 57 (3): 455-465.   DOI: 10.3785/j.issn.1008-973X.2023.03.003
Abstract   HTML PDF (1697KB) ( 642 )  

To address the poor real-time detection capability of current object detection models in the production environment of electronic components, GhostNet was used to replace the backbone network of YOLOv5. To handle the small objects and objects with large scale changes among the surface defects of electronic components, a coordinate attention module was added to the YOLOv5 backbone network, which enlarged the receptive field while avoiding the consumption of large computational resources. The coordinate information was embedded into the channel attention to improve the object localization of the model. The feature pyramid network (FPN) structure in the YOLOv5 feature fusion module was replaced with a weighted bi-directional feature pyramid network structure to enhance the fusion capability of multi-scale weighted features. Experimental results on the self-made defective electronic component dataset showed that the improved GCB-YOLOv5 model achieved an average accuracy of 93% and an average detection time of 33.2 ms, which improved the average accuracy by 15.0% and reduced the average detection time by 7 ms compared with the original YOLOv5 model. The improved model can meet the requirements of both accuracy and speed for electronic component surface defect detection.

New method for news recommendation based on Transformer and knowledge graph
Li-zhou FENG,Yang YANG,You-wei WANG,Gui-jun YANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (1): 133-143.   DOI: 10.3785/j.issn.1008-973X.2023.01.014
Abstract   HTML PDF (1590KB) ( 589 )  

A news recommendation method based on Transformer and knowledge graph was proposed to increase the auxiliary information and improve the prediction accuracy. The self-attention mechanism was used to obtain the connection between news words and news entities in order to combine news semantic information and entity information. The additive attention mechanism was employed to capture the influence of words and entities on news representation. Transformer was introduced to pick up the correlation information among the news clicked by a user and to capture the change of user interest over time by considering the time-series characteristics of user preference for news. High-order structural information in knowledge graphs was used to fuse adjacent entities of the candidate news and enhance the integrity of the information contained in the candidate news embedding vector. Comparison experiments with five typical recommendation methods on two versions of the MIND news dataset show that the introduction of the attention mechanism, Transformer and knowledge graph can improve the performance of the algorithm on news recommendation.

Survey on program representation learning
Jun-chi MA,Xiao-xin DI,Zong-tao DUAN,Lei TANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (1): 155-169.   DOI: 10.3785/j.issn.1008-973X.2023.01.016
Abstract   HTML PDF (1100KB) ( 535 )  

There has been a trend of intelligent development using artificial intelligence technology in order to improve the efficiency of software development. It is important to understand program semantics to support intelligent development. A series of research works on program representation learning has emerged to solve the problem. Program representation learning can automatically learn useful features from programs and represent the features as low-dimensional dense vectors in order to efficiently extract program semantics and apply them to corresponding downstream tasks. A comprehensive review categorizing and analyzing existing research on program representation learning was provided. The mainstream models for program representation learning were introduced, including the frameworks based on graph structure and token sequence. Then the applications of program representation learning technology in defect detection, defect localization, code completion and other tasks were described. The common toolsets and benchmarks for program representation learning were summarized. The future challenges for program representation learning were analyzed.

Driver fatigue state detection method based on multi-feature fusion
Hao-jie FANG,Hong-zhao DONG,Shao-xuan LIN,Jian-yu LUO,Yong FANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1287-1296.   DOI: 10.3785/j.issn.1008-973X.2023.07.003
Abstract   HTML PDF (1481KB) ( 531 )  

The improved YOLOv5 object detection algorithm was used to detect the facial region of the driver, and a multi-feature fusion fatigue state detection method was established, aiming at the problem that existing fatigue state detection methods cannot be applied to drivers under epidemic prevention and control. The image tag data, covering both drivers wearing a mask and drivers without a mask, were established according to the characteristics of bus driving. The detection accuracy of eyes, mouth and face regions was improved by increasing the feature sampling times of the YOLOv5 model. The BiFPN network structure was used to retain multi-scale feature information, which makes the prediction network more sensitive to targets of different sizes and improves the detection ability of the overall model. A parameter compensation mechanism was proposed in combination with a face keypoint algorithm in order to improve the accuracy of the blink and yawn frame counts. A variety of fatigue parameters were fused and normalized to conduct fatigue classification. Results on the public NTHU dataset and the self-made dataset show that the proposed method can recognize the blinks and yawns of drivers both with and without masks, and can accurately judge the fatigue state of drivers.

Improved method for blockchain Kademlia network based on small world theory
Yue ZHAO,He ZHAO,Haibo TAN,Bin YU,Wangnian YU,Zhiyu MA
Journal of ZheJiang University (Engineering Science)    2024, 58 (1): 1-9.   DOI: 10.3785/j.issn.1008-973X.2024.01.001
Abstract   HTML PDF (1194KB) ( 493 )  

An improved method for the blockchain Kademlia network based on small world theory was proposed aiming at the issue of sacrificing security to improve scalability in the current research of the blockchain Kademlia network. The idea of the small world theory was followed, and a probability formula for replacing expansion nodes was proposed. The probability was inversely proportional to the distance between nodes. The number of node replacements and additional nodes could be flexibly adjusted according to actual conditions. The theoretical analysis and experimental verification demonstrate that the network transformed by this method can reach a stable state. The experimental results showed that the transmission hierarchy required for broadcasting transaction messages throughout the network was reduced by 15.0% to 30.8% and the rate of locating nodes was increased. The level of network structure was reduced and network security was enhanced compared to other optimization algorithms that modify the network structure.
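
The replacement probability mentioned in this abstract can be sketched as follows, assuming Kademlia's standard XOR distance between node IDs and a probability normalized from the inverse distance. The function names and the toy IDs are hypothetical, and the paper's actual formula may include further terms that the abstract does not state.

```python
def xor_distance(node_a, node_b):
    """Kademlia distance: bitwise XOR of two integer node IDs."""
    return node_a ^ node_b


def replacement_probabilities(node_id, candidate_ids):
    """Probability of picking each candidate, inversely proportional to its XOR distance.

    Candidates must differ from node_id, otherwise the distance would be zero.
    """
    inverse = [1.0 / xor_distance(node_id, c) for c in candidate_ids]
    total = sum(inverse)
    return [value / total for value in inverse]


# Hypothetical 4-bit IDs: distances from 0b1010 are 1, 2 and 8.
node_id = 0b1010
candidates = [0b1011, 0b1000, 0b0010]
probs = replacement_probabilities(node_id, candidates)
```

Nearby nodes dominate the distribution while distant nodes keep a small but non-zero chance of being chosen, which is the long-range-link idea behind small world networks.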

Structured image super-resolution network based on improved Transformer
Xin-dong LV,Jiao LI,Zhen-nan DENG,Hao FENG,Xin-tong CUI,Hong-xia DENG
Journal of ZheJiang University (Engineering Science)    2023, 57 (5): 865-874.   DOI: 10.3785/j.issn.1008-973X.2023.05.002
Abstract   HTML PDF (1744KB) ( 493 )  

Most existing structural image super-resolution reconstruction algorithms can only solve a specific single type of structural image super-resolution problem. A structural image super-resolution network based on improved Transformer (TransSRNet) was proposed. The network used the self-attention mechanism of Transformer to mine a wide range of global information in spatial sequences. A spatial attention unit was built by using the hourglass block structure. The unit focused on the mapping relationship between the low-resolution space and the high-resolution space in the local area, and extracted the structured information in the image mapping process. The channel attention module was used to fuse the features of the self-attention module and the spatial attention module. The TransSRNet was evaluated on the highly structured CelebA, Helen, TCGA-ESCA and TCGA-COAD datasets. Results of evaluation showed that the TransSRNet model had a better overall performance compared with the super-resolution algorithms. With an upscale factor of 8, the PSNR of the face dataset and the medical image dataset could reach 28.726 and 26.392 dB respectively, and the SSIM could reach 0.844 and 0.881 respectively.
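
For readers unfamiliar with the PSNR figures quoted here, the metric follows the standard definition PSNR = 10 log10(MAX^2 / MSE). The sketch below assumes 8-bit grayscale images stored as nested lists; the function name and the toy images are illustrative and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    ref = [pixel for row in reference for pixel in row]
    rec = [pixel for row in reconstructed for pixel in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images that differ by one grey level everywhere (MSE = 1).
reference = [[10, 20], [30, 40]]
reconstructed = [[11, 21], [31, 41]]
score = psnr(reference, reconstructed)
```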

Multimodal image retrieval model based on semantic-enhanced feature fusion
Fan YANG,Bo NING,Huai-qing LI,Xin ZHOU,Guan-yu LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (2): 252-258.   DOI: 10.3785/j.issn.1008-973X.2023.02.005
Abstract   HTML PDF (928KB) ( 476 )  

A multimodal image retrieval model based on semantic-enhanced feature fusion (SEFM) was proposed to establish the correlation between text features and image features in multimodal image retrieval tasks. Semantic enhancement was conducted on the combined features during feature fusion by two proposed modules including the text semantic enhancement module and the image semantic enhancement module. Firstly, to enhance the text semantics, a multimodal dual attention mechanism was established in the text semantic enhancement module, which associated the multimodal correlation between text and image. Secondly, to enhance the image semantics, the retain intensity and update intensity were introduced in the image semantic enhancement module, which controlled the retaining and updating degrees of the query image features in combined features. Based on the above two modules, the combined features can be optimized, and be closer to the target image features. In the experiment part, the SEFM model was evaluated on MIT-States and Fashion IQ datasets, and experimental results show that the proposed model performs better than the existing works on recall and precision metrics.

Ship detection algorithm in complex backgrounds via multi-head self-attention
Nan-jing YU,Xiao-biao FAN,Tian-min DENG,Guo-tao MAO
Journal of ZheJiang University (Engineering Science)    2022, 56 (12): 2392-2402.   DOI: 10.3785/j.issn.1008-973X.2022.12.008
Abstract   HTML PDF (1335KB) ( 450 )  

A ship object detection algorithm was proposed based on a multi-head self-attention (MHSA) mechanism and YOLO network (MHSA-YOLO), aiming at the characteristics of complex backgrounds, large differences in scale between classes and many small objects in inland rivers and ports. In the feature extraction process, a parallel self-attention residual module (PARM) based on MHSA was designed to weaken the interference of complex background information and strengthen the feature information of the ship objects. In the feature fusion process, a simplified two-way feature pyramid was developed so as to strengthen the feature fusion and representation ability. Experimental results on the Seaships dataset showed that the MHSA-YOLO method had a better learning ability, achieved 97.59% mean average precision in the aspect of object detection and was more effective compared with the state-of-the-art object detection methods. Experimental results based on a self-made dataset showed that MHSA-YOLO had strong generalization.

Optimization method of CNC milling parameters based on deep reinforcement learning
Qi-lin DENG,Juan LU,Yong-hui CHEN,Jian FENG,Xiao-ping LIAO,Jun-yan MA
Journal of ZheJiang University (Engineering Science)    2022, 56 (11): 2145-2155.   DOI: 10.3785/j.issn.1008-973X.2022.11.005
Abstract   HTML PDF (1928KB) ( 448 )  

A deep reinforcement learning-based optimization method for CNC milling machining parameters was proposed to improve the machine tool effectiveness and the machining efficiency in CNC machining, and the applicability of deep reinforcement learning to machining parameter optimization problems was explored. The combined cutting force and material removal rate were selected as the optimization objectives of effectiveness and efficiency. The optimization function relating the combined cutting force to the milling parameters was constructed using a genetic algorithm optimized back propagation neural network (GA-BPNN), and the optimization function of material removal rate was established using empirical formulas. The dueling network architecture (Dueling DQN) algorithm was applied to obtain the Pareto frontier for the multi-objective optimization of combined cutting force and material removal rate, and the decision solution was selected from the Pareto frontier by combining the superior-inferior solution distance method and the entropy value method. The effectiveness of the Dueling DQN algorithm for machining parameter optimization was verified based on milling tests on 45 steel. Compared with the empirically selected machining parameters, the machining solution obtained by Dueling DQN optimization resulted in an 8.29% reduction of combined cutting force and a 4.95% improvement of machining efficiency, which provided guidance for the multi-objective optimization method of machining parameters and the selection of machining parameters.
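
The notion of a Pareto frontier over the two objectives can be made concrete with a small sketch. It assumes each candidate parameter set has already been evaluated to a pair (combined cutting force, material removal rate), with force to be minimized and removal rate to be maximized; the numbers are made up, and the filter below is a plain non-dominated sort rather than the paper's Dueling DQN search.

```python
def pareto_front(candidates):
    """Return the non-dominated (force, removal_rate) pairs.

    A candidate is dominated if another one has force no higher and removal
    rate no lower, and is strictly better in at least one of the two.
    """
    front = []
    for i, (force_i, rate_i) in enumerate(candidates):
        dominated = any(
            force_j <= force_i
            and rate_j >= rate_i
            and (force_j < force_i or rate_j > rate_i)
            for j, (force_j, rate_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((force_i, rate_i))
    return front


# Hypothetical evaluations: (combined cutting force, material removal rate).
candidates = [(100, 5), (120, 8), (110, 4), (150, 8), (90, 3)]
front = pareto_front(candidates)
```

A decision rule such as the distance-based method named in the abstract would then pick one solution from `front`.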

Adaptive salp swarm algorithm for solving flexible job shop scheduling problem with transportation time
Hao-yi NIU,Wei-min WU,Ting-qi ZHANG,Wei SHEN,Tao ZHANG
Journal of ZheJiang University (Engineering Science)    2023, 57 (7): 1267-1277.   DOI: 10.3785/j.issn.1008-973X.2023.07.001
Abstract   HTML PDF (1024KB) ( 443 )  

An adaptive salp swarm algorithm was proposed to minimize the makespan of the flexible job shop scheduling problem with transportation time. A three-layer coding scheme was designed based on random keys in order to make the discrete solution space continuous. The inertia weight was introduced to evaluate the influence among followers in order to enhance the global exploration and local search performance of the algorithm. An adaptive leader-follower population update strategy was proposed, and the number of leaders and followers was adjusted according to the population status. The tabu search strategy was combined with the neighborhood search in order to prevent the algorithm from falling into local optima. The benchmark instances verified the effectiveness and superiority of the proposed algorithm. The influence of the number of AGVs on the makespan conforms to the law of diminishing marginal effect.
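
Random-key coding, the device this abstract uses to make the discrete solution space continuous, can be sketched in a few lines. The decoding below assumes one layer of keys is ranked to give an operation order and another layer is scaled to choose a machine or AGV; the layer meanings, function names and values are hypothetical, since the abstract does not spell out the three layers.

```python
def decode_sequence(keys):
    """Rank continuous keys to obtain a permutation (smallest key goes first)."""
    return sorted(range(len(keys)), key=lambda index: keys[index])


def select_option(key, options):
    """Map a key in [0, 1) to one of the available options (e.g. machines or AGVs)."""
    position = min(int(key * len(options)), len(options) - 1)
    return options[position]


# Hypothetical keys for four operations and one machine-selection key.
operation_order = decode_sequence([0.71, 0.12, 0.55, 0.90])
machine = select_option(0.55, ["M1", "M2", "M3"])
```

Because any real-valued vector decodes to a valid schedule component, a continuous optimizer such as the salp swarm algorithm can move freely in key space without producing infeasible permutations.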

Improved YOLOv3-based defect detection algorithm for printed circuit board
Bai-cheng BIAN,Tian CHEN,Ru-jun WU,Jun LIU
Journal of ZheJiang University (Engineering Science)    2023, 57 (4): 735-743.   DOI: 10.3785/j.issn.1008-973X.2023.04.011
Abstract   HTML PDF (1420KB) ( 439 )  

An AT-YOLO algorithm based on improved YOLOv3 was proposed, aiming at the problem that existing deep learning-based defect detection algorithms for printed circuit boards (PCB) could not meet the accuracy and efficiency requirements at the same time. Feature extraction capabilities were improved and the number of parameters was reduced by replacing the backbone with ResNeSt50. An SPP module was added to integrate the features of different receptive fields and enrich the ability of feature representation. The PANet structure was improved to replace FPN, and the SE module was inserted to enhance the expression capability of effective feature maps. A set of high-resolution feature maps was added to the input and output in order to improve the sensitivity to small target objects, and the number of detection scales was increased from three to four. The K-means algorithm was re-used to generate the sizes of anchors in order to improve the accuracy of object detection. The experimental results showed that the AT-YOLO algorithm had an AP0.5 value of 98.42%, the number of parameters was 3.523×10^7, and the average detection speed was 36 frames per second on the PCB defect detection dataset, which met the requirements of accuracy and efficiency.
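
Anchor generation with K-means can be sketched as below. The sketch assumes the IoU-based assignment commonly used for YOLO anchors (each box joins the anchor it overlaps most, with boxes compared by width and height only) and a deterministic initialization from the first k boxes; the toy box sizes are invented, and the paper's exact clustering settings are not stated in the abstract.

```python
def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), both anchored at the same corner."""
    intersection = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - intersection
    return intersection / union


def kmeans_anchors(boxes, k, iterations=20):
    """Cluster (width, height) pairs into k anchors using IoU as the similarity."""
    anchors = [tuple(boxes[i]) for i in range(k)]  # assumed deterministic init
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: iou_wh(box, anchors[j]))
            clusters[best].append(box)
        updated = []
        for j, members in enumerate(clusters):
            if members:
                width = sum(b[0] for b in members) / len(members)
                height = sum(b[1] for b in members) / len(members)
                updated.append((width, height))
            else:
                updated.append(anchors[j])  # keep an empty cluster's anchor
        if updated == anchors:
            break
        anchors = updated
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical defect box sizes in pixels: three small and three large.
boxes = [(10, 12), (100, 90), (12, 10), (90, 110), (11, 11), (95, 100)]
anchors = kmeans_anchors(boxes, k=2)
```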

SQL generation from natural language queries with complex calculations on financial data
Jia-hao HE,Xi-ping LIU,Qing SHU,Chang-xuan WAN,De-xi LIU,Guo-qiong LIAO
Journal of ZheJiang University (Engineering Science)    2023, 57 (2): 277-286.   DOI: 10.3785/j.issn.1008-973X.2023.02.008
Abstract   HTML PDF (739KB) ( 433 )  

The problem of structured query language (SQL) generation from natural language queries (Text-to-SQL) in the financial domain was investigated. First, SOFT, a Text-to-SQL dataset in the financial domain, was constructed. The dataset covered common queries in the financial domain with distinctive features and presented challenges to Text-to-SQL research. Then, FinSQL, a Text-to-SQL model that optimized the support for complex queries in the financial domain, was proposed. In particular, by analyzing the characteristics of row calculation queries, a class of queries with complex numerical calculations, a divide-and-conquer based method was proposed. A row calculation query was divided into several subqueries, the SQL statement for each subquery was generated, and the SQL statements were finally combined to obtain the SQL statement for the original query. Experimental results on the SOFT dataset show that the proposed FinSQL model outperforms existing methods on hard queries, and performs well on row calculation queries.
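
The divide-and-conquer idea can be illustrated with a toy example: a growth-rate question is split into two scalar subqueries whose SQL is then combined into one statement. The `income` table, its columns, the company name and the `combine_row_calculation` helper are all hypothetical; they do not come from the SOFT dataset or from FinSQL itself.

```python
import sqlite3


def combine_row_calculation(sub_current, sub_previous):
    """Combine two scalar subqueries into a growth-rate expression (a - b) / b."""
    return (
        f"SELECT (({sub_current}) - ({sub_previous})) * 1.0 / ({sub_previous}) "
        "AS growth_rate"
    )


# Hypothetical schema and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income (company TEXT, year INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO income VALUES (?, ?, ?)",
    [("ACME", 2021, 200.0), ("ACME", 2022, 250.0)],
)

# Each subquery answers one simple part of the original question.
sub_2022 = "SELECT revenue FROM income WHERE company = 'ACME' AND year = 2022"
sub_2021 = "SELECT revenue FROM income WHERE company = 'ACME' AND year = 2021"

sql = combine_row_calculation(sub_2022, sub_2021)
growth_rate = conn.execute(sql).fetchone()[0]
conn.close()
```

Each subquery is an ordinary single-row lookup, so a model only has to generate simple SQL per part, and the numerical calculation is assembled mechanically afterwards.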

Solution approach of Burgers-Fisher equation based on physics-informed neural networks
Jian XU,Hai-long ZHU,Jiang-le ZHU,Chun-zhong LI
Journal of ZheJiang University (Engineering Science)    2023, 57 (11): 2160-2169.   DOI: 10.3785/j.issn.1008-973X.2023.11.003
Abstract   HTML PDF (1371KB) ( 420 )  

Physical information was divided into rule information and numerical information in order to explore the role of physical information in training neural networks when solving differential equations with a physics-informed neural network (PINN). The logic of PINN for solving differential equations was explained, as well as the data-driven approach of physical information and neural network interpretability. A synthetic loss function of the neural network was designed based on the two types of information, and the training balance degree was established from the aspects of training sampling and training intensity. The experiment of solving the Burgers-Fisher equation by PINN showed that PINN can obtain good solution accuracy and stability. In the training of neural networks for solving the equation, numerical information of the Burgers-Fisher equation can better promote the neural network to approximate the equation solution than rule information. The training effect of the neural network improved with the increase of training sampling, training epochs, and the balance between the two types of information. In addition, the solving accuracy of the equation improved as the scale of the neural network increased, but the training time of each epoch also increased. Within a fixed training time, a larger neural network does not necessarily give a better result.
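
A loss that combines the two kinds of information might look like the sketch below. It makes several assumptions that the abstract does not confirm: "rule information" is read as the PDE residual at collocation points, "numerical information" as the mismatch with known solution values, the equation is the Burgers-Fisher form u_t + αuu_x − u_xx = βu(1 − u) with α = β = 1, derivatives are taken by finite differences instead of automatic differentiation, and `u` is any callable standing in for the network output.

```python
import math

ALPHA, BETA = 1.0, 1.0  # assumed coefficients, with exponent delta = 1


def exact_solution(x, t):
    """Travelling-wave solution of u_t + ALPHA*u*u_x - u_xx = BETA*u*(1 - u)."""
    wave_number = -ALPHA / 4.0
    speed = ALPHA / 2.0 + 2.0 * BETA / ALPHA
    return 0.5 + 0.5 * math.tanh(wave_number * (x - speed * t))


def pde_residual(u, x, t, h=1e-3):
    """Residual of the equation at (x, t), using central finite differences."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    value = u(x, t)
    return u_t + ALPHA * value * u_x - u_xx - BETA * value * (1 - value)


def composite_loss(u, collocation, data, w_rule=1.0, w_num=1.0):
    """Weighted sum of the rule (residual) loss and the numerical (data) loss."""
    rule = sum(pde_residual(u, x, t) ** 2 for x, t in collocation) / len(collocation)
    numerical = sum((u(x, t) - target) ** 2 for x, t, target in data) / len(data)
    return w_rule * rule + w_num * numerical


collocation = [(0.5 * i, 0.25 * j) for i in range(-4, 5) for j in range(4)]
data = [(x, 0.0, exact_solution(x, 0.0)) for x in (-1.0, 0.0, 1.0)]

loss_exact = composite_loss(exact_solution, collocation, data)
loss_constant = composite_loss(lambda x, t: 0.5, collocation, data)
```

The exact solution drives both terms to nearly zero, while the constant guess is penalized by both, which is how the weights `w_rule` and `w_num` can shift emphasis between the two information types during training.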

Binocular vision object 6D pose estimation based on circulatory neural network
Heng YANG,Zhuo LI,Zhong-yuan KANG,Bing TIAN,Qing DONG
Journal of ZheJiang University (Engineering Science)    2023, 57 (11): 2179-2187.   DOI: 10.3785/j.issn.1008-973X.2023.11.005
Abstract   HTML PDF (1068KB) ( 411 )  

A method for creating a binocular dataset and a 6D pose estimation network called Binocular-RNN were proposed in response to the problem of low accuracy in the current task of 6D pose estimation for objects. The existing images in the YCB-Video Dataset were used as the content captured by the left camera of the binocular system. The corresponding 3D object models in the YCB-Video Dataset were imported using OpenGL, and the parameters related to each object were input to generate synthetic images captured by the virtual right camera of the binocular system. A monocular prediction network was utilized in the Binocular-RNN to extract geometric features from the left and right images in the binocular dataset, and a recurrent neural network was used to fuse these geometric features and predict the 6D pose of the objects. The evaluation of Binocular-RNN and other pose estimation methods was based on the average distance of model points (ADD), average nearest point distance (ADDS), translation error and angle error. The results show that when the network was trained on a single object, the ADD or ADDS score of Binocular-RNN was 2.66 times that of PoseCNN and 1.15 times that of GDR-Net. Furthermore, the Binocular-RNN trained by the physics-based real-time rendering (Real+PBR) outperformed the DeepIM method based on deep neural network iterative 6D pose matching.
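
The two distance metrics named in this abstract have standard definitions that can be sketched briefly: ADD averages the distance between corresponding model points under the true and predicted poses, and ADDS (often written ADD-S) averages the distance to the nearest predicted point instead, which suits symmetric objects. The toy model points and poses below are hypothetical, and the score thresholds used in the paper are not shown.

```python
import math


def transform(points, rotation, translation):
    """Apply a 3x3 rotation matrix and a translation vector to 3D points."""
    return [
        tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        for p in points
    ]


def add_metric(points, r_true, t_true, r_pred, t_pred):
    """Average distance between corresponding model points under the two poses."""
    truth = transform(points, r_true, t_true)
    pred = transform(points, r_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(truth, pred)) / len(points)


def adds_metric(points, r_true, t_true, r_pred, t_pred):
    """Average distance from each true point to its nearest predicted point."""
    truth = transform(points, r_true, t_true)
    pred = transform(points, r_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in truth) / len(points)


# Hypothetical model points (metres) and a prediction shifted 1 cm along x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
model = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
add_value = add_metric(model, identity, (0.0, 0.0, 0.0), identity, (0.01, 0.0, 0.0))
adds_value = adds_metric(model, identity, (0.0, 0.0, 0.0), identity, (0.01, 0.0, 0.0))
```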

Steel surface defect detection based on deep learning 3D reconstruction
Huan LAN,Jian-bo YU
Journal of ZheJiang University (Engineering Science)    2023, 57 (3): 466-476.   DOI: 10.3785/j.issn.1008-973X.2023.03.004
Abstract   HTML PDF (5141KB) ( 403 )  

A new 3D reconstruction network was proposed in order to resolve the difficulty of 2D detection method to detect defects with depth information. CasMVSNet with multiscale feature enhancement (MFE-CasMVSNet) was combined with the technology of point cloud processing for steel plate surface defect detection. In order to improve the accuracy of 3D reconstruction, a position-oriented feature enhancement module (PFEM) and a multiscale feature adaptive fusion module (MFAFM) were proposed to effectively extract features and reduce information loss. A density clustering method, curvature-sparse-guided density-based spatial clustering of applications with noise (CS-DBSCAN), was proposed for accurately extracting defects in different parts, and the 3D detection box was introduced to locate and visualize defects. Experimental results show that compared with the reconstruction method based on images, MFE-CasMVSNet can realize the 3D reconstruction of steel plate surface more accurately and quickly. Compared with 2D detection, 3D visual defect detection can accurately obtain the 3D shape information of defects and realize the multi-dimensional detection of steel plate surface defects.

Integrated control of active front steering and direct yaw moment
Bing ZHOU,Yang-yi LIU,Xiao-jian WU,Tian CHAI,Yong-qiang ZENG,Qian-xi PAN
Journal of ZheJiang University (Engineering Science)    2022, 56 (12): 2330-2339.   DOI: 10.3785/j.issn.1008-973X.2022.12.002
Abstract   HTML PDF (2400KB) ( 390 )  

Aiming at the coordination problem of active front steering (AFS) and direct yaw moment control (DYC) in vehicle handling and stability control, an optimal phase plane method for stable region partition was proposed. In order to realize the integrated control of handling and stability on the lateral and longitudinal dynamic system, a coordination criterion considering tire force characteristics was established based on the proposed method. Firstly, the side slip angles of the front and rear tires and the difference between them were used to characterize vehicle lateral stability. Combined with the lateral force characteristics of the tires, the lateral state of the vehicle was divided into stable, critically stable and unstable regions. The coordination criterion between AFS and DYC was thereby established. Secondly, considering the problem of obtaining the state variables when the control algorithm is applied in practice, a state observer based on the super-twisting algorithm was established to estimate the front and rear wheel slip angles of the vehicle. Finally, the AFS and DYC higher-order sliding mode controller based on the adaptive super-twisting algorithm was designed to eliminate the chattering phenomenon and avoid frequent switching of the controllers during the process of stability control. Experimental results showed that the proposed coordination criterion and control method had a positive effect on the coordination of AFS and DYC and achieved good handling and stability control.

Survey of text-to-image synthesis
Yin CAO,Junping QIN,Qianli MA,Hao SUN,Kai YAN,Lei WANG,Jiaqi REN
Journal of ZheJiang University (Engineering Science)    2024, 58 (2): 219-238.   DOI: 10.3785/j.issn.1008-973X.2024.02.001
Abstract   HTML PDF (2809KB) ( 389 )  

A comprehensive evaluation and categorization of text-to-image generation tasks were conducted. Text-to-image generation tasks were classified into three major categories based on the principles of image generation: text-to-image generation based on the generative adversarial network architecture, text-to-image generation based on the autoregressive model architecture, and text-to-image generation based on the diffusion model architecture. Improvements in different aspects were categorized into six subcategories for text-to-image generation methods based on the generative adversarial network architecture: adoption of multi-level hierarchical architectures, application of attention mechanisms, utilization of siamese networks, incorporation of cycle-consistency methods, deep fusion of text features, and enhancement of unconditional models. The general evaluation indicators and datasets of existing text-to-image methods were summarized and discussed through the analysis of different methods.
