Most compound fault diagnosis methods treat a compound fault as a new single fault type and ignore the interaction among its constituent single faults, so the fault analysis is coarse-grained and hard to interpret. An improved Transformer-based compound fault decoupling diagnosis method was proposed for industrial environments with very little compound fault data. The diagnosis process included pre-processing, feature extraction and fault decoupling. By introducing the Transformer decoder, the cross-attention mechanism enabled each single fault label to adaptively focus on the discriminative region of the extracted feature layer that corresponds to that fault and to predict an output probability, thereby achieving compound fault decoupling. Compound fault tests were designed to verify the effectiveness of the method against advanced algorithms. The results showed that the proposed method had high diagnostic accuracy with a small number of single fault training samples and a very small number of compound fault training samples. The compound fault diagnosis accuracy reached 88.29% when the training set contained only 5 compound fault samples. Thus the new method has a significant advantage over other methods.
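The label-wise cross-attention described in this abstract can be illustrated with a minimal sketch. It assumes a single attention head without learned projections, and the names `label_cross_attention`, `decouple`, `classifier_weights` and the 0.5 threshold are hypothetical; the abstract does not specify the authors' architecture at this level of detail.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_cross_attention(label_queries, features):
    """Let each single-fault label query attend over the feature positions.

    Assumption: scaled dot-product attention, one head, no learned projections.
    """
    d = len(features[0])
    attended = []
    for q in label_queries:
        scores = [sum(qi * fi for qi, fi in zip(q, f)) / math.sqrt(d) for f in features]
        weights = softmax(scores)
        context = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)]
        attended.append(context)
    return attended


def decouple(label_queries, features, classifier_weights, threshold=0.5):
    """Predict an independent probability per single fault (multi-label output)."""
    probs = []
    contexts = label_cross_attention(label_queries, features)
    for context, w in zip(contexts, classifier_weights):
        logit = sum(c * wi for c, wi in zip(context, w))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs, [p >= threshold for p in probs]
```

Because every label receives its own sigmoid probability, a compound fault appears as several labels above the threshold at once rather than as one extra class.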
A new 3D reconstruction network was proposed to address the difficulty that 2D detection methods have in detecting defects with depth information. CasMVSNet with multiscale feature enhancement (MFE-CasMVSNet) was combined with point cloud processing technology for steel plate surface defect detection. In order to improve the accuracy of 3D reconstruction, a position-oriented feature enhancement module (PFEM) and a multiscale feature adaptive fusion module (MFAFM) were proposed to effectively extract features and reduce information loss. A density clustering method, curvature-sparse-guided density-based spatial clustering of applications with noise (CS-DBSCAN), was proposed for accurately extracting defects in different parts, and a 3D detection box was introduced to locate and visualize defects. Experimental results show that, compared with the image-based reconstruction method, MFE-CasMVSNet can realize the 3D reconstruction of the steel plate surface more accurately and quickly. Compared with 2D detection, 3D visual defect detection can accurately obtain the 3D shape information of defects and realize the multi-dimensional detection of steel plate surface defects.
A multimodal image retrieval model based on semantic-enhanced feature fusion (SEFM) was proposed to establish the correlation between text features and image features in multimodal image retrieval tasks. Semantic enhancement was conducted on the combined features during feature fusion by two proposed modules including the text semantic enhancement module and the image semantic enhancement module. Firstly, to enhance the text semantics, a multimodal dual attention mechanism was established in the text semantic enhancement module, which associated the multimodal correlation between text and image. Secondly, to enhance the image semantics, the retain intensity and update intensity were introduced in the image semantic enhancement module, which controlled the retaining and updating degrees of the query image features in combined features. Based on the above two modules, the combined features can be optimized, and be closer to the target image features. In the experiment part, the SEFM model was evaluated on MIT-States and Fashion IQ datasets, and experimental results show that the proposed model performs better than the existing works on recall and precision metrics.
To address the poor real-time detection capability of current object detection models in the production environment of electronic components, GhostNet was used to replace the backbone network of YOLOv5. Because the surface defects of electronic components include small objects and objects with large scale changes, a coordinate attention module was added to the YOLOv5 backbone network, which enlarged the receptive field while avoiding heavy computational cost. The coordinate information was embedded into the channel attention to improve the object localization of the model. The feature pyramid network (FPN) structure in the YOLOv5 feature fusion module was replaced with a weighted bi-directional feature pyramid network structure to enhance the fusion capability of multi-scale weighted features. Experimental results on the self-made defective electronic component dataset showed that the improved GCB-YOLOv5 model achieved an average accuracy of 93% and an average detection time of 33.2 ms, improving the average accuracy by 15.0% and the average detection time by 7 ms compared with the original YOLOv5 model. The improved model can meet the requirements of both accuracy and speed for electronic component surface defect detection.
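The weighted fusion step of a bi-directional feature pyramid can be sketched as follows. This assumes the fast normalized fusion rule commonly associated with BiFPN (non-negative learnable weights divided by their sum); the abstract does not state which fusion rule the authors used, and the function name and the flattened list representation of feature maps are hypothetical.

```python
def fast_normalized_fusion(feature_maps, raw_weights, eps=1e-4):
    """Fuse same-sized feature maps with non-negative, normalized scalar weights.

    feature_maps: list of flattened feature maps (lists of floats, equal length).
    raw_weights:  one learnable scalar per feature map (assumed already trained).
    """
    # ReLU keeps each weight non-negative so the normalization stays stable.
    weights = [max(0.0, w) for w in raw_weights]
    norm = eps + sum(weights)
    size = len(feature_maps[0])
    return [
        sum(w * fm[i] for w, fm in zip(weights, feature_maps)) / norm
        for i in range(size)
    ]
```

With equal weights the result is close to a plain average of the inputs; a weight driven to zero or below removes that scale from the fused output.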
A dense small target detection algorithm based on YOLOv5s, LSA_YOLO, was proposed for UAV images with complex backgrounds and many densely distributed small targets. A multi-scale feature extraction module LM-fem was constructed to enhance the feature extraction capability of the network. A new hybrid domain attention module S-ECA relying on multi-scale contextual information was put forward so that the algorithm focuses on target information and suppresses the interference of complex backgrounds. The adaptive weight dynamic fusion structure AFF was designed to assign reasonable fusion weights to both shallow and deep features. Applying S-ECA and AFF in the PANet structure improved the capability of the algorithm to detect dense small targets in complex backgrounds. The Focal-EIOU loss function was used instead of the CIOU loss function to improve model detection efficiency. Experimental results on the public dataset VisDrone2021 show that the average detection accuracy over all target classes improves from 51.5% for YOLOv5s to 57.6% for LSA_YOLO when the input resolution is set to 1504 × 1504.
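For reference, a minimal sketch of a Focal-EIOU loss for one pair of axis-aligned boxes is given here. It follows the commonly published form (an EIOU term weighted by IoU raised to the power gamma); the abstract gives no formula, so the exact variant, the value `gamma=0.5` and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIOU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    center_dist2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    eiou = (
        1.0 - iou
        + center_dist2 / (cw ** 2 + ch ** 2 + eps)   # center distance penalty
        + (pw - tw) ** 2 / (cw ** 2 + eps)           # width difference penalty
        + (ph - th) ** 2 / (ch ** 2 + eps)           # height difference penalty
    )
    # Focal weighting: boxes with higher IoU contribute more to the loss.
    return (iou ** gamma) * eiou
```

Unlike CIOU, which penalizes an aspect-ratio term, this form penalizes width and height differences separately.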
A comprehensive evaluation and categorization of text-to-image generation tasks were conducted. Text-to-image generation tasks were classified into three major categories based on the principles of image generation: text-to-image generation based on the generative adversarial network architecture, text-to-image generation based on the autoregressive model architecture, and text-to-image generation based on the diffusion model architecture. Improvements in different aspects were categorized into six subcategories for text-to-image generation methods based on the generative adversarial network architecture: adoption of multi-level hierarchical architectures, application of attention mechanisms, utilization of siamese networks, incorporation of cycle-consistency methods, deep fusion of text features, and enhancement of unconditional models. The general evaluation indicators and datasets of existing text-to-image methods were summarized and discussed through the analysis of different methods.
A ship object detection algorithm was proposed based on a multi-head self-attention (MHSA) mechanism and YOLO network (MHSA-YOLO), aiming at the characteristics of complex backgrounds, large differences in scale between classes and many small objects in inland rivers and ports. In the feature extraction process, a parallel self-attention residual module (PARM) based on MHSA was designed to weaken the interference of complex background information and strengthen the feature information of the ship objects. In the feature fusion process, a simplified two-way feature pyramid was developed so as to strengthen the feature fusion and representation ability. Experimental results on the Seaships dataset showed that the MHSA-YOLO method had a better learning ability, achieved 97.59% mean average precision in the aspect of object detection and was more effective compared with the state-of-the-art object detection methods. Experimental results based on a self-made dataset showed that MHSA-YOLO had strong generalization.
Slab track suffers material performance decline and structural damage accumulation in the long-term service process under the coupling effect of train load and complex environment, resulting in a gradual deterioration of its service performance. The forms and causes of common interlayer damages on prefabricated slab track and double-block slab track in China were comprehensively discussed. The application of ground penetrating radar method, impact echo method and other local damage identification methods used in slab tracks were summarized. And it was proposed that combining multiple local damage identification techniques was the key to achieve accurate local damage identification of track structures. In addition, the overall damage identification technologies based on modal parameters, slab bed vibration signals and vehicle vibration signals were outlined. The need to expand the detection sample of field damages to improve the generalization of the overall identification method was pointed out. The advantages and limitations of various identification methods were analyzed in detail to provide guidance for improving the identification technology system of slab track structures in China and making scientific and reasonable maintenance strategies.
A thorough analysis and cross-comparison of recent relevant works was provided, outlining a closed-loop process for EEG data analysis based on deep learning. EEG data were introduced, and the application of deep learning in three key stages (preprocessing, feature extraction, and model generalization) was described. The research ideas and solutions provided by deep learning algorithms in the respective stages were delineated, including the challenges and issues encountered at each stage. The main contributions and limitations of different algorithms were comprehensively summarized. The challenges faced and future directions of deep learning technology in handling EEG data at each stage were discussed.
A lightweight and efficient aerial image detection algorithm called Functional ShuffleNet YOLO (FS-YOLO) was proposed based on YOLOv8s, in order to address the issues of low detection accuracy for small targets and a large number of model parameters in current unmanned aerial vehicle (UAV) aerial image detection. A lightweight feature extraction network was introduced by reducing channel dimensions and improving the network architecture. This facilitated the efficient reuse of redundant feature information, generating more feature maps with fewer parameters, enhancing the model’s ability to extract and express feature information while significantly reducing the model size. Additionally, a content-aware feature recombination module was introduced during the feature fusion stage to enhance the attention on salient semantic information of small targets, thereby improving the detection performance of the network for aerial images. Experimental validation was conducted using the VisDrone dataset, and the results indicated that the proposed algorithm achieved a detection accuracy of 47.0% mAP0.5 with only 5.48 million parameters. This represented a 50.7% reduction in parameter count compared to the YOLOv8s benchmark algorithm, along with a 6.1% improvement in accuracy. Experimental results of DIOR dataset showed that FS-YOLO had strong generalization and was more competitive than other state-of-the-art algorithms.
The two-dimensional strength theory based cohesive zone model (ST-CZM) was extended to the three-dimensional case for a wider application. Furthermore, the finite element implementation of the ST-CZM was carried out using the Abaqus user element subroutine (UEL). The validity and accuracy of the ST-CZM were verified by several typical numerical benchmarks. On this basis, the ST-CZM finite element model was used to simulate the bond-slip behavior of the interface between fiber reinforced polymer (FRP) and concrete, which extended the application of the ST-CZM in complex working conditions. Compared to the traditional “traction laws” based cohesive zone model (CZM), the ST-CZM provides improved flexibility in mode mixity and allows independent selection of strength models in normal and tangent directions. In addition, the ST-CZM exhibits better convergence performance and more accurate strength predictions compared to “traction laws” based CZMs. All examples show that compared to the traditional “traction laws” based CZM, the ST-CZM finite element model can better predict the peak stress of the bonding interface and simulate the mixed-mode damage process, showing a more realistic cracking process.
Aiming at the requirements of intelligent maintenance and digital diagnosis of commercial aircraft in China, a novel Boyer-Moore long short-term memory network (BM LSTM) algorithm was proposed for unstructured fault isolation manuals. A majority voting method was used to fuse three entity recognition algorithms: conditional random fields (CRF), bi-directional long short-term memory (BiLSTM) and BiLSTM CRF. The accuracy of entity recognition was effectively improved by the proposed BM LSTM algorithm. On this basis, a maintenance scheme knowledge graph was constructed for the commercial aircraft maintenance fault diagnosis manual. A commercial aircraft maintenance scheme recommendation system was designed by combining the term frequency-inverse document frequency (TF-IDF) similarity algorithm with BM LSTM. With this recommendation system, maintenance schemes can be matched accurately by retrieving the unstructured fault description texts. Experimental results show that the proposed knowledge graph and the maintenance scheme recommendation system can effectively ensure the accurate matching of maintenance information, and the efficiency of maintenance scheme formation is significantly improved.
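The majority-voting fusion of the three entity recognizers can be sketched with the Boyer-Moore majority vote algorithm applied per token. The function names, the tag strings and the fallback to the third tagger when no label has a strict majority are hypothetical choices for illustration; the abstract does not describe how ties are resolved.

```python
def boyer_moore_majority(votes):
    """Return the label held by a strict majority of votes, or None if there is none."""
    candidate, count = None, 0
    for v in votes:
        if count == 0:
            candidate, count = v, 1
        elif v == candidate:
            count += 1
        else:
            count -= 1
    # Verification pass: the surviving candidate is only a majority if it
    # really occurs in more than half of the votes.
    if candidate is not None and 2 * sum(1 for v in votes if v == candidate) > len(votes):
        return candidate
    return None


def fuse_tags(crf_tags, bilstm_tags, bilstm_crf_tags, fallback_index=2):
    """Fuse three per-token tag sequences by majority vote.

    fallback_index selects which tagger wins when all three disagree
    (assumed here to be the third one, BiLSTM CRF).
    """
    fused = []
    for votes in zip(crf_tags, bilstm_tags, bilstm_crf_tags):
        winner = boyer_moore_majority(votes)
        fused.append(winner if winner is not None else votes[fallback_index])
    return fused
```

For example, fusing `["B-PART", "O", "B-FAULT"]`, `["B-PART", "I-PART", "O"]` and `["O", "I-PART", "I-FAULT"]` keeps the two-vote labels for the first two tokens and falls back to the third tagger for the last token.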
A multi-agent reinforcement learning algorithm based on priority experience replay and a decomposed reward function was proposed for multi-agent pursuit and evasion games. Firstly, the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm was proposed based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm and the twin delayed deep deterministic policy gradient (TD3) algorithm. Secondly, aiming at the problem that the reward function is almost sparse in the multi-agent pursuit and evasion problem, priority experience replay was proposed to determine the priority of experience and sample the experience with high reward. In addition, a decomposed reward function was designed to divide multi-agent rewards into individual rewards and joint rewards to maximize the global and local rewards. Finally, a simulation experiment was designed based on DEPER-MATD3. Comparison with other algorithms showed that the DEPER-MATD3 algorithm solved the over-estimation problem, and the time consumption was improved compared with the MATD3 algorithm. In the decomposed reward function environment, the global mean rewards of the pursuers were improved, and the pursuers had a greater probability of catching the evader.
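The two ideas in this abstract, reward-driven priority replay and reward decomposition, can be illustrated with a small sketch. Everything beyond the abstract is an assumption: priorities are taken as a power of the non-negative reward, sampling is proportional to priority, and each agent's total reward is its individual reward plus a weighted joint reward. The class name, `alpha`, `eps` and `joint_weight` are hypothetical.

```python
import random


class RewardPrioritizedBuffer:
    """Replay buffer that samples high-reward transitions more often."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=0):
        self.capacity = capacity
        self.alpha = alpha    # assumed exponent controlling how sharply priority skews sampling
        self.eps = eps        # keeps zero-reward transitions sampleable
        self.transitions = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, transition, reward):
        # Drop the oldest transition once the buffer is full.
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append((max(reward, 0.0) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        # Sample with replacement, proportionally to priority.
        return self.rng.choices(self.transitions, weights=self.priorities, k=batch_size)


def decomposed_rewards(individual, joint, joint_weight=1.0):
    """Combine each agent's individual reward with the shared joint reward."""
    return [r + joint_weight * joint for r in individual]
```

In a sparse-reward pursuit task, the rare transitions that carry a capture reward are then replayed far more often than the many zero-reward steps.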
In practice, the structure of most graphs is noisy, i.e., it includes some spurious edges or misses some edges that actually exist between nodes. To address this problem, a novel differentiable similarity module (DSM) was presented, which boosts node representations by mining implicit associations between nodes to improve the accuracy of node classification. In DSM, a basic representation of each target node was learnt using an ordinary graph neural network (GNN), sets of similar nodes were selected according to node representation similarity, and the basic representations of the similar nodes were integrated to boost the target node’s representation. Mathematically, DSM is differentiable, so it can be combined as a plug-in with arbitrary GNNs and trained end to end. DSM makes it possible to exploit the implicit edges between nodes and makes the learned representations more robust and discriminative. Experiments were conducted on several public node classification datasets. Results demonstrated that equipping GNNs with DSM significantly improved classification accuracy; for example, GAT-DSM outperformed GAT by margins of 2.9% on Cora and 3.5% on Citeseer.
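The similarity-based boosting step can be sketched as follows, under stated assumptions: cosine similarity between basic representations, a top-k selection of similar nodes, softmax weights over the selected similarities, and a residual sum with the node's own representation. None of these specifics are given in the abstract, and the function names, `k` and `temperature` are hypothetical. Gradient flow is not shown here since the sketch uses plain Python lists.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def boost_representations(base, k=2, temperature=1.0):
    """Add to each node a similarity-weighted mix of its k most similar nodes.

    base: list of basic node representations, e.g. the output of any GNN layer.
    """
    boosted = []
    for i, h in enumerate(base):
        sims = [(cosine(h, other), j) for j, other in enumerate(base) if j != i]
        top = sorted(sims, reverse=True)[:k]
        # Softmax over the selected similarities gives smooth mixing weights.
        exps = [math.exp(s / temperature) for s, _ in top]
        z = sum(exps)
        aggregated = [
            sum(e / z * base[j][d] for e, (_, j) in zip(exps, top))
            for d in range(len(h))
        ]
        boosted.append([x + a for x, a in zip(h, aggregated)])
    return boosted
```

Because the selected neighbours depend only on representation similarity, two nodes can exchange information even when the noisy input graph has no edge between them.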
In order to reveal the spatiotemporal relationship between urban multi-dimensional features and bike-sharing parking demand and their associated scales, combined with multi-source data in Shanghai, a multiscale geographically and temporally weighted regression model constrained by riding distance (RD-MGTWR) was constructed to explore the spatiotemporal heterogeneity patterns of the impact of built environment and regional economic attributes on parking demand. The model comparison analysis shows that the MGTWR model exhibits better explanatory power and reliability than the geographically and temporally weighted regression model (GTWR), and the introduction of riding distance further improves the robustness of the MGTWR model. Results show that the scale of the positive impact of socioeconomic attributes on parking demand is global, while the negative impact of location conditions presents local heterogeneity, and is most significant in the inner ring central area during the commuter morning peak. In addition, bus station density, metro station density and shopping service facility density with micro-spatial or temporal scales have positive and negative effects on parking demand. The findings of the scale effect of influencing factors can help guide parking facility zoning development and bike sharing time-sharing scheduling.
A news recommendation method based on Transformer and knowledge graph was proposed to increase the auxiliary information and improve the prediction accuracy. The self-attention mechanism was used to obtain the connection between news words and news entities in order to combine news semantic information and entity information. The additive attention mechanism was employed to capture the influence of words and entities on news representation. Transformer was introduced to capture the correlation between the news clicked by a user and the change of user interest over time by considering the time-series characteristics of user preference for news. High-order structural information in knowledge graphs was used to fuse adjacent entities of the candidate news and enhance the integrity of the information contained in the candidate news embedding vector. Comparison experiments with five typical recommendation methods on two versions of the MIND news dataset show that the introduction of the attention mechanism, Transformer and knowledge graph can improve the performance of the algorithm on news recommendation.
Aiming at the problem that existing fatigue state detection methods cannot be applied to drivers under epidemic prevention and control, an improved YOLOv5 object detection algorithm was used to detect the facial region of the driver, and a multi-feature fusion fatigue state detection method was established. Image tag data covering drivers both with and without masks were established according to the characteristics of bus driving. The detection accuracy of eyes, mouth and face regions was improved by increasing the feature sampling times of the YOLOv5 model. The BiFPN network structure was used to retain multi-scale feature information, which makes the prediction network more sensitive to targets of different sizes and improves the detection ability of the overall model. A parameter compensation mechanism combined with a face keypoint algorithm was proposed in order to improve the accuracy of the blink and yawn frame counts. A variety of fatigue parameters were fused and normalized to conduct fatigue classification. The results on the public dataset NTHU and the self-made dataset show that the proposed method can recognize the blinks and yawns of drivers both with and without masks, and can accurately judge the fatigue state of drivers.
Constructing and training a segmentation model on single-source data may lead to insufficient segmentation accuracy because each medical imaging method has its own defects. Aiming at this problem, a medical image segmentation method based on multi-source information fusion was proposed. The full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT) data sources for breast tumour microcalcification cluster lesions were used as examples to verify the effectiveness of the proposed method. The YOLOv4 region candidate network was used to screen the suspicious regions of the FFDM data. The DBT images were preprocessed by using the suspicious region information. The preprocessed DBT images were used as the input of the improved U-Net model to achieve lesion segmentation. Finally, through a fusion strategy for slice segmentation results based on sequential similarity discrimination, the multi-slice results in DBT were combined to complete the final lesion segmentation. A true positive rate of 98.52%, a false positive rate of 10.45% and an accuracy of 94.07% were obtained on the FFDM and DBT data of 20 patients by using this method. Results show that the medical image segmentation method based on multi-source information fusion can effectively utilize the advantages of multi-source data and achieve rapid and accurate segmentation of lesions. The method can provide a novel solution for intelligent medical image diagnosis and treatment.
A feature extraction and classification method for imagined speech electroencephalogram (EEG) signals was proposed by combining discrete wavelet transform (DWT) and empirical mode decomposition (EMD) in order to improve the accuracy of imagined speech brain-computer interface (BCI) control tasks. DWT and EMD were applied to the original imagined speech EEG signals respectively, and the features of the signal of each channel were extracted and fused. Then a radial basis function (RBF) support vector machine (SVM) was used to classify the imagined speech EEG signals. The experimental results show that the classification accuracy reaches an average of 82.46% with the proposed method, which is 20.77% higher than that with the DWT method, and 21.12% higher than that with the EMD method. The proposed method can effectively improve the classification accuracy of imagined speech EEG signals, and is of great value to the practical application of imagined speech BCI.
Few existing studies cover the state-of-the-art multi-objective particle swarm optimization (MOPSO) algorithms. To fill this gap, the research background of multi-objective optimization problems (MOPs) was introduced, and the fundamental theories of MOPSO were described. The MOPSO algorithms were divided into three categories according to their features: Pareto-dominance-based MOPSO, decomposition-based MOPSO, and indicator-based MOPSO, and a detailed description of the classical algorithms in each category was given. Next, relevant evaluation indicators were described, and seven representative algorithms were selected for performance analysis. The experimental results demonstrated the strengths and weaknesses of the traditional MOPSO and of the three categories of improved MOPSO algorithms. Among them, the indicator-based MOPSO performed better in terms of convergence and diversity. Then, the applications of MOPSO algorithms in production scheduling, image processing, and power systems were briefly introduced. Finally, the limitations and future research directions of the MOPSO algorithm for solving complex optimization problems were discussed.
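The Pareto dominance relation underlying the first category can be made concrete with a short sketch. It assumes that all objectives are minimized; the function names are hypothetical and the code illustrates only the dominance test and non-dominated filtering, not any particular MOPSO algorithm from the survey.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points):
    """Keep only the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]
```

For the objective vectors `(1, 4)`, `(2, 2)`, `(3, 3)` and `(4, 1)`, only `(3, 3)` is removed, because `(2, 2)` is better in both objectives; the remaining three points are mutually non-dominated and would be candidates for an external archive.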