A comprehensive evaluation and categorization of text-to-image generation tasks were conducted. Text-to-image generation tasks were classified into three major categories based on the principles of image generation: text-to-image generation based on the generative adversarial network architecture, text-to-image generation based on the autoregressive model architecture, and text-to-image generation based on the diffusion model architecture. Improvements in different aspects were categorized into six subcategories for text-to-image generation methods based on the generative adversarial network architecture: adoption of multi-level hierarchical architectures, application of attention mechanisms, utilization of siamese networks, incorporation of cycle-consistency methods, deep fusion of text features, and enhancement of unconditional models. The general evaluation indicators and datasets of existing text-to-image methods were summarized and discussed through the analysis of different methods.
A lightweight and efficient aerial image detection algorithm called Functional ShuffleNet YOLO (FS-YOLO) was proposed based on YOLOv8s, in order to address the low detection accuracy for small targets and the large number of model parameters in current unmanned aerial vehicle (UAV) aerial image detection. A lightweight feature extraction network was introduced by reducing channel dimensions and improving the network architecture. This facilitated the efficient reuse of redundant feature information and generated more feature maps with fewer parameters, which enhanced the model’s ability to extract and express feature information while significantly reducing the model size. Additionally, a content-aware feature recombination module was introduced during the feature fusion stage to enhance the attention on salient semantic information of small targets, thereby improving the detection performance of the network for aerial images. Experimental validation was conducted using the VisDrone dataset, and the results indicated that the proposed algorithm achieved a detection accuracy of 47.0% mAP0.5 with only 5.48 million parameters. This represented a 50.7% reduction in parameter count compared to the YOLOv8s benchmark algorithm, along with a 6.1% improvement in accuracy. Experimental results on the DIOR dataset showed that FS-YOLO had strong generalization ability and was more competitive than other state-of-the-art algorithms.
A thorough analysis and cross-comparison of recent relevant works was provided, outlining a closed-loop process for EEG data analysis based on deep learning. EEG data were introduced, and the application of deep learning in three key stages (preprocessing, feature extraction, and model generalization) was examined. The research ideas and solutions provided by deep learning algorithms in the respective stages were delineated, together with the challenges and issues encountered at each stage. The main contributions and limitations of different algorithms were comprehensively summarized. The challenges faced by deep learning technology in handling EEG data at each stage, and its future directions, were discussed.
Few existing studies cover the state-of-the-art multi-objective particle swarm optimization (MOPSO) algorithms. To fill this gap, the research background of multi-objective optimization problems (MOPs) was introduced, and the fundamental theories of MOPSO were described. The MOPSO algorithms were divided into three categories according to their features: Pareto-dominance-based MOPSO, decomposition-based MOPSO, and indicator-based MOPSO, and the classical algorithms in each category were described in detail. Next, relevant evaluation indicators were described, and seven representative algorithms were selected for performance analysis. The experimental results demonstrated the strengths and weaknesses of the traditional MOPSO and of the three categories of improved MOPSO algorithms. Among them, the indicator-based MOPSO performed better in terms of convergence and diversity. Then, the applications of MOPSO algorithms in production scheduling, image processing, and power systems were briefly introduced. Finally, the limitations and future research directions of the MOPSO algorithm for solving complex optimization problems were discussed.
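The Pareto dominance relation that underlies the first MOPSO category can be illustrated with a minimal sketch. It assumes that all objectives are minimized; the function names `dominates` and `non_dominated` are illustrative and do not come from the surveyed algorithms.

```python
def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (minimization).

    a dominates b when a is no worse than b in every objective
    and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point in the list dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


if __name__ == "__main__":
    # Hypothetical two-objective values for four particles.
    swarm = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(non_dominated(swarm))
```

In a Pareto-dominance-based MOPSO, a filter of this kind is typically used to maintain the external archive from which leaders are chosen; decomposition-based and indicator-based variants replace it with scalarized subproblems or quality indicators.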
A path planning algorithm based on the fusion of the improved A* algorithm and the random obstacle avoidance dynamic window approach (ROA-DWA) was proposed in order to address the issues of excessive traversal nodes, redundant points, non-smooth paths, lack of global guidance, susceptibility to local optima, and low safety in the traditional A* algorithm and dynamic window approach (DWA) for robot path planning. The search efficiency was improved by adjusting the weights of the heuristic function and by introducing Floyd’s algorithm, a redundant point deletion strategy, static and dynamic obstacle classification, and a speed adaptive factor. The length of the path and the number of inflection points were reduced, and the influence of known obstacles on the path was minimized to improve the efficiency of dynamic obstacle avoidance. As a result, the robot arrived at the target point smoothly and more safely, and adapted better to complex dynamic and static environments. The experimental results show that the algorithm has better global optimality and local obstacle avoidance ability, and that its advantages are more pronounced in large maps.
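The idea of adjusting the heuristic weight in A* can be sketched as follows. This is a minimal weighted A* on a 4-connected grid, not the improved algorithm of the abstract: the evaluation function f = g + w·h, the Manhattan heuristic, the grid encoding (0 free, 1 obstacle) and the name `weighted_a_star` are all assumptions made for illustration.

```python
import heapq


def weighted_a_star(grid, start, goal, weight=1.0):
    """Search a 4-connected grid with f = g + weight * h.

    grid: list of lists, 0 = free cell, 1 = obstacle (assumed encoding).
    Returns the path as a list of (row, col) cells, or None if unreachable.
    weight = 1.0 gives plain A*; weight > 1.0 expands fewer nodes at the
    cost of possibly longer paths.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance to the goal.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(weight * h(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    closed = set()

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue  # stale heap entry
        closed.add(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == 1 or nxt in closed:
                continue
            new_g = g + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(open_heap, (new_g + weight * h(nxt), new_g, nxt))
    return None


if __name__ == "__main__":
    demo = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(weighted_a_star(demo, (0, 0), (2, 0), weight=1.5))
```

A global path produced this way would still contain redundant and collinear points, which is where a pruning step such as the redundant point deletion strategy mentioned above would apply before handing waypoints to a local planner.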
The effects of leg structure design, foot-end design and sensor design on touchdown detection were comprehensively discussed by analyzing the existing touchdown detection methods for legged robots. Touchdown detection methods based on direct measurement with external sensors, on kinematics and dynamics, and on learning were summarized. Touchdown detection methods for three special scenarios were also summarized: slippery ground, soft ground, and non-foot-end contact. The application scenarios of touchdown detection technology were analyzed, including motion control requirements, navigation applications, and terrain and geological sensing. Four development trends of touchdown detection were pointed out: hardware improvement and integration, multi-mode touchdown detection, multi-sensor fusion touchdown detection, and intelligent touchdown detection. The specific relationships between various touchdown detection algorithms were summarized, which provided guidance for the development of follow-up touchdown detection technology and its specific applications.
A method for recognizing machining features based on graph neural networks was proposed in order to address the difficulties in identifying intersecting features and accurately determining machining feature surfaces in existing deep learning-based approaches. Features of nodes and adjacent edges were extracted through a compression activation module, and a dual-layer attention network at the node and adjacent edge levels was constructed in order to segment the machining features corresponding to each node. The surface features and edge features of the part model were fully used combined with the topological structure of the part model. The recognition problem of non-face merged intersecting features was effectively addressed by employing attention mechanisms for deep learning on the feature information. The proposed method was experimentally compared with three other feature recognition methods on a dataset of parts with multiple machining features. The optimal results were obtained in terms of accuracy, average class accuracy and intersection-over-union metrics. The recognition accuracy exceeded 95%.
A numerical simulation framework was established, which was suitable for simulations of the continuous conversion mode of the tiltrotor based on the overset mesh method. The transition of a rotor/wing system from fixed-wing mode to helicopter mode was simulated for two important components in unmanned aerial vehicles, the rotor and the wing. Reynolds-averaged Navier-Stokes equations were used to analyze the variations of aerodynamic characteristics in different advance ratios and the effects of crosswind velocity on aerodynamic characteristics in conversion mode. Results show that the lift and drag coefficients of the wing decrease with the increase of the tilt angle, and the variation decreases with the increase of the advance ratio. The rotor thrust increases with the increase of the tilt angle, and the variation increases with the increase of the advance ratio. When there is crosswind in the incoming flow, the lift and drag coefficients of the wing decrease. The performance of the wing in low crosswind velocity is improved after the tilt angle reaches 65°. The magnitude of the thrust coefficient of the rotor is not significantly affected by crosswind, but the oscillation amplitude increases as a result.
A new steam Carnot battery based on high-temperature and low-temperature phase change materials was proposed in order to analyze a new route for multi-energy complementation in the integrated energy systems of industrial parks. A thermodynamic cycle calculation model considering the equipment performance and mass flow rate was established. The effects of design parameters and the multi-stage compression structure on the heat pump coefficient of performance, round-trip efficiency, power storage loss and heating efficiency of the system were analyzed. The phase change temperatures of the low-temperature and high-temperature phase change materials were the main factors affecting the performance of the steam Carnot battery. The high cycle performance region of the steam Carnot battery was obtained. The parameters and structure of the steam Carnot battery were optimized. Results showed that the round-trip efficiency could reach 56.96%, the coefficient of performance of the heat pump could reach 2.55, and the heating efficiency could reach 68.74%.
A lightweight YOLOv5 garbage detection solution was proposed to address the poor real-time performance of garbage detection and classification on edge devices. The Stem module was introduced to enhance the model’s ability to extract features from input images. The C3 module of the backbone was improved to increase feature extraction capabilities. Depthwise separable convolution was used to replace the 3×3 downsampling convolutions in the network, making the model lightweight. The K-means++ algorithm was employed to recompute anchor box values for objects, enabling the model to better predict target box sizes during training. Experimental comparisons show that, compared with the YOLOv5s model, the improved model achieves a 0.8% increase in mAP_0.5 and a 3% increase in mAP_0.5:0.95, while reducing model parameters by 77.9% and improving inference speed by 21.9%, significantly enhancing the detection performance of the model.
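The parameter saving from replacing a standard convolution with a depthwise separable one can be checked with a short calculation. The sketch below counts weights only (no bias or normalization terms), and the channel sizes in the example are hypothetical rather than taken from the network in the abstract.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k convolution followed by a 1 x 1
    pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 128 input channels, 256 output channels.
    std = standard_conv_params(128, 256)
    dws = depthwise_separable_params(128, 256)
    print(std, dws, round(dws / std, 4))
```

The ratio of the two counts equals 1/c_out + 1/k², so for 3×3 kernels and wide layers the separable form needs roughly one ninth of the weights of the layer it replaces.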
A parking charge strategy based on dispatching autonomous vehicles was proposed in order to improve the efficiency of a parking system that accommodates both human-driven vehicles and autonomous vehicles. This strategy provides an autonomous vehicle dispatch service to a human-driven vehicle when its target parking lot has no available parking space but contains autonomous vehicles. After charging the user of the human-driven vehicle a dispatch fee, the parking system dispatches a number of autonomous vehicles among multiple parking lots to create an available parking space for the human-driven vehicle in its target parking lot. Since the dispatch fee of each parking lot affects the parking choices of human-driven vehicle users, and thus the operation efficiency of the parking system, an agent-based parking simulation model was constructed, and a differentiated dispatch fee for each parking lot was set by a genetic algorithm. The simulation results show that the differentiated parking charge strategy based on dispatching autonomous vehicles can significantly reduce the driving time, walking time, total travel time and mileage of human-driven vehicle users, increase the revenue of the parking system, reduce the social cost and effectively alleviate the parking problem.
A new multimodal sentiment analysis model (MTSA) was proposed on the basis of the cross-modal Transformer, aiming at the difficulty of retaining modal feature heterogeneity in single-modal feature extraction and at feature redundancy in cross-modal feature fusion. Long short-term memory (LSTM) and a multi-task learning framework were used to extract single-modal contextual semantic information. The noise was removed and the modal feature heterogeneity was preserved by summing the auxiliary modal task losses. A multi-task gating mechanism was used to adjust cross-modal feature fusion. Text, audio and visual modal features were fused in a stacked cross-modal Transformer structure to improve fusion depth and avoid feature redundancy. MTSA was evaluated on the MOSEI and SIMS datasets. Results show that MTSA has better overall performance than other advanced models, with binary classification accuracies of 83.51% and 84.18%, respectively.
A many-core parallel optimization scheme for large-point FFT was proposed according to the structural characteristics and programming specifications of the domestic Sunway 26010 processor, which is used in the Sunway TaihuLight supercomputer. The scheme was derived from the classic Cooley-Tukey FFT algorithm, and was accelerated in parallel by iteratively decomposing the one-dimensional large-point data into two-dimensional small-scale matrices. The "column-sharing, row-continuity" strategy was proposed in order to solve the problems of reading, writing, transposing and calculating in the "column FFT" of the matrix. The computing resources and transmission bandwidth of the many-core processor were fully utilized through reasonable data allocation, rearrangement and exchange, combined with other optimization methods such as SIMD vectorization, twiddle factor optimization, double-buffering, register communication and stride transmission. The experimental results show that a single core group of 64 slave cores running the parallel program can achieve a maximum speed-up of 65x and an average speed-up of more than 48x compared with the main core running the FFTW library.
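The Cooley-Tukey decomposition of a one-dimensional transform into a two-dimensional matrix of small transforms can be sketched in a few lines. The version below is a serial illustration only: it uses a naive DFT for the small transforms and none of the Sunway-specific optimizations, and the names `dft` and `fft_via_matrix` are illustrative.

```python
import cmath


def dft(seq):
    """Naive O(n^2) discrete Fourier transform, used for the small sub-transforms."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_via_matrix(x, n1, n2):
    """Length n1*n2 DFT computed by viewing x as an n1 x n2 row-major matrix.

    Step 1: length-n1 DFT down each column.
    Step 2: multiply entry (k1, c) by the twiddle factor exp(-2*pi*i*c*k1/n).
    Step 3: length-n2 DFT along each row.
    The result for frequency index k1 + n1*k2 is row k1, entry k2.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: column DFTs; cols[c][k1] holds column c after the transform.
    cols = [dft([x[n2 * r + c] for r in range(n1)]) for c in range(n2)]

    # Step 2: twiddle factors.
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)

    # Step 3: row DFTs, written out in transposed order.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out


if __name__ == "__main__":
    signal = [complex(i, 0) for i in range(12)]
    result = fft_via_matrix(signal, 3, 4)
    reference = dft(signal)
    print(max(abs(a - b) for a, b in zip(result, reference)))
```

The column transforms in step 1 access memory with stride n2, which is the access pattern that makes the column FFT the costly part on real hardware; applying the same split recursively to the sub-transforms gives the iterative decomposition described in the abstract.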
A super-efficiency SBM model including undesirable outputs was used to measure industrial environmental efficiency in 30 Chinese provinces from 2008 to 2020, in order to determine how industrial enterprises can select appropriate green technology innovations to accomplish industrial green transformation under strict environmental regulations. The efficiency was used to characterize the level of industrial green transformation. A panel threshold model was used to explore the mechanism by which different green technology innovations affect industrial green transformation under different environmental regulation intensities. Results show that China's industrial environmental efficiency fluctuated but rose overall from 2008 to 2020, and the efficiency gap between regions showed a slightly decreasing trend. The environmental impacts of various green technology innovations differ significantly, among which process-oriented green technology innovation, which emphasizes processes and products, is the key to achieving industrial green transformation. As environmental regulations become more stringent, the positive environmental effect of process-oriented green technology innovation increases, while the negative environmental effect of result-oriented green technology innovation decreases.
An improved method for the blockchain Kademlia network based on small world theory was proposed aiming at the issue of sacrificing security to improve scalability in the current research of the blockchain Kademlia network. The idea of the small world theory was followed, and a probability formula for replacing expansion nodes was proposed. The probability was inversely proportional to the distance between nodes. The number of node replacements and additional nodes could be flexibly adjusted according to actual conditions. The theoretical analysis and experimental verification demonstrate that the network transformed by this method can reach a stable state. The experimental results showed that the transmission hierarchy required for broadcasting transaction messages throughout the network was reduced by 15.0% to 30.8% and the rate of locating nodes was increased. The level of network structure was reduced and network security was enhanced compared to other optimization algorithms that modify the network structure.
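A replacement probability that is inversely proportional to node distance can be illustrated as below. The abstract does not give the exact formula, so this sketch assumes the standard Kademlia XOR metric and a simple normalization p_i = (1/d_i) / Σ_j (1/d_j); the function names and node identifiers are hypothetical.

```python
import random


def xor_distance(a, b):
    """Kademlia distance between two integer node IDs."""
    return a ^ b


def replacement_probabilities(node_id, candidates):
    """Map each candidate ID to a probability proportional to 1 / distance.

    Assumed normalization: the probabilities over all candidates sum to 1.
    The node itself (distance 0) is skipped.
    """
    weights = {
        c: 1.0 / xor_distance(node_id, c)
        for c in candidates
        if c != node_id
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def pick_replacement(node_id, candidates, rng):
    """Draw one candidate according to the probabilities above."""
    probs = replacement_probabilities(node_id, candidates)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    probs = replacement_probabilities(0b0000, [0b0001, 0b0010, 0b0100, 0b1000])
    print(probs)
    print(pick_replacement(0b0000, [0b0001, 0b0010, 0b0100, 0b1000], random.Random(0)))
```

Under this weighting most links stay local while a few long-range links are still selected occasionally, which is the small-world property the method relies on to shorten broadcast paths.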
A lightweight, real-time approach named RTGN (real-time grasp net) was proposed to improve the accuracy and speed of robotic grasp detection for novel objects of diverse shapes, types and sizes. Firstly, a multi-scale dilated convolution module was designed to construct a lightweight feature extraction backbone. Secondly, a mixed attention module was designed to help the network focus more on meaningful features. Finally, a pyramid pooling module was deployed to fuse the multi-level features extracted by the network, thereby improving the grasp perception capability for the object. On the Cornell grasping dataset, RTGN generated grasps at a speed of 142 frames per second and attained accuracy rates of 98.26% and 97.65% on image-wise and object-wise splits, respectively. In real-world robotic grasping experiments, RTGN obtained a success rate of 96.0% in 400 grasping attempts across 20 novel objects. Experimental results demonstrate that RTGN outperforms existing methods in both detection accuracy and detection speed. Furthermore, RTGN shows strong adaptability to variations in the position and pose of grasped objects, effectively generalizing to novel objects of diverse shapes, types and sizes.
A new pre-stressed pipe pile foundation with an enlarged spudcan was proposed. Unlike the soil squeezing effect of pipe piles with a uniform section, the expansion at the pile toe cannot be regarded as an ideal sphere because the spudcan is larger than the prefabricated pipe pile; it is much closer to an oblate spheroid (rotational ellipsoid). The soil squeezing displacement field induced by construction around the new pile foundation with an enlarged spudcan was solved analytically based on an oblate spheroid expansion source, and the volume deformation of the plastic region of soil around the pile was considered based on the cavity expansion theory. The calculation results of the oblate spheroid expansion strain path method were modified accordingly. The soil squeezing effect of the pile foundation with an enlarged spudcan was analyzed by employing the analytical solution. Results show that, according to the distribution of the displacement field, the surrounding soil can be roughly divided into three areas: near the ground surface, near the pile toe and near the pile shaft. As the shape of the expansion source becomes flatter, the horizontal “squeezing” effect in the area near the pile shaft is slightly enhanced, while the “uplift” displacement in the area near the ground surface and the vertical displacement in the area near the pile toe are significantly reduced, which is very beneficial for the construction of soil-squeezing piles.
A multi-strategy alternating optimization (MSAO) algorithm was proposed for covert transmission in the reconfigurable intelligent surface (RIS)-assisted dual-function radar and communication (DFRC) system. Under covertness constraints, radar constant modulus constraints and total power constraints, the communication beamforming vector, radar signal covariance matrix and RIS phase shift matrix were jointly designed to maximize the covert communication rate of the legitimate user Bob and the probing power at the target, in order to achieve a tradeoff between covert communication and radar sensing. For both perfect and imperfect channel state information of Willie, the simulation results show that, compared with the traditional single-connected RIS and systems without RIS deployment, deploying the RIS in a generalized fully connected mode can yield better transmit beamforming maps, increase the upper limit of Bob’s covert communication rate, expand the achievable range of rates, and achieve greater freedom in communication and sensing functions.
Three-dimensional models of the proton exchange membrane fuel cell (PEMFC) with parallel, serpentine and leaf vein flow fields were established, and the oxygen distribution characteristics of the catalytic layer (CL) in the different models were analyzed. Corresponding porosity gradient distribution schemes in the gas diffusion layer (GDL) were proposed for the different flow field models. The oxygen molar fraction distribution, membrane current density distribution, polarization curve and power density curve in the different flow field models were analyzed. Results show that the proposed porosity gradient distribution scheme can effectively enhance the oxygen transfer from the GDL to the CL, alleviate the local oxygen supply deficiency of the CL, and enhance the output performance of the PEMFC. Compared with the parallel, serpentine and leaf vein flow field models with the conventional porosity distribution, the peak power density of the flow field models using the porosity gradient distribution increased by 8.59%, 18.26% and 15.46%, respectively.
A road network extraction method based on a lightweight Transformer, named RoadViT, was proposed to address limitations of the existing methods, such as imprecise road region extraction and limited real-time performance. The MobileViT architecture, which combines convolutional neural networks and the Transformer, was used to encode features in order to efficiently extract high-level context information. Then a pyramid decoder was proposed to implement the extraction and fusion of multi-scale features, and the probability distribution of pixel categories was generated. The Mosaic method was combined with multi-scale scaling and random cropping strategies to implement data augmentation, which could construct fine and varied remote sensing images. A dynamic weighting loss function was proposed to mitigate the imbalance between the road category and the background category in urban remote sensing images. The experimental results show that RoadViT, with only 1.25 × 10^6 parameters, can achieve an inference speed of up to 10 frames per second on the Jetson TX2, and an accuracy of up to 57.0% on the CHN6-CUG dataset. The proposed method is an effective exploration of the lightweight Transformer in urban remote sensing images, which can achieve improved road extraction accuracy while maintaining the real-time performance of inference.
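One way to weight a loss dynamically against class imbalance is sketched below. The abstract does not specify the loss function, so everything here is an assumption: a binary cross-entropy whose class weights are recomputed for each batch from the observed fraction of road pixels, with illustrative names `dynamic_class_weights` and `weighted_bce`.

```python
import math


def dynamic_class_weights(labels, eps=1e-6):
    """Per-batch class weights from pixel frequencies (assumed scheme).

    Each class is weighted by the frequency of the other class, so the
    rarer class (usually road, label 1) receives the larger weight.
    """
    frac_road = sum(labels) / len(labels)
    return {1: 1.0 - frac_road + eps, 0: frac_road + eps}


def weighted_bce(probs, labels):
    """Mean binary cross-entropy with the dynamic class weights above.

    probs: predicted road probabilities per pixel; labels: 0 or 1 per pixel.
    """
    weights = dynamic_class_weights(labels)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
        log_term = math.log(p) if y == 1 else math.log(1.0 - p)
        total += -weights[y] * log_term
    return total / len(labels)


if __name__ == "__main__":
    # Hypothetical 4-pixel batch with a single road pixel.
    labels = [1, 0, 0, 0]
    print(dynamic_class_weights(labels))
    print(weighted_bce([0.1, 0.1, 0.1, 0.1], labels))  # road pixel missed
    print(weighted_bce([0.9, 0.9, 0.1, 0.1], labels))  # one false alarm
```

With weights of this kind, missing a rare road pixel costs more than a false alarm on the abundant background, which counteracts the tendency of an unweighted loss to favor predicting background everywhere.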