Foundational models in natural language processing, computer vision and multimodal learning have achieved significant breakthroughs in recent years, showcasing the potential of general artificial intelligence. However, these models still fall short of human or animal intelligence in areas such as causal reasoning and understanding physical commonsense. This is because these models primarily rely on vast amounts of data and computational power, lacking direct interaction with and experiential learning from the real world. Many researchers are beginning to question whether merely scaling up model size is sufficient to address these fundamental issues. This has led the academic community to reevaluate the nature of intelligence, suggesting that intelligence arises not just from enhanced computational capabilities but from interactions with the environment. Embodied intelligence is gaining attention as it emphasizes that intelligent agents learn and adapt through direct interactions with the physical world, exhibiting characteristics closer to biological intelligence. A comprehensive survey of embodied artificial intelligence was provided in the context of foundational models. The underlying technical ideas, benchmarks, and applications of current embodied agents were discussed. A forward-looking analysis of future trends and challenges in embodied AI was offered.
A novel detection algorithm, ES-YOLOv5, which extends YOLOv5 with efficient multi-scale attention (EMA) and a small object detection layer, was proposed for small ship targets in synthetic aperture radar (SAR) scenes, addressing the inconspicuous imaging features and the low detection accuracy caused by the arbitrary orientation of small targets in SAR imaging. A small target detection layer was added to adjust the receptive field size, making it more suitable for capturing small target scale features and facilitating multi-scale fusion. An EMA mechanism was introduced to focus on key target information and enhance feature representation capability. The circular smooth label (CSL) technique was utilized to adapt to the periodicity of angles, achieving high-precision angle classification. The experimental results demonstrate that the proposed method achieves an average detection accuracy of 90.9% at an intersection over union (IoU) threshold of 0.5 on the RSDD-SAR dataset. The algorithm improves the precision of small SAR ship target detection by 6% over the baseline YOLOv5, significantly enhancing the model’s detection performance.
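The idea behind circular smooth labels can be illustrated with a minimal sketch. It assumes 180 one-degree angle bins and a Gaussian smoothing window with a hypothetical width `sigma`; the abstract does not state which bin count or window function the authors used, so these are illustrative choices only.

```python
import math

def circular_smooth_label(angle_deg, num_bins=180, sigma=6.0):
    """Encode an angle as a smoothed classification target over circular bins.

    num_bins and sigma are illustrative assumptions, not values from the abstract.
    """
    center = int(round(angle_deg)) % num_bins
    label = []
    for i in range(num_bins):
        # Circular distance: bin 0 and bin num_bins - 1 are neighbours.
        d = abs(i - center)
        d = min(d, num_bins - d)
        label.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
    return label

# An angle near the boundary still gives a high target value to bins just past the wrap-around.
label = circular_smooth_label(179)
```

Because the distance wraps around, bins 178 and 0 receive the same target value for an angle of 179 degrees, which is how this kind of encoding reflects angular periodicity.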
A multi-layer degradation module was proposed to address two problems: most remote sensing image super-resolution models rarely consider the impact of noise, blur, JPEG compression, and other factors on image reconstruction, and Transformer modules are limited in capturing high-frequency information. A CNN-Transformer hybrid network was designed, in which the CNN captures high-frequency details and the Transformer extracts global information. The two components were combined by an attention-based aggregation module, enhancing local high-frequency detail reconstruction while maintaining global structural coherence. The model was tested on six random scenes from the AID dataset and compared with the MM-realSR model in terms of PSNR and SSIM. Results show an average PSNR improvement of 1.61 dB and an SSIM increase of 0.023 over MM-realSR.
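For readers unfamiliar with the PSNR metric used above, the following is a minimal sketch of its standard definition. It assumes 8-bit images flattened to lists of pixel values; the function name and the `max_value` default are illustrative and not taken from the paper.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# A PSNR gain of g dB corresponds to the mean squared error shrinking by 10 ** (g / 10).
mse_ratio = 10 ** (1.61 / 10)
```

Under this definition, a 1.61 dB gain means the mean squared error is lower by a factor of about 1.45.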
Research on the influence of emotions on false memory helps explore the memory-processing mechanisms of the brain. EEG signals of false memories under different emotional states were collected. Microstate analysis was used to obtain the template maps for each emotion group, named microstate 1 to microstate 5. The four stages of memory recognition (early processing, familiar processing, episodic recall processing and post-extraction processing) were segmented in time for each emotion group according to the microstate fitting results, and phase-locked brain functional networks were constructed in the microstates with significant differences in time coverage. Analysis of the EEG signals from both the temporal and the spatial perspectives shows that the brain processing patterns of the emotion groups begin to differ from the episodic recall processing stage onward. The positive group remains in the active microstates 3 and 5 of the prefrontal region and has strong brain function, the negative group remains in microstate 1 and has poor brain function, and the neutral group remains in the active microstates 3 and 4 of the central region. The positive group spends more time and mental resources on plot association and reasoning, the negative group stays depressed for a longer time, and the neutral group devotes more time and mental resources to information integration.
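The time coverage statistic mentioned above is commonly taken as the fraction of time points assigned to each microstate. A minimal sketch under that assumption follows; the label sequence is made up for illustration and is not data from the study.

```python
from collections import Counter

def time_coverage(labels):
    """Fraction of time points assigned to each microstate label."""
    counts = Counter(labels)
    total = len(labels)
    return {state: counts[state] / total for state in sorted(counts)}

# Hypothetical fitted label sequence, one microstate label per EEG sample.
labels = [1, 1, 3, 3, 3, 5]
coverage = time_coverage(labels)
```

Comparing such coverage values between emotion groups, stage by stage, is one way the group differences described above could be quantified.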
An improved YOLOv7-tiny detection algorithm was proposed to address problems such as the many types of surface defects in aluminum profiles, the large differences in defect scales and the missed detection of small target defects. The spatial pyramid pooling module was reconstructed by utilizing the residual structure, parameter-free attention mechanism (SimAM), activation function (FReLU) and clipping convolution to capture more detailed information and strengthen the multi-scale learning ability of the network. The optimized detection layer was used to obtain more small target features and location information, improving the network’s ability to detect multi-scale defects. Partial convolution was introduced to replace the 3×3 convolution in the efficient layer aggregation network (ELAN), making the model lightweight and reducing the computing and training burden. A loss based on the normalized Wasserstein distance (NWD) similarity measure was incorporated to accelerate network convergence and improve the detection of small target defects. Tests were conducted on the Tianchi aluminum profile dataset, and the results showed that the improved YOLOv7-tiny algorithm achieved an accuracy, recall, mean average precision (mAP@0.5) and detection speed of 95.0%, 91.8%, 94.5% and 45 frames per second, respectively, when the confidence threshold was 0.25. Compared with the original algorithm, the mAP@0.5 of the improved algorithm was increased by 4.2 percentage points overall, and the average precision (AP) of the dirty spot defect was increased by 13.1 percentage points. The detection results of the improved algorithm for low-resolution images and interfered images were better than those of the original algorithm, which showed that the proposed method had better generalization and anti-interference ability.
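The NWD similarity can be sketched as follows, using the commonly cited formulation in which each box is modelled as a 2D Gaussian and the squared Wasserstein distance reduces to a Euclidean distance between (cx, cy, w/2, h/2) vectors. The normalising constant `c` is dataset dependent; the default here is a placeholder assumption, and the abstract does not say how the loss was combined with other terms.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes given as (cx, cy, w, h).

    c is a dataset-dependent constant; 12.8 is only a placeholder here.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)

def nwd_loss(box_a, box_b, c=12.8):
    """Loss that is 0 for identical boxes and approaches 1 as they move apart."""
    return 1.0 - nwd(box_a, box_b, c)
```

Unlike IoU, this measure stays informative when two small boxes do not overlap at all, which is the usual motivation for using it on small targets.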
To address the heavy load, large hysteresis and large friction disturbance of the segment assembly machine, precise control of the hydraulic translation system under friction disturbances was pursued through accurate model identification and the iPIDD2 algorithm, with the aim of improving the accuracy and efficiency of automatic segment assembly. First, a signal preprocessing method combining multiple noise-reduction algorithms was proposed based on the theoretical model to preprocess the output signal. Subsequently, a deviation-compensating recursive least squares identification algorithm with a forgetting factor was adopted to obtain a more accurate hydraulic system model. The iPIDD2 control algorithm was then proposed to achieve precise control of the translation cylinder, and thus of the translational motion of the assembly machine, under friction disturbances. The research results were validated through AMESim-Simulink co-simulation and on a purpose-built electro-hydraulic servo experimental platform with a real-time control system. Full-scale experimental verification was conducted under different load conditions. Results showed that, compared with PID, this method had better control precision and smaller hysteresis time under parameter uncertainty and friction disturbance. The displacement tracking of this method was stable. The state error was less than 3 mm, which was 77.6% smaller than the maximum tracking error of PID control, and the hysteresis time was reduced by more than 10 s. This method holds significant potential for improving the precision and efficiency of automatic shield segment assembly under friction disturbances.
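The identification step can be illustrated with a plain recursive least squares estimator with a forgetting factor. This sketch deliberately omits the deviation-compensation term of the authors' method, and the first-order toy plant, the input signal and the values of `lam` and `p0` are all assumptions made for illustration.

```python
import math

def rls_forgetting(regressors, outputs, lam=0.98, p0=1e4):
    """Recursive least squares with forgetting factor lam (no bias compensation)."""
    n = len(regressors[0])
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for phi, y in zip(regressors, outputs):
        P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
        gain = [v / denom for v in P_phi]
        error = y - sum(theta[i] * phi[i] for i in range(n))
        theta = [theta[i] + gain[i] * error for i in range(n)]
        # P is symmetric, so phi^T P equals (P phi)^T.
        P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)] for i in range(n)]
    return theta

# Hypothetical noise-free plant: y[k] = a * y[k-1] + b * u[k-1].
a_true, b_true = 0.9, 0.5
u = [math.sin(0.3 * k) + 0.5 * math.cos(1.1 * k) for k in range(200)]
y = [0.0]
for k in range(1, 200):
    y.append(a_true * y[k - 1] + b_true * u[k - 1])
regressors = [[y[k - 1], u[k - 1]] for k in range(1, 200)]
theta = rls_forgetting(regressors, y[1:])
```

The forgetting factor discounts older samples so the estimate can follow slowly varying parameters; with noisy measurements the plain estimator is biased, which is the kind of error a compensation term is meant to reduce.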
A cloud removal network for remote sensing imagery that integrated synthetic aperture radar (SAR) and optical data was proposed to address the issues of unstable performance and uneven color tones in existing deep learning-based cloud removal methods. The true texture information from SAR images and the spatial-spectral feature information from optical images were used to construct feature reconstruction tasks both globally and locally, and these tasks guided the network to rebuild missing information in cloud-covered areas. The dual-activation gated convolutional blocks and the channel attention blocks were utilized to build a spatial-spectral feature inference and reconstruction block which significantly enhanced the network’s ability to extract features from useful information in non-cloud areas. The SEN12MS-CR-TS dataset was divided into four subsets based on different cloud morphologies and cloud contents for training and testing. The experimental results showed that the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) of the proposed method were 1.0384 dB and 0.0915, respectively, which were higher than those of the best cloud removal methods. Thus the remote sensing image thick cloud removal network, which integrates SAR and optical data, can effectively remove clouds from images and reconstruct the details beneath the clouds.
A classification method based on the convolutional block attention module (CBAM) and the Inception-V4 convolutional neural network was proposed to improve the classification accuracy of group EEG signals of imagined speech. CBAM was used to emphasize significant localized areas and extract distinctive features from the output feature map of the convolutional neural network (CNN), so as to improve the classification performance of group EEG signals of imagined speech. The group EEG signals of imagined speech were converted into time-frequency images by short-time Fourier transform, and the images were then used to train the Inception-V4 network incorporating CBAM. Experiments on an open-access dataset showed that the proposed method achieved an accuracy of 52.2% in classifying six types of short words, which was 4.1 percentage points higher than that of Inception-V4 and 5.9 percentage points higher than that of VGG-16. Furthermore, the training time can be reduced greatly with transfer learning.
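The conversion of a signal into a time-frequency image can be sketched with a small short-time Fourier transform. The frame length, hop size and Hann window below are assumptions, since the abstract does not give the transform settings, and the direct DFT is written for clarity rather than speed.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Synthetic test tone centred on frequency bin 8 (not EEG data).
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
image = stft_magnitude(tone)
```

Each row of `image` is one time slice, so stacking the rows gives the two-dimensional array that would be rendered as an image and passed to a CNN.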
A path planning method integrating the B-spline technique and the genetic algorithm was proposed to address the path planning problem of robots in complex obstacle environments. Firstly, a strategy based on the multi-objective A* algorithm was designed to generate path data points and inversely compute the control points, producing a high-quality initial population so as to increase the population diversity and improve the early convergence speed of the algorithm. Secondly, a novel fitness function was designed by integrating the continuity, safety and shortness of the path, and the fitness value of each path was calculated. Then, an adaptive strategy was introduced to adjust the crossover and mutation operators to increase the diversity of individuals and avoid premature convergence to local optimal solutions. Finally, simulation experiments of the proposed algorithm were conducted in MATLAB. The experimental results in a complex static environment showed that the length of the robot traveling path generated by the proposed algorithm was reduced by an average of 8.22% and 2.15%, and the prematurity was reduced by an average of 88.31% and 77.08%, compared with the paths generated by the GABE and IPSO-SP methods, respectively. The paths were also second-order continuously differentiable (i.e., C2 continuous), which improved the robot’s traveling stability. In addition, navigation experiments on a robot operation platform verified that the proposed algorithm can complete path planning efficiently in real environments.
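How a B-spline turns control points into a smooth path can be sketched with a uniform cubic B-spline, which is C2 continuous at segment joints. The abstract does not state the spline degree or knot vector, so the uniform cubic form and the sample control points are assumptions.

```python
def cubic_bspline_point(p0, p1, p2, p3, t):
    """Point on one uniform cubic B-spline segment for t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0
    b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0
    b3 = t ** 3 / 6.0
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d for a, b, c, d in zip(p0, p1, p2, p3))

def sample_path(control_points, samples_per_segment=10):
    """Sample the whole curve, one segment per window of four control points."""
    path = []
    for i in range(len(control_points) - 3):
        segment = control_points[i:i + 4]
        for s in range(samples_per_segment):
            path.append(cubic_bspline_point(*segment, s / samples_per_segment))
    path.append(cubic_bspline_point(*control_points[-4:], 1.0))
    return path

# Hypothetical control points, for example decoded from one chromosome.
controls = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (5.0, 1.0), (6.0, 4.0)]
path = sample_path(controls)
```

In a genetic algorithm of this kind, the control points would be the genes, and a fitness function would be evaluated on the sampled `path`.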
A multi-lane cellular automata model was established to study the influence of long-distance weaving sections on traffic flow in urban expressways. Considering the lane-changing behavior and the intensity of lane-changing needs of vehicles at different positions within the long-distance weaving section, three distinct lane-changing rules were introduced and the long-distance weaving section was segmented accordingly. Cellular models under different traffic management strategies were constructed, considering factors such as dynamic safety distances and traffic flow management. Simulation revealed that mandatory lane-changing behavior within long-distance weaving sections easily led to localized congestion, forming bottlenecks at entrances and exits. Although the double dashed-line strategy provided more opportunities for lane-changing vehicles to exit, this advantage gradually diminished with an increasing occupancy rate. In comparison, the dashed-solid line strategy appeared more reasonable. The dashed-solid line strategy with a main road priority, while maintaining the right of way for vehicles exiting from the main road, inevitably sacrificed some efficiency in the movement of vehicles on the secondary road. However, considering the intermittent traffic flow characteristics of the secondary road, the solid-dashed line strategy 1 (the main road exits first, then followed by the secondary road) still held certain practical value.
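As background for cellular automaton traffic models, the sketch below implements one update of the classic single-lane Nagel-Schreckenberg rules on a ring road. It is not the authors' multi-lane model and contains no lane-changing rules; `v_max` and `p_slow` are illustrative parameters.

```python
import random

def nasch_step(positions, speeds, road_length, v_max=5, p_slow=0.3, rng=random):
    """One parallel update of the single-lane Nagel-Schreckenberg model on a ring."""
    n = len(positions)
    order = sorted(range(n), key=lambda i: positions[i])
    new_positions = list(positions)
    new_speeds = list(speeds)
    for rank, i in enumerate(order):
        leader = order[(rank + 1) % n]
        # Number of empty cells between this vehicle and the one ahead.
        gap = (positions[leader] - positions[i] - 1) % road_length
        v = min(speeds[i] + 1, v_max)   # accelerate
        v = min(v, gap)                 # brake to avoid a collision
        if v > 0 and rng.random() < p_slow:
            v -= 1                      # random slowdown
        new_speeds[i] = v
        new_positions[i] = (positions[i] + v) % road_length
    return new_positions, new_speeds

# Deterministic example (p_slow=0): the blocked vehicle stops, the free one accelerates.
result = nasch_step([0, 1], [3, 0], road_length=10, p_slow=0.0)
```

A multi-lane model adds a lane-changing sub-step before this movement step, where rules such as the mandatory and discretionary lane changes described above would be applied.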
A new semantic segmentation algorithm was proposed to address problems of existing methods, such as the difficulty of multi-scale feature extraction and the inaccuracy of target edge segmentation in remote sensing images. CNN and Efficient Transformer were utilized to construct a dual encoder to decouple context and spatial information. A feature fusion module was proposed to enhance the information interaction between the encoders, effectively fusing the global context and local detail information. A hierarchical Transformer structure was constructed to extract feature information at different scales, allowing the encoder to focus effectively on objects at different scales. An edge thinning loss function was proposed to mitigate the problem of inaccurate target edge segmentation. Experimental results showed that mean intersection over union (MIoU) values of 72.45% and 82.29% were achieved by the proposed algorithm on the ISPRS Vaihingen and ISPRS Potsdam datasets, respectively. On the SOTA, SIOR, and FAST subsets of the SAMRS dataset, the MIoU of the proposed algorithm was 88.81%, 97.29%, and 86.65%, respectively. The overall accuracy and MIoU were better than those of the comparison models. The proposed algorithm has good segmentation performance on various types of targets with different scales.
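The two reported metrics can be computed from a class confusion matrix, as in the following minimal sketch. It assumes rows are ground-truth classes and columns are predicted classes, and the small matrix is invented for illustration.

```python
def mean_iou(confusion):
    """Mean intersection over union across classes that appear in the data."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)

def overall_accuracy(confusion):
    """Fraction of pixels whose predicted class matches the ground truth."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

# Hypothetical two-class confusion matrix (pixel counts).
cm = [[3, 1], [1, 5]]
miou = mean_iou(cm)
oa = overall_accuracy(cm)
```

MIoU weights every class equally, so it penalises errors on small classes more than overall accuracy does.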
A multi-human-robot collaboration task allocation framework considering both caregiver’s fatigue and elderly satisfaction was proposed in order to balance the subjective feelings of caregivers and elderly people. A mathematical model of caregiver’s fatigue was established by considering factors such as caregiver’s rest duration before task execution, the rapport between caregivers and elderly people, and task difficulty. A multi-objective optimization model for multi-human-robot collaboration task allocation was developed combined with elderly satisfaction. A two-dimensional double-constraint encoding method and its reasonable initialization and updating methods were proposed based on the characteristics of common tasks in elderly care scenarios. A multi-objective evolutionary algorithm was employed to solve the multi-objective optimization model by using this encoding. The final task execution plan was determined from the Pareto optimal solution set according to the min-max and max-min principles in order to prevent situations where individual caregivers experience extreme fatigue or individual elderly people have extremely low satisfaction. The simulation results demonstrate that the multi-task allocation framework for ‘multiple caregivers and multiple robots’ collaboration can achieve task allocation within a multi-caregiver and multi-robot team in the proposed elderly care scenario while balancing caregiver’s fatigue and elderly satisfaction, as well as maintaining a balance between the overall and individual caregivers, and between the overall and individual elderly people.
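The final selection step can be sketched as follows. The abstract does not say how the min-max and max-min principles are combined, so this sketch assumes a lexicographic rule: first minimise the worst caregiver fatigue, then break ties by maximising the worst elderly satisfaction. The plan data and field names are hypothetical.

```python
def select_plan(pareto_plans):
    """Pick one plan from a Pareto set.

    Assumed rule: min-max on caregiver fatigue, ties broken by max-min on satisfaction.
    """
    return min(
        pareto_plans,
        key=lambda plan: (max(plan["fatigue"]), -min(plan["satisfaction"])),
    )

# Hypothetical Pareto set: per-caregiver fatigue and per-elder satisfaction for each plan.
plans = [
    {"name": "A", "fatigue": [0.2, 0.9], "satisfaction": [0.8, 0.8]},
    {"name": "B", "fatigue": [0.5, 0.6], "satisfaction": [0.7, 0.6]},
    {"name": "C", "fatigue": [0.6, 0.5], "satisfaction": [0.9, 0.7]},
]
chosen = select_plan(plans)
```

Plan A is rejected because one caregiver would be heavily fatigued; B and C tie on worst-case fatigue, and C wins because its least satisfied elderly person is better off.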
There are two major challenges in current research on pedestrian trajectory prediction: 1) how to effectively extract the spatial-temporal correlation between the preceding and subsequent frames of pedestrians; 2) how to avoid performance degradation due to the influence of sampling bias in the trajectory sampling process. In response to these two problems, a pedestrian trajectory prediction model was proposed based on the dual-attention spatial-temporal graph convolutional network and the purposive sampling network. Temporal attention was utilized to capture the correlation between the preceding and subsequent frames, and spatial attention was utilized to capture the correlation between the surrounding pedestrians. Subsequently, the spatial-temporal correlations between pedestrians were further extracted by spatial-temporal graph convolution. Meanwhile, a learnable sampling network was introduced to resolve the problem of uneven distribution caused by random sampling. Extensive experiments showed that the accuracy of this method was comparable to that of the current state-of-the-art methods on the ETH and UCY datasets, while the number of model parameters and the inference time were reduced by 1.65×10^4 and 0.147 s, respectively. On the SDD dataset, the accuracy slightly decreased, but the number of model parameters was reduced by 3.46×10^4, showing a good performance balance. The proposed model can provide a new effective way for pedestrian trajectory prediction.
Robotic harvesters face challenges in identifying apples under complex natural conditions such as unstable lighting, high fruit diversity, and severe leaf occlusion, which impede the capture of key features and reduce harvesting efficiency and accuracy. An enhanced apple detection algorithm based on the YOLOv7 model for complex scenarios was proposed. A contrast-limited adaptive histogram equalization technique was employed to enhance the contrast of apple images, reducing the background interference and clarifying the target contours. A multi-scale hybrid adaptive attention mechanism was introduced. The features were decomposed and reconstructed, and the spatial and channel attention directives were synergistically integrated to optimize multi-layer feature modeling over various distances, thereby boosting the model’s capability to extract apple features and resist background noise. Full-dimensional dynamic convolution was implemented to refine the feature selection process through a meticulous attention mechanism. The number of detection heads was increased to address the challenges of detecting small targets. The Meta-ACON activation function was used to optimize the attention allocation during the feature extraction process. Experimental results demonstrated that the improved YOLOv7 model achieved average accuracy and recall rates of 85.7% and 87.0%, respectively. Compared to Faster R-CNN, SSD, YOLOv5, and the original YOLOv7, the average detection precision was improved by 15.2, 7.5, 4.5, and 2.5 percentage points, and the average recall was improved by 13.7, 6.5, 3.6, and 1.3 percentage points, respectively. The model exhibits exceptional performance, providing robust technical support for apple growth monitoring and mechanical harvesting research.
The distribution law of lining defects in 19 high-speed railway tunnels was statistically analyzed, and the calculation method of surrounding rock pressure in tunnels with lining voids was modified. A numerical simulation was employed to analyze the combined defects’ impact on the internal forces and safety of super-large-span railway tunnels during their service period. Results show that combined defects of voids and thinning account for 39.3% of the statistical defects, making them the most frequent type in railway tunnels. The highest occurrence of combined defects is at the tunnel vault compared with other lining parts, with a frequency of 0.78. The surrounding rock pressure in the descending section of the void-affected region was replaced with a power function distribution, and a calculation method for the surrounding rock pressure in tunnels with voids was derived, which better reflects reality compared to the existing linear distribution. Combined defects cause significant variations in the internal forces and safety factors of super-large-span tunnel lining structures in both the defect-affected and influence zones, with the safety factor decreasing by up to 61.68%. Combined defects and lining degradation significantly impact the service safety of super-large-span tunnels, and the longer the service life, the more pronounced the effect on structural safety. Based on the minimum safety factors of 2.0, 2.4, and 2.8 for plain concrete, and 1.70, 2.04, and 2.38 for reinforced concrete, a safety evaluation and management standard for the linings of super-large-span railway tunnels with combined defects has been established.
Based on the statistics of typical cases of water and sand gushing in subway tunnel construction stages in China from 2002 to 2019, the disaster characteristics were analyzed from the aspects of disaster occurrence characteristics, disaster geological environment and hazard factors. According to the geological environment, causes and forms of disaster sources and engineering conditions, the disaster-causing structure of water and sand gushing in the subway tunnels was classified into 3 categories including 12 types. The first category is large-scale unfavorable geological bodies, including fault and weak fracture zone type, karst and underground rivers type, interlayer fracture zone type, weathering deep groove type, intrusive rocks type, and underwater sandy stratum type. The second category is water-bearing sand and soft soil stratum, including overlying/invading soft soil type, upper-soft and lower-hard composite stratum type, water-rich sandy stratum type, and ground cavity/water bag and silt stratum type. The third category is artificial underground water-rich space, including underground water transmission pipes type, abandoned mining spaces and air-raid shelters filled with water type. Three typical disaster modes of water and sand gushing in subway tunnels with the soil surrounding rock were proposed based on the mechanical characteristics of soil instability and failure, namely, sliding failure mode, breaking failure mode, and seepage failure mode.
A trusted distributed industrial data governance solution was designed based on blockchain technology in order to address the lack of a unified product data sharing service in current industrial systems, which limited users’ access to credible product traceability information. This solution enabled efficient and secure product data sharing and governance. Product data was compressed and encrypted off-chain by the data generator before it was submitted to the blockchain system. The system supported off-chain/on-chain data access through two types of blockchain transactions (producer transactions and data transactions) in order to ensure the availability of product data during the off-chain process. A hybrid access control mechanism was implemented to encrypt product data and share secret keys exclusively with authorized data users. This solution effectively protected the privacy of product data, provided fine-grained access control, and ensured end-to-end traceability of the entire product data generation process. Performance tests showed that the computation and communication costs during the key generation phase did not exceed 81.592 ms and 2.83 kB, respectively, on the secp256k1 elliptic curve (providing 128-bit security). The data submission phase incurred costs of no more than 50.251 ms and 3.59 kB, the data update phase did not exceed 251.596 ms, and the data retrieval time remained under 311.104 ms. Performance comparisons with similar schemes confirmed the efficiency of this solution.
An unmanned aerial vehicle (UAV) small target detection algorithm based on YOLOv5, termed FDB-YOLO, was proposed to address the significant issue of misidentification and omissions in traditional target detection algorithms when applied to UAV aerial photography of small targets. Initially, a small target detection layer was added on the basis of YOLOv5, and the feature fusion network was optimized to fully leverage the fine-grained information of small targets in shallow layers, thereby enhancing the network’s perceptual capabilities. Subsequently, a novel loss function, FPIoU, was introduced, which capitalized on the geometric properties of anchor boxes and utilized a four-point positional bias constraint function to optimize the anchor box positioning and accelerate the convergence speed of the loss function. Furthermore, a dynamic target detection head (DyHead) incorporating attention mechanism was employed to enhance the algorithm’s detection capabilities through increased awareness of scale, space, and task. Finally, a bi-level routing attention mechanism (BRA) was integrated into the feature extraction phase, selectively computing relevant areas to filter out irrelevant regions, thereby improving the model’s detection accuracy. Experimental validation conducted on the VisDrone2019 dataset demonstrated that the proposed algorithm outperformed the YOLOv5s baseline in terms of Precision by an increase of 3.7 percentage points, Recall by an increase of 5.1 percentage points, mAP50 by an increase of 5.8 percentage points, and mAP50:95 by an increase of 3.4 percentage points, showcasing superior performance compared to current mainstream algorithms.
The development and research status of YOLO algorithm in traffic object detection were systematically summarized from the perspective of the three core elements of 'people-vehicle-road' in order to comprehensively analyze the important role of YOLO (You Only Look Once) algorithm in improving traffic safety and efficiency. The commonly used evaluation indexes of YOLO algorithm were outlined, and the practical significance of these indexes in traffic scenarios was elaborately expounded. An overview of the core architecture of YOLO algorithm was provided, its development process was traced, and the optimization and improvement measures in each version iteration were analyzed. The research status and application scenarios of YOLO algorithm for traffic object detection were sorted out and discussed from the perspective of the three traffic objects 'people-vehicle-road'. The limitations and challenges of YOLO algorithm in traffic object detection were analyzed, and corresponding improvement methods were proposed. Future research focuses were anticipated, providing a research reference for the intelligent development of road traffic.
State-of-the-art data-driven intelligent computations (DDICs), which can reduce computing costs and improve solutions, were comprehensively reviewed in order to effectively solve the increasingly complex and expensive optimization problems (EOPs) emerging in real-world applications. The latest research achievements of DDICs were outlined from both algorithm and application perspectives. Various technical points in generalized DDICs and adaptive DDICs were summarized and categorized. The challenges and opportunities faced by DDICs in solving EOPs were analyzed. Potential future research trends were proposed, such as conducting deeper theoretical analyses, exploring novel learning paradigms, and applying these methods in various practical fields. This review aims to provide targeted references and directions for researchers, stimulating innovative ideas to more effectively address the complex EOPs encountered in real-world applications.