A vehicle motion planning algorithm based on deep reinforcement learning was proposed to satisfy the efficiency and comfort requirements of intelligent connected vehicles at unsignalized intersections. A temporal convolutional network (TCN) and a Transformer were combined to construct the intention prediction model for surrounding vehicles. Multi-layer convolution and self-attention mechanisms were used to improve the capture of vehicle motion features. The twin delayed deep deterministic policy gradient (TD3) reinforcement learning algorithm was employed to build the vehicle motion planning model. Taking the driving intentions of surrounding vehicles, driving style, interaction risk, and the comfort of the ego vehicle into consideration, the state space and reward functions were designed to enhance the understanding of the dynamic environment. Delayed policy updates and target policy smoothing were applied to improve the stability of the proposed algorithm, and the desired acceleration was output in real time. Experimental results demonstrated that the proposed motion planning algorithm can perceive the potential interaction risk in real time based on the driving intentions of surrounding vehicles. The generated motion planning strategy met the requirements of efficiency, safety and comfort. It showed excellent adaptability to different styles of surrounding vehicles and dense interaction scenarios, and the success rates exceeded 92.1% in various scenarios.
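The two stabilizing mechanisms of TD3 named in this abstract, delayed policy updates and target policy smoothing, can be illustrated with a minimal scalar sketch. This is not the authors' implementation: the function names, the noise parameters, and the acceleration bounds of ±3.0 are assumptions chosen only for illustration, and the critic values are passed in as plain numbers rather than computed by networks.

```python
import random


def smoothed_target_action(mu_next, sigma=0.2, noise_clip=0.5,
                           a_min=-3.0, a_max=3.0, rng=random):
    """Target policy smoothing: perturb the target action with clipped noise.

    mu_next is the target actor's output for the next state. The bounds
    a_min/a_max are assumed acceleration limits, not values from the paper.
    """
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
    return max(a_min, min(a_max, mu_next + noise))


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller twin-critic value."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


def actor_update_due(step, policy_delay=2):
    """Delayed policy update: refresh the actor every policy_delay critic steps."""
    return step % policy_delay == 0
```

In a full training loop, the critics would be regressed toward `td3_target(...)` at every step, while the actor and the target networks would be updated only when `actor_update_due(step)` is true.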
Foundational models in natural language processing, computer vision and multimodal learning have achieved significant breakthroughs in recent years, showcasing the potential of general artificial intelligence. However, these models still fall short of human or animal intelligence in areas such as causal reasoning and understanding physical commonsense. This is because these models primarily rely on vast amounts of data and computational power, lacking direct interaction with and experiential learning from the real world. Many researchers are beginning to question whether merely scaling up model size is sufficient to address these fundamental issues. This has led the academic community to reevaluate the nature of intelligence, suggesting that intelligence arises not just from enhanced computational capabilities but from interactions with the environment. Embodied intelligence is gaining attention as it emphasizes that intelligent agents learn and adapt through direct interactions with the physical world, exhibiting characteristics closer to biological intelligence. A comprehensive survey of embodied artificial intelligence was provided in the context of foundational models. The underlying technical ideas, benchmarks, and applications of current embodied agents were discussed. A forward-looking analysis of future trends and challenges in embodied AI was offered.
A new estimation network was proposed for improving the insufficient occlusion handling ability of existing human pose estimation methods. An occluded parts enhanced convolutional network (OCNN) and an occluded features compensation graph convolutional network (OGCN) were included in the proposed network. A high-low order feature matching attention was designed to strengthen the occlusion area features, and high-adaptation weights were extracted by OCNN, achieving enhanced detection of the occluded parts with a small amount of occlusion data. OGCN strengthened the shared and private attribute compensation node features by eliminating the obstacle features. The adjacency matrix was importance-weighted to enhance the quality of the occlusion area features and to improve the detection accuracy. The proposed network achieved detection accuracy of 78.5%, 67.1%, and 77.8% in the datasets COCO2017, COCO-Wholebody, and CrowdPose, respectively, outperforming the comparative algorithms. The proposed network saved 75% of the training data usage in the self-built occlusion dataset.
A driver-automation shared control strategy based on non-cooperative game (NCG) theory was proposed in order to reduce conflicting operations between the driver and the intelligent system during co-driving. The lane-keeping shared control problem was described mathematically by a first-order differential equation based on the linear two degree-of-freedom vehicle model. The NCG theory was employed to resolve the weight allocation problem of the shared control system, in which both decision makers act on the same dynamic system. The driving control authority was designed. Then the smooth transition of driving control authority between the driver and the intelligent system was achieved by utilizing the preview offset distance (POD) to update the confidence matrix. The computation of the desired front wheel angle for lane-keeping shared control was formulated, within the model predictive control (MPC) framework, as an online quadratic programming problem with a quadratic cost function and linear inequality constraints. The shared control strategy was validated on the driver-in-the-loop CarSim/Simulink platform. Results demonstrate that the strategy can guarantee both the lateral tracking accuracy and the priority of the driver’s control authority.
An enhanced genetic algorithm was proposed to address the challenge of area coverage path planning for a tilt-rotor unmanned aerial vehicle (TRUAV) amidst multiple obstacles. A preliminary coverage path plan for the designated task area was devised, utilizing the minimum spanning and back-and-forth path generation algorithms. The area coverage problem was transformed into a traveling salesman problem to optimize the sequence of the coverage path. A fishtail-shaped obstacle avoidance strategy was proposed to circumvent obstacles within the region. The nearest neighbor algorithm was introduced to generate a better initial population than that of the standard genetic algorithm. A three-point crossover operator and a dynamic interval mutation operator were adopted in the genetic processes to improve the global search capacity of the proposed algorithm and prevent it from falling into local optima. The efficacy of the proposed algorithm was tested through simulations in polygonal areas with multiple obstacles. Results showed that, compared to the sequential path coverage algorithm and the genetic algorithm, the proposed algorithm reduced the length of the coverage path by 7.80%, significantly enhancing the coverage efficiency of the TRUAV in the given task areas.
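The role of the nearest neighbor algorithm in seeding a genetic algorithm for a traveling salesman problem can be sketched as follows. This is a minimal illustration under assumptions: the nodes are treated as hypothetical 2-D waypoints with Euclidean distances, the tour is an open path, and the function names are not taken from the paper.

```python
import math


def nearest_neighbor_tour(points, start=0):
    """Greedy visiting order: always move to the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(points, tour):
    """Length of the open path that visits the points in tour order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(tour, tour[1:]))
```

Running the heuristic from several different `start` indices would yield a set of short tours that could serve as part of an initial population, in place of purely random permutations.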
A multi-modal augmented model for click-through rate (MMa4CTR) tailored for micro-video recommendation was proposed. Multi-modal data derived from user interactions with micro-videos were leveraged to construct embedded user representations and capture diverse user interests across modalities. Features were combined and crossed across modalities to reveal their latent semantic commonalities. The overall recommendation performance was boosted via two training strategies, automatic learning rate adjustment and validation interruption. A computationally efficient multi-layer perceptron architecture was employed to address the computational demands brought on by the vast amount of multi-modal data. Performance comparison experiments and hyperparameter sensitivity analyses on the WeChat Video Channel and TikTok datasets demonstrated that MMa4CTR outperformed baseline models, delivering superior recommendation results with minimal computational resources. Additionally, ablation studies performed on both datasets further validated the significance and efficacy of the micro-video modality cross module, the user multi-modal embedding layer, and the strategies for automatic learning rate adjustment and validation interruption in enhancing recommendation performance.
A new design scheme for a crab-like hexapod origami robot was proposed to address the single structure and insufficient movement flexibility of existing origami robots. The scheme combined the origami structure with multi-legged robot design and coupled Miura origami with six-fold origami. The motion configuration of the origami robot was expanded, and the motion flexibility of the origami robot was improved. Each leg of the robot has two degrees of freedom under the symmetry hypothesis. The vertices of the robot legs were treated as joints, and the crease lines were regarded as links. A planar link equivalent model of the robot legs was established with the folding angle as the motion variable. The theoretical range of motion for the robot’s foot was determined through simulation calculations. The tapered panel technique was then utilized to thicken the folding surfaces and prevent physical interference between adjacent folding surfaces. A three-dimensional model of the origami crab-like hexapod robot was constructed. The relationship between the folding angle and foot motion was analyzed based on the planar link equivalent model, and the foot motion trajectory and gait of the robot were designed. An experimental prototype of the origami bionic hexapod robot was designed and manufactured by using 3D printing technology, and the lateral movement of the robot was realized under the control of an STM32 microcontroller. Results show that the origami bio-inspired robot can realize the conversion from a planar configuration to a crab-like configuration. The robot can move smoothly left and right under the coordinated movement of its six legs.
A cross-domain recommendation model that utilizes source domain data augmentation and multi-interest refinement transfer was proposed to address two issues in cross-domain recommendation tasks: the difficulty of modeling interest preferences caused by the lack of user interaction data in the source domain, and the neglect of associations between multiple interests. A source-domain data augmentation strategy was introduced, generating a denoised auxiliary sequence for each user in the source domain. The sparsity of user interaction data in the source domain was thereby alleviated, and enriched user interest preferences were obtained. The interest extraction and multi-interest refinement transfer were implemented by utilizing the dual sequence multi-interest extraction module and the multi-interest refinement transfer module. Experiments were conducted on three public cross-domain recommendation evaluation tasks. The proposed model outperformed the best baseline, reducing the average MAE by 22.86% and the average RMSE by 19.65%, which verified the effectiveness of the method.
A multi-agent parking simulation framework was constructed in order to formulate autonomous vehicle (AV) parking demand management strategies. Two charging strategies for empty-load driving were proposed: a static charge based on driving distance and a dynamic charge based on road congestion levels. The rate calculation method was analyzed. Cost functions for parking lots, residential parking, and continuous empty cruising were established under these charging policies. A logit model was used to describe the choice behavior among the different parking modes. The simulation of urban mobility (SUMO) was used to conduct a large-scale road network simulation experiment in Nanning’s main urban area. AV parking behavior and road network operation under both strategies were analyzed. The simulation results showed that the empty-load driving mileage of AVs decreased by 20.16% and 10.85% under the static and dynamic charging strategies, respectively. Total vehicle delay decreased by 39.80% and 43.52%, respectively. The dynamic charging strategy was adjustable in real time based on road conditions, and the operational efficiency of the road network was significantly enhanced.
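How a logit model turns the costs of the parking modes into choice probabilities can be sketched as below. The standard multinomial logit form is assumed here; the sensitivity parameter `theta`, the mode names, and the cost values are hypothetical and do not come from the paper.

```python
import math


def logit_choice_probabilities(costs, theta=1.0):
    """Multinomial logit: P_i = exp(-theta * c_i) / sum_j exp(-theta * c_j).

    costs maps each parking mode to its generalized cost; theta is an
    assumed sensitivity parameter (larger values favor the cheapest mode more).
    """
    # Subtracting the minimum cost keeps exp() well scaled without
    # changing the resulting probabilities.
    c_min = min(costs.values())
    weights = {mode: math.exp(-theta * (c - c_min)) for mode, c in costs.items()}
    total = sum(weights.values())
    return {mode: w / total for mode, w in weights.items()}


# Hypothetical generalized costs for the three modes named in the abstract.
example = logit_choice_probabilities(
    {"parking_lot": 12.0, "residential": 9.0, "empty_cruising": 15.0},
    theta=0.3,
)
```

Under this form, raising the charge on empty-load driving increases the cost of the cruising mode and shifts probability toward the two parking options.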
A path planning method integrating the B-spline technique and the genetic algorithm was proposed, aiming at the path planning problem of robots in complex obstacle environments. Firstly, a strategy based on the multi-objective A* algorithm for generating path data points and inversely computing the control points was designed to generate a high-quality initial population, so as to increase the population diversity and improve the early convergence speed of the algorithm. Secondly, a novel fitness function was designed by integrating the continuity, safety and length of the path, and the fitness value of each path was calculated. Then, an adaptive strategy was introduced to adjust the crossover and mutation operators to increase the diversity of individuals and avoid premature convergence to local optimal solutions. Finally, simulation experiments of the proposed algorithm were conducted in MATLAB. The experimental results in complex static environments showed that, compared with the paths generated by the GABE and IPSO-SP methods, the length of the robot traveling path generated by the proposed algorithm was reduced by an average of 8.22% and 2.15%, and the prematurity was reduced by an average of 88.31% and 77.08%, respectively. The paths were second-order continuously differentiable (i.e., C2 continuous), which improved the robot’s traveling stability. In addition, navigation experiments on a robot operation platform verified that the proposed algorithm can complete path planning efficiently in real environments.
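The smoothness property mentioned in this abstract follows from the B-spline representation itself: a uniform cubic B-spline is C2 continuous where consecutive segments meet. The sketch below only evaluates such a curve from given control points; it assumes a uniform knot vector and hypothetical control points, and it does not reproduce the paper's inverse computation of control points or its genetic operators.

```python
def cubic_bspline_point(p0, p1, p2, p3, t):
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    return tuple(
        b0 * a + b1 * b + b2 * c + b3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_path(control_points, samples_per_segment=10):
    """Sample the whole curve; requires at least four control points."""
    path = []
    for i in range(len(control_points) - 3):
        segment = control_points[i:i + 4]
        for k in range(samples_per_segment):
            path.append(cubic_bspline_point(*segment, k / samples_per_segment))
    # Close the curve with the end point of the last segment.
    path.append(cubic_bspline_point(*control_points[-4:], 1.0))
    return path
```

Because each segment depends on only four neighboring control points, a mutation of one control point changes the path locally, which is one reason this encoding is convenient for evolutionary search.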
A digital twin system for the mobile operation of legged robots was proposed, encompassing the design of architecture, module structure, hardware framework, and software framework. The system enabled reliable and accurate acquisition of environmental states and robot states in mobile scenarios by integrating multiple sensor inputs and data sources. The point and line feature matching theory was used to optimize the autonomous positioning accuracy and the robustness of the legged robot, and the odometer functionality and the real-time mobile mapping were effectively achieved through integration with the environmental modeling data. A general modeling method was introduced to establish a digital twin model that ensured high consistency between the simulated robot motion state and the real robot motion state through error compensation techniques. Experimental results on both datasets and real robots demonstrated that the proposed digital twin system not only operated stably and efficiently across various legged robot platforms but also ensured the real-time state feedback and the odometer positioning accuracy. Compared with ORB-SLAM3, the memory overhead was reduced by about 68.7%, and the CPU usage was reduced by about 17.8%. The hardware experiments showed that the communication delay was basically consistent with the network delay of about 30 ms, which helped to improve the efficiency of task execution.
A comprehensive evaluation and categorization of blockchain-based mobile crowdsensing (MCS) data processing was conducted, in order to address the wide participation of users, the flexible mobility of collection devices, and the complexity of the communication environment in mobile crowdsensing data processing. Firstly, the developments of MCS and blockchain were reviewed, and the challenges of MCS data processing and the characteristics of blockchain were introduced. Secondly, a blockchain-based mobile crowdsensing architecture (BMCA) was designed to achieve decentralized data management, data security assurance, precise data quality evaluation, and enhanced credibility of incentives. Then, existing data processing techniques were categorized from the perspectives of privacy preservation, data quality evaluation, and incentive mechanisms. Finally, the current problems and challenges in resource consumption control, precise data analysis, full-cycle and differentiated privacy preservation, and integrated mode application of blockchain-based MCS data processing research were discussed, and potential future research directions were pointed out.
A multi-lane cellular automata model was established to study the influence of long-distance weaving sections on traffic flow in urban expressways. Considering the lane-changing behavior and the intensity of lane-changing needs of vehicles at different positions within the long-distance weaving section, three distinct lane-changing rules were introduced and the long-distance weaving section was segmented accordingly. Cellular models under different traffic management strategies were constructed, considering factors such as dynamic safety distances and traffic flow management. Simulation revealed that mandatory lane-changing behavior within long-distance weaving sections easily led to localized congestion, forming bottlenecks at entrances and exits. Although the double dashed-line strategy provided more opportunities for lane-changing vehicles to exit, this advantage gradually diminished with an increasing occupancy rate. In comparison, the dashed-solid line strategy appeared more reasonable. The dashed-solid line strategy with main road priority, while maintaining the right of way for vehicles exiting from the main road, inevitably sacrificed some efficiency in the movement of vehicles on the secondary road. However, considering the intermittent traffic flow characteristics of the secondary road, the dashed-solid line strategy 1 (the main road exits first, followed by the secondary road) still held certain practical value.
A multi-human-robot collaboration task allocation framework considering both caregiver’s fatigue and elderly satisfaction was proposed in order to balance the subjective feelings of caregivers and elderly people. A mathematical model of caregiver’s fatigue was established by considering factors such as caregiver’s rest duration before task execution, the rapport between caregivers and elderly people, and task difficulty. A multi-objective optimization model for multi-human-robot collaboration task allocation was developed combined with elderly satisfaction. A two-dimensional double-constraint encoding method and its reasonable initialization and updating methods were proposed based on the characteristics of common tasks in elderly care scenarios. A multi-objective evolutionary algorithm was employed to solve the multi-objective optimization model by using this encoding. The final task execution plan was determined from the Pareto optimal solution set according to the min-max and max-min principles in order to prevent situations where individual caregivers experience extreme fatigue or individual elderly people have extremely low satisfaction. The simulation results demonstrate that the multi-task allocation framework for ‘multiple caregivers and multiple robots’ collaboration can achieve task allocation within a multi-caregiver and multi-robot team in the proposed elderly care scenario while balancing caregiver’s fatigue and elderly satisfaction, as well as maintaining a balance between the overall and individual caregivers, and between the overall and individual elderly people.
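The final selection step, choosing one plan from the Pareto optimal set by the min-max and max-min principles, can be sketched as follows. The abstract does not state how the two principles are combined, so the ordering used here (min-max on caregiver fatigue first, then max-min on elderly satisfaction among the remaining plans) is an assumption, as are the dictionary keys and the example values.

```python
def select_plan(pareto_plans, tol=1e-9):
    """Pick one plan from a Pareto set.

    Step 1 (min-max): keep the plans whose most fatigued caregiver is
    least fatigued. Step 2 (max-min): among those, return the plan whose
    least satisfied elderly person is most satisfied.
    """
    best_worst_fatigue = min(max(p["fatigue"]) for p in pareto_plans)
    shortlist = [
        p for p in pareto_plans
        if max(p["fatigue"]) <= best_worst_fatigue + tol
    ]
    return max(shortlist, key=lambda p: min(p["satisfaction"]))


# Hypothetical Pareto set: per-caregiver fatigue and per-elder satisfaction.
plans = [
    {"name": "A", "fatigue": [0.9, 0.1], "satisfaction": [0.8, 0.8]},
    {"name": "B", "fatigue": [0.5, 0.5], "satisfaction": [0.6, 0.9]},
    {"name": "C", "fatigue": [0.5, 0.4], "satisfaction": [0.7, 0.7]},
]
chosen = select_plan(plans)
```

In this example plan A is rejected because one caregiver would be heavily fatigued, and plan C is preferred over plan B because its least satisfied elderly person is better off.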
There are two major challenges in current research on pedestrian trajectory prediction: 1) how to effectively extract the spatial-temporal correlation between the preceding and following frames of pedestrians; 2) how to avoid performance degradation due to the influence of sampling bias in the trajectory sampling process. In response to these two problems, a pedestrian trajectory prediction model was proposed based on the dual-attention spatial-temporal graph convolutional network and the purposive sampling network. Temporal attention was utilized to capture the correlation between the preceding and following frames, and spatial attention was utilized to capture the correlation between the surrounding pedestrians. Subsequently, the spatial-temporal correlations between pedestrians were further extracted by spatial-temporal graph convolution. Meanwhile, a learnable sampling network was introduced to resolve the problem of uneven distribution caused by random sampling. Extensive experiments showed that the accuracy of this method was comparable to that of the current state-of-the-art methods on the ETH and UCY datasets, while the number of model parameters and the inference time were reduced by 1.65×10⁴ and 0.147 s, respectively. On the SDD dataset, the accuracy slightly decreased, but the number of model parameters was reduced by 3.46×10⁴, showing a good performance balance. The proposed model can provide a new effective way for pedestrian trajectory prediction.
In order to improve the dynamic performance of the power conversion system (PCS), an improved active disturbance rejection control (ADRC) strategy based on reduced-order cascaded extended state observer (ESO) and complementary sliding mode control (CSMC) was designed and applied to the voltage outer loop of the bidirectional DC/AC converter in the PCS. The ESO was modified to a reduced-order cascaded ESO to improve the estimation speed of the state variables and the overall disturbance, enhancing the disturbance estimation capability. The PD control was replaced with CSMC to design a state error feedback law to enhance the robustness of the system, and an improved exponential reaching law was designed to suppress the chattering phenomenon. A simulation model was established and a related experimental platform was built to demonstrate the superiority of the improved ADRC strategy compared to PI control and traditional ADRC. The simulation and experimental results show that the improved ADRC strategy reduces the fluctuation of the DC bus voltage during the transient operation of the PCS, improves the power response speed on the AC side of the PCS, and enhances the output power quality on the AC side.
A thorough analysis and cross-comparison of recent relevant works was provided, outlining a closed-loop process for EEG data analysis based on deep learning. EEG data were introduced, and the application of deep learning in three key stages (preprocessing, feature extraction, and model generalization) was presented. The research ideas and solutions provided by deep learning algorithms in the respective stages were delineated, including the challenges and issues encountered at each stage. The main contributions and limitations of different algorithms were comprehensively summarized. The challenges faced and future directions of deep learning technology in handling EEG data at each stage were discussed.
A spatial-temporal multi-graph convolution traffic flow prediction model by integrating static and dynamic knowledge graphs was proposed, as current traffic flow prediction methods focus on the spatial-temporal correlation of traffic information and fail to fully take into account the influence of external factors on traffic. An urban traffic knowledge graph and four road network topological graphs with distinct semantics were systematically constructed, drawing upon the road traffic information and the external factors. The urban traffic knowledge graph was inputted into the relational evolution graph convolutional neural network to realize the knowledge embedding. The traffic flow matrix and the knowledge embedding were integrated using the knowledge fusion module. The four road network topology graphs and the traffic flow matrix with fused knowledge were fed into the spatial-temporal multi-graph convolution module to extract spatiotemporal features, and the traffic flow prediction value was outputted through the fully connected layer. The model performance was evaluated on a Hangzhou traffic data set. Compared with the advanced baseline, the performance of the proposed model improved by 5.76%-10.71%. Robustness experiment results show that the proposed model has a strong ability to resist interference.
A distributed low-carbon economic dispatch model for integrated energy that considered multiple flexible resources was proposed, aiming at the problem of insufficient flexibility and low-carbon performance of integrated energy systems (IES) with multiple parks. Firstly, the flexibility requirements of the system were analyzed, the IES flexibility margin constraints were proposed, and multiple flexibility resource models including carbon capture plants were constructed to make full use of the flexible operation mode of carbon capture plants. Secondly, ladder-type carbon trading was introduced to establish a two-tier scheduling model for the integrated energy system. The upper layer of the model aimed to minimize the cost of energy supply by energy suppliers, and the lower layer aimed to minimize the operating cost of energy operators consisting of energy hubs (EH). The model was solved by using the analytical target cascading method to achieve the collaborative scheduling between the upper and lower layers of the energy supplier and energy service provider with respect to the characteristics of the multi-subject operation. Finally, the positive effect of the proposed model on enhancing the system flexibility and low-carbon performance was verified through a case study consisting of the IEEE 30-node network, the Belgium 20-node gas network and multiple energy hubs.
A dynamic queue length estimation model based on probability statistics and Bayes' theorem was proposed to solve the problem of queue length estimation at intersections with mixed traffic of intelligent connected vehicles (ICVs) and human-driven vehicles (HDVs). Firstly, taking into account factors such as the position, speed, and penetration rate of ICVs in the queue, models for estimating the queue lengths of observable and unobservable queues, as well as the penetration rate, were constructed. Real-time estimation of queue lengths and penetration rate was achieved through iteration. Then, the distribution characteristics of ICVs in the queue under different penetration rate conditions were simulated using random seeds. The estimation accuracy of the model under different traffic conditions was analyzed. Comparison with an existing model showed that, under low ICV penetration rate conditions (10%) during off-peak hours, the mean absolute percentage error (MAPE) of the proposed model was 29.35%, while the existing model had an MAPE of 59.68%; during peak hours, the MAPE of the proposed model was 26.50%, compared to 34.66% for the existing model. Under high ICV penetration rate conditions (90%) during off-peak hours, the MAPE of the proposed model was 6.90%, while the existing model had an MAPE of 17.85%; during peak hours, the MAPE of the proposed model was 1.45%, compared to 1.05% for the existing model, the two errors being similar. The proposed queue estimation model for mixed traffic of ICVs and HDVs has better estimation accuracy under both low and high penetration rate conditions.
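The accuracy metric reported in this abstract, the mean absolute percentage error, can be computed as in the sketch below. The function name and the sample values are illustrative only, and skipping cycles with a zero observed queue (where the percentage error is undefined) is an assumption, not a rule stated in the paper.

```python
def mape(actual, estimated):
    """Mean absolute percentage error, in percent.

    Pairs whose actual value is zero are skipped because the percentage
    error is undefined for them (an assumption made for this sketch).
    """
    pairs = [(a, e) for a, e in zip(actual, estimated) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - e) / abs(a) for a, e in pairs) / len(pairs)


# Illustrative queue lengths (vehicles) per signal cycle, not measured data.
observed_queue = [10, 20, 8]
estimated_queue = [8, 25, 8]
error_percent = mape(observed_queue, estimated_queue)
```

Because each error is divided by the observed queue length, short queues weigh heavily in this metric, which is consistent with the larger errors reported for off-peak hours at low penetration rates.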