Safe hierarchical reinforcement learning framework for dynamic UAV navigation
Yiming SHANG, Changping DU, Rui YANG, Tianrui FANG, Ze’an DU, Yao ZHENG

Tab.4 Main control and environmental parameters in the simulation experiments

| Parameter | Value | Parameter | Value | Parameter | Value | Parameter | Value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $r_{\text{UAV}}$/m | 1.0 | $n_{\text{dyn}}$ | 2 | $W_{\text{AC}}$ | 0.5 | $W_{\text{cbf}}$ | 0.3 |
| $r_{\mathrm{g}}$/m | 3.0 | $W_{\mathrm{g}}$ | 200 | $\Delta t$/s | 0.1 | $W_{0}$ | 0.3 |
| $r_{\min}$/m | 2 | $W_{\text{dir}}$ | 1.8 | $d_{\text{far}}$/m | 70 | $\alpha_{\tan}$ | 0.8 |
| $r_{\max}$/m | 6 | $W_{\text{col}}$ | 20 | $d_{\text{nar}}$/m | 25 | $d_{\text{bnd}}$/m | 0.8 |
| $r_{\text{sen}}$/m | 10 | $W_{\text{bnd}}$ | 20 | $\alpha$ | 0.8 | $\lambda_{\text{smo}}$ | 0.3 |
| $v_{\max}$/(m·s⁻¹) | 5.0 | $W_{\text{sta}}$ | 1.0 | $\beta_{\text{DWA}}$ | 1.2 | $C_{\text{stp}}$ | 2 |
| $\omega_{\max}$/(rad·s⁻¹) | $\pi/2$ | $W_{\text{dyn}}$ | 2.0 | $\gamma$ | 0.15 | $C_{\text{ep}}$ | 15 |
| $v_{\mathrm{o}1}$/(m·s⁻¹) | 1.0 | $W_{\text{dev}}$ | 1.5 | $\delta$ | 0.02 | $\alpha_{\text{stp}}$ | 0.008 |
| $v_{\mathrm{o}2}$/(m·s⁻¹) | 2.5 | $W_{\text{spd}}$ | 1.2 | $\varepsilon$ | 0.6 | $\alpha_{\text{ep}}$ | 0.01 |
| $\theta_{\text{fov}}$/(°) | 120 | $W_{\text{eng}}$ | 1.8 | $d_{\text{TH}}$/m | 20 | $\beta_{\text{PDO}}$ | 0.85 |
| $N_{\max}$ | 8 | $W_{\text{tim}}$ | 0.3 | $v_{\text{TH}}$/(m·s⁻¹) | 0.3 | $\lambda_{\max}$ | 20 |
| $n_{\text{sta}}$ | 6 | $W_{\text{DWA}}$ | 1.5 | $t_{\text{TH}}$/s | 6 | $D_{\mathrm{s}}$/m | 3 |
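
As a rough illustration of how a few of the kinematic entries in Tab.4 relate to one another, the sketch below collects them in a plain Python dictionary and derives per-step bounds. It is not the authors' implementation. The dictionary keys are ASCII spellings of the table symbols chosen for this sketch, and the readings of $v_{\max}$ and $\omega_{\max}$ as the UAV's speed and turn-rate limits, $\Delta t$ as the simulation time step, and $r_{\text{sen}}$ as the sensing radius are assumptions inferred from the notation, since the table itself does not define them.

```python
import math

# Values copied from Tab.4. Key names are hypothetical ASCII spellings of the
# table symbols; the physical meaning in each comment is an assumption.
PARAMS = {
    "v_max": 5.0,              # m/s, assumed maximum UAV speed
    "omega_max": math.pi / 2,  # rad/s, assumed maximum turn rate
    "dt": 0.1,                 # s, assumed simulation time step
    "r_sen": 10.0,             # m, assumed sensing radius
}


def per_step_limits(params):
    """Return the largest displacement (m) and heading change (rad)
    achievable in one time step under the assumed speed and turn-rate limits."""
    max_step = params["v_max"] * params["dt"]
    max_turn = params["omega_max"] * params["dt"]
    return max_step, max_turn


max_step, max_turn = per_step_limits(PARAMS)

# Number of time steps needed to cover the assumed sensing radius at full speed.
steps_to_cross_sensing = PARAMS["r_sen"] / max_step

print(f"max displacement per step: {max_step:.2f} m")
print(f"max heading change per step: {max_turn:.4f} rad")
print(f"steps to cover r_sen at v_max: {steps_to_cross_sensing:.0f}")
```

Under these assumptions the UAV moves at most 0.5 m and turns at most $\pi/20$ rad per step, so an obstacle first detected at the edge of the sensing radius leaves roughly 20 steps of reaction time against a static obstacle, and fewer against one that is approaching.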