1. Micro-satellite Research Center, Zhejiang University, Hangzhou 310027, China
2. Zhejiang Key Laboratory of Micro-nano Satellite Research, Hangzhou 310027, China
Satellite observation task planning is characterized by complex constraints, a large solution space, and input task sequences of variable length. A deep reinforcement learning (DRL) method was used to solve this problem. The satellite observation task planning problem was modeled taking into account time-window constraints, transfer time between tasks, and satellite power and memory constraints. A sequential decision model was established based on the operating mechanism of pointer networks (PN); a Mask vector was used to enforce the various constraints of the planning problem, and the model was trained with the Actor-Critic reinforcement learning algorithm to maximize the reward. The PN was further improved by introducing the multi-head attention (MHA) mechanism, yielding the multi-head attention pointer network (MHA-PN) algorithm. Experimental results show that the MHA-PN algorithm significantly improves the training speed and the generalization performance of the model. The trained MHA-PN model performs end-to-end inference on the input sequence, avoiding the iterative solution process of traditional heuristic algorithms and thus solving instances with high efficiency.
Keywords: satellite observation task planning, combinatorial optimization problem, deep reinforcement learning, pointer network (PN), Actor-Critic, multi-head attention pointer network (MHA-PN)
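The decoding step the abstract describes, a PN-style pointer whose attention is computed with multiple heads and whose infeasible choices are removed by a Mask vector, can be sketched compactly. The PyTorch fragment below is a minimal illustration under assumed shapes and names (MaskedPointerHead, d_model, n_heads are not from the paper), not the authors' implementation:

```python
import torch
import torch.nn as nn

class MaskedPointerHead(nn.Module):
    """One decoding step: multi-head glimpse + masked single-head pointer."""

    def __init__(self, d_model: int = 128, n_heads: int = 8):
        super().__init__()
        self.glimpse = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)

    def forward(self, query, task_emb, infeasible):
        # query:      (B, 1, d)   decoder state at the current step
        # task_emb:   (B, N, d)   encoder output, one vector per candidate task
        # infeasible: (B, N) bool Mask vector, True = task violates a constraint
        #             (time window missed, power or memory budget exceeded, ...)
        g, _ = self.glimpse(query, task_emb, task_emb,
                            key_padding_mask=infeasible)
        # Pointer logits over tasks; masked entries get -inf so their
        # selection probability is exactly zero after the softmax.
        logits = (self.w_q(g) @ self.w_k(task_emb).transpose(1, 2)).squeeze(1)
        logits = logits.masked_fill(infeasible, float("-inf"))
        return torch.softmax(logits, dim=-1)  # (B, N) selection distribution
```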
Fig. 1 Model structure of the MHA-PN algorithm
Element | Setting | Element | Setting
ws | [0, 4.0] | m | [0, 0.01]
pos | [0, 0.5] | win | 1
we | [ws+0.03, ws+0.08] | acc | 1
d | [0.01, 0.02] | mem | 0.5
r | [0.1, 0.9] | pow | 0.5
e | [0, 0.01] | pos | 0

Tab. 1 Element settings for each task in the dataset
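For reference, an instance generator matching the uniform ranges in Tab. 1 could look like the following sketch; the field names come from the table, while the function itself and the interpretations in the comments are assumptions:

```python
import numpy as np

def sample_instance(n_tasks: int = 100, rng=np.random.default_rng()):
    """Draw one training instance from the uniform ranges in Tab. 1."""
    ws = rng.uniform(0.0, 4.0, n_tasks)                # window start
    tasks = {
        "ws":  ws,
        "we":  ws + rng.uniform(0.03, 0.08, n_tasks),  # window end
        "pos": rng.uniform(0.0, 0.5, n_tasks),         # task position
        "d":   rng.uniform(0.01, 0.02, n_tasks),       # observation duration
        "r":   rng.uniform(0.1, 0.9, n_tasks),         # task reward
        "e":   rng.uniform(0.0, 0.01, n_tasks),        # power cost (assumed meaning)
        "m":   rng.uniform(0.0, 0.01, n_tasks),        # memory cost (assumed meaning)
        "win": np.ones(n_tasks),                       # window flag
        "acc": np.ones(n_tasks),                       # accessibility flag
    }
    satellite = {"mem": 0.5, "pow": 0.5, "pos": 0.0}   # initial satellite state
    return tasks, satellite
```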
Fig. 2 Loss and reward curves for model training
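The curves in Fig. 2 are produced by Actor-Critic training, i.e. a policy-gradient update with a learned baseline. Below is a minimal sketch of one update step, assuming an actor that decodes a whole task sequence and a critic that predicts a scalar baseline; the names and interfaces (actor.rollout, critic) are illustrative, not the authors' code:

```python
import torch
import torch.nn.functional as F

def train_step(actor, critic, batch, opt_actor, opt_critic):
    # actor.rollout: decode a full schedule, return summed log-prob and
    # episode reward per instance (assumed interface).
    log_prob, reward = actor.rollout(batch)        # (B,), (B,)
    baseline = critic(batch).squeeze(-1)           # (B,) value estimate

    advantage = reward - baseline.detach()         # centre the return
    actor_loss = -(advantage * log_prob).mean()    # REINFORCE with baseline
    critic_loss = F.mse_loss(baseline, reward)     # fit baseline to reward

    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()
    return actor_loss.item(), critic_loss.item(), reward.mean().item()
```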
Fig. 3 Schematic diagram of the model's inference results
Algorithm | R /% | T /(s·epoch⁻¹) | Vrate /%
PN | 69.2 | 7 214.7 | —
MHA-PN | 69.6 | 5 770.9 | 20.0

Tab. 2 Comparison of reward rate and training time between the PN and MHA-PN algorithms
Fig. 4 Distribution of model reward at different sequence lengths
Algorithm | n=50 | n=100 | n=125 | n=150 | n=175 | n=200
PN | 68.75 | 53.05 | 44.72 | 32.88 | 27.38 | 25.31
MHA-PN | 69.45 | 53.36 | 48.91 | 44.43 | 41.68 | 38.11

Tab. 3 Comparison of mean model reward (%) at different sequence lengths
n | Algorithm | R /% | T′ /s
50 | ACO | 30.52 | 0.904
50 | MHA-PN | 59.45 | 0.194
75 | ACO | 23.31 | 1.497
75 | MHA-PN | 50.75 | 0.221
100 | ACO | 19.75 | 2.293
100 | MHA-PN | 45.73 | 0.370

Tab. 4 Comparison of reward rate and solution time between the MHA-PN and ACO algorithms