Please wait a minute...
浙江大学学报(工学版)  2021, Vol. 55 Issue (2): 395-401    DOI: 10.3785/j.issn.1008-973X.2021.02.020
计算机与控制工程     
基于改进指针网络的卫星对地观测任务规划方法
马一凡1,2(),赵凡宇1,2,*(),王鑫1,2,金仲和1,2
1. 浙江大学 微小卫星研究中心,浙江 杭州 310027
2. 浙江省微纳卫星研究重点实验室,浙江 杭州 310027
Satellite earth observation task planning method based on improved pointer networks
Yi-fan MA1,2(),Fan-yu ZHAO1,2,*(),Xin WANG1,2,Zhong-he JIN1,2
1. Micro-satellite Research Center, Zhejiang University, Hangzhou 310027, China
2. Zhejiang Key Laboratory of Micro-nano Satellite Research, Hangzhou 310027, China
 全文: PDF(907 KB)   HTML
摘要:

针对卫星观测任务规划问题约束复杂、求解空间大和输入任务序列长度不固定的特点,使用深度强化学习(DRL)方法对卫星观测任务规划问题进行求解. 综合考虑时间窗口约束、任务间转移机动时间和卫星电量、存储约束,对卫星观测任务规划问题进行建模. 基于指针网络(PN)的运行机制建立序列决策算法模型,使用Mask向量来考虑卫星观测任务规划问题中的各类约束,并通过Actor Critic强化学习算法对模型进行训练,以获得最大的收益率. 借鉴多头注意力(MHA)机制的思想对PN进行改进,提出多头注意力指针网络(MHA-PN)算法. 根据实验结果可以看出,MHA-PN算法显著提高了模型的训练速度和泛化性能,训练好的MHA-PN算法模型可以直接对输入序列进行端到端的推理,避免传统启发式算法迭代求解的过程,具有较高的求解效率.

关键词: 卫星观测任务规划组合优化问题深度强化学习指针网络(PN)Actor Critic多头注意力指针网络(MHA-PN)    
Abstract:

The satellite observation task planning has the characteristics of complex constraints, large solution space, and unfixed length of input task sequence. The deep reinforcement learning (DRL) method was used to solve the problems. The satellite observation task planning problem was modeled by taking into account the constraints of time windows, transfer time between tasks, and satellite power and memory constraints. A sequence decision algorithm model was established based on the operating mechanism of pointer networks (PN), Mask vector was used to consider various constraints in the satellite observation task planning problem, and the model was trained by Actor Critic reinforcement learning algorithm to obtain the maximum reward. The PN was improved by referring to the multi-head attention (MHA) mechanism, and the multi-head attention pointer networks (MHA-PN) algorithm was proposed. Experimental results show that the MHA-PN algorithm significantly improves the training speed and the generalization performance of the model, and the trained MHA-PN algorithm model can carry out end-to-end reasoning on the input sequence, avoiding the iterative solution process of traditional heuristic algorithm, has a high efficiency of solution.

Key words: satellite observation task planning    combinatorial optimization problem    deep reinforcement learning    pointer networks (PN)    Actor Critic    multi-head attention pointer networks (MHA-PN)
收稿日期: 2020-03-11 出版日期: 2021-03-09
CLC:  V 474  
基金资助: 国家杰出青年科学基金资助项目(61525403)
通讯作者: 赵凡宇     E-mail: 21860251@zju.edu.cn;zfybit@zju.edu.cn
作者简介: 马一凡(1996—),男,硕士生,从事卫星自主任务规划研究. orcid.org/0000-0001-9762-0246. E-mail: 21860251@zju.edu.cn
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  
马一凡
赵凡宇
王鑫
金仲和

引用本文:

马一凡,赵凡宇,王鑫,金仲和. 基于改进指针网络的卫星对地观测任务规划方法[J]. 浙江大学学报(工学版), 2021, 55(2): 395-401.

Yi-fan MA,Fan-yu ZHAO,Xin WANG,Zhong-he JIN. Satellite earth observation task planning method based on improved pointer networks. Journal of ZheJiang University (Engineering Science), 2021, 55(2): 395-401.

链接本文:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2021.02.020        http://www.zjujournals.com/eng/CN/Y2021/V55/I2/395

图 1  MHA-PN算法模型结构
元素 设定 元素 设定
ws [0,4.0] m [0, 0.01]
pos [0,0.5] win 1
we [ws+0.03,ws+0.08] acc 1
d [0.01,0.02] mem 0.5
r [0.1,0.9] pow 0.5
e [0,0.01] pos 0
表 1  数据集中每个任务的元素设定
图 2  模型训练的损失和收益曲线
图 3  模型的推理结果示意图
算法 R /% T /(s·epoch?1 Vrate /%
PN 69.2 7 214.7 ?
MHA-PN 69.6 5 770.9 20.0
表 2  PN算法和MHA-PN算法收益率和训练时长的对比
图 4  不同长度下模型收益率的分布
%
算法 n=50 n=100 n=125 n=150 n=175 n=200
PN 68.75 53.05 44.72 32.88 27.38 25.31
MHA-PN 69.45 53.36 48.91 44.43 41.68 38.11
表 3  不同长度下模型收益率的均值对比
n 算法 R /% $T' $ /s
50 ACO 30.52 0.904
50 MHA-PN 59.45 0.194
75 ACO 23.31 1.497
75 MHA-PN 50.75 0.221
100 ACO 19.75 2.293
100 MHA-PN 45.73 0.370
表 4  MHA-PN算法和ACO算法收益率和求解时间的对比
1 刘晓路, 何磊, 陈英武, 等. 面向多颗敏捷卫星协同调度的自适应大邻域搜索算法[C]//第四届高分辨率对地观测学术年会论文集. 武汉: 国防科学技术大学信息系统与管理学院, 2017.
LIU Xiao-lu, HE Lei, CHEN Ying-wu, et al. An adaptive large neighborhood search algorithm for multiple agile satellites [C]// Proceedings of 4th Annual Conference on High Resolution Earth Observation. Wuhan: School of Information Systems and Management, National University of Defense Technology, 2017.
2 郭浩, 邱涤珊, 伍国华, 等 基于改进蚁群算法的敏捷成像卫星任务调度方法[J]. 系统工程理论与实践, 2012, (11): 185- 191
GUO Hao, QIU Di-shan, WU Guo-hua, et al Agile imaging satellite task scheduling method based on improved ant colony algorithm[J]. System Engineering Theory and Practice, 2012, (11): 185- 191
3 王法瑞. 基于改进遗传算法的微小卫星自主任务规划方法研究[D]. 哈尔滨: 哈尔滨工业大学, 2017.
WANG Fa-rui. Research on autonomous task planning method of micro satellite based on improved genetic algorithm [D]. Harbin: Harbin Institute of Technology, 2017.
4 贺仁杰, 高鹏, 白保存, 等 成像卫星任务规划模型、算法及其应用[J]. 系统工程理论与实践, 2011, (3): 29- 40
HE Ren-jie, GAO Peng, BAI Bao-cun, et al Imaging satellite mission planning model, algorithm and application[J]. System Engineering Theory and Practice, 2011, (3): 29- 40
5 陈英武, 方炎申, 李菊芳, 等 卫星任务调度问题的约束规划模型[J]. 国防科技大学学报, 2006, (5): 126- 132
CHEN Ying-wu, FANG Yan-shen, LI Ju-fang, et al Constrained programming model for satellite task scheduling problem[J]. Journal of National University of Defense Technology, 2006, (5): 126- 132
doi: 10.3969/j.issn.1001-2486.2006.05.026
6 HOCHREITER S, SCHMIDHUBER J Long short-term memory[J]. Neural Computation, 1997, 9 (8): 1735- 1780
doi: 10.1162/neco.1997.9.8.1735
7 CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling [J/OL]. [2020-02-29]. https://arxiv.org/abs/1412.3555.
8 BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate [J/OL]. [2020-02-29]. https://arxiv.org/abs/1409.0473.
9 VINYALS O, FORTUNATO M, JAITLY N. Pointer networks [C]// International Conference on Neural Information Processing Systems. Montreal: MIT Press, 2015.
10 BELLO I, PHAM H, LE Q V, et al. Neural combinatorial optimization with reinforcement learning [J/OL]. [2020-02-29]. https://arxiv.org/abs/1611.09940.
11 MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning [J/OL]. [2020-02-29]. https://arxiv.org/abs/1602.01783.
12 NAZARI M, OROOJLOOY A, SNYDER L, et al. Reinforcement learning for solving the vehicle routing problem [J/OL]. [2020-02-29]. https://arxiv.org/abs/1802.04240.
13 DAI H, KHALIL E B, ZHANG Y, et al. Learning combinatorial optimization algorithms over graphs [J/OL]. [2020-02-29]. https://arxiv.org/abs/1704.01665.
14 DAI H, DAI B, SONG L. Discriminative embeddings of latent variable models for structured data [J/OL]. [2020-02-29]. https://arxiv.org/abs/1603.05629.
15 KOOL W, HOOF H V, WELLING M. Attention, learn to solve routing problems! [J/OL]. [2020-02-29]. https://arxiv.org/abs/1803.08475.
16 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [J/OL]. [2020-02-29]. https://arxiv.org/abs/1706.03762v4.
17 VINYALS O, BENGIO S, KUDLUR M. Order matters: sequence to sequence for sets [J/OL]. [2020-02-29]. https://arxiv.org/abs/1511.06391.
18 KINGMA D, BA J. Adam: a method for stochastic optimization [J/OL]. [2020-02-29]. https://arxiv.org/abs/1412.6980.
19 赵凡宇. 航天器多目标观测任务调度与规划方法研究[D]. 北京: 北京理工大学, 2015.
ZHAO Fan-yu. Research on scheduling and planning methods of spacecraft multi-object observation mission [D]. Beijing: Beijing Institute of Technology, 2015.
[1] 汪宏浩, 王慧泉, 金仲和. 基于增量链接的可回滚星载软件在轨更新方法[J]. 浙江大学学报(工学版), 2015, 49(4): 724-731.
[2] 丁立聪,金小军,王春晖,金仲和. ZDPS-1A卫星电源系统设计与在轨验证[J]. J4, 2012, 46(11): 2073-2080.
[3] 杨牧, 王昊, 张钰, 郑伟, 郑阳明, 金仲和. 抗辐射加固的皮卫星用实时操作系统设计[J]. J4, 2011, 45(6): 1021-1026.