Please wait a minute...
Journal of ZheJiang University (Engineering Science)  2021, Vol. 55 Issue (2): 395-401    DOI: 10.3785/j.issn.1008-973X.2021.02.020
    
Satellite earth observation task planning method based on improved pointer networks
Yi-fan MA1,2(),Fan-yu ZHAO1,2,*(),Xin WANG1,2,Zhong-he JIN1,2
1. Micro-satellite Research Center, Zhejiang University, Hangzhou 310027, China
2. Zhejiang Key Laboratory of Micro-nano Satellite Research, Hangzhou 310027, China
Download: HTML     PDF(907KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

The satellite observation task planning has the characteristics of complex constraints, large solution space, and unfixed length of input task sequence. The deep reinforcement learning (DRL) method was used to solve the problems. The satellite observation task planning problem was modeled by taking into account the constraints of time windows, transfer time between tasks, and satellite power and memory constraints. A sequence decision algorithm model was established based on the operating mechanism of pointer networks (PN), Mask vector was used to consider various constraints in the satellite observation task planning problem, and the model was trained by Actor Critic reinforcement learning algorithm to obtain the maximum reward. The PN was improved by referring to the multi-head attention (MHA) mechanism, and the multi-head attention pointer networks (MHA-PN) algorithm was proposed. Experimental results show that the MHA-PN algorithm significantly improves the training speed and the generalization performance of the model, and the trained MHA-PN algorithm model can carry out end-to-end reasoning on the input sequence, avoiding the iterative solution process of traditional heuristic algorithm, has a high efficiency of solution.



Key wordssatellite observation task planning      combinatorial optimization problem      deep reinforcement learning      pointer networks (PN)      Actor Critic      multi-head attention pointer networks (MHA-PN)     
Received: 11 March 2020      Published: 09 March 2021
CLC:  V 474  
Fund:  国家杰出青年科学基金资助项目(61525403)
Corresponding Authors: Fan-yu ZHAO     E-mail: 21860251@zju.edu.cn;zfybit@zju.edu.cn
Cite this article:

Yi-fan MA,Fan-yu ZHAO,Xin WANG,Zhong-he JIN. Satellite earth observation task planning method based on improved pointer networks. Journal of ZheJiang University (Engineering Science), 2021, 55(2): 395-401.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2021.02.020     OR     http://www.zjujournals.com/eng/Y2021/V55/I2/395


基于改进指针网络的卫星对地观测任务规划方法

针对卫星观测任务规划问题约束复杂、求解空间大和输入任务序列长度不固定的特点,使用深度强化学习(DRL)方法对卫星观测任务规划问题进行求解. 综合考虑时间窗口约束、任务间转移机动时间和卫星电量、存储约束,对卫星观测任务规划问题进行建模. 基于指针网络(PN)的运行机制建立序列决策算法模型,使用Mask向量来考虑卫星观测任务规划问题中的各类约束,并通过Actor Critic强化学习算法对模型进行训练,以获得最大的收益率. 借鉴多头注意力(MHA)机制的思想对PN进行改进,提出多头注意力指针网络(MHA-PN)算法. 根据实验结果可以看出,MHA-PN算法显著提高了模型的训练速度和泛化性能,训练好的MHA-PN算法模型可以直接对输入序列进行端到端的推理,避免传统启发式算法迭代求解的过程,具有较高的求解效率.


关键词: 卫星观测任务规划,  组合优化问题,  深度强化学习,  指针网络(PN),  Actor Critic,  多头注意力指针网络(MHA-PN) 
Fig.1 Model structure of MHA-PN algorithm
元素 设定 元素 设定
ws [0,4.0] m [0, 0.01]
pos [0,0.5] win 1
we [ws+0.03,ws+0.08] acc 1
d [0.01,0.02] mem 0.5
r [0.1,0.9] pow 0.5
e [0,0.01] pos 0
Tab.1 Element settings for each task in dataset
Fig.2 Loss and reward curves for model training
Fig.3 Schematic diagram of reasoning results of model
算法 R /% T /(s·epoch?1 Vrate /%
PN 69.2 7 214.7 ?
MHA-PN 69.6 5 770.9 20.0
Tab.2 Comparison of reward rate and training time between PN algorithm and MHA-PN algorithm
Fig.4 Distribution of model reward at different lengths
%
算法 n=50 n=100 n=125 n=150 n=175 n=200
PN 68.75 53.05 44.72 32.88 27.38 25.31
MHA-PN 69.45 53.36 48.91 44.43 41.68 38.11
Tab.3 Mean comparison of model rewards at different lengths
n 算法 R /% $T' $ /s
50 ACO 30.52 0.904
50 MHA-PN 59.45 0.194
75 ACO 23.31 1.497
75 MHA-PN 50.75 0.221
100 ACO 19.75 2.293
100 MHA-PN 45.73 0.370
Tab.4 Comparison of reward rate and solution time between MHA-PN algorithm and ACO algorithm
[1]   刘晓路, 何磊, 陈英武, 等. 面向多颗敏捷卫星协同调度的自适应大邻域搜索算法[C]//第四届高分辨率对地观测学术年会论文集. 武汉: 国防科学技术大学信息系统与管理学院, 2017.
LIU Xiao-lu, HE Lei, CHEN Ying-wu, et al. An adaptive large neighborhood search algorithm for multiple agile satellites [C]// Proceedings of 4th Annual Conference on High Resolution Earth Observation. Wuhan: School of Information Systems and Management, National University of Defense Technology, 2017.
[2]   郭浩, 邱涤珊, 伍国华, 等 基于改进蚁群算法的敏捷成像卫星任务调度方法[J]. 系统工程理论与实践, 2012, (11): 185- 191
GUO Hao, QIU Di-shan, WU Guo-hua, et al Agile imaging satellite task scheduling method based on improved ant colony algorithm[J]. System Engineering Theory and Practice, 2012, (11): 185- 191
[3]   王法瑞. 基于改进遗传算法的微小卫星自主任务规划方法研究[D]. 哈尔滨: 哈尔滨工业大学, 2017.
WANG Fa-rui. Research on autonomous task planning method of micro satellite based on improved genetic algorithm [D]. Harbin: Harbin Institute of Technology, 2017.
[4]   贺仁杰, 高鹏, 白保存, 等 成像卫星任务规划模型、算法及其应用[J]. 系统工程理论与实践, 2011, (3): 29- 40
HE Ren-jie, GAO Peng, BAI Bao-cun, et al Imaging satellite mission planning model, algorithm and application[J]. System Engineering Theory and Practice, 2011, (3): 29- 40
[5]   陈英武, 方炎申, 李菊芳, 等 卫星任务调度问题的约束规划模型[J]. 国防科技大学学报, 2006, (5): 126- 132
CHEN Ying-wu, FANG Yan-shen, LI Ju-fang, et al Constrained programming model for satellite task scheduling problem[J]. Journal of National University of Defense Technology, 2006, (5): 126- 132
doi: 10.3969/j.issn.1001-2486.2006.05.026
[6]   HOCHREITER S, SCHMIDHUBER J Long short-term memory[J]. Neural Computation, 1997, 9 (8): 1735- 1780
doi: 10.1162/neco.1997.9.8.1735
[7]   CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling [J/OL]. [2020-02-29]. https://arxiv.org/abs/1412.3555.
[8]   BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate [J/OL]. [2020-02-29]. https://arxiv.org/abs/1409.0473.
[9]   VINYALS O, FORTUNATO M, JAITLY N. Pointer networks [C]// International Conference on Neural Information Processing Systems. Montreal: MIT Press, 2015.
[10]   BELLO I, PHAM H, LE Q V, et al. Neural combinatorial optimization with reinforcement learning [J/OL]. [2020-02-29]. https://arxiv.org/abs/1611.09940.
[11]   MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning [J/OL]. [2020-02-29]. https://arxiv.org/abs/1602.01783.
[12]   NAZARI M, OROOJLOOY A, SNYDER L, et al. Reinforcement learning for solving the vehicle routing problem [J/OL]. [2020-02-29]. https://arxiv.org/abs/1802.04240.
[13]   DAI H, KHALIL E B, ZHANG Y, et al. Learning combinatorial optimization algorithms over graphs [J/OL]. [2020-02-29]. https://arxiv.org/abs/1704.01665.
[14]   DAI H, DAI B, SONG L. Discriminative embeddings of latent variable models for structured data [J/OL]. [2020-02-29]. https://arxiv.org/abs/1603.05629.
[15]   KOOL W, HOOF H V, WELLING M. Attention, learn to solve routing problems! [J/OL]. [2020-02-29]. https://arxiv.org/abs/1803.08475.
[16]   VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [J/OL]. [2020-02-29]. https://arxiv.org/abs/1706.03762v4.
[17]   VINYALS O, BENGIO S, KUDLUR M. Order matters: sequence to sequence for sets [J/OL]. [2020-02-29]. https://arxiv.org/abs/1511.06391.
[18]   KINGMA D, BA J. Adam: a method for stochastic optimization [J/OL]. [2020-02-29]. https://arxiv.org/abs/1412.6980.
[19]   赵凡宇. 航天器多目标观测任务调度与规划方法研究[D]. 北京: 北京理工大学, 2015.
ZHAO Fan-yu. Research on scheduling and planning methods of spacecraft multi-object observation mission [D]. Beijing: Beijing Institute of Technology, 2015.
[1] WANG Hong-hao, WANG Hui-quan, JIN Zhong-he. Rollback-able on-board software upgrade method based on incremental link[J]. Journal of ZheJiang University (Engineering Science), 2015, 49(4): 724-731.
[2] DING Li-cong, JIN Xiao-jun, WANG Chun-hui, JIN Zhong-he. Design and on-orbit verification of ZDPS-1A power system[J]. Journal of ZheJiang University (Engineering Science), 2012, 46(11): 2073-2080.
[3] YANG Mu, WANG Hao, ZHANG Yu, ZHENG Wei, ZHENG Yang-ming, JIN Zhong-he. Design of radiation-hardened real-time operating system for
pico-satellites
[J]. Journal of ZheJiang University (Engineering Science), 2011, 45(6): 1021-1026.