1. Micro-satellite Research Center, Zhejiang University, Hangzhou 310027, China
2. Zhejiang Key Laboratory of Micro-nano Satellite Research, Hangzhou 310027, China
Satellite observation task planning is characterized by complex constraints, a large solution space, and input task sequences of variable length. A deep reinforcement learning (DRL) method was used to solve this problem. The satellite observation task planning problem was modeled taking into account time-window constraints, transfer time between tasks, and satellite power and memory constraints. A sequential decision model was established based on the operating mechanism of pointer networks (PN); a Mask vector was used to enforce the various constraints of the planning problem, and the model was trained with the Actor-Critic reinforcement learning algorithm to maximize the reward. The PN was further improved by introducing the multi-head attention (MHA) mechanism, yielding the multi-head attention pointer network (MHA-PN) algorithm. Experimental results show that the MHA-PN algorithm significantly improves the training speed and the generalization performance of the model. The trained MHA-PN model performs end-to-end inference on the input sequence, avoiding the iterative solution process of traditional heuristic algorithms and thus solving instances with high efficiency.
Keywords: satellite observation task planning, combinatorial optimization problem, deep reinforcement learning, pointer network (PN), Actor-Critic, multi-head attention pointer network (MHA-PN)
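The decoding step the abstract describes, a PN-style pointer whose attention is computed with multiple heads and whose infeasible choices are removed by a Mask vector, can be sketched compactly. The PyTorch fragment below is a minimal illustration under assumed shapes and names (MaskedPointerHead, d_model, n_heads are not from the paper), not the authors' implementation:

```python
import torch
import torch.nn as nn

class MaskedPointerHead(nn.Module):
    """One decoding step: multi-head glimpse + masked single-head pointer."""

    def __init__(self, d_model: int = 128, n_heads: int = 8):
        super().__init__()
        self.glimpse = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)

    def forward(self, query, task_emb, infeasible):
        # query:      (B, 1, d)   decoder state at the current step
        # task_emb:   (B, N, d)   encoder output, one vector per candidate task
        # infeasible: (B, N) bool Mask vector, True = task violates a constraint
        #             (time window missed, power or memory budget exceeded, ...)
        g, _ = self.glimpse(query, task_emb, task_emb,
                            key_padding_mask=infeasible)
        # Pointer logits over tasks; masked entries get -inf so their
        # selection probability is exactly zero after the softmax.
        logits = (self.w_q(g) @ self.w_k(task_emb).transpose(1, 2)).squeeze(1)
        logits = logits.masked_fill(infeasible, float("-inf"))
        return torch.softmax(logits, dim=-1)  # (B, N) selection distribution
```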
Fig. 1 Model structure of the MHA-PN algorithm
Element | Setting | Element | Setting
ws | [0, 4.0] | m | [0, 0.01]
pos | [0, 0.5] | win | 1
we | [ws+0.03, ws+0.08] | acc | 1
d | [0.01, 0.02] | mem | 0.5
r | [0.1, 0.9] | pow | 0.5
e | [0, 0.01] | pos | 0

Tab. 1 Element settings for each task in the dataset
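For reference, an instance generator matching the uniform ranges in Tab. 1 could look like the following sketch; the field names come from the table, while the function itself and the interpretations in the comments are assumptions:

```python
import numpy as np

def sample_instance(n_tasks: int = 100, rng=np.random.default_rng()):
    """Draw one training instance from the uniform ranges in Tab. 1."""
    ws = rng.uniform(0.0, 4.0, n_tasks)                # window start
    tasks = {
        "ws":  ws,
        "we":  ws + rng.uniform(0.03, 0.08, n_tasks),  # window end
        "pos": rng.uniform(0.0, 0.5, n_tasks),         # task position
        "d":   rng.uniform(0.01, 0.02, n_tasks),       # observation duration
        "r":   rng.uniform(0.1, 0.9, n_tasks),         # task reward
        "e":   rng.uniform(0.0, 0.01, n_tasks),        # power cost (assumed meaning)
        "m":   rng.uniform(0.0, 0.01, n_tasks),        # memory cost (assumed meaning)
        "win": np.ones(n_tasks),                       # window flag
        "acc": np.ones(n_tasks),                       # accessibility flag
    }
    satellite = {"mem": 0.5, "pow": 0.5, "pos": 0.0}   # initial satellite state
    return tasks, satellite
```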
Fig. 2 Loss and reward curves for model training
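The curves in Fig. 2 are produced by Actor-Critic training, i.e. a policy-gradient update with a learned baseline. Below is a minimal sketch of one update step, assuming an actor that decodes a whole task sequence and a critic that predicts a scalar baseline; the names and interfaces (actor.rollout, critic) are illustrative, not the authors' code:

```python
import torch
import torch.nn.functional as F

def train_step(actor, critic, batch, opt_actor, opt_critic):
    # actor.rollout: decode a full schedule, return summed log-prob and
    # episode reward per instance (assumed interface).
    log_prob, reward = actor.rollout(batch)        # (B,), (B,)
    baseline = critic(batch).squeeze(-1)           # (B,) value estimate

    advantage = reward - baseline.detach()         # centre the return
    actor_loss = -(advantage * log_prob).mean()    # REINFORCE with baseline
    critic_loss = F.mse_loss(baseline, reward)     # fit baseline to reward

    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()
    return actor_loss.item(), critic_loss.item(), reward.mean().item()
```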
Fig. 3 Schematic diagram of the model's inference results
Algorithm | R /% | T /(s·epoch⁻¹) | Vrate /%
PN | 69.2 | 7 214.7 | —
MHA-PN | 69.6 | 5 770.9 | 20.0

Tab. 2 Comparison of reward rate and training time between the PN and MHA-PN algorithms
Fig. 4 Distribution of model reward at different sequence lengths
Algorithm | n=50 | n=100 | n=125 | n=150 | n=175 | n=200
PN | 68.75 | 53.05 | 44.72 | 32.88 | 27.38 | 25.31
MHA-PN | 69.45 | 53.36 | 48.91 | 44.43 | 41.68 | 38.11

Tab. 3 Comparison of mean model reward (%) at different sequence lengths
n | Algorithm | R /% | T′ /s
50 | ACO | 30.52 | 0.904
50 | MHA-PN | 59.45 | 0.194
75 | ACO | 23.31 | 1.497
75 | MHA-PN | 50.75 | 0.221
100 | ACO | 19.75 | 2.293
100 | MHA-PN | 45.73 | 0.370

Tab. 4 Comparison of reward rate and solution time between the MHA-PN and ACO algorithms