基于改进指针网络的卫星对地观测任务规划方法

doi:10.3785/j.issn.1008-973X.2021.02.020

浙江大学学报(工学版)

2021, Vol. 55

Issue (2): 395-401 DOI: 10.3785/j.issn.1008-973X.2021.02.020

计算机与控制工程

基于改进指针网络的卫星对地观测任务规划方法

马一凡1,2(

),赵凡宇1,2,*(

),王鑫1,2,金仲和1,2

1. 浙江大学微小卫星研究中心，浙江杭州 310027
2. 浙江省微纳卫星研究重点实验室，浙江杭州 310027

Satellite earth observation task planning method based on improved pointer networks

Yi-fan MA1,2(

),Fan-yu ZHAO1,2,*(

),Xin WANG1,2,Zhong-he JIN1,2

1. Micro-satellite Research Center, Zhejiang University, Hangzhou 310027, China
2. Zhejiang Key Laboratory of Micro-nano Satellite Research, Hangzhou 310027, China

全文: PDF(907 KB) HTML

摘要：

针对卫星观测任务规划问题约束复杂、求解空间大和输入任务序列长度不固定的特点，使用深度强化学习（DRL）方法对卫星观测任务规划问题进行求解. 综合考虑时间窗口约束、任务间转移机动时间和卫星电量、存储约束，对卫星观测任务规划问题进行建模. 基于指针网络（PN）的运行机制建立序列决策算法模型，使用Mask向量来考虑卫星观测任务规划问题中的各类约束，并通过Actor Critic强化学习算法对模型进行训练，以获得最大的收益率. 借鉴多头注意力（MHA）机制的思想对PN进行改进，提出多头注意力指针网络（MHA-PN）算法. 根据实验结果可以看出，MHA-PN算法显著提高了模型的训练速度和泛化性能，训练好的MHA-PN算法模型可以直接对输入序列进行端到端的推理，避免传统启发式算法迭代求解的过程，具有较高的求解效率.

关键词： 卫星观测任务规划; 组合优化问题; 深度强化学习; 指针网络（PN）; Actor Critic; 多头注意力指针网络（MHA-PN）

Abstract:

The satellite observation task planning has the characteristics of complex constraints, large solution space, and unfixed length of input task sequence. The deep reinforcement learning (DRL) method was used to solve the problems. The satellite observation task planning problem was modeled by taking into account the constraints of time windows, transfer time between tasks, and satellite power and memory constraints. A sequence decision algorithm model was established based on the operating mechanism of pointer networks (PN), Mask vector was used to consider various constraints in the satellite observation task planning problem, and the model was trained by Actor Critic reinforcement learning algorithm to obtain the maximum reward. The PN was improved by referring to the multi-head attention (MHA) mechanism, and the multi-head attention pointer networks (MHA-PN) algorithm was proposed. Experimental results show that the MHA-PN algorithm significantly improves the training speed and the generalization performance of the model, and the trained MHA-PN algorithm model can carry out end-to-end reasoning on the input sequence, avoiding the iterative solution process of traditional heuristic algorithm, has a high efficiency of solution.

Key words: satellite observation task planning combinatorial optimization problem deep reinforcement learning pointer networks (PN) Actor Critic multi-head attention pointer networks (MHA-PN)

收稿日期: 2020-03-11 出版日期: 2021-03-09

CLC:

V 474

基金资助: 国家杰出青年科学基金资助项目（61525403）

通讯作者: 赵凡宇 E-mail: 21860251@zju.edu.cn;zfybit@zju.edu.cn

作者简介: 马一凡(1996—)，男，硕士生，从事卫星自主任务规划研究. orcid.org/0000-0001-9762-0246. E-mail： 21860251@zju.edu.cn

	服务
	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	作者相关文章
	马一凡
	赵凡宇
	王鑫
	金仲和

引用本文:

马一凡,赵凡宇,王鑫,金仲和. 基于改进指针网络的卫星对地观测任务规划方法[J]. 浙江大学学报(工学版), 2021, 55(2): 395-401.

Yi-fan MA,Fan-yu ZHAO,Xin WANG,Zhong-he JIN. Satellite earth observation task planning method based on improved pointer networks. Journal of ZheJiang University (Engineering Science), 2021, 55(2): 395-401.

链接本文:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2021.02.020 或 http://www.zjujournals.com/eng/CN/Y2021/V55/I2/395

图 1 MHA-PN算法模型结构

表 1 数据集中每个任务的元素设定

图 2 模型训练的损失和收益曲线

图 3 模型的推理结果示意图

表 2 PN算法和MHA-PN算法收益率和训练时长的对比

图 4 不同长度下模型收益率的分布

表 3 不同长度下模型收益率的均值对比

表 4 MHA-PN算法和ACO算法收益率和求解时间的对比

1	刘晓路, 何磊, 陈英武, 等. 面向多颗敏捷卫星协同调度的自适应大邻域搜索算法[C]//第四届高分辨率对地观测学术年会论文集. 武汉: 国防科学技术大学信息系统与管理学院, 2017. LIU Xiao-lu, HE Lei, CHEN Ying-wu, et al. An adaptive large neighborhood search algorithm for multiple agile satellites [C]// Proceedings of 4th Annual Conference on High Resolution Earth Observation. Wuhan: School of Information Systems and Management, National University of Defense Technology, 2017.
2	郭浩, 邱涤珊, 伍国华, 等基于改进蚁群算法的敏捷成像卫星任务调度方法[J]. 系统工程理论与实践, 2012, (11): 185- 191 GUO Hao, QIU Di-shan, WU Guo-hua, et al Agile imaging satellite task scheduling method based on improved ant colony algorithm[J]. System Engineering Theory and Practice, 2012, (11): 185- 191
3	王法瑞. 基于改进遗传算法的微小卫星自主任务规划方法研究[D]. 哈尔滨: 哈尔滨工业大学, 2017. WANG Fa-rui. Research on autonomous task planning method of micro satellite based on improved genetic algorithm [D]. Harbin: Harbin Institute of Technology, 2017.
4	贺仁杰, 高鹏, 白保存, 等成像卫星任务规划模型、算法及其应用[J]. 系统工程理论与实践, 2011, (3): 29- 40 HE Ren-jie, GAO Peng, BAI Bao-cun, et al Imaging satellite mission planning model, algorithm and application[J]. System Engineering Theory and Practice, 2011, (3): 29- 40
5	陈英武, 方炎申, 李菊芳, 等卫星任务调度问题的约束规划模型[J]. 国防科技大学学报, 2006, (5): 126- 132 CHEN Ying-wu, FANG Yan-shen, LI Ju-fang, et al Constrained programming model for satellite task scheduling problem[J]. Journal of National University of Defense Technology, 2006, (5): 126- 132 doi: 10.3969/j.issn.1001-2486.2006.05.026
6	HOCHREITER S, SCHMIDHUBER J Long short-term memory[J]. Neural Computation, 1997, 9 (8): 1735- 1780 doi: 10.1162/neco.1997.9.8.1735
7	CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling [J/OL]. [2020-02-29]. https://arxiv.org/abs/1412.3555.
8	BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate [J/OL]. [2020-02-29]. https://arxiv.org/abs/1409.0473.
9	VINYALS O, FORTUNATO M, JAITLY N. Pointer networks [C]// International Conference on Neural Information Processing Systems. Montreal: MIT Press, 2015.
10	BELLO I, PHAM H, LE Q V, et al. Neural combinatorial optimization with reinforcement learning [J/OL]. [2020-02-29]. https://arxiv.org/abs/1611.09940.
11	MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning [J/OL]. [2020-02-29]. https://arxiv.org/abs/1602.01783.
12	NAZARI M, OROOJLOOY A, SNYDER L, et al. Reinforcement learning for solving the vehicle routing problem [J/OL]. [2020-02-29]. https://arxiv.org/abs/1802.04240.
13	DAI H, KHALIL E B, ZHANG Y, et al. Learning combinatorial optimization algorithms over graphs [J/OL]. [2020-02-29]. https://arxiv.org/abs/1704.01665.
14	DAI H, DAI B, SONG L. Discriminative embeddings of latent variable models for structured data [J/OL]. [2020-02-29]. https://arxiv.org/abs/1603.05629.
15	KOOL W, HOOF H V, WELLING M. Attention, learn to solve routing problems! [J/OL]. [2020-02-29]. https://arxiv.org/abs/1803.08475.
16	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [J/OL]. [2020-02-29]. https://arxiv.org/abs/1706.03762v4.
17	VINYALS O, BENGIO S, KUDLUR M. Order matters: sequence to sequence for sets [J/OL]. [2020-02-29]. https://arxiv.org/abs/1511.06391.
18	KINGMA D, BA J. Adam: a method for stochastic optimization [J/OL]. [2020-02-29]. https://arxiv.org/abs/1412.6980.
19	赵凡宇. 航天器多目标观测任务调度与规划方法研究[D]. 北京: 北京理工大学, 2015. ZHAO Fan-yu. Research on scheduling and planning methods of spacecraft multi-object observation mission [D]. Beijing: Beijing Institute of Technology, 2015.

[1]	汪宏浩, 王慧泉, 金仲和. 基于增量链接的可回滚星载软件在轨更新方法[J]. 浙江大学学报(工学版), 2015, 49(4): 724-731.
[2]	丁立聪,金小军,王春晖,金仲和. ZDPS-1A卫星电源系统设计与在轨验证[J]. J4, 2012, 46(11): 2073-2080.
[3]	杨牧, 王昊, 张钰, 郑伟, 郑阳明, 金仲和. 抗辐射加固的皮卫星用实时操作系统设计[J]. J4, 2011, 45(6): 1021-1026.

Viewed

Full text

Abstract

Cited

Shared

Discussed