Research on Proximal Policy Optimization for Autonomous Long-Distance Rapid Rendezvous of Spacecraft

LIN Zheng, HU Haiying, DI Peng, ZHU Yongsheng, ZHOU Meijiang

Citation: LIN Zheng, HU Haiying, DI Peng, ZHU Yongsheng, ZHOU Meijiang. Research on Proximal Policy Optimization for Autonomous Long-Distance Rapid Rendezvous of Spacecraft[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250844


doi: 10.11999/JEIT250844 cstr: 32379.14.JEIT250844
Funding: Shanghai Oriental Talents Leading Program (Y4DFRCYG01)
About the authors:

    LIN Zheng: male, Ph.D. candidate; research interest: spacecraft guidance and control

    HU Haiying: male, research fellow; research interest: overall satellite design. Corresponding author of this paper

    DI Peng: male, Ph.D. candidate; research interest: spacecraft guidance and control

    ZHU Yongsheng: male, associate research fellow; research interest: overall satellite design

    ZHOU Meijiang: female, engineer; research interest: overall satellite design

    Corresponding author:

    HU Haiying, huhy@microsate.com

  • CLC number: V448


  • Abstract: Taking the Earth-oblateness (J2) perturbation into account, this paper addresses the fuel-optimal trajectory design problem for long-distance rapid transfer between non-coplanar orbits under limited fuel and limited transfer time. Proximal Policy Optimization (PPO) is used to design the timing and magnitude of impulsive maneuvers so as to minimize fuel consumption along the transfer trajectory. First, a dynamics model of spacecraft orbital transfer under J2 perturbation is constructed, and the uncertainties of on-orbit operation are analyzed. Second, the problem is formulated as an optimal control problem and a reinforcement learning training framework is established. A reward function based on process and terminal constraints is then designed to strengthen the algorithm's exploration ability and stabilize training. Finally, a policy model is trained within this framework to generate orbital maneuver strategies, and its performance is verified through simulations and comparative experiments. Compared with existing DRL methods, the proposed improved dense reward function, which combines a position potential function with a velocity guidance mechanism, significantly improves convergence speed, robustness, and fuel optimality; simulation results show that the method generates effective policies that meet the expected rendezvous requirements.
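The reward design sketched in the abstract (a position potential with PBRS-style shaping plus a velocity guidance term) can be illustrated as follows. This is a minimal sketch, not the paper's exact formulation: the functional forms, the terminal bonus value, and the helper names are assumptions, with the coefficients $c_{\text{PBRS}}$, $c_{\parallel}$, $c_{\bot}$ and the capture distance $d_{\mathrm{c}}$ borrowed from Table 4 (Case 1).

```python
import numpy as np

# Sketch of a dense reward combining a position potential (PBRS shaping)
# with a velocity guidance mechanism. All functional forms are illustrative
# assumptions; coefficients mirror Table 4 but the structure is not the
# paper's exact reward.

def potential(rel_pos, c_pbrs=1e-3):
    """Position potential: grows (toward zero) as the chaser nears the target."""
    return -c_pbrs * np.linalg.norm(rel_pos)

def dense_reward(rel_pos, rel_pos_next, rel_vel,
                 c_pbrs=1e-3, c_par=0.18, c_perp=0.18, d_c=500.0, gamma=0.99):
    # PBRS shaping term gamma*Phi(s') - Phi(s): preserves the optimal policy
    shaping = gamma * potential(rel_pos_next, c_pbrs) - potential(rel_pos, c_pbrs)

    # Velocity guidance: reward closing speed, penalize transverse speed
    r_hat = rel_pos / (np.linalg.norm(rel_pos) + 1e-12)
    v_par = -np.dot(rel_vel, r_hat)            # positive when approaching
    v_perp = np.linalg.norm(rel_vel + v_par * r_hat)
    guidance = c_par * v_par - c_perp * v_perp

    # Terminal bonus once inside the capture sphere of radius d_c
    terminal = 10.0 if np.linalg.norm(rel_pos_next) < d_c else 0.0
    return shaping + guidance + terminal
```

The shaping term follows the potential-based reward shaping idea: because it telescopes over an episode, it densifies the sparse rendezvous reward without changing which policy is optimal.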
  • Figure 1  J2000 and LVLH orbital coordinate frames

    Figure 2  Agent-environment interaction process

    Figure 3  Flowchart of the PPO algorithm

    Figure 4  Optimal transfer trajectory of the maneuvering spacecraft during approach, Case 1

    Figure 5  Fuel consumption of the maneuvering spacecraft, Case 1

    Figure 6  Training iterations of the maneuvering spacecraft

    Figure 7  Optimal transfer trajectory of the maneuvering spacecraft during approach, Case 2

    Figure 8  Fuel consumption of the maneuvering spacecraft, Case 2

    Figure 9  Comparison of training results under different reward functions

    Figure 10  Comparison of training results under different algorithms

    Table 1  Relative magnitude of each perturbation (area-to-mass ratio $0.01\,\mathrm{m}^2/\mathrm{kg}$)

    | Perturbation                       | LEO              | MEO      | HEO      |
    | ---------------------------------- | ---------------- | -------- | -------- |
    | Non-spherical gravity, J2 term     | 10^-3            | 10^-4    | 10^-4    |
    | Non-spherical gravity, other terms | 10^-7            | 10^-7    | 10^-7    |
    | Solar gravity                      | 10^-8            | 10^-6    | 10^-6    |
    | Lunar gravity                      | 10^-7            | 10^-6    | 10^-5    |
    | Atmospheric drag                   | 10^-6 to 10^-10  | <10^-10  | <10^-10  |
    | Solar radiation pressure           | 10^-8            | 10^-7    | 10^-7    |
    | Tides                              | 10^-8            | 10^-9    | 10^-10   |

    Table 2  Performance parameters of the maneuvering spacecraft

    | Parameter                                 | Case 1 | Case 2 |
    | ----------------------------------------- | ------ | ------ |
    | $ {m_{\mathcal{F}}}_{0} $ [kg]            | 150    | 10     |
    | $ m_{0} $ [kg]                            | 250    | 20     |
    | $ \dot{m} $ [kg/s]                        | 1.6    | 0.5    |
    | $ I_{\text{sp}} $ [s]                     | 400    | 400    |
    | $ \Delta {t_{\mathcal{F}}}_{\max} $ [s]   | 20     | 5      |

    Table 3  Initial orbital elements of the spacecraft

    | Orbital element          | Case 1, maneuvering | Case 1, target | Case 2, maneuvering | Case 2, target |
    | ------------------------ | ------------------- | -------------- | ------------------- | -------------- |
    | Semi-major axis [km]     | 30378.1363          | 40378.1363     | 7978.1363           | 8378.1363      |
    | Eccentricity             | 0.05                | 0.01           | 0.05                | 0.01           |
    | Inclination [°]          | 25.5                | 18             | 30                  | 31             |
    | RAAN [°]                 | 14.4                | 352.8          | 0                   | 5              |
    | Argument of perigee [°]  | 14.4                | 14.4           | 10                  | 10             |
    | True anomaly [°]         | 7.2                 | 72             | 12                  | 18             |

    Table 4  Algorithm hyperparameter design

    | Hyperparameter | Case 1 | Case 2 |
    | -------------- | ------ | ------ |
    | $ L_{\text{ub}} $/$ L_{\text{lb}} $ [km] | 10000/45000 | 300/4000 |
    | $ \rho_{1} $/$ \rho_{2} $ | 3e-2/3e-2 | 3e-2/3e-2 |
    | $ \tau $ [km] | 1e-4 | 1e-4 |
    | $ d_{\mathrm{c}} $ [km] | 500 | 100 |
    | $ N_{\mathrm{g}} $ | 2 | 2 |
    | $ c_{1} $/$ c_{2} $/$ c_{3} $/$ c_{4} $/$ c_{5} $/$ c_{\text{PBRS}} $/$ c_{\parallel} $/$ c_{\bot} $ | 0.14/0.4/0.4/0.2/0.2/1e-3/0.18/0.18 | 0.14/0.4/0.4/2/2/1e-3/0.18/0.18 |
    | $ \gamma $ | 0.99 | 0.9 |
    | Learning rate | 1e-5 | 1e-5 |
    | Episodes | 6e5 | 2e5 |
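As a concrete reference for how the discount factor $\gamma$ and learning rate in Table 4 enter training, the following is a minimal sketch of PPO's clipped surrogate objective. The clip range eps = 0.2 is PPO's common default, not a value reported in this paper.

```python
import numpy as np

# Minimal sketch of the PPO clipped surrogate loss (the objective the
# Table 4 hyperparameters feed into). eps=0.2 is the usual PPO default,
# assumed here for illustration.

def ppo_clip_loss(log_prob_new, log_prob_old, advantages, eps=0.2):
    ratio = np.exp(log_prob_new - log_prob_old)   # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # PPO maximizes the elementwise minimum; return a loss to minimize
    return -np.mean(np.minimum(unclipped, clipped))
```

Clipping caps how far a single update can move the policy from the one that collected the data, which is what gives PPO the training stability the paper relies on.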

    Table 5  Gaussian noise model for orbital uncertainties

    | Uncertainty term | Mean | Standard deviation |
    | ---------------- | ---- | ------------------ |
    | $ \delta \boldsymbol{r}_{s,i} $ | 0 km   | 1 km       |
    | $ \delta \boldsymbol{v}_{s,i} $ | 0 km/s | 0.02 km/s  |
    | $ \delta \boldsymbol{r}_{m,i} $ | 0 km   | 1 km       |
    | $ \delta \boldsymbol{v}_{m,i} $ | 0 km/s | 0.02 km/s  |
    | $ \delta {v_{x}}_{v,i} $/$ \delta {v_{y}}_{v,i} $/$ \delta {v_{z}}_{v,i} $ | 0 | 0.006 |

    Table 6  Robustness test results

    | Test metric | $ \mathcal{N}_{s,i} $ | $ \mathcal{N}_{m,i} $ | $ \mathcal{N}_{v,i} $ | $ \mathcal{N}_{s,i}+\mathcal{N}_{m,i}+\mathcal{N}_{v,i} $ |
    | ----------- | --------------------- | --------------------- | --------------------- | --------------------------------------------------------- |
    | Success rate                            | 89.30%   | 87.30%   | 71.90%   | 63.40%   |
    | Fuel consumption, mean [kg]             | 111.7205 | 111.5226 | 110.9310 | 111.9222 |
    | Fuel consumption, standard deviation [kg] | 0.6441 | 0.5008   | 0.0782   | 0.8903   |
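A robustness test of this shape is, in outline, a Monte Carlo evaluation: perturb the initial state with the zero-mean Gaussian noise of Table 5 and count successful rendezvous episodes. The sketch below assumes hypothetical helper names (`sample_noisy_state`, `run_episode`, `success_rate`); `run_episode` stands in for rolling out the trained PPO policy, which is not reproduced here.

```python
import numpy as np

# Sketch of the Monte Carlo robustness evaluation behind Tables 5-6:
# zero-mean Gaussian noise on initial position (sigma = 1 km per axis) and
# velocity (sigma = 0.02 km/s per axis), success rate estimated over many
# trials. Helper names are illustrative placeholders.

rng = np.random.default_rng(0)

def sample_noisy_state(r0, v0, sigma_r=1.0, sigma_v=0.02):
    """Perturb an initial position/velocity pair with the Table 5 noise model."""
    return (r0 + rng.normal(0.0, sigma_r, 3),
            v0 + rng.normal(0.0, sigma_v, 3))

def success_rate(run_episode, r0, v0, n_trials=1000):
    """Fraction of noisy rollouts for which run_episode reports success."""
    wins = sum(run_episode(*sample_noisy_state(r0, v0)) for _ in range(n_trials))
    return wins / n_trials
```

The thrust-direction noise of Table 5 (dimensionless sigma = 0.006 on each velocity-increment component) would be injected inside `run_episode` at every maneuver, which is why the combined-noise column of Table 6 shows the lowest success rate.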
Publication history
  • Revised: 2025-12-09
  • Accepted: 2025-12-09
  • Published online: 2025-12-13
