Volume 47, Issue 5, May 2025
Citation: LU Yin, LIU Jinzhi, ZHANG Min. A Model-Assisted Federated Reinforcement Learning Method for Multi-UAV Path Planning[J]. Journal of Electronics & Information Technology, 2025, 47(5): 1368-1380. doi: 10.11999/JEIT241055

A Model-Assisted Federated Reinforcement Learning Method for Multi-UAV Path Planning

doi: 10.11999/JEIT241055 cstr: 32379.14.JEIT241055
Funds:  The National Natural Science Foundation of China (62401290)
  • Received Date: 2024-12-02
  • Rev Recd Date: 2025-04-30
  • Available Online: 2025-05-09
  • Publish Date: 2025-05-01
Objective  The rapid advancement of low-altitude Internet of Things (IoT) applications has increased the demand for efficient sensor data acquisition. Unmanned Aerial Vehicles (UAVs) have emerged as a viable solution due to their high mobility and deployment flexibility. However, existing multi-UAV path planning algorithms show limited adaptability and coordination efficiency in dynamic and complex environments. To overcome these limitations, this study develops a model-assisted approach that constructs a hybrid simulated environment by integrating channel modeling with position estimation, thereby reducing the interaction cost between the UAVs and the real world. Building on this, a federated reinforcement learning algorithm is proposed that incorporates a maximum entropy strategy, monotonic value function decomposition, and a federated learning framework. The method is designed to optimize two objectives: maximizing the data collection rate and minimizing the flight path length. The proposed algorithm provides a scalable and efficient solution for cooperative multi-UAV path planning under dynamic and uncertain conditions.

Methods  The multi-UAV path planning problem is formulated as a multi-objective optimization task and modeled as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) to handle dynamic environments with partially unknown device positions. To improve credit assignment and exploration efficiency, enhanced reinforcement learning algorithms are developed: the exploration capacity of individual agents is increased through a maximum entropy strategy, and a dynamic entropy regularization mechanism is incorporated to avoid premature convergence. To ensure global optimality of the cooperative strategy, the method integrates monotonic value function decomposition based on the QMIX algorithm. A multi-dimensional reward function guides the UAVs in balancing competing objectives, including data collection, path length, and device exploration. To reduce interaction costs in the real environment, a model-assisted training framework is established that combines known information with neural networks to learn channel characteristics and applies an improved particle swarm algorithm to estimate unknown device locations. To enhance generalization, federated learning aggregates the local experiences of multiple UAVs into a global model through periodic updates. In addition, an attention mechanism is introduced to optimize inter-agent information aggregation, improving the accuracy of collaborative decision-making.
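The dynamic entropy regularization is only named in this abstract; a minimal sketch of one standard realization, soft actor-critic style automatic temperature tuning as in [10], is shown below. The class name, learning rate, and entropy target are illustrative assumptions, not values from the paper.

```python
import torch

class EntropyTuner:
    """Sketch of dynamic entropy regularization: the temperature alpha is adjusted
    so that policy entropy tracks a target value, discouraging premature convergence
    to a deterministic policy."""

    def __init__(self, action_dim: int, lr: float = 3e-4):
        # Common heuristic target: -|A| for a continuous action space.
        self.target_entropy = -float(action_dim)
        self.log_alpha = torch.zeros(1, requires_grad=True)
        self.optim = torch.optim.Adam([self.log_alpha], lr=lr)

    @property
    def alpha(self) -> torch.Tensor:
        return self.log_alpha.exp()

    def update(self, log_prob: torch.Tensor) -> float:
        # log_prob: log-probabilities of sampled actions, shape (batch,).
        # alpha grows when entropy falls below the target and shrinks otherwise.
        loss = -(self.log_alpha * (log_prob + self.target_entropy).detach()).mean()
        self.optim.zero_grad()
        loss.backward()
        self.optim.step()
        return self.alpha.item()
```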
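Monotonic value function decomposition follows the QMIX idea [14]: per-agent utilities are combined by a mixing network whose weights, produced by hypernetworks conditioned on the global state, are constrained to be non-negative, so the joint value is monotone in each agent's value. The sketch below is a generic QMIX mixer, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class QMixMixer(nn.Module):
    """Generic QMIX mixing network: Q_tot is monotonic in each agent's Q-value
    because the state-conditioned mixing weights are forced to be non-negative."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks map the global state to mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2          # (batch, 1, 1)
        return q_tot.view(bs, 1)
```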
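The abstract only names the reward dimensions (data collection, path length, device exploration); the snippet below shows one plausible way to combine them into a scalar per-step reward. The weights and the collision penalty are purely hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class RewardWeights:
    # Hypothetical coefficients for illustration only.
    data: float = 1.0        # reward per unit of data collected this step
    path: float = 0.1        # penalty per metre flown
    explore: float = 0.5     # bonus per newly discovered device
    collision: float = 5.0   # penalty for entering a forbidden region

def step_reward(collected_bits: float, distance_m: float,
                new_devices: int, collided: bool,
                w: RewardWeights = RewardWeights()) -> float:
    """Combine competing objectives into a single scalar per-step reward."""
    r = w.data * collected_bits
    r -= w.path * distance_m
    r += w.explore * new_devices
    if collided:
        r -= w.collision
    return r
```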
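The paper's specific improvements to the particle swarm estimator are not described in this abstract, so the sketch below uses a plain PSO that fits a 2-D device position to RSSI measurements taken at known UAV waypoints under a log-distance path-loss model. All constants (path-loss exponent, inertia, acceleration factors) are illustrative assumptions.

```python
import numpy as np

def estimate_device_position(uav_xy, rssi_dbm, n_particles=200, n_iter=100,
                             p0=-30.0, n_exp=2.2, seed=0):
    """Toy PSO sketch for device localization from RSSI measurements."""
    rng = np.random.default_rng(seed)
    uav_xy = np.asarray(uav_xy, dtype=float)      # (m, 2) measurement positions
    rssi_dbm = np.asarray(rssi_dbm, dtype=float)  # (m,) measured powers

    def cost(pos):                                # pos: (k, 2) candidate positions
        d = np.linalg.norm(pos[:, None, :] - uav_xy[None, :, :], axis=-1) + 1e-6
        pred = p0 - 10.0 * n_exp * np.log10(d)    # predicted RSSI at each waypoint
        return np.mean((pred - rssi_dbm[None, :]) ** 2, axis=-1)

    lo, hi = uav_xy.min(0) - 100.0, uav_xy.max(0) + 100.0
    x = rng.uniform(lo, hi, size=(n_particles, 2))
    v = np.zeros_like(x)
    pbest, pbest_cost = x.copy(), cost(x)
    gbest = pbest[pbest_cost.argmin()]

    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, 1))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        c = cost(x)
        improved = c < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], c[improved]
        gbest = pbest[pbest_cost.argmin()]
    return gbest
```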
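The inter-agent attention mechanism is likewise only named here; a minimal sketch using standard scaled dot-product attention over the other agents' message embeddings is given below, with no claim that it matches the paper's design.

```python
import torch
import torch.nn as nn

class AgentAttention(nn.Module):
    """Each UAV attends over embeddings of the other agents' messages and
    weights them by relevance to its own observation."""

    def __init__(self, obs_dim: int, msg_dim: int, embed_dim: int = 64):
        super().__init__()
        self.query = nn.Linear(obs_dim, embed_dim)
        self.key = nn.Linear(msg_dim, embed_dim)
        self.value = nn.Linear(msg_dim, embed_dim)
        self.scale = embed_dim ** 0.5

    def forward(self, own_obs: torch.Tensor, messages: torch.Tensor) -> torch.Tensor:
        # own_obs: (batch, obs_dim); messages: (batch, n_other, msg_dim)
        q = self.query(own_obs).unsqueeze(1)          # (batch, 1, embed)
        k = self.key(messages)                        # (batch, n_other, embed)
        v = self.value(messages)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.scale, dim=-1)
        return (attn @ v).squeeze(1)                  # (batch, embed)
```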
Results and Discussions  Simulation results demonstrate that the proposed algorithm converges more rapidly and with reduced volatility (red curves in Fig. 3 and Fig. 4), owing to the 70% reduction in interactions with the real environment achieved by the model-assisted framework. The federated learning mechanism further enhances policy generalization through global model aggregation. Under test conditions with an initial energy of 50–80 J, the data collection rate increases by 2.1–7.4% and the flight path length decreases by 6.9–14.4% relative to the baseline model (Fig. 6 and Fig. 7), confirming the effectiveness of the reward function and exploration strategy (Fig. 5). The attention mechanism allows the UAVs to identify dependencies among sensing targets and cooperative agents, improving coordination. As shown in Fig. 2, the UAVs dynamically partition the environment to cover undiscovered devices, reducing path overlap and significantly improving collaborative efficiency.

Conclusions  This study proposes a model-assisted multi-UAV path planning method that integrates maximum entropy reinforcement learning, the QMIX algorithm, and federated learning to address the multi-objective data collection problem in complex environments. By incorporating environment modeling, dynamic entropy adjustment, and an attention mechanism within the Dec-POMDP framework, the approach effectively balances exploration and exploitation while resolving collaborative credit assignment in partially observable settings. The use of federated learning for distributed training and model sharing reduces communication overhead and enhances system scalability. Simulation results demonstrate that the proposed algorithm outperforms conventional methods in data collection efficiency, path optimization, and training stability. Future work will address the coordination of heterogeneous UAV clusters and robustness under uncertain communication conditions to further support efficient data collection for low-altitude IoT applications.
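The conclusions note that federated training and model sharing reduce communication overhead; a minimal FedAvg-style sketch follows, assuming equal weighting of the UAVs' local networks (an assumption, since the paper's aggregation rule is not given in this abstract). Only model parameters, not raw experience, are exchanged in each periodic update.

```python
import copy
import torch

@torch.no_grad()
def federated_average(local_models):
    """FedAvg-style sketch: average the parameters of each UAV's local network
    into a global model and broadcast it back. Equal weighting is assumed."""
    local_states = [m.state_dict() for m in local_models]
    global_state = copy.deepcopy(local_states[0])
    for name in global_state:
        stacked = torch.stack([s[name].float() for s in local_states])
        global_state[name] = stacked.mean(dim=0).to(global_state[name].dtype)
    for m in local_models:            # periodic broadcast of the aggregated model
        m.load_state_dict(global_state)
    return global_state
```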
[1] WEI Zhiqing, ZHU Mingyue, ZHANG Ning, et al. UAV-assisted data collection for Internet of Things: A survey[J]. IEEE Internet of Things Journal, 2022, 9(17): 15460–15483. doi: 10.1109/JIOT.2022.3176903.
[2] CHENG Zhekun, ZHAO Liangyu, and SHI Zhongjiao. Decentralized multi-UAV path planning based on two-layer coordinative framework for formation rendezvous[J]. IEEE Access, 2022, 10: 45695–45708. doi: 10.1109/ACCESS.2022.3170583.
[3] ZHENG Jibin, DING Minghui, SUN Lu, et al. Distributed stochastic algorithm based on enhanced genetic algorithm for path planning of multi-UAV cooperative area search[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(8): 8290–8303. doi: 10.1109/TITS.2023.3258482.
[4] LIU Zhihong, WANG Xiangke, SHEN Lincheng, et al. Mission-oriented miniature fixed-wing UAV swarms: A multilayered and distributed architecture[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52(3): 1588–1602. doi: 10.1109/TSMC.2020.3033935.
[5] VELICHKO N A. Distributed multi-agent reinforcement learning based on feudal networks[C]. 2024 6th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE), Moscow, Russian Federation, 2024: 1–5. doi: 10.1109/REEPE60449.2024.10479775.
[6] LIN Mengting, LI Bin, ZHOU Bin, et al. Distributed stochastic model predictive control for heterogeneous UAV swarm[J]. IEEE Transactions on Industrial Electronics, 2024: 1–11. doi: 10.1109/TIE.2024.3508055.
[7] WANG Xu, WANG Sen, LIANG Xingxing, et al. Deep reinforcement learning: A survey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(4): 5064–5078. doi: 10.1109/TNNLS.2022.3207346.
[8] WESTHEIDER J, RÜCKIN J, and POPOVIĆ M. Multi-UAV adaptive path planning using deep reinforcement learning[C]. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, USA, 2023: 649–656. doi: 10.1109/IROS55552.2023.10342516.
[9] PUENTE-CASTRO A, RIVERO D, PEDROSA E, et al. Q-learning based system for path planning with unmanned aerial vehicles swarms in obstacle environments[J]. Expert Systems with Applications, 2024, 235: 121240. doi: 10.1016/j.eswa.2023.121240.
[10] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]. The 35th International Conference on Machine Learning, Stockholm, Sweden, 2018: 1861–1870.
[11] LOWE R, WU Yi, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6382–6393.
[12] FOERSTER J N, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients[C]. The 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 363. doi: 10.1609/aaai.v32i1.11794.
[13] MAHAJAN A, RASHID T, SAMVELYAN M, et al. MAVEN: Multi-agent variational exploration[C]. The 33rd International Conference on Neural Information Processing Systems, 2019: 684.
[14] RASHID T, SAMVELYAN M, DE WITT C S, et al. Monotonic value function factorisation for deep multi-agent reinforcement learning[J]. The Journal of Machine Learning Research, 2020, 21(1): 178.
[15] LYU Lingjuan, YU Han, MA Xingjun, et al. Privacy and robustness in federated learning: Attacks and defenses[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(7): 8726–8746. doi: 10.1109/TNNLS.2022.3216981.
[16] WANG Tianshun, HUANG Xumin, WU Yuan, et al. UAV swarm-assisted two-tier hierarchical federated learning[J]. IEEE Transactions on Network Science and Engineering, 2024, 11(1): 943–956. doi: 10.1109/TNSE.2023.3311024.
[17] HU Chen, REN Hanchi, DENG Jingjing, et al. Distributed learning for UAV swarms[J]. arXiv preprint arXiv:2410.15882, 2024. doi: 10.48550/arXiv.2410.15882.
[18] TONG Ziheng, WANG Jingjing, HOU Xiangwang, et al. Blockchain-based trustworthy and efficient hierarchical federated learning for UAV-enabled IoT networks[J]. IEEE Internet of Things Journal, 2024, 11(21): 34270–34282. doi: 10.1109/JIOT.2024.3370964.
[19] FIROUZJAEI H M, MOGHADDAM J Z, and ARDEBILIPOUR M. Delay optimization of a federated learning-based UAV-aided IoT network[J]. arXiv preprint arXiv:2502.06284, 2025. doi: 10.48550/arXiv.2502.06284.
[20] WANG Pengfei, YANG Hao, HAN Guangjie, et al. Decentralized navigation with heterogeneous federated reinforcement learning for UAV-enabled mobile edge computing[J]. IEEE Transactions on Mobile Computing, 2024, 23(12): 13621–13638. doi: 10.1109/TMC.2024.3439696.
[21] ESRAFILIAN O, BAYERLEIN H, and GESBERT D. Model-aided deep reinforcement learning for sample-efficient UAV trajectory design in IoT networks[C]. 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 2021: 1–6. doi: 10.1109/GLOBECOM46510.2021.9685774.
[22] CHEN Jichao, ESRAFILIAN O, BAYERLEIN H, et al. Model-aided federated reinforcement learning for multi-UAV trajectory planning in IoT networks[C]. 2023 IEEE Globecom Workshops (GC Wkshps), Kuala Lumpur, Malaysia, 2023: 818–823. doi: 10.1109/GCWkshps58843.2023.10465088.
[23] YAN Yan, ZHANG Baoxian, LI Cheng, et al. A novel model-assisted decentralized multi-agent reinforcement learning for joint optimization of hybrid beamforming in massive MIMO mmWave systems[J]. IEEE Transactions on Vehicular Technology, 2023, 72(11): 14743–14755. doi: 10.1109/TVT.2023.3280910.
[24] ZHANG Tuo, FENG Tiantian, ALAM S, et al. GPT-FL: Generative pre-trained model-assisted federated learning[J]. arXiv preprint arXiv:2306.02210, 2023. doi: 10.48550/arXiv.2306.02210.
[25] BAYERLEIN H, THEILE M, CACCAMO M, et al. Multi-UAV path planning for wireless data harvesting with deep reinforcement learning[J]. IEEE Open Journal of the Communications Society, 2021, 2: 1171–1187. doi: 10.1109/OJCOMS.2021.3081996.
[26] OLIEHOEK F A and AMATO C. A Concise Introduction to Decentralized POMDPs[M]. Cham, Switzerland: Springer, 2016. doi: 10.1007/978-3-319-28929-8.
[27] ZENG Yong, XU Jie, and ZHANG Rui. Energy minimization for wireless communication with rotary-wing UAV[J]. IEEE Transactions on Wireless Communications, 2019, 18(4): 2329–2345. doi: 10.1109/TWC.2019.2902559.