Multi-Mode Anti-Jamming for UAV Communications: A Cooperative Mode-Based Decision-Making Approach via Two-Dimensional Transfer Reinforcement Learning

WANG Shiyu; WANG Ximing; KE Zhenyi; LIU Dianxiong; LIU Jize; DU Zhiyong

doi:10.11999/JEIT250566

Volume 47 Issue 11

Nov. 2025


WANG Shiyu, WANG Ximing, KE Zhenyi, LIU Dianxiong, LIU Jize, DU Zhiyong. Multi-Mode Anti-Jamming for UAV Communications: A Cooperative Mode-Based Decision-Making Approach via Two-Dimensional Transfer Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2025, 47(11): 4200-4210. doi: 10.11999/JEIT250566

doi: 10.11999/JEIT250566 cstr: 32379.14.JEIT250566

1. National University of Defense Technology, Wuhan 430030, China
2. The Institute of Systems Engineering, AMS, Beijing 100141, China

Funds: The National Natural Science Foundation of China (62201581, 62471473)

Received Date: 2025-06-19
Revision Received Date: 2025-09-08

Available Online: 2025-09-12

Publish Date: 2025-11-10

Abstract

Objective: With the widespread application of Unmanned Aerial Vehicles (UAVs) in military reconnaissance, logistics, and emergency communications, ensuring the security and reliability of UAV communication systems has become a critical challenge. Wireless channels are highly vulnerable to diverse jamming attacks. Traditional anti-jamming techniques, such as Frequency-Hopping Spread Spectrum (FHSS), are limited in dynamic spectrum environments and may be compromised by advanced machine learning algorithms. Furthermore, UAVs operate under strict constraints on onboard computational power and energy, which hinders the real-time use of complex anti-jamming algorithms. To address these challenges, this study proposes a multi-mode anti-jamming framework that integrates Intelligent Frequency Hopping (IFH), Jamming-based Backscatter Communication (JBC), and Energy Harvesting (EH) to strengthen communication resilience in complex electromagnetic environments. A Multi-mode Transfer Deep Q-Learning (MT-DQN) method is further proposed, enabling two-dimensional transfer to improve learning efficiency and adaptability under resource constraints. By leveraging transfer learning, the framework reduces computational load and accelerates decision-making, allowing UAVs to counter jamming threats effectively even with limited resources.

Methods: The proposed framework adopts a multi-mode anti-jamming architecture that integrates IFH, JBC, and EH to establish a comprehensive defense strategy of “avoiding, utilizing, and converting” interference. The system is formulated as a Markov Decision Process (MDP) to dynamically optimize the selection of anti-jamming modes and communication channels. To address the challenges of high-dimensional state-action spaces and restricted onboard computational resources, a two-dimensional transfer reinforcement learning framework is developed. This framework comprises a cross-mode strategy-sharing network that extracts common features across the anti-jamming modes (Fig. 3) and a parallel network for cross-task transfer learning that adapts to variable task requirements (Fig. 4). The cross-mode strategy-sharing network accelerates convergence by reusing experiences, whereas the cross-task transfer learning network enables knowledge transfer under different task weightings. The reward function is designed to balance communication throughput and energy consumption. It guides the UAV to select the optimal anti-jamming strategy in real time based on spectrum sensing outcomes and task priorities.

Results and Discussions: The simulation results validate the effectiveness of the proposed MT-DQN. The dynamic weight allocation mechanism exhibits strong cross-task transfer capability (Fig. 6), as weight adjustments enable rapid convergence toward the corresponding optimal reward values. Compared with conventional Deep Reinforcement Learning (DRL) algorithms, the proposed method achieves a 64% faster convergence rate while maintaining the probability of communication interruption below 20% in dynamic jamming environments (Fig. 7). The framework shows robust performance in terms of throughput, convergence rate, and adaptability to variations in jamming patterns. In scenarios with comb-shaped and sweep-frequency jamming, the proposed method yields higher normalized throughput and faster convergence than baseline DQN and other transfer learning-based approaches. The results also indicate that MT-DQN improves stability and accelerates policy optimization when the jamming pattern switches (Fig. 7), highlighting its adaptability to abrupt changes in jamming patterns through transfer learning.

Conclusions: This study proposes a multi-mode anti-jamming framework that integrates IFH, JBC, and EH, thereby enhancing the communication capability of UAVs. The proposed solution shifts the paradigm from traditional jamming avoidance toward active jamming exploitation, repurposing jamming signals as covert carriers to overcome the limitations of conventional frequency-hopping systems. Simulation results confirm the advantages of the proposed method in throughput performance, convergence rate, and environmental adaptability, demonstrating stable communication quality even under complex electromagnetic conditions. Although DRL approaches are inherently constrained in handling completely random jamming without intrinsic patterns, this work improves adaptability to dynamic jamming through transfer learning and cross-mode strategy sharing. These findings provide a promising approach for countering complex jamming threats in UAV networks. Future work will focus on validating the proposed algorithm in hardware implementations and enhancing the robustness of DRL methods under highly non-stationary, though not entirely unpredictable, jamming conditions such as pseudo-random or adaptive interference.
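The decision loop summarized in the Methods can be illustrated with a small sketch. It is not the authors' MT-DQN: a tabular Q-learning update stands in for the deep network, the linear weighted reward is only one plausible way to balance throughput against energy consumption, and the channel count, the weight name `w_throughput`, and all function names are assumptions made for illustration. Only the three mode names (IFH, JBC, EH) come from the abstract.

```python
import random
from itertools import product

MODES = ("IFH", "JBC", "EH")   # modes named in the abstract
NUM_CHANNELS = 4               # hypothetical channel count

# Each action jointly picks an anti-jamming mode and a channel.
ACTIONS = list(product(MODES, range(NUM_CHANNELS)))


def reward(throughput, energy, w_throughput):
    """Assumed linear trade-off; inputs are taken as normalized to [0, 1]."""
    if not 0.0 <= w_throughput <= 1.0:
        raise ValueError("w_throughput must lie in [0, 1]")
    return w_throughput * throughput - (1.0 - w_throughput) * energy


def select_action(q, state, epsilon, rng):
    """Epsilon-greedy choice over (mode, channel) pairs."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (a stand-in for the DQN update)."""
    next_best = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (r + gamma * next_best - old)


if __name__ == "__main__":
    rng = random.Random(0)
    q = {}
    state = "sensed-spectrum-0"  # hypothetical label for a sensing outcome
    action = select_action(q, state, epsilon=0.1, rng=rng)
    r = reward(throughput=0.8, energy=0.2, w_throughput=0.5)
    q_update(q, state, action, r, next_state="sensed-spectrum-1")
    print(action, q[(state, action)])
```

In this simplified setting, continuing to train an already learned table after changing `w_throughput` is a rough analogue of the cross-task transfer described above, since the earlier values act as a warm start instead of being relearned from scratch.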

