UAV-assisted Mobile Edge Computing based on Hybrid Hierarchical DRL in the Internet of Vehicular

YANG Miaoyan; FANG Xuming

doi:10.11999/JEIT250743

Volume 48 Issue 4

Apr. 2026

YANG Miaoyan, FANG Xuming. UAV-assisted Mobile Edge Computing based on Hybrid Hierarchical DRL in the Internet of Vehicular[J]. Journal of Electronics & Information Technology, 2026, 48(4): 1668-1677. doi: 10.11999/JEIT250743

doi: 10.11999/JEIT250743 cstr: 32379.14.JEIT250743

YANG Miaoyan, FANG Xuming

School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China

Received Date: 2025-08-12
Revised Date: 2026-01-22
Accepted Date: 2026-01-22

Available Online: 2026-02-12

Publish Date: 2026-04-10

Abstract

Objective: In the Internet of Vehicles (IoV), using Unmanned Aerial Vehicles (UAVs) to meet growing edge-computing demand has become a key direction in 6G research. However, when Deep Reinforcement Learning (DRL) is applied to minimize system latency, the action space grows exponentially with the number of vehicles, which makes training difficult and slows convergence. This study proposes a two-layer hybrid scheme for UAV-assisted Mobile Edge Computing (MEC), termed Hybrid Hierarchical Deep Reinforcement Learning (HHDRL).

Methods: HHDRL decomposes the joint optimization problem into two layers. The upper layer uses a Proximal Policy Optimization (PPO) agent with a multi-head actor network to learn the user offloading and UAV control policies. N offloading heads make the offloading decisions for N users, each choosing among local processing, offloading to an associated CAP, or offloading to the UAV. A separate UAV flight-control head selects discrete acceleration actions so that the trajectory satisfies practical control constraints. The lower layer applies a computationally efficient greedy algorithm that allocates resources by priority according to task characteristics. Because resource allocation is handled by this greedy layer rather than by the DRL agent, the hybrid hierarchical design avoids the computational cost of DRL-only resource allocation.

Results and Discussions: The performance of HHDRL was evaluated through numerical simulations using a Rician fading channel model, a UAV flight energy consumption model, and system parameters including task data sizes of 9 to 18 Mbit and task complexities of 2 000 to 3 000 cycles/bit. Figure 3 shows that HHDRL converges faster than standard DRL, although its final reward is slightly lower. Figure 4 indicates that HHDRL preserves the user-delay fairness achieved by DRL. Figure 5 shows that the proposed method reduces system latency by approximately 71% to 91% compared with a random baseline and by 1% to 12% compared with the original DRL algorithm.
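The multi-head design in the Methods keeps the upper-layer policy output small. As a rough illustration with assumed sizes (not the paper's settings: 3 offloading targets per user, e.g. local, an associated CAP, or the UAV, and 5 discrete UAV accelerations), the Python sketch below counts the outputs a single flat categorical head would need for the joint action against the logits of a multi-head actor, and samples one multi-head action. Treating the heads as independent, so that the joint log-probability in the PPO ratio is the sum of per-head log-probabilities, is a common choice for multi-head actors and is an assumption here, not a detail taken from the paper. All function names are hypothetical.

```python
import math
import random

# Illustrative sizes only (assumptions, not the paper's parameters):
# each user picks one of n_targets offloading targets, and the UAV
# picks one of n_accels discrete accelerations.

def flat_action_space(n_users: int, n_targets: int, n_accels: int) -> int:
    """Number of joint actions a single flat categorical head would need."""
    return n_targets ** n_users * n_accels

def multi_head_outputs(n_users: int, n_targets: int, n_accels: int) -> int:
    """Logits needed by N per-user offloading heads plus one UAV acceleration head."""
    return n_users * n_targets + n_accels

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_multi_head(head_logits, rng):
    """Sample each head independently (assumed factorization); the joint
    log-probability is the sum of the per-head log-probabilities."""
    actions, log_prob = [], 0.0
    for logits in head_logits:
        probs = softmax(logits)
        a = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        actions.append(a)
        log_prob += math.log(probs[a])
    return actions, log_prob

if __name__ == "__main__":
    n_users, n_targets, n_accels = 10, 3, 5
    print(flat_action_space(n_users, n_targets, n_accels))
    print(multi_head_outputs(n_users, n_targets, n_accels))
```

With 10 users, the flat head would need 3^10 × 5 = 295 245 outputs, while the multi-head actor needs only 10 × 3 + 5 = 35 logits: the flat count grows exponentially with the number of users, the multi-head count only linearly.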
Figure 6 compares training times for different numbers of users: HHDRL consistently trains faster, and its training time grows more slowly as the number of users increases, because the DRL output action space is smaller. When the PPO-based upper layer is replaced with other DRL algorithms, the scheme still outperforms the random baseline and performs comparably to non-hierarchical DRL, which demonstrates the generality of the architecture. Figure 8 shows that computational resources have the strongest effect on latency, because computation typically dominates the total task processing time. Figure 9 presents the UAV trajectory optimization: Figure 9(a) shows realistic velocity changes under discrete acceleration control, and Figure 9(b) shows that the UAV adjusts its position to track the dynamic user distribution while maintaining stable flight.

Conclusions: This study presents the HHDRL algorithm, which integrates DRL with a greedy strategy in a hierarchical framework to address the training challenges of UAV-assisted MEC in IoV scenarios. The simulations show that (1) the proposed method converges faster and requires less training time than standard DRL; (2) its latency performance is comparable to that of DRL and significantly better than heuristic and random baselines; and (3) the framework effectively handles task offloading, resource allocation, and UAV trajectory optimization under practical constraints. Future work will extend the framework to multi-UAV collaboration and more complex environments.
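The lower-layer greedy step in the Methods is described only at a high level. The Python sketch below is one hypothetical instance, not the authors' algorithm: given the upper layer's offloading decisions, a server's CPU capacity is split into equal slices, every task first receives one slice, and each remaining slice goes to the task whose computing delay C_i / f_i would fall the most, where C_i is the task's data size times its complexity. The function names, the 10 GHz capacity, the slice count, and the sum-of-delays objective are assumptions; only the task-size and complexity ranges come from the simulation settings above. Transmission delay over the Rician channel is omitted, since the abstract reports that computation typically dominates total processing time.

```python
import heapq

def task_cycles(data_mbit: float, cycles_per_bit: float) -> float:
    """CPU cycles C_i = data size (Mbit converted to bit) x complexity (cycles/bit)."""
    return data_mbit * 1e6 * cycles_per_bit

def greedy_cpu_allocation(cycles, capacity_hz, n_slices):
    """Hypothetical lower-layer greedy: give every task one CPU slice, then hand
    each remaining slice to the task whose delay C_i / f_i drops the most."""
    if n_slices < len(cycles):
        raise ValueError("need at least one CPU slice per task")
    slice_hz = capacity_hz / n_slices
    slices = [1] * len(cycles)

    def gain(i):
        # Delay reduction (seconds) if task i receives one more slice.
        f_now = slices[i] * slice_hz
        return cycles[i] / f_now - cycles[i] / (f_now + slice_hz)

    # Max-heap on gain (heapq is a min-heap, so gains are negated).
    heap = [(-gain(i), i) for i in range(len(cycles))]
    heapq.heapify(heap)
    for _ in range(n_slices - len(cycles)):
        _, i = heapq.heappop(heap)
        slices[i] += 1
        heapq.heappush(heap, (-gain(i), i))  # only task i's gain changed

    freqs = [s * slice_hz for s in slices]
    delays = [c / f for c, f in zip(cycles, freqs)]
    return freqs, delays

if __name__ == "__main__":
    # Two offloaded tasks at the edges of the simulated ranges
    # (9-18 Mbit, 2 000-3 000 cycles/bit); the capacity is an assumption.
    cycles = [task_cycles(9, 2000), task_cycles(18, 3000)]
    freqs, delays = greedy_cpu_allocation(cycles, capacity_hz=10e9, n_slices=10)
    print(freqs, delays)
```

For a 9 Mbit task at 2 000 cycles/bit and an 18 Mbit task at 3 000 cycles/bit sharing 10 GHz in ten slices, the sketch assigns 4 GHz and 6 GHz, giving computing delays of 4.5 s and 9.0 s. Because each delay C_i / f_i is convex and decreasing in f_i, this marginal-gain greedy minimizes the sum of delays over the discrete slices at a cost of O(S log K) for S slices and K tasks, which illustrates why a greedy lower layer can be much cheaper than asking the DRL agent to output the allocation.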
Keywords:
- UAV
- Mobile Edge Computing (MEC)
- Hybrid algorithm
- Resource allocation
- Deep Reinforcement Learning (DRL)
