A Hierarchical Cross-layer Closed-loop Learning Framework and Collaborative Mechanism for Complex Multi-agent Systems

ZHANG Long; HUANG Wenbo; LEI Zhen; FENG Xuanming; WANG Ying

doi:10.11999/JEIT260143

ZHANG Long, HUANG Wenbo, LEI Zhen, FENG Xuanming, WANG Ying. A Hierarchical Cross-layer Closed-loop Learning Framework and Collaborative Mechanism for Complex Multi-agent Systems[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260143

doi: 10.11999/JEIT260143 cstr: 32379.14.JEIT260143

ZHANG Long¹; HUANG Wenbo¹; LEI Zhen¹,²; FENG Xuanming¹,³; WANG Ying¹

1. System Engineering Research Institute, Academy of Military Science, Beijing 100101, China
2. Naval Aviation University, Yantai 264000, China
3. Science and Technology Innovation Research Center, Army Research Institute, Beijing 100012, China

Accepted Date: 2026-06-15
Rev Recd Date: 2026-06-15

Available Online: 2026-06-19

Abstract

Objective: Complex multi-agent systems (MAS) in dynamic and uncertain environments face challenges in unified modeling, adaptive coordination, and interpretable effectiveness evaluation. Existing methods usually focus on individual decision-making, inter-agent cooperation, or high-level policy evolution separately, resulting in fragmented decision chains and weak cross-layer coupling. Consequently, it is difficult to explain how local learning improvements are transformed into global effectiveness gains under mission variation, environmental disturbance, and partial structural damage. To address this issue, this paper proposes a Hierarchical Cross-layer Closed-loop Learning (HCCL) framework, which couples individual autonomy, system-level collaboration, and system-of-systems learning to build a computable path from local policy optimization to overall effectiveness enhancement.

Methods: HCCL adopts a unified three-layer architecture. At the individual autonomy layer, each agent is modeled by a Partially Observable Markov Decision Process (POMDP) to describe decision-making under partial observability. At the system-level collaboration layer, multi-agent cooperation is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and represented by a dynamic directed weighted collaboration graph. A graph neural network is used to encode interaction dependencies, structural couplings, and joint value information. At the system-of-systems learning layer, a Meta-Decentralized Partially Observable Markov Decision Process (Meta-Dec-POMDP) is established to describe task-context adaptation and rule evolution.

A cross-layer closed-loop mechanism is further designed. In the bottom-up behavior induction pathway, local state and capability features are aggregated into graph-level structural representations and supplied to the upper rule-learning process. In the top-down rule-shaping pathway, learned high-level rules are transformed into control parameters and fed back to lower layers to regulate local policies and collaboration relationships. Simulations are conducted under baseline, mission-variation, observation-disturbance, and structural-damage scenarios. The full HCCL model is compared with a non-closed-loop model and an upward-induction-only model, and interface ablation studies are performed to analyze the contributions of cross-layer feature reporting, structural induction, and rule shaping.

Results and Discussions: The full HCCL model consistently outperforms the comparison models and ablated variants. In the baseline scenario, it achieves a task success rate of 88.6% and a comprehensive system effectiveness of 0.842. Under mission variation, it reduces the adaptation process to 16±2 rounds. Under structural damage, it achieves a recovery rate of 81.4% and restores collaboration-structure stability to 0.742 within 20 steps. These results indicate that HCCL improves task performance, adaptation speed, and structural recovery.

Ablation results show that removing any cross-layer interface causes performance degradation, while removing the top-down rule-shaping pathway leads to the largest loss. This demonstrates that upward structural perception alone is insufficient for sustained system-level improvement. The effectiveness gain mainly comes from the closed-loop coupling between upward behavior induction and downward rule shaping, rather than from simple hierarchical stacking.

Conclusions: This paper proposes the HCCL framework for complex MAS by integrating POMDP-based individual autonomy modeling, Dec-POMDP and graph-based collaboration modeling, and Meta-Dec-POMDP-based rule evolution. Through bottom-up behavior induction and top-down rule shaping, HCCL provides a computable and interpretable path from local learning to overall effectiveness enhancement. Experimental results verify its advantages in task completion, adaptation, recovery, and collaboration stability under multiple disturbances. Future work will focus on larger-scale heterogeneous systems, communication-constrained networking, online continual adaptation, and data-driven evaluation in realistic environments.
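
The cross-layer closed loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no equations, so every name here (`ControlParams`, `aggregate_neighbors`, `graph_summary`, `shape_rules`, `apply_controls`, `closed_loop_step`) is hypothetical. A single round of weighted-mean message passing stands in for the graph neural network, a hand-written linear rule stands in for the learned Meta-Dec-POMDP rule policy, and the first feature is assumed to be a capability score in [0, 1].

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class ControlParams:
    """Hypothetical top-down control signal (field names are illustrative)."""
    edge_gain: float    # scales collaboration-graph edge weights
    exploration: float  # exploration rate handed to local policies


def aggregate_neighbors(features: List[Vector], adj: Matrix) -> List[Vector]:
    """One round of weighted-mean message passing on a directed graph.

    adj[i][j] is the weight of edge j -> i (how much agent i listens to j).
    This is a stand-in for a trained GNN encoder, with no learned weights.
    """
    out = []
    for i, own in enumerate(features):
        total = sum(adj[i])
        if total == 0.0:
            out.append(list(own))  # isolated agent keeps its own features
            continue
        msg = [
            sum(adj[i][j] * features[j][k] for j in range(len(features))) / total
            for k in range(len(own))
        ]
        # Blend own features with the neighborhood message in equal parts.
        out.append([0.5 * o + 0.5 * m for o, m in zip(own, msg)])
    return out


def graph_summary(node_repr: List[Vector]) -> Vector:
    """Bottom-up induction: mean-pool node representations into one vector."""
    n = len(node_repr)
    return [sum(v[k] for v in node_repr) / n for k in range(len(node_repr[0]))]


def shape_rules(summary: Vector, target: float = 0.8) -> ControlParams:
    """Top-down rule shaping: map the graph summary to control parameters.

    Assumes summary[0] is a capability score in [0, 1]; the linear rule is
    a placeholder for a learned upper-layer policy.
    """
    gap = max(0.0, target - summary[0])
    return ControlParams(edge_gain=1.0 + gap, exploration=min(1.0, 0.1 + gap))


def apply_controls(adj: Matrix, params: ControlParams) -> Matrix:
    """Feed the control signal back into the collaboration graph."""
    return [[w * params.edge_gain for w in row] for row in adj]


def closed_loop_step(
    features: List[Vector], adj: Matrix
) -> Tuple[Matrix, ControlParams, Vector]:
    """One pass: local features -> graph summary -> rules -> reshaped graph."""
    node_repr = aggregate_neighbors(features, adj)
    summary = graph_summary(node_repr)
    params = shape_rules(summary)
    return apply_controls(adj, params), params, summary
```

In this sketch, calling `closed_loop_step` repeatedly with updated features would trace the loop in which bottom-up summaries drive rule updates and the resulting control parameters reshape collaboration weights. Dropping the `apply_controls` call loosely corresponds to the upward-induction-only variant mentioned in the abstract.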

Keywords:
- Hierarchical cross-layer closed-loop learning
- Multi-agent reinforcement learning
- Cross-layer coordination
- Graph-based collaboration modeling
- Effectiveness evaluation
