MG-MoE: Routed Multi-Granularity Expert Ensemble

XIAN Fengyu, JIAN Haifang, XIE Zihui, DU Jun, ZHANG Yuanyuan, NING Xin, DONG Miaomiao, WANG Hongchang

Citation: XIAN Fengyu, JIAN Haifang, XIE Zihui, DU Jun, ZHANG Yuanyuan, NING Xin, DONG Miaomiao, WANG Hongchang. MG-MoE: Routed Multi-Granularity Expert Ensemble[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260219


doi: 10.11999/JEIT260219 cstr: 32379.14.JEIT260219
Funds: This work was supported by the National Key Research and Development Program of China, Strategic Innovation Cooperation Project (2024YFE0210600)
    Author biographies:

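    XIAN Fengyu: female, M.S. candidate; research interests: deep learning, computer vision

    JIAN Haifang: male, research professor, doctoral supervisor; research interests: high-performance application-specific integrated circuit design, intelligent information processing algorithms and systems

    XIE Zihui: male, M.S. candidate; research interests: deep learning, computer vision

    DU Jun: female, associate professor; research interests: optimization computing, medical image processing, intelligent robot navigation

    ZHANG Yuanyuan: female, associate research professor; research interests: wildlife conservation biology

    NING Xin: male, Ph.D., research professor, doctoral supervisor; research interests: image cognitive computing theory, 2D/3D vision algorithms

    DONG Miaomiao: female, M.S. candidate; research interests: deep learning, multimodal intelligent perception

    WANG Hongchang: male, assistant research professor; research interests: multimodal intelligent perception algorithms, hardware/software co-design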

  • CLC number: TP391

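  • Abstract: In fine-grained image recognition, inter-class differences are small, so a model must capture both local discriminative cues and global structural features, and it must generalize stably under cluttered backgrounds, pose variation, and long-tailed data distributions. This paper proposes MG-MoE (Multi-Granularity Mixture-of-Experts), a routed multi-granularity expert ensemble that uses sample-adaptive conditional computation to improve discriminative performance within a controllable inference cost. To make router learning stable and generalizable, a two-stage optimization strategy is proposed. Stage 1, dynamic cluster-level training, builds cluster-level soft teacher distributions from validation-set statistics and applies KL-divergence regularization to stabilize routing behavior and drive an effective division of labor among the experts. Stage 2, residual fine-tuning, keeps the feature-driven form of the router unchanged, unfreezes the classification heads of the Top-2 experts per cluster, and jointly fine-tunes the gate and the expert heads with group-wise learning rates; this mitigates expert-fusion bias and strengthens discrimination on hard samples and long-tail classes. Experiments on two benchmarks, CUB-200-2011 and Bird-1445, demonstrate the effectiveness of MG-MoE: it reaches 92.89% accuracy on CUB-200-2011 and 96.80% on a sampled subset of Bird-1445, the highest accuracy among the compared models on both. Ablation analysis further shows that controlled Top-2 fusion and the complementary four-expert structure jointly determine the performance ceiling, and that accuracy degrades in an interpretable way when there are too few experts or when homogeneous experts are added. This work provides a reusable implementation paradigm and analysis framework for multi-granularity expert modeling and router training in fine-grained scenarios.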
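The abstract describes Stage 1 only at a high level. The sketch below shows one way a cluster-level soft teacher and its KL regularizer could be computed. It assumes (none of this is stated in the source) that every sample already carries a cluster id, that a cluster's teacher is a temperature-scaled softmax over each expert's validation accuracy on that cluster, and that the penalty is KL(teacher || router) averaged over a batch. The function names and the temperature `tau` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def cluster_soft_teacher(val_correct, val_cluster, num_clusters, tau=0.1):
    """Hypothetical cluster-level soft teacher built from validation statistics.

    val_correct: (N, E) array of 0/1, whether expert e classified validation sample n correctly.
    val_cluster: (N,) int array, cluster id of each validation sample.
    Returns a (num_clusters, E) array whose rows are distributions over experts.
    """
    num_experts = val_correct.shape[1]
    acc = np.zeros((num_clusters, num_experts))
    for c in range(num_clusters):
        in_cluster = val_cluster == c
        if in_cluster.any():
            # Per-cluster accuracy of each expert; empty clusters keep zeros (uniform teacher).
            acc[c] = val_correct[in_cluster].mean(axis=0)
    # Assumed choice: sharpen accuracy differences with temperature tau.
    return softmax(acc / tau, axis=1)


def kl_route_regularizer(gate_logits, cluster_ids, teacher, eps=1e-8):
    """Mean KL(teacher[cluster] || router) over a batch of samples."""
    p_router = softmax(gate_logits, axis=1)
    p_teacher = teacher[cluster_ids]
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_router + eps)), axis=1)
    return float(kl.mean())


# Usage on synthetic data: 4 experts, 3 clusters.
rng = np.random.default_rng(0)
teacher = cluster_soft_teacher(rng.integers(0, 2, size=(120, 4)),
                               rng.integers(0, 3, size=120), num_clusters=3)
reg = kl_route_regularizer(rng.normal(size=(8, 4)), rng.integers(0, 3, size=8), teacher)
```

In a training loop, such a term would presumably be added to the classification loss with a weighting coefficient (also an assumption); the KL direction shown here pulls the router toward the teacher rather than the reverse.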
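  • Figure 1  Visualization of expert attention regions on images of different bird species

    Each row corresponds to one test sample; each column shows the input image and the attention responses of the different experts. An "*" after an expert name indicates that the expert classifies the sample correctly, and "×" indicates that it misclassifies the sample.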

    Figure 2  Framework of the MG-MoE model

    Figure 3  Heatmap of the statistical association between MG-MoE expert activations and classes
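Figure 3 is described only by its caption. As a minimal sketch, assuming (hypothetically) that the association is the share of each class's Top-2 routing slots assigned to each expert, such an expert-by-class matrix could be tabulated as follows; the statistic actually used for the figure may differ, and the function name is illustrative.

```python
import numpy as np


def expert_class_association(top_experts, labels, num_experts, num_classes):
    """Hypothetical expert-by-class activation matrix for a heatmap.

    top_experts: (N, K) int array, experts selected by the router for each test sample.
    labels: (N,) int array, ground-truth class of each test sample.
    Returns a (num_experts, num_classes) array; column y holds the share of
    routing slots that samples of class y sent to each expert.
    """
    counts = np.zeros((num_experts, num_classes))
    for experts, y in zip(top_experts, labels):
        for e in experts:
            counts[e, y] += 1
    col_sums = counts.sum(axis=0, keepdims=True)
    # Normalize each class column; classes with no samples stay all-zero.
    return np.divide(counts, col_sums, out=np.zeros_like(counts), where=col_sums > 0)


# Usage: three test samples, Top-2 routing over 4 experts, 3 classes.
assoc = expert_class_association(np.array([[0, 1], [0, 2], [3, 1]]), np.array([0, 0, 1]), 4, 3)
```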

    Figure 4  t-SNE visualization of the feature-space manifold

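    Table 1  Top-1 accuracy comparison on the CUB-200-2011 test set (%)

    Model | Top-1 accuracy (%) | Gain over DCL (pp)
    DCL[31] | 86.86 | –
    CrossX[32] | 87.00 | +0.14
    ConvNeXt[33] | 87.38 | +0.52
    ResNet-50[34] | 87.74 | +0.88
    PMG[18] | 88.32 | +1.46
    TransFG[17] | 90.49 | +3.63
    PIM[19] | 91.17 | +4.31
    MPSA[3] | 91.23 | +4.37
    MG-MoE (ours) | 92.89 | +6.03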

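    Table 2  Top-1 accuracy comparison on the sampled Bird-1445 test set (200 classes) (%)

    Method | Top-1 accuracy (%) | Gain over ResNet-50 (pp)
    ResNet-50 | 88.90 | –
    DCL | 89.40 | +0.50
    CrossX | 90.60 | +1.70
    ConvNeXt | 90.70 | +1.80
    PMG | 93.10 | +4.20
    TransFG | 93.60 | +4.70
    PIM | 94.80 | +5.90
    MPSA | 95.10 | +6.20
    MG-MoE (ours) | 96.80 | +7.90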
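The gain columns of Tables 1 and 2 are absolute differences in percentage points from the weakest baseline in each table (DCL in Table 1, ResNet-50 in Table 2), not relative improvements: for example, 92.89 - 86.86 = 6.03 pp, whereas the relative improvement over DCL would be about 6.9%. The short check below recomputes both columns from the transcribed accuracies; the helper name is illustrative.

```python
# Top-1 accuracies (%) transcribed from Tables 1 and 2.
TABLE1 = {"DCL": 86.86, "CrossX": 87.00, "ConvNeXt": 87.38, "ResNet-50": 87.74, "PMG": 88.32,
          "TransFG": 90.49, "PIM": 91.17, "MPSA": 91.23, "MG-MoE": 92.89}
TABLE2 = {"ResNet-50": 88.90, "DCL": 89.40, "CrossX": 90.60, "ConvNeXt": 90.70, "PMG": 93.10,
          "TransFG": 93.60, "PIM": 94.80, "MPSA": 95.10, "MG-MoE": 96.80}


def pp_gains(accuracies, baseline):
    """Absolute gain of each model over the baseline, in percentage points."""
    base = accuracies[baseline]
    return {model: round(acc - base, 2) for model, acc in accuracies.items()}


gains1 = pp_gains(TABLE1, "DCL")        # Table 1 gains, relative to DCL
gains2 = pp_gains(TABLE2, "ResNet-50")  # Table 2 gains, relative to ResNet-50
```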

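    Table 3  Efficiency comparison of the multi-expert models on CUB-200-2011

    Model | Acc (%) | GFLOPs | Params (M) | Latency (ms) | FPS
    PMG | 88.32 | 37.4 | 45.1 | 7.5 | 133.3
    MPSA | 91.23 | 62.7 | 94.2 | 18.4 | 54.3
    PIM | 91.17 | 73.2 | 94.3 | 7.7 | 130.5
    TransFG | 90.49 | 99.1 | 87.6 | 22.1 | 45.2
    MG-MoE (Full) | 92.89 | 275.5 | 330.9 | 56.9 | 17.6
    MG-MoE (Top-2) | 92.89 | 143.9 | 330.9 | 37 | 27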
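Assuming FPS in Table 3 was obtained as 1000 / latency at batch size 1 (the source does not state the batch size), the reported FPS values agree with the latency column up to rounding; PIM differs by about 0.6 FPS, which a latency slightly below the rounded 7.7 ms would explain. The same figures show that Top-2 routing uses about 52% of the full ensemble's GFLOPs and about 65% of its latency at identical accuracy, while the parameter count (330.9 M) is the same in both MG-MoE rows.

```python
# (GFLOPs, latency in ms, reported FPS) transcribed from Table 3.
TABLE3 = {
    "PMG": (37.4, 7.5, 133.3),
    "MPSA": (62.7, 18.4, 54.3),
    "PIM": (73.2, 7.7, 130.5),
    "TransFG": (99.1, 22.1, 45.2),
    "MG-MoE (Full)": (275.5, 56.9, 17.6),
    "MG-MoE (Top-2)": (143.9, 37.0, 27.0),
}


def fps_from_latency(latency_ms):
    """Throughput implied by a per-image latency, assuming batch size 1."""
    return 1000.0 / latency_ms


implied_fps = {m: round(fps_from_latency(lat), 1) for m, (_, lat, _) in TABLE3.items()}

full_gflops, full_lat, _ = TABLE3["MG-MoE (Full)"]
top2_gflops, top2_lat, _ = TABLE3["MG-MoE (Top-2)"]
compute_ratio = top2_gflops / full_gflops  # share of full-ensemble GFLOPs used with Top-2 routing
latency_ratio = top2_lat / full_lat        # share of full-ensemble latency with Top-2 routing
```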

    Table 4  Top-K ablation results of MG-MoE (CUB-200-2011)

    Experts fused per sample | Top-1 (%) | Gap to best (pp)
    Top-1 | 91.95 | –0.94
    Top-2 | 92.89 | 0.00
    Top-3 | 92.62 | –0.27
    Top-4 | 92.10 | –0.79
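Table 4 varies how many experts are fused per sample. A minimal sketch of Top-K routed fusion follows, assuming (not confirmed by the source) that the router scores of the selected experts are renormalized with a softmax and used to average the experts' class probabilities; the function name and tensor layout are hypothetical. With k equal to the number of experts this reduces to a dense mixture, and k = 2 corresponds to the best-scoring row of Table 4.

```python
import numpy as np


def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def topk_fusion(gate_logits, expert_logits, k=2):
    """Hypothetical sparse Top-K fusion of expert predictions.

    gate_logits: (B, E) router scores for B samples and E experts.
    expert_logits: (E, B, C) class logits of every expert (precomputed here for
        simplicity; a conditional-computation implementation would run only
        the selected experts).
    Returns fused class probabilities (B, C) and selected expert indices (B, k).
    """
    topk = np.argsort(-gate_logits, axis=1)[:, :k]        # k highest-scoring experts per sample
    sel = np.take_along_axis(gate_logits, topk, axis=1)  # their gate scores, shape (B, k)
    weights = softmax(sel, axis=1)                       # renormalize over the selected experts only
    probs = softmax(expert_logits, axis=2)               # per-expert class probabilities, (E, B, C)
    batch = np.arange(gate_logits.shape[0])[:, None]     # (B, 1) row index for fancy indexing
    chosen = probs[topk, batch]                          # (B, k, C) probabilities of selected experts
    fused = np.einsum("bk,bkc->bc", weights, chosen)      # gate-weighted sum over the k experts
    return fused, topk


# Usage: one sample, 4 experts, 5 classes.
rng = np.random.default_rng(0)
gate = np.array([[0.1, 2.0, 1.0, -1.0]])
experts = rng.normal(size=(4, 1, 5))
fused, chosen = topk_fusion(gate, experts, k=2)
```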

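    Table 5  Top-2 fusion ablation of MG-MoE with different numbers of experts (CUB-200-2011)

    Total experts | Expert combination | Top-1 (%) | Gap to best (pp)
    2 | MPSA+TransFG | 91.28 | –1.61
    2 | PMG+PIM | 91.22 | –1.67
    2 | MPSA+PIM | 92.03 | –0.86
    3 | MPSA+TransFG+PIM | 92.36 | –0.53
    3 | MPSA+PMG+PIM | 92.15 | –0.74
    3 | MPSA+PMG+TransFG | 92.01 | –0.88
    4 | MPSA+PMG+TransFG+PIM | 92.89 | 0.00
    5 | Four experts + 1 homogeneous expert | 92.78 | –0.11
    6 | Four experts + 2 homogeneous experts | 92.60 | –0.29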
  • [1] SUN Hongbo, HE Xiangteng, XU Jinglin, et al. SIM-OFE: Structure information mining and object-aware feature enhancement for fine-grained visual categorization[J]. IEEE Transactions on Image Processing, 2024, 33: 5312–5326. doi: 10.1109/TIP.2024.3459788.
    [2] YANG Shengying, YANG Xinqi, WU Jianfeng, et al. Significant feature suppression and cross-feature fusion networks for fine-grained visual classification[J]. Scientific Reports, 2024, 14(1): 24051. doi: 10.1038/s41598-024-74654-4.
    [3] WANG Jiahui, XU Qin, JIANG Bo, et al. Multi-granularity part sampling attention for fine-grained visual classification[J]. IEEE Transactions on Image Processing, 2024, 33: 4529–4542. doi: 10.1109/TIP.2024.3441813.
    [4] MA Bing, LI Junyi, JIN Zhengbei, et al. Fine-grained image recognition with bio-inspired gradient-aware attention[J]. Biomimetics, 2025, 10(12): 834. doi: 10.3390/biomimetics10120834.
    [5] CHANG Dongliang, TONG Yujun, DU Ruoyi, et al. An erudite fine-grained visual classification model[C]. Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 7268–7277. doi: 10.1109/CVPR52729.2023.00702.
    [6] SU J C, CHENG Zezhou, and MAJI S. A realistic evaluation of semi-supervised learning for fine-grained classification[C]. Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 12966–12975. doi: 10.1109/CVPR46437.2021.01277.
    [7] SHU Yangyang, YU Baosheng, XU Haiming, et al. Improving fine-grained visual recognition in low data regimes via self-boosting attention mechanism[C]. Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 449–465. doi: 10.1007/978-3-031-19806-9_26.
    [8] FEDUS W, ZOPH B, and SHAZEER N. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity[J]. Journal of Machine Learning Research, 2022, 23(120): 1–39.
    [9] JACOBS R A, JORDAN M I, NOWLAN S J, et al. Adaptive mixtures of local experts[J]. Neural Computation, 1991, 3(1): 79–87. doi: 10.1162/neco.1991.3.1.79.
    [10] SHAZEER N, MIRHOSEINI A, MAZIARZ K, et al. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer[C]. Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 2017.
    [11] RIQUELME C, PUIGCERVER J, MUSTAFA B, et al. Scaling vision with sparse mixture of experts[C]. Proceedings of the 35th International Conference on Neural Information Processing Systems, 2021: 657.
    [12] HAN Xumeng, WEI Longhui, DOU Zhiyang, et al. ViMoE: An empirical study of designing vision mixture-of-experts[J]. IEEE Transactions on Image Processing, 2025, 34: 7209–7221. doi: 10.1109/TIP.2025.3626887.
    [13] ZHU Jinguo, ZHU Xizhou, WANG Wenhai, et al. Uni-perceiver-MoE: Learning sparse generalist models with conditional MoEs[C]. Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, USA, 2022: 193.
    [14] MUSTAFA B, RIQUELME C, PUIGCERVER J, et al. Multimodal contrastive learning with LIMoE: The language-image mixture of experts[C]. Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, USA, 2022: 695.
    [15] SHEN Leyang, CHEN Gongwei, SHAO Rui, et al. MoME: Mixture of multimodal experts for generalist multimodal large language models[C]. Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 1330.
    [16] ZHENG Haiyang, PU Nan, LI Wenjing, et al. Generalized fine-grained category discovery with multi-granularity conceptual experts[J]. arXiv preprint arXiv: 2509.26227, 2025.
    [17] HE Ju, CHEN Jieneng, LIU Shuai, et al. TransFG: A transformer architecture for fine-grained recognition[C]. Proceedings of the 36th AAAI Conference on Artificial Intelligence, 2022: 852–860. doi: 10.1609/aaai.v36i1.19967.
    [18] DU Ruoyi, CHANG Dongliang, BHUNIA A K, et al. Fine-grained visual classification via progressive multi-granularity training of jigsaw patches[C]. Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 2020: 153–168. doi: 10.1007/978-3-030-58565-5_10.
    [19] CHOU P Y, LIN C H, and KAO W C. A novel plug-in module for fine-grained visual classification[J]. arXiv preprint arXiv: 2202.03822, 2022.
    [20] XU Zhikang, YUE Xiaodong, LV Ying, et al. Trusted fine-grained image classification through hierarchical evidence fusion[C]. Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023: 10657–10665. doi: 10.1609/aaai.v37i9.26265.
    [21] ZHENG Haiyang, PU Nan, LI Wenjing, et al. Generalized fine-grained category discovery with multi-granularity conceptual experts[J]. arXiv preprint arXiv: 2509.26227, 2025.
    [22] LEPIKHIN D, LEE H, XU Yuanzhong, et al. GShard: Scaling giant models with conditional computation and automatic sharding[C]. Proceedings of the 9th International Conference on Learning Representations, Austria, 2021.
    [23] GURURANGAN S, LEWIS M, HOLTZMAN A, et al. DEMix layers: Disentangling domains for modular language modeling[C]. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, United States, 2022: 5557–5576. doi: 10.18653/v1/2022.naacl-main.407.
    [24] RAJBHANDARI S, LI Conglong, YAO Zhewei, et al. DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale[C]. Proceedings of the 39th International Conference on Machine Learning, Baltimore, USA, 2022: 18332–18346.
    [25] WANG Lean, GAO Huazuo, ZHAO Chenggang, et al. Auxiliary-loss-free load balancing strategy for mixture-of-experts[J]. arXiv preprint arXiv: 2408.15664, 2024.
    [26] ROLLER S, SUKHBAATAR S, SZLAM A, et al. Hash layers for large sparse models[C]. Proceedings of the 35th International Conference on Neural Information Processing Systems, 2021: 1343.
    [27] JIANG A Q, SABLAYROLLES A, ROUX A, et al. Mixtral of experts[J]. arXiv preprint arXiv: 2401.04088, 2024.
    [28] DAI Damai, DENG Chengqi, ZHAO Chenggang, et al. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models[C]. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 2024. doi: 10.18653/v1/2024.acl-long.70.
    [29] CHEN Tianlong, CHEN Xuxi, DU Xianzhi, et al. AdaMV-MoE: Adaptive multi-task vision mixture-of-experts[C]. Proceedings of 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 17346–17357. doi: 10.1109/ICCV51070.2023.01591.
    [30] WANG Hongchang, XIAN Fengyu, XIE Zihui, et al. BIRD1445: Large-scale multimodal bird dataset for ecological monitoring[J]. Journal of Electronics & Information Technology, 2026, 48(2): 873–888. doi: 10.11999/JEIT250647.
    [31] CHEN Yue, BAI Yalong, ZHANG Wei, et al. Destruction and construction learning for fine-grained image recognition[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 5157–5166. doi: 10.1109/CVPR.2019.00530.
    [32] LUO Wei, YANG Xitong, MO Xianjie, et al. Cross-x learning for fine-grained visual categorization[C]. Proceedings of 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 8242–8251. doi: 10.1109/ICCV.2019.00833.
    [33] LIU Zhuang, MAO Hanzi, WU Chaoyuan, et al. A ConvNet for the 2020s[C]. Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 11976–11986. doi: 10.1109/CVPR52688.2022.01167.
    [34] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
Publication history
  • Revised: 2026-04-23
  • Accepted: 2026-04-23
  • Published online: 2026-05-23
