Dynamic Scale Perception-Driven Multi-UAV Collaborative 3D Object Detection Method

DUAN Shujing, WANG Zhirui, CHENG Peirui, FU Kun

DUAN Shujing, WANG Zhirui, CHENG Peirui, FU Kun. Dynamic Scale Perception-Driven Multi-UAV Collaborative 3D Object Detection Method[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251378


doi: 10.11999/JEIT251378 cstr: 32379.14.JEIT251378
Details
    About the authors:

    DUAN Shujing: Female, Ph.D. candidate. Research interests: intelligent interpretation of distributed remote sensing images

    WANG Zhirui: Male, Associate Researcher. Research interests: computer vision and remote sensing image understanding

    CHENG Peirui: Male, Associate Researcher. Research interests: intelligent interpretation of remote sensing images

    FU Kun: Male, Researcher. Research interests: intelligent interpretation of remote sensing big data

    Corresponding author:

    WANG Zhirui, zhirui1990@126.com

  • CLC number: TN911.73; TP75

Dynamic Scale Perception-Driven Multi-UAV Collaborative 3D Object Detection Method

Funds: The National Natural Science Foundation of China (62571515)
  • Abstract: Multi-UAV collaborative 3D object detection is a core technology for low-altitude intelligent perception, and the Bird's Eye View (BEV) feature representation paradigm provides the global spatial consistency this task requires. In practice, however, targets in remote sensing images are small and sparsely distributed. Existing Transformer-based BEV perception methods process features homogeneously over the whole image, which either wastes substantial computation or loses the fine-grained features of small targets, making it difficult to balance computational efficiency and detection accuracy. To address this, this paper proposes a dynamic scale-aware detection network for multi-UAV collaborative scenarios, whose core idea is a scale-differentiated feature-processing mechanism that jointly optimizes computational efficiency and detection accuracy. Two core modules are designed: a Dynamic Scale-aware BEV Generation module (DSBG) and an Adaptive BEV feature Collaborative Aggregation module (ACFA). DSBG dynamically generates multi-resolution BEV features according to the target distribution in each UAV's feature map; ACFA adaptively weights and fuses these multi-resolution BEV features into a globally consistent collaborative BEV feature, which is then fed into a detection decoder for target prediction. Experiments show that the proposed network performs strongly on two multi-UAV collaborative simulation datasets, AeroCollab3D and Air-Co-Pred, reaching mean Average Precision (mAP) of 64.0% and 80.6%, improvements of 1.5% and 7.2% over other state-of-the-art methods, while reducing computational cost by up to 41.6%, achieving an efficient balance between computational efficiency and detection accuracy.
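The abstract's two-stage pipeline (per-UAV resolution selection, then weighted aggregation onto a common grid) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the density threshold, the nearest-neighbor upsampling, and the softmax-style per-cell weighting are all placeholders for exposition.

```python
import numpy as np

def select_bev_resolution(obj_density, hi=200, lo=50, thresh=0.05):
    # Hypothetical DSBG-style rule: views with many (small) targets
    # get the fine grid, sparse views get the coarse grid.
    return hi if obj_density > thresh else lo

def aggregate_bev(feats, target=200):
    # feats: list of per-UAV BEV features, each (H, H, C) with H dividing target.
    up = []
    for f in feats:
        r = target // f.shape[0]          # integer upsampling factor
        up.append(np.repeat(np.repeat(f, r, axis=0), r, axis=1))
    stack = np.stack(up)                   # (N_uav, target, target, C)
    # Crude confidence proxy: mean activation per cell, softmax over UAVs,
    # standing in for ACFA's learned adaptive weights.
    w = np.exp(stack.mean(axis=-1, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    return (w * stack).sum(axis=0)         # fused (target, target, C) BEV feature
```

A fused feature built from one 50×50 and one 200×200 view comes out on the common 200×200 grid, with each cell a convex combination of the contributing UAVs' features.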
  • Figure 1  Architecture of the dynamic scale perception-driven multi-UAV collaborative 3D object detection network

    Figure 2  Schematic of the adaptive BEV feature collaborative aggregation module

    Figure 3  BEV detection visualization results on the AeroCollab3D dataset

    Figure 4  Detection visualization results on the AeroCollab3D dataset

    Table 1  Comparison experiment results on the AeroCollab3D dataset

    | Method | BEV grid size | mAP↑(%) | mATE↓(m) | mASE↓ | mAOE↓ | Cost↓ |
    | --- | --- | --- | --- | --- | --- | --- |
    | BEVDet[10] | 128×128 | 55.4 | 0.512 | 0.196 | 0.498 | 4.712 |
    | BEVDet4D[11] | 128×128 | 58.7 | 0.499 | 0.102 | 0.317 | 4.712 |
    | BEVLongTerm[10] | 128×128 | 33.5 | 0.527 | 0.298 | 0.515 | 4.712 |
    | BEVDepth[12] | 128×128 | 59.9 | 0.489 | 0.106 | 0.495 | 4.712 |
    | Where2comm[25] | 128×128 | 52.3 | 0.473 | 0.199 | 0.415 | 4.712 |
    | UCDNet[22] | 128×128 | 62.5 | 0.487 | 0.188 | 0.399 | 4.712 |
    | Ours | – | 64.0 | 0.460 | 0.086 | 0.288 | 3.505 |

    Table 2  Fine-grained detection results on the AeroCollab3D dataset

    | Object class | mAP↑(%) | mATE↓(m) | mASE↓ | mAOE↓ |
    | --- | --- | --- | --- | --- |
    | Car | 79.7 | 0.300 | 0.096 | 0.043 |
    | Truck | 64.2 | 0.515 | 0.089 | 0.049 |
    | Bus | 57.6 | 0.493 | 0.070 | 0.050 |
    | Pedestrian | 54.7 | 0.536 | 0.093 | 1.011 |

    Table 3  Comparison experiment results on the Air-Co-Pred dataset

    | Method | BEV grid size | mAP↑(%) | mATE↓(m) | mASE↓ | mAOE↓ | Cost↓ |
    | --- | --- | --- | --- | --- | --- | --- |
    | BEVDet[10] | 128×128 | 63.9 | 0.432 | 0.204 | 0.108 | 4.712 |
    | BEVDet4D[11] | 128×128 | 61.7 | 0.431 | 0.224 | 0.139 | 4.712 |
    | BEVLongTerm[10] | 128×128 | 69.7 | 0.330 | 0.186 | 0.129 | 4.712 |
    | Where2comm[25] | 128×128 | 72.1 | 0.295 | 0.186 | 0.075 | 4.712 |
    | UCDNet[22] | 128×128 | 73.4 | 0.323 | 0.182 | 0.051 | 4.712 |
    | Ours | – | 80.6 | 0.327 | 0.094 | 0.045 | 2.869 |

    Table 4  Fine-grained baseline comparison results on the AeroCollab3D dataset

    | Method | BEV grid size | car_AP(%) | truck_AP(%) | bus_AP(%) | pedestrian_AP(%) | mAP↑(%) | Transmission ratio | Cost↓ |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | Baseline | 50×50 | 71.4 | 57.4 | 55.0 | 36.4 | 55.0 | 0.0625 | 2.000 |
    | Baseline | 200×200 | 74.6 | 63.3 | 56.2 | 45.5 | 58.5 | 1.0000 | 6.000 |
    | +DSBG | – | 72.7 | 62.3 | 54.8 | 35.3 | 56.2 | 0.1775 | 3.317 |
    | +DSBG+ACFA | – | 79.7 | 64.2 | 57.6 | 54.7 | 64.0 | 0.1787 | 3.505 |
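Two of the headline numbers can be checked directly against Table 4, assuming the interaction transmission ratio is the fraction of BEV cells transmitted relative to the 200×200 full-resolution grid (an interpretation consistent with the 50×50 baseline row):

```python
# A uniform 50×50 grid transmits (50*50)/(200*200) of the full grid's cells.
ratio = (50 * 50) / (200 * 200)

# The abstract's "up to 41.6%" cost reduction: the full model's cost (3.505)
# against the 200×200 baseline's cost (6.000) in Table 4.
saving = (6.000 - 3.505) / 6.000
```

Here `ratio` equals 0.0625, matching the 50×50 baseline row, and `saving` rounds to 0.416, i.e. the 41.6% maximum cost reduction claimed in the abstract.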
  • [1] ZONG Zhuofan, JIANG Dongzhi, SONG Guanglu, et al. Temporal enhanced training of multi-view 3D object detector via historical object prediction[C]. Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 3758–3767. doi: 10.1109/ICCV51070.2023.00350.
    [2] HE Jiang, YU Wanxin, HUANG Hao, et al. Joint task allocation, communication base station association and flight strategy optimization design for distributed sensing unmanned aerial vehicles[J]. Journal of Electronics & Information Technology, 2025, 47(5): 1402–1417. doi: 10.11999/JEIT240738.
    [3] YANG Dingkang, YANG Kun, WANG Yuzheng, et al. How2comm: Communication-efficient and collaboration-pragmatic multi-agent perception[C]. Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, USA, 2023: 1093.
    [4] HU Senkang, FANG Zhengru, DENG Yiqin, et al. Collaborative perception for connected and autonomous driving: Challenges, possible solutions and opportunities[J]. IEEE Wireless Communications, 2025, 32(5): 228–234. doi: 10.1109/MWC.002.2400348.
    [5] LI Xueping, TUPAYACHI J, SHARMIN A, et al. Drone-aided delivery methods, challenge, and the future: A methodological review[J]. Drones, 2023, 7(3): 191. doi: 10.3390/drones7030191.
    [6] LI Zhenxin, LAN Shiyi, ALVAREZ J M, et al. BEVNeXt: Reviving dense BEV frameworks for 3D object detection[C]. Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 20113–20123. doi: 10.1109/CVPR52733.2024.01901.
    [7] WANG Xiaoming, CHEN Hao, CHU Xiangxiang, et al. AODet: Aerial object detection using transformers for foreground regions[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4106711. doi: 10.1109/TGRS.2024.3407815.
    [8] WANG Yuchao, WANG Zhirui, CHENG Peirui, et al. AVCPNet: An AAV-vehicle collaborative perception network for 3-D object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5615916. doi: 10.1109/TGRS.2025.3546669.
    [9] PHILION J and FIDLER S. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3D[C]. Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 2020: 194–210. doi: 10.1007/978-3-030-58568-6_12.
    [10] HUANG Junjie, HUANG Guan, ZHU Zheng, et al. BEVDet: High-performance multi-camera 3D object detection in bird-eye-view[EB/OL]. https://arxiv.org/abs/2112.11790, 2021.
    [11] HUANG Junjie and HUANG Guan. BEVDet4D: Exploit temporal cues in multi-camera 3D object detection[EB/OL]. https://arxiv.org/abs/2203.17054, 2022.
    [12] LI Yinhao, GE Zheng, YU Guanyi, et al. BEVDepth: Acquisition of reliable depth for multi-view 3D object detection[C]. Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023: 1477–1485. doi: 10.1609/aaai.v37i2.25233.
    [13] WANG Yue, GUIZILINI V C, ZHANG Tianyuan, et al. DETR3D: 3D object detection from multi-view images via 3D-to-2D queries[C]. Proceedings of the 5th Conference on Robot Learning, London, UK, 2022: 180–191.
    [14] LI Zhiqi, WANG Wenhai, LI Hongyang, et al. BEVFormer: Learning bird's-eye-view representation from LiDAR-camera via spatiotemporal transformers[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(3): 2020–2036. doi: 10.1109/TPAMI.2024.3515454.
    [15] ZHU Pengfei, ZHENG Jiayu, DU Dawei, et al. Multi-drone-based single object tracking with agent sharing network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(10): 4058–4070. doi: 10.1109/TCSVT.2020.3045747.
    [16] CAO Yaru, HE Zhijian, WANG Lujia, et al. VisDrone-DET2021: The vision meets drone object detection challenge results[C]. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops, Montreal, Canada, 2021: 2847–2854. doi: 10.1109/ICCVW54120.2021.00319.
    [17] YAO Tingting, ZHAO Hengxin, FENG Zihao, et al. A context-aware multiple receptive field fusion network for oriented object detection in remote sensing images[J]. Journal of Electronics & Information Technology, 2025, 47(1): 233–243. doi: 10.11999/JEIT240560.
    [18] ZHU Xizhou, SU Weijie, LU Lewei, et al. Deformable DETR: Deformable transformers for end-to-end object detection[C]. 9th International Conference on Learning Representations, Vienna, Austria, 2021.
    [19] KINGMA D P and BA J. Adam: A method for stochastic optimization[C]. 3rd International Conference on Learning Representations, San Diego, USA, 2015.
    [20] WANG Zhechao, CHENG Peirui, CHEN Mingxin, et al. Drones help drones: A collaborative framework for multi-drone object trajectory prediction and beyond[C]. Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 2061.
    [21] CHEN Mingxin, WANG Zhirui, WANG Zhechao, et al. C2F-Net: Coarse-to-fine multidrone collaborative perception network for object trajectory prediction[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18: 6314–6328. doi: 10.1109/JSTARS.2025.3541249.
    [22] TIAN Pengju, WANG Zhirui, CHENG Peirui, et al. UCDNet: Multi-UAV collaborative 3-D object detection network by reliable feature mapping[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5602016. doi: 10.1109/TGRS.2024.3517594.
    [23] CAESAR H, BANKIT V, LANG A H, et al. nuScenes: A multimodal dataset for autonomous driving[C]. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 11618–11628. doi: 10.1109/CVPR42600.2020.01164.
    [24] LIANG Yan, YANG Huilin, and SHAO Kai. A vehicle-infrastructure cooperative 3D object detection scheme based on adaptive feature selection[J]. Journal of Electronics & Information Technology, 2025, 47(12): 5214–5225. doi: 10.11999/JEIT250601.
    [25] HU Yue, FANG Shaoheng, LEI Zixing, et al. Where2comm: Communication-efficient collaborative perception via spatial confidence maps[C]. Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, USA, 2022: 352.
Figures (4) / Tables (4)
Publication history
  • Received:  2025-12-30
  • Revised:  2026-03-03
  • Accepted:  2026-03-03
  • Published online:  2026-03-14
