
Multimodal Pedestrian Trajectory Prediction with Multi-Scale Spatio-Temporal Group Modeling and Diffusion

KONG Xiangyan, GAO Yulong, WANG Gang

Citation: KONG Xiangyan, GAO Yulong, WANG Gang. Multimodal Pedestrian Trajectory Prediction with Multi-Scale Spatio-Temporal Group Modeling and Diffusion[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250900


doi: 10.11999/JEIT250900 cstr: 32379.14.JEIT250900
Details
    Author biographies:

    KONG Xiangyan: female, Ph.D. candidate; research interests: trajectory prediction, vehicle-infrastructure cooperation

    GAO Yulong: male, professor and doctoral supervisor, Ph.D.; research interest: deep learning

    WANG Gang: male, professor and doctoral supervisor, Ph.D.; research interests: data communication, physical-layer network coding, communication network theory and technology

  • CLC number: TP39

Multimodal Pedestrian Trajectory Prediction with Multi-Scale Spatio-Temporal Group Modeling and Diffusion

  • Abstract: To address the insufficient capture of multimodal features and the missing group-level dynamics in pedestrian trajectory prediction, this paper proposes a novel multimodal pedestrian trajectory prediction framework, MSGD (Multi-Scale Spatio-Temporal Group Modeling and Diffusion). First, multi-scale spatio-temporal features are exploited to accurately construct multi-scale spatio-temporal groups. Second, a spatio-temporal interaction triplet encoding mechanism is designed to jointly model the individual-neighbor-group spatio-temporal relations, capturing both local interaction details and the global dynamic structure and thereby strengthening the representation of evolving group behavior. Finally, the reverse process of a diffusion model progressively reduces the uncertainty within the feasible region during generation, producing diverse, plausible, and realistic target trajectories. The proposed method is evaluated extensively on three public datasets (ETH, UCY, and NBA) and compared with state-of-the-art approaches. Experimental results show that the MSGD framework achieves significant gains in prediction performance, with marked improvements in the Average Displacement Error (ADE) and Final Displacement Error (FDE) metrics, demonstrating its effectiveness in modeling complex pedestrian behavior.
  • Figure 1  Architecture of the multi-scale spatio-temporal group and diffusion model

    Figure 2  Visualization of trajectories on the NBA dataset

    Note: green denotes the observed history, red the ground-truth future trajectory, and blue the 20 trajectories predicted by MSGD or GroupNet+CVAE
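The tables that follow report best-of-20 displacement errors (minADE20/minFDE20). As a reference for how these standard metrics are computed, here is a minimal NumPy sketch; the function name and toy data are illustrative, not taken from the paper:

```python
import numpy as np

def min_ade_fde(preds, gt):
    """Best-of-K displacement errors for one agent.

    preds: (K, T, 2) K sampled future trajectories (x, y per step)
    gt:    (T, 2)    ground-truth future trajectory
    Returns (minADE, minFDE) over the K samples.
    """
    # Per-step Euclidean distance between each sample and the ground truth
    dists = np.linalg.norm(preds - gt[None], axis=-1)   # (K, T)
    ade = dists.mean(axis=1)                            # average over time
    fde = dists[:, -1]                                  # final step only
    return ade.min(), fde.min()

# Toy example: 3 samples, 4 future steps; one sample matches the ground truth
gt = np.array([[i, 0.0] for i in range(1, 5)])
preds = np.stack([gt, gt + 0.5, gt + np.array([0.0, 1.0])])
min_ade, min_fde = min_ade_fde(preds, gt)
```

With K = 20 samples per agent, averaging these per-agent values over the test set yields the minADE20/minFDE20 numbers reported below.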

    Table 1  minADE20 and minFDE20 on the NBA dataset (m). Lower is better; bold marks the best result and italics the second best.

    | Method | 1 s ADE | 1 s FDE | 2 s ADE | 2 s FDE | 3 s ADE | 3 s FDE | 4 s ADE | 4 s FDE |
    |---|---|---|---|---|---|---|---|---|
    | Social-LSTM[3] | 0.45 | 0.67 | 0.88 | 1.53 | 1.33 | 2.38 | 1.79 | 3.16 |
    | Social-GAN[22] | 0.46 | 0.65 | 0.85 | 1.36 | 1.24 | 1.98 | 1.62 | 2.51 |
    | Social-STGCNN[23] | 0.36 | 0.50 | 0.75 | 0.99 | 1.15 | 1.79 | 1.59 | 2.37 |
    | STGAT[24] | 0.38 | 0.55 | 0.73 | 1.18 | 1.07 | 1.74 | 1.41 | 2.22 |
    | NRI[25] | 0.45 | 0.64 | 0.84 | 1.44 | 1.24 | 2.18 | 1.62 | 2.84 |
    | STAR[26] | 0.43 | 0.65 | 0.77 | 1.28 | 1.00 | 1.55 | 1.26 | 2.04 |
    | PECNet[27] | 0.51 | 0.76 | 0.96 | 1.69 | 1.41 | 2.52 | 1.83 | 3.41 |
    | NMMP[28] | 0.38 | 0.54 | 0.70 | 1.11 | 1.01 | 1.61 | 1.33 | 2.05 |
    | CVAE[17] | 0.37 | 0.52 | 0.67 | 1.06 | 0.96 | 1.51 | 1.25 | 1.96 |
    | GroupNet+CVAE[17] | 0.34 | 0.48 | 0.62 | 0.95 | 0.87 | 1.31 | 1.13 | 1.69 |
    | MID[29] | 0.28 | 0.37 | 0.51 | 0.72 | 0.71 | 0.98 | 0.96 | 1.27 |
    | MSGD | 0.23 | 0.32 | 0.47 | 0.72 | 0.70 | 1.03 | 0.94 | 1.33 |

    Table 2  minADE20 and minFDE20 on the ETH-UCY dataset (m). Lower is better; bold marks the best result and italics the second best.

    | Method | ETH ADE | ETH FDE | HOTEL ADE | HOTEL FDE | UNIV ADE | UNIV FDE | ZARA1 ADE | ZARA1 FDE | ZARA2 ADE | ZARA2 FDE | AVG ADE | AVG FDE |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|
    | Social-LSTM[3] | 1.09 | 2.35 | 0.79 | 1.76 | 0.67 | 1.40 | 0.47 | 1.00 | 0.56 | 1.17 | 0.72 | 1.54 |
    | Social-GAN[22] | 0.87 | 1.62 | 0.67 | 1.37 | 0.76 | 1.52 | 0.35 | 0.68 | 0.42 | 0.84 | 0.61 | 1.21 |
    | Social-Attention[30] | 1.39 | 2.39 | 2.51 | 2.91 | 1.25 | 2.54 | 1.01 | 2.17 | 0.88 | 1.75 | 1.41 | 2.35 |
    | SOPHIE[31] | 0.70 | 1.43 | 0.76 | 1.67 | 0.54 | 1.24 | 0.30 | 0.63 | 0.38 | 0.78 | 0.54 | 1.15 |
    | STGAT[24] | 0.65 | 1.12 | 0.35 | 0.66 | 0.52 | 1.10 | 0.34 | 0.69 | 0.29 | 0.60 | 0.43 | 0.83 |
    | NMMP[28] | 0.61 | 1.08 | 0.33 | 0.63 | 0.52 | 1.11 | 0.32 | 0.66 | 0.43 | 0.85 | 0.41 | 0.82 |
    | STAR[26] | 0.36 | 0.65 | 0.17 | 0.36 | 0.31 | 0.62 | 0.26 | 0.55 | 0.22 | 0.46 | 0.26 | 0.53 |
    | GroupNet+CVAE[17] | 0.46 | 0.73 | 0.15 | 0.25 | 0.26 | 0.49 | 0.21 | 0.39 | 0.17 | 0.33 | 0.25 | 0.44 |
    | PCCSNET[32] | 0.28 | 0.54 | 0.11 | 0.19 | 0.29 | 0.60 | 0.21 | 0.44 | 0.15 | 0.34 | 0.21 | 0.42 |
    | PCCSNET+KE loss[33] | 0.26 | – | 0.13 | – | 0.28 | – | 0.20 | – | 0.16 | – | 0.21 | – |
    | MID[29] | 0.39 | 0.66 | 0.13 | 0.22 | 0.22 | 0.45 | 0.17 | 0.30 | 0.13 | 0.27 | 0.21 | 0.38 |
    | PPT[34] | 0.36 | 0.51 | 0.11 | 0.15 | 0.22 | 0.40 | 0.17 | 0.30 | 0.12 | 0.21 | 0.20 | 0.31 |
    | MSGD | 0.37 | 0.58 | 0.11 | 0.21 | 0.20 | 0.39 | 0.16 | 0.29 | 0.12 | 0.26 | 0.19 | 0.35 |

    Table 3  Standard deviation of the speed difference and of the direction similarity between predicted and ground-truth trajectories on the ETH dataset.

    | Method | Std of speed difference | Std of direction similarity |
    |---|---|---|
    | Trajectory++[35] | 0.0453 | 0.8054 |
    | MSGD | 0.0415 | 0.7012 |
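Table 3 compares distributional statistics rather than displacement errors. A minimal sketch of how such statistics could be computed follows; the paper does not spell out its exact definitions, so the per-step speed and cosine-similarity formulas below are assumptions:

```python
import numpy as np

def speed_and_direction_stats(pred, gt):
    """Std of per-step speed difference and of direction similarity.

    pred, gt: (T, 2) predicted / ground-truth trajectories.
    Assumed definitions: speed = magnitude of the step displacement;
    direction similarity = cosine between pred and gt step vectors.
    """
    dp = np.diff(pred, axis=0)                    # (T-1, 2) step vectors
    dg = np.diff(gt, axis=0)
    sp = np.linalg.norm(dp, axis=1)               # per-step speeds
    sg = np.linalg.norm(dg, axis=1)
    speed_diff = sp - sg
    cos = (dp * dg).sum(axis=1) / (sp * sg + 1e-8)  # cosine similarity
    return speed_diff.std(), cos.std()

# Sanity check: a trajectory compared with itself
line = np.arange(4, dtype=float)[:, None] * np.array([1.0, 0.0])
speed_std, dir_std = speed_and_direction_stats(line, line)
```

A perfect prediction gives zero standard deviation on both statistics; lower values indicate predictions whose kinematics track the ground truth more consistently.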

    Table 4  Ablation results on the NBA dataset (ADE/FDE) (m); the ablated components are the multi-scale groups, the spatio-temporal features, and the diffusion model. Bold marks the best result.

    | No. | 1 s ADE | 1 s FDE | 2 s ADE | 2 s FDE | 3 s ADE | 3 s FDE | 4 s ADE | 4 s FDE |
    |---|---|---|---|---|---|---|---|---|
    | 1 | 0.28 | 0.38 | 0.53 | 0.77 | 0.78 | 1.10 | 1.01 | 1.37 |
    | 2 | 0.31 | 0.43 | 0.58 | 0.90 | 0.82 | 1.26 | 1.09 | 1.65 |
    | 3 | 0.25 | 0.34 | 0.50 | 0.75 | 0.75 | 1.07 | 0.98 | 1.35 |
    | 4 | 0.23 | 0.32 | 0.47 | 0.72 | 0.70 | 1.03 | 0.94 | 1.33 |

    Table 5  Performance with different group-scale sets on the NBA dataset (minADE20/minFDE20) (m). Lower is better; bold marks the best result and italics the second best.

    | Group scales | 1 s ADE | 1 s FDE | 2 s ADE | 2 s FDE | 3 s ADE | 3 s FDE | 4 s ADE | 4 s FDE |
    |---|---|---|---|---|---|---|---|---|
    | 2 | 0.230 | 0.327 | 0.472 | 0.736 | 0.710 | 1.049 | 0.947 | 1.360 |
    | 2,3 | 0.229 | 0.326 | 0.471 | 0.734 | 0.709 | 1.045 | 0.946 | 1.355 |
    | 2,3,5 | 0.229 | 0.326 | 0.471 | 0.735 | 0.709 | 1.045 | 0.945 | 1.352 |
    | 2,3,5,11 | 0.229 | 0.324 | 0.467 | 0.724 | 0.698 | 1.025 | 0.929 | 1.324 |
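Tables 5 and 6 vary the group scales. The paper's own group-construction procedure is not reproduced here; purely as an illustrative assumption, one simple way to form groups at scales {2, 3, 5, 11} is k-nearest-neighbor grouping around each agent:

```python
import numpy as np

def knn_groups(positions, scales=(2, 3, 5, 11)):
    """Illustrative multi-scale grouping (not the paper's algorithm):
    for each agent, itself plus its k-1 nearest neighbors form one
    candidate group per scale k.

    positions: (N, 2) current agent coordinates.
    Returns {k: (N, k) index array of group members per agent}.
    """
    # Pairwise distance matrix; the diagonal is 0, so each agent is
    # always its own nearest "neighbor" and leads its group.
    d = np.linalg.norm(positions[:, None] - positions[None], axis=-1)
    order = np.argsort(d, axis=1)
    return {k: order[:, :k] for k in scales}

# 11 agents (e.g. players on an NBA court), groups at all four scales
pts = np.random.default_rng(1).normal(size=(11, 2))
groups = knn_groups(pts)
```

Scale 11 covers all agents in the scene, which matches the observation in Tables 5 and 6 that the largest scale contributes most at longer horizons, where scene-level context dominates.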

    Table 6  minADE20 and minFDE20 for different group sizes on the NBA dataset (m). Lower is better; bold marks the best result and italics the second best.

    | Group size | 1 s ADE | 1 s FDE | 2 s ADE | 2 s FDE | 3 s ADE | 3 s FDE | 4 s ADE | 4 s FDE |
    |---|---|---|---|---|---|---|---|---|
    | 2 | 0.2302 | 0.3271 | 0.4724 | 0.7364 | 0.7103 | 1.0490 | 0.9471 | 1.3598 |
    | 3 | 0.2308 | 0.3282 | 0.4729 | 0.7368 | 0.7108 | 1.0498 | 0.9468 | 1.3585 |
    | 4 | 0.2303 | 0.3274 | 0.4722 | 0.7367 | 0.7114 | 1.0489 | 0.9493 | 1.3570 |
    | 5 | 0.2388 | 0.3253 | 0.4720 | 0.7352 | 0.7098 | 1.0467 | 0.9465 | 1.3533 |
    | 11 | 0.2390 | 0.3260 | 0.4697 | 0.7299 | 0.7037 | 1.0342 | 0.9369 | 1.3375 |

    Table 7  minADE20 and minFDE20 on ETH for different numbers of diffusion steps and training epochs (m). Lower is better; bold marks the best result and italics the second best.

    | Diffusion steps | 30 epochs ADE | 30 epochs FDE | 60 epochs ADE | 60 epochs FDE | 90 epochs ADE | 90 epochs FDE |
    |---|---|---|---|---|---|---|
    | 10 | 0.5655 | 0.9261 | 0.5993 | 1.0191 | 0.6292 | 1.1002 |
    | 20 | 0.4331 | 0.7096 | 0.4631 | 0.7347 | 0.4360 | 0.7086 |
    | 30 | 0.3856 | 0.5445 | 0.3839 | 0.5517 | 0.4191 | 0.6262 |
    | 40 | 0.4245 | 0.6646 | 0.3508 | 0.5097 | 0.3826 | 0.5866 |
    | 50 | 0.3909 | 0.5774 | 0.3624 | 0.5263 | 0.3761 | 0.5807 |
    | 60 | 0.3766 | 0.5395 | 0.3856 | 0.5723 | 0.3793 | 0.5570 |
    | 70 | 0.3931 | 0.5820 | 0.3917 | 0.6014 | 0.3719 | 0.6061 |
    | 80 | 0.3685 | 0.5244 | 0.3844 | 0.5859 | 0.3826 | 0.5755 |
    | 90 | 0.3721 | 0.5233 | 0.3970 | 0.6047 | 0.3901 | 0.6006 |
    | 100 | 0.4654 | 0.7959 | 0.3991 | 0.6376 | 0.4214 | 0.6927 |
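Table 7 sweeps the number of diffusion steps. The "reverse process of the diffusion model" named in the abstract follows, in generic form, the standard DDPM ancestral-sampling loop; the sketch below uses that textbook update with a dummy noise predictor, not the paper's trained network, and the schedule values are placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def ddpm_reverse(x_T, eps_model, betas):
    """Generic DDPM reverse process (ancestral sampling).

    x_T:       (..., D) Gaussian noise to be denoised into a trajectory
    eps_model: callable (x_t, t) -> predicted noise, same shape as x_t
    betas:     (S,) noise schedule; S is the number of diffusion steps
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = x_T
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)
        # Posterior mean of x_{t-1} given x_t and the predicted noise
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # inject noise at every step except the last
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

# Toy run: 30 diffusion steps, zero noise predictor, 4 future waypoints
betas = np.linspace(1e-4, 0.02, 30)
x0 = ddpm_reverse(rng.standard_normal((4, 2)), lambda x, t: np.zeros_like(x), betas)
```

Each iteration shrinks the remaining uncertainty around the current estimate, which is the mechanism the abstract describes as progressively reducing uncertainty within the feasible region; the step count (the rows of Table 7) trades refinement quality against sampling cost.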
  • [1] LI Tun, ZHU Yaokun, WU Xinhong, et al. Vehicle trajectory prediction method based on intersection context and deep belief network[J]. Journal of Electronics & Information Technology, 2021, 43(5): 1323–1330. doi: 10.11999/JEIT200137.
    [2] THERESA W G, MADHIMITHRA R, and BHAVANA G. A hybrid RL-GNN approach for precise pedestrian trajectory prediction in autonomous navigation[C]. 8th International Conference on Trends in Electronics and Informatics, Tirunelveli, India, 2025: 1485–1490. doi: 10.1109/ICOEI65986.2025.11013272.
    [3] ALAHI A, GOEL K, RAMANATHAN V, et al. Social LSTM: Human trajectory prediction in crowded spaces[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 961–971. doi: 10.1109/CVPR.2016.110.
    [4] YU Haoyang, LI Yansheng, XIAO Lingli, et al. A lightweight semantic visual simultaneous localization and mapping framework for inspection robots in dynamic environments[J]. Journal of Electronics & Information Technology, 2025, 47(10): 3979–3992. doi: 10.11999/JEIT250301.
    [5] WEI Xiaoge, LV Wei, SONG Weiguo, et al. Survey study and experimental investigation on the local behavior of pedestrian groups[J]. Complexity, 2015, 20(6): 87–97. doi: 10.1002/cplx.21633.
    [6] MOUSSAÏD M, PEROZO N, GARNIER S, et al. The walking behaviour of pedestrian social groups and its impact on crowd dynamics[J]. PLoS One, 2010, 5(4): e10047. doi: 10.1371/journal.pone.0010047.
    [7] HUO Ru, LÜ Kecheng, and HUANG Tao. Task segmentation and computing resource allocation method driven by path prediction in internet of vehicles[J]. Journal of Electronics & Information Technology, 2025, 47(10): 3658–3669. doi: 10.11999/JEIT250135.
    [8] MAO Lin, XIE Yunjiao, YANG Dawei, et al. Local destination pooling network for pedestrian trajectory prediction of condition endpoint[J]. Journal of Electronics & Information Technology, 2022, 44(10): 3465–3475. doi: 10.11999/JEIT210716.
    [9] LIANG Junwei, JIANG Lu, MURPHY K, et al. The garden of forking paths: Towards multi-future trajectory prediction[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10505–10515. doi: 10.1109/CVPR42600.2020.01052.
    [10] ZHOU Chuanxin, JIAN Gang, LI Lingshu, et al. Long-term trajectory prediction model based on points of interest and joint loss function[J]. Journal of Electronics & Information Technology, 2025, 47(8): 2841–2849. doi: 10.11999/JEIT250011.
    [11] HELBING D and MOLNÁR P. Social force model for pedestrian dynamics[J]. Physical Review E, 1995, 51(5): 4282–4286. doi: 10.1103/PhysRevE.51.4282.
    [12] SCARSELLI F, GORI M, TSOI A C, et al. The graph neural network model[J]. IEEE Transactions on Neural Networks, 2009, 20(1): 61–80. doi: 10.1109/TNN.2008.2005605.
    [13] WU Zonghan, PAN Shirui, CHEN Fengwen, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4–24. doi: 10.1109/TNNLS.2020.2978386.
    [14] WANG Chenyue and WANG Dongyu. Advancing federated learning in IoV: GNN-based trajectory prediction and privacy protection[C]. 2025 IEEE Wireless Communications and Networking Conference, Milan, Italy, 2025: 1–6. doi: 10.1109/WCNC61545.2025.10978319.
    [15] BAE I, PARK J H, and JEON H G. Learning pedestrian group representations for multi-modal trajectory prediction[C]. 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 270–289. doi: 10.1007/978-3-031-20047-2_16.
    [16] MOUSSAÏD M, PEROZO N, GARNIER S, et al. The walking behaviour of pedestrian social groups and its impact on crowd dynamics[J]. PLoS One, 2010, 5(4): e10047. doi: 10.1371/journal.pone.0010047. (Duplicate of Ref. [6].)
    [17] XU Chenxin, LI Maosen, NI Zhenyang, et al. GroupNet: Multiscale hypergraph neural networks for trajectory prediction with relational reasoning[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 6488–6497. doi: 10.1109/CVPR52688.2022.00639.
    [18] ZHANG Yuzhen, SU Junning, GUO Hang, et al. S-CVAE: Stacked CVAE for trajectory prediction with incremental greedy region[J]. IEEE Transactions on Intelligent Transportation Systems, 2024, 25(12): 20351–20363. doi: 10.1109/TITS.2024.3465836.
    [19] YANG Jiayu, LEE J J, and ANTONIOU C. Trajectory prediction for multiple agents in dynamic environments: Factoring in traffic states and driving styles[J]. IEEE Transactions on Intelligent Transportation Systems, 2025, 26(11): 19281–19295. doi: 10.1109/TITS.2025.3595743.
    [20] WEI Chuheng, WU Guoyuan, BARTH M J, et al. KI-GAN: Knowledge-informed generative adversarial networks for enhanced multi-vehicle trajectory forecasting at signalized intersections[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, USA, 2024: 7115–7124. doi: 10.1109/CVPRW63382.2024.00706.
    [21] CHEN Yanbo, YU Huilong, and XI Junqiang. STS-GAN: Spatial-temporal attention guided social GAN for vehicle trajectory prediction[C]. 16th International Symposium on Advanced Vehicle Control, Milan, Italy, 2024: 164–170. doi: 10.1007/978-3-031-70392-8_24.
    [22] GUPTA A, JOHNSON J, FEI-FEI L, et al. Social GAN: Socially acceptable trajectories with generative adversarial networks[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 2255–2264. doi: 10.1109/CVPR.2018.00240.
    [23] MOHAMED A, QIAN Kun, ELHOSEINY M, et al. Social-STGCNN: A social spatio-temporal graph convolutional neural network for human trajectory prediction[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 14412–14420. doi: 10.1109/CVPR42600.2020.01443.
    [24] HUANG Yingfan, BI Huikun, LI Zhaoxin, et al. STGAT: Modeling spatial-temporal interactions for human trajectory prediction[C]. The IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 6271–6280. doi: 10.1109/ICCV.2019.00637.
    [25] KIPF T N, FETAYA E, WANG K C, et al. Neural relational inference for interacting systems[C]. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 2018: 2693–2702.
    [26] YU Cunjun, MA Xiao, REN Jiawei, et al. Spatio-temporal graph transformer networks for pedestrian trajectory prediction[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 507–523. doi: 10.1007/978-3-030-58610-2_30.
    [27] MANGALAM K, GIRASE H, AGARWAL S, et al. It is not the journey but the destination: Endpoint conditioned trajectory prediction[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 759–776. doi: 10.1007/978-3-030-58536-5_45.
    [28] HU Yue, CHEN Siheng, ZHANG Ya, et al. Collaborative motion prediction via neural motion message passing[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 6318–6327. doi: 10.1109/CVPR42600.2020.00635.
    [29] GU Tianpei, CHEN Guangyi, LI Junlong, et al. Stochastic trajectory prediction via motion indeterminacy diffusion[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 17092–17101. doi: 10.1109/CVPR52688.2022.01660.
    [30] SOHL-DICKSTEIN J, WEISS E A, MAHESWARANATHAN N, et al. Deep unsupervised learning using nonequilibrium thermodynamics[C]. Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015: 2256–2265.
    [31] SADEGHIAN A, KOSARAJU V, SADEGHIAN A, et al. SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 1349–1358. doi: 10.1109/CVPR.2019.00144.
    [32] SUN Jianhua, LI Yuxuan, FANG Haoshu, et al. Three steps to multimodal trajectory prediction: Modality clustering, classification and synthesis[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 13230–13239. doi: 10.1109/ICCV48922.2021.01300.
    [33] LIN Xiaotong, LIANG Tianming, LAI Jianhuang, et al. Progressive pretext task learning for human trajectory prediction[C]. 18th European Conference on Computer Vision, Milan, Italy, 2025: 197–214. doi: 10.1007/978-3-031-73404-5_12.
    [34] LI Linhui, LIN Xiaotong, HUANG Yejia, et al. Beyond minimum-of-N: Rethinking the evaluation and methods of pedestrian trajectory prediction[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(12): 12880–12893. doi: 10.1109/TCSVT.2024.3439128.
    [35] SALZMANN T, IVANOVIC B, CHAKRAVARTY P, et al. Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 683–700. doi: 10.1007/978-3-030-58523-5_40.
Figures (2) / Tables (7)
Publication history
  • Received: 2025-09-09
  • Revised: 2026-01-04
  • Accepted: 2026-01-04
  • Published online: 2026-01-15
