DGCN-MFW: A Lightweight Human Action Recognition Network for Millimeter-Wave Radar 3D Point Clouds

DING Xuanyu; JIN Biao; ZHANG Zhenkai

doi:10.11999/JEIT251087

Volume 48 Issue 4

Apr. 2026

DING Xuanyu, JIN Biao, ZHANG Zhenkai. DGCN-MFW: A Lightweight Human Action Recognition Network for Millimeter-Wave Radar 3D Point Clouds[J]. Journal of Electronics & Information Technology, 2026, 48(4): 1740-1750. doi: 10.11999/JEIT251087

cstr: 32379.14.JEIT251087

Ocean College, Jiangsu University of Science and Technology, Zhenjiang 212003, China

Funds: The National Natural Science Foundation of China (62571220), the Key Research and Development Project of Henan Province (241111212500), the Science and Technology Plan (Basic Research) Project of Zhenjiang City (JC2025026), the 2025 Jiangsu Provincial Postgraduate Practice & Innovation Program (SJCX25_2502)

Received Date: 2025-10-13
Accepted Date: 2026-03-03
Rev Recd Date: 2026-02-15

Available Online: 2026-03-15

Publish Date: 2026-04-10

Abstract

Objective: Millimeter-wave radar 3D point clouds provide important spatial cues for human action recognition. However, the points are inherently unordered, which complicates feature extraction, and actions depend on temporal correlations across multiple frames, which makes single-frame analysis prone to error. This paper proposes a dynamic graph convolutional network for long 3D point-cloud sequences that improves recognition performance and efficiency through multi-scale feature fusion, adaptive frame weighting, and cross-attention.

Methods: The proposed network, DGCN-MFW, has three core components: dynamic graph convolution feature extraction, multi-scale feature fusion, and adaptive temporal frame weighting. In Step 1, dynamic graph convolution automatically captures spatial geometry through local directed neighborhood graphs whose neighborhoods are updated online. This design avoids manual graph construction and improves feature robustness. In Step 2, multi-scale feature fusion jointly extracts and integrates point-cloud features across the spatial and temporal dimensions, capturing both local details and global semantics. In Step 3, adaptive frame weighting learns the importance of each frame, emphasizing discriminative key frames and suppressing noisy or unimportant ones. Cross-attention is further used to exchange information between the center frame and its context, which compensates for the limitations of single-frame analysis caused by motion blur, occlusion, or pose ambiguity.

Results and Discussions: The proposed network extracts features through dynamic graph convolution, applies multi-scale feature fusion and adaptive frame weighting, and then classifies the human action. It achieves strong performance on the public TI and Vayyar millimeter-wave radar point-cloud datasets. With only 2.06M parameters and 4.51 GFLOPs, it outperforms existing methods (Tables 2, 3, and 4). Ablation experiments confirm that both core modules substantially improve recognition accuracy (Table 1). The confusion matrices show accuracy above 99% for most actions on the two datasets (Figs. 10 and 11). However, the network's scalability, parameter efficiency, and processing efficiency on large-scale data still require improvement. Future work will therefore focus on further lightweight design and architectural optimization.

Conclusions: To address the two main challenges in mmWave radar 3D point-cloud-based human action recognition, an action recognition algorithm based on a dynamic graph convolutional network and multi-feature fusion is proposed. A multi-scale feature fusion module with cross-scale interaction extracts local and global features, which improves spatial representation. An adaptive frame-weighting module and a cross-attention mechanism capture the temporal evolution of actions. The method achieves accuracies of 98.32% and 99.48% on the two datasets with 2.06M parameters and 4.51 GFLOPs, outperforming mainstream models. It provides a new solution for high-precision, low-resource mmWave radar action recognition and is suitable for real-time scenarios such as industrial human-machine interaction, intelligent security, and healthcare.

Keywords:
- Millimeter-wave radar 3D point cloud
- Human action recognition
- Graph convolutional network
- Multi-scale feature fusion
- Adaptive frame weighting
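
The abstract does not give layer definitions, so the following is only a minimal sketch of two ideas it names: a neighborhood graph that is rebuilt from the current features (Step 1) and softmax-based frame weighting (Step 3). It assumes EdgeConv-style edge features, as in the Dynamic Graph CNN literature, and replaces every learned layer with a fixed operation. The function names, the value of `k`, and the per-frame scores are hypothetical and are not taken from the paper. Multi-scale fusion and cross-attention are not shown.

```python
import math

def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbors (self excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def edge_features(feats, neighbors):
    """EdgeConv-style feature for each directed edge i -> j: concat(x_i, x_j - x_i)."""
    out = []
    for i, nbrs in enumerate(neighbors):
        xi = list(feats[i])
        out.append([xi + [b - a for a, b in zip(xi, feats[j])] for j in nbrs])
    return out

def max_aggregate(edges):
    """Channel-wise max over each point's edges (assumed stand-in for MLP + max pooling)."""
    return [[max(channel) for channel in zip(*point_edges)] for point_edges in edges]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weight_frames(frame_feats, frame_scores):
    """Fuse per-frame feature vectors with softmax-normalized importance scores."""
    w = softmax(frame_scores)
    dim = len(frame_feats[0])
    fused = [sum(w[t] * f[c] for t, f in enumerate(frame_feats)) for c in range(dim)]
    return w, fused

# One frame with four 3D points (toy data).
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [5.0, 5.0, 5.0]]
nbrs = knn_graph(points, k=2)            # graph built in coordinate space
h1 = max_aggregate(edge_features(points, nbrs))
nbrs_next = knn_graph(h1, k=2)           # graph rebuilt in feature space ("dynamic")

# Three frames with hypothetical importance scores; the last frame dominates.
weights, fused = weight_frames([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 2.0])
print(nbrs[0], h1[0], weights, fused)
```

In a trained network the scores passed to `weight_frames` would come from a learned scoring layer rather than being fixed by hand, and the max aggregation would follow a learned transformation of each edge feature.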
