Kolmogorov-Arnold Nonlinear Enhancement Method for Aerial-Ground Person Re-Identification
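Abstract: Aerial-ground person re-identification aims to match the same identity across platforms, between UAV views and ground-camera views. Because the aerial-ground viewpoint gap is large and the cross-domain distribution shift is severe, the classification supervision branch in existing methods, which relies on a linear feature transformation, struggles to model complex nonlinear discriminative relationships. To address this problem, this paper proposes a Kolmogorov-Arnold nonlinear enhancement module, placed between the backbone output features and the linear classification layer, to replace the conventional linear feature transformation. Drawing on the Kolmogorov-Arnold representation idea, the module uses learnable spline functions to adaptively reconstruct and enhance features in a nonlinear way, which strengthens the supervision mapping and promotes more discriminative aerial-ground cross-view representation learning. Experiments on the CARGO and AG-ReID datasets show that the proposed method outperforms existing state-of-the-art methods of the same kind. In particular, under the most challenging aerial-ground cross-view retrieval protocols, Rank-1 on the two datasets reaches 58.75% and 84.41%, respectively, which is 10.63 and 1.99 percentage points above the baseline, indicating strong cross-view retrieval capability.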
Keywords:
- Low-altitude technology
- Aerial-ground person re-identification
- Kolmogorov-Arnold representation theorem
- Nonlinear enhancement
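For reference, the Kolmogorov-Arnold representation theorem named in the keywords can be stated as below. The second line is the edge-function form used in the KAN paper [9]; whether KANEM uses exactly this form, and SiLU as the base activation, is an assumption here and is not stated in the abstract.

```latex
% Kolmogorov-Arnold representation of a continuous f : [0,1]^n -> R,
% with continuous univariate functions \Phi_q and \phi_{q,p}:
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

% KAN-style learnable edge function (assumed form): a base activation branch
% plus a spline branch built from B-spline basis functions B_i with learnable c_i.
\phi(x) = w_b \, \mathrm{SiLU}(x) + w_s \sum_{i} c_i \, B_i(x)
```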
Abstract:

Objective Aerial-Ground Person Re-Identification (AGPReID) aims to match the same person across UAV and ground-camera views. Compared with conventional same-platform person re-identification, the task involves larger cross-view appearance variations and more severe cross-domain distribution shifts, so identity-consistent cues are often weakened by viewpoint asymmetry and appearance distortion. Existing methods mainly focus on feature extraction and cross-view representation alignment, while the classification supervision branch still relies heavily on a linear feature transformation, which limits its ability to model complex nonlinear discriminative relationships in high-dimensional feature spaces. A stronger nonlinear supervision mapping is therefore needed to exploit high-order feature interactions and local discriminative variations. To address this issue, a Kolmogorov-Arnold Nonlinear Enhancement Module (KANEM) is proposed. KANEM replaces the conventional fully connected feature transformation between the backbone features and the linear classifier, and uses learnable nonlinear mappings to adaptively enhance features for more discriminative cross-view representation learning.

Methods The backbone follows the View-decoupled Transformer (VDT), which introduces an additional view token and performs layer-wise view decoupling to separate view factors from identity features, reducing the representation bias between the aerial and ground domains. Within this framework, KANEM takes the place of the fully connected transformation in the classification branch. Specifically, KANEM consists of a base activation branch and a spline branch, which are stacked into cascaded function-mapping layers.
This design enables more flexible nonlinear modeling than conventional linear or MLP-based transformations, allowing the model to capture local nonlinear variations and complex correlations among feature dimensions. To improve discriminability and further separate identity and view information, the network is jointly optimized with an identity classification loss, a view classification loss, a triplet loss, and an orthogonality loss. KANEM is used only during training and is removed at inference, so it adds no inference cost.

Results and Discussions Comprehensive evaluations are conducted on the CARGO and AG-ReID datasets, and the results show that the proposed method outperforms the baseline model and existing state-of-the-art methods on most metrics. On the CARGO dataset, the proposed method achieves 70.19%/63.16%/51.34% in Rank-1/mAP/mINP under the overall “ALL” retrieval protocol and 58.75%/53.27%/41.11% under the most challenging cross-view “A↔G” retrieval protocol (Table 1). On the AG-ReID dataset, it achieves the best performance under both retrieval protocols, reaching 84.41%/76.21%/53.05% in Rank-1/mAP/mINP for “A→G” and 86.69%/77.99%/52.28% for “G→A” (Table 2). Ablation studies on CARGO demonstrate the effectiveness of KANEM and show that it achieves better overall performance than the conventional linear transformation and MLP-based alternatives, indicating that the proposed nonlinear enhancement strategy is better suited to supervision mapping in AGPReID (Table 3, Table 4). Integrating KANEM into other ReID tasks further indicates its potential to generalize across ReID scenarios (Table 5). Parameter analysis shows that setting λ to 0.001 (Fig. 2a) enables the model to better balance the category gap between view classification and identity classification.
When G and P are set to 5 and 3, respectively, the model can effectively fit nonlinear variations in the feature space while preserving the smoothness and continuity of the spline functions, thereby achieving effective nonlinear feature enhancement (Fig. 2b–d). The 2D t-SNE visualization shows that the enhanced features exhibit higher intra-class compactness and better inter-class separability (Fig. 3). The Top-5 retrieval comparisons provide qualitative evidence that the proposed method improves ranking quality and retrieval robustness under all four retrieval protocols on the CARGO dataset, promoting correct matches to higher positions and returning more relevant samples among the top-ranked results (Fig. 4).

Conclusions This paper presents KANEM for AGPReID. The module is motivated by the large discrepancy between UAV and ground-camera views and by the limited capacity of the linear feature transformation in the classification branch to capture complex nonlinear discriminative relationships. By replacing the conventional fully connected feature transformation between the backbone output features and the linear classification layer, KANEM introduces a more flexible nonlinear supervision mechanism for cross-view representation learning. Through adaptive nonlinear enhancement, it is intended to better model complex feature interactions in high-dimensional spaces and to strengthen the representation of cross-view consistency and fine-grained discriminative cues. Experimental results on the CARGO and AG-ReID datasets demonstrate the effectiveness of the proposed method, particularly in challenging scenarios with large view discrepancies. Future work will further optimize the nonlinear mapping mechanism of KANEM and explore its potential in more complex cross-view settings, with the aim of improving the discriminative capability and generalization performance of the model.
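The structure described in the Methods paragraph (a base activation branch plus a spline branch, stacked into cascaded layers) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `make_grid`, `bspline_basis`, and `KANLayer` are hypothetical, SiLU is assumed as the base activation, G is assumed to be the number of grid intervals and P the spline order (as in common KAN implementations), the grid range [-1, 1] is assumed, and the layer widths are shrunk from the paper's 768→1536→N_id to keep the example light. Weights are random; no training is shown.

```python
import numpy as np

def silu(x):
    # Assumed base activation (the abstract only says "base activation branch").
    return x / (1.0 + np.exp(-x))

def make_grid(G, P, lo=-1.0, hi=1.0):
    # Uniform knot vector with G intervals on [lo, hi], extended by P knots per side.
    h = (hi - lo) / G
    return lo + h * np.arange(-P, G + P + 1)

def bspline_basis(x, grid, P):
    # Cox-de Boor recursion; returns G + P basis values per input element.
    x = np.asarray(x, dtype=float)[..., None]
    b = ((x >= grid[:-1]) & (x < grid[1:])).astype(float)  # order-0 indicators
    for k in range(1, P + 1):
        left = (x - grid[:-(k + 1)]) / (grid[k:-1] - grid[:-(k + 1)]) * b[..., :-1]
        right = (grid[k + 1:] - x) / (grid[k + 1:] - grid[1:-k]) * b[..., 1:]
        b = left + right
    return b

class KANLayer:
    """One function-mapping layer: base branch + spline branch (hypothetical sketch)."""

    def __init__(self, d_in, d_out, G=5, P=3, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.P = P
        self.grid = make_grid(G, P)
        self.w_base = rng.normal(0.0, d_in ** -0.5, size=(d_in, d_out))
        self.w_spline = rng.normal(0.0, 0.1 * d_in ** -0.5, size=(d_in, G + P, d_out))

    def __call__(self, x):
        base = silu(x) @ self.w_base                      # base activation branch
        basis = bspline_basis(x, self.grid, self.P)       # (batch, d_in, G + P)
        spline = np.einsum("bik,iko->bo", basis, self.w_spline)  # spline branch
        return base + spline

# Two cascaded layers; sizes 16 -> 32 -> 10 stand in for 768 -> 1536 -> N_id.
rng = np.random.default_rng(0)
layers = [KANLayer(16, 32, rng=rng), KANLayer(32, 10, rng=rng)]
feats = rng.uniform(-1.0, 1.0, size=(4, 16))  # stand-in for backbone features
out = feats
for layer in layers:
    out = layer(out)
print(out.shape)  # (4, 10)
```

Inputs that fall outside the assumed grid range receive no spline contribution in this sketch and pass through the base branch only, which is one reason practical implementations normalize features or adapt the grid.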
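Table 1 Comparison with other methods on the CARGO dataset (%)

| Method | Source | ALL Rank-1 | ALL mAP | ALL mINP | G↔G Rank-1 | G↔G mAP | G↔G mINP | A↔A Rank-1 | A↔A mAP | A↔A mINP | A↔G Rank-1 | A↔G mAP | A↔G mINP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PCB[11] | TPAMI-21 | 44.23 | 38.15 | 26.14 | 72.32 | 61.92 | 45.72 | 57.50 | 42.34 | 22.50 | 21.25 | 21.02 | 14.22 |
| SBS[12] | ACM MM-23 | 50.32 | 43.09 | 29.76 | 73.21 | 62.99 | 48.24 | 67.50 | 49.73 | 29.32 | 31.25 | 29.00 | 18.71 |
| BoT[13] | CVPR-19 | 54.81 | 46.49 | 32.40 | 77.68 | 66.47 | 51.34 | 65.00 | 49.79 | 29.82 | 36.25 | 32.56 | 21.46 |
| MGN[14] | ACM MM-18 | 54.49 | 46.58 | 33.55 | 82.14 | 69.31 | 53.60 | 65.00 | 48.86 | 27.42 | 32.50 | 30.44 | 21.53 |
| APNet[15] | TIP-21 | 58.97 | 50.24 | 35.76 | 77.68 | 66.83 | 51.85 | 67.50 | 54.57 | 37.35 | 44.37 | 39.35 | 26.76 |
| VV[16] | IJCNN-19 | 45.83 | 38.84 | 39.57 | 72.31 | 62.99 | 48.24 | 67.50 | 49.73 | 29.32 | 31.25 | 29.00 | 18.71 |
| AGW[10] | TPAMI-21 | 60.26 | 53.44 | 40.22 | 81.25 | 71.66 | 58.09 | 67.50 | 56.48 | 40.40 | 43.57 | 40.90 | 29.39 |
| ViT[17] | ICLR-21 | 61.54 | 53.54 | 39.62 | 82.14 | 71.34 | 57.55 | 80.00 | 64.47 | 47.97 | 43.13 | 40.11 | 28.20 |
| IDA[18] | Acta Automatica Sinica-25 | 64.42 | 58.17 | 46.17 | 83.04 | 77.04 | 67.50 | 82.50 | 69.65 | 54.58 | 48.75 | 45.13 | 33.92 |
| DTST[19] | ICME-25 | 64.42 | 55.73 | 41.92 | 78.57 | 72.40 | 62.10 | 80.00 | 63.31 | 44.67 | 50.63 | 43.39 | 29.46 |
| VIF[20] | ICCV-25 | 65.71 | 57.46 | 44.12 | 83.93 | 74.19 | 62.30 | 82.50 | 66.98 | 51.44 | 51.25 | 44.55 | 31.20 |
| VDT[6] | CVPR-24 | 64.10 | 55.20 | 41.13 | 82.14 | 71.59 | 58.39 | 82.50 | 66.83 | 50.22 | 48.12 | 42.76 | 29.95 |
| Ours | − | 70.19 | 63.16 | 51.34 | 83.93 | 76.17 | 66.07 | 80.00 | 72.09 | 59.86 | 58.75 | 53.27 | 41.11 |

Column groups correspond to Protocol 1 (ALL), Protocol 2 (G↔G), Protocol 3 (A↔A), and Protocol 4 (A↔G).

Table 2 Comparison with other methods on the AG-ReID dataset (%)

| Method | Source | A→G Rank-1 | A→G mAP | A→G mINP | G→A Rank-1 | G→A mAP | G→A mINP |
|---|---|---|---|---|---|---|---|
| SBS[12] | ACM MM-23 | 73.54 | 59.77 | − | 73.70 | 62.37 | − |
| BoT[13] | CVPR-19 | 70.01 | 55.47 | − | 71.20 | 58.83 | − |
| VV[16] | IJCNN-19 | 77.22 | 67.23 | 41.43 | 79.73 | 69.83 | 42.37 |
| ViT[17] | ICLR-21 | 81.28 | 72.38 | − | 82.64 | 73.35 | − |
| Explain[4] | ICME-23 | 81.47 | 72.61 | − | 81.85 | 73.35 | − |
| DTST[19] | ICME-25 | 83.48 | 74.51 | 49.86 | 84.72 | 76.05 | 50.04 |
| SeCap[7] | CVPR-25 | 83.91 | 75.14 | 50.31 | 85.78 | 76.96 | 50.52 |
| VDT[6] | CVPR-24 | 82.42 | 74.23 | 49.28 | 84.24 | 76.48 | 49.50 |
| Ours | − | 84.41 | 76.21 | 53.05 | 86.69 | 77.99 | 52.28 |

Column groups correspond to Protocol 1 (A→G) and Protocol 2 (G→A).

Table 3 Results of different KANEM layer counts and dimensions on the CARGO dataset (%)

| Dimension and layer configuration | ALL Rank-1 | ALL mAP | ALL mINP | G↔G Rank-1 | G↔G mAP | G↔G mINP | A↔A Rank-1 | A↔A mAP | A↔A mINP | A↔G Rank-1 | A↔G mAP | A↔G mINP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $768 \rightarrow 768$ | 68.45 | 62.53 | 51.70 | 84.82 | 77.08 | 67.02 | 80.00 | 71.02 | 59.26 | 57.38 | 50.67 | 38.72 |
| $768 \rightarrow N_{\text{id}}$ | 67.95 | 62.53 | 51.26 | 83.04 | 77.31 | 67.01 | 77.50 | 71.12 | 59.57 | 56.25 | 51.99 | 39.72 |
| $768 \rightarrow 768 \rightarrow 768$ | 69.23 | 63.01 | 51.59 | 82.14 | 75.98 | 67.67 | 82.50 | 71.24 | 59.16 | 57.67 | 51.33 | 40.24 |
| $768 \rightarrow 1536 \rightarrow 768$ | 68.91 | 60.93 | 48.26 | 82.14 | 76.07 | 67.12 | 82.50 | 72.93 | 59.79 | 56.68 | 49.85 | 36.57 |
| $768 \rightarrow 768 \rightarrow N_{\text{id}}$ | 67.95 | 61.80 | 49.88 | 81.25 | 75.43 | 65.96 | 77.50 | 70.54 | 59.06 | 56.88 | 52.63 | 40.55 |
| $768 \rightarrow 1536 \rightarrow N_{\text{id}}$ | 70.19 | 63.16 | 51.34 | 83.93 | 76.17 | 66.07 | 80.00 | 72.09 | 59.86 | 58.75 | 53.27 | 41.11 |
| $768 \rightarrow 1536 \rightarrow 1536 \rightarrow N_{\text{id}}$ | 66.99 | 61.51 | 49.79 | 81.25 | 75.64 | 65.85 | 80.00 | 71.36 | 59.46 | 55.09 | 50.63 | 37.68 |

Table 4 Effectiveness analysis of KANEM (%)

| Method | ALL Rank-1 | ALL mAP | ALL mINP | G↔G Rank-1 | G↔G mAP | G↔G mINP | A↔A Rank-1 | A↔A mAP | A↔A mINP | A↔G Rank-1 | A↔G mAP | A↔G mINP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 64.10 | 55.20 | 41.13 | 82.14 | 71.59 | 58.39 | 82.50 | 66.83 | 50.22 | 48.12 | 42.76 | 29.95 |
| Baseline+MLP(ReLU) | 68.56 | 62.34 | 50.35 | 81.25 | 74.32 | 64.37 | 80.00 | 72.12 | 59.67 | 56.88 | 52.40 | 40.44 |
| Baseline+MLP(GELU) | 68.59 | 61.01 | 49.76 | 83.04 | 75.81 | 65.05 | 77.50 | 70.94 | 59.44 | 56.25 | 51.68 | 38.42 |
| Baseline+KANEM | 70.19 | 63.16 | 51.34 | 83.93 | 76.17 | 66.07 | 80.00 | 72.09 | 59.86 | 58.75 | 53.27 | 41.11 |

Table 5 Generality analysis of KANEM across frameworks and person re-identification task types (%)

| Task type | Model | Dataset | Evaluation setting | Baseline Rank-1 | Baseline mAP | +KANEM Rank-1 | +KANEM mAP | Change in Rank-1 | Change in mAP |
|---|---|---|---|---|---|---|---|---|---|
| Occluded person re-identification | PVPM[21] | Occluded-REID | Standard | 66.8 | 59.5 | 67.9 | 60.2 | +1.1 | +0.7 |
| Aerial-ground person re-identification | SeCap[7] | CARGO | Overall | 68.59 | 60.19 | 74.35 | 68.24 | +5.76 | +8.05 |
| Aerial-ground person re-identification | SeCap[7] | CARGO | Aerial to ground | 69.43 | 58.94 | 75.61 | 70.11 | +6.18 | +11.17 |
| Clothes-changing person re-identification | CAL[22] | CCVID | Same clothes | 82.6 | 81.3 | 87.2 | 84.3 | +4.6 | +3.0 |
| Clothes-changing person re-identification | CAL[22] | CCVID | Clothes changing | 81.7 | 79.6 | 86.1 | 83.0 | +4.4 | +3.45 |
| Visible-infrared person re-identification | DEEN[23] | LLCM | Infrared to visible | 54.9 | 62.9 | 56.4 | 63.4 | +1.5 | +0.5 |
| Visible-infrared person re-identification | DEEN[23] | LLCM | Visible to infrared | 62.5 | 65.8 | 68.3 | 65.2 | +5.8 | -0.6 |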
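The Rank-1, mAP, and mINP columns reported in the tables can be illustrated with a small per-query computation. The sketch below assumes each query comes with a 0/1 match list over the gallery, already sorted by descending similarity, and follows the usual definitions (INP as the number of correct matches divided by the rank of the hardest correct match, as in the AGW survey [10]). The function names `query_metrics` and `evaluate` are hypothetical, and camera-based gallery filtering used by real ReID evaluation protocols is omitted.

```python
import numpy as np

def query_metrics(matches):
    """Return (Rank-1 hit, AP, INP) for one query.

    matches: 0/1 sequence over the ranked gallery; 1 marks the same identity.
    """
    matches = np.asarray(matches, dtype=bool)
    if not matches.any():
        raise ValueError("query has no correct match in the gallery")
    hits = np.flatnonzero(matches)                  # 0-based ranks of correct matches
    rank1 = float(matches[0])                       # 1.0 if the top result is correct
    precisions = np.arange(1, len(hits) + 1) / (hits + 1)
    ap = float(precisions.mean())                   # average precision
    inp = float(len(hits) / (hits[-1] + 1))         # inverse negative penalty
    return rank1, ap, inp

def evaluate(all_matches):
    """Average the per-query metrics and report them as percentages."""
    per_query = np.array([query_metrics(m) for m in all_matches])
    rank1, m_ap, m_inp = 100.0 * per_query.mean(axis=0)
    return rank1, m_ap, m_inp

# Toy example with two queries over a five-image gallery.
r1, m_ap, m_inp = evaluate([[1, 0, 1, 0, 0], [0, 1, 0, 0, 0]])
print(f"Rank-1 {r1:.2f}  mAP {m_ap:.2f}  mINP {m_inp:.2f}")
```

Because mINP depends on where the last correct match lands, it penalizes methods that retrieve easy positives early but leave hard positives deep in the ranking, which is why it is reported alongside Rank-1 and mAP.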
[1] ZHANG Hongying, FAN Shiyu, LUO Qian, et al. Visible-infrared person re-identification combining visual-textual matching and graph embedding[J]. Journal of Electronics & Information Technology, 2024, 46(9): 3662–3671. doi: 10.11999/JEIT240318.
[2] ZHUANG Jianjun and WANG Nan. T3FRNet: A cloth-changing person re-identification via texture-aware transformer tuning fine-grained reconstruction method[J]. Journal of Electronics & Information Technology, 2026, 48(1): 370–381. doi: 10.11999/JEIT250476.
[3] ZHOU Yu, ZHAO Xiaofeng, WANG Yi, et al. Multi-scale occluded person re-identification guided by key fine-grained information[J]. Journal of Electronics & Information Technology, 2024, 46(6): 2578–2586. doi: 10.11999/JEIT230686.
[4] NGUYEN K, FOOKES C, SRIDHARAN S, et al. AG-ReID 2023: Aerial-ground person re-identification challenge results[C]. 2023 IEEE International Joint Conference on Biometrics (IJCB), Ljubljana, Slovenia, 2023: 1–10. doi: 10.1109/IJCB57857.2023.10448780.
[5] NGUYEN H, NGUYEN K, SRIDHARAN S, et al. AG-ReID. v2: Bridging aerial and ground views for person re-identification[J]. IEEE Transactions on Information Forensics and Security, 2024, 19: 2896–2908. doi: 10.1109/TIFS.2024.3353078.
[6] ZHANG Quan, WANG Lei, PATEL V M, et al. View-decoupled transformer for person re-identification under aerial-ground camera network[C]. Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 22000–22009. doi: 10.1109/CVPR52733.2024.02077.
[7] WANG Shining, WANG Yunlong, WU Ruiqi, et al. SeCap: Self-calibrating and adaptive prompts for cross-view person re-identification in aerial-ground networks[C]. Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 22119–22128. doi: 10.1109/CVPR52734.2025.02060.
[8] KOLMOGOROV A N. On the representation of continuous functions of several variables by superpositions of continuous functions of fewer variables[J]. American Mathematical Society Translations, 1961, 17(2): 369–373.
[9] LIU Ziming, WANG Yixuan, VAIDYA S, et al. KAN: Kolmogorov-Arnold networks[C]. The Thirteenth International Conference on Learning Representations, Singapore, Singapore, 2025.
[10] YE Mang, SHEN Jianbing, LIN Gaojie, et al. Deep learning for person re-identification: A survey and outlook[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 2872–2893. doi: 10.1109/TPAMI.2021.3054775.
[11] SUN Yifan, ZHENG Liang, LI Yali, et al. Learning part-based convolutional features for person re-identification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(3): 902–917. doi: 10.1109/TPAMI.2019.2938523.
[12] HE Lingxiao, LIAO Xingyu, LIU Wu, et al. FastReID: A Pytorch toolbox for general instance re-identification[C]. Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, Canada, 2023: 9664–9667. doi: 10.1145/3581783.3613460.
[13] LUO Hao, GU Youzhi, LIAO Xingyu, et al. Bag of tricks and a strong baseline for deep person re-identification[C]. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, USA, 2019: 1487–1495. doi: 10.1109/CVPRW.2019.00190.
[14] WANG Guanshuo, YUAN Yufeng, CHEN Xiong, et al. Learning discriminative features with multiple granularities for person re-identification[C]. Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 2018: 274–282. doi: 10.1145/3240508.3240552.
[15] CHEN Guangyi, GU Tianpei, LU Jiwen, et al. Person re-identification via attention pyramid[J]. IEEE Transactions on Image Processing, 2021, 30: 7663–7676. doi: 10.1109/TIP.2021.3107211.
[16] KUMAR R, WEILL E, AGHDASI F, et al. Vehicle re-identification: An efficient baseline using triplet embedding[C]. 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019: 1–9. doi: 10.1109/IJCNN.2019.8852059.
[17] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]. 9th International Conference on Learning Representations, Vienna, Austria, 2021: 611–631.
[18] BEI Junren, ZHANG Quan, and LAI Jianhuang. Implicit decoder alignment for aerial-ground person re-identification[J]. Acta Automatica Sinica, 2025, 51(9): 1988–2000. doi: 10.16383/j.aas.c240705.
[19] WANG Yuhai and PISHGAR M. Dynamic token selective transformer for aerial-ground person re-identification[C]. 2025 IEEE International Conference on Multimedia and Expo, Nantes, France, 2025: 1–6. doi: 10.1109/ICME59968.2025.11210054.
[20] KHALID W, LIU Bin, LI Xulin, et al. Bridging the sky and ground: Towards view-invariant feature learning for aerial-ground person re-identification[C]. Proceedings of the 2025 IEEE/CVF International Conference on Computer Vision, Honolulu, USA, 2025: 9749–9758. doi: 10.1109/ICCV51701.2025.00909.
[21] GAO Shang, WANG Jingya, LU Huchuan, et al. Pose-guided visible part matching for occluded person ReID[C]. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 11741–11749. doi: 10.1109/CVPR42600.2020.01176.
[22] GU Xinqian, CHANG Hong, MA Bingpeng, et al. Clothes-changing person re-identification with RGB modality only[C]. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 1050–1059. doi: 10.1109/CVPR52688.2022.00113.
[23] ZHANG Yukang and WANG Hanzi. Diverse embedding expansion network and low-light cross-modality benchmark for visible-infrared person re-identification[C]. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 2153–2162. doi: 10.1109/CVPR52729.2023.00214.