A Multi-View Feature Extraction and Dual-Edge Contrastive Learning Approach for Image Forgery Detection
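Summary: Image forgery detection has important applications in fields such as news review and forensic identification. Existing methods define the task as pixel-wise classification and therefore suffer from label conflicts, and their mining of forgery clues concentrates on the spatial domain while ignoring features from other views. To address these problems, this paper proposes an improved image forgery detection model based on intra-image inconsistency, together with an image forgery detection algorithm, built on multi-view feature extraction and dual-edge contrastive learning, that implements the model. The proposed model avoids label conflicts, strengthens the mining of forgery clues, and improves generalization, thereby overcoming the limitations of existing methods. Experimental results show that, on the pF1 and pIoU metrics, the proposed method improves on existing mainstream methods by 26.0% and 10.1% on average, respectively.
Abstract: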
Objective With the rapid development and widespread use of image editing tools such as Adobe Photoshop and Meitu, highly realistic forged images have become easy to create and disseminate, which challenges the verification of visual content in fields including journalism, forensic analysis, and social security. Conventional image forgery detection methods predominantly formulate the task as pixel-wise binary classification, which often leads to label ambiguity and conflicts, especially around the edges of tampered regions. In addition, most existing approaches focus on spatial-domain features and neglect the complementary information available from other views, such as the noise and frequency domains, which can be crucial for forgery detection.
Methods To overcome these limitations, this paper proposes an image forgery detection algorithm that combines multi-view feature extraction with a dual-edge contrastive learning framework. The core idea is to redefine the task as intra-image inconsistency detection, which avoids the label conflicts inherent in pixel-classification schemes. To address the semantic ambiguity and blurred boundaries at tampered edges, a dual-edge contrastive learning strategy separately extracts and contrasts features from the inner-edge and outer-edge regions as well as from the non-edge tampered and non-tampered areas. This encourages the model to attend to hard edge samples and thereby improves edge detection accuracy. A dual-branch multi-view feature encoder is developed to capture diverse clues. The spatial-domain branch employs a High-Resolution Network (HRNet) backbone to extract multi-scale spatial features, enhanced by a mixture-of-experts gating mechanism that dynamically weights features across scales and fuses residuals between adjacent scales, thus emphasizing subtle forgery traces. The noise-domain branch extracts multiple noise-related features: camera noise fingerprints, SRM filter responses, constrained Bayar convolution outputs, max-pooling features, average-pooling residuals, and learnable Fourier-domain features with adaptive masking. A mixture-of-experts strategy likewise assigns relevance weights to these heterogeneous features according to the characteristics of each input image. During training, the fused multi-view features are fed to the dual-edge contrastive learning framework, whose contrastive loss sharpens the discrimination between tampered and authentic regions, especially at their edges. At inference, a clustering algorithm such as K-means is applied to the learned feature representations to delineate tampered regions without relying on explicit pixel labels, which makes the detection process more flexible.
Results and Discussions Extensive experiments are conducted on widely used benchmark datasets, including NIST16, Columbia, COVERAGE, DSO, and CASIA-v1, covering forgery types such as splicing, copy-move, object removal, and post-processing. The proposed method outperforms state-of-the-art approaches on average, improving the average permuted F1 (pF1) and permuted IoU (pIoU) scores by 26.0% and 10.1%, respectively, relative to the best existing method (Table 3). Visualization results show better localization of tampered regions, especially along tampered edges, with fewer false positives and clearer edge delineation (Fig. 5). Ablation studies further confirm the effectiveness of each key component, including multi-view feature extraction, the mixture-of-noise-experts fusion mechanism, and the dual-edge contrastive learning strategy (Tables 4, 5, and 6).
Conclusions This paper presents an image forgery detection framework that addresses the limitations of conventional classification-based methods by modeling the task as intra-image inconsistency detection. Dual-edge contrastive learning mitigates semantic ambiguity at tampered edges, while the multi-view feature encoder captures both spatial-domain and noise-domain clues. Experimental results across diverse datasets demonstrate significant improvements in detection accuracy and edge precision. Future work will extend the inconsistency detection paradigm to additional modalities such as text, enabling multimodal forgery detection.
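To make the dual-edge idea in the Methods concrete, the following minimal sketch splits a binary tamper mask into the four regions the abstract names: inner edge, outer edge, non-edge tampered, and non-edge authentic. It is an illustration under stated assumptions, not the authors' implementation: the names `split_regions` and `region_counts`, the square neighborhood, and the reading of edge width as the band thickness in pixels are all hypothetical choices made here.

```python
from collections import Counter
from typing import List

Mask = List[List[int]]  # 1 = tampered pixel, 0 = authentic pixel


def split_regions(mask: Mask, width: int = 3) -> List[List[str]]:
    """Label each pixel as inner_edge, outer_edge, tampered, or authentic.

    Assumption: a pixel belongs to an edge band when a pixel of the
    opposite class lies within `width` pixels (square neighborhood).
    """
    h, w = len(mask), len(mask[0])
    regions = [[""] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Does any pixel in the local window carry the opposite label?
            near_opposite = any(
                mask[rr][cc] != mask[r][c]
                for rr in range(max(0, r - width), min(h, r + width + 1))
                for cc in range(max(0, c - width), min(w, c + width + 1))
            )
            if mask[r][c]:
                regions[r][c] = "inner_edge" if near_opposite else "tampered"
            else:
                regions[r][c] = "outer_edge" if near_opposite else "authentic"
    return regions


def region_counts(regions: List[List[str]]) -> Counter:
    """Count how many pixels fall into each region."""
    return Counter(label for row in regions for label in row)


# Toy 9x9 mask with a 5x5 tampered square in the middle.
toy_mask = [
    [1 if 2 <= r <= 6 and 2 <= c <= 6 else 0 for c in range(9)]
    for r in range(9)
]

if __name__ == "__main__":
    print(region_counts(split_regions(toy_mask, width=1)))
```

In a contrastive setup of this kind, features sampled from the two edge bands would presumably act as the hard examples, while the non-edge regions supply the easier tampered and authentic samples; how the paper weights these groups in its loss is not stated in the abstract.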
Table 1 Training dataset information
Table 2 Test dataset information
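Table 3 Overall performance of the algorithms (%)

| Method | NIST pF1 | NIST pIoU | Columbia pF1 | Columbia pIoU | COVERAGE pF1 | COVERAGE pIoU | DSO pF1 | DSO pIoU | CASIA-v1 pF1 | CASIA-v1 pIoU | Average pF1 | Average pIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MVSS-Net[19] (ICCV 2021) | 35.62 | 26.62 | 77.71 | 68.54 | 50.70 | 39.14 | 40.41 | 27.80 | 58.67 | 48.68 | 52.62 | 42.16 |
| PSCC-Net[6] (TCSVT 2022) | 40.34 | 31.42 | 88.20 | 82.11 | 45.94 | 34.50 | 41.58 | 28.64 | 57.72 | 47.69 | 54.76 | 44.87 |
| CAT-Net[17] (CVPR 2022) | 43.12 | 35.54 | 95.48 | 93.18 | 51.94 | 44.15 | 30.46 | 20.59 | 81.52 | 75.24 | 60.50 | 53.74 |
| TruFor[21] (CVPR 2023) | 44.55 | 38.06 | 97.91 | 93.06 | 54.57 | 47.22 | 41.75 | 32.44 | 83.40 | 78.26 | 64.44 | 57.81 |
| CoDE[14] (TIFS 2024) | 42.03 | 33.90 | 88.12 | 84.41 | 46.44 | 36.21 | 40.74 | 30.00 | 72.33 | 63.74 | 57.93 | 49.65 |
| SparseViT[9] (AAAI 2025) | 43.11 | 35.52 | 97.47 | **95.81** | 58.32 | 51.26 | 39.75 | 29.92 | 83.08 | 77.54 | 64.35 | 58.01 |
| FMAE[22] (AAAI 2025) | 47.05 | 39.21 | 93.54 | 90.26 | 65.42 | 57.15 | 52.43 | 40.39 | 75.37 | 68.13 | 66.76 | 59.03 |
| Mesorch[8] (AAAI 2025) | 47.65 | 40.44 | 97.09 | 95.18 | 63.42 | 56.33 | 42.35 | 32.53 | 84.72 | **79.24** | 67.05 | 60.74 |
| SAFIRE[12] (AAAI 2025) | 48.88 | 40.74 | **97.92** | 94.54 | 64.96 | 55.27 | 56.36 | 47.44 | 33.14 | 26.11 | 60.25 | 52.82 |
| MPC[13] (TIFS 2025) | 47.13 | 39.15 | 96.23 | 94.61 | 63.59 | 54.86 | 50.81 | 39.29 | 75.17 | 69.62 | 66.59 | 59.51 |
| Proposed method | **74.56** | **47.04** | 97.74 | 95.50 | **86.28** | **68.56** | **77.92** | **54.56** | **85.85** | 68.75 | **84.47** | **66.88** |

Note: bold indicates the best value in each column; the second-best values, underlined in the original table, are not marked here.
Table 4 Ablation results for the main modules (%)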
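
| Spatial branch | Noise branch | Dual-edge contrastive learning strategy | NIST | COVERAGE | DSO | Average |
|---|---|---|---|---|---|---|
| - | - | - | 73.01 | 78.56 | 76.05 | 75.87 |
| √ | - | - | 73.14 | 82.68 | 77.05 | 77.62 |
| √ | √ | - | 74.03 | 84.03 | **78.12** | 78.73 |
| √ | √ | √ | **74.56** | **86.28** | 77.92 | **79.59** |

Note: bold indicates the best value in each column.
Table 5 Ablation results for the noise extraction branch (%)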
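
| Noise branch removed | Splicing | Copy-move | Removal | Average |
|---|---|---|---|---|
| Noiseprint++ | 70.06 | 73.59 | 69.55 | 71.07 |
| SRM convolution | 74.33 | 72.56 | 71.02 | 72.64 |
| Bayar convolution | 72.56 | 70.28 | 72.92 | 71.92 |
| Max pooling | 75.15 | 72.88 | 70.34 | 72.79 |
| Average-pooling residual | 74.50 | 73.22 | 71.25 | 72.99 |
| Fourier transform | 72.56 | 71.33 | 70.56 | 71.48 |
| None removed | 75.58 | 73.97 | 73.34 | 74.56 |

Table 6 Ablation results for edge width (%)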
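
| Edge width | NIST | COVERAGE | DSO | Average |
|---|---|---|---|---|
| 1 | 74.33 | 85.56 | **78.02** | 79.30 |
| 3 | **74.56** | **86.28** | 77.92 | **79.59** |
| 5 | 73.15 | 84.88 | 76.34 | 78.12 |
| 7 | 72.56 | 82.34 | 75.56 | 76.82 |

Note: bold indicates the best value in each column.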
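The tables report pF1 and pIoU, the permuted F1 and IoU named in the abstract. The sketch below shows one plausible way to compute such scores, under an assumption that this page does not confirm: because clustering at inference yields arbitrary cluster ids, "permuted" is taken to mean scoring the predicted mask under both label assignments and keeping the better one. The function names `f1_and_iou` and `permuted_f1_and_iou` are hypothetical.

```python
from typing import Sequence, Tuple


def f1_and_iou(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Pixel-level F1 and IoU of the tampered class (label 1)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return f1, iou


def permuted_f1_and_iou(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Assumed 'permuted' scoring: try both cluster-to-label assignments.

    A two-cluster result does not say which cluster is 'tampered', so the
    prediction and its complement are both scored and the maximum is kept.
    """
    f1_a, iou_a = f1_and_iou(pred, truth)
    flipped = [0 if p else 1 for p in pred]
    f1_b, iou_b = f1_and_iou(flipped, truth)
    return max(f1_a, f1_b), max(iou_a, iou_b)


# Flattened toy masks: the clusters are swapped and one pixel is wrong.
toy_truth = [1, 1, 0, 0, 0, 0]
toy_pred = [0, 0, 1, 1, 1, 0]

if __name__ == "__main__":
    print(permuted_f1_and_iou(toy_pred, toy_truth))
```

With the toy masks, the unflipped prediction has no overlap with the ground truth, whereas the flipped prediction recovers both tampered pixels with one false positive, so the permuted scores come from the flipped assignment.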
[1] FARID H and LYU Siwei. Higher-order wavelet statistics and their application to digital forensics[C]. Proceedings of 2003 Conference on Computer Vision and Pattern Recognition Workshop, Madison, USA, 2003: 94. doi: 10.1109/CVPRW.2003.10093.
[2] NIU Yakun, TONDI B, ZHAO Yao, et al. Image splicing detection, localization and attribution via JPEG primary quantization matrix estimation and clustering[J]. IEEE Transactions on Information Forensics and Security, 2021, 16: 5397–5412. doi: 10.1109/TIFS.2021.3129654.
[3] PYATYKH S, HESSER J, and ZHENG Lei. Image noise level estimation by principal component analysis[J]. IEEE Transactions on Image Processing, 2013, 22(2): 687–699. doi: 10.1109/TIP.2012.2221728.
[4] ZORAN D and WEISS Y. Scale invariance and noise in natural images[C]. Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 2009: 2209–2216. doi: 10.1109/ICCV.2009.5459476.
[5] BI Xiuli, WEI Yang, XIAO Bin, et al. Image forgery detection algorithm based on cascaded convolutional neural network[J]. Journal of Electronics & Information Technology, 2019, 41(12): 2987–2994. doi: 10.11999/JEIT190043.
[6] LIU Xiaohong, LIU Yaojie, CHEN Jun, et al. PSCC-Net: Progressive spatio-channel correlation network for image manipulation detection and localization[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7505–7517. doi: 10.1109/TCSVT.2022.3189545.
[7] QU Chenfan, ZHONG Yiwu, LIU Chongyu, et al. Towards modern image manipulation localization: A large-scale dataset and novel methods[C]. Proceedings of 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 10781–10790. doi: 10.1109/CVPR52733.2024.01025.
[8] ZHU Xuekang, MA Xiaochen, SU Lei, et al. Mesoscopic insights: Orchestrating multi-scale & hybrid architecture for image manipulation localization[C]. Proceedings of the 39th AAAI Conference on Artificial Intelligence, Philadelphia, USA, 2025: 11022–11030. doi: 10.1609/aaai.v39i10.33198.
[9] SU Lei, MA Xiaochen, ZHU Xuekang, et al. Can we get rid of handcrafted feature extractors? SparseViT: Nonsemantics-centered, parameter-efficient image manipulation localization through spare-coding transformer[C]. Proceedings of the 39th AAAI Conference on Artificial Intelligence, Philadelphia, USA, 2025: 7024–7032. doi: 10.1609/aaai.v39i7.32754.
[10] LI Shuyuan, YAN Caiping, and LI Hong. A hybrid Transformer network for image splicing forgery detection[J]. Journal of Computer-Aided Design & Computer Graphics, 2024, 36(12): 2010–2019. doi: 10.3724/SP.J.1089.2024.20099.
[11] KONG Chenqi, LUO Anwei, WANG Shiqi, et al. Pixel-inconsistency modeling for image manipulation localization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(6): 4455–4472. doi: 10.1109/TPAMI.2025.3541028.
[12] KWON M J, LEE W, NAM S H, et al. SAFIRE: Segment any forged image region[C]. Proceedings of the 39th AAAI Conference on Artificial Intelligence, Philadelphia, USA, 2025: 4437–4445. doi: 10.1609/aaai.v39i4.32467.
[13] LOU Zijie, CAO Gang, GUO Kun, et al. Exploring multi-view pixel contrast for general and robust image forgery localization[J]. IEEE Transactions on Information Forensics and Security, 2025, 20: 2329–2341. doi: 10.1109/TIFS.2025.3541957.
[14] PENG Rongxuan, TAN Shunquan, MO Xianbo, et al. Employing reinforcement learning to construct a decision-making environment for image forgery localization[J]. IEEE Transactions on Information Forensics and Security, 2024, 19: 4820–4834. doi: 10.1109/TIFS.2024.3381470.
[15] MA Jie, ZHONG Binbin, and JIAO Yanan. Copy-move forgeries detection based on polar sine transform[J]. Journal of Electronics & Information Technology, 2020, 42(5): 1172–1178. doi: 10.11999/JEIT190481.
[16] WANG Qing and ZHANG Rong. Exposing digital image forgeries based on double quantization mapping relation of DCT coefficient[J]. Journal of Electronics & Information Technology, 2014, 36(9): 2068–2074. doi: 10.3724/SP.J.1146.2013.01488.
[17] KWON M J, YU I J, NAM S H, et al. CAT-Net: Compression artifact tracing network for detection and localization of image splicing[C]. Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2021: 375–384. doi: 10.1109/WACV48630.2021.00042.
[18] WU Yue, ABDALMAGEED W, and NATARAJAN P. ManTra-Net: Manipulation tracing network for detection and localization of image forgeries with anomalous features[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 9535–9544. doi: 10.1109/CVPR.2019.00977.
[19] DONG Chengbo, CHEN Xinru, HU Ruohan, et al. MVSS-Net: Multi-view multi-scale supervised networks for image manipulation detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3539–3553. doi: 10.1109/TPAMI.2022.3180556.
[20] COZZOLINO D and VERDOLIVA L. Noiseprint: A CNN-based camera model fingerprint[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 144–159. doi: 10.1109/TIFS.2019.2916364.
[21] GUILLARO F, COZZOLINO D, SUD A, et al. TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization[C]. Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 20606–20615. doi: 10.1109/CVPR52729.2023.01974.
[22] ZHU Jiaying, LI Dong, FU Xueyang, et al. A lottery ticket hypothesis approach with sparse fine-tuning and MAE for image forgery detection and localization[C]. Proceedings of the 39th AAAI Conference on Artificial Intelligence, Philadelphia, USA, 2025: 10968–10976. doi: 10.1609/aaai.v39i10.33192.
[23] GUAN Haiying, KOZAK M, ROBERTSON E, et al. MFC datasets: Large-scale benchmark datasets for media forensic challenge evaluation[C]. Proceedings of 2019 IEEE Winter Applications of Computer Vision Workshops, Waikoloa, USA, 2019: 63–72. doi: 10.1109/WACVW.2019.00018.
[24] HSU Y F and CHANG S F. Detecting image splicing using geometry invariants and camera characteristics consistency[C]. Proceedings of 2006 IEEE International Conference on Multimedia and Expo, Toronto, Canada, 2006: 549–552. doi: 10.1109/ICME.2006.262447.
[25] WEN Bihan, ZHU Ye, SUBRAMANIAN R, et al. COVERAGE - a novel database for copy-move forgery detection[C]. Proceedings of 2016 IEEE International Conference on Image Processing, Phoenix, USA, 2016: 161–165. doi: 10.1109/ICIP.2016.7532339.
[26] DE CARVALHO T J, RIESS C, ANGELOPOULOU E, et al. Exposing digital image forgeries by illumination color classification[J]. IEEE Transactions on Information Forensics and Security, 2013, 8(7): 1182–1194. doi: 10.1109/TIFS.2013.2265677.
[27] DONG Jing, WANG Wei, and TAN Tieniu. CASIA image tampering detection evaluation database[C]. Proceedings of 2013 IEEE China Summit and International Conference on Signal and Information Processing, Beijing, China, 2013: 422–426. doi: 10.1109/ChinaSIP.2013.6625374.
[28] WANG Jingdong, SUN Ke, CHENG Tianheng, et al. Deep high-resolution representation learning for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3349–3364. doi: 10.1109/TPAMI.2020.2983686.
[29] KWON M J, NAM S H, YU I J, et al. Learning JPEG compression artifacts for image manipulation detection and localization[J]. International Journal of Computer Vision, 2022, 130(8): 1875–1895. doi: 10.1007/s11263-022-01617-5.
[30] NOVOZÁMSKÝ A, MAHDIAN B, and SAIC S. IMD2020: A large-scale annotated dataset tailored for detecting manipulated images[C]. Proceedings of 2020 IEEE Winter Applications of Computer Vision Workshops, Snowmass, USA, 2020: 71–80. doi: 10.1109/WACVW50321.2020.9096940.