A Multi-view Feature Extraction and Dual-edge Contrastive Learning Approach for Image Forgery Detection

XU Zhuang; YE Ziyi; PAN Enkang; LIU Chunxiao

doi:10.11999/JEIT251271

cstr: 32379.14.JEIT251271

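XU Zhuang¹,
YE Ziyi¹,
PAN Enkang¹,
LIU Chunxiao^{1, 2}

1. School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou 310018, China
2. Zhejiang Key Laboratory of Big Data and Future E-Commerce Technology, Hangzhou 310018, China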

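Funds: The National Natural Science Foundation of China (61976188), Zhejiang Provincial Natural Science Foundation of China (LY24F020004), The National College Students Innovation and Entrepreneurship Training Program (202510353027), Zhejiang Provincial College Students Innovation and Entrepreneurship Training Program (S202510353076)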

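Details

Author biographies:
XU Zhuang: male, master's student. Research interests: visual security and deep learning.

YE Ziyi: female. Research interests: visual security and deep learning.

PAN Enkang: male. Research interests: visual security and deep learning.

LIU Chunxiao: male, associate professor. Research interests: computer vision and computer graphics, machine learning and intelligent systems, visual security and privacy protection.

Corresponding author:
LIU Chunxiao　cxliu@mail.zjgsu.edu.cn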

CLC number: TN915.08; TP393.08
Publication history
- Received: 2025-12-01
- Revised: 2026-04-29
- Accepted: 2026-05-12
- Published online: 2026-05-27
- Issue published: 2026-06-15

A Multi-view Feature Extraction and Dual-edge Contrastive Learning Approach for Image Forgery Detection

XU Zhuang¹,
YE Ziyi¹,
PAN Enkang¹,
LIU Chunxiao^{1, 2}

1.
School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou 310018, China
2.
Zhejiang Key Laboratory of Big Data and Future E-Commerce Technology, Hangzhou 310018, China

Funds: The National Natural Science Foundation of China (61976188), Zhejiang Provincial Natural Science Foundation of China (LY24F020004), The National College Students Innovation and Entrepreneurship Training Program (202510353027), Zhejiang Provincial College Students Innovation and Entrepreneurship Training Program (S202510353076)

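Abstract

Abstract: Image forgery detection has important applications in fields such as news review and forensic examination. Existing methods define the task as pixel-wise classification, which leads to label conflicts, and they mine forgery clues mainly in the spatial domain while neglecting features from other views. To address these two problems, this paper proposes an improved image forgery detection model based on intra-image inconsistency, together with a detection algorithm built on that model that uses multi-view feature extraction and dual-edge contrastive learning. The proposed model avoids label conflicts, mines forgery clues more thoroughly, and improves generalization. Experimental results show that, in terms of permuted F1 (pF1) and permuted Intersection over Union (pIoU), the proposed method improves on existing mainstream methods by 26.0% and 10.1% on average, respectively.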
- Image forgery detection /
- Contrastive learning /
- Multi-view feature extraction /
- Edge refinement
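
The reported gains of 26.0% and 10.1% are consistent with relative improvements over the best average scores among the compared methods in Table 3 (Mesorch: 67.05 pF1, 60.74 pIoU), rather than with absolute point differences. A quick check of that reading, where `relative_gain` is a hypothetical helper name introduced only for this illustration:

```python
def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# Average scores copied from Table 3 (values in %).
ours_pf1, ours_piou = 84.47, 66.88
best_pf1, best_piou = 67.05, 60.74  # Mesorch: best averages among compared methods

print(f"pF1 gain:  {relative_gain(ours_pf1, best_pf1):.1f}%")    # 26.0%
print(f"pIoU gain: {relative_gain(ours_piou, best_piou):.1f}%")  # 10.1%
```

The corresponding absolute differences are 17.42 and 6.14 percentage points.
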
Abstract:
Objective: With the rapid development and wide use of image editing tools, such as Adobe Photoshop and Meitu, realistic forged images can now be created and disseminated with increasing ease. This trend poses challenges to visual content authentication in journalism, forensic analysis, and social security. Existing image forgery detection methods usually define the task as pixel-wise binary classification. This formulation may cause label conflicts, especially when the same object has different labels in different images. In addition, most methods mainly focus on spatial-domain features and make limited use of complementary information from other views, such as noise-domain clues.

Methods: To address these limitations, this paper proposes an image forgery detection algorithm based on multi-view feature extraction and dual-edge contrastive learning. The detection task is reformulated as intra-image inconsistency detection, which avoids label conflicts caused by conventional pixel-wise classification. To reduce semantic ambiguity near tampered boundaries, a dual-edge contrastive learning strategy is designed. Inner-edge and outer-edge features are extracted and contrasted separately, and non-edge tampered and non-tampered features are also contrasted. This strategy guides the model to focus on difficult edge samples and improves boundary detection accuracy.

A dual-branch multi-view feature encoder is further developed to extract complementary forgery clues. The spatial-domain branch uses a High-Resolution Network (HRNet) backbone to extract multi-scale spatial features. A mixture-of-experts gating mechanism dynamically weights features across scales and fuses residuals between adjacent scales, which helps capture subtle forgery traces. The noise-domain branch extracts multiple noise-related features, including noise fingerprint features, Spatial Rich Model (SRM) filter responses, Bayar convolution features, max-pooling features, average-pooling residuals, and learnable Fourier-domain features with adaptive masking. A mixture-of-experts strategy is also used to dynamically assign weights to these heterogeneous features according to the characteristics of each input image.

During training, the fused multi-view features are optimized using the dual-edge contrastive learning framework, which strengthens discrimination between tampered and non-tampered regions, particularly near their boundaries. During inference, K-means clustering is applied to the learned feature representations to locate tampered regions without explicit pixel labels.

Results and Discussions: Extensive experiments are conducted on widely used benchmark datasets, including NIST, Columbia, COVERAGE, DSO, and CASIA-v1. These datasets cover different forgery types, including splicing, copy-move, object removal, and post-processing. The proposed method achieves the best average performance among the compared state-of-the-art methods. Relative to the best competing averages, it improves the average permuted F1 (pF1) and permuted Intersection over Union (pIoU) by 26.0% and 10.1%, respectively (Table 3). Visualization results show more accurate localization of tampered regions, especially along tampered boundaries, with fewer false positives and clearer edge delineation (Fig. 5). Ablation studies further verify the effectiveness of each key component, including multi-view feature extraction, the mixture-of-experts fusion mechanism for noise features, and the dual-edge contrastive learning strategy (Tables 4–6).

Conclusions: This paper presents an image forgery detection framework that addresses the limitations of conventional classification-based methods by modeling the task as intra-image inconsistency detection. Dual-edge contrastive learning reduces semantic ambiguity at tampered boundaries, and the multi-view feature encoder extracts complementary spatial-domain and noise-domain clues. Experimental results on different datasets show improved detection accuracy and boundary precision. Future work will explore the extension of the inconsistency detection paradigm to additional modalities, such as text, for multimodal forgery detection.
- Image forgery detection /
- Contrastive learning /
- Multi-view feature extraction /
- Edge refinement
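
The inference step described in the abstract, K-means clustering on learned per-pixel features, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the feature map is synthetic rather than produced by the paper's encoder, the function name `two_means_mask` is hypothetical, and marking the smaller cluster as tampered is a convention chosen here, not something the abstract specifies.

```python
import numpy as np


def two_means_mask(features: np.ndarray, n_iter: int = 20) -> np.ndarray:
    """Cluster per-pixel features of shape (H, W, C) into two groups with K-means.

    Returns an (H, W) uint8 mask in which the smaller cluster is marked 1.
    Treating the minority cluster as "tampered" is an assumption of this sketch.
    """
    h, w, c = features.shape
    x = features.reshape(-1, c).astype(np.float64)
    # Deterministic init: the point farthest from the mean, then the point farthest from it.
    c0 = x[np.argmax(((x - x.mean(axis=0)) ** 2).sum(axis=1))]
    c1 = x[np.argmax(((x - c0) ** 2).sum(axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        # Assignment step: squared distance of every pixel to each centre, shape (N, 2).
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: recompute each centre; keep the old one if a cluster is empty.
        new_centers = np.stack([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    minority = int(np.sum(labels == 1) < np.sum(labels == 0))
    return (labels == minority).astype(np.uint8).reshape(h, w)


# Synthetic example: an 8x8 feature map with a 3x3 patch whose features are shifted.
rng = np.random.default_rng(0)
feats = rng.normal(0.0, 0.1, size=(8, 8, 4))
feats[2:5, 3:6] += 3.0
mask = two_means_mask(feats)
print(mask)
```

Because cluster indices carry no meaning by themselves, any rule that names one cluster "tampered" is a design choice; this is also why label-permutation-aware metrics such as pF1 and pIoU are a natural fit for clustering-based localization.
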

Fig. 1 Example of label conflicts under the pixel-wise classification problem formulation

Fig. 2 Examples of the two problem formulations

Fig. 3 Overall framework of the algorithm

Fig. 4 Example of computing dual-edge masks
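
A common way to obtain inner-edge and outer-edge masks from a binary tamper mask is morphological erosion and dilation. The sketch below assumes that construction, a 3x3 square structuring element, and a band width counted in pixels; the paper's exact procedure is not given in this section, and the function names are hypothetical.

```python
import numpy as np


def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a 3x3 square structuring element."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # OR together the nine shifted copies of the mask.
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out


def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary erosion, computed as the complement of dilating the complement.

    Pixels outside the image are treated as foreground, so a region that
    touches the image border is not eroded from that side.
    """
    return ~dilate(~mask.astype(bool), iterations)


def dual_edge_masks(mask: np.ndarray, width: int = 3):
    """Split a binary tamper mask into inner-edge and outer-edge bands."""
    m = mask.astype(bool)
    inner_edge = m & ~erode(m, width)    # tampered pixels close to the boundary
    outer_edge = dilate(m, width) & ~m   # authentic pixels close to the boundary
    return inner_edge, outer_edge


# Toy example: a 3x3 tampered square in a 9x9 image, band width 1.
m = np.zeros((9, 9), dtype=np.uint8)
m[3:6, 3:6] = 1
inner, outer = dual_edge_masks(m, width=1)
print(int(inner.sum()), int(outer.sum()))  # 8 16
```

The default `width=3` mirrors the edge width with the best average in Table 6; whether that setting maps one-to-one onto the number of erosion or dilation steps is an assumption.
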

Fig. 5 Visualization results

Table 1 Training datasets

Dataset	Images	Splicing	Copy-move	Post-processing	Object removal
CASIA-v2^[27]	5105	√	√	√	√
SP-COCO^[29]	200k	√	-	√	-
CM-COCO^[29]	200k	-	√	√	-
CM-RAISE^[29]	200k	-	√	√	-
CM-C-RAISE^[29]	200k	-	√	√	-
IMD2020^[30]	2010	√	√	√	√

Table 2 Test datasets

Dataset	Images	Splicing	Copy-move	Post-processing	Object removal
NIST^[23]	564	√	√	√	√
Columbia^[24]	180	√	-	-	-
COVERAGE^[25]	100	-	√	√	-
DSO^[26]	95	√	-	√	-
CASIA-v1^[27]	920	√	√	√	√

Table 3 Overall performance of the algorithms (%)

Method	NIST pF1	NIST pIoU	Columbia pF1	Columbia pIoU	COVERAGE pF1	COVERAGE pIoU	DSO pF1	DSO pIoU	CASIA-v1 pF1	CASIA-v1 pIoU	Average pF1	Average pIoU
MVSS-Net^[19](TPAMI 2023)	35.62	26.62	77.71	68.54	50.70	39.14	40.41	27.80	58.67	48.68	52.62	42.16
PSCC-Net^[6](TCSVT 2022)	40.34	31.42	88.20	82.11	45.94	34.50	41.58	28.64	57.72	47.69	54.76	44.87
CAT-Net^[17](WACV 2021)	43.12	35.54	95.48	93.18	51.94	44.15	30.46	20.59	81.52	75.24	60.50	53.74
TruFor^[21](CVPR 2023)	44.55	38.06	97.91	93.06	54.57	47.22	41.75	32.44	83.40	78.26	64.44	57.81
CoDE^[14](TIFS 2024)	42.03	33.90	88.12	84.41	46.44	36.21	40.74	30.00	72.33	63.74	57.93	49.65
SparseViT^[9](AAAI 2025)	43.11	35.52	97.47	95.81	58.32	51.26	39.75	29.92	83.08	77.54	64.35	58.01
FMAE^[22](AAAI 2025)	47.05	39.21	93.54	90.26	65.42	57.15	52.43	40.39	75.37	68.13	66.76	59.03
Mesorch^[8](AAAI 2025)	47.65	40.44	97.09	95.18	63.42	56.33	42.35	32.53	84.72	79.24	67.05	60.74
SAFIRE^[12](AAAI 2025)	48.88	40.74	97.92	94.54	64.96	55.27	56.36	47.44	33.14	26.11	60.25	52.82
MPC^[13](TIFS 2025)	47.13	39.15	96.23	94.61	63.59	54.86	50.81	39.29	75.17	69.62	66.59	59.51
Proposed method	74.56	47.04	97.74	95.50	86.28	68.56	77.92	54.56	85.85	68.75	84.47	66.88
Note: Bold values in the table indicate the best results.
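
The metrics in Table 3 are permuted F1 (pF1) and permuted IoU (pIoU). The section does not define them; the sketch below assumes the usual reading for clustering-based localization, in which both the predicted mask and its complement are scored and the better value is kept, since cluster labels are arbitrary. The function names are hypothetical.

```python
import numpy as np


def f1_and_iou(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Pixel-level F1 and IoU of a binary prediction against a binary ground truth."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) > 0 else 0.0
    return float(f1), float(iou)


def permuted_scores(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Score the prediction and its complement; keep the better value per metric.

    ASSUMPTION: this is one common definition of "permuted" metrics, not a
    definition taken from the paper.
    """
    f1_a, iou_a = f1_and_iou(pred, gt)
    f1_b, iou_b = f1_and_iou(~pred.astype(bool), gt)
    return max(f1_a, f1_b), max(iou_a, iou_b)


# A prediction with swapped cluster labels: plain scores are 0, permuted scores are 1.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[:2, :2] = 1
pred = 1 - gt
print(f1_and_iou(pred, gt))       # (0.0, 0.0)
print(permuted_scores(pred, gt))  # (1.0, 1.0)
```
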

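Table 4 Ablation results for the main modules (%)

Spatial branch	Noise branch	Dual-edge contrastive learning	NIST	COVERAGE	DSO	Average
-	-	-	73.01	78.56	76.05	75.87
√	-	-	73.14	82.68	77.05	77.62
√	√	-	74.03	84.03	78.12	78.73
√	√	√	74.56	86.28	77.92	79.59
Note: Bold values in the table indicate the best results.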

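Table 5 Ablation results for the noise extraction branch (%)

Removed noise feature	Splicing	Copy-move	Removal	Average
Noiseprint++	70.06	73.59	69.55	71.07
SRM convolution	74.33	72.56	71.02	72.64
Bayar convolution	72.56	70.28	72.92	71.92
Max pooling	75.15	72.88	70.34	72.79
Average-pooling residual	74.50	73.22	71.25	72.99
Fourier transform	72.56	71.33	70.56	71.48
None removed	75.58	73.97	73.34	74.56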
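
The abstract states that a mixture-of-experts strategy assigns input-dependent weights to the six noise features ablated in Table 5. The gate's architecture is not described in this section, so the following is only a generic sketch: it assumes a single linear gate on globally average-pooled features followed by a softmax, with random (untrained) parameters and hypothetical names.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def moe_fuse(expert_feats: np.ndarray, gate_w: np.ndarray, gate_b: np.ndarray):
    """Fuse K expert feature maps with input-dependent softmax weights.

    expert_feats: (K, C, H, W) outputs of K noise extractors for one image.
    gate_w, gate_b: parameters of a linear gate mapping the pooled descriptor
                    of length K*C to K logits (random here, learned in practice).
    """
    pooled = expert_feats.mean(axis=(2, 3)).reshape(-1)   # global average pooling
    weights = softmax(gate_w @ pooled + gate_b)           # shape (K,), sums to 1
    fused = np.tensordot(weights, expert_feats, axes=1)   # weighted sum -> (C, H, W)
    return fused, weights


rng = np.random.default_rng(0)
K, C, H, W = 6, 8, 16, 16                     # six noise features, as in Table 5
feats = rng.normal(size=(K, C, H, W))
gate_w = rng.normal(scale=0.1, size=(K, K * C))
gate_b = np.zeros(K)
fused, weights = moe_fuse(feats, gate_w, gate_b)
print(fused.shape, weights.round(3))
```

With an all-zero gate the weights are uniform and the fusion reduces to a plain average of the expert outputs, which gives a simple baseline to compare a learned gate against.
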

Table 6 Ablation results for the edge width (%)

Edge width	NIST	COVERAGE	DSO	Average
1	74.33	85.56	78.02	79.30
3	74.56	86.28	77.92	79.59
5	73.15	84.88	76.34	78.12
7	72.56	82.34	75.56	76.82
Note: Bold values in the table indicate the best results.
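
Table 6 varies the width of the edge bands whose features are contrasted. The loss used for that contrast is not given in this section; one widely used option is a supervised-contrastive (InfoNCE-style) loss that pulls features of the same group together and pushes the two groups apart. The sketch below implements that generic loss as an assumption, not the paper's formulation; the function name and the temperature value are hypothetical.

```python
import numpy as np


def two_group_contrastive_loss(feat_a, feat_b, temperature: float = 0.1) -> float:
    """Supervised-contrastive loss for two groups of feature vectors.

    feat_a, feat_b: arrays of shape (Na, D) and (Nb, D), e.g. inner-edge and
    outer-edge pixel features. Each group needs at least two vectors so that
    every anchor has a positive.
    """
    x = np.concatenate([feat_a, feat_b]).astype(np.float64)
    x /= np.linalg.norm(x, axis=1, keepdims=True)          # work with cosine similarity
    labels = np.array([0] * len(feat_a) + [1] * len(feat_b))
    sim = x @ x.T / temperature
    not_self = ~np.eye(len(x), dtype=bool)
    # Log-softmax over all other samples, with a max shift for stability.
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log((np.exp(sim) * not_self).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    per_anchor = -(log_prob * positives).sum(axis=1) / positives.sum(axis=1)
    return float(per_anchor.mean())


# Well-separated groups give a low loss; mixing the groups gives a high one.
a = np.array([[1.0, 0.0], [1.0, 0.1]])
b = np.array([[-1.0, 0.0], [-1.0, 0.1]])
loss_separated = two_group_contrastive_loss(a, b)
loss_mixed = two_group_contrastive_loss(np.stack([a[0], b[0]]), np.stack([a[1], b[1]]))
print(loss_separated < loss_mixed)  # True
```
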

References (30)

[1]	FARID H and LYU Siwei. Higher-order wavelet statistics and their application to digital forensics[C]. 2003 Conference on Computer Vision and Pattern Recognition Workshop, Madison, USA, 2003: 94. doi: 10.1109/CVPRW.2003.10093.
[2]	NIU Yakun, TONDI B, ZHAO Yao, et al. Image splicing detection, localization and attribution via JPEG primary quantization matrix estimation and clustering[J]. IEEE Transactions on Information Forensics and Security, 2021, 16: 5397–5412. doi: 10.1109/TIFS.2021.3129654.
[3]	PYATYKH S, HESSER J, and ZHENG Lei. Image noise level estimation by principal component analysis[J]. IEEE Transactions on Image Processing, 2013, 22(2): 687–699. doi: 10.1109/TIP.2012.2221728.
[4]	ZORAN D and WEISS Y. Scale invariance and noise in natural images[C]. The 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 2009: 2209–2216. doi: 10.1109/ICCV.2009.5459476.
[5]	HOU Zhiqiang, DONG Jiale, MA Sugang, et al. Video object segmentation algorithm based on multi-scale feature enhancement and global-local feature aggregation[J]. Journal of Electronics & Information Technology, 2024, 46(11): 4198–4207. doi: 10.11999/JEIT231394.
[6]	LIU Xiaohong, LIU Yaojie, CHEN Jun, et al. PSCC-Net: Progressive spatio-channel correlation network for image manipulation detection and localization[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7505–7517. doi: 10.1109/TCSVT.2022.3189545.
[7]	QU Chenfan, ZHONG Yiwu, LIU Chongyu, et al. Towards modern image manipulation localization: A large-scale dataset and novel methods[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 10781–10790. doi: 10.1109/CVPR52733.2024.01025.
[8]	ZHU Xuekang, MA Xiaochen, SU Lei, et al. Mesoscopic insights: Orchestrating multi-scale & hybrid architecture for image manipulation localization[C]. The 39th AAAI Conference on Artificial Intelligence, Philadelphia, USA, 2025: 11022–11030. doi: 10.1609/aaai.v39i10.33198.
[9]	SU Lei, MA Xiaochen, ZHU Xuekang, et al. Can we get rid of handcrafted feature extractors? SparseViT: Nonsemantics-centered, parameter-efficient image manipulation localization through spare-coding transformer[C]. The 39th AAAI Conference on Artificial Intelligence, Philadelphia, USA, 2025: 7024–7032. doi: 10.1609/aaai.v39i7.32754.
[10]	CHEN Lei, YANG Jibin, CAO Tieyong, et al. A self-distillation object segmentation method based on Transformer feature pyramid[J]. Journal of Electronics & Information Technology, 2025, 47(2): 551–560. doi: 10.11999/JEIT240735.
[11]	KONG Chenqi, LUO Anwei, WANG Shiqi, et al. Pixel-inconsistency modeling for image manipulation localization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(6): 4455–4472. doi: 10.1109/TPAMI.2025.3541028.
[12]	KWON M J, LEE W, NAM S H, et al. SAFIRE: Segment any forged image region[C]. The 39th AAAI Conference on Artificial Intelligence, Philadelphia, USA, 2025: 4437–4445. doi: 10.1609/aaai.v39i4.32467.
[13]	LOU Zijie, CAO Gang, GUO Kun, et al. Exploring multi-view pixel contrast for general and robust image forgery localization[J]. IEEE Transactions on Information Forensics and Security, 2025, 20: 2329–2341. doi: 10.1109/TIFS.2025.3541957.
[14]	PENG Rongxuan, TAN Shunquan, MO Xianbo, et al. Employing reinforcement learning to construct a decision-making environment for image forgery localization[J]. IEEE Transactions on Information Forensics and Security, 2024, 19: 4820–4834. doi: 10.1109/TIFS.2024.3381470.
[15]	ZHANG Boya and WANG Yong. A frequency-aware and spatially constrained network for ship instance segmentation in SAR images[J]. Journal of Electronics & Information Technology, 2025, 47(12): 4813–4823. doi: 10.11999/JEIT250938.
[16]	WANG Kaizheng, ZENG Yao, ZHANG Zhanxi, et al. FCSNet: A frequency-domain aware cross-feature fusion network for smoke segmentation[J]. Journal of Electronics & Information Technology, 2025, 47(7): 2320–2333. doi: 10.11999/JEIT241021.
[17]	KWON M J, YU I J, NAM S H, et al. CAT-Net: Compression artifact tracing network for detection and localization of image splicing[C]. 2021 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2021: 375–384. doi: 10.1109/WACV48630.2021.00042.
[18]	WU Yue, ABDALMAGEED W, and NATARAJAN P. ManTra-Net: Manipulation tracing network for detection and localization of image forgeries with anomalous features[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 9535–9544. doi: 10.1109/CVPR.2019.00977.
[19]	DONG Chengbo, CHEN Xinru, HU Ruohan, et al. MVSS-Net: Multi-view multi-scale supervised networks for image manipulation detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3539–3553. doi: 10.1109/TPAMI.2022.3180556.
[20]	COZZOLINO D and VERDOLIVA L. Noiseprint: A CNN-based camera model fingerprint[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 144–159. doi: 10.1109/TIFS.2019.2916364.
[21]	GUILLARO F, COZZOLINO D, SUD A, et al. TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 20606–20615. doi: 10.1109/CVPR52729.2023.01974.
[22]	ZHU Jiaying, LI Dong, FU Xueyang, et al. A lottery ticket hypothesis approach with sparse fine-tuning and MAE for image forgery detection and localization[C]. The 39th AAAI Conference on Artificial Intelligence, Philadelphia, USA, 2025: 10968–10976. doi: 10.1609/aaai.v39i10.33192.
[23]	GUAN Haiying, KOZAK M, ROBERTSON E, et al. MFC datasets: Large-scale benchmark datasets for media forensic challenge evaluation[C]. 2019 IEEE Winter Applications of Computer Vision Workshops, Waikoloa, USA, 2019: 63–72. doi: 10.1109/WACVW.2019.00018.
[24]	HSU Y F and CHANG S F. Detecting image splicing using geometry invariants and camera characteristics consistency[C]. 2006 IEEE International Conference on Multimedia and Expo, Toronto, Canada, 2006: 549–552. doi: 10.1109/ICME.2006.262447.
[25]	WEN Bihan, ZHU Ye, SUBRAMANIAN R, et al. COVERAGE — a novel database for copy-move forgery detection[C]. 2016 IEEE International Conference on Image Processing, Phoenix, USA, 2016: 161–165. doi: 10.1109/ICIP.2016.7532339.
[26]	DE CARVALHO T J, RIESS C, ANGELOPOULOU E, et al. Exposing digital image forgeries by illumination color classification[J]. IEEE Transactions on Information Forensics and Security, 2013, 8(7): 1182–1194. doi: 10.1109/TIFS.2013.2265677.
[27]	DONG Jing, WANG Wei, and TAN Tieniu. CASIA image tampering detection evaluation database[C]. 2013 IEEE China Summit and International Conference on Signal and Information Processing, Beijing, China, 2013: 422–426. doi: 10.1109/ChinaSIP.2013.6625374.
[28]	WANG Jingdong, SUN Ke, CHENG Tianheng, et al. Deep high-resolution representation learning for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3349–3364. doi: 10.1109/TPAMI.2020.2983686.
[29]	KWON M J, NAM S H, YU I J, et al. Learning JPEG compression artifacts for image manipulation detection and localization[J]. International Journal of Computer Vision, 2022, 130(8): 1875–1895. doi: 10.1007/s11263-022-01617-5.
[30]	NOVOZÁMSKÝ A, MAHDIAN B, and SAIC S. IMD2020: A large-scale annotated dataset tailored for detecting manipulated images[C]. 2020 IEEE Winter Applications of Computer Vision Workshops, Snowmass, USA, 2020: 71–80. doi: 10.1109/WACVW50321.2020.9096940.