Advances in Deep Neural Network Based Image Compression: A Survey
-
Abstract: Deep neural network (DNN)-based image compression methods, with their powerful modeling capacity and end-to-end optimization mechanisms, have demonstrated advantages over traditional coding methods across multiple dimensions, including signal fidelity, human visual perception, and machine analysis. This paper systematically reviews recent research progress in the field from three core directions. For signal-fidelity-oriented compression, the classical rate–distortion optimization model is introduced, and the key components of lossy compression are examined in depth, including nonlinear transforms, quantization strategies, entropy coding mechanisms, and variable-rate compression techniques that support multi-rate output. For compression oriented toward human visual perception, the joint rate–distortion–perception optimization framework is analyzed, and perception-driven methods based on generative adversarial networks and diffusion models are compared. For machine-analysis-oriented compression, the joint rate–distortion–accuracy modeling paradigm is described, and semantic-fidelity optimization objectives and architectural designs are systematically summarized. Finally, the paper summarizes existing research and discusses the remaining technical challenges and future directions.

Abstract:
Significance: With the continuous advancement of information technology, digital images are evolving toward ultra-high-definition formats characterized by increased resolution, dynamic range, color depth, sampling rates, and multi-viewpoint support. In parallel, the rapid development of artificial intelligence is reshaping both the generation and application paradigms of digital imagery. As visual big data converges with AI technologies, the volume and diversity of image data expand exponentially, creating unprecedented challenges for storage and transmission. As a core technology in digital image processing, image compression reduces storage costs and bandwidth requirements by eliminating internal information redundancy, thereby serving as a fundamental enabler for visual big data applications. However, traditional image compression standards increasingly struggle to meet rising industrial demands due to limited modeling capacity, inadequate perceptual adaptability, and poor compatibility with machine vision tasks. Deep Neural Network (DNN)-based image compression methods, leveraging powerful modeling capabilities, end-to-end optimization mechanisms, and compatibility with both human perception and machine understanding, are progressively surpassing conventional coding approaches. These methods demonstrate clear advantages and broad potential across diverse application domains, drawing growing attention from both academia and industry.

Progress: This paper systematically reviews recent advances in DNN-based image compression from three core perspectives: signal fidelity, human visual perception, and machine analysis. First, in signal fidelity–oriented compression, the rate–distortion optimization framework is introduced, with detailed discussion of key components in lossy image compression, including nonlinear transforms, quantization strategies, entropy coding mechanisms, and variable-rate techniques for multi-rate adaptation.
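To make the rate term of rate–distortion optimization concrete, the following minimal, self-contained sketch estimates the bits needed to code a quantized latent under a discretized Gaussian entropy model, in the spirit of discretized-likelihood entropy models such as [13]. All function names are illustrative, not from any cited implementation:

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def estimated_bits(latents, mu=0.0, sigma=1.0):
    """Rate estimate R = sum of -log2 P(y_hat), where y_hat = round(y) and
    P(y_hat) = CDF(y_hat + 0.5) - CDF(y_hat - 0.5) (discretized Gaussian)."""
    total = 0.0
    for y in latents:
        y_hat = round(y)  # hard quantization by rounding to integers
        p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
        total += -math.log2(max(p, 1e-12))  # clamp to avoid log2(0)
    return total

# Latents far out in the tails of the model are expensive to code:
latents = [0.2, -1.4, 0.9, 3.1]
bits = estimated_bits(latents)
```

An actual codec would realize this rate with arithmetic coding [9] or asymmetric numeral systems [10]; the estimate above is the cross-entropy upper bound that end-to-end training minimizes.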
The synergistic design of these modules underpins the architecture of modern DNN-based image compression systems. Second, in perceptual quality–driven compression, the principles of joint rate–distortion–perception optimization models are examined, together with a comparative analysis of two major perceptual paradigms: Generative Adversarial Network (GAN)-based models and diffusion model–based approaches. Both strategies employ perceptual loss functions or generative modeling techniques to markedly improve the visual quality of reconstructed images, aligning them more closely with the characteristics of the human visual system. Finally, in machine analysis–oriented compression, a co-optimization framework for rate–distortion–accuracy trade-offs is presented, with semantic fidelity as the primary objective. From the perspective of integrating image compression with downstream machine analysis architectures, this section analyzes how current methods preserve essential semantic information that supports tasks such as object detection and semantic segmentation during the compression process.

Conclusions: DNN-based image compression shows strong potential across signal fidelity, human visual perception, and machine analysis. Through end-to-end jointly optimized neural network architectures, these methods provide comprehensive modeling of the encoding process and outperform traditional approaches in compression efficiency. By leveraging the probabilistic modeling and image generation capabilities of DNNs, they can accurately estimate distributional differences between reconstructed and original images, quantify perceptual losses, and generate high-quality reconstructions that align with human visual perception. Furthermore, their compatibility with mainstream image analysis frameworks enables the extraction of semantic features and the design of collaborative optimization strategies, allowing efficient compression tailored to machine vision tasks.
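The three optimization regimes surveyed above can be summarized compactly. The notation below is a standard textbook-style summary (following the formulations popularized by [2] and [7]), not equations taken verbatim from any single cited work:

```latex
% Rate--distortion (signal fidelity):
\min_{f,g}\; R(\hat{y}) + \lambda\, D(x, \hat{x})

% Rate--distortion--perception (human vision), with d(\cdot,\cdot) a divergence
% between the source and reconstruction distributions:
\min_{f,g}\; R(\hat{y}) + \lambda\, D(x, \hat{x}) + \beta\, d\!\left(p_{X}, p_{\hat{X}}\right)

% Rate--distortion--accuracy (machine analysis), with L_{\mathrm{task}} the
% downstream task loss (e.g., detection or segmentation):
\min_{f,g}\; R(\hat{y}) + \lambda\, D(x, \hat{x}) + \gamma\, L_{\mathrm{task}}
```

Here $f$ and $g$ denote the learned analysis and synthesis transforms, $\hat{y}$ the quantized latent, $R(\hat{y}) = \mathbb{E}[-\log_2 p(\hat{y})]$ the estimated bit rate, and $\lambda$, $\beta$, $\gamma$ the trade-off weights.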
Prospects: Despite significant progress in compression performance, perceptual quality, and task adaptability, DNN-based image compression still faces critical technical challenges and practical limitations. First, computational complexity remains high. Most high-performance models rely on deep and sophisticated architectures (e.g., attention mechanisms and Transformer models), which enhance modeling capability but also introduce substantial computational overhead and long inference latency. These limitations are particularly problematic for deployment on mobile and embedded devices. Second, robustness and generalization continue to be major concerns. DNN-based compression models are sensitive to input perturbations and vulnerable to adversarial attacks, which can lead to severe reconstruction distortions or even complete failure. Moreover, while they perform well on training data and similar distributions, their performance often degrades markedly under cross-domain scenarios. Third, the evaluation framework for perceptual- and machine vision–oriented compression remains immature. Although new evaluation dimensions have been introduced, no unified and objective benchmark exists. This gap is especially evident in machine analysis–oriented compression, where downstream tasks vary widely and rely on different visual models. Therefore, comparability across methods is limited and consistent evaluation metrics are lacking, constraining both research and practical adoption. Overall, DNN-based image compression is in transition from laboratory research to real-world deployment. Although it demonstrates clear advantages over traditional approaches, further advances are needed in efficiency, robustness, generalization, and standardized evaluation protocols.
Future research should strengthen the synergy between theoretical exploration and engineering implementation to accelerate widespread adoption and continued progress in areas such as multimedia communication, edge computing, and intelligent image sensing systems. -
Key words:
- Image compression
- Deep learning
- Signal fidelity
- Human visual perception
- Machine analysis
-
Table 1 Summary of image compression techniques oriented toward human visual perception

| Technique category | Advantages | Disadvantages |
|---|---|---|
| GAN | Adversarial training learns the real data distribution; reconstructed images are perceptually realistic; decoding is fast | Does not explicitly model the probability distribution; training is unstable and prone to problems such as mode collapse |
| Diffusion model | The denoising process explicitly models the real data distribution; training is more stable than for GANs; reconstructed images have high perceptual quality | Requires many iterative denoising steps; decoding is slow and computational cost is high |

Table 2 Summary of image compression architectures oriented toward machine analysis

| Architecture | Advantages | Disadvantages |
|---|---|---|
| Architecture 1 | Compatible with existing image compression and machine analysis methods | Machine analysis first requires image reconstruction, incurring relatively high computational overhead |
| Architecture 2 | Compressed-domain features support both image reconstruction and machine analysis, avoiding the reconstruct-then-analyze pipeline | The objectives of image reconstruction and machine analysis differ, so compressed-domain features struggle to serve both |
| Architecture 3 | Machine analysis has an independent feature extraction module, giving high analysis performance without depending on reconstructed images; the decoder can remove the information redundancy between $ \hat{\boldsymbol{y}} $ and $ \hat{\boldsymbol{z}} $ | Image reconstruction requires both image coding and feature extraction, with $ \hat{\boldsymbol{y}} $ and $ \hat{\boldsymbol{z}} $ fused at the decoder, leading to high computational complexity |
| Architecture 4 | Compressed-domain features support both image reconstruction and machine analysis; machine analysis requires transmitting only a subset of the features; no independent feature extraction module is needed | Requires a complex mechanism for partitioning and selecting compressed-domain features, making training difficult |
-
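As a concrete illustration of the Architecture 4 idea in Table 2 — compressed-domain features partitioned so that machine analysis transmits only a subset while reconstruction uses the full set — the following toy sketch shows the payload difference. All names are hypothetical and the latent is a plain list; a real codec would operate on entropy-coded feature tensors:

```python
def split_latent(latent, n_semantic):
    """Partition a latent vector into a semantic subset (enough for analysis)
    and a residual part (additionally needed for full reconstruction)."""
    return latent[:n_semantic], latent[n_semantic:]

def analysis_payload(latent, n_semantic):
    """Machine analysis transmits only the semantic subset of the features."""
    semantic, _ = split_latent(latent, n_semantic)
    return semantic

def reconstruction_payload(latent, n_semantic):
    """Image reconstruction reuses the semantic subset plus the residual,
    so the semantic features are never transmitted twice."""
    semantic, residual = split_latent(latent, n_semantic)
    return semantic + residual

# A 6-dimensional toy latent; the first 2 entries play the semantic role.
latent = [0.7, -1.2, 0.3, 2.4, -0.5, 1.1]
```

The design trade-off noted in the table shows up directly: choosing which entries belong in the semantic subset (here a fixed prefix) is exactly the partitioning-and-selection mechanism that makes such models hard to train.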
[1] BLAU Y and MICHAELI T. The perception-distortion tradeoff[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 6228–6237. doi: 10.1109/CVPR.2018.00652.
[2] BLAU Y and MICHAELI T. Rethinking lossy compression: The rate-distortion-perception tradeoff[C]. The 36th International Conference on Machine Learning, Long Beach, USA, 2019: 675–685.
[3] YANG Wenhan, HUANG Haofeng, HU Yueyu, et al. Video coding for machines: Compact visual representation compression for intelligent collaborative analytics[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(7): 5174–5191. doi: 10.1109/TPAMI.2024.3367293.
[4] GAO Wen, TIAN Yonghong, and WANG Jian. Digital retina: Revolutionizing camera systems for the smart city[J]. SCIENTIA SINICA Informationis, 2018, 48(8): 1076–1082. doi: 10.1360/N112018-00025.
[5] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]. The 28th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2672–2680.
[6] HO J, JAIN A, and ABBEEL P. Denoising diffusion probabilistic models[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 574.
[7] BALLÉ J, LAPARRA V, and SIMONCELLI E P. End-to-end optimized image compression[C]. The 5th International Conference on Learning Representations, Toulon, France, 2017.
[8] THEIS L, SHI Wenzhe, CUNNINGHAM A, et al. Lossy image compression with compressive autoencoders[C]. The 5th International Conference on Learning Representations, Toulon, France, 2017.
[9] WITTEN I H, NEAL R M, and CLEARY J G. Arithmetic coding for data compression[J]. Communications of the ACM, 1987, 30(6): 520–540. doi: 10.1145/214762.214771.
[10] DUDA J. Asymmetric numeral systems[J]. arXiv: 0902.0271, 2009. doi: 10.48550/arXiv.0902.0271.
[11] SHANNON C E. A mathematical theory of communication[J]. Bell System Technical Journal, 1948, 27(3): 379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x.
[12] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
[13] CHENG Zhengxue, SUN Heming, TAKEUCHI M, et al. Learned image compression with discretized Gaussian mixture likelihoods and attention modules[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 7936–7945. doi: 10.1109/CVPR42600.2020.00796.
[14] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]. The 9th International Conference on Learning Representations, 2021.
[15] ZHU Yinhao, YANG Yang, and COHEN T. Transformer-based transform coding[C]. The 10th International Conference on Learning Representations, 2022.
[16] ZOU Renjie, SONG Chunfeng, and ZHANG Zhaoxiang. The devil is in the details: Window-based attention for image compression[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 17471–17480. doi: 10.1109/CVPR52688.2022.01697.
[17] LIU Ze, LIN Yutong, CAO Yue, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]. IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 9992–10002. doi: 10.1109/ICCV48922.2021.00986.
[18] LIU Jinming, SUN Heming, and KATTO J. Learned image compression with mixed transformer-CNN architectures[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 14388–14397. doi: 10.1109/CVPR52729.2023.01383.
[19] LI Han, LI Shaohui, DAI Wenrui, et al. Frequency-aware transformer for learned image compression[C]. The 12th International Conference on Learning Representations, Vienna, Austria, 2024.
[20] ZENG Fanhu, TANG Hao, SHAO Yihua, et al. MambaIC: State space models for high-performance learned image compression[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2025: 18041–18050. doi: 10.1109/CVPR52734.2025.01681.
[21] LI Mu, ZUO Wangmeng, GU Shuhang, et al. Learning content-weighted deep image compression[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3446–3461. doi: 10.1109/TPAMI.2020.2983926.
[22] CUI Ze, WANG Jing, GAO Shangyin, et al. Asymmetric gained deep image compression with continuous rate adaptation[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 10527–10536. doi: 10.1109/CVPR46437.2021.01039.
[23] GE Ziqing, MA Siwei, GAO Wen, et al. NLIC: Non-uniform quantization-based learned image compression[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(10): 9647–9663. doi: 10.1109/TCSVT.2024.3401872.
[24] AGUSTSSON E and THEIS L. Universally quantized neural compression[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 1037.
[25] GUO Zongyu, ZHANG Zhizheng, FENG Runsen, et al. Soft then hard: Rethinking the quantization in neural image compression[C]. The 38th International Conference on Machine Learning, 2021: 3920–3929.
[26] YANG Yibo, BAMLER R, and MANDT S. Improving inference for neural image compression[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 49.
[27] MENTZER F, AGUSTSSON E, TSCHANNEN M, et al. Conditional probability models for deep image compression[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4394–4402. doi: 10.1109/CVPR.2018.00462.
[28] BALLÉ J, MINNEN D, SINGH S, et al. Variational image compression with a scale hyperprior[C]. The 6th International Conference on Learning Representations, Vancouver, Canada, 2018.
[29] MINNEN D, BALLÉ J, and TODERICI G D. Joint autoregressive and hierarchical priors for learned image compression[C]. The 32nd International Conference on Neural Information Processing Systems, Montréal, Canada, 2018: 10794–10803.
[30] BROSS B, WANG Yekui, YE Yan, et al. Overview of the Versatile Video Coding (VVC) standard and its applications[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(10): 3736–3764. doi: 10.1109/TCSVT.2021.3101953.
[31] SALIMANS T, KARPATHY A, CHEN Xi, et al. PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications[C]. The 5th International Conference on Learning Representations, Toulon, France, 2017.
[32] HE Dailan, ZHENG Yaoyan, SUN Baocheng, et al. Checkerboard context model for efficient learned image compression[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 14766–14775. doi: 10.1109/CVPR46437.2021.01453.
[33] MINNEN D and SINGH S. Channel-wise autoregressive entropy models for learned image compression[C]. IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 2020: 3339–3343. doi: 10.1109/ICIP40778.2020.9190935.
[34] HE Dailan, YANG Ziming, PENG Weikun, et al. ELIC: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 5708–5717. doi: 10.1109/CVPR52688.2022.00563.
[35] MENTZER F, AGUSTSSON E, and TSCHANNEN M. M2T: Masking transformers twice for faster decoding[C]. IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 5317–5326. doi: 10.1109/ICCV51070.2023.00492.
[36] CHANG Huiwen, ZHANG Han, JIANG Lu, et al. MaskGIT: Masked generative image transformer[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 11305–11315. doi: 10.1109/CVPR52688.2022.01103.
[37] QIAN Yichen, SUN Xiuyu, LIN Ming, et al. Entroformer: A transformer-based entropy model for learned image compression[C]. The 10th International Conference on Learning Representations, 2022.
[38] KOYUNCU A B, GAO Han, BOEV A, et al. Contextformer: A transformer with spatio-channel attention for context modeling in learned image compression[C]. The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 447–463. doi: 10.1007/978-3-031-19800-7_26.
[39] JIANG Wei, YANG Jiayu, ZHAI Yongqi, et al. MLIC++: Linear complexity multi-reference entropy modeling for learned image compression[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2025, 21(5): 142. doi: 10.1145/3719011.
[40] LI Daxin, BAI Yuanchao, WANG Kai, et al. GroupedMixer: An entropy model with group-wise token-mixers for learned image compression[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(10): 9606–9619. doi: 10.1109/TCSVT.2024.3395481.
[41] HU Yueyu, YANG Wenhan, MA Zhan, et al. Learning end-to-end lossy image compression: A benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(8): 4194–4211. doi: 10.1109/TPAMI.2021.3065339.
[42] DUAN Zhihao, LU Ming, MA J, et al. QARV: Quantization-aware ResNet VAE for lossy image compression[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(1): 436–450. doi: 10.1109/TPAMI.2023.3322904.
[43] LU Jingbo, ZHANG Leheng, ZHOU Xingyu, et al. Learned image compression with dictionary-based entropy model[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2025: 12850–12859. doi: 10.1109/CVPR52734.2025.01199.
[44] CHOI Y, EL-KHAMY M, and LEE J. Variable rate deep image compression with a conditional autoencoder[C]. IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 3146–3154. doi: 10.1109/ICCV.2019.00324.
[45] YANG Fei, HERRANZ L, CHENG Yongmei, et al. Slimmable compressive autoencoders for practical neural image compression[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 4996–5005. doi: 10.1109/CVPR46437.2021.00496.
[46] SONG M, CHOI J, and HAN B. Variable-rate deep image compression through spatially-adaptive feature transform[C]. IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 2360–2369. doi: 10.1109/ICCV48922.2021.00238.
[47] GAO Chenjian, XU Tongda, HE Dailan, et al. Flexible neural image compression via code editing[C]. The 36th International Conference on Neural Information Processing Systems, New Orleans, USA, 2022: 885.
[48] DIAZ Y P, GANSEKOELE A, and BHULAI S. Robustly overfitting latents for flexible neural image compression[C]. The 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 3388.
[49] IEEE. IEEE 1857.11-2024 IEEE standard for neural network-based image coding[S]. IEEE, 2024. doi: 10.1109/IEEESTD.2024.10810316.
[50] ALSHINA E, ASCENSO J, and EBRAHIMI T. JPEG AI: The first international standard for image coding based on an end-to-end learning-based approach[J]. IEEE MultiMedia, 2024, 31(4): 60–69. doi: 10.1109/MMUL.2024.3485255.
[51] JIA Chuanmin, HANG Xinyu, WANG Shanshe, et al. FPX-NIC: An FPGA-accelerated 4K ultra-high-definition neural video coding system[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(9): 6385–6399. doi: 10.1109/TCSVT.2022.3164059.
[52] SUN Heming, YI Qingyang, and FUJITA M. FPGA codec system of learned image compression with algorithm-architecture co-optimization[J]. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2024, 14(2): 334–347. doi: 10.1109/JETCAS.2024.3386328.
[53] AGUSTSSON E, TSCHANNEN M, MENTZER F, et al. Generative adversarial networks for extreme learned image compression[C]. IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 221–231. doi: 10.1109/ICCV.2019.00031.
[54] MENTZER F, TODERICI G, TSCHANNEN M, et al. High-fidelity generative image compression[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 999.
[55] MUCKLEY M, EL-NOUBY A, ULLRICH K, et al. Improving statistical fidelity for neural image compression with implicit local likelihood models[C]. The 40th International Conference on Machine Learning, Honolulu, USA, 2023: 25426–25443.
[56] KÖRBER N, KROMER E, SIEBERT A, et al. EGIC: Enhanced low-bit-rate generative image compression guided by semantic segmentation[C]. The 18th European Conference on Computer Vision, Milan, Italy, 2024: 202–220. doi: 10.1007/978-3-031-72761-0_12.
[57] ZHANG G, QIAN Jingjing, CHEN Jun, et al. Universal rate-distortion-perception representations for lossy compression[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 880.
[58] YAN Zeyu, WEN Fei, and LIU Peilin. Optimally controllable perceptual lossy compression[C]. The 39th International Conference on Machine Learning, Baltimore, USA, 2022: 24911–24928.
[59] AGUSTSSON E, MINNEN D, TODERICI G, et al. Multi-realism image compression with a conditional generator[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 22324–22333. doi: 10.1109/CVPR52729.2023.02138.
[60] HOOGEBOOM E, AGUSTSSON E, MENTZER F, et al. High-fidelity image compression with score-based generative models[J]. arXiv: 2305.18231, 2024. doi: 10.48550/arXiv.2305.18231.
[61] GHOUSE N F, PETERSEN J, WIGGERS A, et al. A residual diffusion model for high perceptual quality codec augmentation[J]. arXiv: 2301.05489, 2023. doi: 10.48550/arXiv.2301.05489.
[62] YANG Ruihan and MANDT S. Lossy image compression with conditional diffusion models[C]. The 37th International Conference on Neural Information Processing Systems, New Orleans, USA, 2023: 2835.
[63] KHOSHKHAHTINAT A, ZAFARI A, MEHTA P M, et al. Laplacian-guided entropy model in neural codec with blur-dissipated synthesis[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 3045–3054. doi: 10.1109/CVPR52733.2024.00294.
[64] HOOGEBOOM E and SALIMANS T. Blurring diffusion models[C]. The 11th International Conference on Learning Representations, Kigali, Rwanda, 2023.
[65] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 10674–10685. doi: 10.1109/CVPR52688.2022.01042.
[66] RELIC L, AZEVEDO R, GROSS M, et al. Lossy image compression with foundation diffusion models[C]. The 18th European Conference on Computer Vision, Milan, Italy, 2024: 303–319. doi: 10.1007/978-3-031-73030-6_17.
[67] CAREIL M, MUCKLEY M J, VERBEEK J, et al. Towards image compression with perfect realism at ultra-low bitrates[C]. The 12th International Conference on Learning Representations, Vienna, Austria, 2024.
[68] LI Zhiyuan, ZHOU Yanhui, WEI Hao, et al. Toward extreme image compression with latent feature guidance and diffusion prior[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2025, 35(1): 888–899. doi: 10.1109/TCSVT.2024.3455576.
[69] SHI Guangming, XIAO Yong, LI Yingyu, et al. From semantic communication to semantic-aware networking: Model, architecture, and open problems[J]. IEEE Communications Magazine, 2021, 59(8): 44–50. doi: 10.1109/MCOM.001.2001239.
[70] NIU Kai and ZHANG Ping. A mathematical theory of semantic communication[J]. Journal of Communications, 2024, 45(6): 7–59. doi: 10.11959/j.issn.1000-436x.2024111.
[71] VAN DEN OORD A, LI Yazhe, and VINYALS O. Representation learning with contrastive predictive coding[J]. arXiv: 1807.03748, 2018. doi: 10.48550/arXiv.1807.03748.
[72] HE Kaiming, FAN Haoqi, WU Yuxin, et al. Momentum contrast for unsupervised visual representation learning[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 9726–9735. doi: 10.1109/CVPR42600.2020.00975.
[73] HE Kaiming, CHEN Xinlei, XIE Saining, et al. Masked autoencoders are scalable vision learners[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 15979–15988. doi: 10.1109/CVPR52688.2022.01553.
[74] WANG Shurun, WANG Zhao, WANG Shiqi, et al. Deep image compression toward machine vision: A unified optimization framework[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(6): 2979–2989. doi: 10.1109/TCSVT.2022.3230843.
[75] CHEN Y, WENG Y, KAO C, et al. TransTIC: Transferring transformer-based image compression from human perception to machine perception[C]. IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 23240–23250. doi: 10.1109/ICCV51070.2023.02129.
[76] JIA Menglin, TANG Luming, CHEN B, et al. Visual prompt tuning[C]. The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 709–727. doi: 10.1007/978-3-031-19827-4_41.
[77] LI Han, LI Shaohui, DING Shuangrui, et al. Image compression for machine and human vision with spatial-frequency adaptation[C]. The 18th European Conference on Computer Vision, Milan, Italy, 2024: 382–399. doi: 10.1007/978-3-031-72983-6_22.
[78] YANG Zhaohui, WANG Yunhe, XU Chang, et al. Discernible image compression[C]. The 28th ACM International Conference on Multimedia, Seattle, USA, 2020: 1561–1569. doi: 10.1145/3394171.3413968.
[79] ZHANG Qi, WANG Shanshe, ZHANG Xinfeng, et al. Just recognizable distortion for machine vision oriented image and video coding[J]. International Journal of Computer Vision, 2021, 129(10): 2889–2906. doi: 10.1007/s11263-021-01505-4.
[80] ZHANG Qi, WANG Shanshe, ZHANG Xinfeng, et al. Perceptual video coding for machines via satisfied machine ratio modeling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 7651–7668. doi: 10.1109/TPAMI.2024.3393633.
[81] DUBOIS Y, BLOEM-REDDY B, ULLRICH K, et al. Lossy compression for lossless prediction[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 1074.
[82] FENG Ruoyu, JIN Xin, GUO Zongyu, et al. Image coding for machines with omnipotent feature learning[C]. The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 510–528. doi: 10.1007/978-3-031-19836-6_29.
[83] TIAN Yuan, LU Guo, ZHAI Guangtao, et al. Non-semantics suppressed mask learning for unsupervised video semantic compression[C]. IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 13564–13576. doi: 10.1109/ICCV51070.2023.01252.
[84] TIAN Yuan, LU Guo, and ZHAI Guangtao. Free-VSC: Free semantics from visual foundation models for unsupervised video semantic compression[C]. The 18th European Conference on Computer Vision, Milan, Italy, 2024: 163–183. doi: 10.1007/978-3-031-72967-6_10.
[85] BAI Yuanchao, YANG Xu, LIU Xianming, et al. Towards end-to-end image compression and analysis with transformers[C]. The 36th AAAI Conference on Artificial Intelligence, 2022: 104–112. doi: 10.1609/aaai.v36i1.19884.
[86] WANG Shurun, WANG Shiqi, YANG Wenhan, et al. Towards analysis-friendly face representation with scalable feature and texture compression[J]. IEEE Transactions on Multimedia, 2022, 24: 3169–3181. doi: 10.1109/TMM.2021.3094300.
[87] LIU Jinming, FENG Ruoyu, QI Yunpeng, et al. Rate-distortion-cognition controllable versatile neural image compression[C]. The 18th European Conference on Computer Vision, Milan, Italy, 2024: 329–348. doi: 10.1007/978-3-031-72992-8_19.
[88] CHOI H and BAJIĆ I V. Scalable image coding for humans and machines[J]. IEEE Transactions on Image Processing, 2022, 31: 2739–2754. doi: 10.1109/TIP.2022.3160602.
[89] LIU Lei, HU Zhihao, CHEN Zhenghao, et al. ICMH-Net: Neural image compression towards both machine vision and human vision[C]. The 31st ACM International Conference on Multimedia, Ottawa, Canada, 2023: 8047–8056. doi: 10.1145/3581783.3612041.
[90] YU Yi, WANG Yufei, YANG Wenhan, et al. Backdoor attacks against deep image compression via adaptive frequency trigger[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 12250–12259. doi: 10.1109/CVPR52729.2023.01179.
[91] CHEN Tong and MA Zhan. Toward robust neural image compression: Adversarial attack and model finetuning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(12): 7842–7856. doi: 10.1109/TCSVT.2023.3276442.
[92] DUAN Zhihao, LU Ming, YANG J, et al. Towards backward-compatible continual learning of image compression[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 25564–25573. doi: 10.1109/CVPR52733.2024.02415.
-