Global–Local Co-embedding and Semantic Mask-driven Aging Approach
-
Abstract: Facial age progression requires generating a portrait at a specified target age while preserving the individual characteristics and identity information of the input portrait. Existing methods suffer from insufficient feature disentanglement at the embedding stage, and from artifacts caused by non-age-related factors such as hair and glasses interfering with skin-texture modeling. To address these problems, this paper proposes a Global–Local co-embedding and Semantic mask-driven Aging method (GLS-Age). A global–local co-embedding strategy assigns differentiated learning tasks to different latent spaces, preserving the global consistency of the portrait while strengthening the reconstruction of local details such as eyelashes and skin texture, which markedly improves the perceptual quality of the embedded portrait. To counter the interference of non-age-related factors with skin-texture modeling, a semantic mask-driven non-aging-region editing module is designed: image inpainting is used to reconstruct the input portrait with non-age-related factors removed, so that no artifacts are introduced during the aging process. To transfer non-age-related elements such as hair and glasses from the input portrait accurately, a differentiable generator, DsGAN, is further constructed to efficiently align the transferred latent codes with the original embedded latent codes, ensuring semantic and structural consistency of the generated portrait. Experiments on public benchmarks including CACD and CelebA show that GLS-Age significantly improves identity consistency while preserving the quality of age transformation. In the Face++ platform evaluation, portraits generated by GLS-Age also achieve excellent scores on identity confidence and predicted age distribution.
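The non-aging-region editing step described above (semantic segmentation plus image inpainting) can be illustrated with a minimal sketch. It assumes a BiSeNet-style face parser [45] has already produced a per-pixel label map; the label IDs and the masking helper are hypothetical, while OpenCV's Telea inpainting [46] is a real API call.

```python
# Minimal sketch of semantic mask-driven removal of non-aging regions.
# Assumes `parsing` is a per-pixel label map from a BiSeNet-style parser [45];
# the label IDs below are hypothetical placeholders.
import cv2
import numpy as np

NON_AGING_LABELS = {17, 6}  # hypothetical IDs for hair and eyeglasses

def remove_non_aging_regions(image_bgr: np.ndarray, parsing: np.ndarray) -> np.ndarray:
    """Inpaint hair/eyeglass pixels so they cannot perturb skin-texture modeling."""
    mask = np.isin(parsing, list(NON_AGING_LABELS)).astype(np.uint8) * 255
    # Slightly dilate the mask so boundary pixels are reconstructed as well.
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=1)
    # Fast-marching (Telea [46]) inpainting fills the masked regions from context.
    return cv2.inpaint(image_bgr, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```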
Abstract:
Objective: Facial age progression has become increasingly important in applications such as criminal investigation and digital identity authentication, making it a key research area in computer vision. However, existing mainstream facial age progression networks face two primary limitations. First, they tend to overemphasize the embedding of age-related features, often at the expense of preserving identity-consistent multi-scale attributes. Second, they fail to effectively eliminate interference from non-age-related elements such as hair and glasses, leading to suboptimal performance in complex scenarios. To address these challenges, this study proposes a global–local co-embedding and semantic mask-driven aging method. The global–local co-embedding strategy improves the accuracy of input portrait reconstruction while reducing computational cost during the embedding phase. In parallel, a semantic mask editing mechanism removes non-age-related features such as hair and eyewear, enabling more accurate embedding of age-related characteristics. Together, these two strategies markedly enhance the model's capacity to learn and represent age-specific attributes in facial imagery.
Methods: A Global–Local Collaborative Embedding (GLCE) strategy is proposed to achieve high-quality latent-space mapping of facial images. Distinct learning objectives are assigned to separate latent subspaces, which enhances the representation of fine-grained facial features while preserving identity-specific information; identity consistency is thereby improved, and both training time and computational cost are reduced, increasing the efficiency of feature extraction. To address interference from non-age-related elements, a semantic mask-driven editing mechanism is employed: semantic segmentation and image inpainting are integrated to accurately remove regions such as hair and glasses that hinder precise age modeling. A differentiable generator, DsGAN, is introduced to align the transferred latent codes with the embedded identity-preserving codes; this alignment strengthens the expression of age-related features and better retains identity information during age progression.
Results and Discussions: Experimental results on benchmark datasets, including CACD and CelebA, demonstrate that GLS-Age outperforms existing methods such as IPCGAN, CUSP, SAM, and LATS in identity confidence assessment. The age distributions of the generated portraits also align more closely with those of the target age groups. Qualitative analysis further shows that, under hair occlusion, GLS-Age produces more realistic wrinkle textures and embeds age-related features more accurately than competing methods, while significantly improving the identity consistency of the synthesized faces.
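As a rough illustration of how the GLCE strategy in Methods can assign distinct objectives to separate latent subspaces, the PyTorch sketch below optimizes the coarse (global) and fine (local) slices of a StyleGAN-style W+ code under different losses. The generator handle, the 8/18 layer split, the loss modules, and the weighting are assumptions for exposition, not the paper's exact design.

```python
# Hedged sketch: global-local co-embedding over a StyleGAN-style W+ code.
# `generator`, `id_loss`, and `lpips_loss` are assumed callables; the layer
# split and loss weights are illustrative placeholders.
import torch

def co_embed(generator, target, id_loss, lpips_loss,
             steps=300, split=8, n_layers=18, dim=512):
    # Separate trainable sub-codes for the global and local subspaces
    # (zeros for brevity; typically initialized from the generator's mean latent).
    w_coarse = torch.zeros(1, split, dim, requires_grad=True)
    w_fine = torch.zeros(1, n_layers - split, dim, requires_grad=True)
    opt = torch.optim.Adam([w_coarse, w_fine], lr=0.01)
    for _ in range(steps):
        opt.zero_grad()
        # Global task: the identity/structure loss only updates the coarse code.
        recon_g = generator(torch.cat([w_coarse, w_fine.detach()], dim=1))
        # Local task: the perceptual texture loss only updates the fine code.
        recon_l = generator(torch.cat([w_coarse.detach(), w_fine], dim=1))
        loss = id_loss(recon_g, target) + 0.8 * lpips_loss(recon_l, target)
        loss.backward()
        opt.step()
    return torch.cat([w_coarse, w_fine], dim=1).detach()
```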
Conclusions: This study addresses core challenges in facial age progression: identity preservation, inadequate detail modeling, and interference from non-age-related factors. A novel Global–Local collaborative embedding and Semantic mask-driven Aging method (GLS-Age) is proposed to resolve these limitations. By employing a differentiated latent-space learning strategy, the model achieves hierarchical decoupling of structural and textural features. Integrated with semantic-guided portrait editing and a differentiable generator for latent-space alignment, GLS-Age markedly enhances both the fidelity of age-feature expression and the consistency of identity retention. The method demonstrates superior generalization and synthesis quality across multiple benchmark datasets, effectively reproducing natural wrinkle patterns and age-related facial changes. These results confirm the feasibility and advancement of GLS-Age for facial age synthesis. Furthermore, this study establishes a compact, high-quality dataset of Asian facial portraits, supporting further research on image editing and face generation for this demographic. The proposed method not only provides technical support for practical applications such as cold-case resolution and missing-person identification in public security, but also offers a robust data and modeling framework for advancing age-based facial simulation. Future work will focus on enhancing controllable editing within latent spaces, improving anatomical plausibility in skull-structure transformations, and strengthening performance on extreme age groups, including infants and the elderly, with the aim of expanding facial age progression to forensic analysis, humanitarian family search, and social-security applications.
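The latent-alignment role attributed to DsGAN can likewise be sketched: a small differentiable mapper pulls a transferred code (carrying hair or eyeglass attributes) toward the original embedded code so that semantics and structure stay consistent. The residual-MLP architecture and the plain MSE objective below are illustrative assumptions, not the paper's actual network.

```python
# Hedged sketch of differentiable latent alignment in the spirit of DsGAN.
# Architecture and loss are assumptions for illustration only.
import torch
import torch.nn as nn

class LatentAligner(nn.Module):
    """Residual MLP over W+ codes (assumed shape [batch, 18, 512])."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.LeakyReLU(0.2), nn.Linear(dim, dim))

    def forward(self, w):
        return w + self.net(w)  # predict a correction, keep the code close

def alignment_loss(aligner, w_transfer, w_embed):
    # Pull the transferred code toward the embedded, identity-preserving code.
    return nn.functional.mse_loss(aligner(w_transfer), w_embed)
```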
-
Table 1  Predicted age distribution of generated portraits on the test set (columns: target age group; values: estimated age in years, mean±std)

| Subset | Model  | 0           | 10         | 20         | 30         | 50          |
|--------|--------|-------------|------------|------------|------------|-------------|
| Eur    | CUSP   | 7.42±2.44   | 14.57±2.97 | 26.61±6.09 | 36.32±8.01 | 57.12±14.01 |
| Eur    | LATS   | 8.11±3.25   | 13.76±2.09 | 26.54±5.33 | 37.87±7.43 | 59.03±12.54 |
| Eur    | HRFAE  | 19.41±15.34 | 24.56±6.39 | 24.90±7.13 | 25.53±8.19 | 34.12±12.98 |
| Eur    | IPCGAN | -           | -          | 27.31±6.67 | 35.76±8.11 | 56.78±17.21 |
| Eur    | SAM    | 9.84±5.66   | 21.68±3.05 | 23.67±2.46 | 34.58±4.61 | 56.96±7.78  |
| Eur    | FADING | 12.09±2.27  | 12.31±4.27 | 24.72±6.13 | 34.33±3.63 | 59.13±5.43  |
| Eur    | Ours   | 7.01±2.01   | 15.22±3.04 | 23.33±2.74 | 33.29±3.17 | 56.25±7.10  |
| As     | CUSP   | 7.11±2.91   | 11.35±4.13 | 24.19±4.87 | 36.96±7.71 | 58.76±12.22 |
| As     | LATS   | 7.11±3.56   | 12.23±4.54 | 24.31±4.56 | 35.67±6.93 | 57.83±13.10 |
| As     | HRFAE  | 17.93±10.43 | 22.21±5.33 | 27.42±7.54 | 24.90±6.33 | 30.10±8.12  |
| As     | IPCGAN | -           | -          | 28.83±6.11 | 34.98±7.86 | 58.21±13.72 |
| As     | SAM    | 10.55±4.33  | 19.72±5.44 | 24.15±3.42 | 33.21±4.26 | 57.69±8.37  |
| As     | FADING | 11.44±2.91  | 12.21±5.79 | 25.12±5.19 | 34.27±3.74 | 57.12±6.16  |
| As     | Ours   | 6.53±2.88   | 13.59±3.09 | 23.97±3.42 | 34.15±3.01 | 55.65±7.44  |

Table 2  Identity confidence of generated portraits on the test set (%; columns: target age group)

| Subset | Model  | 0     | 10    | 20    | 30    | 50    |
|--------|--------|-------|-------|-------|-------|-------|
| Eur    | CUSP   | 61.01 | 68.46 | 85.44 | 98.12 | 89.12 |
| Eur    | LATS   | 65.11 | 73.23 | 97.33 | 98.06 | 94.21 |
| Eur    | HRFAE  | 94.51 | 96.38 | 97.47 | 98.81 | 97.12 |
| Eur    | IPCGAN | -     | -     | 96.42 | 95.35 | 96.79 |
| Eur    | SAM    | 72.16 | 85.08 | 98.43 | 98.54 | 96.14 |
| Eur    | FADING | 54.47 | 79.41 | 89.19 | 97.23 | 94.77 |
| Eur    | Ours   | 75.54 | 87.79 | 98.59 | 98.81 | 97.19 |
| As     | CUSP   | 60.13 | 65.13 | 84.31 | 97.59 | 86.67 |
| As     | LATS   | 60.56 | 77.12 | 88.56 | 85.62 | 81.92 |
| As     | HRFAE  | 96.67 | 98.11 | 95.37 | 96.29 | 97.37 |
| As     | IPCGAN | -     | -     | 94.95 | 97.54 | 97.33 |
| As     | SAM    | 69.45 | 72.68 | 87.55 | 81.61 | 81.44 |
| As     | FADING | 56.21 | 79.55 | 90.05 | 98.33 | 95.48 |
| As     | Ours   | 70.93 | 87.24 | 98.45 | 98.17 | 97.21 |

Table 3  Quantitative comparison of module ablations

| Model / test age group | Age estimate (years) | Identity confidence (%) |
|------------------------|----------------------|-------------------------|
| Real face              | 42.52±4.27           | /                       |
| GLS-Age                | 41.17±6.32           | 98.22                   |
| w/o GLCE               | 37.51±8.13           | 65.01                   |
| w/o SM-NAE             | 47.09±5.53           | 95.67                   |
| with FS                | /                    | 99.92                   |
| with Restyle           | /                    | 97.76                   |
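For reference, the per-group statistics reported in Tables 1 through 3 (mean ± standard deviation of the predicted age, and mean identity confidence) can be aggregated as in the sketch below. The `estimate_age` and `identity_confidence` callables stand in for the Face++ platform evaluation used in the paper and are hypothetical.

```python
# Hedged sketch of aggregating the Table 1/2 style statistics for one target
# age group. `estimate_age(img)` and `identity_confidence(src, img)` are
# hypothetical stand-ins for the Face++ evaluation.
import numpy as np

def summarize(generated, sources, estimate_age, identity_confidence):
    ages = np.array([estimate_age(img) for img in generated])
    conf = np.array([identity_confidence(src, img)
                     for src, img in zip(sources, generated)])
    return {
        "age": f"{ages.mean():.2f}±{ages.std():.2f}",   # e.g. "56.25±7.10"
        "id_confidence": round(float(conf.mean()), 2),  # e.g. 97.19
    }
```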
-
References:
[1] ZHANG Zikang, YIN Songfeng, and CAO Liangcai. Age-invariant face recognition based on identity-age shared features[J]. The Visual Computer, 2024, 40(8): 5465–5474. doi: 10.1007/s00371-023-03116-1.
[2] LIU Yaohui, SUN Peng, LANG Yubo, et al. Deep simulation of suspect hairstyles under influence of multiple factors[J]. Application Research of Computers, 2025, 42(3): 955–960. doi: 10.19734/j.issn.1001-3695.2024.04.0215. (in Chinese)
[3] SUO Jinli, CHEN Xilin, SHAN Shiguang, et al. A concatenational graph evolution aging model[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(11): 2083–2096. doi: 10.1109/TPAMI.2012.22.
[4] ROWLAND D A and PERRETT D I. Manipulating facial appearance through shape and color[J]. IEEE Computer Graphics and Applications, 1995, 15(5): 70–76. doi: 10.1109/38.403830.
[5] KEMELMACHER-SHLIZERMAN I, SUWAJANAKORN S, and SEITZ S M. Illumination-aware age progression[C]. 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 3334–3341. doi: 10.1109/CVPR.2014.426.
[6] ALALUF Y, PATASHNIK O, and COHEN-OR D. Only a matter of style: Age transformation using a style-based regression model[J]. ACM Transactions on Graphics (TOG), 2021, 40(4): 1–12. doi: 10.1145/3450626.3459805.
[7] OR-EL R, SENGUPTA S, FRIED O, et al. Lifespan age transformation synthesis[C]. The 16th European Conference on Computer Vision – ECCV 2020, Glasgow, UK, 2020: 739–755. doi: 10.1007/978-3-030-58539-6_44.
[8] HE Zhenliang, ZUO Wangmeng, KAN M, et al. AttGAN: Facial attribute editing by only changing what you want[J]. IEEE Transactions on Image Processing, 2019, 28(11): 5464–5478. doi: 10.1109/TIP.2019.2916751.
[9] LIU Kanglin, CAO Gaofeng, ZHOU Fei, et al. Towards disentangling latent space for unsupervised semantic face editing[J]. IEEE Transactions on Image Processing, 2022, 31: 1475–1489. doi: 10.1109/TIP.2022.3142527.
[10] ABDAL R, ZHU Peihao, MITRA N J, et al. StyleFlow: Attribute-conditioned exploration of StyleGAN-generated images using conditional continuous normalizing flows[J]. ACM Transactions on Graphics (TOG), 2021, 40(3): 1–21. doi: 10.1145/3447648.
[11] WANG Haoyi, SANCHEZ V, and LI C T. Cross-age contrastive learning for age-invariant face recognition[C]. ICASSP 2024 – 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, South Korea, 2024: 4600–4604. doi: 10.1109/ICASSP48485.2024.10445859.
[12] WANG Haoyi, SANCHEZ V, LI C T, et al. From age estimation to age-invariant face recognition: Generalized age feature extraction using order-enhanced contrastive learning[J]. IEEE Transactions on Information Forensics and Security, 2025, 20: 8525–8540. doi: 10.1109/TIFS.2025.3597187.
[13] MAO Liang, XUE Yueju, WEI Yinghui, et al. An eyeglasses removal method for fine-grained face recognition[J]. Journal of Electronics & Information Technology, 2021, 43(5): 1448–1456. doi: 10.11999/JEIT200176. (in Chinese)
[14] XIA Yaozheng, HAO Lei, ZHENG Wanlu, et al. An independent semantic and fused latent model for local face editing[J]. Journal of Computer-Aided Design & Computer Graphics, 2025, 37(3): 414–426. doi: 10.3724/SP.J.1089.2024-00305. (in Chinese)
[15] JIN Shiwei, WANG Zhen, WANG Lei, et al. ReDirTrans: Latent-to-latent translation for gaze and head redirection[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 5547–5556. doi: 10.1109/CVPR52729.2023.00537.
[16] LI Qi, LIU Yunfan, and SUN Zhenan. Age progression and regression with spatial attention modules[C]. Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, 2020: 11378–11385. doi: 10.1609/aaai.v34i07.6800.
[17] PARIHAR R, SACHIDANAND V S, MANI S, et al. PreciseControl: Enhancing text-to-image diffusion models with fine-grained attribute control[C]. The 18th European Conference on Computer Vision – ECCV 2024, Milan, Italy, 2025: 469–487. doi: 10.1007/978-3-031-73007-8_27.
[18] CHANDALIYA P K and NAIN N. AW-GAN: Face aging and rejuvenation using attention with wavelet GAN[J]. Neural Computing and Applications, 2023, 35(3): 2811–2825. doi: 10.1007/s00521-022-07721-4.
[19] HOU Chen, WEI Guoqiang, and CHEN Zhibo. High-fidelity diffusion-based image editing[C]. The Thirty-Eighth AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 2184–2192. doi: 10.1609/aaai.v38i3.27991.
[20] GUI Lielin, HUANG Shan, and YIN Yue. Age transformation method combined with Pixel2style2Pixel[J]. Computer Engineering and Applications, 2024, 60(14): 162–174. doi: 10.3778/j.issn.1002-8331.2304-0007. (in Chinese)
[21] SHU Xiangbo, TANG Jinhui, LAI Hanjiang, et al. Personalized age progression with aging dictionary[C]. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015: 3970–3978. doi: 10.1109/ICCV.2015.452.
[22] ZHANG Ke, YU Tingting, SHI Chaojun, et al. Face age synthesis fusing channel-coordinate attention mechanism and parallel dilated convolution[J]. Journal of Image and Graphics, 2023, 28(12): 3870–3883. doi: 10.11834/jig.230007. (in Chinese)
[23] YANG Shuai, JIANG Liming, LIU Ziwei, et al. GP-UNIT: Generative prior for versatile unsupervised image-to-image translation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(10): 11869–11883. doi: 10.1109/TPAMI.2023.3284003.
[24] REN Kun, LI Zhengzhen, GUI Yuanze, et al. Super-resolution restoration of low-resolution randomly occluded face images[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3343–3352. doi: 10.11999/JEIT231262. (in Chinese)
[25] ZHAO Hong and LI Wengai. Text-to-image generation model based on diffusion Wasserstein generative adversarial networks[J]. Journal of Electronics & Information Technology, 2023, 45(12): 4371–4381. doi: 10.11999/JEIT221400. (in Chinese)
[26] GUO Fan, LIU Wentao, YANG Jianan, et al. Nighttime foggy image generation algorithm based on semi-analytic model[J]. Journal on Communications, 2025, 46(4): 129–143. doi: 10.11959/j.issn.1000-436x.2025061. (in Chinese)
[27] KARRAS T, AITTALA M, HELLSTEN J, et al. Training generative adversarial networks with limited data[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 12104–12114.
[28] KARRAS T, LAINE S, AITTALA M, et al. Analyzing and improving the image quality of StyleGAN[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 8107–8116. doi: 10.1109/CVPR42600.2020.00813.
[29] RICHARDSON E, ALALUF Y, PATASHNIK O, et al. Encoding in style: A StyleGAN encoder for image-to-image translation[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2021: 2287–2296. doi: 10.1109/CVPR46437.2021.00232.
[30] WEI Tianyi, CHEN Dongdong, ZHOU Wenbo, et al. E2Style: Improve the efficiency and effectiveness of StyleGAN inversion[J]. IEEE Transactions on Image Processing, 2022, 31: 3267–3280. doi: 10.1109/TIP.2022.3167305.
[31] ABDAL R, QIN Yipeng, and WONKA P. Image2StyleGAN: How to embed images into the StyleGAN latent space?[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, 2019: 4431–4440. doi: 10.1109/ICCV.2019.00453.
[32] TEWARI A, ELGHARIB M, MALLIKARJUN B R, et al. PIE: Portrait image embedding for semantic control[J]. ACM Transactions on Graphics (TOG), 2020, 39(6): 223. doi: 10.1145/3414685.3417803.
[33] ZHU Peihao, ABDAL R, QIN Yipeng, et al. Improved StyleGAN embedding: Where are the good latents?[DB/OL]. arXiv preprint arXiv: 2012.09036, 2020. https://arxiv.org/abs/2012.09036?context=cs.
[34] DENTON R, HUTCHINSON B, MITCHELL M, et al. Image counterfactual sensitivity analysis for detecting unintended bias[DB/OL]. arXiv preprint arXiv: 1906.06439, 2019. https://arxiv.org/abs/1906.06439v3.
[35] GOETSCHALCKX L, ANDONIAN A, OLIVA A, et al. GANalyze: Toward visual definitions of cognitive image properties[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, 2019: 5743–5752. doi: 10.1109/ICCV.2019.00584.
[36] SHEN Yujun, GU Jinjin, TANG Xiaoou, et al. Interpreting the latent space of GANs for semantic face editing[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 9240–9249. doi: 10.1109/CVPR42600.2020.00926.
[37] HÄRKÖNEN E, HERTZMANN A, LEHTINEN J, et al. GANSpace: Discovering interpretable GAN controls[C]. The 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2020: 9841–9850.
[38] VOYNOV A and BABENKO A. Unsupervised discovery of interpretable directions in the GAN latent space[C]. The 37th International Conference on Machine Learning, Vienna, Austria, 2020: 9786–9796.
[39] WANG Binxu and PONCE C R. The geometry of deep generative image models and its applications[DB/OL]. arXiv preprint arXiv: 2101.06006, 2021. doi: 10.48550/arXiv.2101.06006.
[40] TEWARI A, ELGHARIB M, BHARAJ G, et al. StyleRig: Rigging StyleGAN for 3D control over portrait images[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 6141–6150. doi: 10.1109/CVPR42600.2020.00618.
[41] YANG Hongyu, HUANG Di, WANG Yunhong, et al. Learning face age progression: A pyramid architecture of GANs[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 31–39. doi: 10.1109/CVPR.2018.00011.
[42] LIU Yunfan, LI Qi, and SUN Zhenan. Attribute-aware face aging with wavelet-based generative adversarial networks[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 11869–11878. doi: 10.1109/CVPR.2019.01215.
[43] HUANG Zhizhong, CHEN Shouzhen, ZHANG Junping, et al. PFA-GAN: Progressive face aging with generative adversarial network[J]. IEEE Transactions on Information Forensics and Security, 2021, 16: 2031–2045. doi: 10.1109/TIFS.2020.3047753.
[44] SHEN Yujun, ZHOU Bolei, LUO Ping, et al. FaceFeat-GAN: A two-stage approach for identity-preserving face synthesis[DB/OL]. arXiv preprint arXiv: 1812.01288, 2018. doi: 10.48550/arXiv.1812.01288.
[45] YU Changqian, WANG Jingbo, PENG Chao, et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation[C]. The 15th European Conference on Computer Vision – ECCV 2018, Munich, Germany, 2018: 334–349. doi: 10.1007/978-3-030-01261-8_20.
[46] TELEA A. An image inpainting technique based on the fast marching method[J]. Journal of Graphics Tools, 2004, 9(1): 23–34. doi: 10.1080/10867651.2004.10487596.
[47] GOMEZ-TRENADO G, LATHUILIÈRE S, MESEJO P, et al. Custom structure preservation in face aging[C]. The 17th European Conference on Computer Vision – ECCV 2022, Tel Aviv, Israel, 2022: 565–580. doi: 10.1007/978-3-031-19787-1_32.
[48] YAO Xu, PUY G, NEWSON A, et al. High resolution face age editing[C]. 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2021: 8624–8631. doi: 10.1109/ICPR48806.2021.9412383.
[49] CHEN Xiangyi and LATHUILIÈRE S. Face aging via diffusion-based editing[DB/OL]. arXiv preprint arXiv: 2309.11321, 2023. doi: 10.48550/arXiv.2309.11321.
-