高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

两行更新策略的成对约束投影-聚类联合优化方法

朱建勇 陈坤 杨辉 聂飞平

朱建勇, 陈坤, 杨辉, 聂飞平. 两行更新策略的成对约束投影-聚类联合优化方法[J]. 电子与信息学报. doi: 10.11999/JEIT260111
引用本文: 朱建勇, 陈坤, 杨辉, 聂飞平. 两行更新策略的成对约束投影-聚类联合优化方法[J]. 电子与信息学报. doi: 10.11999/JEIT260111
ZHU Jianyong, CHEN Kun, YANG Hui, NIE Feiping. Joint Optimization Method for Pairwise Constrained Projection Clustering Integrating Two-row Update Strategy[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260111
Citation: ZHU Jianyong, CHEN Kun, YANG Hui, NIE Feiping. Joint Optimization Method for Pairwise Constrained Projection Clustering Integrating Two-row Update Strategy[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260111

两行更新策略的成对约束投影-聚类联合优化方法

doi: 10.11999/JEIT260111 cstr: 32379.14.JEIT260111
基金项目: 国家自然科学基金(No. 62363010),江西省自然科学基金重点项目(20252BAC250019),江西省“双千计划”(SSQ2023018) ,国家科技重大专项项目(2026ZD16010302)
详细信息
    作者简介:

    朱建勇:男,教授,研究方向为机器学习,工业人工智能等

    陈坤:男,硕士生,研究方向为机器学习,数据挖掘等

    杨辉:男,教授,研究方向为复杂系统建模,控制与优化等

    聂飞平:男,教授,研究方向为机器学习及应用,模式识别,数据挖掘等

    通讯作者:

    聂飞平 feipingnie@gmail.com

  • 中图分类号: TP181

Joint Optimization Method for Pairwise Constrained Projection Clustering Integrating Two-row Update Strategy

Funds: National Natural Science Foundation of China (No. 62363010), Key Project of Jiangxi Provincial Natural Science Foundation (20252BAC250019), Jiangxi Double Thousand Plan (SSQ2023018), National Science and Technology Major Project (2026ZD16010302).
  • 摘要: 现有的约束投影方法通常采用投影-聚类两步独立策略,导致投影误差直接传递至聚类过程;此外,成对约束仅局限于投影阶段,偏离由先验知识引导聚类的核心思想。为此,本文提出两行更新策略的成对约束投影-聚类联合优化方法。首先,将约束投影目标函数作为正则化项统一到聚类框架中,二者在交替优化的过程中共同逼近全局最优解;其次,为克服传统K-means易陷入局部次优解的缺陷,并旨在提升计算效率,采用了一种融合增量计算、提前存储、变量更新三步加速策略的改进坐标下降法求解指示矩阵;最后,将成对约束嵌入到聚类指示矩阵中,使有限的监督信息贯穿于整个学习流程,并设计了一种新颖的两行同时优化策略,在每次循环中精准且直接地分配勿连约束。仿真实验验证了本文所提方法的优越性。
  • 图  1  不同方法在4个基准数据集上的成对约束违反率对比

    图  2  不同$ \gamma $值对PCITUS算法性能的影响

    图  3  PCITUS噪声敏感度及收敛曲线

    表  1  本文主要符号说明

    符号描述符号描述符号描述符号描述
    $ \boldsymbol{X}\in {\mathbb{R}}^{d\times n} $数据集矩阵$ \tilde{\boldsymbol{Y}}\in {\mathbb{R}}^{\tilde{n}\times c} $合并后的指示矩阵M(C)必连(勿连)约束集合d样本原始特征维度
    $ \boldsymbol{Y}\in {\mathbb{R}}^{n\times c} $指示矩阵$ {\tilde{y}}_{i}\in {\mathbb{R}}^{\tilde{n}\times 1} $$ \tilde{\boldsymbol{Y}} $的第in原始样本数m投影维度
    $ \boldsymbol{W}\in {\mathbb{R}}^{d\times m} $投影矩阵$ {\tilde{y}}^{i}\in {\mathbb{R}}^{1\times c} $$ \tilde{\boldsymbol{Y}} $的第i$ \tilde{n} $合并后的样本数c类别数
    下载: 导出CSV

    1  式(6)下提出的PCITUS 算法

     1:输入:数据$ \tilde{\boldsymbol{X}}\in {\mathbb{R}}^{d\times \tilde{n}} $,成对约束集合MC, 类别数c,调
     节参数$ \gamma $;
     2: 初始化: 通过式(5)利用必连传递闭包形成超链接点,得到合并
     后的数据集$ \tilde{\boldsymbol{X}}\in {\mathbb{R}}^{d\times \tilde{n}} $,包含超链接点的CL约束集合$ \tilde{\boldsymbol{C}} $,随机
     初始化指示矩阵$ \tilde{\boldsymbol{Y}}\in {\mathbb{R}}^{\tilde{n}\times c} $;
     3: repeat
     4:  计算$ \boldsymbol{S}=\boldsymbol{P}+\gamma \boldsymbol{K} $;
     5:  通过式 (7) 对S执行特征分解更新 W
     6:  通过算法2中坐标下降法更新 $ \tilde{\boldsymbol{Y}} $;
     7: until convergence
     8: 输出:总样本的预测标签Label.
    下载: 导出CSV

    2  坐标下降法求解问题(8)

     1: 输入:合并后数据矩阵$ \tilde{\boldsymbol{X}}\in {\mathbb{R}}^{d\times \tilde{n}} $,投影矩阵$ \boldsymbol{W}\in {\mathbb{R}}^{d\times m} $,包含超链接点的CL约束集合$ \tilde{C} $;
     2: 初始化:随机初始化指示矩阵$ \tilde{\boldsymbol{Y}}\in {\mathbb{R}}^{\tilde{n}\times c} $,计算$ \boldsymbol{A}={\tilde{\boldsymbol{X}}}^{\text{T}}\boldsymbol{W}{\boldsymbol{W}}^{\text{T}}\tilde{\boldsymbol{X}} $;
     3: 计算并提前存储$ \boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $,$ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}\boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $和$ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}{\tilde{\boldsymbol{y}}}_{k}(k=1,2,\cdots ,c) $;
     4: for i = 1 to$ \tilde{n} $$do
     5: if $ {\tilde{\boldsymbol{x}}}_{i} $不包含在$ \tilde{C} $中 then
     6:  通过式(14)计算$ \varphi (k)(k=1,2,\cdots c) $;
     7:  通过式(15)更新$ \tilde{\boldsymbol{Y}} $的第i行$ {\tilde{\mathrm{h}}}^{\mathrm{i}} $$ {\tilde{\boldsymbol{y}}}^{i} $;
     8:  if $ q\neq p $ then
     9:   通过式(16)更新$ \boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $,$ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}\boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $和$ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}{\tilde{\boldsymbol{y}}}_{k}(k=p,q) $;
     10: end if
     11: else if $ {\tilde{\boldsymbol{x}}}_{i} $和$ {\tilde{\boldsymbol{x}}}_{j} $包含在$ \tilde{C} $中then
     12: 通过式(14)同时计算$ {\varphi }^{i}(k) $和$ {\varphi }^{j}(k) $;
     13: 通过式(17)同时更新$ {\tilde{\boldsymbol{y}}}^{i} $$ {\tilde{\mathrm{h}}}^{\mathrm{i}} $和$ {\tilde{\boldsymbol{y}}}^{j} $;$ {\tilde{\mathrm{h}}}^{\mathrm{j}} $
     14: if $ q\neq p $ then
     15: 通过式(16)分别更新$ {\tilde{\boldsymbol{y}}}^{i} $和$ {\tilde{\boldsymbol{y}}}^{j} $$ {\tilde{\mathrm{h}}}^{\mathrm{j}} $所对应的$ \boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $,$ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}\boldsymbol{A}{\tilde{\boldsymbol{y}}}_{k} $和$ \tilde{\boldsymbol{y}}_{k}^{\mathrm{T}}{\tilde{\boldsymbol{y}}}_{k}(k=p,q) $;
     16: end if
     17: end if
     18: end for
     19: 输出: 指示矩阵$ \tilde{\boldsymbol{Y}}\in {\mathbb{R}}^{\tilde{n}\times c} $.
    下载: 导出CSV

    表  2  数据集信息

    数据集样本数量维度类别数据集样本数量维度类别
    Wine178133Binalpha140432036
    Breast69992Steel1941277
    Australian690142Satimage6435366
    Landset2000366Mushroom81241122
    下载: 导出CSV

    表  3  不同算法参数设置

    算法名称 参数描述 算法名称 参数描述
    Cop-Kmeans 无参数调节 Onestep-PCP 权衡系数α=1
    PC-Kmeans 权重系数ω=1 PCOG 正则化系数$y \in\left\{10^{-3}, 10^{-2}, 10^{-1}, 1,10,10^{-1}, 10^{-1}\right\} $
    SCC 高斯核带宽$\sigma \in[0,2] $ PCASG 正则化系数$\lambda=1 ; \beta, \theta \in\left\{10^{-3}, 10^{-2}, 10^{-1}, 1,10,10^{2}, 10^{2}\right\} $
    CNP $ \rho \in [0.1,0.2] $ ESGCPC 锚点个$m=\left\{2^{\prime}, 2^{\prime}, 2^{\prime}, 2^{\prime \prime}, 2^{n}\right\} $,近邻数$k=\{6,12,18,24\} $
    SSDC 临时聚类数$ D=\sqrt{n} $ or $\sqrt{m} / 2 $ PCITUS 权衡系数$y \in\left\{10^{-3}, 10^{-2}, 10^{-1}, 1,10,10^{-1}, 10^{-1}\right\} $
    下载: 导出CSV

    表  5  不同方法在八个基准数据集上的NMI(%)比较(平均值±标准差)

    MethodWineBreastAustralianLandsetBinalphaSteelSatimageMushroom
    Cop-Kmeans86.51±1.9069.86±2.1236.57±1.2159.79±2.8456.51±1.0229.62±2.1859.77±3.2742.76±5.49
    PC-Kmeans87.10±1.8670.52±1.1537.85±0.9858.83±2.3957.83±1.0130.58±2.0358.13±4.7340.39±4.36
    SCC87.28±1.3972.68±2.8035.79±1.6351.41±2.8161.89±0.4931.41±2.3660.89±0.1550.90±2.25
    CNP82.75±4.5170.42±1.2838.81±2.0257.53±4.3158.53±1.1330.44±1.9560.95±0.2145.84±3.77
    SSDC75.83±5.3430.96±3.8211.86±1.1444.64±2.6646.51±3.4726.31±1.2443.14±1.9124.71±2.54
    Onestep-PCP84.44±0.0037.12±0.008.71±0.0062.52±0.0061.33±0.3726.92±1.6661.65±0.0213.61±0.00
    PCOG83.24±3.3266.37±1.18N/A52.55±4.8560.54±5.0126.91±0.2946.90±3.65N/A
    PCASG62.59±3.9862.41±0.6325.20±3.3455.91±3.0660.51±1.5225.21±1.7955.95±3.45N/A
    ESGCPC88.07±2.0873.69±2.3836.98±1.6261.77±2.2258.94±0.8332.27±0.9161.37±0.5145.87±5.42
    PCITUS89.24±1.9675.53±1.4441.50±1.9562.01±0.3861.46±1.2132.69±1.8361.81±0.2258.19±0.61
    下载: 导出CSV

    表  4  不同方法在八个基准数据集上的ACC(%)比较(平均值±标准差)

    MethodWineBreastAustralianLandsetBinalphaSteelSatimageMushroom
    Cop-Kmeans96.15±0.8895.09±0.4684.03±0.5167.32±1.8640.25±1.7844.75±2.7767.31±2.5881.52±5.76
    PC-Kmeans96.48±0.7895.27±0.2484.58±0.6065.10±3.2441.95±1.7746.39±3.4365.70±3.7979.27±6.24
    SCC96.43±0.4295.04±0.6583.52±2.8261.44±2.0346.11±1.9441.27±3.8563.99±0.3585.44±4.28
    CNP95.06±1.6095.14±0.3184.04±0.7765.32±3.1242.88±2.1944.81±3.5167.86±0.3588.94±2.69
    SSDC90.34±4.1866.82±4.6131.48±3.5952.95±4.6129.20±2.1832.65±3.3253.27±3.6449.48±6.15
    Onestep-PCP95.51±0.0080.11±0.0066.81±0.0064.30±0.0046.65±1.4241.20±3.0663.90±0.0270.21±0.00
    PCOG94.94±2.2894.07±0.42N/A61.75±5.3542.53±2.6642.28±1.4961.49±4.24N/A
    PCASG82.70±2.8893.62±0.1478.84±4.2167.97±1.5844.07±2.7941.04±0.8354.80±5.16N/A
    ESGCPC96.91±0.6995.91±0.4884.19±0.6569.28±2.7944.04±2.0746.29±2.6068.39±0.3086.19±4.07
    PCITUS97.11±0.5696.28±0.2986.07±0.7668.62±0.6146.89±2.0847.59±2.5170.91±1.1190.13±0.15
    下载: 导出CSV

    表  6  不同约束比例对本文算法的性能(NMI)(%)影响(平均值±标准差)

    MetricsRate of pairwise linksWineBreastAustralianLandsetBinalphaSteelSatimageMushroom
    NMI0.189.24±1.9675.53±1.4441.50±1.9562.01±0.3861.20±1.2131.07±1.8361.81±0.2258.19±0.61
    0.289.91±2.8178.51±2.7545.72±2.8462.89±0.6163.04±0.9931.97±1.3662.63±0.2361.47±0.88
    0.391.29±3.1781.88±2.2250.13±2.5963.72±0.5164.83±1.0833.27±1.3963.40±0.3865.48±1.08
    0.491.81±2.5584.42±1.9454.29±3.4164.70±0.6366.55±1.3034.35±2.0164.13±0.4469.67±1.10
    0.592.75±2.4187.05±2.3256.95±2.7265.59±0.8768.38±1.2535.34±1.5364.38±2.1572.99±1.07
    下载: 导出CSV

    表  7  不同方法在4个较大规模数据集上的运行时间对比(秒)

    MethodCop-KmeansPC-KmeansSCCCNPSSDCOnestep-PCPESGCPCPCITUS
    Landset0.834.223.830.0329.8416.911.480.34
    Steel1.835.2710.490.0232.6430.151.650.44
    Satimage5.6482.35514.760.53415.92669.775.114.82
    Mushroom2.688.62320.491.56723.551477.622.792.93
    下载: 导出CSV
  • [1] 张朋飞, 程俊, 张治坤, 等. 满足本地差分隐私的混合噪音感知的模糊C均值聚类算法[J]. 电子与信息学报, 2025, 47(3): 739–757. doi: 10.11999/JEIT241067.

    ZHANG Pengfei, CHENG Jun, ZHANG Zhikun, et al. Fuzzy C-means clustering algorithm based on mixed noise-aware under local differential privacy[J]. Journal of Electronics & Information Technology, 2025, 47(3): 739–757. doi: 10.11999/JEIT241067.
    [2] 金极栋, 卢宛萱, 孙显, 等. 分布采样对齐的遥感半监督要素提取框架及轻量化方法[J]. 电子与信息学报, 2024, 46(5): 2187–2197. doi: 10.11999/JEIT240220.

    JIN Jidong, LU Wanxuan, SUN Xian, et al. Remote sensing semi-supervised feature extraction framework and lightweight method integrated with distribution-aligned sampling[J]. Journal of Electronics & Information Technology, 2024, 46(5): 2187–2197. doi: 10.11999/JEIT240220.
    [3] 霍纬纲, 朱旭, 张盼. 基于约束传递的深度主动时序聚类方法[J]. 电子与信息学报, 2025, 47(4): 1172–1181. doi: 10.11999/JEIT240855.

    HUO Weigang, ZHU Xu, and ZHANG Pan. Deep active time-series clustering based on constraint transitivity[J]. Journal of Electronics & Information Technology, 2025, 47(4): 1172–1181. doi: 10.11999/JEIT240855.
    [4] CHEN Jingwei, XIE Shiyu, YANG Hui, et al. Effective semi-supervised graph clustering with pairwise constraints[J]. Information Sciences, 2024, 681: 121249. doi: 10.1016/j.ins.2024.121249.
    [5] WAGSTAFF K, CARDIE C, ROGERS S, et al. Constrained K-means clustering with background knowledge[C]. The 18th International Conference on Machine Learning, San Francisco, USA, 2001: 577–584.
    [6] KAMVAR S D, KAMVAR D, and MANNING C DKAMVAR S D, KAMVAR D, and MANNING C DSpectral learning[C]. The 18th International Joint Conference on Artificial Intelligence, San Francisco, USA, 2003: 561–566.
    [7] JIA Yuheng, WU Wenhui, WANG Ran, et al. Joint optimization for pairwise constraint propagation[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(7): 3168–3180. doi: 10.1109/TNNLS.2020.3009953.
    [8] NIE Feiping, ZHANG Han, WANG Rong, et al. Semi-supervised clustering via pairwise constrained optimal graph[C]. The 29th International Joint Conferences on Artificial Intelligence, Yokohama, Japan, 2021: 3160–3166. doi: 10.24963/IJCAI.2020/437.
    [9] ZHANG Jing, FAN Ruidong, TAO Hong, et al. Constrained clustering with weak label prior[J]. Frontiers of Computer Science, 2024, 18(3): 183338. doi: 10.1007/s11704-023-3355-7.
    [10] ZHOU Jie, HUANG Chucheng, GAO Can, et al. Weighted subspace fuzzy clustering with adaptive projection[J]. International Journal of Intelligent Systems, 2024, 2024: 6696775. doi: 10.1155/2024/6696775.
    [11] TANG Wei, XIONG Hui, ZHONG Shi, et al. Enhancing semi-supervised clustering: A feature projection perspective[C]. The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, USA, 2007: 707–716. doi: 10.1145/1281192.1281268.
    [12] WANG Hongjun, LI Tao, LI Tianrui, et al. Constraint neighborhood projections for semi-supervised clustering[J]. IEEE Transactions on Cybernetics, 2014, 44(5): 636–643. doi: 10.1109/TCYB.2013.2263383.
    [13] YU Zhiwen, LUO Peinan, LIU Jiming, et al. Semi-supervised ensemble clustering based on selected constraint projection[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(12): 2394–2407. doi: 10.1109/TKDE.2018.2818729.
    [14] NIE Feiping, XUE Jingjing, WU Danyang, et al. Coordinate descent method for k-means[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(5): 2371–2385. doi: 10.1109/TPAMI.2021.3085739.
    [15] ZHANG Chao, XU Deng, CHEN Chunlin, et al. Semi-supervised multi-view clustering with active constraints[C]. The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, Toronto, Canada, 2025: 1903–1912. doi: 10.1145/3690624.3709204.
    [16] BASU S, BANERJEE A, and MOONEY R J. Active semi-supervision for pairwise constrained clustering[C]. The 2004 SIAM International Conference on Data Mining, Lake Buena Vista, USA, 2004: 333–344.
    [17] REN Yazhou, HU Xiaohui, SHI Ke, et al. Semi-supervised DenPeak clustering with pairwise constraints[C]. The 15th Pacific RIM International Conference on Artificial Intelligence, Nanjing, China, 2018: 837–850. doi: 10.1007/978-3-319-97304-3_64.
    [18] CHEN Long and ZHONG Zhi. Adaptive and structured graph learning for semi-supervised clustering[J]. Information Processing & Management, 2022, 59(4): 102949. doi: 10.1016/j.ipm.2022.102949.
  • 加载中
图(3) / 表(9)
计量
  • 文章访问数:  14
  • HTML全文浏览量:  9
  • PDF下载量:  0
  • 被引次数: 0
出版历程
  • 修回日期:  2026-05-12
  • 录用日期:  2026-05-12
  • 网络出版日期:  2026-05-29

目录

    /

    返回文章
    返回