Volume 47 Issue 4
Apr. 2025
Citation: HUO Weigang, ZHU Xu, ZHANG Pan. Deep Active Time-series Clustering Based on Constraint Transitivity[J]. Journal of Electronics & Information Technology, 2025, 47(4): 1172-1181. doi: 10.11999/JEIT240855

Deep Active Time-series Clustering Based on Constraint Transitivity

doi: 10.11999/JEIT240855 cstr: 32379.14.JEIT240855
Funds:  The Civil Aviation Joint Fund of the National Natural Science Foundation of China (U2033205), the National Natural Science Foundation of China (62173331), and the Tianjin Natural Science Foundation General Project (24JCYBJC00990)
  • Received Date: 2024-10-14
  • Rev Recd Date: 2025-03-10
  • Available Online: 2025-03-20
  • Publish Date: 2025-04-01
Objective  The rapid advancement of the Internet of Things and sensor technology has produced vast amounts of unlabeled time-series data, making deep time-series clustering a key analytical approach. However, existing deep clustering methods lack supervised constraint information and label guidance, which leaves them susceptible to noise and outliers. Deep semi-supervised clustering methods rely on predefined Must-Link (ML) and Cannot-Link (CL) constraints, which limits further gains in clustering performance. Existing active clustering approaches sample only within clusters in the representation space and overlook pairwise annotations across different clusters; this yields lower-quality ML and CL constraints and prevents further inference from the manually annotated pairs, raising annotation costs. To address these limitations, this paper proposes Deep Active Time-series Clustering based on Constraint Transitivity (DATC-CT), which improves clustering performance while reducing annotation costs.

Methods  DATC-CT defines an Annotation Cluster Set (ACS) and an Auxiliary Annotation Set (AAS), and obtains representation vectors of the time-series samples with a pre-trained autoencoder. In each clustering epoch, the sample closest to each cluster center in the representation space is selected, labeled, and stored in an ACS; by construction, all samples within one ACS belong to the same category, while samples in different ACSs belong to different categories. Next, a time-series sample is randomly chosen from the ACS with the fewest samples; among the samples outside that cluster, the one nearest to the cluster's center is then queried, labeled, and stored in either the AAS or an ACS. Samples in an ACS and its paired AAS belong to different categories. ML and CL constraints are then inferred from these sets by transitivity (a sketch of this inference follows the Conclusions). The encoder's parameters and the cluster centers are updated by minimizing the KL divergence between the cluster assignment distribution, modeled with a Student's t-distribution, and an auxiliary target distribution derived from it. In addition, a constraint loss reduces the distance between ML-constrained samples and increases the distance between CL-constrained samples in the representation space; a reconstruction of this objective appears below.
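The clustering objective just described matches the standard Deep Embedded Clustering (DEC) recipe: soft assignments from a Student's t kernel are sharpened into a target distribution, and the KL divergence between the two is minimized. The display below is a hedged reconstruction under that reading; the trade-off weight $\lambda$ and the margin $m$ of the Cannot-Link hinge are assumed hyperparameters, not values taken from the paper.

$$ q_{ij}=\frac{\left(1+\lVert z_i-\mu_j\rVert^{2}\right)^{-1}}{\sum_{j'}\left(1+\lVert z_i-\mu_{j'}\rVert^{2}\right)^{-1}}, \qquad p_{ij}=\frac{q_{ij}^{2}\big/\sum_{i}q_{ij}}{\sum_{j'}\left(q_{ij'}^{2}\big/\sum_{i}q_{ij'}\right)} $$

$$ \mathcal{L}=\mathrm{KL}\left(P\,\Vert\,Q\right)+\lambda\left[\sum_{(a,b)\in\mathrm{ML}}\lVert z_a-z_b\rVert^{2}+\sum_{(a,b)\in\mathrm{CL}}\max\left(0,\;m-\lVert z_a-z_b\rVert\right)^{2}\right] $$

Here $z_i$ is the representation of sample $i$ and $\mu_j$ the $j$-th cluster center. Minimizing $\mathrm{KL}(P\,\Vert\,Q)$ sharpens confident assignments, the ML term pulls same-category pairs together, and the hinge pushes CL pairs at least $m$ apart.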
Results and Discussions  Experiments on 18 public datasets show that the proposed method improves the average Rand Index (RI) by more than 5% over existing deep time-series clustering methods (Table 2). Under the same labeling budget, it improves RI by more than 7% over existing active clustering methods (Table 3). These results confirm the effectiveness of the active sampling strategy and the constraint reasoning mechanism. In addition, the method infers a large number of ML and CL constraints from a small set of manually annotated constraints (Fig. 4), significantly reducing annotation costs.

Conclusions  This paper proposes a deep active time-series clustering model based on constraint transitivity, built on a two-phase active sampling strategy of exploration and consolidation. In the exploration phase, the model selects the sample closest to each cluster center in the representation space and stores it in an ACS. In the consolidation phase, a sample is randomly chosen from the ACS with the fewest samples; among the samples outside that cluster, the one nearest to the cluster's center is then queried, labeled, and stored in either the AAS or an ACS. The number of ACSs and AASs equals the number of clusters. ML and CL constraints are inferred from the ACS and AAS samples. Experiments on public datasets demonstrate that inferring new clustering constraints reduces annotation costs and improves deep time-series clustering performance.
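To make the sampling and the constraint transitivity concrete, here is a minimal sketch. The function names (explore, infer_constraints), the NumPy-based geometry, and the set representation of ACS/AAS are illustrative assumptions, not the authors' released code; an oracle (the human annotator) is assumed to supply the labels that route each queried sample into an ACS or AAS.

    # Minimal sketch of DATC-CT's exploration step and constraint transitivity.
    # Names and data layout are illustrative assumptions, not the authors' code.
    import numpy as np
    from itertools import combinations, product

    def explore(z, centers):
        """Exploration phase: pick the sample nearest each cluster center.

        z       : (n, d) array of encoder representations
        centers : (k, d) array of cluster centers
        Returns one seed-sample index per cluster; each seed is labeled by
        the oracle and stored in its own ACS.
        """
        d = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=-1)
        return d.argmin(axis=0)

    def infer_constraints(acs, aas):
        """Expand a few oracle labels into many ML/CL pairs by transitivity.

        acs : list of index lists; samples within one ACS share a category,
              and samples from different ACSs never do.
        aas : list of index lists; each AAS differs in category from every
              sample of the ACS at the same position.
        """
        ml, cl = set(), set()
        for s in acs:                          # within an ACS: Must-Link
            ml.update(combinations(sorted(s), 2))
        for s1, s2 in combinations(acs, 2):    # across ACSs: Cannot-Link
            cl.update(product(s1, s2))
        for s, aux in zip(acs, aas):           # ACS vs. paired AAS: Cannot-Link
            cl.update(product(s, aux))
        return ml, cl

With, say, three ACSs holding four samples each, infer_constraints already yields 3 × C(4,2) = 18 ML pairs and 3 × 4 × 4 = 48 cross-ACS CL pairs (plus the ACS–AAS pairs) from only a dozen oracle queries, the kind of constraint amplification Fig. 4 illustrates.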
