Adaptive Oversampling Method Based on Maximum Safe Nearest Neighbor and Local Density

ZHAO Xiaoqiang; HE Jiaqi

doi:10.11999/JEIT240441

Volume 47 Issue 4

Apr. 2025

ZHAO Xiaoqiang, HE Jiaqi. Adaptive Oversampling Method Based on Maximum Safe Nearest Neighbor and Local Density[J]. Journal of Electronics & Information Technology, 2025, 47(4): 1140-1149. doi: 10.11999/JEIT240441

doi: 10.11999/JEIT240441 cstr: 32379.14.JEIT240441

College of Electrical Engineering and Information Engineering, Lanzhou University of Technology, Lanzhou 730000, China

Funds: The National Natural Science Foundation of China (62263021), The College Industrial Support Project of Gansu Province (2023CYZC-24)

Received Date: 2024-06-03
Revised Date: 2025-03-30

Available Online: 2025-04-11

Publish Date: 2025-04-01

Abstract

Objective: Traditional classifiers tend to optimize overall accuracy on imbalanced data sets, which often yields poor classification performance on minority class samples. Among the available strategies, oversampling methods are widely used due to their strong generalization ability. However, conventional oversampling techniques frequently generate new samples with high overlap rates and limited validity, particularly near decision boundaries. To address this issue, this study proposes an adaptive oversampling approach that selects sub-boundary samples (those located near the boundary samples) for sample generation. In addition, the nearest-neighbor parameter space is constrained to refine the synthetic sample region. This method improves the classifier's performance when learning from imbalanced data sets.

Methods: This study first identifies the maximum safe same-class nearest neighbors of the positive class samples and classifies these samples as either hazardous or safe. The local density of each sample is then calculated, and hazardous samples, which are more difficult to classify, are further categorized as either boundary samples or outliers. To provide the classifier with more informative positive class samples, "sub-boundary points" are preferentially selected as root samples using a weighted composite factor. The K-value in the K-nearest neighbor algorithm is adaptively adjusted based on the maximum safe nearest neighbor of each sample to improve neighbor selection. Outliers are oversampled randomly within a hypersphere to generate new samples while minimizing increases in spatial complexity.

Results and Discussions: To evaluate the feasibility and generalization of the proposed method, Logistic Regression (LR) and Support Vector Machine (SVM) classifiers are employed as base classifiers. The range of the distance adjustment coefficient is first determined by comparing results across selected datasets (Table 3). Once the range is established, the effect of different weight adjustment coefficients on performance is assessed (Table 4). The proposed method is then compared with six existing oversampling techniques across 13 datasets. For most datasets, the proposed method achieves higher values in more than half of the five evaluation metrics considered (Tables 5 and 6). These results demonstrate that the proposed approach effectively improves classifier performance on imbalanced data sets.

Conclusions: This study introduces the maximum safe nearest neighbor number and local density to classify minority class samples into safe samples, boundary samples, and outliers. A weighted sampling probability, based on both local density and the maximum safe nearest neighbor number, is used to guide adaptive K-nearest neighbor oversampling of safe and boundary samples. Random oversampling within a hypersphere is applied to outliers to preserve informative but rare samples. Comparative experiments confirm that the proposed method performs well across datasets with varying imbalance ratios and remains competitive under highly imbalanced conditions.

Keywords: Unbalanced data; Over-sampling technique; Maximum safe nearest neighbors; Sub-boundary points
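
The abstract does not give formal definitions, so the following is only a minimal sketch of two of the ideas it names: counting a sample's maximum safe nearest neighbors and randomly oversampling an outlier within a hypersphere. It assumes, for illustration, that the maximum safe nearest neighbor number is the count of consecutive nearest neighbors sharing the sample's label before the first differently labeled neighbor appears, and that a positive sample with a count of zero is treated as hazardous. The function names, the toy data, and the radius of 0.05 are hypothetical and are not taken from the paper. Local density and the weighted composite factor are not modeled here.

```python
import math
import random


def max_safe_neighbors(i, X, y):
    """Count nearest neighbors of X[i] that share its label before the
    first neighbor with a different label appears (assumed definition)."""
    order = sorted(
        (j for j in range(len(X)) if j != i),
        key=lambda j: math.dist(X[i], X[j]),
    )
    count = 0
    for j in order:
        if y[j] != y[i]:
            break
        count += 1
    return count


def sample_in_hypersphere(center, radius, rng):
    """Draw one point uniformly from the ball of the given radius."""
    d = len(center)
    # A normalized Gaussian vector gives a uniformly random direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in direction)) or 1.0
    # Scaling by u**(1/d) makes the draw uniform in volume instead of
    # concentrating points near the center.
    r = radius * rng.random() ** (1.0 / d)
    return [c + r * v / norm for c, v in zip(center, direction)]


# Toy data (hypothetical): label 1 is the positive (minority) class.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0),
     (1.1, 1.0), (0.9, 1.1), (1.0, 0.9)]
y = [1, 1, 1, 1, 0, 0, 0]

safe_counts = {i: max_safe_neighbors(i, X, y)
               for i in range(len(X)) if y[i] == 1}
# Assumed rule: a positive sample with no safe neighbor is hazardous.
hazardous = [i for i, k in safe_counts.items() if k == 0]

rng = random.Random(0)
synthetic = [sample_in_hypersphere(X[i], 0.05, rng) for i in hazardous]
print(safe_counts, hazardous, synthetic)
```

In this toy set, the positive sample at (1.0, 1.0) sits among negative samples and so has no safe neighbor, while the three clustered positive samples each have two. Keeping the synthetic point inside a small ball around the isolated sample reflects the stated aim of preserving rare samples without pushing new points into the negative class region; how the paper actually sets the radius is not stated in the abstract.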
