Hierarchical Fusion Multi-Instance Learning for Weakly Supervised Pathological Image Classification

CHEN Xiaohe; ZHANG Jiaang; LI Lingzhi; LI Guixiu; OU Zirong; BAO Yuehua; LIU Xinxin; YU Qiuchen; MA Yuhan; ZHAO Keyu; BAI Hua

doi:10.11999/JEIT250726

Volume 48 Issue 3

Mar. 2026


CHEN Xiaohe, ZHANG Jiaang, LI Lingzhi, LI Guixiu, OU Zirong, BAO Yuehua, LIU Xinxin, YU Qiuchen, MA Yuhan, ZHAO Keyu, BAI Hua. Hierarchical Fusion Multi-Instance Learning for Weakly Supervised Pathological Image Classification[J]. Journal of Electronics & Information Technology, 2026, 48(3): 1116-1127. doi: 10.11999/JEIT250726



doi: 10.11999/JEIT250726 cstr: 32379.14.JEIT250726

1. The First Affiliated Hospital of Ningbo University, Ningbo 315010, China
2. College of Artificial Intelligence, China University of Petroleum (Beijing), Beijing 102249, China
3. Beijing Research Institute of Automation for Machinery Industry Co., Ltd., Beijing 100120, China

Funds: Ningbo Major Research and Development Plan Project (2024Z228), The Health Major Science and Technology Planning Project of Zhejiang Province, China (WKJ-ZJ-2411), The Public Welfare Projects of Ningbo, China (2022S065), The Project of Ningbo Leading Medical & Health Discipline (2022-F23)

Received Date: 2025-08-07
Accepted Date: 2026-01-22
Revised Date: 2026-01-22

Available Online: 2026-02-12

Publish Date: 2026-03-10

Abstract


Objective: Cancer mortality in China continues to rise, and pathological image classification has become central to diagnosis. Pathological images have a multilevel structure, yet many existing methods focus only on the highest resolution or use simple feature concatenation for multi-scale fusion. These strategies do not make effective use of hierarchical information. In addition, most approaches rely on random pseudo-bag division to handle high-resolution images. Because cancerous regions in positive slides are sparse, random sampling often produces incorrect pseudo-labels and low signal-to-noise ratios, which reduce classification accuracy. This study proposes a Hierarchical Fusion Multi-Instance Learning (HFMIL) method that integrates multilevel feature fusion with a pseudo-bag division strategy based on an attention evaluation function to improve accuracy and interpretability in pathological image classification.

Methods: A weakly supervised multilevel classification method is proposed to use the hierarchical characteristics of pathological images and improve cancer image classification performance. The method has three main steps. First, multilevel features are extracted. Blank regions are removed, low-resolution images are divided into patches, and these patches are indexed to their corresponding high-resolution regions. Semantic features capture low-resolution tissue structure and high-resolution cellular detail. Second, pseudo-bags are constructed using an attention-based evaluation function. Class activation mapping is used to compute patch-level scores. Patches are ranked, and high-scoring ones are selected as potential positive samples. Low-scoring patches are discarded to maintain pseudo-label relevance. High-resolution pseudo-bags are then generated using index mapping, which reduces incorrect pseudo-labels and improves the signal-to-noise ratio. Third, a two-stage classification model is developed. Low-resolution pseudo-bags are aggregated with a gated attention mechanism for preliminary classification. A cross-attention mechanism then fuses the most informative low-resolution features with their corresponding high-resolution features. The fused representation is concatenated with aggregated high-resolution pseudo-bags to form an image-level feature vector for final prediction. Training uses a two-stage loss that combines low-resolution and overall cross-entropy losses. Experiments on three pathological image datasets confirm the effectiveness of the method in weakly supervised settings.

Results and Discussions: The proposed method is compared with several recent weakly supervised classification approaches, including ABMIL, CLAM, TransMIL, and DTFD, using three pathological image datasets: the publicly available Camelyon16 and TCGA-LUNG datasets and a private skin cancer dataset, NBU-Skin. The results show clear performance gains. On Camelyon16, the method achieves 88.3% accuracy and an AUC of 0.979 (Table 2). On TCGA-LUNG, accuracy reaches 86.0% and AUC 0.931 (Table 2), exceeding the comparative methods. On the NBU-Skin dataset, accuracy reaches 90.5% and AUC 0.976 for multiclass tasks (Table 2). Ablation studies further examine the necessity of the multilevel feature fusion and pseudo-bag division modules. The combination of these modules improves classification performance. On the skin cancer dataset, removing the pseudo-bag division module reduces accuracy from 93.8% to 90.7%, and removing the multilevel feature fusion module reduces accuracy further to 80.0% (Table 3). These results confirm that each component contributes to the effectiveness of the method.

Conclusions: A weakly supervised pathological image classification method that integrates multilevel feature fusion and an attention-based pseudo-bag division strategy is proposed. The method uses hierarchical information effectively and reduces errors caused by incorrect pseudo-labels and low signal-to-noise ratios. Experiments show consistent improvements in accuracy and AUC across three datasets. The main contributions are: (1) a multilevel feature extraction and fusion strategy that uses a cross-attention mechanism to combine features across scales; (2) an attention-based pseudo-bag division method that identifies potential positive regions and improves pseudo-label correctness through a top-k strategy while reducing background noise; and (3) superior performance compared with recent weakly supervised classifiers. Future work may include optimizing cross-level attention mechanisms, extending the framework to prognosis prediction or lesion segmentation, and developing more efficient feature extraction and fusion modules for broader clinical use.
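The score-based pseudo-bag division summarized in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `build_pseudo_bags`, the round-robin assignment of kept patches to bags, and the toy index map (four high-resolution patches per low-resolution patch) are assumptions made for illustration. In the paper the patch-level scores come from class activation mapping; here they are simply passed in.

```python
import numpy as np

def build_pseudo_bags(scores, child_index, top_k, num_bags):
    """Keep the top_k highest-scoring low-resolution patches and split them
    into num_bags pseudo-bags, mapping each kept patch to its high-resolution
    patches through child_index (hypothetical helper, illustration only).

    scores      : sequence of N patch-level scores (higher = more suspicious)
    child_index : list of N integer arrays; entry i lists the high-resolution
                  patch ids covered by low-resolution patch i
    """
    scores = np.asarray(scores, dtype=float)
    k = min(top_k, scores.shape[0])
    # Rank patches from highest to lowest score; low scorers are discarded.
    order = np.argsort(-scores, kind="stable")
    kept = order[:k]
    bags = []
    for b in range(num_bags):
        # Assumed round-robin split, so each bag receives some of the
        # highest-ranked patches rather than a random sample.
        low_ids = kept[b::num_bags]
        if len(low_ids):
            high_ids = np.concatenate([child_index[i] for i in low_ids])
        else:
            high_ids = np.array([], dtype=int)
        bags.append((low_ids, high_ids))
    return bags

# Toy example: 6 low-resolution patches, each covering 4 high-resolution patches.
toy_scores = [0.1, 0.9, 0.4, 0.8, 0.05, 0.7]
toy_index = [np.arange(4 * i, 4 * i + 4) for i in range(6)]
toy_bags = build_pseudo_bags(toy_scores, toy_index, top_k=4, num_bags=2)
for low_ids, high_ids in toy_bags:
    print(low_ids.tolist(), high_ids.tolist())
```

Because every pseudo-bag is drawn only from the retained high-scoring patches, each bag of a positive slide is more likely to contain tumor tissue than a randomly sampled bag, which is the motivation the abstract gives for replacing random division.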
Keywords:
- Pathological image
- Multiple instance learning
- Deep learning
- Artificial intelligence
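The gated attention aggregation and the two-stage loss mentioned in the abstract can be sketched as follows. The pooling follows the gated attention form of attention-based deep multiple instance learning (reference [13]); the parameter names `V`, `U`, `w`, the helper names, and the plain weighted sum with `low_weight` are assumptions for illustration, since the abstract does not state how the two cross-entropy terms are weighted.

```python
import numpy as np

def gated_attention_pool(H, V, U, w):
    """Aggregate N instance features H (N, D) into one bag feature (D,).

    Attention logits are w^T (tanh(H V) * sigmoid(H U)), normalized with a
    softmax over the instances. V and U have shape (D, L); w has shape (L,).
    """
    gate = np.tanh(H @ V) * (1.0 / (1.0 + np.exp(-(H @ U))))
    logits = gate @ w
    logits = logits - logits.max()          # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum()
    return attn @ H, attn

def cross_entropy(logits, label):
    """Cross-entropy of one sample from unnormalized class logits."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def two_stage_loss(low_logits, final_logits, label, low_weight=1.0):
    """Assumed combination: weighted low-resolution loss plus overall loss."""
    return low_weight * cross_entropy(low_logits, label) + cross_entropy(final_logits, label)

# Toy usage with random features and parameters.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
V, U, w = rng.normal(size=(8, 4)), rng.normal(size=(8, 4)), rng.normal(size=4)
bag_feature, attention = gated_attention_pool(H, V, U, w)
print(bag_feature.shape, attention.round(3))
```

In a trained model `V`, `U`, and `w` would be learned jointly with the classifiers, and the attention weights can be inspected per patch, which is one source of the interpretability the abstract refers to.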


References

[1]	HAN Bingfeng, ZHENG Rongshou, ZENG Hongmei, et al. Cancer incidence and mortality in China, 2022[J]. Journal of the National Cancer Center, 2024, 4(1): 47–53. doi: 10.1016/j.jncc.2024.01.006.
[2]	姜梦琦, 韩昱晨, 傅小龙. 基于人工智能的H-E染色全切片病理学图像分析在肺癌研究中的进展[J]. 中国癌症杂志, 2024, 34(3): 306–315. doi: 10.19401/j.cnki.1007-3639.2024.03.009. JIANG Mengqi, HAN Yuchen, and FU Xiaolong. Research progress on H-E stained whole slide image analysis by artificial intelligence in lung cancer[J]. China Oncology, 2024, 34(3): 306–315. doi: 10.19401/j.cnki.1007-3639.2024.03.009.
[3]	王钰萌, 刘振丙, 刘再毅. 隐私保护的联邦弱监督组织病理学亚型分类方法[J/OL]. https://jeit.ac.cn/cn/article/doi/10.11999/JEIT250842, 2025. WANG Yumeng, LIU Zhenbing, and LIU Zaiyi. Privacy-preserving federated weakly-supervised learning for cancer subtyping on histopathology images[J/OL]. https://jeit.ac.cn/cn/article/doi/10.11999/JEIT250842, 2025.
[4]	金怀平, 薛飞跃, 李振辉, 等. 基于病理图像集成深度学习的胃癌预后预测方法[J]. 电子与信息学报, 2023, 45(7): 2623–2633. doi: 10.11999/JEIT220655. JIN Huaiping, XUE Feiyue, LI Zhenhui, et al. Prognostic prediction of gastric cancer based on ensemble deep learning of pathological images[J]. Journal of Electronics & Information Technology, 2023, 45(7): 2623–2633. doi: 10.11999/JEIT220655.
[5]	FEI Manman, ZHANG Xin, CHEN Dongdong, et al. Whole slide cervical cancer classification via graph attention networks and contrastive learning[J]. Neurocomputing, 2025, 613: 128787. doi: 10.1016/j.neucom.2024.128787.
[6]	ZHANG Jiawei, SUN Zhanquan, WANG Kang, et al. Prognosis prediction based on liver histopathological image via graph deep learning and transformer[J]. Applied Soft Computing, 2024, 161: 111653. doi: 10.1016/j.asoc.2024.111653.
[7]	LI Mingze, ZHANG Bingbing, SUN Jian, et al. Weakly supervised breast cancer classification on WSI using transformer and graph attention network[J]. International Journal of Imaging Systems and Technology, 2024, 34(4): e23125. doi: 10.1002/ima.23125.
[8]	WANG Fuying, XIN Jiayi, ZHAO Weiqin, et al. TAD-graph: Enhancing whole slide image analysis via task-aware subgraph disentanglement[J]. IEEE Transactions on Medical Imaging, 2025, 44(6): 2683–2695. doi: 10.1109/TMI.2025.3545680.
[9]	WU Kun, JIANG Zhiguo, TANG Kunming, et al. Pan-cancer histopathology WSI pre-training with position-aware masked autoencoder[J]. IEEE Transactions on Medical Imaging, 2025, 44(4): 1610–1623. doi: 10.1109/TMI.2024.3513358.
[10]	张印辉, 张金凯, 何自芬, 等. 全局感知与稀疏特征关联图像级弱监督病理图像分割[J]. 电子与信息学报, 2024, 46(9): 3672–3682. doi: 10.11999/JEIT240364. ZHANG Yinhui, ZHANG Jinkai, HE Zifen, et al. Global perception and sparse feature associate image-level weakly supervised pathological image segmentation[J]. Journal of Electronics & Information Technology, 2024, 46(9): 3672–3682. doi: 10.11999/JEIT240364.
[11]	YAN Rui, LV Zhilong, YANG Zhidong, et al. Sparse and hierarchical transformer for survival analysis on whole slide images[J]. IEEE Journal of Biomedical and Health Informatics, 2024, 28(1): 7–18. doi: 10.1109/JBHI.2023.3307584.
[12]	MA Yingfan, LUO Xiaoyuan, FU Kexue, et al. Transformer-based video-structure multi-instance learning for whole slide image classification[C]. The 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 14263–14271. doi: 10.1609/aaai.v38i13.29338.
[13]	ILSE M, TOMCZAK J, and WELLING M. Attention-based deep multiple instance learning[C]. The 35th International Conference on Machine Learning, Stockholm, Sweden, 2018: 2127–2136.
[14]	LU M Y, WILLIAMSON D F K, CHEN T Y, et al. Data-efficient and weakly supervised computational pathology on whole-slide images[J]. Nature Biomedical Engineering, 2021, 5(6): 555–570. doi: 10.1038/s41551-020-00682-w.
[15]	SHAO Zhuchen, BIAN Hao, CHEN Yang, et al. TransMIL: Transformer based correlated multiple instance learning for whole slide image classification[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 164.
[16]	ZHANG Hongrun, MENG Yanda, ZHAO Yitian, et al. DTFD-MIL: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 18780–18790. doi: 10.1109/CVPR52688.2022.01824.
[17]	LI Bin, LI Yin, and ELICEIRI K W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 14313–14323. doi: 10.1109/CVPR46437.2021.01409.
[18]	CHEN Y C and LU C S. RankMix: Data augmentation for weakly supervised learning of classifying whole slide images with diverse sizes and imbalanced categories[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 23936–23945. doi: 10.1109/CVPR52729.2023.02292.
[19]	LIU Pei, JI Luping, ZHANG Xinyu, et al. Pseudo-bag mixup augmentation for multiple instance learning-based whole slide image classification[J]. IEEE Transactions on Medical Imaging, 2024, 43(5): 1841–1852. doi: 10.1109/TMI.2024.3351213.
[20]	YANG Jiawei, CHEN Hanbo, ZHAO Yu, et al. ReMix: A general and efficient framework for multiple instance learning based whole slide image classification[C]. The 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, Singapore, 2022: 35–45. doi: 10.1007/978-3-031-16434-7_4.
[21]	BEJNORDI B E, VETA M, VAN DIEST P J, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer[J]. JAMA, 2017, 318(22): 2199–2210. doi: 10.1001/jama.2017.14585.
[22]	TOMCZAK K, CZERWIŃSKA P, and WIZNEROWICZ M. The Cancer Genome Atlas (TCGA): An immeasurable source of knowledge[J]. Contemporary Oncology, 2015, 19(1A): A68–A77. doi: 10.5114/wo.2014.47136.
[23]	ZHOU S K, RUECKERT D, and FICHTINGER G. Handbook of Medical Image Computing and Computer Assisted Intervention[M]. London: Academic Press, 2020: 521–546.
[24]	LOU Wei, LI Guanbin, WAN Xiang, et al. Multi-modal denoising diffusion pre-training for whole-slide image classification[C]. The 32nd ACM International Conference on Multimedia, Melbourne, Australia, 2024: 10804–10813. doi: 10.1145/3664647.3680882.