Differentiable Sparse Mask Guided Infrared Small Target Fast Detection Network
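Summary: Infrared small target detection has irreplaceable application value in remote sensing detection, infrared guidance, environmental monitoring, and related fields. Its core challenges are that targets occupy a very small share of the image pixels (target size is typically smaller than 9×9), have sparse spatial features, and are easily submerged in complex background clutter. Existing infrared small target methods either rely on hand-crafted background suppression operators, which adapt poorly to complex scenes, or use dense convolutional neural networks, which do not fully account for the computational redundancy caused by the extreme imbalance between target and background. Building on the target sparsity prior, this paper proposes a fast infrared small target detection network guided by a differentiable sparse mask. First, a differentiable sparse mask generation module serves as preprocessing: it outputs a binary mask of target candidate regions, giving a coarse detection of targets and filtering out a large amount of redundant background. Second, a sparse feature extraction module built on Minkowski Engine sparse convolution applies sparse convolution only to the non-zero target regions of the binary mask, refining the candidate regions. Finally, a pyramid pooling module fuses multi-scale features, and the fused features are fed to a target-background binary classifier that outputs the final detection result. Experiments on the two mainstream infrared small target datasets NUDT-SIRST and NUAA-SIRST show that the proposed method greatly improves detection efficiency while keeping detection performance comparable, which verifies its effectiveness.

Abstract: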
Objective: Infrared small target detection has irreplaceable application value in critical domains such as infrared guidance, environmental monitoring, and security surveillance. Tasks such as early warning, precision targeting, and pollution tracking depend on timely and accurate detection. The core challenges stem from the inherent characteristics of infrared small targets: they are extremely small (typically less than 9×9 pixels), carry limited spatial features because of the long imaging distance, and are easily overwhelmed by complex, cluttered backgrounds such as cloud cover, sea glint, or urban thermal noise. These factors make genuine targets difficult to distinguish from background clutter with conventional methods.

Existing approaches to infrared small target detection fall broadly into traditional model-based methods and deep learning techniques. Traditional methods often rely on manually designed background suppression operators, such as morphological filters (e.g., Top-Hat) or low-rank matrix recovery (e.g., IPI). Although these methods are interpretable in simple scenarios, they struggle to adapt to dynamic and complex real-world environments, which leads to high false alarm rates and limited robustness. Deep learning-based methods, particularly those built on dense convolutional neural networks (CNNs), improve detection performance through data-driven feature learning. However, these networks often fail to account for the extreme imbalance between target and background pixels: targets typically make up less than 1% of the image. The network therefore spends most of its computation on background regions that contribute little to detection, which hampers efficiency and real-time performance.
To address these challenges, exploiting the sparsity of infrared small targets is a promising direction. A sparse mask generation module that capitalizes on target sparsity can coarsely extract potential target regions while filtering out most of the redundant background. The coarse target regions can then be refined in subsequent stages to reach satisfactory detection performance. This paper presents a solution that balances detection accuracy with computational efficiency, making it suitable for real-time applications.

Methods: This paper proposes an end-to-end infrared small target detection network guided by a differentiable sparse mask. First, the input infrared image is preprocessed with convolution to generate raw features. A differentiable sparse mask generation module then uses two convolution branches to produce a probability map and a threshold map, and outputs a binary mask through a differentiable binarization function, extracting target candidate regions and filtering out background redundancy. Next, a target region sampling module converts the dense raw features into sparse features according to the binary mask. A U-shaped sparse feature extraction module (composed of encoders, decoders, and skip connections) built on Minkowski Engine sparse convolution then performs refined processing only on the non-zero target regions, which reduces computation. Finally, a pyramid pooling module fuses the multi-scale sparse features, and the fused features are fed into a target-background binary classifier that outputs the detection results.

Results and Discussions: To validate the proposed method, experiments were conducted on two mainstream infrared small target datasets: NUAA-SIRST, which contains 427 real-world infrared images extracted from actual videos, and NUDT-SIRST, a large-scale synthetic dataset with 1327 diverse images.
The method was compared against three representative traditional algorithms (e.g., Top-Hat, IPI) and six state-of-the-art deep learning methods (e.g., DNA-Net, ACM). The results show competitive detection performance: on NUAA-SIRST it attains 74.38% IoU, 100% Pd, and an Fa of 7.98×10⁻⁶; on NUDT-SIRST it reaches 83.03% IoU, 97.67% Pd, and an Fa of 9.81×10⁻⁶, comparable to leading deep learning methods. Its main advantage is efficiency: with only 0.35 M parameters and 11.10 GFLOPs it runs at 215.06 FPS, about 4.8 times the FPS of DNA-Net, which significantly cuts computational redundancy. Ablation experiments (Fig. 6) confirm that the differentiable sparse mask module filters out most of the background while preserving target regions. Visual results (Fig. 5) show fewer false alarms than traditional methods such as PSTNN, because the "coarse-to-fine" mode reduces background interference.

Conclusions: This paper addresses the massive computational redundancy of existing dense computing methods in infrared small target detection, which is caused by the extremely unbalanced target-background proportion (targets usually occupy less than 1% of the whole image), by proposing a fast infrared small target detection network guided by a differentiable sparse mask. The network adaptively extracts candidate target regions and filters background redundancy through a differentiable sparse mask generation module, and builds its feature extraction module on Minkowski Engine sparse convolution to reduce computation, forming an end-to-end "coarse-to-fine" detection framework. Experiments on the NUDT-SIRST and NUAA-SIRST datasets show that the proposed method achieves detection performance comparable to existing deep learning methods while significantly improving computational efficiency, balancing detection accuracy and speed.
The method offers a new, sparsity-based way to reduce redundancy in infrared small target detection. It is applicable to scenarios such as remote sensing detection, infrared guidance, and environmental monitoring that require both real-time performance and accuracy, and it provides a useful reference for lightweight network design in the field.
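The coarse stage described in Methods (probability map and threshold map, differentiable binarization, then mask-guided sampling of dense features into sparse ones) can be sketched in NumPy. This is only an illustration under stated assumptions: the sigmoid form and the steepness `k = 50` are borrowed from the differentiable binarization of Liao et al. [22], and the function names, the 0.5 cutoff, and the (row, col) coordinate layout are hypothetical. The network itself operates on Minkowski Engine sparse tensors, which are not shown here.

```python
import numpy as np


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Soft binary mask B = 1 / (1 + exp(-k * (P - T))).

    Assumption: this sigmoid form and k = 50 follow Liao et al. [22];
    the paper's exact binarization function may differ.
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))


def sample_sparse_features(dense_features, soft_mask, cutoff=0.5):
    """Keep only the feature vectors at candidate-target pixels.

    dense_features: array of shape (C, H, W)
    soft_mask:      array of shape (H, W) with values in (0, 1)
    Returns (N, 2) integer coordinates as (row, col) and (N, C) features.
    The 0.5 cutoff is an illustrative choice, not taken from the paper.
    """
    rows, cols = np.nonzero(soft_mask > cutoff)
    coords = np.stack([rows, cols], axis=1)
    feats = dense_features[:, rows, cols].T  # (C, N) -> (N, C)
    return coords, feats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.normal(size=(4, 8, 8))   # toy raw features, C=4, H=W=8
    prob = np.zeros((8, 8))
    prob[2, 3] = prob[2, 4] = 0.9        # a two-pixel toy "target"
    thresh = np.full((8, 8), 0.3)        # toy threshold map

    mask = differentiable_binarization(prob, thresh)
    coords, feats = sample_sparse_features(dense, mask)
    kept = coords.shape[0] / mask.size
    print(f"kept {coords.shape[0]} of {mask.size} pixels ({kept:.2%})")
```

In this toy case only the two target pixels survive the mask, so any later stage touches 2 of 64 locations; that is the kind of saving the sparse feature extraction module is designed to exploit.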
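Table 1 Performance comparison of different methods on the NUAA-SIRST and NUDT-SIRST datasets

| Method | NUAA-SIRST IoU (%) | NUAA-SIRST Pd (%) | NUAA-SIRST Fa (×10⁻⁶) | NUDT-SIRST IoU (%) | NUDT-SIRST Pd (%) | NUDT-SIRST Fa (×10⁻⁶) | #Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|---|---|---|
| Top-Hat[24] | 7.14 | 79.84 | 1012 | 20.72 | 78.41 | 166.70 | - | - | 336.36 |
| IPI[6] | 25.67 | 85.55 | 11.47 | 17.76 | 74.49 | 41.23 | - | - | 0.12 |
| PSTNN[8] | 22.40 | 77.95 | 29.11 | 14.85 | 66.13 | 44.17 | - | - | 5.4 |
| MDvsFA[10] | 60.30 | 89.35 | 56.35 | 74.14 | 90.47 | 25.34 | 3.92 | 264.96 | 4.72 |
| ACM[11] | 70.33 | 93.91 | 3.73 | 67.08 | 95.97 | 10.18 | 0.52 | 0.43 | 180.32 |
| ISTDU[15] | 58.83 | 89.91 | 40.63 | 78.80 | 97.04 | 21.51 | 2.76 | 7.44 | 134.28 |
| DNA-Net[9] | 76.24 | 97.71 | 12.80 | 87.09 | 98.73 | 4.22 | 4.70 | 14.02 | 45.20 |
| RDIAN[16] | 68.98 | 96.33 | 29.63 | 73.36 | 94.82 | 47.94 | 0.22 | 3.69 | 278.80 |
| HoLoCoNet[17] | 73.89 | 100.00 | 19.87 | 80.90 | 97.67 | 13.54 | 0.70 | 6.60 | 125.49 |
| Proposed method | 74.38 | 100.00 | 7.98 | 83.03 | 97.67 | 9.81 | 0.35 | 11.10 | 215.06 |

Table 2 Validation of the pyramid pooling module on the NUAA-SIRST dataset

| Method | IoU (%) | Pd (%) | Fa (×10⁻⁶) | #Params (M) | FLOPs (G) | FPS |
|---|---|---|---|---|---|---|
| Ablated model | 67.7 | 98.17 | 30.10 | 0.34 | 8.68 | 252.05 |
| Proposed method | 74.38 | 100.00 | 7.98 | 0.35 | 11.10 | 215.06 |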
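The efficiency comparison in Table 1 can be checked with a few lines of Python. The numbers below are copied from the deep learning rows of Table 1; the dictionary layout and the helper name `fps_ratio` are illustrative choices only.

```python
# (parameters in M, FLOPs in G, FPS), copied from Table 1
efficiency = {
    "MDvsFA": (3.92, 264.96, 4.72),
    "ACM": (0.52, 0.43, 180.32),
    "ISTDU": (2.76, 7.44, 134.28),
    "DNA-Net": (4.70, 14.02, 45.20),
    "RDIAN": (0.22, 3.69, 278.80),
    "HoLoCoNet": (0.70, 6.60, 125.49),
    "Proposed": (0.35, 11.10, 215.06),
}


def fps_ratio(method, baseline, table=efficiency):
    """Return how many times faster `method` runs than `baseline` (FPS ratio)."""
    return table[method][2] / table[baseline][2]


if __name__ == "__main__":
    for name in efficiency:
        if name != "Proposed":
            print(f"Proposed vs {name:10s} {fps_ratio('Proposed', name):6.2f}x")
```

The ratio against DNA-Net comes out at roughly 4.76, consistent with the "4.8 times" quoted in the abstract. The ratio against RDIAN falls below 1, since RDIAN's reported FPS (278.80) is higher than that of the proposed method.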
[1] HAN Zonghao, ZHANG Ziye, ZHANG Shun, et al. Aerial visible-to-infrared image translation: Dataset, evaluation, and baseline[J]. Journal of Remote Sensing, 2023, 3: 0096. doi: 10.34133/remotesensing.0096.
[2] WANG Qunming and HUANG Ruijie. RES-STF: Spatio temporal fusion of visible infrared imaging radiometer suite and Landsat land surface temperature based on restormer[J]. Journal of Remote Sensing, 2024, 4: 0208. doi: 10.34133/remotesensing.0208.
[3] ZHANG Jingjing, CAO Sihua, CUI Wennan, et al. Improved top-hat transform-based algorithm for infrared dim and small target detection[J]. Journal of Electronics & Information Technology, 2024, 46(1): 267–276. doi: 10.11999/JEIT221562.
[4] HAN Jinhui, MORADI S, FARAMARZI I, et al. A local contrast method for infrared small-target detection utilizing a tri-layer window[J]. IEEE Geoscience and Remote Sensing Letters, 2020, 17(10): 1822–1826. doi: 10.1109/LGRS.2019.2954578.
[5] CHEN C L P, LI Hong, WEI Yantao, et al. A local contrast method for small infrared target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2014, 52(1): 574–581. doi: 10.1109/TGRS.2013.2242477.
[6] GAO Chenqiang, MENG Deyu, YANG Yi, et al. Infrared patch-image model for small target detection in a single image[J]. IEEE Transactions on Image Processing, 2013, 22(12): 4996–5009. doi: 10.1109/TIP.2013.2281420.
[7] LIU Ting, YANG Jungang, LI Boyang, et al. Nonconvex tensor low-rank approximation for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5614718. doi: 10.1109/TGRS.2021.3130310.
[8] ZHANG Landan and PENG Zhenming. Infrared small target detection based on partial sum of the tensor nuclear norm[J]. Remote Sensing, 2019, 11(4): 382. doi: 10.3390/rs11040382.
[9] LI Boyang, XIAO Chao, WANG Longguang, et al. Dense nested attention network for infrared small target detection[J]. IEEE Transactions on Image Processing, 2023, 32: 1745–1758. doi: 10.1109/TIP.2022.3199107.
[10] WANG Huan, ZHOU Luping, and WANG Lei. Miss detection vs. false alarm: Adversarial learning for small object segmentation in infrared images[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, 2019: 8508–8517. doi: 10.1109/ICCV.2019.00860.
[11] DAI Yimian, WU Yiquan, ZHOU Fei, et al. Asymmetric contextual modulation for infrared small target detection[C]. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, 2021: 949–958. doi: 10.1109/WACV48630.2021.00099.
[12] DAI Yimian, WU Yiquan, ZHOU Fei, et al. Attentional local contrast networks for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(11): 9813–9824. doi: 10.1109/TGRS.2020.3044958.
[13] ZHANG Mingjin, ZHANG Rui, YANG Yuxiang, et al. ISNet: Shape matters for infrared small target detection[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 867–876. doi: 10.1109/CVPR52688.2022.00095.
[14] WU Xin, HONG Danfeng, and CHANUSSOT J. UIU-net: U-net in U-net for infrared small object detection[J]. IEEE Transactions on Image Processing, 2023, 32: 364–376. doi: 10.1109/TIP.2022.3228497.
[15] HOU Qingyu, ZHANG Liuwei, TAN Fanjiao, et al. ISTDU-Net: Infrared small-target detection U-Net[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 7506205. doi: 10.1109/LGRS.2022.3141584.
[16] SUN Heng, BAI Junxiang, YANG Fan, et al. Receptive-field and direction induced attention network for infrared dim small target detection with a large-scale dataset IRDST[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5000513. doi: 10.1109/TGRS.2023.3235150.
[17] CHEN Gao, WANG Zhuang, WANG Weihua, et al. Holistic modularization of local contrast in the end-to-end network for infrared small target detection[J]. IEEE Geoscience and Remote Sensing Letters, 2023, 20: 7001305. doi: 10.1109/LGRS.2023.3320191.
[18] ZHANG Mingjin, YUE Ke, LI Boyang, et al. Single-frame infrared small target detection via Gaussian curvature inspired network[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5005013. doi: 10.1109/TGRS.2024.3423492.
[19] REN Xiangyang, JIAO Boyang, PENG Zhenming, et al. MSFFNet: A multilevel sparse feature fusion network for infrared dim small target detection[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18: 147–159. doi: 10.1109/JSTARS.2024.3488698.
[20] ZHANG Luping, LUO Junhai, HUANG Yian, et al. MDIGCNet: Multidirectional information-guided contextual network for infrared small target detection[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18: 2063–2076. doi: 10.1109/JSTARS.2024.3508255.
[21] HU Chen, HUANG Yian, LI Kexuan, et al. DATransNet: Dynamic attention transformer network for infrared small target detection[J]. IEEE Geoscience and Remote Sensing Letters, 2025, 22: 7001005. doi: 10.1109/LGRS.2025.3557021.
[22] LIAO Minghui, ZOU Zhisheng, WAN Zhaoyi, et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 919–931. doi: 10.1109/TPAMI.2022.3155612.
[23] SMITH L N and TOPIN N. Super-convergence: Very fast training of neural networks using large learning rates[C]. Proceedings Volume 11006, Artificial Intelligence and Machine Learning for Multi-domain Operations Applications, Baltimore, United States, 2019: 369–386. doi: 10.1117/12.2520589.
[24] RIVEST J F and FORTIN R. Detection of dim targets in digital infrared imagery by morphological image processing[J]. Optical Engineering, 1996, 35(7): 1886–1893. doi: 10.1117/1.600620.
[25] WU Shuanglin, XIAO Chao, WANG Yingqian, et al. Sparsity-aware global channel pruning for infrared small-target detection networks[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5615011. doi: 10.1109/TGRS.2025.3544645.
[26] CHUNG W Y, LEE I H, and PARK C G. Lightweight infrared small target detection network using full-scale skip connection U-Net[J]. IEEE Geoscience and Remote Sensing Letters, 2023, 20: 7000705. doi: 10.1109/LGRS.2023.3276326.
[27] KOU Renke, WANG Chunping, YU Ying, et al. LW-IRSTNet: Lightweight infrared small target segmentation network and application deployment[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5621313. doi: 10.1109/TGRS.2023.3314586.
[28] MA Tianlei, YANG Zhen, SONG Yifan, et al. DMEF-Net: Lightweight infrared dim small target detection network for limited samples[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5626015. doi: 10.1109/TGRS.2023.3333378.
[29] ZHANG Mingjin, YANG Handi, GUO Jie, et al. IRPruneDet: Efficient infrared small target detection via wavelet structure-regularized soft channel pruning[C]. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 7224–7232. doi: 10.1609/aaai.v38i7.28551.
[30] LI Boyang, WANG Longguang, WANG Yingqian, et al. Mixed-precision network quantization for infrared small target segmentation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5000812. doi: 10.1109/TGRS.2023.3346904.
[31] XIAO Chao, AN Wei, ZHANG Yifan, et al. Highly efficient and unsupervised framework for moving object detection in satellite videos[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 11532–11539. doi: 10.1109/TPAMI.2024.3409824.