S4-UNET: A Long-Sequence Modeling Blind Source Separation Method for Single-Channel Co-Frequency Overlapped Communication Signals
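Summary: Blind source separation of communication signals in single-channel scenarios suffers from limited long-sequence modeling capability and low computational efficiency, and the separation of co-frequency overlapped signals with frequency offsets needs further study. To address these problems, this paper proposes S4-UNET, a blind source separation method for single-channel co-frequency overlapped communication signals. The method builds the S4-UNET architecture by combining U-NET with the Structured State Space for Sequence Model (S4). A Temporal State Enhancement Module (TSEM) serves as the backbone module of the encoder and decoder for preliminary feature extraction from the mixed signal, and S4 is introduced in the odd-numbered encoder stages for efficient sequence modeling, processing long sequences with near-linear complexity. Features are fused through the encoder-decoder structure with skip connections, and upsampling restores the feature resolution. In co-frequency overlap scenarios with small frequency offsets, the method separates mixtures of signals with the same modulation, different modulations, and different bandwidths. Experiments on simulated and measured datasets show that, compared with deep learning models (ConvTasNet, CTDCRN) and a classical algorithm (TDE-ICA), the proposed method significantly improves separation accuracy, models long sequences efficiently, remains effective on short sequences, and shows good adaptability and robustness across data domains.

Abstract: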
Objective Blind source separation of single-channel co-frequency overlapped communication signals remains a formidable challenge in non-cooperative reception scenarios. Conventional multi-channel methods are inapplicable due to antenna limitations, while existing deep learning approaches suffer from inadequate long-sequence modeling capability, prohibitive computational complexity, and unsatisfactory performance when signals exhibit small carrier frequency offsets. These limitations severely hinder the practical deployment of blind separation techniques in dense electromagnetic environments. There is therefore a critical need for an efficient and robust framework that captures long-range temporal dependencies while remaining computationally tractable.

Methods The proposed S4-UNET integrates the U-NET encoder-decoder framework with the Structured State Space Sequence model (S4). A Temporal State Enhancement Module (TSEM) is designed as the backbone building block for both the encoder and decoder to extract local time-frequency features through residual learning. To address long-range dependency modeling, S4 is embedded in the odd-numbered stages of the encoder, where it captures global temporal correlations with near-linear computational complexity. S4 transforms sequence modeling into a state-space evolution process and employs the Fast Fourier Transform (FFT) for efficient convolution, complemented by skip connections and Gated Linear Units (GLU) to preserve fine-grained local details. Multi-scale feature fusion is achieved through skip connections between corresponding encoder and decoder stages, and signal resolution is progressively restored via interpolation-based upsampling. The model tokenizes feature maps either temporally or channel-wise depending on the feature scale.
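The state-space view and the FFT-based convolution described in the Methods can be illustrated with a small NumPy sketch. It is not the authors' implementation: the diagonal state matrix `A`, the bilinear discretization step `dt`, and all function names are assumptions chosen for illustration, and the kernel is built with a naive loop, whereas S4 itself relies on a structured state matrix to obtain the kernel efficiently. The sketch only shows that the recurrent form and the FFT convolution produce the same output.

```python
import numpy as np

def discretize(A, B, dt):
    """Bilinear (Tustin) discretization of x'(t) = A x(t) + B u(t)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - dt / 2.0 * A)
    return inv @ (I + dt / 2.0 * A), inv @ (dt * B)

def ssm_kernel(Ad, Bd, C, length):
    """Convolution kernel K[k] = C Ad^k Bd for k = 0 .. length-1 (naive loop)."""
    K = np.empty(length)
    x = Bd.copy()
    for k in range(length):
        K[k] = (C @ x).item()
        x = Ad @ x
    return K

def fft_causal_conv(u, K):
    """Causal convolution of u with K via zero-padded FFT, O(L log L)."""
    L = len(u)
    n = 2 * L  # padding avoids circular wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(K, n), n)
    return y[:L]

def ssm_recurrent(Ad, Bd, C, u):
    """Reference recurrence: x_k = Ad x_{k-1} + Bd u_k, y_k = C x_k."""
    x = np.zeros((Ad.shape[0], 1))
    y = np.empty(len(u))
    for k, uk in enumerate(u):
        x = Ad @ x + Bd * uk
        y[k] = (C @ x).item()
    return y

# Toy system: the stable diagonal A below is an illustrative assumption.
rng = np.random.default_rng(0)
n_state, seq_len, dt = 4, 256, 0.1
A = -np.diag(np.arange(1.0, n_state + 1))
B = np.ones((n_state, 1))
C = rng.standard_normal((1, n_state))
u = rng.standard_normal(seq_len)

Ad, Bd = discretize(A, B, dt)
y_fft = fft_causal_conv(u, ssm_kernel(Ad, Bd, C, seq_len))
y_rec = ssm_recurrent(Ad, Bd, C, u)
print("max abs difference:", np.max(np.abs(y_fft - y_rec)))
```

Once the kernel is available, the convolution costs O(L log L) in the sequence length L, which is the source of the near-linear scaling the abstract refers to.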
Results and Discussions Experimental evaluations were conducted on extensive simulation datasets covering identical modulation mixtures, different modulation mixtures, and different bandwidth mixtures with small frequency offsets, as well as on publicly available benchmarks and hardware-collected measured datasets. Quantitative metrics and visualizations (Fig. 3, Fig. 5, Table 5) demonstrate that S4-UNET consistently outperforms representative deep learning baselines such as ConvTasNet and CTDCRN, as well as the classical TDE-ICA algorithm, across various signal lengths and modulation schemes. The model exhibits robust separation fidelity even under randomly distributed frequency offsets and phase mismatches (Table 3), confirming its strong generalization capacity. Ablation studies and sensitivity analyses (Table 6, Table 7, Table 8) reveal that the selective placement of S4 in odd encoder stages, appropriate convolutional stride configurations, and the adoption of GLU activation collectively contribute to an optimal trade-off between separation accuracy and computational efficiency. Importantly, the model maintains competitive inference latency while effectively handling both long and short sequences, underscoring its practical viability.

Conclusions The proposed S4-UNET addresses the core challenges of single-channel co-frequency blind source separation by combining multi-scale convolutional feature extraction with efficient state-space long-sequence modeling. It demonstrates superior separation performance, robustness against frequency offsets, and favorable generalization across diverse data domains. While the current work focuses on dual-source mixtures, the modular architecture provides a solid foundation for future extensions toward handling an unknown number of sources through integration with source enumeration and iterative cancellation strategies.
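The tables that follow report a similarity coefficient ρ and SI-SDR without restating their definitions. As a rough guide, the sketch below uses the common scale-invariant SDR definition (project the estimate onto the reference, then compare target and residual energy) and a normalized correlation magnitude for ρ. Both definitions, and the function names `si_sdr` and `similarity`, are assumptions and may differ from what the paper uses.

```python
import numpy as np

def si_sdr(est, ref):
    """Scale-invariant SDR in dB for real or complex 1-D signals."""
    est = np.asarray(est)
    ref = np.asarray(ref)
    # Optimal scaling of the reference (np.vdot conjugates its first argument).
    alpha = np.vdot(ref, est) / np.vdot(ref, ref)
    target = alpha * ref
    residual = est - target
    return 10.0 * np.log10(np.sum(np.abs(target) ** 2) / np.sum(np.abs(residual) ** 2))

def similarity(est, ref):
    """Normalized correlation magnitude in [0, 1] (assumed form of rho)."""
    est = np.asarray(est)
    ref = np.asarray(ref)
    num = np.abs(np.vdot(ref, est))
    den = np.sqrt(np.sum(np.abs(ref) ** 2) * np.sum(np.abs(est) ** 2))
    return num / den

# Illustrative data only: a complex tone plus noise, not from the paper.
rng = np.random.default_rng(1)
t = np.arange(1024)
ref = np.exp(1j * 0.05 * t)
est = ref + 0.1 * (rng.standard_normal(1024) + 1j * rng.standard_normal(1024))
print("SI-SDR (dB):", si_sdr(est, ref))
print("rho:", similarity(est, ref))
```

Both quantities are unchanged when the estimate is rescaled, which matters in blind separation because the recovered amplitude is generally ambiguous.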
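Table 1 Dataset parameters

| Dataset | Modulation | L | f (MHz) | $\Delta f$ (Hz) | $\Delta \tau$ | $\Delta \phi$ | Rs (MBd) | SNR (dB) |
|---|---|---|---|---|---|---|---|---|
| A | 8PSK+8PSK | 4100 | 20 | 500 | 0.3T | π/5 | 5 | −10:4:30 |
| B | QPSK+16APSK | 4100 | 20 | 500 | 0.3T | π/5 | 5 | −10:4:30 |
| C | 8PSK+8PSK | 4100 | 10 | 375 | 0.3T | π/5 | 5, 2.5 | −10:4:30 |
| D | 8PSK+8PSK | 4100 | 20 | U(0, 700) | 0.3T | U(0, π) | 5 | −10:4:30 |
| E | 8PSK+8PSK | 8200 | 20 | 500 | 0.3T | π/5 | 5 | −10:4:30 |
| F | 8PSK+8PSK | 4100 | 915 | 500 | 0.3T | π/5 | 1 | −10:4:30 |

Table 2 Model hyperparameter configuration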
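| Stage | Feature channels | TSEM kernel size | Encoder TSEM conv blocks | Decoder TSEM conv blocks | Stride $\sigma_i$ (L=8200) | Stride $\sigma_i$ (L=4100) | Stride $\sigma_i$ (L=1024) | Stride $\sigma_i$ (L=128) |
|---|---|---|---|---|---|---|---|---|
| 1 | 32 | 3 | 2 | 2 | 1 | 1 | 1 | 1 |
| 2 | 64 | 3 | 2 | 2 | 2 | 2 | 1 | 1 |
| 3 | 128 | 3 | 2 | 1 | 4 | 2 | 2 | 2 |
| 4 | 256 | 3 | 1 | 1 | 5 | 5 | 2 | 2 |
| 5 | 512 | 3 | 1 | / | 5 | 5 | 2 | 2 |

Table 3 Comparison of experimental results on datasets A, D, and F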
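| Model | Dataset | ρ | SI-SDR | SI-SIR |
|---|---|---|---|---|
| S4-UNET | A | 0.852 | 9.49 | 29.46 |
| ConvTasNet | A | 0.844 | 7.12 | 24.34 |
| CTDCRN | A | 0.824 | 4.68 | 16.84 |
| S4-UNET | D | 0.852 | 9.38 | 27.63 |
| ConvTasNet | D | 0.845 | 7.23 | 24.66 |
| CTDCRN | D | 0.834 | 5.71 | 19.09 |
| S4-UNET | F | 0.879 | 8.01 | 31.54 |
| ConvTasNet | F | 0.816 | 3.75 | 29.21 |
| CTDCRN | F | 0.766 | 1.76 | 18.79 |

Table 4 Parameters of the separation algorithms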
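| ConvTasNet parameter | Value | CTDCRN parameter | Value | TDE-ICA parameter | Value |
|---|---|---|---|---|---|
| Number of filters | 64 | CHE kernel sizes | 3, 1 | Delay embedding dimension | 3 |
| Filter length | 32 | CHE-1 output channels | 128 | Delay embedding step | 1 |
| Bottleneck channels | 128 | CHE-2 output channels | 64 | Maximum iterations | 1000 |
| TCN hidden channels | 256 | Stacked CDCM modules | 4 | Nonlinear function | logcosh |
| TCN kernel size | 3 | CDCM channels | 64 | Convergence tolerance | 1e-6 |
| Convolutional layers per repeat block | 8 | CDCM dilated convolution kernel size | 3 | | |
| TCN repeats | 4 | LSTM layers | 1 | | |
| Number of output sources | 2 | LSTM hidden size | 64 | | |

Table 5 Comparison of sequence modeling capability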
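| Dataset | Model/Algorithm | Parameters | Computation (FLOPs) | $\rho$ | Training time (Epoch/s) | Inference time (ms/sample) |
|---|---|---|---|---|---|---|
| RML2016.10a (L=128) | ConvTasNet | 2.21 M | 1.53 G | 0.765 | 16.4 | 0.294 |
| RML2016.10a (L=128) | CTDCRN | 201.48 K | 9.34 G | 0.822 | 7.8 | 0.180 |
| RML2016.10a (L=128) | TDE-ICA | 8 | 11.12 K | 0.641 | / | 3.30 |
| RML2016.10a (L=128) | S4-UNET | 3.55 M | 21.71 G | 0.828 | 8.9 | 0.321 |
| RML2018.01a (L=1024) | ConvTasNet | 2.21 M | 13.81 G | 0.893 | 38.6 | 0.309 |
| RML2018.01a (L=1024) | CTDCRN | 201.48 K | 74.71 G | 0.888 | 28.2 | 0.217 |
| RML2018.01a (L=1024) | TDE-ICA | 8 | 161.97 K | 0.621 | / | 3.10 |
| RML2018.01a (L=1024) | S4-UNET | 3.67 M | 178.31 G | 0.907 | 28.5 | 0.302 |
| A (L=4100) | ConvTasNet | 2.21 M | 55.88 G | 0.844 | 50.6 | 0.288 |
| A (L=4100) | CTDCRN | 201.48 K | 299.14 G | 0.824 | 104.1 | 0.873 |
| A (L=4100) | TDE-ICA | 8 | 499.86 K | 0.662 | / | 6.70 |
| A (L=4100) | S4-UNET | 3.61 M | 240.85 G | 0.852 | 40.4 | 0.402 |
| E (L=8200) | ConvTasNet | 2.21 M | 111.99 G | 0.849 | 70.1 | 0.342 |
| E (L=8200) | CTDCRN | 201.48 K | 598.28 G | 0.806 | 216.7 | 1.971 |
| E (L=8200) | TDE-ICA | 8 | 852.93 K | 0.672 | / | 4.33 |
| E (L=8200) | S4-UNET | 3.61 M | 316.58 G | 0.854 | 52.7 | 0.522 |

Table 6 Experimental results for different convolution strides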
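| Convolution strides | ρ | SI-SDR |
|---|---|---|
| 1, 2, 2, 2, 2 | 0.901 | 16.89 |
| 1, 1, 2, 2, 2 | 0.907 | 17.98 |
| 1, 1, 4, 2, 2 | 0.904 | 17.26 |
| 1, 1, 2, 4, 2 | 0.906 | 17.75 |

Table 7 Experimental results for different numbers of stages and kernel sizes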
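| Stages / kernel size | ρ | SI-SDR | Parameters | Training time (Epoch/s) | Inference time (ms/sample) |
|---|---|---|---|---|---|
| 3/3 | 0.903 | 15.59 | 423.05 K | 18.5 | 0.162 |
| 4/3 | 0.904 | 17.33 | 1.6 M | 24.8 | 0.215 |
| 5/3 | 0.907 | 17.98 | 3.67 M | 28.5 | 0.302 |
| 6/3 | 0.902 | 16.20 | 13.91 M | 40.2 | 0.518 |
| 5/5 | 0.906 | 18.73 | 5.65 M | 28.9 | 0.347 |
| 5/7 | 0.905 | 18.39 | 7.64 M | 30.1 | 0.387 |

Table 8 Ablation results for S4 and U-NET fusion strategies and activation functions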
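| Stages with S4 enabled | Activation | ρ | SI-SDR | Parameters (M) | Training time (Epoch/s) | Inference time (ms/sample) |
|---|---|---|---|---|---|---|
| k mod 2 = 1 | GLU | 0.907 | 17.98 | 3.67 | 28.5 | 0.302 |
| k mod 2 = 0 | GLU | 0.906 | 18.34 | 3.67 | 30.8 | 0.328 |
| k | GLU | 0.904 | 17.26 | 3.89 | 37.7 | 0.364 |
| None | GLU | 0.902 | 17.14 | 3.52 | 21.5 | 0.263 |
| k mod 2 = 1 | ReLU | 0.909 | 19.22 | 3.63 | 26.3 | 0.347 |
| k mod 2 = 1 | None | 0.908 | 17.39 | 3.63 | 26.6 | 0.319 |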
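To make the interaction between the per-stage strides (Table 2) and the S4 placement rule `k mod 2 = 1` (Table 8) concrete, the sketch below computes the sequence length each encoder stage would see. It assumes that the stride of stage k is applied within that stage before S4 and that padding yields an output length of ceil(L / stride); neither detail is stated in the source, and the helper names are hypothetical.

```python
# Strides per stage, copied from Table 2 (keys are input lengths L).
STRIDES = {
    8200: (1, 2, 4, 5, 5),
    4100: (1, 2, 2, 5, 5),
    1024: (1, 1, 2, 2, 2),
    128: (1, 1, 2, 2, 2),
}

def stage_lengths(input_len, strides):
    """Sequence length after each encoder stage, assuming ceil(L / stride)."""
    lengths = []
    cur = input_len
    for s in strides:
        cur = -(-cur // s)  # ceiling division
        lengths.append(cur)
    return lengths

def s4_stage_indices(num_stages, rule):
    """1-based stage indices with S4 enabled for a given placement rule."""
    stages = range(1, num_stages + 1)
    if rule == "odd":    # k mod 2 = 1
        return [k for k in stages if k % 2 == 1]
    if rule == "even":   # k mod 2 = 0
        return [k for k in stages if k % 2 == 0]
    if rule == "all":
        return list(stages)
    if rule == "none":
        return []
    raise ValueError(f"unknown rule: {rule}")

for L, strides in STRIDES.items():
    lengths = stage_lengths(L, strides)
    s4_lengths = [lengths[k - 1] for k in s4_stage_indices(len(strides), "odd")]
    print(f"L={L}: stage lengths {lengths}, S4 sees {s4_lengths}")
```

Under these assumptions the odd-stage rule applies S4 at full resolution in stage 1 and at the most compressed resolution in stage 5, while skipping two of the intermediate stages.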