Swin Transformer-based Wideband Wireless Image Transmission Semantic Joint Encoding and Decoding Method

SHEN Bin; LI Xuan; LAI Xuebing; YANG Shuhan

doi:10.11999/JEIT250039

Volume 47 Issue 8

Aug. 2025

SHEN Bin, LI Xuan, LAI Xuebing, YANG Shuhan. Swin Transformer-based Wideband Wireless Image Transmission Semantic Joint Encoding and Decoding Method[J]. Journal of Electronics & Information Technology, 2025, 47(8): 2665-2674. doi: 10.11999/JEIT250039

cstr: 32379.14.JEIT250039

1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Chongqing Key Laboratory of Mobile Communications Technology, Chongqing 400065, China

Funds: The National Natural Science Foundation of China (62371082)

Received Date: 2025-01-16
Revised Date: 2025-04-27

Available Online: 2025-05-20

Publish Date: 2025-08-27

Abstract

Objective: Conventional studies on image semantic communication primarily address simplified channel models, such as Gaussian and Rayleigh fading channels. However, real-world wireless communication environments are characterized by complex multipath fading, which necessitates advanced signal processing at both the transmitter and receiver. To address this challenge, this paper proposes a Wideband Wireless Image Transmission Semantic Communication (WWIT-SC) system based on the Swin Transformer. The proposed method enhances image transmission performance in multipath fading channels through end-to-end semantic joint encoding and decoding.

Methods: The WWIT-SC system adopts the Swin Transformer as the core architecture for semantic encoding and decoding. This network not only processes semantic image representations but also improves adaptability to complex channel conditions through a joint mechanism based on Channel State Information (CSI) and Coordinate Attention (CA). CSI, a key signal in wireless systems, enables accurate estimation of channel conditions. However, due to temporal variations in wireless channels, CSI is often subject to attenuation and distortion, reducing its effectiveness when used in isolation. To address this limitation, the system incorporates a CSI-guided CA mechanism that enables fine-grained mapping and adjustment of semantic features across subcarriers. This mechanism integrates spatial and channel-domain features to localize critical information adaptively, thereby accommodating the channel’s time-varying behavior. A Channel Estimation Subnetwork (CES) is further implemented at the receiver to correct CSI estimation errors introduced by noise and dynamic channel variations. The CES enhances CSI accuracy during decoding, resulting in improved semantic image reconstruction quality.

Results and Discussions: The WWIT-SC and CA-JSCC models are trained under fixed Signal-to-Noise Ratio (SNR) conditions and evaluated at the same SNR values.
Across all SNR levels, the WWIT-SC model consistently outperforms CA-JSCC. Specifically, Peak Signal-to-Noise Ratio (PSNR) improves by 6.4%, 8.5%, and 9.3% at bandwidth ratios R = 1/12, 1/6, and 1/3, respectively (Fig. 4). Both models are also trained using SNR values randomly selected from the range [0, 15] dB and tested at various SNR levels. Although random SNR training leads to reduced overall performance compared to fixed SNR training, WWIT-SC maintains superior performance over CA-JSCC across all conditions. Under these settings, PSNR gains of up to 6.8%, 8.3%, and 9.8% are achieved at bandwidth ratios R = 1/12, 1/6, and 1/3, respectively (Fig. 4). Further evaluation is conducted by training both models on randomly cropped ImageNet images and testing them on the Kodak dataset. The WWIT-SC model trained on the larger dataset achieves up to a 4% PSNR improvement over CA-JSCC on Kodak (Fig. 6). A series of ablation experiments is conducted to assess the contribution of each module in WWIT-SC. First, the Swin Transformer is replaced with the Feature Learning (FL) module from CA-JSCC. Across all three bandwidth ratios, PSNR values for WWIT-SC exceed those of the modified WWIT-SC-FL variant at all SNR levels (Fig. 5(a)), confirming the importance of multi-scale feature extraction. Next, the CSI-CA module is replaced with the Channel Learning (CL) module from CA-JSCC. Again, WWIT-SC outperforms the modified WWIT-SC-CL model across all bandwidth ratios and SNR values (Fig. 5(b)), highlighting the role of the long-range dependency mechanism in enhancing feature localization and adaptation. Finally, the CES is removed to assess its contribution. The original WWIT-SC model consistently achieves higher PSNR values than the variant without CES at all bandwidth ratios and SNR levels (Fig. 5(c)), demonstrating that the inclusion of CES substantially improves channel decoding accuracy.
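
The percentage gains reported in the results are easier to interpret with the metric written out. The following is a minimal Python sketch, not the authors' evaluation code. `psnr` uses the standard definition 10·log10(MAX²/MSE). `relative_gain_percent` assumes the reported percentages are relative to the baseline PSNR in dB, and `channel_symbols` assumes the common deep joint source-channel coding convention R = k/n (k complex channel symbols per n source values); neither convention is stated in the abstract. All function names, the 32×32×3 image size, and the numbers in the usage lines are hypothetical.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB for two flat sequences of pixel values."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must contain the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_gain_percent(psnr_new, psnr_baseline):
    """Assumption: the gain is expressed relative to the baseline PSNR in dB."""
    return 100.0 * (psnr_new - psnr_baseline) / psnr_baseline


def channel_symbols(height, width, channels, bandwidth_ratio):
    """Assumption: R = k / n, with n source values and k complex channel symbols."""
    return round(height * width * channels * bandwidth_ratio)


# Illustrative numbers only; these are not results from the paper.
baseline_db, proposed_db = 30.0, 32.4
print(f"relative PSNR gain: {relative_gain_percent(proposed_db, baseline_db):.1f}%")
for ratio in (1 / 12, 1 / 6, 1 / 3):
    print(f"R = {ratio:.4f}: {channel_symbols(32, 32, 3, ratio)} channel symbols")
```

Under the assumed convention, a larger R gives the encoder more channel symbols per image, which is consistent with the larger gains reported at R = 1/3.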
Conclusions: This paper proposes a Swin Transformer-based WWIT-SC system, integrating Orthogonal Frequency Division Multiplexing (OFDM) technology to enhance semantic image transmission under multipath fading channels. The scheme employs the Swin Transformer as the backbone for the semantic encoder-decoder and incorporates a CSI-assisted CA mechanism to accurately map critical semantic features to subcarriers, adapting to time-varying channel conditions. In addition, a CES at the receiver compensates for channel estimation errors, improving CSI accuracy. Experimental results show that, compared to CA-JSCC, the WWIT-SC system achieves up to a 9.8% PSNR improvement. This work presents a novel solution for semantic image transmission in complex broadband wireless communication environments.
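
To make the role of OFDM and CSI concrete, the sketch below sends one OFDM symbol through a noiseless multipath channel in plain Python. It is a textbook illustration under stated assumptions, not the WWIT-SC pipeline: the tap values, the subcarrier count, the all-ones pilot, and every function name are hypothetical, and no learned component is involved. It shows that, with a sufficiently long cyclic prefix, multipath reduces to one complex gain per subcarrier, and that a pilot-based least-squares estimate of those gains (the CSI) suffices for one-tap equalization.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (forward, or inverse with 1/N scaling)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def multipath(signal, taps):
    """Linear convolution with the channel taps, truncated to the input length."""
    return [sum(taps[d] * signal[t - d] for d in range(len(taps)) if t - d >= 0)
            for t in range(len(signal))]


def ofdm_link(symbols, taps, cp_len):
    """Send one OFDM symbol through a noiseless multipath channel."""
    time_domain = dft(symbols, inverse=True)
    prefix = time_domain[-cp_len:] if cp_len else []
    received = multipath(prefix + time_domain, taps)
    return dft(received[cp_len:])  # drop the cyclic prefix, return to subcarriers


taps = [1.0, 0.5, 0.25]   # hypothetical 3-path channel
n_sub, cp_len = 8, 2      # the prefix must cover len(taps) - 1 samples

pilot = [1 + 0j] * n_sub  # known pilot symbol on every subcarrier
data = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j, 1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

# Least-squares CSI estimate: received pilot divided by the transmitted pilot.
h_est = [y / p for y, p in zip(ofdm_link(pilot, taps, cp_len), pilot)]

# One-tap equalization of the data symbol with the estimated CSI.
received = ofdm_link(data, taps, cp_len)
equalized = [y / h for y, h in zip(received, h_est)]

print([round(abs(h), 3) for h in h_est])
print(max(abs(e - d) for e, d in zip(equalized, data)))
```

In this noiseless setting the estimate is exact. Once noise or channel variation between the pilot and data symbols is added, `h_est` becomes inaccurate, which is the kind of CSI error the CES is described as compensating for.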
Keywords:
- Semantic communication
- Swin Transformer
- Orthogonal Frequency Division Multiplexing (OFDM)
- Image transmission
