
2025 Vol. 47, No. 7

Cover
2025, 47(7): 1-4.
Wireless Communication and Internet of Things
Reconfigurable Intelligent Surface-empowered Covert Communication Strategies for D2D Systems
LÜ Lu, ZHENG Pengwei, YANG Long, CHEN Jian
2025, 47(7): 2023-2035. doi: 10.11999/JEIT250045
Abstract:
  Objective   The rising demand for secure communication in sensitive data transmission scenarios has increased interest in covert communication research. Existing Device-to-Device (D2D) covert communication solutions typically employ additional uncertainty mechanisms, such as artificial noise, leading to elevated energy consumption and implementation complexity. This study addresses these issues by investigating a novel covert communication strategy enabled by Reconfigurable Intelligent Surfaces (RIS). The strategy exploits RIS to enhance wireless propagation for legitimate users and simultaneously introduces controlled phase-shift uncertainty to impair eavesdropping effectiveness. The primary objective is to maximize the covert communication rate among D2D users while maintaining a low probability of detection and guaranteeing the Quality of Service (QoS) requirements for cellular users.  Methods   The proposed framework consists of an RIS-assisted D2D communication network comprising one cellular user, one pair of D2D users, and an eavesdropper aiming to detect ongoing communications. A comprehensive optimization problem is established to jointly optimize the transmit powers of both the cellular and D2D transmitters, as well as the phase shifts of the RIS, to maximize the covert communication rate for D2D users. Given the non-convex nature and highly interdependent variables within the optimization problem, an alternating optimization algorithm utilizing Gaussian randomization is developed. This algorithm iteratively determines the optimal transmission powers and RIS phase-shift configurations, adhering strictly to constraints on power consumption, RIS characteristics, and covert communication detection probabilities. Additionally, Successive Interference Cancellation (SIC) is integrated at the D2D receiver to effectively mitigate interference from cellular communications, facilitating accurate decoding of covert signals.  Results and Discussions   Simulation results confirm the efficacy of the proposed RIS-enabled covert communication strategy, showing significant performance enhancements over traditional methods. The inclusion of RIS notably improves the covert communication rate for D2D transmissions. For instance, increasing the number of RIS reflective elements enhances system performance further by introducing greater uncertainty in the received signals at the eavesdropper, thus complicating detection efforts (Fig. 8). Furthermore, it is observed that the cellular user’s transmit power inherently acts as an effective shield, increasing confusion for eavesdropping attempts and thus reducing detection accuracy. Convergence of the proposed optimization algorithm is validated through iterative simulation experiments, demonstrating stable and reliable performance across varied conditions and constraints (Fig. 4). Additionally, Monte Carlo simulations verify the accuracy of the analytical expressions derived for the minimum average detection error probability achievable by the eavesdropper, highlighting the critical role of RIS in generating sufficient energy uncertainty to ensure covert communication effectiveness (Fig. 5, Fig. 6). Comparative analyses further illustrate the superior performance of the proposed RIS-based approach relative to conventional artificial noise techniques, particularly in scenarios demanding high covert communication rates.
Moreover, the integration of RIS and SIC methods demonstrates notable benefits; SIC efficiently reduces interference from cellular signals, maintaining the cellular user’s QoS without compromising the integrity of covert signals decoded at the D2D receiver.  Conclusions  This study proposes an advanced RIS-empowered covert communication strategy tailored specifically for D2D networks. The approach successfully leverages RIS-induced phase-shift uncertainty and capitalizes on cellular transmissions as natural interference sources, significantly enhancing covert communication capabilities. Through joint optimization of transmission power allocation and RIS configurations, the proposed method effectively maximizes the covert communication rate while satisfying QoS constraints for cellular users. These promising results establish a solid foundation for future exploration into active RIS-assisted communication schemes and the development of sophisticated optimization strategies aimed at further improving covert communication effectiveness.
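As a concrete reference for the Gaussian randomization step mentioned in the Methods, the following sketch extracts a unit-modulus RIS phase-shift vector from a relaxed (rank-unconstrained) solution; the objective callable, trial count, and matrix dimensions are illustrative assumptions rather than the paper's exact covert-rate formulation.

```python
import numpy as np

def gaussian_randomization(V, objective, num_trials=500, rng=None):
    """Extract a unit-modulus RIS phase vector from a relaxed solution V.

    V         : (N, N) positive semidefinite matrix returned by the relaxed problem.
    objective : callable mapping a length-N unit-modulus vector to a scalar score
                (a placeholder for the covert-rate objective of the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    N = V.shape[0]
    # Factor V so that candidate vectors have covariance V.
    eigval, eigvec = np.linalg.eigh(V)
    L = eigvec @ np.diag(np.sqrt(np.clip(eigval, 0.0, None)))
    best_theta, best_val = None, -np.inf
    for _ in range(num_trials):
        r = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
        cand = L @ r
        theta = np.exp(1j * np.angle(cand))      # enforce the unit-modulus RIS constraint
        val = objective(theta)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val
```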
Self-Interference Measurements and Analysis of Co-time Co-frequency Full Duplex Arrays in U6G Frequency Band
SHI Chengzhe, LI Weishi, LI Tong, PAN Wensheng, SHEN Ying, SHAO Shihai
2025, 47(7): 2036-2049. doi: 10.11999/JEIT241086
Abstract:
  Objective  The U6G frequency band spans a continuous 700 MHz bandwidth, aligning closely with the Sub-6 GHz range. It offers a balance between low-frequency coverage capabilities and high-frequency capacity advantages, making it suitable for the deployment of future 5G-A and 6G systems. With the growing demand for wireless communication services and the limited availability of spectrum resources in future networks, the need for full-duplex technology has emerged. A U6G full-duplex transceiver, with sufficient transmit-receive isolation, can transmit and receive simultaneously within the same frequency band, effectively doubling spectral efficiency compared to Time-Division Duplex (TDD) or Frequency-Division Duplex (FDD) systems. However, full-duplex systems with large-scale array antennas face the challenge of strong self-interference with complex multi-dimensional cross-coupling. Near-field coupling self-interference can degrade reception sensitivity, potentially leading to the saturation of low-noise amplifiers. Understanding the near-field coupling characteristics of self-interference between arrays is crucial for evaluating the proposed full-duplex industrial standards and protocols. Currently, self-interference measurements for array systems primarily focus on the millimeter-wave band, with research on array-to-array self-interference in the U6G band being relatively scarce, mostly limited to single-antenna configurations. This work utilizes an analog beamforming phased array platform capable of precise beam steering to conduct large-scale full-duplex array self-interference coupling channel measurements in the U6G band, completing nearly 3.6 million measurements. Through these self-interference coupling channel measurements between beams as well as between array elements, an in-depth analysis is provided on the angular and physical spatial distribution characteristics of transmit-receive isolation, and the inherent connection between element-to-element coupling and beam-to-beam coupling is revealed.  Methods  In this work, a 128T-128R phased array platform with analog beamforming capability is deployed in an outdoor environment. Frequency-domain measurement techniques are employed to acquire the frequency response of the self-interference coupling channels between different transmit and receive beams, as well as between array elements. The spatial and numerical distribution characteristics of the coupling self-interference are analyzed using transmit-receive isolation as the evaluation criterion. The measurement process utilizes a dual-port vector network analyzer, with stepwise frequency scanning conducted across the 6675–6875 MHz band (with a total measurement bandwidth of 200 MHz) to measure the frequency response of the self-interference channels. For beam-to-beam coupling channels, the azimuth sweep range is set from –60° to +60°, and the elevation sweep range is from –30° to +30°, with a step interval of 2°, resulting in a total of 3,575,881 sets of beam-to-beam coupling channel data. For element-to-element coupling channels, only one pair of transmit-receive elements is excited at a time, while all other transmit and receive elements are turned off. This measurement process covers all possible transmit-receive element pairs, yielding a total of 16,384 sets of element-to-element coupling channel data.
Results and Discussions  The analysis of the transmit-receive isolation between beams reveals that the maximum and minimum isolation between the transmit and receive beams are 52.17 dB and –6.25 dB, respectively. Approximately 95% of the isolation values fall between 10 dB and 40 dB, with a median isolation of 26.66 dB (Fig. 6). The isolation distribution between beams exhibits strong spatial symmetry and directionality (Fig. 7, Fig. 8, Fig. 9, Fig. 10). Specifically, steering the transmit and receive beams along the direction of the array causes significant variations in the self-interference coupling, with no transmit and receive beams consistently providing high or low isolation (Fig. 7). Moreover, in the U6G frequency band, the sensitivity of self-interference coupling to beam steering is much weaker than in the millimeter-wave frequency band (Fig. 9). Therefore, relying on beam steering to reduce self-interference in the U6G frequency band may be inefficient and result in suboptimal performance. The analysis of the transmit-receive isolation between array elements indicates that the maximum and minimum isolation between transmit and receive elements are 88.08 dB and 54.48 dB, respectively. Approximately 95% of the isolation values fall between 63 dB and 76 dB, with a median isolation of 69.43 dB (Fig. 11). Even with the same element spacing, the isolation between elements is not necessarily the same; multiple isolation mappings exist for the same distance (Fig. 12). This many-to-one mapping relationship is likely due to differences in multipath propagation between the positions of the transmit and receive elements, as well as amplitude-phase inconsistencies in the transmit and receive chains. Furthermore, by assigning beamforming weights to the non-directional element-to-element coupling channels, the transmit-receive isolation between beams can be reliably predicted. This approach accurately reproduces the self-interference coupling between the transmit and receive beams and, compared to the spherical wave model, better captures the realistic characteristics of self-interference in both spatial and numerical distributions (Fig. 13, Fig. 14).  Conclusions  This study examines the near-field self-interference coupling characteristics of U6G full-duplex array systems through nearly 3.6 million measurements. The measurement and analysis results demonstrate that the isolation distribution between transmit and receive beams exhibits strong spatial symmetry and directionality. In contrast to the millimeter-wave frequency band, self-interference coupling in the U6G band shows weaker sensitivity to beam steering. Therefore, relying solely on beam steering to reduce self-interference is insufficient to achieve the required receiver sensitivity, necessitating the adoption of additional active or passive spatial self-interference suppression techniques. In some cases, a combination of RF and digital-domain self-interference suppression techniques may also be necessary. Furthermore, measurements of the self-interference coupling channels between transmit and receive array elements reveal that multiple isolation mappings exist for the same element spacing, which cannot be accurately described by traditional spherical wave models. In particular, by assigning beamforming weights to the non-directional element-to-element coupling channels, the self-interference coupling characteristics between beams can be replicated, and the array isolation can be accurately predicted.
These measurement and analysis results provide essential insights for the design of U6G full-duplex communication systems and lay the foundation for future work on self-interference channel modeling, beamforming optimization, and self-interference suppression.
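The isolation prediction described above, obtained by applying beamforming weights to the measured element-to-element coupling channels, can be sketched as follows; the array size, uniform weights, and random coupling matrix are placeholders for the measured 128T-128R data.

```python
import numpy as np

def beam_isolation_db(H_elem, w_tx, w_rx):
    """Predict transmit-receive isolation between one Tx beam and one Rx beam.

    H_elem : (N_rx, N_tx) measured element-to-element coupling matrix at one frequency.
    w_tx   : (N_tx,) transmit beamforming weights (unit norm).
    w_rx   : (N_rx,) receive beamforming weights (unit norm).
    """
    coupling = w_rx.conj() @ H_elem @ w_tx        # effective beam-to-beam coupling
    return -20.0 * np.log10(np.abs(coupling))     # isolation in dB

# Illustrative example with a random coupling matrix (not measured data).
rng = np.random.default_rng(0)
H = (rng.standard_normal((128, 128)) + 1j * rng.standard_normal((128, 128))) * 1e-4
w_t = np.ones(128, dtype=complex) / np.sqrt(128)
w_r = np.ones(128, dtype=complex) / np.sqrt(128)
print(f"predicted isolation: {beam_isolation_db(H, w_t, w_r):.2f} dB")
```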
Low-Complexity Transform Domain Orthogonal Time Frequency Space Channel Equalization Algorithm
LIAO Yong, LIU Shuang, LI Xue
2025, 47(7): 2050-2061. doi: 10.11999/JEIT250013
Abstract:
  Objective  Orthogonal Time Frequency Space (OTFS) modulation is a key technique for high-mobility communication systems, offering robustness against severe Doppler shifts and multipath fading. It provides notable advantages in dynamic environments such as vehicular networks, high-speed rail communications, and unmanned aerial vehicle systems, where conventional orthogonal frequency-division multiplexing fails due to rapid channel variations and dense scattering. However, standard equalization algorithms, including Zero Forcing (ZF) and Minimum Mean Square Error (MMSE), are often ineffective in mitigating Inter-Symbol Interference (ISI) and Inter-Doppler Interference (IDI) under rich-scatterer conditions. These methods also require large-scale matrix inversion, resulting in prohibitively high computational complexity, particularly on OTFS grids with high dimensionality (e.g., M = 32 subcarriers, N = 16 symbols per frame). Most existing studies adopt single-scatterer models that do not reflect the interference structure in practical multipath channels. This study proposes a low-complexity transform domain OTFS equalization algorithm that incorporates block matrix decomposition, transform domain diagonalization, and decision feedback strategies. The algorithm aims to (1) reduce complexity by exploiting block sparsity and structural features of the Delay-Doppler (DD) domain channel matrix, (2) improve interference suppression in time-varying Doppler and dense scattering environments, and (3) validate performance using the 3GPP Extended Vehicular A (EVA) channel model, which simulates realistic high-speed scenarios with user velocities ranging from 121.5 km/h to 607.5 km/h and multiple scattering paths.  Methods  The proposed algorithm operates in three key stages: (1) Block-Wise ISI Elimination: Leveraging the block-sparse structure of the DD-domain channel matrix, the algorithm partitions the channel into submatrices, each corresponding to a specific DD component. Guard intervals are introduced to suppress ISI arising from signal dispersion across the OTFS grid. Each submatrix $K_{m,l}$ is modeled as a Toeplitz circulant matrix, enabling iterative cancellation of interference by subtracting previously estimated symbols. (2) Transform Domain Diagonalization: Each Toeplitz circulant submatrix is diagonalized using Fourier-based operations. Specifically, the normalized FFT matrix $F_N$ is applied to $K_{m,l}$, converting it into a diagonal form and transforming complex matrix inversion into element-wise division. This step reduces the computational complexity of MMSE equalization from $\mathcal{O}(M^3N^3)$ to $\mathcal{O}(N^3)$, where N denotes the Doppler dimension of the OTFS resource grid. (3) Decision Feedback Refinement: A closed-loop decision feedback mechanism is introduced to iteratively improve symbol estimates. The demodulated symbols are re-modulated and fed back to update the channel matrix, thereby enhancing estimation accuracy and lowering pilot overhead. The algorithm is evaluated using the 3GPP EVA channel model, which reflects practical high-speed communication scenarios with user velocities between 121.5 km/h and 607.5 km/h, time-varying Doppler shifts, and multiple scatterers. Key system parameters include 32 subcarriers (M = 32), 16 symbols per frame (N = 16), and modulation formats ranging from QPSK to 64QAM.
Results and Discussions  The performance of the proposed algorithm is evaluated against ZF, MMSE, Message Passing (MP), Maximal Ratio Combining (MRC), and Hybrid MP (HMP) detectors in several respects. Complexity reduction: The algorithm achieves a computational complexity of $\mathcal{O}(N^3)$, markedly lower than that of ZF/MMSE and MP. Transform domain diagonalization simplifies matrix inversion into element-wise division, thereby eliminating the $\mathcal{O}(N^3)$ inversion operations. Interference suppression: The algorithm yields a 2.5 dB Bit Error Ratio (BER) improvement over ZF and MMSE at 15 dB SNR under 16QAM modulation. The decision feedback mechanism further reduces the Normalized Mean Square Error (NMSE) by 12.5 dB while lowering pilot overhead by 50%. In high-speed scenarios, the algorithm maintains superior performance, outperforming MRC and HMP by 1.7 dB and 1.0 dB, respectively, under 64QAM modulation. Modulation robustness: The algorithm consistently demonstrates performance gains across QPSK, 16QAM, and 64QAM. At high SNR with 64QAM, BER gains of 1.7 dB, 1.5 dB, and 1.0 dB are achieved over MRC, MP, and HMP, respectively. Transform domain processing efficiently diagonalizes the channel matrix and eliminates IDI, which is critical in scatterer-rich environments where non-diagonal components dominate interference. Practical validation: Simulations using the 3GPP EVA model confirm the algorithm’s applicability in real-world high-mobility settings.  Conclusions  This study presents a low-complexity approach to OTFS channel equalization, addressing both computational and interference challenges in high-mobility scenarios. By leveraging the block-sparse structure of the DD-domain channel matrix and applying Fourier-based diagonalization, the algorithm achieves near-linear complexity while maintaining competitive BER performance. The decision feedback mechanism further enhances robustness, enabling adaptive channel estimation with reduced pilot overhead. Key contributions include: block-sparse matrix decomposition that facilitates sequential ISI elimination through the use of guard intervals and Toeplitz circulant structures; Fourier-based diagonalization that replaces matrix inversion with element-wise division, reducing computational complexity by orders of magnitude; and a closed-loop decision feedback scheme that improves NMSE by 12.5 dB while halving the required pilot overhead. Simulation results under the 3GPP EVA model confirm the algorithm’s suitability for high-speed applications, such as vehicular networks and high-speed rail communications. Future work will explore extensions to large-scale Multiple-Input Multiple-Output (MIMO) systems, adaptive channel tracking, and multi-user interference suppression, with the aim of integrating this framework into 6G URLLC systems.
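For readers who want a concrete view of the transform domain diagonalization step, the sketch below shows how a Toeplitz-circulant block can be equalized with FFTs so that the MMSE inversion collapses into element-wise division; the unit-power symbol assumption and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def circulant_mmse_equalize(c_first_col, y, noise_var):
    """Per-block MMSE equalization of a circulant channel block via the FFT.

    c_first_col : (N,) first column of a Toeplitz-circulant submatrix.
    y           : (N,) received block after ISI cancellation.
    noise_var   : noise variance used in the MMSE weighting (unit symbol power assumed).
    The FFT diagonalizes the circulant block, so the N x N matrix inversion
    collapses into element-wise division in the transform domain.
    """
    lam = np.fft.fft(c_first_col)                 # eigenvalues of the circulant block
    Y = np.fft.fft(y)
    X_hat = np.conj(lam) * Y / (np.abs(lam) ** 2 + noise_var)   # element-wise MMSE
    return np.fft.ifft(X_hat)
```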
Communication Jamming Effectiveness Evaluation: q-Rung Orthopair Fuzzy Set and CoCoSo-BM Fusion Framework
NING Xiaoyan, WANG Xiangchen, YANG Jian, CHEN Zengmao
2025, 47(7): 2062-2072. doi: 10.11999/JEIT241140
Abstract:
  Objective  With the advancement of modern warfare and the development of communication countermeasure technologies, the evaluation of jamming efficiency is essential for supporting decision-making in anti-jamming strategy selection. It also plays a critical role in enabling intelligent countermeasures in unmanned systems. However, existing evaluation methods often lack comprehensiveness and exhibit low sensitivity. To address these limitations, this study proposes an evaluation method based on the q-rung orthopair fuzzy set and the Combined Compromise Solution (CoCoSo)-Bonferroni Mean (BM) approach, inspired by the recently developed CoCoSo algorithm in mathematical statistics. By integrating q-rung fuzzy ordinals and constructing a structured jamming efficiency evaluation framework, the proposed method improves the completeness and rationality of evaluation results. This approach provides a more effective means of assessing communication jamming efficiency.  Methods  The proposed evaluation method establishes a comprehensive index system for assessing communication jamming efficiency by integrating the characteristics of both communication systems and jamming techniques. The criterion layer is constructed based on anti-jamming technologies, while the index layer incorporates key indicators such as amplitude variation, frequency-domain dispersion, and time-domain coincidence degree. A combined weighting scheme is adopted using the Analytic Hierarchy Process (AHP) and the maximum deviation method to ensure both subjective and objective consistency. To overcome the limitations of single-dimensional evaluation, the q-rung orthopair fuzzy set is introduced, enabling more flexible representation of uncertainty. The CoCoSo method is applied to aggregate multiple scoring strategies, thereby enhancing the reliability of the evaluation results. To address the CoCoSo method’s limitation in parameter weighting, the BM operator is integrated into the model, ensuring a more balanced and comprehensive assessment.  Results and Discussions  Based on the established evaluation system, the proposed method utilizes simulation data (Figs. 3–6) and communication system parameters to construct the evaluation matrix. Standard normalization and q-rung fuzzy number transformation are applied to the collected data. The index weights are calculated using the combined AHP and maximum deviation methods. The CoCoSo-BM approach is then employed to comprehensively evaluate the index values under the specified communication environment. Comparative analysis with alternative evaluation methods (Fig. 7) indicates that the proposed method achieves higher discrimination capability across different communication scenarios. Sensitivity analysis (Fig. 8) further confirms its strong differentiation capacity. Evaluation results under varying q values, compromise coefficients, and communication environments (Figs. 9–11) demonstrate the method’s stability and robustness.  Conclusions  This study addresses the limitations of incompleteness and low discrimination in communication jamming efficiency evaluation by proposing a novel method based on the q-rung orthopair fuzzy set and CoCoSo-BM. By applying the q-rung orthopair fuzzy set—previously unused in communication efficiency assessments—and incorporating the BM operator into the CoCoSo framework, the proposed method enhances evaluation completeness and consistency.
Simulation results under varied conditions demonstrate that: (1) the method achieves higher sensitivity than existing approaches; (2) the evaluation remains stable across different q values and compromise coefficients, enabling adaptive parameter selection; and (3) performance under varying signal-to-jamming ratios, system configurations, and jamming conditions confirms its robustness and adaptability in practical scenarios.
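As a rough illustration of the aggregation stage, the sketch below implements the baseline CoCoSo ranking on crisp, normalized scores; the paper's q-rung orthopair fuzzy representation and BM-operator extension are not reproduced here, and the weight vector and compromise coefficient are placeholders.

```python
import numpy as np

def cocoso_rank(X, w, lam=0.5):
    """Rank alternatives with the baseline CoCoSo method.

    X   : (m, n) decision matrix, already normalized to (0, 1] (benefit direction).
    w   : (n,) criterion weights summing to 1.
    lam : compromise coefficient in [0, 1].
    """
    S = X @ w                              # weighted-sum comparability measure
    P = np.power(X, w).sum(axis=1)         # weighted-power comparability measure
    k_a = (S + P) / (S + P).sum()
    k_b = S / S.min() + P / P.min()
    k_c = (lam * S + (1 - lam) * P) / (lam * S.max() + (1 - lam) * P.max())
    k = np.cbrt(k_a * k_b * k_c) + (k_a + k_b + k_c) / 3.0
    return np.argsort(-k), k               # indices of alternatives, best first
```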
Research on Low-Power Transmission Method for Group-Connected Beyond-Diagonal Reconfigurable Intelligent Surface-assisted Communication Systems
WANG Hong, LI Peiqi, LI Heyi, WANG Peiyu
2025, 47(7): 2073-2079. doi: 10.11999/JEIT241029
Abstract:
  Objective  Reconfigurable Intelligent Surface (RIS) technology enables dynamic reconfiguration of the wireless communication environment. Among recent advancements, Beyond-Diagonal RIS (BD-RIS) has emerged as a novel architecture, featuring a phase-shift matrix unconstrained by diagonal form. This allows simultaneous adjustment of phase and amplitude, offering greater design flexibility and improved system performance. However, while prior studies have primarily focused on BD-RIS-assisted downlink systems, the uplink counterpart remains unexplored. Unlike downlink transmission, where only the total base station power is constrained, uplink transmission imposes individual power limitations on each user, necessitating different optimization models. Therefore, existing downlink-oriented design approaches cannot be directly applied to uplink scenarios. This study proposes a low-power transmission method tailored for BD-RIS-assisted uplink systems, addressing the unique constraints and challenges of uplink communication.  Methods  This study investigates a group-connected BD-RIS-assisted uplink communication system to minimize total transmit power by jointly optimizing the equalizer, user transmit power, and BD-RIS phase-shift matrix. The Minimum Mean-Square Error (MMSE) equalizer is employed to maximize the Signal-to-Interference-plus-Noise Ratio (SINR) of each received signal. Subsequently, an analytical expression linking user transmit power and the phase-shift matrix is derived. The phase-shift optimization problem is then reformulated as an unconstrained univariate optimization problem. Finally, an alternating optimization approach is applied to iteratively refine the equalizer, user transmit power, and BD-RIS phase-shift matrix, achieving minimal system transmit power.
Future research should consider the effects of non-ideal channel state information.
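A minimal sketch of the per-user MMSE receive equalizer and the resulting SINRs is given below; the way the BD-RIS phase-shift matrix enters the effective channel, the dimensions, and the power vector are illustrative assumptions.

```python
import numpy as np

def mmse_equalizers(H_eff, p, noise_var):
    """MMSE receive equalizers and per-user SINRs for a multi-user uplink.

    H_eff     : (M, K) effective BS-side channels (direct link plus RIS-reflected link).
    p         : (K,) user transmit powers.
    noise_var : receiver noise variance.
    """
    M, K = H_eff.shape
    R = H_eff @ np.diag(p) @ H_eff.conj().T + noise_var * np.eye(M)   # receive covariance
    W = np.linalg.solve(R, H_eff)        # column k is the (unnormalized) MMSE equalizer
    sinr = np.empty(K)
    for k in range(K):
        w_k, h_k = W[:, k], H_eff[:, k]
        signal = p[k] * np.abs(w_k.conj() @ h_k) ** 2
        total = np.real(w_k.conj() @ R @ w_k)
        sinr[k] = signal / (total - signal)
    return W, sinr
```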
Cooperative Spectrum Sensing Method Against Spectrum Sensing Data Falsification Attacks Based on Multiscale Entropy
WANG Anyi, GONG Jianchao, ZHU Tao
2025, 47(7): 2080-2088. doi: 10.11999/JEIT241091
Abstract:
  Objective  With the rapid development of 5G and Internet of Things (IoT) technologies and the increasing number of devices accessing wireless networks, Cognitive Radio (CR) technology offers an effective solution to alleviate spectrum resource scarcity. CR allows Secondary Users (SU) to perform spectrum sensing and share the Primary User (PU) frequency band. However, Cooperative Spectrum Sensing (CSS) is vulnerable to Spectrum Sensing Data Falsification (SSDF) attacks by Malicious Users (MU), which degrades sensing performance. Existing anti-SSDF algorithms, while reducing the effects of SSDF attacks, face challenges in detecting MU under complex attack strategies. This study proposes the use of multiscale entropy to enhance anti-SSDF attack schemes. By updating the reputation value of SU sensing results through multiscale analysis, the detection performance and MU detection rate of the CSS algorithm under various attack strategies are significantly improved. This work provides a solution to the problem of SSDF attacks under complex strategies and offers a theoretical foundation for CSS technology in areas with scarce spectrum resources.  Methods  The multiscale entropy algorithm calculates the reputation value of the SU using a sliding window model. It extracts effective features from the local sensing results of the SU and converts them into weights, which are then used to update the SU’s reputation value. The final reputation value is compared with a threshold to identify MU. The sliding window model collects SU sensing results from different time slots and computes the reputation value by comparing them with the Fusion Center (FC) judgment results. A normalization function processes the updated reputation value to derive the final global decision. A higher reputation value indicates that the SU’s local sensing result is more reliable, while a lower reputation value suggests the user may be an MU.  Results and Discussions  The multiscale entropy algorithm performs multiscale analysis of the SU perception results based on a sliding window model. This approach mitigates the impact of MU on CSS system performance by extracting effective features to counter Independent Attacks (IA) and Collaborative Attacks (CA). Simulation results show that CA affects CSS performance more significantly than IA (Fig. 3, Fig. 4). The proposed algorithm effectively identifies MUs under both attack strategies (Fig. 7, Fig. 9). Additionally, the algorithm exhibits low complexity (Table 1). When the attack probability exceeds 0.4, the MU detection rate improves by an average of 3.56% and 0.77% under IA, and by 6.45% and 36.92% under CA, respectively, compared with the baseline algorithms. These results highlight the strong anti-attack capability of the proposed algorithm.  Conclusions  This paper addresses the SSDF attack problem in CSS. The detection capability of MU is constrained by the reliability of the global decision. To mitigate this issue, a CSS method based on multiscale entropy is proposed. The method, built on the sliding window model, utilizes multiscale entropy to extract feature information that enhances judgment accuracy, thereby updating the reputation value and improving the global decision. Simulation results demonstrate that the proposed algorithm exhibits strong resistance to SSDF attacks and performs well in MU detection under both IA and CA, with lower complexity.
This approach is particularly suitable for regions with scarce spectrum resources, ensuring the reliable operation of CR systems and enabling efficient spectrum utilization by effectively identifying MU. Future work will explore the application of deep learning techniques in CSS against SSDF attacks, aiming to further enhance network performance.
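The sliding-window reputation bookkeeping can be sketched as follows; for brevity the multiscale-entropy feature extraction of the paper is replaced by a plain agreement rate between each SU's local decision and the fusion-center decision, so this is only a structural illustration.

```python
import numpy as np
from collections import deque

class ReputationTracker:
    """Sliding-window reputation bookkeeping for cooperative spectrum sensing.

    Each secondary user's local decisions are compared with the fusion-center
    decision over the last `window` slots; users whose reputation drops below
    `threshold` are flagged as potentially malicious.
    """
    def __init__(self, num_su, window=20, threshold=0.5):
        self.history = [deque(maxlen=window) for _ in range(num_su)]
        self.threshold = threshold

    def update(self, local_decisions, fc_decision):
        # Record, per user, whether the local decision agreed with the FC decision.
        for i, d in enumerate(local_decisions):
            self.history[i].append(1.0 if d == fc_decision else 0.0)

    def reputations(self):
        return np.array([np.mean(h) if h else 1.0 for h in self.history])

    def suspected_malicious(self):
        return np.where(self.reputations() < self.threshold)[0]
```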
A Beamforming Combined Iterative Dual-Maximum Ratio Combining Detection Algorithm for Orthogonal Time Frequency Space Systems
PEI Errong, JI Xianghui, SUN Yuanxin, LI Wei
2025, 47(7): 2089-2097. doi: 10.11999/JEIT241035
Abstract:
  Objective  The rapid advancement of wireless communication has introduced new waveform and modulation requirements for high-mobility scenarios such as vehicular networks, high-speed railways, and Low-Earth Orbit (LEO) satellites. Traditional Orthogonal Frequency Division Multiplexing (OFDM) performs poorly in such environments due to severe Inter-Carrier Interference (ICI). To address this, Orthogonal Time Frequency Space (OTFS), a two-dimensional modulation scheme that maps data in the Delay-Doppler (DD) domain, has been proposed. OTFS transforms complex Time-Frequency (TF) domain channels into sparse DD domain representations and has demonstrated improved performance over OFDM under high mobility. Signal detection plays a critical role in realizing OTFS benefits, and extensive studies have focused on DD-domain sparsity-based detection algorithms. However, in complex scenarios—such as urban vehicular networks, drone formations, and multi-user MIMO systems—DD-domain sparsity is often absent. This condition significantly increases detection complexity and degrades accuracy at the receiver.  Methods  A beamforming combined iterative Dual-Maximum Ratio Combining (Dual MRC) detection algorithm is proposed for OTFS systems (Algorithm 1). The approach utilizes a multi-antenna array and a beamforming network at the receiver to initially separate signals arriving from different angles within the multipath channel. This separation enhances channel matrix sparsity and provides diversity gain. By leveraging the computational simplicity of OTFS signals in the Delay-Time (DT) domain, the algorithm coherently combines multipath components within each beamforming branch and iteratively across branches. This process gradually refines the signal estimate and converges toward the optimal transmitted signal.  Results and Discussions  Simulation results show that the proposed algorithm significantly improves Bit Error Rate (BER) performance compared with several conventional detection methods. In particular, relative to the beamforming Message Passing-MRC (MP-MRC) algorithm, it achieves better BER performance (Fig. 2) and reduces both the number of iterations and the iteration time required to reach convergence (Fig. 3). The algorithm also maintains robust BER performance as terminal mobility increases (Fig. 4), as the DD grid size $N \times M$ expands (Fig. 5), and as the number of channel paths $L$ grows (Fig. 6). Furthermore, compared with MP-MRC, the proposed method reduces computational complexity by two orders of magnitude while further improving detection performance (Table 2, Fig. 2).  Conclusions  This study addresses the limitations of existing OTFS detection algorithms using multi-antenna and beamforming receivers, which often suffer from high computational complexity or limited accuracy. A beamforming combined iterative Dual MRC detection algorithm operating in the DT domain is proposed to enhance receiver performance. Simulation results show that the proposed method substantially improves BER performance compared with conventional algorithms. In particular, relative to the beamforming-based MP-MRC algorithm, it achieves a marked reduction in computational complexity while improving detection accuracy. These results indicate that the proposed algorithm offers an effective and computationally efficient solution for OTFS signal detection in complex communication environments.
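As a pointer to the combining step, the sketch below performs maximal ratio combining across beamforming branches under a simple element-wise effective-gain model; the paper's full Dual MRC iteration in the DT domain and the beam-separation front end are not reproduced.

```python
import numpy as np

def mrc_combine(y_branches, g_branches):
    """Maximal ratio combining across beamforming branches.

    y_branches : (B, N) branch observations, modeled element-wise as y_b = g_b * x + n_b.
    g_branches : (B, N) effective complex gains per branch and symbol.
    Returns the MRC estimate of the N transmitted symbols.
    """
    num = np.sum(np.conj(g_branches) * y_branches, axis=0)   # coherent combination
    den = np.sum(np.abs(g_branches) ** 2, axis=0)            # total branch energy
    return num / den
```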
Rank-Two Beamforming Algorithm Based on Alternating Optimization Assisted by Intelligent Reflecting Surface
ZHOU Kai, YU Lan, GUO Qiang
2025, 47(7): 2098-2107. doi: 10.11999/JEIT241107
Abstract:
  Objective  To address the limitations of current optimization methods for Intelligent Reflecting Surface (IRS)-aided communication systems—such as high computational complexity, lack of closed-form solutions, and real-time transmission constraints—this study proposes an efficient joint active-passive beamforming algorithm to improve spectral efficiency and real-time performance. As the number of users increases, conventional rank-1 beamforming lacks sufficient design flexibility, highlighting the need for advanced approaches to avoid performance bottlenecks. This challenge is central to the practical deployment of large-scale Multiple-Input Single-Output (MISO) systems.  Methods  A hierarchical optimization framework is proposed to resolve the non-convex design problem in IRS-assisted MISO systems. A joint beamforming model is developed for downlink multi-user scenarios, incorporating Alamouti Space–Time Block Coding (STBC) and rank-2 beamforming to maximize the Weighted Sum Rate (WSR) under total power and IRS unit modulus constraints. The framework jointly optimizes the transmit and reflection matrices to improve spectral efficiency. To address the non-convexity of the formulation, an alternating optimization strategy is adopted. At the base station, a Weighted Minimum Mean-Square Error (WMMSE) algorithm is applied to refine the rank-2 beamforming design and ensure efficient power allocation. For IRS phase shift optimization, an improved Riemannian Gradient Algorithm (RGA) is proposed. This algorithm integrates restart mechanisms and dynamic scaling vector transmission to accelerate convergence by avoiding local optima. Step size sensitivity is reduced using relaxed Wolfe conditions, which improves computational efficiency without loss of global optimality.  Results and Discussions  The improved Riemannian gradient optimization algorithm achieves faster convergence and markedly higher WSR performance, attributed to the incorporation of restart strategies and dynamic scaling vector transmission mechanisms, outperforming conventional algorithms (Fig. 3). The proposed rank-2 beamforming scheme yields substantially better system performance than traditional rank-1 techniques (Fig. 3). Simulations further evaluate the effect of varying the number of IRS reflection elements. Across different configurations, the proposed algorithm consistently enhances WSR and outperforms benchmark algorithms (Fig. 4). In addition, it maintains robust performance under varying base station transmit power levels and antenna counts, with rank-2 beamforming preserving clear advantages over rank-1 designs (Fig. 5, Fig. 6). Finally, simulation results identify optimal IRS deployment positions. System performance peaks when the IRS is placed near the base station or users, whereas intermediate placement leads to performance degradation, highlighting the critical role of deployment strategy in practical applications (Fig. 7).  Conclusions  This study addresses the problem of spectral efficiency maximization in IRS-aided communication systems by proposing a joint rank-2 beamforming and alternating optimization framework. For transmit-side optimization, the WMMSE algorithm is applied to enable efficient power allocation in the rank-2 beamforming design. In parallel, an improved RGA is developed for optimizing the IRS phase shift matrix. This algorithm incorporates adaptive initial step selection based on relaxed Wolfe conditions and integrates restart strategies to avoid local optima.
Simulation results confirm that the proposed framework achieves faster convergence and higher user sum rate performance compared to conventional algorithms. Moreover, rank-2 beamforming consistently provides superior system efficiency relative to traditional rank-1 methods across a range of scenarios.
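The Riemannian update on the unit-modulus (complex circle) manifold used for the IRS phase shifts can be sketched as a projection onto the tangent space followed by a retraction; the step-size rule (relaxed Wolfe conditions) and restart logic of the paper are omitted here.

```python
import numpy as np

def riemannian_ascent_step(theta, euclid_grad, step):
    """One Riemannian gradient-ascent step on the unit-modulus manifold.

    theta       : (N,) current IRS phase-shift vector with |theta_n| = 1.
    euclid_grad : (N,) Euclidean gradient of the objective (e.g., WSR) w.r.t. theta.
    step        : step size (selected elsewhere, e.g., by a Wolfe-type line search).
    """
    # Project the Euclidean gradient onto the tangent space of the circle manifold.
    riem_grad = euclid_grad - np.real(np.conj(euclid_grad) * theta) * theta
    # Move along the tangent direction, then retract back to unit modulus.
    theta_new = theta + step * riem_grad
    return theta_new / np.abs(theta_new)
```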
Research on Fast Iterative TDOA Localization Method Based on Spatial Grid Gradients
WANG Jie, WU Linghao, BU Xiangxi, LI Hang, LIANG Xingdong
2025, 47(7): 2108-2116. doi: 10.11999/JEIT241105
Abstract:
  Objective  In various application scenarios such as Unmanned Aerial Vehicle (UAV) formation, emergency rescue, and low-altitude intelligent networks, passive localization technologies that offer low latency and high precision are of significant practical value. The Time Difference of Arrival (TDOA) localization method is widely adopted for wireless signal source localization due to its ability to operate without requiring the target to actively transmit a signal and its strong adaptability to different environments. Among the various methods used to enhance the accuracy of TDOA localization, the Taylor Iterative Method has gained significant popularity. However, this method requires the calculation of a Taylor expansion for each iteration, resulting in a high computational load. This computational burden often leads to issues such as poor real-time performance and degraded accuracy, which hinder the application of TDOA localization technology in low-latency engineering contexts. To overcome these challenges, this paper proposes a novel TDOA rapid iterative localization method based on spatial grid gradients. The proposed method can significantly reduce computational time while maintaining high levels of localization accuracy.  Methods  The proposed approach is based on the concept of spatial gridization, incorporating insights derived from the inherent gradient relationships between neighboring grids. These relationships are leveraged to integrate the grid framework into an iterative compensation model. This integration addresses the performance limitations associated with grid width in traditional gridization algorithms, thereby enhancing the efficiency of the iterative localization process. The overall computational process is divided into two distinct stages: preprocessing and iterative localization. The preprocessing stage occurs during the system’s initialization phase and includes constructing the spatial grid, calculating the TDOA gradients between grid points, and establishing the grid-based iterative matrix. Once this preprocessing is complete, the results are stored and readily accessible for future localization processes. During the localization stage, the precomputed iteration matrix is directly invoked, along with an initial value for the target’s position. The method then calculates and compensates for the deviation between the initial value and the actual target position. By employing a grid-based approach, the significant computational workload typically encountered during iterative localization is shifted to the preprocessing phase. This leads to a marked reduction in localization time, significantly improving computational efficiency.  Results and Discussions  To validate the effectiveness and performance of the proposed algorithm, simulations and field experiments are conducted. The results are compared with those of the classic spatial gridization algorithm and the Taylor Iterative Method. It is observed that the classic spatial gridization algorithm experiences a significant loss in localization accuracy as the grid width increases, accompanied by a dramatic increase in computation time. In contrast, the proposed algorithm remains unaffected by grid width and outperforms the traditional spatial gridization method in both localization accuracy and computation time (Fig. 3).
A deeper comparison of the proposed algorithm with the Taylor Iterative Method is made by analyzing the effects of TDOA estimation errors, initial value errors, and iteration thresholds on the performance of both algorithms. Specifically, under varying TDOA estimation errors, the proposed algorithm reduces the average computation time by 76% compared to the Taylor Iterative Method, while maintaining similar localization accuracy (Fig. 4). Under varying initial value errors, the proposed algorithm reduces average computation time by 78%, with comparable localization accuracy (Fig. 5). As the iteration threshold increases, both algorithms experience a slight reduction in localization accuracy; however, their overall performance remains similar. In this scenario, the proposed algorithm still reduces computation time by approximately 76% when compared to the Taylor Iterative Method (Fig. 6). To further verify the applicability of the proposed algorithm in real-world scenarios, field experiments are also conducted. The field test results confirm the validity of the proposed method, demonstrating a 78% reduction in computation time compared to the Taylor Iterative Method, while maintaining comparable localization accuracy (Table 2).  Conclusions  The proposed TDOA fast iterative localization method, based on spatial grid gradients, effectively reduces computational complexity while maintaining localization accuracy. This method is well-suited for high real-time passive localization applications. It significantly enhances both the efficiency and practicality of TDOA localization systems. Future work will focus on expanding the applicability of this algorithm by integrating it with other localization techniques, such as Time of Arrival (TOA), Angle of Arrival (AOA), and Frequency Difference of Arrival (FDOA). This integration is expected to facilitate the development of low-altitude economic activities and contribute to advancing the capabilities of localization technologies.
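A simplified sketch of the two-stage idea, precomputing a grid-based iteration matrix and then compensating an initial position estimate, is given below; the finite-difference gradient construction and sensor geometry are illustrative stand-ins for the paper's spatial-grid formulation.

```python
import numpy as np

def tdoa_model(p, sensors, c=3e8):
    """TDOAs of position p relative to the first sensor (used as reference)."""
    d = np.linalg.norm(sensors - p, axis=1)
    return (d[1:] - d[0]) / c

def grid_gradient_matrix(p_grid, sensors, h=1.0):
    """Preprocessing stage: approximate the TDOA spatial gradient (Jacobian) at a
    grid point by central differences between neighboring grid points, and return
    its pseudo-inverse as the iteration matrix."""
    J = np.stack([(tdoa_model(p_grid + h * e, sensors) - tdoa_model(p_grid - h * e, sensors)) / (2 * h)
                  for e in np.eye(3)], axis=1)
    return np.linalg.pinv(J)

def iterative_localize(tdoa_meas, p0, iter_matrix, sensors, num_iters=5):
    """Localization stage: compensate the initial position with the stored iteration matrix."""
    p = p0.astype(float)
    for _ in range(num_iters):
        p = p + iter_matrix @ (tdoa_meas - tdoa_model(p, sensors))
    return p
```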
Joint Local Linear Embedding and Deep Reinforcement Learning for RIS-MISO Downlink Sum-Rate Optimization
SUN Jun, YANG Junlong, YANG Qingqing, HU Mingzhi, WU Ziyi
2025, 47(7): 2117-2126. doi: 10.11999/JEIT241083
Abstract:
  Objective  Reconfigurable Intelligent Surfaces (RISs) enhance signal transmission efficiency for large-scale user networks by adaptively controlling signal propagation paths. In RIS-assisted Multiple Input Single Output (MISO) systems, Deep Reinforcement Learning (DRL) is widely employed to jointly optimize Base Station (BS) beamforming and RIS phase shifts. However, the channel state space expands quadratically with the number of users, leading to increased training overhead and reduced algorithm efficiency. To address this challenge, the LLE-SAC algorithm is proposed, in which Local Linear Embedding (LLE) is integrated for dimensionality reduction with the Soft Actor-Critic (SAC) algorithm for policy optimization. This joint framework aims to improve system throughput and training efficiency by reducing the complexity of the channel state representation, thereby enabling the construction of a scalable and intelligent communication system for RIS-assisted MISO in multi-user scenarios.  Methods  The LLE-SAC algorithm models the wireless environment as a cascaded channel comprising the links between the BS, RIS, and user equipment. To reduce the dimensionality of the high-dimensional channel state, the algorithm searches for the optimal number of neighboring nodes and low-dimensional features based on the principle of minimizing reconstruction error. These parameters are selected through a randomized search strategy to ensure minimal information loss during dimensionality reduction. The LLE algorithm is then applied using the identified optimal parameters to map the original high-dimensional state into a low-dimensional representation. Parameter selection in LLE is constrained to preserve the local geometric structure of the nonlinear channel data and achieve efficient dimensionality reduction. The resulting low-dimensional state, combined with the BS transmission power and user equipment receive power, forms the input state space for the SAC algorithm. Within the SAC framework, the state space comprises the reduced-dimension representation of the cascaded channel and the BS beamforming and RIS phase shift matrix from the previous time step. The action space consists of the current BS beamforming vectors and RIS phase shifts. The reward function is defined as the sum rate of the RIS-assisted MISO system, guiding the agent to iteratively optimize its beamforming strategy. By leveraging both channel state abstraction and historical control parameters, the agent dynamically selects actions that maximize the system sum rate under complex multi-user conditions.  Results and Discussions  The LLE-SAC algorithm reduces the dimensionality of the high-dimensional cascaded channel state. It then computes the BS beamforming vectors and RIS phase shifts based on the resulting low-dimensional representation to maximize the sum rate of the RIS-assisted MISO system. Simulation results demonstrate that LLE-SAC effectively identifies the optimal number of neighboring nodes and low-dimensional features to minimize reconstruction error (Fig. 6, Fig. 7). For a system with 30 users, the minimum reconstruction error reaches 0.061 when the number of neighboring nodes is set to 2 and the dimensionality is reduced to 15, compressing the state space from 7092 to 960. In terms of training overhead (Fig. 8), the LLE-SAC algorithm reduces training time by 18.3% and computational resource usage by 64.8% relative to the conventional SAC algorithm when the user count reaches 40. 
This efficiency gain increases with user scale, further reducing training overhead in large-scale scenarios. Under high transmission power (Fig. 9), the LLE-SAC algorithm achieves a higher sum rate than both the alternating optimization and semi-definite relaxation algorithms, while maintaining comparable performance to SAC. The algorithm also scales effectively with the number of transmit antennas, achieving increased sum rates and reduced inter-user interference, further confirming its effectiveness. Moreover, in ten independent runs using different random seeds (Fig. 10), the LLE-SAC algorithm consistently yields optimal sum rate performance, demonstrating both robustness and stability.  Conclusions  The proposed method addresses the challenge of high-dimensional channel states, which significantly increase the training overhead in RIS-assisted MISO systems, by integrating the LLE algorithm with the SAC framework. This integration enables effective dimensionality reduction of the cascaded channel state, thereby lowering training costs while maintaining sum rate performance. The simulation results demonstrate three key findings. First, when the number of users reaches 40, the LLE-SAC algorithm reduces training time by 18.3% and computational resource consumption by 64.8% compared to the SAC algorithm. Second, under increasing transmission power, the proposed method achieves superior sum rate performance relative to conventional optimization methods and performs comparably to SAC. Third, across different antenna configurations, the LLE-SAC algorithm yields improved sum rates with increasing transmission power, demonstrating its robustness and scalability. Future work will explore the application of the LLE-SAC algorithm in edge computing environments with large-scale user access.
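The reconstruction-error-driven search over LLE hyperparameters can be sketched with scikit-learn's LocallyLinearEmbedding as a stand-in; the parameter grids and the real-valued flattening of the cascaded channel are assumptions for illustration.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

def reduce_channel_state(states, neighbor_grid=(12, 16, 20), dim_grid=(5, 8, 10)):
    """Search LLE hyperparameters by reconstruction error and reduce the
    flattened cascaded-channel states to a low-dimensional representation.

    states : (T, D) matrix of flattened real-valued channel-state samples
             (e.g., stacked real and imaginary parts of the cascaded channel).
    Returns the embedded states, the chosen (n_neighbors, n_components), and the error.
    """
    best = None
    for k in neighbor_grid:
        for d in dim_grid:
            lle = LocallyLinearEmbedding(n_neighbors=k, n_components=d)
            z = lle.fit_transform(states)
            err = lle.reconstruction_error_
            if best is None or err < best[0]:
                best = (err, k, d, z)
    err, k, d, z = best
    return z, (k, d), err
```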
Link State Awareness Enhanced Intelligent Routing Algorithm for Tactical Communication Networks
SHI Huaifeng, ZHOU Long, PAN Chengsheng, CAO Kangning, LIU Chaofan, LV Miao
2025, 47(7): 2127-2139. doi: 10.11999/JEIT241132
Abstract:
  Objective   Operational concept iteration, combat style innovation, and the emergence of new combat forces are accelerating the transition of warfare toward intelligent systems. In this context, tactical communication networks must establish end-to-end transmission paths through heterogeneous links, including ultra-shortwave and satellite communications, to meet differentiated routing requirements for multi-modal services sensitive to latency, bandwidth, and reliability. Existing Deep Reinforcement Learning (DRL)-based intelligent routing algorithms primarily use single neural network architectures, which inadequately capture complex dependencies among link states. This limitation reduces the accuracy and robustness of routing decisions under time-varying network conditions. To address this, a link state perception-enhanced intelligent routing algorithm (DRL-SGA) is proposed. By capturing spatiotemporal dependencies in link state sequences, the algorithm improves the adaptability of routing decision models to dynamic network conditions and enables more effective path selection for multi-modal service transmission.  Methods   The proposed DRL-SGA algorithm incorporates a link state perception enhancement module that integrates a Graph Neural Network (GNN) and an attention mechanism into a Proximal Policy Optimization (PPO) agent framework for collecting network state sequences. This module extracts high-order features from the sequences across temporal and spatial dimensions, thereby addressing the limited global link state awareness of the PPO agent’s Fully Connected Neural Network (FCNN). This enhancement improves the adaptability of the routing decision model to time-varying network conditions. The Actor-Critic framework enables periodic interaction between the agent and the network environment, while an experience replay pool continuously refines policy parameters. This process facilitates the discovery of routing paths that meet heterogeneous transmission requirements across latency-, bandwidth-, and reliability-sensitive services.  Results and Discussions   The routing decision capability of the DRL-SGA algorithm is evaluated in a simulated network comprising 47 routing nodes and 61 communication links. Its performance is compared with that of five other routing algorithms under varying traffic intensities. The results show that DRL-SGA provides superior adaptability to heterogeneous network environments. At a traffic intensity of 100 kbit/s, DRL-SGA reduces latency by 14.42% to 33.57% compared with the other algorithms (Figure 4). Network throughput increases by 2.51% to 23.41% (Figure 5). In scenarios characterized by resource constraints or topological changes, DRL-SGA consistently maintains higher service quality and greater adaptability to fluctuations in network state (Figures 7–12). Ablation experiments confirm the effectiveness of the individual components within the link state perception enhancement module in improving the algorithm’s perception capability (Table 3).  Conclusions   A link state perception-enhanced intelligent routing algorithm (DRL-SGA) is proposed for tactical communication networks. By extracting high-order features from link state sequences across temporal and spatial dimensions, the algorithm addresses the limited global link state awareness of the PPO agent’s FCNN.
Through the Actor-Critic framework and periodic interactions between the agent and the network environment, DRL-SGA enables iterative optimization of routing strategies, improving decision accuracy and robustness under dynamic topology and link conditions. Experimental results show that DRL-SGA meets the differentiated transmission requirements of heterogeneous services (latency-sensitive, bandwidth-sensitive, and reliability-sensitive) while offering improved adaptability to variations in network state. However, the algorithm may exhibit delayed convergence when training samples are insufficient in rapidly changing environments. Future work will examine the integration of diffusion models to enrich training data and accelerate convergence.
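A minimal PyTorch sketch of a graph-convolution plus temporal self-attention encoder for link-state sequences is shown below as a structural illustration; the layer sizes, readout, and normalization are assumptions and do not reproduce the paper's DRL-SGA module.

```python
import torch
import torch.nn as nn

class LinkStateEncoder(nn.Module):
    """Graph-convolution + temporal self-attention encoder for link-state sequences.

    Node features of the network graph are mixed with a normalized adjacency
    (one simple graph-convolution layer), and self-attention is then applied
    across the sequence of snapshots to produce a state embedding for the agent.
    """
    def __init__(self, feat_dim, hidden_dim=64, num_heads=4):
        super().__init__()
        self.gc_weight = nn.Linear(feat_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.readout = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x_seq, adj_norm):
        # x_seq: (T, num_nodes, feat_dim) sequence of link/node state snapshots
        # adj_norm: (num_nodes, num_nodes) symmetrically normalized adjacency
        h = torch.relu(adj_norm @ self.gc_weight(x_seq))   # spatial mixing per snapshot
        g = h.mean(dim=1).unsqueeze(0)                     # (1, T, hidden_dim) graph readout
        out, _ = self.attn(g, g, g)                        # temporal self-attention
        return self.readout(out.squeeze(0)[-1])            # embedding of the latest state
```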
Radar, Navigation and Array Signal Processing
High Precision Large Aperture Array Calibration Method for Residual Separation of Near-field Effects in Darkroom
XU Libing, LIU Kaixin
2025, 47(7): 2140-2148. doi: 10.11999/JEIT241084
Abstract:
  Objective  Direction estimation is a critical aspect of array signal processing technology. Array errors are inevitably introduced during the manufacturing and installation processes. To mitigate the negative effects of these errors on the accuracy and resolution of direction estimation, arrays must be calibrated before deployment. In practical engineering applications, active array calibration is the primary method, and performing calibration in a darkroom, which shields against electromagnetic wave interference, enhances calibration performance. However, the size of the darkroom is limited. As the array aperture increases, the distance between the calibration source and the array may fail to meet the far-field condition, leading to the introduction of nonlinear near-field phase components in the received signal. Moreover, the absence of a precise positioning system in the darkroom may result in the actual installation position of the calibration source differing from its nominal position, further compromising calibration performance. To address these issues, this paper proposes a high-precision large aperture array calibration method that separates residual near-field effects in the darkroom. This method eliminates the phase residuals caused by deviations in the calibration source position and the near-field distance of the source, effectively calibrates the amplitude and phase errors of large aperture arrays in the darkroom, and thereby improves the accuracy of direction estimation.  Methods  The proposed method utilizes the nominal coordinates of the calibration source to compensate for the phase difference caused by the near-field distance and derives the formula for the near-field effect residual resulting from source position deviations. Next, an array amplitude and phase error estimation technique at nominal coordinates is proposed to solve for the array amplitude and phase error matrix containing the phase residual. This technique constructs a cost function based on the orthogonality between the subspace spanned by the array manifold vector with amplitude-phase errors and the noise subspace of the array’s received signals. The least squares method is then applied to solve for the low-precision array error estimation results. To enhance precision, this method employs a near-field effect residual separation technique to separate the phase residuals from the solved array error matrix, thereby achieving high-precision array error estimation results. Through differential operations, the technique verifies that the near-field effect residual of each element in a uniform linear array is approximately proportional to the element’s serial number. High-precision array calibration improves the accuracy of direction estimation.  Results and Discussions  This paper proposes a high-precision near-field calibration method for large-aperture uniform linear arrays. The method addresses the calibration problem under near-field conditions and mitigates the negative effects of calibration source position deviation on the active calibration algorithm. It requires only a single calibration source, demonstrating both innovation and practicality. In Section 5, the performance of the proposed method is analyzed in detail through simulation. First, Fig. 2 verifies an important conclusion of this algorithm: the near-field effect residual of each element in a uniform linear array is approximately proportional to the element’s serial number.
4 to 7 examine the influences of various factors on array calibration performance, including calibration source position deviation, array aperture, distance between the calibration source and the array, and signal frequency. All four factors significantly impact the magnitude of the near-field residual. Specifically, increasing source position deviation, array aperture, and signal frequency, as well as decreasing the distance of the calibration source, will all increase the phase residual, which negatively affects array calibration. In Fig. 4, the proposed method demonstrates greater tolerance to severe source position deviations, maintaining high accuracy in array error estimation even under such conditions. Fig. 5 shows that increasing the number of array elements, which is equivalent to enlarging the array aperture, increases the near-field effect residual. However, this method effectively removes the residual, aiding in achieving high-precision array error estimation and restoring high-precision direction estimation for the large-aperture array. Fig. 6 investigates the influence of the distance between the calibration source and the array. Without removal of the near-field effect residual, the array error estimation accuracy rapidly decreases as the distance decreases. However, the proposed method ensures that the accuracy decreases only slowly. When the near-field distance ranges from 20 m to 50 m, the accuracy remains nearly unchanged. This simulation clearly demonstrates the effectiveness of the method in removing near-field effect residuals. High-frequency signals theoretically provide excellent direction estimation accuracy. However, higher signal frequencies lead to more severe near-field phase residuals, and ineffective array calibration can further degrade direction estimation accuracy. In Fig. 7, without removing the near-field phase residual, the direction estimation accuracy does not improve even with an increase in signal frequency. Fortunately, after removing the near-field effect residual with the proposed method, high-precision direction estimation performance is restored for high-frequency signals.  Conclusions  This paper addresses the challenge that large-aperture arrays often fail to satisfy the far-field distance condition in a darkroom. Additionally, due to instrument measurement errors, obtaining precise calibration source position coordinates in the darkroom is difficult, which complicates array calibration. To address these issues, a high-precision large-aperture array calibration method for residual separation of near-field effects in a darkroom is proposed. This method not only achieves near-field array calibration but also mitigates the phase errors caused by source position deviation, resulting in high-precision estimation of array amplitude and phase errors. The method compensates for the phase difference of the near-field path and constructs a cost function to estimate array amplitude and phase errors using the orthogonality between the signal and noise subspaces. The calibration source position deviation introduces near-field effect phase residuals. This paper analyzes the relationship between the phase residual and the array element serial number, solves for and removes the phase residual, thus obtaining high-precision array error estimation results. Simulation results demonstrate that the proposed method has a high tolerance for source position deviation.
Even under conditions of large array aperture, high signal frequency, and close proximity between the calibration source and the array, the method remains effective in calibrating the array and significantly improves the accuracy of direction estimation.
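The key separation step, that the near-field phase residual grows roughly linearly with the element serial number and can therefore be fitted and removed, can be illustrated with a small numerical sketch. The array size, residual slope, and error levels below are assumptions chosen for illustration, not the paper's parameters.

```python
import numpy as np

# Illustrative sketch (not the authors' code): a raw per-element phase-error
# estimate is contaminated by a near-field residual that is approximately
# proportional to the element's serial number; fitting that linear trend and
# subtracting it recovers the genuine amplitude-phase errors.
rng = np.random.default_rng(0)
N = 32                                        # number of array elements (assumed)
n = np.arange(N)                              # element serial numbers
true_phase_err = rng.uniform(-0.2, 0.2, N)    # genuine per-element phase errors (rad)
residual = 0.05 * n                           # near-field residual, slope assumed
raw_estimate = true_phase_err + residual      # low-precision estimate from the LS step

# A least-squares line fit over n separates the residual; only the slope term is
# removed, since a common phase offset does not affect direction estimation.
slope_hat = np.polyfit(n, raw_estimate, 1)[0]
refined = raw_estimate - slope_hat * n

rms = lambda e: np.sqrt(np.mean(e ** 2))
print(f"fitted residual slope: {slope_hat:.3f} (true 0.050)")
print(f"RMS phase error before/after separation: "
      f"{rms(raw_estimate - true_phase_err):.3f} / {rms(refined - true_phase_err):.3f}")
```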
Estimation Method of Target Propeller Parameters under Low Signal-to-noise Ratio
HAN Chuang, LENG Bing, LAN Chaofeng, XING Bowen
2025, 47(7): 2149-2162. doi: 10.11999/JEIT240790
Abstract:
  Objective  Accurate estimation of underwater propeller parameters—such as blade number, blade length, and rotational speed—is critical for target identification in marine environments. However, low Signal-to-Noise Ratio (SNR) conditions, caused by complex underwater clutter and ambient noise, substantially degrade the performance of conventional micro-Doppler feature extraction methods. Existing approaches, including Fourier Transform (FT), wavelet analysis, and Hilbert-Huang Transform (HHT), are limited in handling non-stationary signals and are highly susceptible to noise, leading to unreliable parameter estimation. To address these limitations, this study proposes a method that integrates Complex Variational Mode Decomposition (CVMD) for signal denoising with Orthogonal Matching Pursuit (OMP) for sparse parameter estimation. The combined approach improves robustness against noise while maintaining computational efficiency. This method contributes to advancing underwater acoustic target recognition in interference-rich environments and offers a theoretical basis for improving the reliability of marine detection systems.  Methods  The proposed method integrates CVMD and OMP to improve the estimation of propeller parameters in low-SNR environments. The approach consists of three sequential phases: signal decomposition and denoising, time-frequency feature extraction, and sparse parameter estimation. This structure enhances robustness to noise while maintaining computational efficiency. CVMD extends conventional Variational Mode Decomposition (VMD) to the complex domain, enabling adaptive decomposition of propeller echo signals into Intrinsic Mode Functions (IMFs) with preserved spectral symmetry. Unlike standard VMD, which cannot process complex-valued signals directly, CVMD treats the real and imaginary parts of the noisy signal separately. The decomposition is formulated as a constrained optimization problem, where IMFs are iteratively extracted by minimizing the total bandwidth of all modes. A correlation-based thresholding scheme is then used to identify and discard noise-dominated IMFs. The remaining signal-related IMFs are reconstructed to obtain a denoised signal, effectively isolating micro-Doppler features from background clutter. Time-frequency analysis is subsequently applied to the denoised signal to extract key scintillation parameters, including blade parity, scintillation intervals, and the maximum instantaneous micro-Doppler frequency. These parameters are used as prior information to constrain the parameter search space and reduce computational burden. Blade parity, inferred from the symmetry of the time-frequency distribution, narrows the candidate blade number range by half. Scintillation intervals and frequency bounds are also used to define physical constraints for sparse estimation. A sparse dictionary is constructed using Sinusoidal Frequency-Modulated (SFM) atoms, each corresponding to a candidate blade number. The OMP algorithm iteratively selects the atom most correlated with the residual signal, updates the sparse coefficient vector, and refines the residual until convergence. Incorporating prior information into dictionary design significantly reduces its dimensionality, transforming a multi-parameter estimation problem into an efficient single-parameter search. This step allows precise estimation of the blade number with minimal computational cost. 
Once the blade number is determined, the blade length and rotational speed are derived analytically using the relationships between the micro-Doppler frequency, scintillation period, and geometric parameters of the propeller.  Results and Discussions  The proposed CVMD-OMP framework demonstrates robust performance in propeller parameter estimation under low-SNR conditions, as verified through comprehensive simulations. The denoising efficacy of CVMD is illustrated by the reconstruction of distinct time-frequency features from heavily noise-corrupted propeller echoes (Fig. 10). By decomposing the complex-valued signal into IMFs and retaining only signal-dominant components, CVMD achieves a 12.4 dB improvement in SNR and reduces the Mean Square Error (MSE) to 0.009 at SNR = –10 dB, outperforming conventional methods such as EMD-WT and CEEMDAN-WT (Table 3). Time-frequency analysis of the denoised signal reveals clear periodic scintillation patterns (Fig. 11), which enable accurate extraction of blade parity and scintillation intervals. Guided by these prior features, the OMP algorithm achieves 91.9% accuracy in blade number estimation at SNR = –10 dB (Table 4). Accuracy improves progressively with increasing SNR, reaching 98% at SNR = 10 dB, highlighting the method’s adaptability to varying noise levels. The sparse dictionary, refined through prior-informed dimensionality reduction, maintains high precision while minimizing computational complexity. Comparative evaluations confirm that OMP outperforms CoSaMP and Subspace Pursuit (SP) in both estimation accuracy and computational efficiency. The execution time is reduced to 1.73 ms for single-parameter estimation (Fig. 15, Table 5). Parameter estimation consistency is further validated through the calculation of blade length and rotational speed. At SNR = –10 dB, the Mean Absolute Error (MAE) for blade length is 0.021 m, and 0.31 rad/s for rotational speed (Table 6). Both errors decrease significantly with improved SNR, demonstrating the method’s robustness across diverse noise conditions. The framework remains stable in multi-blade configurations, with extracted time-frequency characteristics closely matching theoretical expectations (Figs. 2 and 3). The integration of CVMD and OMP effectively balances accuracy and computational efficiency under low-SNR conditions. By leveraging prior-informed dimensionality reduction, the framework achieves a 90% reduction in computational load relative to conventional techniques. Future research will extend this framework to multi-target environments and validate its performance using real-world underwater acoustic datasets.  Conclusions  This study addresses the challenge of estimating underwater propeller parameters under low SNR conditions by proposing a novel framework that integrates CVMD and OMP. CVMD demonstrates strong capability in suppressing noise while preserving key micro-Doppler features, allowing reliable extraction of target signatures from severely corrupted signals. By incorporating time-frequency characteristics as prior knowledge, OMP enables accurate and efficient blade number estimation, substantially reducing computational complexity. The proposed framework shows high adaptability to varying noise levels and propeller configurations, ensuring robust performance in complex underwater environments. Its balance between estimation accuracy and computational efficiency supports real-time application in acoustic target recognition. 
The consistency of results with theoretical models further supports the method’s physical interpretability and practical relevance. Future work will extend this approach to multi-target scenarios and validate its effectiveness using experimental acoustic datasets, advancing the deployment of model-driven methods in real-world marine detection systems.
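A compact sketch of the sparse-estimation stage may help: the dictionary below holds one SFM-like atom per candidate blade number, and a single OMP iteration selects the best-matching atom. The sampling rate, rotation rate, modulation index, and noise level are assumptions for illustration; the published dictionary additionally encodes the prior scintillation features described above.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: select k unit-norm atoms (columns of D) that
    best explain y, re-fitting the coefficients by least squares at every step."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.conj().T @ residual))))
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    return support, coeffs

# Toy dictionary: one SFM-like atom per candidate blade number (illustrative only).
fs, T = 2000, 1.0
t = np.arange(0, T, 1 / fs)
f_rot, beta = 4.0, 30.0                       # assumed rotation rate (Hz) and modulation index
cands = list(range(2, 8))                     # candidate blade numbers
atoms = [np.exp(1j * beta * np.sin(2 * np.pi * c * f_rot * t)) for c in cands]
D = np.stack([a / np.linalg.norm(a) for a in atoms], axis=1)

rng = np.random.default_rng(1)
true_blades = 5
y = 3.0 * D[:, cands.index(true_blades)] + 0.3 * (
    rng.standard_normal(len(t)) + 1j * rng.standard_normal(len(t)))
support, _ = omp(D, y, k=1)
print("estimated blade number:", cands[support[0]])   # expected: 5
```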
Joint Design of Integrated Sensing And Communication Waveforms and Receiving Filters
LIU Tao, LI Xiangxuan, LI Yubo
2025, 47(7): 2163-2171. doi: 10.11999/JEIT241082
Abstract:
  Objective  With the continuous increase in wireless communication traffic and the growing scarcity of spectrum resources, the overlapping frequency bands of communication and radar systems have led to mutual interference. Therefore, Integrated Sensing And Communication (ISAC) has emerged as a critical area of research. A key technology in this field is the design of ISAC waveforms, which is of significant research interest. The Doppler resilience and low sidelobe levels of ISAC waveforms are essential for effective target detection and information transmission in ISAC scenarios. However, designing waveforms that optimize both radar and communication performance presents substantial challenges. To address these challenges, a method for the joint design of ISAC waveforms and receiving filters is proposed. An ISAC model based on mismatched filtering is proposed, and an optimization problem is formulated. The Iterative Twisted appROXimation (ITROX) algorithm is presented to solve this nonconvex problem with guaranteed convergence. This approach enables the design of unimodular ISAC waveforms with Doppler resilience, achieving enhanced performance in both communication and radar functions.  Methods  To design ISAC waveforms that optimize radar and communication performance, the concept of mismatched filtering is introduced to formulate an optimization problem. The requirements for Doppler-resilient ISAC waveforms are first analyzed, followed by the proposal of a waveform model based on mismatched filtering. An optimization problem is then formulated, with the objective of minimizing the Weighted Integrated Sidelobe Level (WISL) and the Loss-in-Processing Gain (LPG). Constraints include the unimodular property of the transmitted waveform, the phase difference between the transmitted ISAC waveform and the communication data-modulated waveform, and the energy of the receiving filter. To solve this nonconvex optimization problem, the task is transformed into identifying a suitable Mismatched Filtering Sequences Pair (MFSP) under multiple constraints. An ISAC waveform design algorithm based on an improved ITROX framework is proposed to simplify the optimization process. The core concept of the ITROX algorithm is to iteratively search for the optimal projection of the matrix set, with the goal of maximizing the main lobe and minimizing the sidelobes within the region of interest. This approach minimizes WISL and LPG, satisfying the objective function requirements. Additionally, the combination of the three constraints ensures that the waveform meets both communication and radar sensing requirements. The SQUAREd Iterative Method (SQUAREM) is employed to improve the algorithm's convergence speed. The balance between WISL and LPG is controlled by adjusting the coefficients.  Results and Discussions  The ITROX-based ISAC waveform design method proposed in this paper effectively solves the formulated optimization problem, resulting in unimodular ISAC waveforms with Doppler resilience. Compared to existing ISAC waveform methods, the proposed ISAC waveform demonstrates a lower sidelobe level and Symbol Error Rate (SER) within the region of interest, with only a minor sacrifice in LPG. This leads to significant improvements in both radar sensing and communication performance. Simulation results validate the effectiveness of the proposed ISAC waveforms. These results show that the proposed method exhibits excellent convergence, with WISL rapidly converging to a stable value as iterations increase (Fig. 1). 
When the LPG coefficient is set to 0.9, a low sidelobe level and SER are achieved, despite a processing gain loss of 0.91 dB (Fig. 2, Fig. 3). For the same phase difference threshold, the proposed ISAC waveform exhibits a lower SER than existing methods, indicating superior communication performance (Fig. 4). When comparing the ISAC waveform designed by this method to existing methods with the same time-delay interval width, the proposed waveform demonstrates a lower sidelobe level, with sidelobes nearly zero, approaching ideal correlation performance (Fig. 5, Fig. 6). This leads to significant improvements in target detection by the ISAC system. Furthermore, the proposed ISAC waveform exhibits excellent Doppler resilience, maintaining low sidelobe levels within the given Doppler interval (Fig. 6), which contributes to improved target detection performance.  Conclusions  This paper proposes a method for the joint design of ISAC waveforms and receiving filters based on Doppler resilience. By integrating the concept of mismatched filtering with the ISAC model, an optimization problem is formulated to minimize WISL and LPG without compromising communication quality. Additionally, an improved ITROX algorithm is proposed to effectively solve the formulated nonconvex optimization problem. The results demonstrate that the proposed scheme maintains near-ideal correlation performance within the region of interest under specified Doppler intervals, with only a minor sacrifice in LPG, and enables communication with a low SER. Compared to existing ISAC waveform methods, the proposed ISAC waveform exhibits a lower sidelobe level and SER, showing superior radar sensing and communication performance. Furthermore, low sidelobe levels can be achieved in one or more regions of interest to meet different requirements by appropriately adjusting the weighting coefficient. Future work could explore more efficient optimization algorithms to design ISAC waveforms with enhanced Doppler resilience.
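For reference, the two figures of merit being traded off can be computed directly for any transmit-sequence/receive-filter pair; the definitions below follow common mismatched-filter conventions, and the example sequences are placeholders rather than waveforms produced by the ITROX design.

```python
import numpy as np

def xcorr(s, h):
    """Aperiodic cross-correlation r[k] = sum_n s[n] * conj(h[n - k]), k = -(N-1)..N-1."""
    N = len(s)
    r = np.zeros(2 * N - 1, dtype=complex)
    for k in range(-(N - 1), N):
        for n in range(N):
            if 0 <= n - k < N:
                r[k + N - 1] += s[n] * np.conj(h[n - k])
    return r

def wisl_and_lpg(s, h, weights=None):
    """WISL over the sidelobe lags and loss-in-processing gain (dB) relative to the
    matched filter, using one common definition of each quantity."""
    N = len(s)
    r = xcorr(s, h)
    sidelobes = np.delete(np.abs(r) ** 2, N - 1)          # drop the zero-lag mainlobe
    w = np.ones_like(sidelobes) if weights is None else weights
    wisl = float(np.sum(w * sidelobes))
    lpg_db = -10 * np.log10(np.abs(r[N - 1]) ** 2
                            / (np.linalg.norm(s) ** 2 * np.linalg.norm(h) ** 2))
    return wisl, lpg_db

rng = np.random.default_rng(0)
s = np.exp(1j * 2 * np.pi * rng.random(64))    # unimodular transmit sequence (placeholder)
print("matched filter   :", wisl_and_lpg(s, s))                      # LPG = 0 dB by construction
print("perturbed filter :", wisl_and_lpg(s, s + 0.1 * rng.standard_normal(64)))
```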
Aerospace Tracking Telemetry and Command On-the-spot Access Technology for Low-orbit and High-density Satellite Constellations
DONG Guangliang, HAO Wanhong, ZHANG Guoting, TANG Da, GUO Jie
2025, 47(7): 2172-2182. doi: 10.11999/JEIT240466
Abstract:
  Objective   The rapid proliferation of Low Earth Orbit (LEO) mega-constellations has introduced significant challenges to traditional aerospace Tracking, Telemetry, and Command (TT&C) systems. These systems struggle to meet the growing demands of large-scale satellite operations due to limited capacity and inefficient resource management. This study proposes an innovative aerospace TT&C on-the-spot access technology, inspired by mobile communication systems, to address these challenges. The key objectives include enabling automatic, “base station-like” access for massive LEO satellite constellations, decoupling access and service links to enhance system scalability, and establishing a distributed, end-to-end TT&C network capable of supporting over 20,000 satellites in orbit.  Methods   The proposed technology integrates three core components: (1) Full-Time Panoramic Beam Coverage: Ground stations employ full airspace antenna arrays to generate low-power, continuous panoramic beams, ensuring 7×24 h signal coverage. This setup enables satellites entering a station’s coverage zone to automatically establish bidirectional links via dedicated access control channels. (2) Hybrid Multiple Access Scheme: A combination of Space Division Multiple Access (SDMA) and Spread spectrum ALOHA Multiple Access (SAMA) is employed. SDMA partitions the airspace into sectors using phased-array beams, while SAMA utilizes pseudo-random sequences and time-slotted ALOHA to resolve contention among satellites. This hybrid approach optimizes complexity, backward compatibility, and scalability. (3) Distributed Network Architecture: The system shifts from a centralized, schedule-driven model to a decentralized framework, enabling TT&C users to interact directly with satellites via ground stations. Key innovations include: (1) Dedicated Access Control Channels: Separate from service links, these channels manage handshaking, status reporting, and emergency requests. (2) Channelized Beamforming: Flexible beam control allows dynamic resource allocation and ensures compatibility between legacy parabolic antennas and new phased-array systems. (3) Parallel Acquisition: Parallel sampling-processing pipelines and multi-channel frequency search strategies enhance acquisition through correlated parallelism and buffering techniques. Experimental validation demonstrates a success rate exceeding 95% for 8-user concurrent acquisition under typical SNR conditions, effectively balancing Doppler dynamics, timing, SNR thresholds, and hardware constraints amid interference. System performance is assessed using a Poisson process model to simulate satellite arrival rates and collision probabilities, with Monte Carlo simulations for link reliability and capacity estimation.  Results and Discussions   Compared to traditional single-beam TT&C systems, the proposed technology eliminates the need for manual scheduling, reduces latency, and enables parallel service for hundreds of satellites. The hybrid SDMA-SAMA scheme mitigates the “near-far effect” and signal collisions, achieving an optimal balance between complexity and performance. Integration with existing infrastructure, such as legacy antennas, ensures cost-effectiveness and facilitates gradual deployment. Under typical conditions, the success probability for a single satellite to gain access to the ground station exceeds 99.75% within 10 seconds (after two transmission attempts).
Conclusions   This study introduces a groundbreaking solution to the TT&C challenges posed by mega-constellations. The on-the-spot access technology redefines satellite-ground interactions by emulating mobile communication principles, facilitating automatic, distributed, and scalable operations. Key achievements include: (1) Decoupling Access and Service Links: This architectural shift effectively resolves capacity bottlenecks inherent in traditional systems. (2) Hybrid Multiple Access: The SDMA-SAMA combination ensures backward compatibility while supporting future expansions. (3) Operational Flexibility: Both ground operators and satellites can initiate TT&C sessions, enhancing responsiveness in emergency scenarios. Future work will focus on integrating artificial intelligence for predictive resource allocation and extending the framework to relay satellite systems for global coverage. The proposed system represents a significant advancement toward efficient, autonomous, and large-scale space infrastructure management.
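The capacity assessment described above can be sketched with a minimal Monte Carlo model: Poisson arrivals per access slot, a random choice among a handful of spreading codes, and a collision whenever two arrivals pick the same code. The arrival rate, code count, and the independent-retry approximation are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(7)
lam = 0.5          # mean satellite access requests per slot (assumed)
n_codes = 8        # pseudo-random spreading codes available per sector (assumed)
n_slots = 100_000

def slot_outcome(n_arrivals):
    """One SAMA-style slot: each arrival picks a code at random; an arrival succeeds
    if no other arrival in the same slot chose the same code."""
    if n_arrivals == 0:
        return 0, 0
    codes = rng.integers(0, n_codes, n_arrivals)
    _, counts = np.unique(codes, return_counts=True)
    return int(np.sum(counts == 1)), n_arrivals

succ = tried = 0
for k in rng.poisson(lam, n_slots):            # Poisson arrival process per slot
    s, t = slot_outcome(k)
    succ, tried = succ + s, tried + t

p1 = succ / tried                              # per-attempt success probability
# Retries are treated as independent attempts, a simplification of the real protocol.
print(f"single-attempt success ≈ {p1:.4f}, after two attempts ≈ {1 - (1 - p1) ** 2:.4f}")
```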
Cryption and Network Information Security
Bootstrapping Optimization Techniques for the FINAL Fully Homomorphic Encryption Scheme
ZHAO Xiufeng, WU Meng, SONG Weitao
2025, 47(7): 2183-2193. doi: 10.11999/JEIT241036
Abstract:
  Objective  Bootstrapping is a fundamental process in Fully Homomorphic Encryption (FHE) that directly affects its practical efficiency. The FINAL scheme, presented at ASIACRYPT 2022, achieves a 28% improvement in bootstrapping speed compared with TFHE, demonstrating high suitability for homomorphic Boolean operations. Nevertheless, further improvements are required to reduce its computational overhead and storage demands. This study aims to optimize the bootstrapping phase of FINAL by lowering its computational complexity and key size while preserving the original security level.  Methods  This study proposes two key optimizations. Accumulator compression for blind rotation: A blockwise binary distribution is incorporated into the Learning With Errors (LWE) key generation process. By organizing the key into blocks, each requiring only a single external product, the number of external product operations during blind rotation is reduced. Key reuse strategy for key switching: The LWE key is partially reused during the generation of the Number-theoretic Gadget Switching (NGS) key. The reused portion is excluded from the key switching key, thereby reducing both the key size and the number of associated operations.  Results and Discussions  Under equivalent security assumptions, the optimized FINAL scheme yields substantial efficiency gains. For blind rotation, the number of external product operations is reduced by 50% (from 610 to 305), and the number of Fast Fourier Transform (FFT) operations is halved (from 3,940 to 1,970) (Table 5). For key switching, the key size is reduced by 60% (from 11,264 to 4,554), and the computational complexity decreases from 13.8 × 10⁶ to 5.6 × 10⁶ scalar operations (Table 6).  Conclusions  The proposed optimizations substantially improve the efficiency of the FINAL scheme’s bootstrapping phase. Blind rotation benefits from structured key partitioning, reducing the number of core operations by half. Key switching achieves comparable reductions in both storage requirements and computational cost through partial key reuse. These enhancements improve the practicality of FHE for real-world applications that demand efficient evaluation of Boolean circuits. Future directions include hardware acceleration and adaptive parameter tuning.
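A quick back-of-the-envelope check of the reported gains; the block size of 2 is an assumption used only to illustrate how a blockwise binary key halves the blind-rotation work.

```python
# Rough accounting of the figures quoted from Tables 5 and 6; block size is assumed.
n_lwe = 610                                   # LWE dimension: external products before
block_size = 2
print("external products:", n_lwe, "->", n_lwe // block_size)        # 610 -> 305
print("FFT operations   :", 3940, "->", 3940 // 2)                   # halved alongside
ksk_before, ksk_after = 11264, 4554
print(f"key-switching key size: {ksk_before} -> {ksk_after} "
      f"({1 - ksk_after / ksk_before:.0%} smaller)")
```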
Dual-Memristor Brain-Like Chaotic Neural Network and Its Application in IoMT Data Privacy Protection
LIN Hairong, DUAN Chenxing, DENG Xiaoheng, Geyong Min
2025, 47(7): 2194-2210. doi: 10.11999/JEIT241133
Abstract:
  Objective  In recent years, frequent breaches of medical data have posed significant threats to patient privacy and health security, highlighting the urgent need for effective solutions to protect medical data privacy and security during transmission. This paper proposes a novel data privacy protection method for the Internet of Medical Things (IoMT) based on a dual-memristor-inspired brain-like chaotic neural network to address this challenge.  Methods  Leveraging the synaptic bionic characteristics of memristors, a dual-memristor brain-like chaotic neural network model based on the Hopfield neural network is developed. The complex chaotic dynamics of this model are thoroughly analyzed using nonlinear dynamics tools, including bifurcation diagrams, Lyapunov exponent spectra, phase portraits, time-domain waveforms, and basins of attraction. To validate its practicality and reliability, a hardware platform is created using a Microcontroller Unit (MCU), and hardware experiments confirm the model’s complex dynamic behaviors. Based on this model, an efficient IoMT data privacy protection method is designed by utilizing the complex chaotic properties of the dual-memristor brain-like chaotic neural network. A comprehensive security analysis of the encryption of colored medical image data is also performed.  Results and Discussions  The results demonstrate that the proposed network not only exhibits complex grid-like multi-structure chaotic attractors but also possesses the capability to regulate planar initial condition displacements, significantly enhancing its potential for cryptographic applications. Experimental findings indicate that this method performs exceptionally well across key metrics, including a large key space, low pixel correlation, high key sensitivity, and strong robustness against noise and data loss attacks.  Conclusions  This study presents an innovative and effective solution for protecting medical data privacy in IoMT environments, providing a solid foundation for the development of secure technologies in intelligent healthcare systems.
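As a toy illustration of how a chaotic sequence can drive image encryption, the sketch below uses a logistic map as a stand-in keystream generator; the actual scheme derives its sequences from the dual-memristor Hopfield dynamics and includes permutation and diffusion stages well beyond this minimal XOR example.

```python
import numpy as np

def chaotic_keystream(length, x0=0.37, r=3.99):
    """Byte keystream from a logistic map; a stand-in for the chaotic neural network."""
    x, out = x0, np.empty(length, dtype=np.uint8)
    for i in range(length):
        x = r * x * (1 - x)                 # logistic map iteration
        out[i] = int(x * 256) % 256         # quantize state to one byte
    return out

img = np.random.default_rng(0).integers(0, 256, (4, 4), dtype=np.uint8)  # toy "medical image"
ks = chaotic_keystream(img.size).reshape(img.shape)
cipher = img ^ ks                           # XOR diffusion with the chaotic keystream
assert np.array_equal(cipher ^ ks, img)     # decryption with the same key / initial condition
```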
A Network Traffic Anomaly Detection Method Integrating Unsupervised Adaptive Sampling with Enhanced Siamese Network
YIN Zinuo, CHEN Hongchang, MA Hailong, HU Tao, BAI Luxin
2025, 47(7): 2211-2224. doi: 10.11999/JEIT241115
Abstract:
  Objective  The increasing complexity of network architectures and the rising frequency of cyberattacks have heightened the demand for effective network traffic anomaly detection. While machine learning and deep learning approaches have been widely applied, their effectiveness is often limited by the class imbalance commonly observed in network traffic data. To address this limitation, this study proposes a network traffic anomaly detection method integrating unsupervised adaptive sampling with an enhanced Siamese network. An adaptive sampling algorithm is developed to balance the distribution of normal and anomalous traffic, improving the representativeness of training data. A Siamese Multi-Layer Perceptron (SMLP) model is then trained using a robust loss function to capture both similarities and differences in traffic patterns. This architecture enhances the model’s ability to identify anomalies, particularly under class-imbalance conditions. The proposed framework provides a scalable and data-efficient approach for improving the accuracy of network anomaly detection and reinforcing cybersecurity.  Methods  The proposed K-medoids-based Adaptive Few-shot Sampling (KAFS) algorithm applies unsupervised K-medoids clustering to group traffic data within each category based on feature distributions, forming multiple subclasses. From these, a small number of representative samples are adaptively selected to construct a balanced few-shot training set. This approach maintains a proportionate representation of normal and attack traffic, reducing model bias toward the dominant normal class and ensuring more effective learning across categories. Sample quality is further improved by prioritizing representativeness during selection. For the constructed training set, a traffic anomaly detection model based on an SMLP is designed. The model’s loss function combines encoding loss from the MLP with a prediction loss defined by the distance between anchor samples and corresponding normal or malicious samples. This structure enables the model to distinguish both similarities and subtle differences in traffic behavior, thereby enhancing the accuracy of attack traffic detection.  Results and Discussions  The proposed network traffic anomaly detection method, which integrates unsupervised adaptive sampling with an enhanced Siamese network, demonstrates strong performance on the CICIDS2017 and CICIDS2018 datasets. As shown in Fig. 8, the SMLP model trained using traffic samples generated by the KAFS sampling algorithm achieves superior detection performance, confirming the effectiveness of the KAFS approach. In Fig. 9, detection accuracies of 99.80% and 98.26% are achieved for attack-class traffic in the CICIDS2017 and CICIDS2018 datasets, respectively. Evaluation metrics presented in Fig. 9 and Fig. 10 show that the proposed method consistently outperforms other Siamese network architectures and loss functions in terms of accuracy, precision, Detection Rate (DR), and F1-score, further supporting the validity of the SMLP design. As shown in Tables 4 and 6, the method attains detection performance comparable to that of state-of-the-art algorithms while using substantially fewer samples, highlighting its suitability for practical deployment where data availability may be limited. Statistical analysis of the results (Tables 5 and 8) confirms that the performance gains achieved by the proposed method are statistically significant. Fig. 11 and Fig. 
12 further illustrate that the method delivers notable improvements over existing approaches in detecting unknown attack types, demonstrating its adaptability and robustness under evolving threat conditions.  Conclusions  This study addresses the challenges of sparse attack traffic and class imbalance in network traffic anomaly detection by proposing a method that combines unsupervised adaptive sampling with an enhanced Siamese network. The KAFS algorithm is designed to dynamically select representative training sets using unsupervised clustering. To enable accurate detection with limited input samples, an SMLP is developed to compute distances between traffic samples. A robust loss function is introduced, incorporating both encoding loss from the MLP and prediction loss based on the distance between anchor, normal, and malicious samples, thereby improving training efficiency. Experimental validation using the CICIDS2017 and CICIDS2018 datasets confirms the method’s effectiveness in detecting attack traffic with few samples. Future work will focus on further enhancing few-shot intrusion detection techniques to improve detection accuracy in real-world network environments.
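A minimal sketch of the sampling idea, assuming numeric feature vectors and a simple alternating k-medoids routine (not the published KAFS implementation): each traffic class is clustered into subclasses, and only a few samples nearest each medoid are kept, yielding a balanced few-shot training set.

```python
import numpy as np

def kmedoids(X, k, iters=50, seed=0):
    """Simple alternating k-medoids clustering, adequate for a sketch."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                new_medoids[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

def kafs_like_select(X, y, subclasses=4, per_subclass=5):
    """Per class: cluster into subclasses, keep the samples nearest each medoid."""
    keep = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        med, lab = kmedoids(X[idx], min(subclasses, idx.size))
        for j, m in enumerate(med):
            members = idx[lab == j]
            d = np.linalg.norm(X[members] - X[idx[m]], axis=1)
            keep.extend(members[np.argsort(d)[:per_subclass]])
    return np.array(keep)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 8)), rng.normal(3, 1, (25, 8))])   # imbalanced toy data
y = np.array([0] * 500 + [1] * 25)                                       # 0 = normal, 1 = attack
sel = kafs_like_select(X, y)
print("selected per class:", {int(c): int(np.sum(y[sel] == c)) for c in (0, 1)})
```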
Image and Intelligent Information Processing
An Improved Interacting Multiple Model Algorithm and Its Application in Airport Moving Target Tracking
LU Qixing, TANG Xinmin, QI Ming, GUAN Xiangmin
2025, 47(7): 2225-2236. doi: 10.11999/JEIT241150
Abstract:
  Objective  With the rapid growth of air traffic and expanding airport infrastructure, airport surfaces have become increasingly complex and congested. Higher aircraft density on taxiways and runways, increased ground vehicles, and obstacles complicate surface operations and heighten the risk of conflicts due to limited situational awareness. Pilots and ground controllers may struggle to obtain accurate environmental data, leading to potential safety hazards. To enhance surface surveillance and reduce collision risks, this study proposes an improved Interacting Multiple Model (IMM) filtering algorithm with adaptive transition probabilities. Unlike traditional IMM algorithms that rely on a fixed Markov transition probability matrix, the proposed method dynamically adjusts state transition probabilities to better adapt to operational conditions. This approach enhances tracking accuracy and improves aircraft trajectory prediction on airport surfaces, thereby increasing the safety and stability of ground operations.  Methods  The proposed algorithm integrates observation data and filtering residuals, constructing a fuzzy inference system for maneuver intensity using a fuzzy inference algorithm. This system infers the mapping relationship between observation data and the explicit state set in the Hidden Markov Model (HMM), deriving the corresponding state sequence. This process accurately captures target state changes, enhancing behavior prediction. The Baum-Welch algorithm in HMM is applied to solve the state transition matrix and update the observation probability matrix in real time, optimizing the adaptive update strategy for state transition probabilities. This improves model adaptability and accuracy across different environments. The algorithm integrates the fuzzy inference system for maneuver intensity with HMM and incorporates it into the IMM algorithm, forming a Fuzzy Hidden Markov-Interacting Multiple Model (FHMM-IMM) algorithm for real-time maneuvering target estimation. This approach significantly enhances tracking accuracy, particularly in complex and dynamic environments, ensuring high precision and stability for practical applications.  Results and Discussions  The proposed improved IMM algorithm is validated using actual airport surface ADS-B trajectory data. The results show that the algorithm adaptively adjusts parameters under non-equidistant prediction conditions, maintaining stable tracking performance (Fig. 8). The position, velocity, and acceleration tracking error curves in both two-dimensional and one-dimensional Cartesian coordinates indicate a significant reduction in overall error, enhancing tracking accuracy (Figs. 9, 10, and 11). Comparison with other algorithms confirms that the proposed method achieves a more stable tracking trajectory with lower errors, demonstrating superior performance (Figs. 12, 13, 14, and 15). According to Tables 2, 3, and 4, the two-dimensional position tracking accuracy improves by 63.5%, 54.3%, 40.3%, and 22.7%. The X-direction position tracking accuracy improves by 44.9%, 51.8%, 33.8%, and 35.2%, while the Y-direction position tracking accuracy improves by 63.9%, 62.9%, 52.7%, and 43.4%. The algorithm meets the real-time operational requirements of airport surface monitoring, further validating its effectiveness.  Conclusions  This study highlights the importance of precise four-dimensional trajectory tracking and prediction for airport surface aircraft, particularly in complex environments. 
Accurate trajectory tracking enhances taxiing safety and operational efficiency, addressing the challenges posed by increasing aircraft density on runways and taxiways. To improve tracking accuracy, an improved IMM algorithm with adaptive transition probabilities, based on Kalman filtering, is proposed. The main contributions are as follows: (1) A fuzzy inference system for maneuver intensity is developed, deriving explicit and hidden state sets and corresponding state sequences to capture target dynamics more accurately. (2) The FHMM-IMM algorithm is introduced for real-time estimation of maneuvering targets, incorporating time-varying state transition probabilities to enhance multi-model tracking and prediction in dynamic environments. (3) Experimental validation using real ADS-B trajectory data demonstrates that the FHMM-IMM algorithm achieves superior trajectory fitting, significantly reducing model errors. It also improves tracking accuracy for position, velocity, and acceleration in both two-dimensional and one-dimensional scenarios, verifying the effectiveness of the proposed model. These improvements provide a more precise and real-time solution for airport surface aircraft trajectory prediction and tracking, contributing to enhanced operational safety and efficiency.
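The adaptive transition probabilities feed into the standard IMM interaction step; a small sketch of that step follows, with illustrative two-model probabilities and transition matrices (not values from the paper).

```python
import numpy as np

def imm_mix(mu, Pi):
    """IMM interaction step: predicted model probabilities and mixing weights, given
    current model probabilities mu and a (possibly time-varying) transition matrix Pi,
    where Pi[i, j] = P(model j at step k | model i at step k-1)."""
    c = Pi.T @ mu                             # predicted model probabilities
    W = (Pi * mu[:, None]) / c[None, :]       # mixing weights W[i, j] = P(i at k-1 | j at k)
    return c, W

mu = np.array([0.7, 0.3])                     # current probabilities of two motion models (assumed)
Pi_fixed = np.array([[0.95, 0.05],            # classic fixed Markov matrix
                     [0.05, 0.95]])
Pi_adapted = np.array([[0.80, 0.20],          # e.g. re-estimated online, as the Baum-Welch
                       [0.10, 0.90]])         # based update in the abstract does
for name, Pi in (("fixed", Pi_fixed), ("adapted", Pi_adapted)):
    c, W = imm_mix(mu, Pi)
    print(f"{name:7s} predicted model probs:", np.round(c, 3))
```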
Diffusion Model and Edge Information Guided Single-photon Image Reconstruction Algorithm
ZHANG Dan, LIAN Qiusheng, YANG Yuchi
2025, 47(7): 2237-2248. doi: 10.11999/JEIT241063
Abstract:
  Objective  Quanta Image Sensors (QIS) are solid-state sensors that encode scene information into binary bit-streams. The reconstruction for QIS consists of recovering the original scenes from these bit-streams, which is an ill-posed problem characterized by incomplete measurements. Existing reconstruction algorithms based on physical sensors primarily use maximum-likelihood estimation, which may introduce noise-like components and result in insufficient sharpness, especially under low oversampling factors. Model-based optimization algorithms for QIS generally combine the likelihood function with an explicit or implicit image prior in a cost function. Although these methods provide superior quality, they are computationally intensive due to the use of iterative solvers. Additionally, intrinsic readout noise in QIS circuits can degrade the binary response, complicating the imaging process. To address these challenges, an image reconstruction algorithm, Edge-Guided Diffusion Model (EGDM), is proposed for single-photon sensors. This algorithm utilizes a diffusion model guided by edge information to achieve high-speed, high-quality imaging for QIS while improving robustness to readout noise.  Methods  The proposed EGDM algorithm incorporates a measurement subspace constrained by binary measurements into the unconditional diffusion model sampling framework. This constraint ensures that the generated images satisfy both data consistency and the natural image distribution. Due to high noise intensity in latent variables during the initial reverse diffusion stages of diffusion models, texture details may be lost, and structural components may become blurred. To enhance reconstruction quality while minimizing the number of sampling steps, a bilateral filter is applied to extract edge information from images generated by maximum likelihood estimation. Additionally, the integration of jump sampling with a measurement subspace projection termination strategy reduces inference time and computational complexity, while preserving visual quality.  Results and Discussions  Experimental results on both the benchmark datasets, Set10 and BSD68 (Fig. 6, Fig. 7, Table 2), and the real video frame (Fig. 8) demonstrate that the proposed EGDM method outperforms several state-of-the-art reconstruction algorithms for QIS and diffusion-based methods in both objective metrics and visual perceptual quality. Notably, EGDM achieves an improvement of approximately 0.70 dB to 3.00 dB compared to diffusion-based methods for QIS in terms of Peak Signal-to-Noise Ratio (PSNR) across all oversampling factors. For visualization, the proposed EGDM produces significantly finer textures and preserves image sharpness. In the case of real QIS video sequences (Fig. 8), EGDM preserves more detailed information while mitigating blur artifacts commonly found in low-light video capture. Furthermore, to verify the robustness of the reconstruction algorithm to readout noise, the reconstruction of the original scene from the measurements is conducted under various readout noise levels. The experimental results (Table 3, Fig. 9, Fig. 10) demonstrate the effectiveness of the proposed EGDM method in suppressing readout noise, as it achieves the lowest average Mean Squared Error (MSE) and superior quality compared to other algorithms in terms of PSNR, particularly at higher noise levels. Visually, EGDM produces the best results, with sharp edges and clear texture patterns even under severe noise conditions. 
Compared to the EGDM algorithm without acceleration strategies, the implementation of jump sampling and measurement subspace projection termination strategies reduces the execution time by 5 seconds and 1.9 seconds, respectively (Table 4). Moreover, EGDM offers faster computation speeds than other methods, including deep learning-based reconstruction algorithms that rely on GPU-accelerated computing. After thorough evaluation, these experimental findings confirm that the high-performance reconstruction and rapid imaging speed make the proposed EGDM method an excellent choice for practical applications.  Conclusions  This paper proposes a single-photon image reconstruction algorithm, EGDM, based on a diffusion model and edge information guidance, overcoming the limitations of traditional algorithms that produce suboptimal solutions in the presence of low oversampling factors and readout noise. The measurement subspace defined by binary measurements is introduced as a constraint in the diffusion model sampling process, ensuring that the reconstructed images satisfy both data consistency and the characteristics of natural image distribution. The bilateral filter is applied to extract edge components from the MLE-generated image as auxiliary information. Furthermore, a hybrid sampling strategy combining jump sampling with measurement subspace projection termination is introduced, significantly reducing the number of sampling steps while improving reconstruction quality. Experimental results on both benchmark datasets and real video frames demonstrate that: (1) Compared with conventional image reconstruction algorithms for QIS, EGDM achieves excellent performance in both average PSNR and SSIM. (2) Under different oversampling factors, EGDM outperforms existing diffusion-based reconstruction methods by a large margin. (3) Compared with existing techniques, the EGDM algorithm requires less computational time while exhibiting strong robustness against readout noise, confirming its effectiveness in practical applications. Future research could focus on developing parameter-free reconstruction frameworks that preserve imaging quality and extending EGDM to address more complex environmental challenges, such as dynamic low-light or high dynamic range imaging for QIS.
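For context, the MLE-generated image that seeds the edge extraction comes from a closed-form per-pixel estimate; a one-pixel sketch for the ideal single-bit sensor (threshold q = 1, readout noise ignored, parameters assumed) is given below.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.2          # true normalized exposure at one pixel (assumed)
K = 16               # oversampling factor: binary frames per reconstructed pixel (assumed)

# Ideal single-bit QIS model (threshold q = 1, no readout noise): a jot outputs 1
# whenever it detects at least one photon, i.e. P(B = 1) = 1 - exp(-theta).
bits = rng.poisson(theta, K) >= 1
S = int(bits.sum())

# Per-pixel maximum-likelihood estimate; it is noisy at small K, which is exactly
# what the diffusion prior and edge guidance compensate for.
theta_mle = -np.log(1 - S / K) if S < K else np.inf
print(f"true exposure {theta:.2f}, MLE from {K} binary samples: {theta_mle:.2f}")
```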
Long-term Transformer and Adaptive Fourier Transform for Dynamic Graph Convolutional Traffic Flow Prediction Study
ZHANG Hong, YI Min, ZHANG Xijun, LI Yang, ZHANG Pengcheng
2025, 47(7): 2249-2262. doi: 10.11999/JEIT241076
Abstract:
  Objective  The rapid development of Intelligent Transportation Systems (ITS), combined with population growth and increasing vehicle ownership, has intensified traffic congestion and road capacity limitations. Accurate and timely traffic flow prediction supports traffic control centers in road management and assists travelers in avoiding congestion and improving travel efficiency, thereby relieving traffic pressure and enhancing road network utilization. However, current models struggle to capture the complex, non-stationary, and long-term dependent characteristics of traffic systems, limiting prediction accuracy. To address these limitations, deep learning approaches—such as dynamic graph convolution and Transformer architectures—have gained prominence in traffic prediction. This study proposes a dynamic graph convolutional traffic flow prediction model (ADGformer) that integrates a long-term Transformer with an adaptive Fourier transform. The model is designed to more effectively capture long-term temporal dependencies, handle non-stationary patterns, and extract latent dynamic spatiotemporal features from traffic flow data.  Methods  The ADGformer model, based on a Long-Term Transformer and Adaptive Fourier Transform, comprises three primary components: a stacked Long-term Gated Convolutional (LGConv) layer, a spatial convolutional layer, and a Multifaceted Fusion Module. The LGConv layer integrates a Masked Sub-series Transformer and dilated gated convolution to extract long-term trend features from temporally segmented traffic flow series, thereby enhancing long-term prediction performance. The Masked Sub-series Transformer—comprising the Subsequent Temporal Representation Learner (STRL) and a self-supervised task head—learns compact and context-rich representations from extended time series sub-segments. The spatial convolutional layer incorporates a dynamic graph constructor that generates learnable graphs based on the hidden states of spatially related nodes. These dynamic graphs are then processed using learnable dynamic graph convolution to extract latent spatial features that evolve over time. To address the non-stationarity of traffic flow sequences, an adaptive spectral block based on Fourier transform is introduced.  Results and Discussions  The ADGformer model adaptively models inter-node relationships through a dynamic graph constructor, effectively capturing spatial dependencies within traffic networks and the evolving spatial characteristics of traffic flows. It also learns compressed, context-rich subsequence representations and long-term temporal patterns from extended input sequences. The adaptive spectral block, based on the Fourier transform, reduces the non-stationarity of traffic flow data. To evaluate model performance, this study conducts comparison and ablation experiments on three benchmark datasets. On the PEMSD4 dataset, the ADGformer model reduces MAE under Horizon 3 by approximately 21.57% and 16.72% compared with traditional models such as VAR and LR, respectively. Under Horizon 12 on the England dataset, ADGformer reduces RMSE by 3.35% and 2.44% compared with GWNet and MTGNN, respectively. ADGformer achieves the highest prediction accuracy across all three datasets (Table 2). Visualization comparisons with XGBoost, ASTGCN, and DCRNN on PEMSD4 and England datasets further confirm its superior long-term predictive performance (Fig. 5). The model maintains robust performance as the prediction horizon increases. 
To assess the contributions of individual components, ablation experiments are performed for Horizons 3, 6, and 12. On Horizon 12 of the PEMSD4 dataset, ADGformer improves MAE by approximately 3.69%, 5.09%, 0.92%, and 7.59% relative to the variant models NMS-Trans, NLGConv, NASB, and NDGConv, respectively. On the England dataset, MAPE is reduced by 3.56%, 18.53%, 2.98%, and 4.87%, respectively (Table 3). Visual results of the ablation study (Fig. 6) show that ADGformer consistently outperforms its variants as the forecast step increases, confirming the effectiveness of each module in the proposed model.  Conclusions  To address the limitations of existing traffic flow prediction models—namely, (1) failure to capture hidden dynamic spatio-temporal correlations among road network nodes, (2) insufficient learning of key temporal information and long-term trend features from extended historical sequences, and (3) limited ability to reduce the non-stationarity of traffic data—this study proposes a combinatorial traffic flow prediction model, ADGformer, which enables accurate long-term traffic flow prediction. ADGformer employs a Masked Sub-series Transformer to pre-train long historical sequences and extract compact, information-rich subsequence representations. A gated mechanism combined with one-dimensional dilated convolution is then applied to capture long-term trend features and enhance temporal modeling capacity. The model also integrates a spatial convolutional layer constructed from dynamic traffic patterns. Through learnable dynamic graph convolution, it effectively captures evolving spatial dependencies across the network. Furthermore, an adaptive spectral block based on Fourier transform is designed to enhance feature representation and reduce non-stationarity by applying adaptive thresholding to suppress noise, while preserving both long- and short-term interactions. Given the complex and variable nature of traffic flow, future work will consider integrating additional features such as periodicity to further improve prediction performance.
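A hedged PyTorch sketch of what an adaptive Fourier spectral block with a learnable soft threshold might look like; the module name, tensor shapes, and single scalar threshold are assumptions, not the ADGformer implementation.

```python
import torch
import torch.nn as nn

class AdaptiveSpectralBlock(nn.Module):
    """Fourier-domain block with a learnable soft threshold that suppresses
    low-magnitude (noise-like) frequency components along the time axis."""
    def __init__(self, eps=1e-8):
        super().__init__()
        self.tau = nn.Parameter(torch.tensor(0.1))   # learnable threshold
        self.eps = eps

    def forward(self, x):                            # x: (batch, nodes, time)
        X = torch.fft.rfft(x, dim=-1)                # per-node spectrum
        mag = X.abs()
        scale = torch.relu(mag - torch.abs(self.tau)) / (mag + self.eps)
        return torch.fft.irfft(X * scale, n=x.size(-1), dim=-1)

x = torch.randn(8, 30, 12)                 # e.g. 8 samples, 30 sensors, 12 time steps (assumed)
print(AdaptiveSpectralBlock()(x).shape)    # torch.Size([8, 30, 12])
```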
Research on Resource Scheduling of Distributed CNN Inference System Based on AirComp
LIU Qiaoshou, DENG Yifeng, HU Haonan, YANG Zhenwei
2025, 47(7): 2263-2272. doi: 10.11999/JEIT241022
Abstract:
  Objective   In traditional AirComp systems, the computational accuracy is directly affected by the alignment of received signal phases from different transmitters. When applied to distributed federated learning and distributed inference systems, phase misalignment can introduce computational errors, reducing model training and inference accuracy. This study proposes the MOSI-AirComp system, in which transmitted signals in each computation round originate from the same node, thereby eliminating signal phase alignment issues.  Methods  (1) A dual-branch training model is proposed, increasing network complexity only during training. The traditional model is extended to a dual-branch structure, where the lower branch retains the original model, and the upper branch incorporates additional loss layers for training. (2) An MOSI-AirComp-based weight-power control scheme is introduced. Each node is equipped with multiple transmitting antennas and a single receiving antenna. Pre-trained model weights are offloaded to task nodes as part of the power control factor, which adjusts transmission power during inference. This optimization enhances signal amplitude for convolution operations while reducing computation time. Since data transmission originates from the same node, phase alignment issues are avoided. AirComp integrates signals from multiple antennas for convolution summation, enabling airborne convolution. (3) A Traveling Salesman Problem (TSP)-based node selection algorithm is proposed, using weight mean and path as evaluation parameters to determine the optimal transmission path, ensuring efficient data transmission.  Results and Discussions  Compared to the traditional network model, the dual-branch training model significantly improves inference accuracy under small-scale fading. For the MNIST and CIFAR-10 datasets, accuracy increases by 2%~18% and 0.4%~11.2% under different SNR values (Fig. 5 and Fig. 6). The Mean Squared Error (MSE) decreases by 0.056~0.154 and 0.047~0.23 under different maximum node power budgets (Fig. 7). In noise-only scenarios, inference accuracy improves by 0.7%~5.5% and 0.3%~7.1% under different SNR values (Fig. 5 and Fig. 6), while MSE decreases by 0.035~0.152 and 0.056~0.253 under different maximum node power budgets (Fig. 8).  Conclusions  An MOSI-AirComp system is proposed to address the phase alignment issue inherent in traditional AirComp scenarios. The system enables airborne convolution through a power control scheme and enhances the traditional network model with a dual-branch structure. The upper branch simulates multiplicative Rayleigh fading using loss layers and incorporates model data into the convolution layer output of the lower branch to simulate additive noise effects. To account for node limitations in IoT networks, a model-weight-improved TSP node selection algorithm is proposed. Future advancements in AirComp deployment for distributed computing and communication frameworks hold promise, particularly with the rapid development of 6G and IoT.
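A greedy stand-in for the node-selection idea, scoring each hop by a mix of path length and the mean model weight hosted at the candidate node; the scoring form and the trade-off factor alpha are assumptions, and a true TSP solver would refine the route further.

```python
import numpy as np

def select_path(coords, weight_means, alpha=0.5, start=0):
    """Greedy TSP-style route: from the current node, visit the unvisited node with
    the best combined score of short hop distance and large mean model weight."""
    unvisited = set(range(len(coords))) - {start}
    path = [start]
    while unvisited:
        cur = path[-1]
        def score(j):
            dist = np.linalg.norm(coords[j] - coords[cur])
            return alpha * dist - (1 - alpha) * weight_means[j]
        nxt = min(unvisited, key=score)
        path.append(nxt)
        unvisited.remove(nxt)
    return path

rng = np.random.default_rng(3)
coords = rng.uniform(0, 100, (6, 2))        # node positions in metres (illustrative)
weights = rng.uniform(0.1, 1.0, 6)          # mean magnitude of model weights at each node
print(select_path(coords, weights))         # visiting order, starting from node 0
```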
3D Reconstruction of Metro Tunnel Based on Path Likelihood Model and HMM Sequence Matching Localization
HU Zhaozheng, WANG Shuheng, MENG Jie, FENG Feng, ZHU Ziwei, LI Weigang
2025, 47(7): 2273-2284. doi: 10.11999/JEIT241122
Abstract:
  Objective  As the operational mileage of metro systems in China continues to increase, the inspection and maintenance of metro tunnels have become more critical. Accurate 3D reconstruction of metro tunnels is essential for construction, inspection, and maintenance. However, in severely degraded tunnel environments, existing SLAM algorithms based on laser or vision often struggle to construct maps and face limitations in complex scenarios. To address this challenge, this paper proposes a method for large-scale 3D reconstruction of metro tunnels by utilizing the matching of the Path Likelihood Model (PLM) and the Hidden Markov Model (HMM). The 3D reconstruction task is divided into two key processes: odometer positioning and high-precision 3D reconstruction via graph optimization. High-precision 3D reconstruction is achieved by effectively addressing both components.  Methods  For odometer-based localization, this paper presents a method that incorporates the PLM. The PLM is developed using kernel density estimation to analyze the vehicle’s track path, effectively representing the vehicle’s positional information as a probability distribution. Within the framework of a particle filter, this method converts the constructed PLM into position observations of the vehicle. Additionally, data from the onboard Inertial Measurement Unit (IMU) and the wheel speed sensor are integrated to enhance localization accuracy. To minimize cumulative errors in odometer-based localization, this paper reformulates the problem of loop closure detection as a sequence matching problem using the Viterbi algorithm within the framework of the HMM. This method effectively addresses the instability associated with single-frame matching in loop closure detection and significantly improves the overall performance. To resolve the reconstruction problem, this paper presents a method for 3D reconstruction using large-scale factor graph optimization. By optimizing the pose graph with multiple constraints, it enables high-precision 3D reconstruction of extensive metro tunnels.  Results and Discussions  The proposed method and model are tested and validated on the WeiJianian-ShuangShuinian and ShaHeyuan-DongZikou metro tunnel sections in Chengdu. The experimental results are as follows: the effectiveness of the proposed method is confirmed through two sets of ablation experiments, DR and DR+PATH. Furthermore, by comparing the results with those of two notable open-source LIDAR algorithms, LIO-SAM and Faster-LIO, the superiority of this method is demonstrated. The reconstruction accuracy achieved is high, and the reconstruction error remains consistent even as the running distance increases. Therefore, the method is suitable for application in real operational processes.  Conclusions  This paper addresses the challenges of 3D reconstruction in metro tunnels by proposing a novel algorithm that combines the PLM with HMM sequence matching. The PLM is developed using drawing information, which serves as the foundation for the reconstruction process. Within the framework of particle filtering, the likelihood model is used to correct errors from the IMU and wheel speed sensor. This results in accurate odometer readings for the onboard robot. Furthermore, the issue of loop matching is reformulated as an HMM sequence matching problem. By constructing loop constraints, accumulated positioning errors are effectively eliminated. 
Finally, the pose and loop constraints derived from the odometer data are integrated into the large-scale factor graph optimization model, enabling high-precision 3D reconstruction of the metro tunnel. Field tests conducted on the WeiJianian-ShuangShuinian and ShaHeyuan-DongZikou metro tunnel sections in Chengdu, together with comparisons against other algorithms, demonstrate that the proposed PLM and HMM sequence matching algorithm significantly improves 3D reconstruction accuracy in metro tunnels, particularly in severely degraded environments.
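The loop-closure step reduces to standard Viterbi decoding over candidate map frames; a generic sketch follows, with made-up similarity scores and a transition matrix that simply favours stepping forward through the map sequence (both are illustrative assumptions, not the paper's models).

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Standard Viterbi decoding. log_pi: (S,) initial log-probs; log_A: (S, S)
    transition log-probs; log_B: (T, S) per-step observation log-likelihoods.
    Returns the most likely hidden-state sequence."""
    T, S = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A            # scores[i, j]: come from i, land in j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_B[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy loop-closure example: 4 query frames matched against 6 map frames; emissions
# are made-up similarity log-scores, transitions allow staying or advancing by one
# map frame, which is what enforces sequence (rather than single-frame) consistency.
S, T = 6, 4
log_pi = np.full(S, -np.log(S))
A = np.full((S, S), 1e-6)
for i in range(S):
    A[i, i] = 0.5
    A[i, min(i + 1, S - 1)] += 0.5
log_A = np.log(A / A.sum(axis=1, keepdims=True))
rng = np.random.default_rng(0)
log_B = rng.normal(-5, 1, (T, S))
log_B[np.arange(T), [1, 2, 3, 4]] = -0.5           # the "true" aligned map frames
print(viterbi(log_pi, log_A, log_B))               # expected: [1, 2, 3, 4]
```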
Gas Station Inspection Task Allocation Algorithm in Digital Twin-assisted Reinforcement Learning
LIAN Yuanfeng, TIAN Tian, CHEN Xiaohe, DONG Shaohua
2025, 47(7): 2285-2297. doi: 10.11999/JEIT241027
Abstract:
  Objective  With the increasing quantity of equipment in gas stations and the growing demand for safety, Multi-Robot Task Allocation (MRTA) has become essential for improving inspection efficiency. Although existing MRTA algorithms offer basic allocation strategies, they have limited capacity to respond to emergent tasks and to manage energy consumption effectively. To address these limitations, this study integrates digital twin technology with a reinforcement learning framework. By incorporating Lyapunov optimization and decoupling the optimization objectives, the proposed method improves inspection efficiency while maintaining a balance between robot energy use and task delay. This approach enhances task allocation in complex gas station scenarios and provides theoretical support for intelligent unmanned management systems in such environments.  Methods  The DTPPO algorithm constructs a multi-objective joint optimization model for inspection task allocation, with energy consumption and task delay as the primary criteria. The model considers the execution performance of multiple robots and the characteristics of heterogeneous tasks. Lyapunov optimization theory is then applied to decouple the time-energy coupling constraints of the inspection objectives. Using the Lyapunov drift-plus-penalty framework, the algorithm balances task delay and energy consumption, which simplifies the original joint optimization problem. The decoupled objectives are solved using a strategy that combines digital twin technology with the Proximal Policy Optimization (PPO) algorithm, resulting in a task allocation policy for multi-robot inspection in gas station environments.  Results and Discussions  The DTPPO algorithm decouples long-term energy consumption and time constraints using Lyapunov optimization, incorporating their variations into the reward function of the reinforcement learning model. Simulation results show that the Pathfinding inspection path (Fig. 4) generated by the DTPPO algorithm improves the task completion rate by 1.94% compared to benchmark experiments. In complex gas station environments (Fig. 5), the algorithm achieves a 1.92% improvement. When the task quantity parameter is set between 0.1 and 0.5 (Fig. 8), the algorithm maintains a high task completion rate even under heavy load. With 2 to 6 robots (Fig. 9), the algorithm demonstrates strong adaptability and effectiveness in resource-constrained scenarios.  Conclusions  This study addresses the coupling between energy consumption and time by decoupling the objective function constraints through Lyapunov optimization. By incorporating the variation of Lyapunov drift-plus-penalty terms into the reward function of reinforcement learning, a digital twin-assisted reinforcement learning algorithm, named DTPPO, is proposed. The method is evaluated in multiple simulated environments, and the results show the following: (1) The proposed approach achieves a 1.92% improvement in task completion rate compared to the DDQN algorithm; (2) Lyapunov optimization improves performance by 5.89% over algorithms that rely solely on reinforcement learning; (3) The algorithm demonstrates good adaptability and effectiveness under varying task quantities and robot numbers. However, this study focuses solely on Lyapunov theory, and future research should explore the integration of Lyapunov optimization with other algorithms to further enhance MRTA methods.
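A minimal sketch of folding a drift-plus-penalty term into the per-step reward, with all symbols (virtual energy queue Q, per-step energy budget, task delay, weight V) being assumed placeholders rather than the paper's exact formulation.

```python
def lyapunov_reward(Q, e_t, e_budget, d_t, V=1.0):
    """Drift-plus-penalty reward shaping: Q is a virtual energy-deficit queue, e_t the
    energy spent this step, e_budget the per-step budget, d_t the task delay, and V
    the weight trading delay against energy (all assumed for illustration)."""
    Q_next = max(Q + e_t - e_budget, 0.0)          # virtual queue update
    drift = 0.5 * (Q_next ** 2 - Q ** 2)           # Lyapunov drift
    reward = -(drift + V * d_t)                    # drift-plus-penalty, negated for RL
    return reward, Q_next

Q = 0.0
for e_t, d_t in [(1.2, 3.0), (0.8, 2.5), (1.5, 4.0)]:   # illustrative trajectory
    r, Q = lyapunov_reward(Q, e_t, e_budget=1.0, d_t=d_t)
    print(f"reward {r:7.3f}, energy-queue backlog {Q:.2f}")
```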
Research on Weld Defect Detection Method Based on Improved DETR
DAI Zheng, LIU Xiaojia, PAN Quan
2025, 47(7): 2298-2307. doi: 10.11999/JEIT241009
Abstract:
  Objective  Welding technology plays a pivotal role in industrial manufacturing, where X-ray image evaluation serves as a critical inspection method for assessing the internal quality of weld seams. X-ray inspection is effective in identifying defects such as slag inclusions, incomplete penetration, and porosity, which helps prevent structural failures and ensures the reliability and durability of welded components. This process is a fundamental quality control measure in industrial manufacturing. However, challenges persist in the assessment of weld seam X-ray images, particularly in relation to high workloads and inefficiencies. Conventional models often experience multi-scale feature information loss during feature extraction due to the significant variation in the size and morphology of defects, such as porosity, slag inclusions, and incomplete penetration, found in large structural weld seams. To address these limitations, the Detection Transformer with Concatenated Expand Convolutions and Augmented Feature Pyramid Networks (CADETR) model is proposed to improve detection performance for weld defects in large structural components.  Methods  The CADETR model is proposed for detecting weld defects in large structural components. The model comprises three core components: the DETR network, the Concatenated Expand Convolution (CEC) network, and the Augmented Feature Pyramid Network (AFPN). The DETR network applies multi-head self-attention mechanisms to effectively capture global contextual relationships among feature map positions, enhancing perceptual capability and detection accuracy for weld defects. The CEC module adopts a composite expanded convolution structure, widening convolutional kernel receptive fields and significantly improving feature extraction for defects across various scales. The AFPN module reinforces multi-scale defect feature extraction by integrating hierarchical feature maps and employing a feature batch elimination mechanism, reducing overfitting and enhancing generalization performance in multi-scale defect detection. Additionally, a Penalized Cross Entropy Loss (PCE-Loss) function is proposed, which applies increased penalties to incorrect defect predictions, further improving model robustness and precision.  Results and Discussions  The performance of the CADETR defect detection model is evaluated through a comparative analysis with multiple models, including Faster R-CNN, ECASNet, GeRCNN, DETR, MDCBNet, HPRT-DETR, and YOLOv11. Weld seam X-ray image data are input into each model, with variations in loss values recorded during the training process. Model performance in defect detection is assessed using Precision, Recall, and mAP metrics. Experimental results show that the CADETR model exhibits slightly higher loss values compared to HPRT-DETR and YOLOv11 but lower than other benchmark models (Fig. 7). The CADETR model demonstrates superior performance in mAP, achieving 91.6%, exceeding all comparative models (Table 3). The CADETR model proves particularly effective in detecting defects characterized by a high proportion of small targets and significant shape variations (Fig. 8).  Conclusions  This study addresses the challenges of detecting weld defects with significant variations in size and morphology in large structural components through the CADETR weld defect detection model. 
Evaluation using a weld seam X-ray image dataset revealed the following key findings: (1) The sequential integration of the CEC module, AFPN module, and PCE-Loss function into the baseline DETR framework improved mAP by 4.6%, 4.5%, and 3.4%, respectively, validating the contribution of each component. (2) The CADETR model achieved a 91.6% mAP for weld defect detection, with a single-image inference time of 0.036 s. (3) Compared to the original DETR, CADETR demonstrated an 8.9% improvement in mAP. For future implementation, the CADETR model will be deployed in a Browser/Server (B/S) architecture-based weld defect detection system, where both software algorithms and computational hardware resources will be hosted on cloud servers. This design ensures stable operational workflows and facilitates cross-platform data resource sharing.
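The abstract does not specify the internal layout of the CEC module; the sketch below shows one common way to widen the receptive field with concatenated dilated convolutions in PyTorch, which is the general technique the CEC description points to. Channel counts, dilation rates, and the residual connection are illustrative assumptions.

    import torch
    import torch.nn as nn

    # Illustrative sketch of a concatenated dilated-convolution block
    # (CEC-style receptive-field expansion); dilation rates are assumptions.
    class DilatedConcatBlock(nn.Module):
        def __init__(self, in_ch: int, branch_ch: int, dilations=(1, 2, 4)):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(in_ch, branch_ch, kernel_size=3,
                              padding=d, dilation=d, bias=False),
                    nn.BatchNorm2d(branch_ch),
                    nn.ReLU(inplace=True),
                )
                for d in dilations
            ])
            # fuse the concatenated multi-dilation features back to in_ch channels
            self.fuse = nn.Conv2d(branch_ch * len(dilations), in_ch, kernel_size=1)

        def forward(self, x):
            feats = [branch(x) for branch in self.branches]
            return self.fuse(torch.cat(feats, dim=1)) + x  # residual connection

    # e.g. DilatedConcatBlock(256, 64)(torch.randn(1, 256, 64, 64)).shape -> (1, 256, 64, 64)

Cascading several dilation rates in this way covers defect features at multiple scales without reducing feature-map resolution, which is the stated goal of the CEC module.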
SealVerifier: Seal Verification System Based on Dual-stream Model
LEI Meng, NING Qiyue, JU Jinjun, ZOU Liang
2025, 47(7): 2308-2319. doi: 10.11999/JEIT241059
Abstract:
  Objective  Seals serve a critical legal function in scenarios such as document authentication and contract execution, acting as essential markers of document authenticity and legitimacy. However, the increasing sophistication of seal forgery techniques, driven by advances in digital technology, presents new challenges to existing verification methods. In particular, low-quality or blurred seal images substantially reduce the accuracy and reliability of traditional approaches, limiting their practical utility. To address these limitations, this study proposes SealVerifier, an automatic seal verification system based on a dual-stream model. The method is designed to improve recognition accuracy, generalization ability, and robustness to noise. SealVerifier contributes to the intelligent development of seal verification and offers technical support for secure digital document authentication, thereby facilitating the broader deployment of reliable seal verification technologies.  Methods  SealVerifier comprises an image enhancement module and a dual-stream verification model, designed to improve the accuracy and robustness of seal authentication. The framework follows a two-stage pipeline: image preprocessing and authenticity verification. In the preprocessing stage, the DeARegNet module is introduced to correct degradation caused by uneven stamping pressure, scanner variability, paper background complexity, and interference from document content. DeARegNet integrates a Denoising Adversarial Network (DAN) and a GeomPix alignment module to enhance seal image clarity and consistency. DAN employs adversarial training, consisting of a denoiser and a discriminator. The denoiser uses a multi-level residual dense connection module to extract fine-grained features and eliminate noise, thereby improving image resolution. The discriminator enforces denoising reliability by distinguishing between clean and denoised images using an adversarial loss. The GeomPix alignment module exploits geometric characteristics of circular and elliptical seals. It relies on a central pentagram positioning marker and the radial fan-shaped pixel density distribution to achieve high-precision alignment, significantly improving the accuracy and stability of image correction. In the verification stage, a dual-stream architecture combining EfficientNet and Streamlining Vision Transformer (SViT) is employed to extract local detail features and global structural information. EfficientNet performs efficient multi-scale feature extraction via compound scaling, capturing textures, edge contours, and subtle defects. SViT models global dependencies through self-attention mechanisms and enhances feature learning with high-dimensional multilayer perceptrons and denormalization techniques, thereby improving verification accuracy. To improve generalization and reduce inter-domain discrepancies among seal datasets, a Data Distribution Adapter (DDA) and Gradient Reversal Layer (GRL) are incorporated. These components use adversarial training to support the seal authenticity classifier—comprising EfficientNet and SViT—in learning domain-invariant features. This approach enhances robustness and adaptability in diverse application scenarios.  Results and Discussions  Experimental results demonstrate that the integration of the dual-stream architecture—EfficientNet for local detail extraction and SViT for global structural representation—enables SealVerifier to significantly improve verification accuracy. 
On a custom Chinese seal dataset comprising 30,699 image pairs, SealVerifier achieved precision, recall, and F1 scores of 91.34%, 96.83%, and 93.57%, respectively, outperforming existing methods (Table 3). The incorporation of a DDA and a dual loss function further reduced distributional discrepancies across seal datasets using adversarial training, enhancing both recognition accuracy and generalization performance (Table 4). Under noise interference, SealVerifier maintained high verification accuracy, confirming its robustness and applicability in real-world scenarios (Table 2).  Conclusions  This study proposes SealVerifier, a dual-stream model for fully automated seal authenticity verification. A Chinese seal dataset with complex backgrounds is constructed, and nine-fold cross-validation confirms the method’s effectiveness. SealVerifier integrates DeARegNet for image enhancement and combines EfficientNet and SViT to capture both fine-grained details and global semantic features. To address the limitations of conventional Vision Transformer (ViT) models, high-dimensional multilayer perceptrons and denormalization techniques are introduced, improving the model’s capacity to learn complex features and enhancing generalization and robustness. A DDA and dual loss function are also incorporated to mitigate dataset variability, enabling stable classification performance across heterogeneous seal images. Experimental results show that SealVerifier achieves precision, recall, and F1 scores of 91.34%, 96.83%, and 93.57%, respectively, demonstrating its performance advantage in seal verification tasks. Future work will explore high-precision alignment strategies for multi-view seal images to further reduce error and improve image correction accuracy under challenging imaging conditions.
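The DDA and GRL described above follow the standard domain-adversarial training pattern; a minimal PyTorch sketch of a gradient reversal layer and an assumed domain-classification head is shown below. The λ value and the head architecture are illustrative assumptions, not details from the paper.

    import torch
    import torch.nn as nn

    # Minimal gradient reversal layer (standard DANN-style construction):
    # identity in the forward pass, gradient scaled by -lambda in the backward pass.
    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lambd: float):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    class DomainDiscriminator(nn.Module):
        """Assumed domain head: predicts the source dataset of a fused seal feature."""
        def __init__(self, feat_dim: int, n_domains: int, lambd: float = 1.0):
            super().__init__()
            self.lambd = lambd
            self.head = nn.Sequential(
                nn.Linear(feat_dim, 256), nn.ReLU(inplace=True),
                nn.Linear(256, n_domains),
            )

        def forward(self, features):
            reversed_feat = GradReverse.apply(features, self.lambd)
            return self.head(reversed_feat)

Because the reversed gradient pushes the shared features to fool the domain head, the EfficientNet/SViT features tend toward domain invariance, which is the role the abstract attributes to the DDA and GRL.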
FCSNet: A Frequency-Domain Aware Cross-Feature Fusion Network for Smoke Segmentation
WANG Kaizheng, ZENG Yao, ZHANG Zhanxi, TAN Yizhang, WEN Gang
2025, 47(7): 2320-2333. doi: 10.11999/JEIT241021
Abstract:
  Objective  Vision-based smoke segmentation enables pixel-level classification of smoke regions, providing more spatially detailed information than traditional bounding-box-based detection approaches. Existing segmentation models based on Deep Convolutional Neural Networks (DCNNs) demonstrate reasonable performance but remain constrained by a limited receptive field due to their local inductive bias and two-dimensional neighborhood structure. This constraint reduces their capacity to model multi-scale features, particularly in complex visual scenes with diverse contextual elements. Transformer-based architectures address long-range dependencies but exhibit reduced effectiveness in capturing local structure. Moreover, the limited availability of real-world smoke segmentation datasets and the underutilization of edge information reduce the generalization ability and accuracy of current models. To address these limitations, this study proposes a Frequency-domain aware Cross-feature fusion Network for Smoke segmentation (FCSNet), which integrates frequency-domain and spatial-domain representations to enhance multi-scale feature extraction and edge information retention. A dataset featuring various smoke types and complex backgrounds is also constructed to support model training and evaluation under realistic conditions.  Methods  To address the challenges of smoke semantic segmentation in real-world scenarios, this study proposes FCSNet, a frequency-domain aware cross-feature fusion network. Given the high computational cost associated with Transformer-based models, a Frequency Transformer is designed to reduce complexity while retaining global representation capability. To overcome the limited contextual modeling of DCNNs and the insufficient local feature extraction of Transformers, a Domain Interaction Module (DIM) is introduced to facilitate effective fusion of global and local information. Within the network architecture, the Frequency Transformer branch extracts low-frequency components to capture large-scale semantic structures, thereby improving global scene comprehension. In parallel, a Multi-level High-Frequency perception Module (MHFM) is combined with Multi-Head Cross Attention (MHCA). MHFM processes multi-layer encoder features to capture high-frequency edge details at full resolution using a shallow structure. MHCA then computes directional global similarity maps to guide the decoder in aggregating contextual information more effectively.  Results and Discussions  The effectiveness of FCSNet is evaluated through comparative experiments against state-of-the-art methods using the RealSmoke and SMOKE5K datasets. On the RealSmoke dataset, FCSNet achieves the highest segmentation accuracy, with mean Intersection over Union (mIoU) values of 58.59% on RealSmoke-1 and 63.92% on RealSmoke-2, outperforming all baseline models (Table 4). Although its FLOPs are slightly higher than those of TransFuse, FCSNet demonstrates a favorable trade-off between accuracy and computational complexity. Qualitative results further highlight its advantages under challenging conditions. In scenes affected by clouds, fog, or building occlusion, FCSNet distinguishes smoke boundaries more clearly and reduces both false positives and missed detections (Fig. 8). Notably, in RealSmoke-2, which contains fine and sparse smoke patterns, FCSNet exhibits superior performance in smoke localization and edge detail segmentation compared to other methods (Fig. 9). 
On the SMOKE5K dataset, FCSNet achieves an mIoU of 78.94%, showing a clear advantage over competing algorithms (Table 5). Visual comparisons also indicate that FCSNet generates more accurate and refined smoke boundaries (Fig. 10). These results confirm that FCSNet maintains strong segmentation accuracy and robustness across diverse real-world scenes, supporting its generalizability and practical utility in smoke detection tasks.  Conclusions  To address the challenges of smoke semantic segmentation in real-world environments, this study proposes FCSNet, a network that integrates frequency- and spatial-domain information. A Frequency Transformer is introduced to reduce computational cost while enhancing global semantic modeling through low-frequency feature extraction. To compensate for the limited receptive field of DCNNs and the local feature insensitivity of Transformers, a DIM is designed to fuse global and local representations. An MHFM is employed to extract edge features, improving segmentation performance in ambiguous regions. Additionally, an MHCA mechanism aligns high-frequency edge features with decoder representations to guide segmentation in visually confusing areas. By jointly leveraging low-frequency semantics and high-frequency detail, FCSNet achieves effective fusion of contextual and structural information. Extensive quantitative and qualitative evaluations confirm that FCSNet performs robustly under complex interference conditions, including clouds, fog, and occlusions, enabling accurate smoke localization and fine-grained segmentation.
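The abstract does not state how FCSNet separates frequency components; one simple and commonly used stand-in is a radial low-pass/high-pass split of a feature map in the 2-D FFT domain, sketched below for illustration. The cutoff ratio and masking scheme are assumptions and are not the paper's Frequency Transformer.

    import torch

    def frequency_split(x: torch.Tensor, cutoff: float = 0.25):
        """Illustrative low/high frequency split of a (B, C, H, W) feature map
        using a centered radial mask in the 2-D FFT domain (cutoff is assumed)."""
        b, c, h, w = x.shape
        spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))

        yy = torch.arange(h, device=x.device).view(-1, 1) - h // 2
        xx = torch.arange(w, device=x.device).view(1, -1) - w // 2
        radius = torch.sqrt(yy.float() ** 2 + xx.float() ** 2)
        low_mask = (radius <= cutoff * min(h, w) / 2).float()

        low_spec = spec * low_mask
        high_spec = spec * (1.0 - low_mask)
        inv = lambda s: torch.fft.ifft2(
            torch.fft.ifftshift(s, dim=(-2, -1)), norm="ortho").real
        return inv(low_spec), inv(high_spec)  # low-frequency and high-frequency parts

The low-frequency part carries the large-scale semantic layout and the high-frequency remainder carries edge detail, matching the division of labor the abstract describes for the Frequency Transformer branch and the MHFM.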
Scene Text Detection Based on High Resolution Extended Pyramid
WANG Manli, DOU Zeya, CAI Mingzhe, LIU Qunpo, SHI Yannan
2025, 47(7): 2334-2346. doi: 10.11999/JEIT241017
Abstract:
  Objective  Text detection, a critical branch of computer vision, has significant applications in text translation, autonomous driving, and bill information processing. Although existing text detection methods have improved detection performance, several challenges remain in complex natural scenes. Scene text exhibits substantial scale variations, making multi-scale text detection difficult. Additionally, inadequate feature utilization hampers the detection of small-scale text. Furthermore, increasing the receptive field often necessitates reducing image resolution, which results in severe spatial information loss and diminished feature saliency. To address these challenges, this study proposes the High-Resolution Extended Pyramid Network (HREPNet), a scene text detection method based on a high-resolution extended pyramid structure.  Methods  First, an improved feature pyramid is constructed by incorporating a high-resolution extension layer and a super-resolution feature module to enhance text resolution features and address the issue of low-resolution text. Additionally, a multi-scale feature extraction module is integrated into the backbone network to facilitate feature transfer. By leveraging a multi-branch dilated convolution structure and an attention mechanism, the model effectively captures multi-scale text features, mitigating the challenge posed by significant variations in text scale. Finally, an efficient feature fusion module is proposed to selectively integrate high-resolution and multi-scale features, thereby minimizing spatial information loss and addressing the problem of insufficient effective features.  Results and Discussions  Ablation experiments demonstrated that the simultaneous application of HREP, the Multi-scale Feature Extraction Module (MFEM), and the Efficient Feature Fusion Module (EFFM) significantly enhanced the model’s text detection performance. Compared with the baseline, the proposed method improved accuracy and recall by 6.3% and 8.9%, respectively, while increasing the F-measure by 7.6%. These improvements can be attributed to MFEM, which enhances multi-scale text detection, facilitates efficient feature transmission from the top to the bottom of the high-resolution extended pyramid, and supports the extraction of text features at different scales. This process enables HREP to generate high-resolution features, thereby substantially improving the detection of low-resolution and small-scale text. Moreover, the many feature maps generated by HREP and MFEM are refined through EFFM, which effectively suppresses spatial redundancy and enhances feature expression. The proposed method demonstrated significant improvements in detecting text across different scales, with a more pronounced effect on small-scale text compared to large-scale text. Visualization results illustrate that, for small-scale text images (384 pixels), the detected text box area of the proposed method aligns more closely with the actual text area than that of the baseline method. Experimental results confirm that HREPNet significantly improves the accuracy of small-scale text detection. Additionally, for large-scale text images (2,048 pixels), the number of correctly detected text boxes increased considerably, demonstrating a substantial improvement in recall for large-scale text detection. Comparative experiments on public datasets further validated the effectiveness of HREPNet. 
The F-measure improved by 7.6% on ICDAR2015, 5.5% on CTW1500, and 3.0% on Total-Text, with significant enhancements in both precision and recall.  Conclusions  To address challenges related to large-scale variation, low resolution, and insufficient effective features in natural scene text detection, this study proposes a text detection network based on a High-Resolution Extended Pyramid. The High-Resolution Extended Pyramid is designed with the MFEM and the EFFM. Ablation experiments demonstrate that each proposed improvement enhances text detection performance compared with the baseline model, with the modules complementing each other to further optimize model performance. Comparative experiments on text images of different scales show that HREPNet improves text detection across various scales, with a more pronounced enhancement for small-scale text. Furthermore, experiments on natural scene and curved text demonstrate that HREPNet outperforms other advanced algorithms across multiple evaluation metrics, exhibiting strong performance in both natural scene and curved text detection. The method also demonstrates robustness and generalization capabilities. However, despite its robustness, the model has a relatively large number of parameters, which leads to slow inference speed. Future research will focus on optimizing the network to reduce the number of parameters and improve inference speed while maintaining accuracy, recall, and F-measure.
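The abstract describes a high-resolution extension layer built around a super-resolution feature module; a common way to realize such a layer is sub-pixel (pixel-shuffle) upsampling of the finest pyramid level, sketched below under that assumption. The channel counts and the 2x scale factor are illustrative.

    import torch
    import torch.nn as nn

    # Illustrative super-resolution feature module: extends a feature pyramid with
    # an extra level at 2x the resolution of its finest layer via pixel shuffle.
    class SuperResolutionFeature(nn.Module):
        def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
            super().__init__()
            self.expand = nn.Conv2d(in_ch, out_ch * scale * scale,
                                    kernel_size=3, padding=1)
            self.shuffle = nn.PixelShuffle(scale)  # (C*s*s, H, W) -> (C, s*H, s*W)
            self.refine = nn.Sequential(
                nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, finest_level: torch.Tensor) -> torch.Tensor:
            return self.refine(self.shuffle(self.expand(finest_level)))

    # e.g. SuperResolutionFeature(256, 256)(torch.randn(1, 256, 80, 80)).shape
    # -> torch.Size([1, 256, 160, 160])

The upsampled map would then serve as the extra, finer level of the extended pyramid from which small-scale text features are drawn.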
Density Clustering Hypersphere-based Self-adaptively Oversampling Algorithm for Imbalanced Datasets
TAO Xinmin, LI Junxuan, GUO Xinyue, SHI Lihang, XU Annan, ZHANG Yanping
2025, 47(7): 2347-2360. doi: 10.11999/JEIT241037
Abstract:
  Objective  Learning from imbalanced datasets presents significant challenges for the supervised learning community. Existing oversampling methods, however, have notable limitations when applied to complex imbalanced datasets. These methods can introduce noisy instances, leading to class overlap, and fail to effectively address within-class imbalance caused by low-density regions and small disjuncts. To overcome these issues, this study proposes the Density Clustering Hypersphere-based self-adaptively Oversampling algorithm (DCHO).  Methods  The DCHO algorithm first identifies clustering centers by dynamically calculating the density of minority class instances. Hyperspheres are then constructed around each center to guide clustering, and oversampling is performed within these hyperspheres to reduce class overlap. Oversampling weights are adaptively assigned according to the number of instances and the radius of each hypersphere, which helps mitigate within-class imbalance. To further refine the boundary distribution of the minority class and explore underrepresented regions, a boundary-biased random oversampling technique is introduced to generate synthetic samples within each hypersphere.  Results and Discussions  The DCHO algorithm dynamically identifies clustering centers based on the density of minority class instances, constructs hyperspheres, and assigns all minority class instances to corresponding clusters. This forms the foundation for oversampling. The algorithm further adjusts the influence of the cumulative density of instances within each hypersphere and the hypersphere radius on the allocation of oversampling weights through a defined trade-off parameter α. Experimental results indicate that this approach reduces class overlap and assigns greater oversampling weights to sparse, low-density regions, thereby generating more synthetic instances to improve representativeness and address within-class imbalance (Fig. 7). When the trade-off parameter is set to 0.5, the algorithm effectively incorporates both density and boundary distribution, improving the performance of subsequent classification tasks (Fig. 11).  Conclusions  Comparative results with other popular oversampling algorithms show that: (1) The DCHO algorithm effectively prevents class overlap by oversampling exclusively within the generated hypersphere. Meanwhile, the algorithm adaptively assigns oversampling weights based on the local density of instances within the hypersphere and its radius, thereby addressing the within-class imbalance issue. (2) By considering the relationship between the hypersphere radius and the density of the minority class instances, the balance parameter α is set to 0.5, which comprehensively addresses both the within-class imbalance caused by density and the enhancement of the minority class boundary distribution, ultimately improving classification performance on imbalanced datasets. (3) When applied to highly imbalanced datasets with complex boundaries, DCHO significantly improves the distribution of minority class instances, thereby enhancing the classifier’s generalization ability.
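The abstract gives the ideas behind DCHO's weighting and sampling but not the exact formulas; the NumPy sketch below illustrates the two ingredients it describes: per-hypersphere oversampling weights derived from instance count and radius with a trade-off α, and boundary-biased random sampling inside each hypersphere. The specific formulas and the boundary_bias parameter are assumptions.

    import numpy as np

    def hypersphere_weights(counts, radii, alpha=0.5):
        """Illustrative oversampling weights per hypersphere: sparse clusters
        (few instances, large radius) get more weight; alpha trades the two."""
        counts = np.asarray(counts, float)
        radii = np.asarray(radii, float)
        density_term = (1.0 / counts) / np.sum(1.0 / counts)
        radius_term = radii / np.sum(radii)
        w = alpha * density_term + (1.0 - alpha) * radius_term
        return w / w.sum()

    def sample_in_hypersphere(center, radius, n, boundary_bias=2.0, rng=None):
        """Draw n synthetic points inside a hypersphere, biased toward the
        boundary (larger boundary_bias pushes samples outward); illustrative."""
        rng = np.random.default_rng() if rng is None else rng
        center = np.asarray(center, float)
        d = len(center)
        directions = rng.normal(size=(n, d))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        # u**(1/d) is uniform-in-volume; a smaller exponent biases toward the boundary
        r = radius * rng.random(n) ** (1.0 / (d * boundary_bias))
        return center + directions * r[:, None]

In this sketch, α = 0.5 weights the density and radius terms equally, mirroring the balanced setting reported in the abstract.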
A Modulation Recognition Method Combining Wavelet Denoising Convolution and Sparse Transformer
ZHENG Qinghe, LIU Fanglin, YU Lisu, JIANG Weiwei, HUANG Chongwen, GUI Guan
2025, 47(7): 2361-2374. doi: 10.11999/JEIT241159
Abstract:
  Objective  Automatic Modulation Classification (AMC) is a key process in signal detection and demodulation, enabling the automatic identification of various modulation schemes in non-cooperative communication scenarios. The high transmission rate and low latency requirements of 6G wireless communications necessitate AMC to accurately and promptly identify different modulation schemes to adapt to complex communication scenes. As integrated 6G communication, sensing, and computing technologies evolve, an increasing number of modulation schemes are being designed and adopted. In the face of a more complex electromagnetic environment, AMC encounters challenges such as low recognition accuracy, susceptibility to environmental interference, and poor robustness. Existing feature selection methods struggle to generalize well in practical applications. By contrast, integrating adaptive feature selection methods with deep learning can generate more reliable and precise classification constraints, even for complex signal features that generalize across time. This study proposes an AMC method that combines learnable wavelet denoising convolution with a sparse Transformer, providing technical support for the advancement of integrated communication, sensing, and computing.  Methods  In the proposed AMC method combining wavelet denoising convolution and sparse Transformer, the learnable wavelet denoising convolution is proposed to assist deep learning in extracting appropriate denoised signal representations. It incorporates adaptive time-frequency features into the functional strategy of the objective function, delivering accurate spatiotemporal information to the Transformer and mitigating the effects of signal parameter offsets. A Sparse Feedforward Neural Network (SFFN) is then designed to replace the attention mechanism in the Transformer, modeling element relationships based on a limited set of key elements in the signal domain. This approach effectively optimizes gradients during training to address the challenges posed by limited signal length and the disregard of ordered feature element correlations in traditional Transformers. SFFN enhances spatial sampling positions within sparse cells by introducing additional offsets, focusing on a small group of key sampling points around the reference point. Moreover, it learns these offsets from the target task without requiring additional supervision. Given that most AMC frameworks benefit from multi-scale or multidimensional features, the proposed SFFN can be extended to multi-scale features without the need for feature pyramid networks.  Results and Discussions  During the experimental phase, modulation recognition performance testing, ablation studies, and comparative analyses are conducted on two publicly available datasets (RadioML 2016.10a and RML22) to demonstrate the excellent performance of the proposed method. Experimental results show that the sparse Transformer achieves classification accuracies of 63.84% and 71.13% on RadioML 2016.10a and RML22, respectively (Fig. 4). The primary limitation in modulation recognition performance occurs in the classification between the following modulation schemes: {WBFM, AM-DSB, QPSK, 8PSK, 16QAM, 64QAM} (Fig. 5). By comparing the sparse Transformer to a range of deep learning models (including CGDNet, CLDNN, DenseNet, GRU, and ResNet on RadioML 2016.10a, and DAENet, ICNet, LSTM, MCDNN, and MCNet on RML22), experimental results confirm the superior performance of the proposed method (Table 1). 
The classification accuracy of all models at various Signal-to-Noise Ratios (SNRs) is shown to highlight the performance advantage of the proposed method (Fig. 6). To qualitatively explore the effect of wavelet denoising convolution on improving classification accuracy, the power spectrum in the wavelet domain, represented by the selected denoised signals in RadioML 2016.10a, is presented (Fig. 7). The different wavelet coefficients selected for denoising signal representation reveal the component specificity of different wavelet basis functions in feature extraction, which helps determine the applicable signal types or model architectures (Fig. 8). The sampling points of interest in the power concentration area are visualized on the wavelet scale map to observe the corresponding relationships between the elements learned by SFFN (Fig. 9). Furthermore, the denoised signal representation constructed by wavelet denoising convolution under different hyperparameter settings undergoes an ablation study to evaluate its impact on model performance (Table 2). The model structure is ablated and tested to observe the roles played by different modules (Table 3). Finally, different numbers of training samples are set to demonstrate the robustness of the model, with less than a 1.5% decrease in classification accuracy when the training set ratio is adjusted from 90% to 40% (Fig. 10).  Conclusions  This study addresses the challenges of limited time-domain length and the neglect of ordered feature element correlations in Transformer models for signal processing by proposing an effective AMC method that combines wavelet denoising convolution and sparse Transformer. The proposed method significantly enhances AMC accuracy in complex communication scenarios and demonstrates robustness to model structure and hyperparameter selection. First, the learnable wavelet denoising convolution is proposed to assist deep learning models in adaptively extracting appropriate denoised signal representations. Subsequently, the SFFN module is designed to replace the attention mechanism in the Transformer, effectively modeling spatiotemporal element relationships. Experimental results on two public datasets (RadioML 2016.10a and RML22) show that the proposed method significantly improves AMC accuracy. In comparison with a range of deep learning models, the proposed method demonstrates state-of-the-art AMC accuracy and robustness across different communication scenarios. Additionally, visualization experiments and ablation studies further confirm the robustness of the proposed method.
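For reference, the sketch below shows the classical, non-learnable wavelet soft-thresholding that the learnable wavelet denoising convolution generalizes, using PyWavelets. The wavelet family, decomposition level, and universal threshold are conventional textbook choices, not parameters reported in the paper, where the denoising is instead learned end to end inside the network.

    import numpy as np
    import pywt

    def wavelet_soft_denoise(signal: np.ndarray, wavelet: str = "db4", level: int = 3):
        """Classical wavelet soft-threshold denoising of a 1-D signal, shown as
        the non-learnable counterpart of a wavelet denoising convolution."""
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        # universal (VisuShrink) threshold estimated from the finest detail band
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        thresh = sigma * np.sqrt(2.0 * np.log(len(signal)))
        denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft")
                                  for c in coeffs[1:]]
        return pywt.waverec(denoised, wavelet)[: len(signal)]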
An Audio-visual Generalized Zero-Shot Learning Method Based on Multimodal Fusion Transformer
YANG Jing, LI Xiaoyong, RUAN Xiaoli, LI Shaobo, TANG Xianghong, XU Ji
2025, 47(7): 2375-2384. doi: 10.11999/JEIT241090
Abstract:
  Objective  Audio-visual Generalized Zero-Shot Learning (GZSL) integrates audio and visual signals in videos to enable the classification of known classes and the effective recognition of unseen classes. Most existing approaches prioritize the alignment of audio-visual and textual label embeddings, but overlook the interdependence between audio and video, and the mismatch between model outputs and target distributions. This study proposes an audio-visual GZSL method based on a Multimodal Fusion Transformer (MFT) to address these limitations.  Methods  The MFT employs a transformer-based multi-head attention mechanism to enable effective cross-modal interaction between visual and audio features. To optimize the output probability distribution, the Kullback-Leibler (KL) divergence between the predicted and target distributions is minimized, thereby aligning predictions more closely with the true distribution. This optimization also reduces overfitting and improves generalization to unseen classes. In addition, cosine similarity loss is applied to measure the similarity of learned representations within the same class, promoting feature consistency and improving discriminability.  Results and Discussions  The experiments include both GZSL and Zero-Shot Learning (ZSL) tasks. The ZSL task requires classification of unseen classes only, whereas the GZSL task addresses both unseen and seen class classification to mitigate catastrophic forgetting. To evaluate the proposed method, experiments are conducted on three benchmark datasets: VGGSound-GZSLcls, UCF-GZSLcls, and ActivityNet-GZSLcls (Table 1). MFT is quantitatively compared with five ZSL methods and nine GZSL methods (Table 2). The results show that the proposed method achieves state-of-the-art performance on all three datasets. For example, on ActivityNet-GZSLcls, MFT exceeds the previous best ClipClap-GZSL method by 14.6%. This confirms the effectiveness of MFT in modeling cross-modal dependencies, aligning predicted and target distributions, and achieving semantic consistency between audio and visual features. Ablation studies (Tables 3–5) further support the contribution of each module in the proposed framework.  Conclusions  This study proposes a transformer-based audio-visual GZSL method that uses a multi-head self-attention mechanism to extract intrinsic information from audio and video data and enhance cross-modal interaction. This design enables more accurate capture of semantic consistency between modalities, improving the quality of cross-modal feature representations. To align the predicted and target distributions and reinforce intra-class consistency, KL divergence and cosine similarity loss are incorporated during training. KL divergence improves the match between predicted and true distributions, while cosine similarity loss enhances discriminability within each class. Extensive experiments demonstrate the effectiveness of the proposed method.
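A minimal PyTorch sketch of the two auxiliary objectives described above is given below, assuming fused audio-visual logits, a target class distribution, and paired same-class embeddings from the two modalities; the loss weights are assumptions rather than values from the paper.

    import torch
    import torch.nn.functional as F

    def gzsl_aux_loss(logits, target_dist, emb_a, emb_b,
                      kl_weight=1.0, cos_weight=1.0):
        """Illustrative auxiliary loss: KL divergence between the predicted and
        target distributions plus a cosine term pulling same-class audio and
        visual embeddings together (weights are assumed)."""
        # KL(target || predicted); F.kl_div expects log-probabilities as input
        kl = F.kl_div(F.log_softmax(logits, dim=-1), target_dist,
                      reduction="batchmean")
        # 1 - cosine similarity -> 0 when same-class embeddings are aligned
        cos = (1.0 - F.cosine_similarity(emb_a, emb_b, dim=-1)).mean()
        return kl_weight * kl + cos_weight * cos

Because F.kl_div expects log-probabilities for the prediction, the logits are passed through log_softmax; the cosine term vanishes when same-class embeddings are perfectly aligned, encouraging intra-class consistency.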
Circuit and System Design
The Design and Implementation of a Secure and Efficient Firmware Trusted Platform Module for RISC-V Platforms
WANG Jie, WANG Juan
2025, 47(7): 2385-2395. doi: 10.11999/JEIT241112
Abstract:
  Objective  The Trusted Platform Module (TPM) is a critical technology in modern secure computing systems, providing hardware-based key management, trusted boot, and remote attestation to safeguard sensitive operations in embedded and cloud environments. However, current RISC-V platforms lack native TPM support, presenting a security challenge as these systems are increasingly deployed in diverse application scenarios. To address this limitation, RfTPM—a firmware-based TPM (fTPM) architecture—has been developed to deliver the same security functionality as conventional hardware TPMs without requiring additional hardware components or specialized security extensions. This solution provides an immediate, cost-effective means to secure RISC-V systems while contributing to the advancement of trusted computing on emerging processor architectures.  Methods  The development of RfTPM incorporates several innovative techniques to overcome the challenges of implementing TPM functionalities in firmware. The design utilizes the RISC-V Physical Memory Protection (PMP) mechanism to enforce strict memory isolation, ensuring that fTPM code and data are inaccessible to unauthorized processes. A novel static data protection strategy is introduced, combining a DRAM-based Physically Unclonable Function (PUF) with Flash locking to secure the generation and storage of cryptographic root keys, preventing rollback attacks on persistent fTPM data. To secure the boot process, RfTPM employs a delay measurement extension mechanism, which divides the boot sequence into two phases: a verification phase where each boot stage is measured and authenticated before control is transferred, and a subsequent measurement phase that continuously validates system integrity according to TPM standards. The architecture also features a dynamic permission exchange page, enabling zero-copy communication across different privilege levels by dynamically configuring PMP permissions, reducing data transfer overhead. Additionally, a fine-grained secure clock is established using the native RISC-V hardware timer to counter timing-based attacks. The solution is prototyped as a secure extension module within OpenSBI, integrated with a dedicated kernel driver and an adapted TPM Software Stack (TSS), and evaluated on a Genesys2 FPGA board simulating a Rocket Core running Linux.  Results and Discussions  Comprehensive experimental evaluations demonstrate that RfTPM meets stringent security requirements while offering significant performance benefits over both traditional hardware TPMs and conventional software TPM implementations. In a benchmark involving 2048-bit RSA key generation (Fig. 4), the hardware TPM required approximately 17.28 seconds to complete the operation, whereas the RfTPM implementation achieved the same task in just 2.18 seconds, representing an approximately 7-fold improvement. Further tests evaluating sealing, unsealing, signing, and verification commands (Fig. 5) reveal performance enhancements ranging from 3.7% to 8.2%, primarily due to the efficiency of the zero-copy communication mechanism. Additional evaluations of cryptographic operations show that RfTPM improved RSA encryption and decryption by 8.2% and 8.0%, respectively, and AES encryption and decryption by 9.1% and 9.2% (Fig. 6). 
Although the NVRAM startup process in RfTPM incurs minor overhead—measured at 5.28 milliseconds compared to 0.9 milliseconds for conventional software TPMs—this delay is negligible, as NVRAM initialization occurs only once during system boot and does not impact overall runtime performance. Memory footprint analysis further reveals that while conventional software TPMs may consume approximately 1,536 kB of physical memory, the combined footprint of the fTPM and OpenSBI firmware in RfTPM is only 956 kB, which can be reduced to 808 kB through compiler optimizations. These results collectively confirm that RfTPM not only provides robust defense against various security threats, including TOCTOU and rollback attacks, but also enhances operational efficiency, making it an optimal solution for secure computing on RISC-V platforms.  Conclusions  In summary, RfTPM represents the first comprehensive firmware-based TPM architecture specifically tailored for RISC-V platforms, effectively addressing critical challenges such as secure execution, trusted boot integrity, efficient inter-layer communication, and precise timekeeping without incurring additional hardware costs. By integrating advanced techniques—including PMP-based memory isolation, DRAM PUF-enhanced static data protection, a dual-phase boot process with delay measurement extension, dynamic permission exchange for zero-copy communication, and a hardware-based secure clock—RfTPM delivers robust security functionality that matches or exceeds that of traditional hardware TPMs. Experimental results confirm that RfTPM upholds rigorous security standards while offering substantial performance and resource utilization advantages over both hardware TPMs and existing software TPMs. The open-sourcing of core components further fosters community collaboration and provides a platform for future research focused on refining trusted computing solutions for emerging architectures like RISC-V. Future work may explore additional hardware optimizations, such as native AES instruction support, and further enhancements to file system performance to increase the efficiency and robustness of fTPM implementations.
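The delay measurement extension described above builds on the standard TPM measure-and-extend pattern; the short Python illustration below shows the underlying PCR extend operation that hash-chains each boot stage into a register. The SHA-256 digest and the all-zero initial PCR value are the conventional TPM choices; the boot-stage list is hypothetical.

    import hashlib

    def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
        """Standard TPM-style extend: PCR_new = SHA-256(PCR_old || measurement)."""
        return hashlib.sha256(pcr + measurement).digest()

    # Illustrative measured boot: each stage is hashed and folded into the PCR
    # before control is handed to the next stage (stage names are hypothetical).
    pcr0 = bytes(32)                   # conventional all-zero initial PCR value
    for stage_image in (b"bootloader", b"opensbi+ftpm", b"kernel"):
        pcr0 = pcr_extend(pcr0, hashlib.sha256(stage_image).digest())
    print(pcr0.hex())                  # final value attests the whole boot chain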
Wire Length Driven Tension Refine Based Macro Placer
ZHU Yanzhen, YAN Haopeng, CAI Shuting, GAO Peng
2025, 47(7): 2396-2404. doi: 10.11999/JEIT241079
Abstract:
  Objective  With the introduction of reuse methodologies in integrated circuit design, the utilization of macro cells in Very Large Scale Integration (VLSI) has significantly increased. However, the considerable size difference between macro cells and standard cells presents a significant challenge for circuit placers. This study proposes a novel macro placer, WIMPlace, based on tension fine-tuning and wirelength-driven approaches. The aim is to address issues such as density imbalance and degradation of solution quality observed in existing mixed-size placers, thereby providing a more effective solution for VLSI design.  Methods  The proposed method in this paper consists of four stages: preprocessing, pre-placement, macro cell fine-tuning, and macro legalization. Initially, a weight-based partitioning approach is employed to group standard cells with macros into supersets, addressing density issues during the initial placement (Section 3.1). In the pre-placement stage, the DREAMPlace 2.0 tool is used for placing standard cells, and the initial positions of macro cells are determined based on the locations of these clusters (Section 3.2). A local tension model, inspired by the principle of surface tension in liquids, is then adopted to fine-tune the positions of macros, ensuring that connections between standard cells and macros are as compact as possible (Section 3.3, Fig. 2). Finally, a constraint graph-based macro legalization strategy is applied to prevent overlaps between macros (Section 3.4, Fig. 3).  Results and Discussions  Experimental results demonstrate that WIMPlace achieves exceptional performance on the MMS benchmark, outperforming other advanced mixed-size placers, such as ePlace-MS and DREAMPlace 4.0. Specifically, in 15 out of 16 cases, it achieved the shortest wirelength, with average reductions of 4.31% and 2.39%, respectively (Section 4, Table 2). Additionally, WIMPlace exhibits excellent solution stability, particularly showing a linear increase in runtime as the number of cells increases (Section 4, Fig. 4), indicating that the algorithm not only optimizes wirelength effectively but also demonstrates high computational efficiency. Notably, in the newblue3 case, despite the macro cells occupying a significant portion of the chip area, WIMPlace still demonstrated strong adaptability.  Conclusions  In summary, WIMPlace, as proposed in this paper, is an efficient macro cell placer that achieves gradual fine-tuning optimization of macro cells by combining gradient field movements based on a surface tension analogy and employing preprocessing techniques to balance macros with their associated standard cells. Compared to existing mixed-size placers, WIMPlace demonstrates superior performance across multiple key metrics, particularly in wirelength optimization. Future work could focus on integrating additional design objectives, such as timing, congestion, and thermal management, to enhance the applicability and flexibility of WIMPlace. This study provides new perspectives and technical approaches for VLSI design.
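The abstract does not reproduce the local tension model itself; the NumPy sketch below illustrates the general idea of a tension-style fine-tuning step, in which a macro is pulled toward the weighted centroid of the standard cells it connects to, with the step size playing the role of a tension coefficient. The quantities and the update rule are illustrative assumptions rather than WIMPlace's actual formulation.

    import numpy as np

    def tension_step(macro_xy, cell_xy, net_weights, step=0.1):
        """One illustrative tension fine-tuning step for a single macro.
        macro_xy    -- current (x, y) position of the macro
        cell_xy     -- (N, 2) positions of standard cells connected to it
        net_weights -- (N,) connection weights (e.g., pin or net counts)
        The macro moves toward the weighted centroid of its connected cells,
        shortening macro-to-cell wirelength, analogous to surface tension
        pulling a droplet toward a compact shape."""
        macro_xy = np.asarray(macro_xy, float)
        cell_xy = np.asarray(cell_xy, float)
        w = np.asarray(net_weights, float)
        centroid = (w[:, None] * cell_xy).sum(axis=0) / w.sum()
        return macro_xy + step * (centroid - macro_xy)

    # e.g. tension_step((10.0, 4.0), [(12, 5), (9, 7), (11, 3)], [2, 1, 1])

Iterating such steps for every macro, followed by the constraint-graph legalization, keeps each macro's connections to its associated standard-cell clusters compact, which is the stated purpose of the tension fine-tuning stage.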