Current Articles

2025, Volume 47, Issue 8

Cover
2025, 47(8)
2025, 47(8): 1-4.
Excellence Action Plan Leading Column
Optimal and Suboptimal Architectures of Millimeter-wave Large-scale Arrays for 6G
HONG Wei, XU Jun, CHEN Jixin, HAO Zhangcheng, ZHOU Jianyi, YU Zhiqiang, YANG Guangqi, JIANG Zhihao, YU Chao, HU Yun, HOU Debin, ZHU Xiaowei, CHEN Zhe, ZHOU Peigen
2025, 47(8): 2405-2415. doi: 10.11999/JEIT250109
Abstract:
  Objective  Beamforming array technology is a key enabler for modern radio systems, evolving across three primary dimensions: advancements in electromagnetic theory, innovations in semiconductor manufacturing, and iterations in system architectures. The development of mobile communication has driven the progress of beamforming array technologies. Hybrid beamforming array technology, in particular, was established as a critical solution for 5G millimeter-wave communications under the 3GPP Release 15 standard. To address the needs of future 6G communication and sensing, millimeter-wave beamforming arrays will evolve towards ultra-large-scale (>1000 elements), intelligent (AI-driven), and heterogeneous (integration of photonics and electronics) architectures, serving as the foundation for ubiquitous intelligent connectivity. This article investigates optimal and suboptimal large-scale beamforming array architectures for 6G millimeter-wave communications.  Methods  Large-scale beamforming arrays can be classified into three primary types: analog-domain beamforming arrays, digital-domain beamforming arrays, and hybrid-domain beamforming arrays. These arrays can be further categorized into single-beam and multi-beam configurations based on the number of supported beams. Each category includes various architectural variants (Figure 1). Analog-domain beamforming arrays (Figure 2) consist of passive and active beamforming arrays. Active beamforming arrays are further divided into Radio Frequency (RF) phase-shifting, Intermediate Frequency (IF) phase-shifting, and Local Oscillator (LO) phase-shifting architectures. Digital-domain implementations (Figure 3) include symmetric and asymmetric full-digital beamforming architectures. Hybrid-domain configurations (Figure 4) offer various combinations, such as architectures integrating RF phase-shifting phased subarrays with digital beamforming, or hybrid multi-beam arrays that combine passive beamforming networks with digital processing. In terms of performance, the symmetric full-digital beamforming architecture (Figure 5) is considered the optimal solution among all beamforming arrays. However, it faces challenges such as high system costs, excessive power consumption, and increased complexity due to the need for numerous high-speed ADCs and real-time processing of large data streams. To address these limitations in symmetric full-digital multi-beam large-scale arrays, an asymmetric large-scale full-digital multi-beam array architecture was proposed (Figure 6). Additionally, a spatial-domain orthogonal hybrid beamforming array (Figure 8) is proposed, which uses differentiated beamforming techniques across spatial dimensions to implement large-scale hybrid beamforming arrays.  Results and Discussions  Current 5G millimeter-wave base stations primarily utilize hybrid beamforming massive MIMO array architectures (Figure 7), which integrate RF phase-shifting phased subarrays with digital-domain beamforming. In this configuration, each two-dimensional RF phase-shifting phased subarray is connected to a dedicated Up-Down Converter (UDC), followed by secondary digital beamforming processing. However, this architecture limits independent beam control flexibility, often leading to the abandonment of digital beamforming in practical implementations. Therefore, each beam achieves only subarray-level gain. 
For mobile communication base stations, the adoption of asymmetric full-digital phased arrays (Figure 6), which include large-scale transmit arrays and smaller receive arrays, offers an optimal balance between cost, power consumption, complexity, and performance. This configuration meets uplink/downlink traffic demands while enabling wider receive beams (corresponding to compact receive arrays) that support accurate angle-of-arrival estimation. Theoretically, hybrid multi-beam architectures that combine digital beamforming in the horizontal dimension with analog beamforming in the vertical dimension (or vice versa) can reduce system complexity, cost, and power consumption without degrading performance, potentially outperforming current 5G mmWave hybrid beamforming solutions. However, these architectures face limitations due to beam bundling. Building upon the 4-subarray hybrid beamforming architecture (256 elements) used in existing 5G mmWave base stations (Figure 7), a spatial-domain orthogonal hybrid beamforming array is proposed (Figure 8). In this configuration, vertical-dimension elements are grouped into sixteen 1D RF phase-shifting phased subarrays (16 elements each), with each subarray connected to individual UDCs and ADC/DAC chains. The 16 data streams are processed through horizontal-dimension digital beamforming, achieving spatial orthogonality between the analog and digital beamforming domains. This architecture preserves full-aperture gain for each beam, supporting enhanced transmission rates and system capacity. This hybrid multi-beam solution maintains the same beamforming chip count as conventional hybrid architectures (Figure 7) for identical array scales, requiring only an increase in UDC channels from 4 to 16, with minimal cost impact. The proposed solution supports 16 simultaneous beams/data streams, resulting in a significant capacity improvement. For dual-polarization configurations, this extends to 32 beams/data streams, further enhancing system capacity. In horizontal-digital/vertical-analog implementations, all beams align along the horizontal plane, with vertical scanning limited to simultaneous elevation adjustment (Figure 9). Although vertical beam grouping enables independent elevation control, it results in beam gain degradation.  Conclusions  From a performance perspective, the symmetric full-digital beamforming array architecture can be considered the optimal solution. However, it is hindered by high system complexity and cost. The asymmetric full-digital beamforming array architecture significantly reduces system complexity and cost while closely approaching the performance of its symmetric counterpart, making it a suboptimal yet practical choice for large-scale beamforming arrays. Additionally, the spatially orthogonal hybrid beamforming array architecture—such as the design combining vertical analog phased subarrays with horizontal digital beamforming—represents another suboptimal solution. Notably, this hybrid architecture outperforms current 5G mmWave hybrid beamforming systems in terms of performance.
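To make the full-aperture versus subarray-level gain comparison above concrete, the following minimal NumPy sketch computes ideal aperture gains for the two hybrid configurations described: a 256-element array split into four 64-element RF phase-shifting subarrays, versus sixteen 16-element vertical analog subarrays combined digitally in the horizontal dimension. The uniform-weighting gain formula and the printout are illustrative only and are not taken from the paper.

```python
import numpy as np

# Minimal sketch (illustrative parameters): compare the per-beam aperture gain
# of a conventional hybrid array, where each data stream only uses one 8x8
# subarray, with the spatial-domain orthogonal hybrid array, where vertical
# analog phase shifting plus horizontal digital combining lets every beam use
# the full 16x16 aperture.

def aperture_gain_db(n_elements: int) -> float:
    """Ideal array gain of an n-element array with uniform weighting."""
    return 10.0 * np.log10(n_elements)

full_array = 16 * 16                 # 256-element planar array
conventional_subarray = 8 * 8        # one of four RF phase-shifting subarrays
orthogonal_subarrays = 16            # sixteen 1D vertical analog subarrays
elements_per_subarray = 16

g_conventional = aperture_gain_db(conventional_subarray)
# analog gain inside each vertical subarray + digital gain across subarrays
g_orthogonal = aperture_gain_db(elements_per_subarray) + aperture_gain_db(orthogonal_subarrays)

print(f"conventional hybrid, per-beam gain : {g_conventional:.1f} dB")
print(f"spatially orthogonal hybrid        : {g_orthogonal:.1f} dB")
print(f"full-aperture reference            : {aperture_gain_db(full_array):.1f} dB")
```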
Overviews
Advances and Challenges in Wireless Channel Hardware Twin
FANG Sheng, ZHU Qiuming, XIE Yuetian, JIANG Hao, LI Hui, WU Qihui, MAO Kai, HUA Boyu
2025, 47(8): 2416-2428. doi: 10.11999/JEIT241093
Abstract:
  Significance   Wireless channel characteristics are key determinants of communication system performance and reliability. Channel twin technology—defined as the physical or digital reproduction of channel distortion effects—has become essential for system testing and validation. Digital twin approaches are favored for their flexibility and efficiency, and hardware-based platforms (i.e., channel emulators, CEs) are widely used for large-scale performance evaluation. However, as networks advance toward Terahertz (THz) bands, extremely large-scale massive Multiple-Input Multiple-Output (XL-MIMO) systems (e.g., 1024-antenna arrays), and integrated air-space-ground-sea communications, three key limitations remain: (1) Inability to support real-time processing for ultra-wide bandwidths over 10 GHz; (2) Inadequate dynamic emulation accuracy for non-stationary channels under high mobility; and (3) Insufficient hardware resources for simulating over 10⁶ concurrent channels in XL-MIMO architectures. This study reviews state-of-the-art hardware twin technologies, identifies critical technical bottlenecks, and outlines future directions for next-generation channel emulation platforms.  Progress   Existing channel hardware twin technologies can be categorized into three paradigms based on channel data sources, model types, and implementation architectures. First, measured data-driven twin technology uses real-world propagation data to enable high-fidelity emulation. Signal replay reproduces specific electromagnetic environments by replaying recorded signals. Although this approach preserves scene-specific authenticity, it lacks flexibility due to dependence on measured datasets and storage constraints. Channel Impulse Response (CIR) replay extracts propagation characteristics from measurement data, making it applicable to unmodeled environments such as underwater acoustics. However, its accuracy depends on precise channel estimation and is limited by data sampling resolution and storage capacity. Second, deterministic model-driven twin technology generates CIR using Finite Impulse Response (FIR) filters by synthesizing multipath delays and fading coefficients for predefined scenarios. Techniques such as sparse filtering and subspace projection optimize the trade-off between accuracy and hardware efficiency. For example, current large-scale emulators support 256×256 MIMO systems with 512-tap FIR filters, requiring only four active taps. Nonetheless, limited clock resolution introduces phase distortion in the frequency response, reducing fidelity in high-frequency terahertz applications. Third, statistical model-driven twin technology emulates time-varying channel behavior by generating fading and delay profiles based on probability distributions. The sum-of-sinusoids method is widely employed due to its simplicity and low computational demand, while enhanced implementations—such as the COordinate Rotation DIgital Computer (CORDIC) algorithm—minimize storage requirements for sinusoid generation. This paradigm offers strong scalability but sacrifices scenario-specific fidelity, limiting its ability to reproduce certain channel characteristics accurately. A comparative analysis across fidelity, flexibility, scalability, and implementation complexity shows that measured data-driven methods are best suited for reproducing real-world environments; deterministic models support configurable scenario design for known settings; and statistical models facilitate efficient emulation of large-scale networks.
Each approach balances distinct advantages against inherent limitations.  Prospects   Future developments in channel hardware twin technologies are expected to integrate emerging innovations to overcome current limitations: (1) Deep learning techniques—such as generative adversarial networks—can learn from limited measured channel data to synthesize channel characteristics, reducing dependence on extensive datasets in measured data-driven approaches. (2) The environment-aware capabilities of next-generation communication networks enable dynamic reconstruction of propagation environments, addressing the lack of real-time adaptability in deterministic model-driven technologies. (3) Transfer learning enables the migration of knowledge across propagation scenarios, improving the cross-scenario generalization of statistical model-driven emulation without requiring large amounts of measured data. Future applications of channel hardware twin technologies are expected to advance in three primary directions: (1) real-time optimization of communication systems; (2) network planning and design; and (3) testing and evaluation of electromagnetic devices. Through the integration of deep learning and environmental sensing, hardware twin platforms will support intelligent, self-adaptive communication systems capable of meeting the increasing complexity of future network demands.  Conclusions  This review synthesizes recent progress in channel hardware twin technologies and addresses critical challenges posed by future communication scenarios characterized by ultra-wide bandwidths, high channel dynamics, and large-scale networking. Key issues include high-frequency wideband signal processing, emulation of non-stationary dynamic environments, and scalability to large multi-branch network architectures. A classification framework is proposed, categorizing existing hardware twin approaches into three paradigms—measured data-driven, deterministic model-driven, and statistical model-driven—based on data sources, modeling strategies, and implementation architectures. A comparative analysis of these paradigms evaluates their relative strengths and limitations in terms of authenticity, flexibility, scalability, and emulation duration, providing guidance for selecting appropriate emulation strategies in complex environments. Furthermore, this study explores the integration of emerging technologies such as generative networks, environmental sensing, and transfer learning to support data-efficient generation, dynamic scenario adaptation, and cross-scene generalization. These advancements are expected to enhance the efficiency and adaptability of channel hardware twins, enabling them to meet the stringent requirements of future communication systems in performance validation, network design, and device testing. This work offers a foundation for advancing innovation in channel hardware twin technologies and accelerating the development of next-generation wireless networks.
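As a concrete illustration of the statistical model-driven paradigm mentioned above, the sketch below generates a Rayleigh fading waveform with a basic sum-of-sinusoids model. The number of sinusoids, Doppler spread, and sample rate are illustrative assumptions, and hardware-oriented refinements such as CORDIC-based sinusoid generation are not reproduced.

```python
import numpy as np

# Minimal sum-of-sinusoids sketch of statistical model-driven channel
# emulation: a Rayleigh fading waveform is synthesized from a small number of
# sinusoids with random phases and Doppler-dependent frequencies. All
# parameters (number of sinusoids, Doppler, sample rate) are illustrative.

def sos_rayleigh(num_samples, fs_hz, max_doppler_hz, num_sinusoids=16, seed=None):
    rng = np.random.default_rng(seed)
    t = np.arange(num_samples) / fs_hz
    h = np.zeros(num_samples, dtype=complex)
    for _ in range(num_sinusoids):
        aoa = rng.uniform(0.0, 2.0 * np.pi)          # random angle of arrival
        phase = rng.uniform(0.0, 2.0 * np.pi)        # random initial phase
        doppler = max_doppler_hz * np.cos(aoa)       # per-path Doppler shift
        h += np.exp(1j * (2.0 * np.pi * doppler * t + phase))
    return h / np.sqrt(num_sinusoids)                # normalize average power

if __name__ == "__main__":
    h = sos_rayleigh(num_samples=10_000, fs_hz=1e4, max_doppler_hz=100.0, seed=0)
    print("mean power ~", np.mean(np.abs(h) ** 2))   # close to 1.0
```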
Exploration of Application of Artificial Intelligence Technology in Underwater Acoustic Network Routing Protocols
ZHAO Yihao, CHEN Yougan, LI Jianghui, WAN Lei, TAO Yi, WANG Xuchen, DONG Yanhan, TU Shen’ao, XU Xiaomei
2025, 47(8): 2429-2447. doi: 10.11999/JEIT250110
Abstract:
  Significance   In response to the strategic emphasis on maritime power, China has experienced growing demand for ocean resource exploration, ecological monitoring, and defense applications. Underwater acoustic networks provide an effective solution for data acquisition in these domains, with network performance largely dependent on the design and implementation of routing protocols. These protocols determine the transmission path and method, forming a foundation for optimizing underwater communication. Recent advances in Artificial Intelligence (AI) have prompted efforts to apply AI techniques to underwater acoustic network routing. By leveraging AI’s learning capacity, data insight capability, and adaptability, researchers aim to address challenges posed by dynamic underwater environments, energy limitations of nodes, and potential security threats. This paper examines the integration of AI technology into underwater acoustic network routing protocols and provides a critical evaluation of current research progress.   Progress   This paper reviews the application of AI technology in underwater acoustic network routing protocols, classifying existing approaches into flat and hierarchical routing categories. In flat routing, AI methods such as conventional heuristic algorithms, reinforcement learning, and deep learning have been applied to improve routing decisions. For hierarchical routing, AI is utilized not only for routing optimization but also for node clustering and layer structuring. These applications offer potential benefits, including enhanced routing efficiency, reduced energy consumption, improved end-to-end delay, and strengthened network security. Most performance evaluations are based on simulations. However, simulation environments vary considerably across studies, particularly in node quantity and density, ranging from small-scale to very large-scale networks. This variability complicates quantitative comparisons of performance metrics. Additionally, replicating these simulation scenarios in sea trials is limited by the logistical and financial constraints of deploying and recovering large numbers of nodes, thus impeding the validation of protocol performance under real-world conditions. The review further identifies critical challenges in applying AI to underwater acoustic networks. Many AI-based protocols operate under impractical assumptions, such as global knowledge of node positions and energy levels, which is rarely achievable in dynamic underwater settings. Maintaining such information requires substantial communication overhead, thereby increasing energy consumption and delay. Furthermore, the computational complexity of AI algorithms—particularly deep learning models—presents difficulties for implementation on underwater nodes with limited power, processing, and storage capacities. Few studies provide detailed complexity analyses, and hardware-based performance verifications remain scarce. This lack of real-world validation limits the assessment of the practical feasibility and effectiveness of AI-enabled routing protocols.  Conclusions  AI technology offers considerable potential for enhancing underwater acoustic network routing protocols by addressing key challenges such as environmental variability, energy constraints, and security threats. However, current research is constrained by several limitations. Many studies rely on unrealistic assumptions regarding the availability of complete node information, which is impractical in dynamic underwater settings. 
The acquisition and maintenance of such information entail substantial communication overhead, leading to increased energy consumption and delay. Moreover, the computational demands of AI algorithms—particularly deep learning models—often exceed the capabilities of resource-limited underwater nodes. Performance assessments remain predominantly simulation-based, with limited hardware implementation, thereby restricting the verification of real-world feasibility and effectiveness.  Prospects  Future research should prioritize the development of more accurate and realistic simulation platforms to support the evaluation of AI-based routing protocols. This includes the integration of advanced channel models and real-world observational data to improve simulation fidelity. Establishing standardized simulation conditions will also be essential for enabling consistent performance comparisons across studies. In parallel, greater emphasis should be placed on hardware implementation of AI algorithms, with efforts directed toward reducing algorithmic complexity and storage demands to accommodate the limitations of energy-constrained underwater nodes. Exploring cost-effective validation approaches, such as small-scale sea trials and semi-physical simulation frameworks, will be critical for assessing the practical performance and deployment feasibility of AI-enabled routing protocols.
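As a minimal illustration of how reinforcement learning is typically applied to next-hop selection in the routing protocols surveyed above, the sketch below trains a tabular Q-learning agent on a hypothetical five-node topology with a sink node, rewarding delivery and penalizing per-hop energy cost. The topology, rewards, and hyperparameters are invented for illustration and do not correspond to any specific protocol in the review.

```python
import random

# Toy Q-learning sketch for next-hop routing: node 4 is the sink, rewards
# favor delivery and penalize per-hop energy. All values are illustrative.
NEIGHBORS = {0: [1, 2], 1: [2, 3], 2: [3], 3: [4], 4: []}
ENERGY_COST = {(0, 1): 2.0, (0, 2): 3.0, (1, 2): 1.0, (1, 3): 4.0,
               (2, 3): 2.0, (3, 4): 1.0}
ALPHA, GAMMA, EPS = 0.3, 0.9, 0.1
Q = {(n, m): 0.0 for n, ms in NEIGHBORS.items() for m in ms}

for _ in range(2000):                       # training episodes
    node = 0
    while node != 4:
        nbrs = NEIGHBORS[node]
        if random.random() < EPS:
            nxt = random.choice(nbrs)       # explore
        else:
            nxt = max(nbrs, key=lambda m: Q[(node, m)])  # exploit
        reward = (10.0 if nxt == 4 else 0.0) - ENERGY_COST[(node, nxt)]
        future = max((Q[(nxt, m)] for m in NEIGHBORS[nxt]), default=0.0)
        Q[(node, nxt)] += ALPHA * (reward + GAMMA * future - Q[(node, nxt)])
        node = nxt

route, node = [0], 0
while node != 4:                            # follow the greedy policy
    node = max(NEIGHBORS[node], key=lambda m: Q[(node, m)])
    route.append(node)
print("learned route:", route)              # typically [0, 2, 3, 4] with these costs
```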
An Overview of Resource Management Technology of 6G Integrated Communication, Sensing, and Computation Enabled Satellite-Terrestrial Intelligent Network
WU Yanyan, WU Song, DENG Wei
2025, 47(8): 2448-2472. doi: 10.11999/JEIT250140
Abstract:
  Significance   As 6G mobile communication systems continue to evolve, Integrated Communication, Sensing, and Computation (ICSC) technology has emerged as a key area of research. ICSC not only improves network performance but also meets increasingly diverse and personalized user requirements. Recent progress in spectrum sharing, high-precision sensing algorithms, dynamic computing resource scheduling, and Artificial Intelligence (AI) has supported the development of 6G networks. However, several challenges remain. These include inefficient spectrum utilization, limited accuracy and real-time performance of sensing algorithms, and insufficient adaptability and intelligence in computing resource scheduling strategies. Moreover, integrating these technologies into the 6G ICSC Enabled Satellite-Terrestrial Intelligent Network (6G-ICSC-STIN) for effective resource management and optimal allocation is an unresolved issue. To address demands for high bandwidth, low latency, and wide coverage in future networks, a distributed intelligent resource management strategy is designed. Based on this approach, a resource management framework combining game theory and multi-agent reinforcement learning is proposed, offering guidance for advancing resource management in 6G-ICSC-STIN systems.  Progress   This paper provides a comprehensive discussion of resource management technologies for 6G ICSC Enabled Satellite–Terrestrial Intelligent Networks (6G-ICSC-STIN). It summarizes key technological advances driving the field and presents recent progress in four core areas: spectrum sharing, high-precision sensing algorithms, dynamic computing resource scheduling, and the application of AI in ICSC systems. Measurement indicators for ICSC performance are also examined. Based on this review, a 6G-ICSC-STIN architecture is proposed (Fig. 2), integrating 6G communication, sensing, computation, and intelligent coordination technologies. This architecture fully leverages the capabilities of satellites, unmanned aerial vehicles, High-Altitude Platforms (HAPs), and ground terminals to enable seamless and full-domain coverage across space, air, ground, and sea. It supports deep integration of communication, sensing, computation, intelligence, and security, resulting in a unified network system characterized by more precise perception and transmission, improved resource coordination, lower system overhead, and enhanced user experience. To address complex resource management challenges, a functional block diagram comprising the application, service, capability, and resource layers is introduced (Fig. 3), aiming to identify new approaches for efficient resource allocation. A distributed intelligent resource management strategy is further proposed for the ICSC central, fog node, edge networks and terminal (Fig. 4). Within the integrated edge network, a novel “Master–Slave two-level edge node” architecture is designed, in which the Master node deploys a resource demand prediction model to estimate regional demand in real time (Fig. 6). Building on this strategy, a resource management framework based on game theory and multi-agent reinforcement learning is proposed (Fig. 5). This framework employs the Nash-Equilibrium Asynchronous Advantage Actor-Critic (Nash-E-A3C) algorithm, adopts a parallelized multi-agent and distributed computing approach, and integrates Nash equilibrium theory (Fig. 7), with the aim of achieving intelligent, collaborative, and efficient network resource management.  
Conclusions  The distributed intelligent resource management strategy is essential for achieving efficient resource coordination and optimal utilization in the 6G-ICSC-STIN architecture. By decentralizing computing, storage, and communication resources across network nodes, it enables resource sharing and collaborative operation. The proposed architecture, grounded in game theory and multi-agent reinforcement learning, supports dynamic resource allocation and optimization. Agents are deployed at each node, where they make decisions based on local demands and environmental conditions using game-theoretic reasoning and Reinforcement Learning (RL) algorithms. This approach enables globally efficient resource management across the network.  Prospects   Cross-domain technological integration is fundamental to the realization of 6G-ICSC-STIN. Deep integration of sensing, communication, and computing capabilities can substantially enhance overall network performance and efficiency. However, this integration faces several challenges, including heterogeneous network compatibility, complex resource scheduling, fragmented security mechanisms, and slow progress in standardization. Efficient resource representation is critical for effective resource management and performance optimization. Existing studies show that resources in satellite-terrestrial integrated networks are heterogeneous, multidimensional, and unevenly distributed across large spatiotemporal scales, posing new challenges to resource coordination. This paper outlines future development trends in intelligent resource management for 6G-ICSC-STIN, synthesizing current research progress, key challenges, and future directions in cross-domain technology fusion and resource representation. These emerging technologies together form a foundation for intelligent and efficient resource management in 6G-ICSC-STIN and offer new pathways for the advancement of next-generation wireless communication systems.
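The game-theoretic component of such a framework can be illustrated with the toy sketch below: two edge-node agents repeatedly best-respond to each other's resource-block requests until a pure-strategy Nash equilibrium is reached. The utility function, pool size, and congestion penalty are hypothetical; the Nash-E-A3C framework described above additionally learns these responses with actor-critic agents rather than enumerating them.

```python
import numpy as np

# Toy two-node resource allocation game: each edge node requests 0..4 blocks
# from a shared pool, and alternating best responses converge to a pure-
# strategy Nash equilibrium. Utilities and constants are hypothetical.

POOL = 6                      # total resource blocks at the shared node
ACTIONS = np.arange(5)        # each agent may request 0..4 blocks

def utility(own: int, other: int) -> float:
    served = min(own, max(POOL - other, 0))        # blocks actually granted
    congestion = 0.5 * max(own + other - POOL, 0)  # penalty for over-demand
    return served - congestion

def best_response(other: int) -> int:
    return int(max(ACTIONS, key=lambda a: utility(a, other)))

a, b = 0, 0
for _ in range(20):                                # alternate best responses
    a_new = best_response(b)
    b_new = best_response(a_new)
    if (a_new, b_new) == (a, b):
        break                                      # fixed point = Nash equilibrium
    a, b = a_new, b_new

print(f"Nash equilibrium requests: node A={a}, node B={b}")
print(f"utilities: {utility(a, b):.1f}, {utility(b, a):.1f}")
```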
A Review on Action Recognition Based on Contrastive Learning
SUN Zhonghua, WU Shuang, JIA Kebin, FENG Jinchao, LIU Pengyu
2025, 47(8): 2473-2485. doi: 10.11999/JEIT250131
Abstract:
  Significance   Action recognition is a key topic in computer vision research and has evolved into an interdisciplinary area integrating computer vision, deep learning, and pattern recognition. It seeks to identify human actions by analyzing diverse modalities, including skeleton sequences, RGB images, depth maps, and video frames. Currently, action recognition plays a central role in human-computer interaction, video surveillance, virtual reality, and intelligent security systems. Its broad application potential has led to increasing attention in recent years. However, the task remains challenging due to the large number of action categories and significant intra-class variation. A major barrier to improving recognition accuracy is the reliance on large-scale annotated datasets, which are costly and time-consuming to construct. Contrastive learning offers a promising solution to this problem. Since its initial proposal in 1992, contrastive learning has undergone substantial development, yielding a series of advanced models that have demonstrated strong performance when applied to action recognition.  Progress   Recent developments in contrastive learning-based action recognition methods are comprehensively reviewed. Contrastive learning is categorized into three stages: traditional contrastive learning, clustering-based contrastive learning, and contrastive learning without negative samples. In the traditional contrastive learning stage, mainstream action recognition approaches are examined with reference to the Simple framework for Contrastive Learning of visual Representations (SimCLR) and Momentum Contrast v2 (MoCo-v2). For SimCLR-based methods, the principles are discussed progressively across three dimensions: temporal contrast, spatio-temporal contrast, and the integration of spatio-temporal and global-local contrast. For MoCo-v2, early applications in action recognition are briefly introduced, followed by methods proposed to enrich the positive sample set. Cross-view complementarity is addressed through a summary of methods incorporating knowledge distillation. For different data modalities, approaches that exploit the hierarchical structure of human skeletons are reviewed. In the clustering-based stage, methods are examined under the frameworks of Prototypical Contrastive Learning (PCL) and Swapping Assignments between multiple Views of the same image (SwAV). For contrastive learning without negative samples, representative methods based on Bootstrap Your Own Latent (BYOL) and Simple Siamese networks (SimSiam) are analyzed. Additionally, the roles of data augmentation and encoder design in the integration of contrastive learning with action recognition are discussed in detail. Data augmentation strategies are primarily dependent on input modality and dimensionality, whereas encoder selection is guided by the characteristics of the input and its representation mapping. Various contrastive loss functions are categorized systematically, and their corresponding formulas are provided. Several benchmark datasets used for evaluation are introduced. Performance results of the reviewed methods are presented under three categories: unsupervised single-stream, unsupervised multi-stream, and semi-supervised approaches. Finally, the methods are compared both horizontally (across techniques) and vertically (across stages).  Conclusions  In the data augmentation analysis, two dimensions are considered: modality and transformation type. 
For RGB images or video frames, which contain rich pixel-level information, augmentations such as spatial cropping, horizontal flipping, color jittering, grayscale conversion, and Gaussian blurring are commonly applied. These operations generate varied views of the same content without altering its semantic meaning. For skeleton sequences, augmentation methods are selected to preserve structural integrity. Common strategies include shearing, rotation, scaling, and the use of view-invariant coordinate systems. Skeleton data can also be segmented by individual joints, multiple joints, all joints, or along spatial and temporal axes separately. Regarding dimensional transformations, spatial augmentations include cropping, flipping, rotation, and axis masking, all of which enhance the salience of key spatial features. Temporal transformations apply time-domain cropping and flipping, or resampling to different frame rates, to leverage temporal continuity and short-term action invariance. Spatio-temporal transformations typically use Gaussian blur and Gaussian noise to simulate real-world perturbations while preserving overall action semantics. For encoder selection, temporal modeling commonly uses Gated Recurrent Units (GRUs), Long Short-Term Memory networks (LSTMs), and Sequence-to-Sequence (S2S) models. LSTM is suitable for long-term temporal dependencies, while bidirectional GRU captures temporal patterns in both forward and backward directions, allowing for richer temporal representations. Spatial encoders are typically based on the ResNet architecture. ResNet18, a shallower model, is preferred for small datasets or low-resource scenarios, whereas ResNet50, a deeper model, is better suited for complex feature extraction on larger datasets. For spatio-temporal encoding, Spatio-Temporal Graph Convolutional Networks (ST-GCNs) are employed to jointly model spatial configurations and temporal dynamics of skeletal actions. In the experimental evaluation, performance comparisons of the reviewed methods yield several constructive insights and summaries, providing guidance for future research on contrastive learning in action recognition.  Prospects   The limitations and potential developments of action recognition methods based on contrastive learning are discussed from three aspects: runtime efficiency, the quality of negative samples, and the design of contrastive loss functions.
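For readers unfamiliar with the contrastive loss functions referenced above, the following PyTorch sketch implements the widely used NT-Xent (InfoNCE) objective underlying SimCLR-style methods: each augmented view of a sample must identify its counterpart among all other samples in the batch. The random feature tensors, batch size, and temperature are placeholders.

```python
import torch
import torch.nn.functional as F

# Minimal NT-Xent (InfoNCE) loss as used by SimCLR-style methods: two
# augmented views per clip are encoded, and each view must match its
# counterpart against all other batch samples. Encoder outputs here are
# random placeholders.

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # (2N, D)
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))          # ignore self-similarity
    # positive of sample i is its other view, at index (i + n) mod 2n
    targets = torch.arange(2 * n).roll(n)
    return F.cross_entropy(sim, targets)

if __name__ == "__main__":
    feats_view1 = torch.randn(8, 128)   # e.g. skeleton-sequence embeddings
    feats_view2 = torch.randn(8, 128)   # embeddings of the augmented views
    print("NT-Xent loss:", nt_xent(feats_view1, feats_view2).item())
```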
A Review of Electronic Skin and Its Application in Clinical Diagnosis and Treatment of Traditional Chinese Medicine
WANG Zheng, MI Jinpeng, CHEN Guodong
2025, 47(8): 2486-2498. doi: 10.11999/JEIT250148
Abstract:
Integrating Electronic Skin (e-skin) into Traditional Chinese Medicine (TCM) diagnostics offers a novel approach to addressing long-standing issues of standardization and objectivity. Core diagnostic practices in TCM (pulse assessment, tongue analysis, and acupuncture) are predominantly based on subjective interpretation, which hinders reproducibility and limits broader clinical acceptance. This review examines recent advances in e-skin technology, including flexible electronics, multimodal sensing, and Artificial Intelligence (AI), and discusses their potential to support quantifiable, data-driven diagnostic frameworks. These developments may provide a technological basis for modernizing TCM while maintaining its holistic orientation. This review systematically examines the convergence of TCM clinical requirements and e-skin technologies through a comprehensive survey of over 60 peer-reviewed studies and patents published between 2015 and 2024. First, the current state of e-skin research is mapped onto the diagnostic needs of TCM, with a focus on material flexibility, multisensory integration, and energy autonomy. Second, key technical challenges are identified through comparative analysis of sensor performance metrics (e.g., sensitivity, durability) and TCM-specific biomarker detection requirements. Third, a framework is proposed for optimizing e-skin architectures in accordance with TCM’s systemic diagnostic logic. The analysis highlights three technical domains: (1) Material innovations: Graphene-polymer composites and liquid metal-hydrogel interfaces that enable conformal adherence to dynamic biological surfaces (Fig. 3). (2) Multimodal sensing: Heterogeneous sensor arrays capable of synchronously capturing pulse waveforms, tongue coatings, and acupoint bioimpedance (Table 1). (3) AI-driven signal interpretation: Deep learning models such as ResNet-1D and transformer networks for classifying TCM pulse patterns and body constitutions. e-skin technologies have advanced significantly in supporting the digital transformation of TCM through innovations in materials, sensing functions, and algorithmic design. In pulse diagnosis, graphene-based sensor arrays achieve 89.3% classification accuracy across 27 pulse categories (Table 2), exceeding manual assessments (Kappa: 0.72 vs. 0.51) by quantifying nuanced differences in pulse types such as “slippery” and “wiry” (Fig. 1). For tongue diagnosis, MXene-enabled multispectral imaging (400~1000 nm) supports automated analysis of coating thickness with an F1-score of 0.91, and reveals thermal-humidity gradients correlated with Yang Deficiency patterns (Fig. 6). Acupuncture standardization has improved through the use of piezoresistive needle arrays, which reduce insertion depth errors to ±0.3 mm. Integration with machine learning further enables classification of nine TCM body constitutions at 86.4% accuracy, supporting personalized therapeutic strategies (Fig. 5). Despite these achievements, key technical limitations remain. Material degradation and signal synchronization latency over 72 ms restrict real-time applications. Variability in sensor specifications (sampling rates from 50 to 2,000 Hz) and the lack of quantifiable biomarkers for TCM concepts such as Qi-Stagnation continue to hinder clinical validation (Table 2). Future research should focus on: (1) Self-healing materials: Bioinspired hydrogels with strain tolerance over 300% and enhanced fatigue resistance.
(2) Edge-AI architectures: Lightweight transformer-CNN hybrids optimized for reduced latency (<20 ms). (3) TCM-specific biomarkers: Electrochemical sensors designed to detect molecular correlates of Yin-Yang imbalances. This review outlines a roadmap for modernizing TCM through e-skin integration by aligning technological advances with clinical requirements. Three key insights are emphasized: (1) Material-device co-design: Engineering stretchable electronics to accommodate the dynamic diagnostic contexts of TCM. (2) Multimodal data fusion: Combining pulse, tongue, and meridian signals to support systemic pattern differentiation. (3) Regulatory frameworks: Establishing TCM-oriented standards for sensor validation and clinical reliability. Emerging applications, including Internet of Things (IoT)-connected e-skin patches for continuous Zang-Fu organ monitoring and AI-guided acupuncture robotics, illustrate the field’s transformative potential. By 2030, the interdisciplinary integration of flexible electronics, artificial intelligence, and TCM principles is projected to enable e-skin diagnostic systems to be adopted in 40% of tertiary hospitals, supporting the transition of TCM toward a globally recognized precision medicine paradigm.
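As a schematic example of the AI-driven signal interpretation discussed above, the sketch below defines a small ResNet-1D-style classifier that maps a single-channel pulse waveform to one of 27 pulse categories. The depth, channel widths, and input length are placeholder choices and do not reproduce the architectures cited in the review.

```python
import torch
import torch.nn as nn

# Illustrative ResNet-1D style pulse classifier (not the cited models):
# a few 1D residual blocks over a single-channel pressure waveform,
# ending in a 27-way classification head. All sizes are placeholders.

class ResBlock1D(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=7, padding=3),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=7, padding=3),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.body(x))      # residual connection

class PulseNet(nn.Module):
    def __init__(self, num_classes: int = 27):
        super().__init__()
        self.stem = nn.Conv1d(1, 32, kernel_size=15, stride=2, padding=7)
        self.blocks = nn.Sequential(ResBlock1D(32), ResBlock1D(32))
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):                       # x: (batch, 1, samples)
        h = self.blocks(self.stem(x))
        return self.head(h.mean(dim=-1))        # global average pooling

if __name__ == "__main__":
    wave = torch.randn(4, 1, 1024)              # 4 synthetic pulse segments
    print(PulseNet()(wave).shape)               # torch.Size([4, 27])
```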
DTDS: Dilithium Dataset for Power Analysis
YUAN Qingjun, ZHANG Haojin, FAN Haopeng, GAO Yang, WANG Yongjuan
2025, 47(8): 2499-2508. doi: 10.11999/JEIT250048
Abstract:
  Objective  The development of quantum computing threatens the security of traditional cryptosystems and drives the research and standardization of post-quantum cryptographic algorithms. The Dilithium digital signature algorithm is designed based on lattice theory and was selected by the U.S. National Institute of Standards and Technology (NIST) as a post-quantum cryptographic standard in 2024. Meanwhile, side-channel analysis of Dilithium, especially power analysis, has become a research hotspot. However, existing power analysis datasets mainly target classical block cipher algorithms such as AES, and the lack of datasets for newer algorithms such as Dilithium restricts research on side-channel security analysis methods.  Results and Discussions  To address this gap, this paper collects and releases the first power analysis dataset for the Dilithium algorithm, aiming to facilitate research on power analysis of post-quantum cryptographic algorithms. The dataset is based on the open-source reference implementation of Dilithium running on a Cortex-M4 processor and captured by a dedicated acquisition device, and contains 60,000 traces recorded during the Dilithium signature process, together with the signature source data and sensitive intermediate values corresponding to each trace.  Conclusions  The constructed DTDS dataset is further visualized and analyzed, and the execution process of the random polynomial generation function polyz_unpack and its effect on the traces are investigated in detail. Finally, the dataset is modeled and tested using template analysis and deep learning-based analysis to verify its validity and usefulness. The dataset and code can be found at https://doi.org/10.57760/sciencedb.j00173.00001.
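A schematic example of the template-analysis validation described above is sketched below on synthetic traces rather than the real DTDS files, whose layout should be taken from the published dataset documentation. The leakage model, trace dimensions, and noise level are invented for illustration.

```python
import numpy as np

# Self-contained template-analysis sketch on synthetic traces: each trace
# leaks the Hamming weight of a 4-bit sensitive intermediate value at one
# sample position; profiling builds one mean trace per class, and the attack
# assigns held-out traces to the nearest template. This is a simplified
# stand-in for full Gaussian templates with a pooled covariance matrix.

rng = np.random.default_rng(1)
n_traces, n_samples, leak_pos = 6000, 200, 80
intermediates = rng.integers(0, 16, n_traces)                   # sensitive values
labels = np.array([bin(v).count("1") for v in range(16)])[intermediates]

traces = rng.normal(0.0, 0.5, (n_traces, n_samples))            # measurement noise
traces[:, leak_pos] += 2.0 * labels                             # leakage component

split = int(0.8 * n_traces)                                     # profiling / attack split
profile_x, profile_y = traces[:split], labels[:split]
attack_x, attack_y = traces[split:], labels[split:]

classes = np.unique(profile_y)
templates = np.stack([profile_x[profile_y == c].mean(axis=0) for c in classes])

# Attack phase: nearest-template classification of each attack trace.
dists = ((attack_x[:, None, :] - templates[None, :, :]) ** 2).sum(axis=-1)
pred = classes[np.argmin(dists, axis=1)]
print("Hamming-weight recovery accuracy:", np.mean(pred == attack_y))
```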
Wireless Communication and Internet of Things
Adaptive Multi-Mode Blind Equalization Scheme for OFDM-NOMA Systems
YANG Long, YU Kaixin, LI Jin, JIA Ziyi
2025, 47(8): 2509-2520. doi: 10.11999/JEIT250153
Abstract:
  Objective  Orthogonal Frequency Division Multiplexing (OFDM) combined with Non-Orthogonal Multiple Access (NOMA) is widely applied in next-generation wireless communication systems for its high spectral efficiency and support for concurrent multi-user transmission. However, in downlink transmission, the superposition of signals from multiple users on the same subcarrier yields non-standard Quadrature Amplitude Modulation (QAM) constellations, rendering conventional equalization techniques ineffective. In addition, channel variability and impulsive noise introduce severe distortion, further degrading system performance. To overcome these limitations, this paper proposes an unsupervised adaptive multi-mode blind equalization scheme designed for OFDM-NOMA systems.  Methods  The proposed equalization scheme combines the Multi-Mode Algorithm (MMA) with a Soft-Decision Directed (SDD) strategy to construct an adaptive cost function. This function incorporates the power allocation factors of NOMA users to compensate for amplitude and phase distortions introduced by the wireless channel. To minimize the cost function efficiently, an optimized Newton method is employed, which avoids direct matrix inversion to reduce computational complexity. An iterative update rule is derived to enable fast convergence with low processing overhead. The algorithm is implemented on a real-time Software-Defined Radio (SDR) system using the GNURadio platform for practical validation.  Results and Discussions  Simulation results show that the proposed equalization algorithm substantially outperforms conventional methods in both convergence speed and accuracy. Compared with the traditional Minimum Mean Square Error (MMSE) algorithm, it reduces convergence time by 90% while achieving comparable performance without the use of pilot signals (Fig. 8). Constellation diagrams before and after equalization confirm that the algorithm effectively restores non-standard QAM constellations distorted by NOMA signal superposition (Fig. 9). The method also demonstrates strong robustness to impulsive noise and dynamic channel variations. Complexity analysis indicates that the proposed algorithm incurs lower computational overhead than conventional Newton-based equalization approaches (Table 1). Experimental validation on the GNURadio platform confirms its ability to separate user signals and support accurate decoding in real-world OFDM-NOMA downlink conditions (Fig. 12).  Conclusions  This study presents a blind equalization scheme for OFDM-NOMA systems based on an MMA-SDD adaptive cost function and an optimized Newton method. The proposed algorithm compensates for amplitude and phase distortions, enabling reliable signal recovery without pilot information. Theoretical analysis, simulation results, and experimental validation confirm its fast convergence, robustness to noise, and low computational complexity. These characteristics support its potential for practical deployment in future NOMA-based wireless communication networks.
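The multi-modulus part of the proposed cost function can be illustrated with the single-tap NumPy sketch below, which applies a stochastic-gradient MMA update to a two-user NOMA superposition distorted by an unknown complex gain. The soft-decision-directed term, the Newton-type update, and the OFDM subcarrier structure of the actual scheme are omitted, and the constellation, channel, and step size are illustrative.

```python
import numpy as np

# Single-tap Multi-Modulus Algorithm (MMA) illustration: a stochastic-gradient
# MMA update corrects a complex gain/phase distortion on a two-user NOMA
# superposition (power factors 0.8 / 0.2, QPSK per user). Illustrative only.

rng = np.random.default_rng(0)
n = 5000
qpsk = lambda size: (rng.choice([-1, 1], size) + 1j * rng.choice([-1, 1], size)) / np.sqrt(2)
s = np.sqrt(0.8) * qpsk(n) + np.sqrt(0.2) * qpsk(n)     # superposed NOMA symbols

h = 0.7 * np.exp(1j * 0.5)                              # unknown complex channel gain
x = h * s + 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# MMA dispersion constants for the real and imaginary rails of the mixture
r_r = np.mean(s.real ** 4) / np.mean(s.real ** 2)
r_i = np.mean(s.imag ** 4) / np.mean(s.imag ** 2)

w, mu = 1.0 + 0j, 5e-3                                  # single equalizer tap, step size
for k in range(n):
    y = w * x[k]
    e = y.real * (y.real ** 2 - r_r) + 1j * y.imag * (y.imag ** 2 - r_i)
    w -= mu * e * np.conj(x[k])                         # stochastic-gradient MMA update

print("combined channel*equalizer gain:", abs(w * h))   # should approach 1
```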
Anti-interrupted Sampling Repeater Jamming Method Based on down-sampling Processing Blind Source Separation
LIU Yipin, YU Lei, WEI Yinsheng
2025, 47(8): 2521-2534. doi: 10.11999/JEIT250193
Abstract:
  Objective  Advancements in radar jamming technology have made coherent jamming generated by Digital Radio Frequency Memory (DRFM) a significant threat to radar detection. This type of jamming exhibits considerable spectral overlap with target echo signals and shares similar time-frequency characteristics. Even after matched filtering is applied to the received signal, the jamming can still achieve high gain. Among various forms, Interrupted Sampling Repeater Jamming (ISRJ) presents both suppression and deception effects, combined with high agility and diversity, posing a considerable challenge to radar detection systems. Existing ISRJ suppression methods face several limitations, including reliance on prior knowledge of jamming parameters, reduced robustness against ISRJ style variations, and the need for advance detection of ISRJ forwarding strategies. Blind Source Separation (BSS) can extract source signals based solely on the received mixture, without requiring prior information about the source or transmission parameters. BSS is widely applied in radar anti-jamming scenarios due to its high robustness. However, as ISRJ is primarily deployed for self-defense jamming, conventional BSS methods lack spatial degrees of freedom and cannot effectively suppress such interference. To address this limitation, this study proposes a down-sampling BSS method for ISRJ suppression. By applying dechirping and down-sampling to the echo signal, varying the down-sampling retention positions produces multiple down-sampled output signals. Theoretical analysis demonstrates that the jamming and target signals in these multi-channel down-sampled outputs satisfy the linear mixing model required for BSS. BSS is subsequently applied to separate the ISRJ and target components. This study introduces BSS into ISRJ suppression, providing a highly robust approach that does not depend on prior knowledge, with theoretical validation supporting the method.  Methods   In self-defense ISRJ scenarios, the jamming and target share the same azimuth angle, resulting in a loss of spatial freedom in the received signal. Therefore conventional BSS methods based on linear instantaneous mixing models are no longer applicable. When all source signals originate from the same azimuth, the rank of the receiving array manifold matrix reduces to one, causing the array receiving model to degenerate into an effective single-channel system. However, BSS requires multiple mixed signals to perform signal separation. To overcome this limitation, this study proposes a down-sampling BSS method for ISRJ suppression. The approach begins by applying oversampling to the received signal, followed by dechirp processing of the single-channel echo signal that contains both jamming and target components. Through conjugate multiplication of the echo signal with a reference signal, both the ISRJ and target echo are converted into sinusoidal signals with fixed frequencies and time-domain windowing characteristics. Subsequently, the signal undergoes down-sampling, during which multiple down-sampled output signals are generated by varying the retention positions of the sampled data. This process effectively restores the degrees of freedom required for separation. Theoretical analysis confirms that the ISRJ and target components in the down-sampled output signals satisfy the linear mixing model necessary for BSS processing. The multi-channel down-sampled signals are then used as input for BSS, enabling the separation of jamming and target components. 
Pulse compression is performed via Fourier transform to enhance detection resolution. Finally, target detection is conducted on each separated component to isolate the jamming signals and recover the target echoes, thereby achieving anti-jamming performance.  Results and Discussions  The key innovation of the proposed method is the application of BSS to ISRJ suppression, eliminating the requirement for precise estimation of ISRJ parameters and demonstrating high robustness. Furthermore, a single-frequency, single-channel BSS approach based on down-sampling is presented, which has potential application beyond jamming suppression. Simulation results confirm that the proposed method effectively separates ISRJ from the target signal (Fig. 7) and suppresses multiple ISRJ types, including direct forwarding ISRJ (Fig. 5), repeated forwarding ISRJ (Fig. 7), and frequency-shift forwarding ISRJ (Fig. 6). Comparative experiments demonstrate that this method resolves the problem of degraded suppression performance caused by the jamming azimuth in existing BSS approaches. Compared with conventional ISRJ suppression algorithms, the proposed method maintains stable performance regardless of ISRJ slice width or jamming power. Moreover, it achieves superior output Signal-to-Interference-plus-Noise Ratio (SINR), confirming its effectiveness in enhancing anti-jamming capabilities.  Conclusions  To address the threat posed by ISRJ to radar systems, this study proposes an ISRJ suppression method based on down-sampling BSS. By applying down-sampling and dechirp processing to the received signal, multiple signals are generated, and the Joint Approximate Diagonalization of Eigenmatrices (JADE) BSS algorithm is employed to separate the jamming and target components. This method overcomes the dependence of conventional BSS approaches on spatial separability and remains effective in self-defense jamming scenarios where the jamming and target share the same azimuth. The proposed method demonstrates effective suppression of various ISRJ types, including direct forwarding, repeated forwarding, and frequency-shift forwarding. Compared with existing ISRJ suppression techniques, this approach provides improved anti-jamming performance, as it is largely unaffected by ISRJ slice width, does not require prior knowledge of jamming parameters, and exhibits minimal sensitivity to variations in Signal-to-Interference Ratio (SIR).
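The dechirping step that the proposed method builds on can be sketched as follows: an LFM echo containing a delayed target return and interrupted-sampling jam slices is conjugate-multiplied by the reference chirp, so that each component becomes a windowed tone whose beat frequency encodes its delay. Pulse parameters, delays, and the slice pattern are illustrative, and the subsequent multi-channel down-sampling and JADE separation stages are not reproduced here.

```python
import numpy as np

# Dechirping an LFM echo that contains a target return and a delayed,
# interrupted-sampling repeater jam: after conjugate multiplication with the
# reference chirp, each component appears as a tone at beat frequency -k*tau.
# All waveform parameters are illustrative.

fs, T, B = 100e6, 50e-6, 20e6                 # sample rate, pulse width, bandwidth
t = np.arange(int(fs * T)) / fs
k = B / T                                     # chirp rate
chirp = np.exp(1j * np.pi * k * t ** 2)

def delayed(sig, delay_s):
    d = int(delay_s * fs)
    return np.concatenate([np.zeros(d, complex), sig[: len(sig) - d]])

target = delayed(chirp, 5e-6)                            # true target echo
slice_gate = (np.floor(t / 5e-6) % 2).astype(bool)       # 50% interrupted sampling
jam = 3.0 * delayed(chirp * slice_gate, 8e-6)            # retransmitted jam slices

echo = target + jam + 0.05 * (np.random.randn(len(t)) + 1j * np.random.randn(len(t)))
beat = echo * np.conj(chirp)                              # dechirp

freqs = np.fft.fftfreq(len(t), 1 / fs)
spectrum = np.abs(np.fft.fft(beat))
for f_hz in (-k * 5e-6, -k * 8e-6):                       # expected beat frequencies
    idx = np.argmin(np.abs(freqs - f_hz))
    print(f"beat at {f_hz/1e3:8.1f} kHz -> relative peak {spectrum[idx]/spectrum.max():.2f}")
```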
LLM Channel Prediction Method for TDD OTFS Low-Earth-Orbit Satellite Communication Systems
YOU Yuxin, JIANG Xinglong, LIU Huijie, LIANG Guang
2025, 47(8): 2535-2548. doi: 10.11999/JEIT250105
Abstract:
Orthogonal Time Frequency Space (OTFS) modulation shows promise in Low Earth Orbit (LEO) satellite-to-ground communications. However, rapid Doppler shift variation and high latency in LEO systems lead to channel aging. Real-time channel estimation increases the computational complexity of onboard receivers and reduces transmission efficiency due to substantial pilot overhead. This study addresses a Ka-band Multiple-Input Single-Output (MISO) OTFS satellite-to-ground communication system by designing a Downlink (DL) channel prediction scheme based on Uplink (UL) channel estimation. A high-precision channel estimation method is proposed, combining matched filtering with data detection to extract UL Channel State Information (CSI). An Adaptive Sparse Large Language Model (ASLLM)-based channel prediction network is then constructed to predict DL CSI. Compared with existing methods, simulations show that the proposed approach achieves lower Normalized Mean Square Error (NMSE) and Bit Error Rate (BER), with improved generalization across multiple scenarios and within an acceptable computational complexity range.  Objective   LEO satellite communication systems offer advantages over Medium-Earth-Orbit (MEO) and Geostationary-Earth-Orbit (GEO) systems, particularly in terms of reduced transmission latency and lower path loss. Therefore, LEO satellites are considered a key element of the Sixth-Generation (6G) Non-Terrestrial Network (NTN) satellite internet architecture. However, high-mobility channels between LEO satellites and ground stations introduce significant challenges for conventional Orthogonal Frequency Division Multiplexing (OFDM), resulting in marked performance degradation. OTFS modulation, which operates in the Delay-Doppler (DD) domain, has been shown to outperform OFDM in high-mobility scenarios, Multiple-Input Multiple-Output (MIMO) systems, and millimeter-wave frequency bands. This performance advantage is attributed to its robustness to Doppler shifts and inter-symbol interference. In modern Time Division Duplexing (TDD) satellite communication systems, OTFS receivers require high-complexity real-time channel estimation, and transmitters rely on extensive pilot overhead to encode CSI for reliable data recovery. To mitigate these limitations, channel prediction schemes using UL CSI to predict DL CSI have been proposed. However, broadband MISO-OTFS systems with large antenna arrays and high-resolution transmission demand precise and efficient CSI prediction under rapidly varying DD-domain conditions. The dynamic and rapidly aging characteristics of DD domain CSI present significant challenges for accurate prediction in broadband, high-mobility, and large-scale antenna communication systems. To address this, an ASLLM-based channel prediction method is developed. The proposed method enables accurate prediction of DD-domain CSI under these conditions.  Methods  By modeling the input–output relationship of a MISO OTFS satellite-to-ground communication system, this study proposes a data-assisted fractional Doppler matched filtering algorithm for channel estimation. This method leverages the shift property of correlation functions and integrates iterative optimization through Minimum Mean Square Error (MMSE) signal detection to achieve accurate estimation of DD domain CSI. The resulting high-precision CSI serves as a reliable input for the subsequent prediction network.
The task of predicting DL slot CSI from UL slot CSI is formulated as a minimization of the NMSE between the network’s predicted CSI and the true DL CSI. The proposed ASLLM prediction network consists of a preprocessing layer, an embedding layer, a Generative Pre-trained Transformer (GPT) layer, and an output layer. The raw DD-domain CSI is first processed through the preprocessing layer to extract convolutional features. In the embedding layer, a value attention module and a position attention module are applied to convert the CSI features into a structured, text-like input suitable for GPT processing. The value attention module adaptively extracts sparse feature values of the CSI, while the position attention module encodes positional characteristics in a non-trainable manner. The core of the prediction network is a pre-trained, open-source GPT-2 backbone, which is used to model and forecast the CSI sequence. The network output is then passed through a linear transformation layer to recover the predicted DD-domain CSI.  Results and Discussions  The satellite-to-ground channel is modeled using the NTN-TDL-D dual mobility channel and simulated with QuaDRiGa. First, the performance of the data-assisted matched filtering channel estimation method is validated (Fig. 7). At a Signal-to-Noise Ratio (SNR) of 20 dB, the BER reaches the order of 0.001 after three iterations. Next, training loss curves for several neural network models are compared (Fig. 8). The ASLLM model exhibits the fastest convergence and highest stability. It also achieves superior NMSE and BER performance in MMSE data detection compared with other approaches (Fig. 9). ASLLM demonstrates strong generalization across different channel models and varying terminal velocities (Fig. 10). However, in cross-frequency generalization scenarios, a small number of additional training samples are still required to maintain accuracy (Fig. 11). Finally, ablation experiments confirm the contribution of each core module within the ASLLM architecture (Table 2). Comparisons of network parameters, training time, and inference time indicate that the computational complexity of ASLLM remains within an acceptable range (Table 3).  Conclusions  This study proposes a channel prediction method for TDD MISO OTFS systems, termed ASLLM, tailored to high-mobility scenarios such as communication between LEO satellites and high-speed trains. The approach leverages high-precision historical UL CSI, obtained through a data-assisted matched filtering algorithm, to predict future DL CSI. By extracting sparse features from DD domain CSI, the method fine-tunes a pre-trained GPT-2 model—originally trained on general knowledge—to improve predictive accuracy. Simulation results show that: (1) considering both computational complexity and estimation accuracy, optimal stopping criteria for the channel estimation algorithm are defined as an iteration number of 3 and a threshold of 0.001; (2) ASLLM outperforms existing prediction methods in terms of convergence speed, NMSE, BER, and generalization capability; and (3) each module of the network contributes effectively to performance, while overall computational complexity remains within a feasible range.
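A structural sketch of an ASLLM-style predictor is given below: uplink delay-Doppler CSI snapshots are embedded into token-like vectors, passed through a pre-trained GPT-2 backbone from the Hugging Face transformers library, and projected back to a downlink CSI estimate, with prediction quality measured by NMSE. The embedding design, tensor shapes, and fine-tuning strategy are simplified placeholders rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

# Skeleton of an LLM-backed CSI predictor: historical uplink delay-Doppler CSI
# (flattened real+imaginary parts) is embedded, fed through GPT-2, and mapped
# to a downlink CSI estimate. Shapes and the embedding/output layers are
# simplified placeholders.

class CsiPredictor(nn.Module):
    def __init__(self, n_delay=32, n_doppler=16, d_model=768):
        super().__init__()
        csi_dim = 2 * n_delay * n_doppler                 # real + imaginary parts
        self.embed = nn.Linear(csi_dim, d_model)          # stand-in for the embedding layer
        self.gpt2 = GPT2Model.from_pretrained("gpt2")     # pre-trained backbone (fine-tuning policy omitted)
        self.head = nn.Linear(d_model, csi_dim)           # output projection

    def forward(self, ul_csi):                            # ul_csi: (batch, slots, csi_dim)
        h = self.gpt2(inputs_embeds=self.embed(ul_csi)).last_hidden_state
        return self.head(h[:, -1])                        # predict the next (downlink) slot

if __name__ == "__main__":
    batch, slots = 2, 4
    ul = torch.randn(batch, slots, 2 * 32 * 16)           # historical uplink CSI
    dl_true = torch.randn(batch, 2 * 32 * 16)
    dl_pred = CsiPredictor()(ul)
    nmse = ((dl_pred - dl_true).pow(2).sum(-1) / dl_true.pow(2).sum(-1)).mean()
    print("predicted shape:", dl_pred.shape, "NMSE:", nmse.item())
```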
Chinese Semantic Communication System Based on Word-level and Sentence-level Semantics
DENG Jiewen, ZHAO Haitao, WEI Jibo, CAO Kuo, ZHANG Yichi, LUO Peng, ZHANG Yuyuan, LIU Yueling
2025, 47(8): 2549-2562. doi: 10.11999/JEIT250137
Abstract:
  Objective  To address the mismatch between limited communication resources and growing service demands, semantic communication—a novel paradigm—has been proposed and is expected to offer an effective solution. Unlike traditional approaches that focus on accurate symbol transmission, semantic communication operates at the semantic level, aiming to convey intended meaning by leveraging shared background knowledge at both the transmitter and receiver. Advances in semantic information theory provide a theoretical basis for this paradigm, while the development of artificial intelligence techniques for semantic extraction and understanding supports practical system implementation. Most existing semantic communication systems for textual data are based on English corpora; however, Chinese text differs markedly in word segmentation, lexical annotation, and syntactic structure. Systems tailored for Chinese corpora remain underexplored. Furthermore, current lexical code-based systems primarily focus on word-level semantics and fail to fully capture sentence-level semantics. This study addresses these limitations by mining and processing lexical and contextual semantics specific to Chinese text. A semantic communication system is proposed that uses Chinese corpora to learn and extract both word-level and sentence-level semantic associations. Lexical coding is performed at the transmitter, and joint context decoding is realized at the receiver, thereby improving the effectiveness and reliability of the communication process.  Methods  A Chinese semantic communication system is designed to capture both word-level and sentence-level semantics, leveraging the unique characteristics of Chinese text to enable efficient and reliable transmission of meaning. At the transmitter, a lexical coding method is proposed that encodes words based on their combined lexical semantic features. At the receiver, a two-stage decoding process is implemented. First, the Continuous Bag-of-Words (CBOW) model is used to learn word-level semantics from shared knowledge, estimating the conditional probability of the next word based on preceding words. Second, the Bidirectional Encoder Representations from Transformers (BERT) model is applied to capture sentence-level semantics, using Chinese characters as the fundamental processing unit to compute the probability distribution of words at each position in the sentence. Upon receiving the bit sequence, Huffman decoding is performed with a candidate code list mechanism to generate a set of candidate words. A recursive memoization algorithm then selects the most probable words based on word-level semantics. Finally, sentence-level semantics are applied to correct potential errors in the sentence, producing the recovered text.  Results and Discussions  The proposed semantic communication system improves effectiveness by encoding combined phrases during lexical coding, thereby reducing the number of coding objects. Reliability is enhanced by leveraging contextual associations during feature learning and joint decoding. For effectiveness, the average code length of the Huffman coding dictionary is 10.61, while the lexical coding dictionary for four categories achieves an average of 8.98. This represents an 18.15% increase in average coding rate. Experiments conducted on 100 randomly selected texts across different corpus sizes yield consistent results (Table 3, Fig. 5), validating the effectiveness of lexical coding. 
For reliability, system performance is first evaluated under varying parameter settings. The optimal values for context window size, lexical category count, and Hamming distance threshold are identified (Figs. 6–10). Comparative analysis across different systems is then conducted. Under an AWGN channel, the lexical+word-level+sentence-level semantic system achieves higher BLEU scores than the Huffman-only system when the Signal-to-Noise Ratio (SNR) is ≤6 dB, and matches the performance of DeepSC between –3 dB and 3 dB. At SNR ≥9 dB, its BLEU scores are slightly lower than those of the Huffman-only system but significantly higher than those of DeepSC. Across all SNR ranges, the lexical+word-level+sentence-level system outperforms the lexical+word-level system. The BLEU scores of the Huffman+word-level and Huffman+sentence-level systems are similar and consistently exceed those of the Huffman-only system. Similar trends are observed on Rayleigh and Rician fading channels and with METEOR scores (Figs. 11, 12). These results indicate that combining word-level and sentence-level semantics with a candidate set mechanism for joint context decoding substantially enhances transmission reliability at the receiver.  Conclusions  A Chinese semantic communication system based on word-level and sentence-level semantics is proposed. First, a lexical grouping and coding method based on LAC segmentation is developed by analyzing lexical features in Chinese text, which improves the effectiveness of the communication system. Second, the receiver models context co-occurrence probabilities by extracting word-level and sentence-level semantic features, enabling joint decoding through word selection and sentence-level error correction, thereby enhancing reliability. Simulation results show that the average code length of the Huffman coding dictionary is 10.61, while the lexical coding dictionary for four categories achieves an average of 8.98, resulting in an 18.15% increase in coding rate. On the AWGN channel, the proposed lexical+word-level+sentence-level system outperforms the Huffman-only system at low SNR and the DeepSC system at high SNR. The Huffman+word-level and Huffman+sentence-level systems yield similar reliability scores, both consistently higher than the Huffman-only system. These findings confirm that incorporating both word-level and sentence-level semantics significantly enhances system reliability.
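As an illustration of the receiver-side decoding step described above, the following Python sketch combines Huffman candidate words with word-level context scores through recursive memoization. The bigram scoring table stands in for the CBOW conditional probabilities, and all names and values are hypothetical rather than taken from the paper.

```python
# A minimal sketch of candidate-list decoding: Huffman decoding with a candidate
# code list yields several candidate words per position; a recursive memoized
# search then picks the sequence with the highest word-level (context) score.
from functools import lru_cache

def select_words(candidates, pair_logprob):
    """candidates: list of candidate-word lists, one list per sentence position.
    pair_logprob(prev, word): log P(word | prev) from the shared knowledge base."""
    n = len(candidates)

    @lru_cache(maxsize=None)
    def best(pos, prev):
        # Returns (score, words) for the best completion of positions pos..n-1.
        if pos == n:
            return 0.0, ()
        results = []
        for w in candidates[pos]:
            tail_score, tail_words = best(pos + 1, w)
            results.append((pair_logprob(prev, w) + tail_score, (w,) + tail_words))
        return max(results)

    return list(best(0, "<s>")[1])

# Toy usage with a hand-made scoring table (illustrative only).
table = {("<s>", "今天"): -0.1, ("今天", "天气"): -0.2, ("今天", "添加"): -3.0,
         ("天气", "很好"): -0.3, ("添加", "很好"): -2.5}
score = lambda p, w: table.get((p, w), -5.0)
print(select_words([["今天"], ["天气", "添加"], ["很好"]], score))
```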
Service Caching and Task Migration Mechanism Based on Internet of Vehicles
ZUO Linli, XIA Shichao, LI Yun, PAN Junnan, CHEN Bingyi
2025, 47(8): 2563-2572. doi: 10.11999/JEIT241097
Abstract:
  Objective  In the era of digital transformation and smart mobility, the Internet of Vehicles (IoV) has emerged as a transformative paradigm reshaping transportation systems and vehicle-related services. In recent years, the proliferation of IoV applications has led to the generation and processing of large volumes of real-time data, requiring ultra-low latency and high-efficiency computation to maintain seamless functionality and ensure high-quality user experiences. To meet these demands, Mobile Edge Computing (MEC) has been widely adopted in the IoV domain, effectively reducing the load on backhaul links. However, the dynamic and mobile nature of vehicular networks poses significant challenges to the effective deployment of edge services and the efficient management of task migration. Vehicles continuously move across regions with heterogeneous network conditions, edge node coverage, and service availability. Conventional static or rule-based approaches for service caching and task migration often fail to adapt to these environmental dynamics, leading to degraded performance, frequent service interruptions, and elevated energy consumption. This study proposes a Joint Service Caching and Task Migration Algorithm (SCTMA) tailored to the dynamic characteristics of the IoV environment. By incorporating machine learning, optimization techniques, and context-aware decision-making, SCTMA dynamically adjusts caching and migration strategies to ensure that appropriate services are delivered to suitable edge nodes at the optimal time, thereby minimizing latency and improving resource utilization.  Methods  This study systematically considers multiple constraints within the IoV system, including caching decisions, the number of cached services, cache capacity, CPU resource consumption, and task migration policies at edge nodes. To jointly optimize service caching and task migration under these constraints, a Markov Decision Process (MDP) model is constructed. The MDP framework captures the temporal dynamics of the IoV environment, wherein system states, such as vehicle location, service demand, and cache status evolve over time. The reward function is formulated to balance competing objectives, including minimizing latency, reducing energy consumption, and improving the cache hit ratio. To address inefficient utilization of Base Station (BS) caching resources and mitigate storage waste, the concept of a service hit ratio is introduced. Based on this ratio, BSs proactively cache frequently requested services, thereby reducing latency and energy usage during Vehicle User (VU) service requests and enhancing overall caching efficiency. A task migration algorithm is also developed, incorporating vehicle velocity to estimate the remaining dwell time of a VU within the coverage area of a RoadSide Unit (RSU). This estimation is used to compute the associated service data migration volume and assess migration costs. Building on this framework, a Joint SCTMA is proposed. SCTMA employs the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method to address uncertainties in multi-agent settings. This approach reduces system communication and computation costs, optimizes migration notification strategies, and improves the cache hit ratio.  Results and Discussions  Simulation results indicate that the proposed SCTMA algorithm effectively reduces caching and task migration costs while improving the cache hit ratio. 
Following training, the system’s long-term average reward under SCTMA markedly exceeds that of baseline algorithms (Fig. 3). Specifically, SCTMA maintains the long-term average reward at approximately –30, whereas the best-performing comparative method stabilizes at around –38, corresponding to an improvement of at least 21.05%. Further analysis of edge device caching performance (Fig. 5(a), Fig. 5(b)) shows that as the maximum cache capacity increases, the system using SCTMA consistently achieves the highest cache hit ratio across all tested scenarios.  Conclusions  In edge computing-enabled IoV ecosystems, where vehicles interact with infrastructure and peer nodes through interconnected edge networks, this study examines decision-making mechanisms for service hit rate optimization and task migration. By formulating the joint optimization of service caching and task migration as an MDP, a Joint SCTMA is proposed. Simulation results show that SCTMA reduces service caching and task migration costs, shortens service request latency for VUs, and improves overall system performance. However, the current study assumes an idealized IoV environment. Future research should evaluate the algorithm’s robustness and efficiency under real-world conditions.
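A minimal sketch of the kind of per-slot reward such an MDP formulation implies, balancing latency, energy, and cache hit ratio; the weights and field names are illustrative assumptions, not the paper's parameters.

```python
# Hypothetical per-slot reward for a caching/migration agent: negative latency and
# energy costs plus a positive cache-hit term, so that maximizing the reward trades
# the three objectives off against each other.
from dataclasses import dataclass

@dataclass
class StepStats:
    latency_s: float       # total service latency in the slot
    energy_j: float        # caching + migration energy in the slot
    hits: int              # requests served from an edge cache
    requests: int          # total requests in the slot

def caching_reward(s: StepStats, w_lat=1.0, w_eng=0.5, w_hit=2.0):
    hit_ratio = s.hits / max(s.requests, 1)
    # Weights are illustrative; in practice they would be tuned for the deployment.
    return -w_lat * s.latency_s - w_eng * s.energy_j + w_hit * hit_ratio

print(caching_reward(StepStats(latency_s=12.0, energy_j=30.0, hits=70, requests=100)))
```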
Sparse Channel Estimation and Array Blockage Diagnosis for Non-Ideal RIS-Assisted MIMO Systems
LI Shuangzhi, LEI Haojie, GUO Xin
2025, 47(8): 2573-2583. doi: 10.11999/JEIT241108
Abstract:
  Objective  Reconfigurable Intelligent Surfaces (RISs) offer a promising approach to enhance Millimeter-Wave (mmWave) Multiple-Input Multiple-Output (MIMO) systems by dynamically manipulating wireless propagation. However, practical deployments are challenged by hardware faults and environmental blockages (e.g., dust or rain), which impair Channel State Information (CSI) accuracy and reduce Spectral Efficiency (SE). Most existing studies either overlook the interdependence between the CSI and blockage vector or fail to leverage the dual sparsity of multipath channels and blockage patterns. This study proposes a joint sparse channel estimation and blockage diagnosis scheme to overcome these limitations, thereby enabling reliable beamforming and enhancing system robustness in non-ideal RIS-assisted mmWave MIMO environments.  Methods  A third-order Parallel Factor (PARAFAC) decomposition model is constructed for the received signals using a tensor-based signal representation. The intrinsic relationship between mmWave channel parameters and the blockage vector is exploited to estimate spatial angular frequencies at the User Equipment (UE) and Base Station (BS) using Orthogonal Matching Pursuit (OMP). Based on these frequencies, a coupled observation matrix is formed to jointly capture residual channel parameters and blockage vector information. This matrix is reformulated as a Least Absolute Shrinkage and Selection Operator (LASSO) problem, which is solved using the Alternating Direction Method of Multipliers (ADMM) to estimate the blockage vector. The remaining channel parameters are then recovered using sparse reconstruction techniques by leveraging their inherent sparsity. Iterative refinement updates both the blockage vector and channel parameters, ensuring convergence under limited pilot overhead conditions.  Results and Discussions  For a non-ideal RIS-assisted mmWave MIMO system (Fig. 1), a signal transmission framework is designed (Fig. 2), in which the received signals are represented as a third-order tensor. Leveraging the dual-sparsity of multipath channels and the blockage vector, a joint estimation scheme is developed (Algorithm 1), enabling effective parameter decoupling through tensor-based parallel factor decomposition and iterative optimization. Simulation results show that the proposed scheme achieves superior performance in both channel estimation and blockage diagnosis compared with baseline methods by fully exploiting dual-sparsity characteristics (Fig. 3). SE analysis confirms the detrimental effect of blockages on system throughput and highlights that the proposed scheme improves SE by compensating for blockage-induced impairments (Fig. 4). The method also demonstrates strong estimation accuracy under reduced pilot overhead (Fig. 5) and improved robustness as the number of blocked RIS elements increases (Fig. 6). A decline in spatial angular frequency estimation is observed with fewer UE antennas, which negatively affects overall performance; however, estimation stabilizes as antenna count increases (Fig. 7). Moreover, when Non-Line-of-Sight (NLoS) path contributions decrease, the scheme exhibits enhanced performance due to improved resolution between Line-of-Sight (LoS) and NLoS components (Fig. 8).  Conclusions  This study proposes a joint channel estimation and blockage diagnosis scheme for non-ideal RIS-assisted mmWave MIMO systems, based on the dual sparsity of multipath channels and blockage vectors. 
Analysis of the tensor-based parallel factor decomposition model reveals that the estimation of spatial angular frequencies at the UE and BS is unaffected by blockage conditions. The proposed scheme accounts for the contributions of NLoS paths, enabling accurate decoupling of residual channel parameters and blockage vector across different propagation paths. Simulation results confirm that incorporating NLoS path information improves both channel estimation accuracy and blockage detection. Compared with existing methods, the proposed approach achieves superior performance in both aspects. In practical scenarios, real-time adaptability may be challenged if blockage states vary more rapidly than channel characteristics. Future work will focus on enhancing the scheme’s responsiveness to dynamic blockage conditions.
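The LASSO step described above can be illustrated with a generic ADMM solver; the sketch below uses real-valued NumPy arrays and generic notation (A, y, x), so it shows the optimization pattern rather than the paper's exact complex-valued formulation.

```python
# A minimal ADMM solver for LASSO: minimize 0.5*||y - A x||^2 + lam*||x||_1,
# the form used to recover a sparse blockage vector from a coupled observation matrix.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm(A, y, lam=0.1, rho=1.0, n_iter=200):
    m, n = A.shape
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA = A.T @ A
    Aty = A.T @ y
    # Cache the Cholesky factor reused in every x-update.
    L = np.linalg.cholesky(AtA + rho * np.eye(n))
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Aty + rho * (z - u)))
        z = soft_threshold(x + u, lam / rho)   # promotes a sparse solution
        u = u + x - z
    return z

# Toy example: recover a 3-sparse vector from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 100))
x_true = np.zeros(100); x_true[[5, 40, 77]] = [1.0, -0.8, 0.5]
y = A @ x_true + 0.01 * rng.standard_normal(60)
print(np.nonzero(np.abs(lasso_admm(A, y)) > 0.1)[0])
```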
An Improved Modulation Recognition Method Based on Hybrid Kolmogorov-Arnold Convolutional Neural Network
ZHENG Qinghe, LIU Fanglin, YU Lisu, JIANG Weiwei, HUANG Chongwen, LI Bin, SHU Feng
2025, 47(8): 2584-2597. doi: 10.11999/JEIT250161
Abstract:
  Objective  With the rapid growth of communication devices and increasing complexity of electromagnetic environments, spectrum efficiency has become a critical performance metric for sixth-generation communication systems. Modulation recognition is an essential function of dynamic spectrum access, aiming to automatically identify the modulation scheme of received signals to enhance spectrum utilization. In practice, wireless signals are often affected by multipath propagation, interference, and noise, which pose challenges for accurate recognition. To address these issues, this study proposes a deep learning-based approach using an end-to-end model that eliminates manual feature extraction, mitigates limitations of handcrafted features, and improves recognition accuracy. By transferring general knowledge from signal classification to modulation recognition, a well-generalized method based on a hybrid Kolmogorov-Arnold Convolutional Neural Network (KA-CNN) is developed. This approach supports reliable communication in applications such as intelligent transportation, the Internet of Things (IoT), vehicular ad hoc networks, and satellite communication.  Methods  The proposed modulation recognition method first decomposes the signal into a multi-dimensional wavelet domain using a dual-tree complex wavelet packet transform. Different frequency components are then combined to construct a multi-scale signal representation, enabling the neural network to learn consistent features across frequencies. A deep learning structure, KA-CNN, is designed by integrating spline functions with nonlinear activation functions to enhance nonlinear fitting and continuous learning of periodic features. Spline functions are used to address the curse of dimensionality. To improve adaptability to varying signal parameters and enhance generalization across communication scenarios, multilevel grid training with Lipschitz regularization constraints is applied. In KA-CNN, the hybrid module transfers the characteristics of the spline function into convolution operations, which improves the model’s capacity to capture complex mappings between input signals and modulation schemes while retaining the efficiency of the Kolmogorov-Arnold network. This enhances both the expressive power and adaptability of deep learning models under complex communication conditions.  Results and Discussions  During the experimental phase, modulation recognition performance testing, ablation study, and comparative analysis are conducted on three publicly available datasets (RadioML 2016.10a, RadioML 2018.01a, and CSPB.ML.2023) to evaluate the performance of KA-CNN. Results show that KA-CNN achieves modulation recognition accuracies of 65.14%, 65.56%, and 78.40% on RadioML 2016.10a, RadioML 2018.01a, and CSPB.ML.2023, respectively (Figure 6). The main performance limitation arises in the classification of QPSK versus 8PSK, AM-DSB versus WBFM, and high-order QAM modulation types (Figure 7). Maximum differences in recognition accuracy of KA-CNN driven by different signal representations reach 2.04%, 3.46%, and 4.54% across the three datasets, demonstrating the effect of signal representation (Figure 8). The wavelet packet transform constructs a multi-scale time-frequency representation of signals that is insensitive to the maximum decomposition scale L and supports complementary learning of different modulation features. 
The hybrid Kolmogorov-Arnold convolutional module and the multi-dimensional perceptual cascade attention mechanism play key roles in enhancing modulation recognition accuracy, particularly under relatively high Signal-To-Noise Ratio (SNR) conditions (Figure 9). Additionally, finer grids and higher decomposition orders improve the model’s ability to extract discriminative signal features, thereby increasing recognition accuracy (Figure 10). Finally, a comparative evaluation against several deep learning models, including GGCNN, Transformer, PR-LSTM, and MobileViT, confirms the superior performance of KA-CNN (Figure 11).  Conclusions  This study proposes a hybrid KA-CNN to address the reduced modulation recognition accuracy caused by noise and parameter variation, as well as the limited generalization across communication scenarios in existing deep learning models. By integrating spline functions with nonlinear activation functions, KA-CNN mitigates the curse of dimensionality and improves its capacity for continuous learning of periodic features. A dual-tree complex wavelet packet transform is used to construct a multi-scale signal representation, enabling the model to extract consistent features across frequencies. The model is trained using multilevel grids with Lipschitz regularization constraints to enhance adaptability to varying signal parameters and improve generalization. Experimental results on three public datasets demonstrate that KA-CNN improves modulation recognition accuracy and exhibits robust generalization, particularly under low SNRs.
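To make the hybrid Kolmogorov-Arnold convolution idea concrete, the PyTorch sketch below pairs a standard convolution with a learnable grid-based nonlinearity; Gaussian radial basis functions are used as a simplified stand-in for the B-spline bases, and all module names and hyperparameters are illustrative rather than the authors' implementation.

```python
# Simplified stand-in for a hybrid Kolmogorov-Arnold convolution block:
# conv -> (SiLU + learnable per-channel grid nonlinearity).
import torch
import torch.nn as nn

class GridActivation(nn.Module):
    """y = silu(x) + sum_k c_k(channel) * exp(-((x - g_k)/h)^2) over a fixed grid."""
    def __init__(self, channels, grid_points=8, grid_range=3.0):
        super().__init__()
        grid = torch.linspace(-grid_range, grid_range, grid_points)
        self.register_buffer("grid", grid)
        self.h = 2.0 * grid_range / (grid_points - 1)
        self.coeff = nn.Parameter(torch.zeros(channels, grid_points))

    def forward(self, x):                       # x: (B, C, H, W)
        basis = torch.exp(-((x.unsqueeze(-1) - self.grid) / self.h) ** 2)
        spline = torch.einsum("bchwk,ck->bchw", basis, self.coeff)
        return torch.nn.functional.silu(x) + spline

class KAConvBlock(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
        self.act = GridActivation(c_out)

    def forward(self, x):
        return self.act(self.conv(x))

block = KAConvBlock(2, 16)                      # e.g. I/Q channels in, 16 features out
print(block(torch.randn(4, 2, 32, 32)).shape)   # torch.Size([4, 16, 32, 32])
```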
Multi-dimensional Performance Adaptive Content Caching in Mobile Networks Based on Meta Reinforcement Learning
LIN Peng, WANG Jun, LIU Yan, ZHANG Zhizhong
2025, 47(8): 2598-2607. doi: 10.11999/JEIT250100
Abstract:
  Objective  Content caching enhances the efficiency of video services in mobile networks. However, most existing studies optimize caching strategies for a single performance objective, overlooking their combined effect on key metrics such as content delivery latency, cache hit rate, and redundancy rate. An effective caching strategy must simultaneously satisfy multiple performance requirements and adapt to their dynamic changes over time. This study addresses these limitations by investigating the joint optimization of content delivery latency, cache hit rate, and redundancy rate. To capture the interdependencies and temporal variations among these metrics, a meta-reinforcement learning-based caching decision algorithm is proposed. Built on conventional reinforcement learning frameworks, the proposed method enables adaptive optimization across multiple performance dimensions, supporting a dynamic and balanced content caching strategy.  Methods  To address the multi-dimensional objectives of content caching, namely, content delivery latency, cache hit rate, and redundancy rate, this study proposes a performance-aware adaptive caching strategy. Given the uncertainty and temporal variability of interrelationships among performance metrics in real-world environments, dynamic correlation parameters are introduced to simulate the evolving behavior of these metrics. The caching problem is formulated as a dynamic joint optimization task involving delivery latency efficiency, cache hit rate, and a cache redundancy index. This problem is further modeled as a Markov Decision Process (MDP), where the state comprises the content popularity distribution and the caching state from the previous time slot; the action represents the caching decision at the current time slot. The reward function is defined as a cumulative metric that integrates dynamic correlation parameters across latency, hit rate, and redundancy. To solve the MDP, a Model-Agnostic Meta-Reinforcement Learning Algorithm (MAML-DDPG) is proposed. This algorithm reformulates the joint optimization task as a multi-task reinforcement learning problem, enabling adaptation to dynamically changing optimization targets and improving decision-making efficiency.  Results and Discussions  This study compares the performance of MAML-DDPG with baseline algorithms under a gradually changing Zipf parameter (0.5 to 1.5). Results show that MAML-DDPG maintains more stable system performance throughout the change, indicating superior adaptability. The algorithm’s response to abrupt shifts in optimization objectives is further evaluated by modifying weight parameters during training. Specifically, the experiments include comparisons among DDPG, DDPG|100, MAML-DDPG|100, and MAML-DDPG|150, where DDPG|100 denotes a change in weight parameters at the 100th training cycle to simulate task mutation. Results show that the DDPG model exhibits a sharp drop in convergence value following the change and stabilizes at a lower performance level.
In contrast, MAML-DDPG, although initially affected by the shift, recovers rapidly due to its meta-learning capability and ultimately converges to a higher-performing caching strategy.  Conclusions  This study addresses the content caching problem in mobile edge networks by formulating it as a joint optimization task involving cache hit rate, cache redundancy index, and delivery latency efficiency. To handle the dynamic uncertainty associated with these performance metrics, a MAML-DDPG is proposed. The algorithm enables rapid adaptation to changing optimization targets, improving decision-making efficiency. Simulation results confirm that MAML-DDPG effectively adapts to dynamic performance objectives and outperforms existing methods across multiple caching metrics. The findings demonstrate the algorithm’s capability to meet evolving performance requirements while maintaining strong overall performance.
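A first-order meta-learning sketch (Reptile-style rather than full MAML, and a plain supervised placeholder loss rather than DDPG) illustrating how a caching policy can be pre-trained to adapt quickly when the objective weighting changes; everything here is a simplified assumption.

```python
# Reptile-style meta-training: sample a task (an objective mix), adapt a copy of the
# policy for a few inner steps, then move the meta-parameters toward the adapted ones.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
meta_lr, inner_lr, inner_steps = 0.05, 0.01, 5

def task_loss(model, task_weights):
    # Placeholder for the task-specific caching objective; in the real system this
    # would be the (negative) weighted reward estimated from sampled transitions.
    states = torch.randn(32, 8)
    return (model(states) * task_weights).pow(2).mean()

for meta_iter in range(100):
    task_weights = torch.rand(4)                      # a randomly drawn objective mix
    fast = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
    fast.load_state_dict(policy.state_dict())
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                      # inner-loop adaptation to the task
        opt.zero_grad(); task_loss(fast, task_weights).backward(); opt.step()
    with torch.no_grad():                             # outer update toward adapted weights
        for p, q in zip(policy.parameters(), fast.parameters()):
            p += meta_lr * (q - p)
```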
RIS-Enhanced Semantic Communication Systems Oriented towards Semantic Importance and Robustness
ZHANG Zufan, YIN Xingran, ZHOU Jianping, LIU Yue
2025, 47(8): 2608-2620. doi: 10.11999/JEIT250159
Abstract:
  Objective  The deep integration of Deep Learning (DL) and Semantic Communication (SC) has become a key trend in next-generation communication systems. Current SC systems primarily adopt DL-based Joint Source-Channel Coding (JSCC) with end-to-end training to enable efficient semantic transmission. However, several limitations remain. Existing systems often optimize physical-layer channel characteristics or semantic-layer feature extraction in isolation, without establishing cross-layer mapping mechanisms. In addition, protection strategies for critical semantic features in fading channel environments are insufficient, limiting semantic recovery performance. To address these challenges, this study integrates Reconfigurable Intelligent Surfaces (RIS) into SC systems and proposes an intelligent transmission scheme based on dual-dimensional semantic feature metrics. The proposed approach effectively enhances semantic recovery capability under adverse channel conditions. This work provides a new intelligent solution for protecting semantic features in fading channels and establishes theoretical support for collaborative mechanisms between physical and semantic layers in SC systems.  Methods  This study develops a joint semantic importance-robustness metric model. Semantic importance is quantified using Bidirectional Encoder Representations from Transformers (BERT) combined with cosine similarity, while semantic robustness is assessed by measuring the loss increments of high-dimensional feature vectors during transmission. A dynamically updated background knowledge base is constructed to support a priority evaluation framework for semantic features (Fig. 2). During transmission, the system partitions the original text into high- and low-priority data streams based on feature priority. High-priority streams are transmitted through RIS-assisted channels, whereas low-priority streams are transmitted over conventional fading channels. At the physical layer, an alternating optimization algorithm jointly designs active precoding beamforming vectors and RIS passive phase matrices. At the receiver, semantic reconstruction is performed under the guidance of feature priority index lists (Fig. 1).  Results and Discussions  The proposed SISR-RIS system effectively reduces the distortion effects of channel fading on critical semantic features by establishing cross-layer mapping between semantic features and physical channels. Simulation results show that, in medium-to-low Signal-to-Noise Ratio (SNR) environments, the SISR-RIS system maintains high low-order BLEU scores and approaches the theoretical performance boundary near the 10 dB SNR threshold, achieving approximately 95% recovery accuracy for BLEU-1 and 92% for BLEU-2 (Fig.3(a)). As the n-gram order increases, the system outperforms the baseline Deep-SC system by approximately 10% in BLEU-4, confirming its improved capability for contextual semantic reconstruction (Fig.3(b)). Owing to the dual-dimensional metric mechanism, the system demonstrates stable performance with less than 1% variance in recovery accuracy across short and long sentences (Fig. 4). Case analysis indicates that when the original statements cannot be fully restored, the system maintains semantic equivalence through appropriate synonym substitutions. Additionally, core verbs and nouns are consistently assigned higher feature priority scores, which reduces the effect of channel fading on critical semantic features (Tables 2 and 3; Figs. 5 and 6).  
Conclusions  This study proposes a RIS-enhanced SC system designed to account for semantic importance and robustness. By extracting semantic importance and robustness features to prioritise transmission and implementing a joint physical-semantic layer design enabled by RIS, the system provides enhanced protection for high-importance, low-robustness semantic features. Evaluations based on BLEU scores, BERT Semantic Similarity (BERT-SS) metrics, and case analyses demonstrate the following: (1) The proposed system achieves a 15% performance improvement over baseline systems in low SNR environments, with performance approaching theoretical limits near the 10 dB SNR threshold; (2) In high-SNR conditions, the system performs comparably to state-of-the-art methods across both BLEU and BERT-SS metrics; (3) The dual-dimensional semantic feature metric mechanism enhances contextual semantic relevance, reduces the recovery discrepancy between long and short sentences to below 1% in high-SNR scenarios, and demonstrates strong adaptability to varying text lengths.
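The semantic-importance half of the dual metric can be sketched as follows: each word's score is the drop in cosine similarity between the full-sentence BERT embedding and the embedding with that word removed. The model name and mean pooling are assumptions for illustration, not the paper's configuration.

```python
# Leave-one-word-out importance scoring with a pretrained BERT encoder.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(text):
    with torch.no_grad():
        out = bert(**tok(text, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1).squeeze(0)   # mean-pooled sentence vector

def importance_scores(words):
    full = embed(" ".join(words))
    scores = []
    for i in range(len(words)):
        reduced = embed(" ".join(words[:i] + words[i + 1:]))
        scores.append(1.0 - torch.cosine_similarity(full, reduced, dim=0).item())
    return scores

words = "the base station reflects the signal".split()
for w, s in sorted(zip(words, importance_scores(words)), key=lambda t: -t[1]):
    print(f"{w:10s} {s:.4f}")
```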
Double Deep Q Network Algorithm-based Unmanned Aerial Vehicle-assisted Dense Network Resource Optimization Strategy
CHEN Jiamei, SUN Huiwen, LI Yufeng, WANG Yupeng, BIE Yuxia
2025, 47(8): 2621-2629. doi: 10.11999/JEIT250021
Abstract:
  Objective  To address the future trend of network densification and spatial distribution, this study proposes a multi-base station air–ground integrated ultra-dense network architecture and develops a semi-distributed scheme for resource optimization. The network comprises coexisting macro, micro, and Unmanned Aerial Vehicle (UAV) base stations. A semi-distributed Double Deep Q Network (DDQN)-based power control scheme is designed to reduce computational burden, improve response speed, and overcome the lack of global optimization in conventional fully centralized approaches. The proposed scheme enhances energy efficiency by combining distributed decision-making at the base station level with centralized training via a network trainer, enabling a balance between computational complexity and performance. The DDQN algorithm facilitates local decision-making while centralized coordination ensures overall network optimization.  Methods  This study establishes a complex dense network model for air–ground integration with coexisting macro, micro, and UAV base stations, and proposes a semi-distributed DDQN scheme to improve network energy efficiency. The methods are as follows: (1) Construct an integrated air–ground dense network model in which macro, micro, and UAV base stations share the spectrum through a cooperative mechanism, thereby overcoming the performance bottlenecks of conventional heterogeneous networks. (2) Develop an improved semi-distributed DDQN algorithm that enhances Q-value estimation accuracy, addressing the limitations of traditional centralized and distributed control modes and mitigating Q-value overestimation observed in conventional Deep Q Network (DQN) approaches. (3) Introduce a disturbance factor to increase the probability of exploring random actions, strengthen the algorithm’s ability to escape local optima, and improve estimation accuracy.  Results and Discussions  Simulation results demonstrate that the proposed semi-distributed DDQN scheme effectively adapts to dense and complex network topologies, yielding marked improvements in both energy efficiency and total throughput relative to traditional DQN and Q-learning algorithms. Key results include the following: The total throughput achieved by DDQN exceeds that of the baseline DQN and Q-learning algorithms (Fig. 3). In terms of energy efficiency, DDQN exhibits a clear advantage, converging to 84.60%, which is 15.18% higher than DQN (69.42%) and 17.1% higher than Q-learning (67.50%) (Fig. 4). The loss value of DDQN also decreases more rapidly and stabilizes at a lower level. With increasing iterations, the loss curve becomes smoother and ultimately converges to 100, which is 100 lower than that of DQN (Fig. 5). Moreover, DDQN achieves the highest user access success rate compared with DQN and Q-learning (Fig. 6). When the access success rate reaches 80%, DDQN requires significantly fewer iterations than the other two algorithms. This advantage becomes more pronounced under high user density. For example, when the number of users reaches 800, DDQN requires fewer iterations than both DQN and Q-learning to achieve comparable performance (Fig. 7).  Conclusions  This study proposes a semi-distributed DDQN strategy for intelligent control of base station transmission power in ultra-dense air–ground networks. Unlike traditional methods that target energy efficiency at the individual base station level, the proposed strategy focuses on optimizing the overall energy efficiency of the network system. 
By dynamically adjusting the transmission power of macro, micro, and airborne base stations through intelligent learning, the scheme achieves system-level coordination and adaptation. Simulation results confirm the superior adaptability and performance of the proposed DDQN scheme under complex and dynamic network conditions. Compared with conventional DQN and Q-learning approaches, DDQN exhibits greater flexibility and effectiveness in resource control, achieving higher energy efficiency and sustained improvements in total throughput. These findings offer a new approach for the design and management of integrated air–ground networks and provide a technical basis for the development of future large-scale dense network architectures.
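The core DDQN update that mitigates Q-value overestimation can be sketched in a few lines: the online network selects the next action and the target network evaluates it. Dimensions and network definitions below are illustrative only.

```python
# Double-DQN targets: action selection by the online network, evaluation by the
# target network, which reduces the overestimation bias of plain DQN.
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # selection
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # evaluation
        return rewards + gamma * (1.0 - dones) * next_q

# Toy usage with random linear "networks" over 10 power levels (illustrative sizes).
online = torch.nn.Linear(6, 10)
target = torch.nn.Linear(6, 10)
y = double_dqn_targets(online, target, torch.zeros(4), torch.randn(4, 6), torch.zeros(4))
print(y.shape)   # torch.Size([4])
```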
Joint Resource Optimization Algorithm for Intelligent Reflective Surface Assisted Wireless Soft Video Transmission
WU Junjie, LUO Lei, ZHU Ce, JIANG Pei
2025, 47(8): 2630-2641. doi: 10.11999/JEIT250019
Abstract:
  Objective  Intelligent Reflecting Surface (IRS) technology is a key enabler for next-generation mobile communication systems, addressing the growing demands for massive device connectivity and increasing data traffic. Video data accounts for over 80% of global mobile traffic, and this proportion continues to rise. Although video SoftCast offers a simpler structure and more graceful degradation compared to conventional separate source-channel coding schemes, its transmission efficiency is restricted by the limited availability of wireless transmission resources. Moreover, existing SoftCast frameworks are not inherently compatible with IRS-assisted wireless channels. To address these limitations, this paper proposes an IRS-assisted wireless soft video transmission scheme.  Methods  Video soft transmission distortion is jointly determined by three critical wireless resources: transmit power, active beamforming at the primary transmitter, and passive beamforming at the IRS. Minimizing video soft transmission distortion is therefore formulated as a joint optimization problem over these resources. To solve this multivariable problem, an Alternating Optimization (AO) framework is employed to decouple the original problem into single-variable subproblems. For the fractional nonhomogeneous quadratic optimization and unit-modulus constraints arising in this process, the Semi-Definite Relaxation (SDR) method is applied to obtain near-optimal solutions for both active and passive beamforming vectors. Based on the derived beamforming vectors, the optimal power allocation factor for soft transmission is then computed using the Lagrange multiplier method.  Results and Discussions  Simulation results indicate that the proposed method yields an improvement of at least 1.82 dB in Peak Signal-to-Noise Ratio (PSNR) compared to existing video soft transmission approaches (Fig. 3). In addition, evaluation across extensive HEVC test sequences shows that the proposed method achieves an average received quality gain of no less than 1.51 dB (Table 1). Further simulations reveal that when the secondary link channel quality falls below a critical threshold, it no longer contributes to improving the received video quality (Fig. 5). Rapid variations in the secondary signal c degrade the reception quality of the primary signal, with a reduction of approximately 0.52 dB observed (Fig. 6). Increasing the number of IRS elements significantly enhances both video reception quality and achievable rates for the primary and secondary links (Fig. 7); however, this improvement comes with a power-law scaling increase in computational complexity. Additional simulations confirm that the proposed method maintains per-frame quality fluctuations within an acceptable range across each Group Of Pictures (GOP) (Fig. 8). As GOP size increases, temporal redundancy within the source is more effectively removed, leading to further improvements in received quality, although this is accompanied by higher computational complexity (Fig. 9).  Conclusions  This paper proposes an IRS-assisted soft video transmission scheme that leverages IRS-aided secondary links to improve received video quality. To minimize video signal distortion, a multivariable optimization problem is formulated for joint resource allocation. An AO framework is adopted to decouple the problem into single-variable subproblems, which are solved iteratively.
Simulation results show that the proposed method achieves significant improvements in both objective and subjective visual quality compared to existing video transmission algorithms. In addition, the effects of secondary link channel gain, secondary signal characteristics, the number of IRS elements, and GOP parameters on transmission performance are systematically examined. This study demonstrates, for the first time, the performance enhancement of video soft transmission using IRS and provides a technical basis for the development of video soft transmission in IRS-assisted communication environments.
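The final Lagrange-multiplier step has a well-known closed form in SoftCast-style transmission: each chunk with variance λ_i is scaled by a factor proportional to λ_i^(-1/4), with the constant set by the power budget. The sketch below shows that allocation under generic notation; it is not the paper's exact derivation.

```python
# SoftCast-style power allocation: g_i ∝ lambda_i^(-1/4), with the constant chosen
# so that the total transmit power sum_i g_i^2 * lambda_i equals the budget P.
import numpy as np

def softcast_power_allocation(chunk_vars, total_power):
    lam = np.asarray(chunk_vars, dtype=float)
    g = lam ** (-0.25)                                  # shape of the optimal solution
    scale = np.sqrt(total_power / np.sum(g**2 * lam))   # enforce the power budget
    return scale * g

g = softcast_power_allocation([40.0, 10.0, 2.5, 0.6], total_power=1.0)
print(g, np.sum(g**2 * np.array([40.0, 10.0, 2.5, 0.6])))   # second value ≈ 1.0
```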
Energy Characteristic Map Based Resource Allocation Algorithm for High-density V2V Communications
QIU Gongan, LIU Yongsheng, ZHANG Guoan, LIU Min
2025, 47(8): 2642-2651. doi: 10.11999/JEIT250004
Abstract:
  Objective  In high-density scenarios, the random resource selection method has limitations in handling the high access collision probability of traffic safety messages under limited frequency resources. At the same time, the variable topology caused by high mobility increases the failure rate of Vehicle to Vehicle (V2V) links. However, traffic safety messages with ultra-high reliability and ultra-low latency are essential for ensuring traffic safety and road efficiency in these scenarios. To address these challenges, integrating the energy characteristic parameters of sub-frames and sub-carriers into the resource block map has emerged as a promising approach. By incorporating distributed V2V links and designing effective reward functions, it is possible to decrease the access collision probability and smooth the dynamics of the variable topology while maintaining high resource efficiency, thereby better meeting the needs of dense traffic. This research offers an intelligent solution for resource allocation in Cellular Vehicle to Everything (C-V2X) and provides theoretical support for coordinated access to limited frequency resources with diverse link quality.  Methods  Based on the sustained adjacency among neighboring vehicles in high-density V2V communications, an Energy Characteristic Map (ECM)-based resource allocation algorithm is proposed using deep reinforcement learning. The ECM algorithm periodically renews the energy indicators of candidate resources to train the weight coefficient matrix of a two-layer Deep Neural Network (DNN) based on the characteristic results within the sensing window. The resulting map then serves as the action space of a double Deep Q-learning Network (DQN) agent, comprising a main DQN and a target DQN, whose objective is to maximize V2V throughput. The state space of the DQN model includes the energy indicators of candidate resources, such as the Received Signal Strength Indicator (RSSI) in sub-frames and the Signal-to-Interference plus Noise Ratio (SINR) in sub-carriers, along with dynamic factors such as the relative position and speed of other vehicles. The reward function is crucial for ensuring resource efficiency and safety-message performance during resource block selection; it accounts for factors such as the bandwidth and SINR of V2V links to optimize decision-making. Additionally, the discount factor determines the weight of future rewards, balancing the importance of immediate versus future rewards. A lower discount factor emphasizes immediate rewards, leading to frequent resource block reselection, whereas a higher discount factor enhances the robustness of the occupied resources.  Results and Discussions  The ECM algorithm periodically renews the energy indicators of candidate resources based on the characteristic results within the sensing window, which then serve as the action space of the double DQN agent. With an appropriately defined reward function, the main DQN is trained to select candidate resources with high energy indicators for V2V links. The numerical relationship (Eq. (11) and Eq. (15)) between the packet received ratio and the energy indicators is analyzed using discrete-time Markov chains. The end-to-end dissemination performance of safety messages under variable V2V distances, simulated on WiLabV2Xsim, is presented (Fig. 6, Fig. 7).
The reliability (PRR) of the proposed scheme exceeds 0.95 at densities below 160 veh/km (the blue line), whereas the comparative PRRs exceed 0.95 only below 120 veh/km (the green line) and 90 veh/km (the red line), respectively (Fig. 10). At the same time, the latency (TD) remains below 3 ms at densities below 180 veh/km (the blue line), whereas the comparative TDs remain below 3 ms only below 160 veh/km (the green line) and about 80 veh/km (the red line), respectively (Fig. 11). The resource utilization (RU) exceeds 0.6 at densities below 180 veh/km (the blue line), whereas the comparative RUs exceed 0.6 only below 160 veh/km (the green line) and about 80 veh/km (the red line), respectively (Fig. 12), demonstrating a 10–20% improvement in resource efficiency. When the discount factor is set to 0.9 and the learning rate to 0.01 (Fig. 8, Fig. 9), the VUE selects resource blocks that balance immediate and long-term throughput, effectively improving the robustness of the main DQN and meeting advanced V2V service requirements such as platooning in C-V2X.  Conclusions  This paper addresses the challenge of resource allocation in high-density V2V communications by integrating the ECM algorithm with a double DQN agent. The proposed resource selection scheme enhances the RSS algorithm by establishing distributed V2V links over high-quality resource blocks to maximize throughput. The scheme is evaluated through safety-message dissemination simulations under variable density, and the results show that: (1) The proposed scheme achieves high reliability, with a PRR above 0.95, and ultra-low latency, with a TD below 3 ms, at densities up to 160 veh/km; (2) Resource efficiency is improved by 10–20% over the RSS method; (3) Selecting a discount factor of 0.9 and a learning rate of 0.01 balances long-term and short-term rewards and enhances the robustness of the DQN model. However, this study has not considered differentiated resource characteristics for heterogeneous messages with diverse Quality of Service (QoS) requirements, which should be addressed in future work.
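A toy sketch of how an energy characteristic map over (sub-frame, sub-carrier) resource blocks might be formed from sensed RSSI and measured SINR and then used for selection; the linear weighting and exploration rule are hypothetical stand-ins for the learned DQN policy.

```python
# Build an energy-indicator map per resource block and pick the best block,
# with a small exploration probability.
import numpy as np

rng = np.random.default_rng(1)

def energy_characteristic_map(rssi_dbm, sinr_db, w_sinr=1.0, w_rssi=0.5):
    # High SINR is good; high sensed RSSI suggests the block is already occupied.
    return w_sinr * sinr_db - w_rssi * (rssi_dbm + 90.0)

def select_resource_block(ecm, epsilon=0.05):
    if rng.random() < epsilon:                       # occasional random exploration
        return tuple(int(rng.integers(0, s)) for s in ecm.shape)
    return np.unravel_index(np.argmax(ecm), ecm.shape)

rssi = rng.uniform(-95, -60, size=(10, 4))           # 10 sub-frames x 4 sub-carriers
sinr = rng.uniform(-5, 25, size=(10, 4))
ecm = energy_characteristic_map(rssi, sinr)
print("selected (sub-frame, sub-carrier):", select_resource_block(ecm))
```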
Federated Deep Reinforcement Learning-based Intelligent Routing Design for LEO Satellite Networks
LI Xuehua, LIAO Hailong, ZHANG Xian, ZHOU Jiaen
2025, 47(8): 2652-2664. doi: 10.11999/JEIT250072
Abstract:
  Objective  The topology of Low Earth Orbit (LEO) satellite communication networks is highly dynamic, rendering traditional terrestrial routing methods unsuitable for direct application. Additionally, due to the limited onboard resources of satellites, Artificial Intelligence (AI)-based routing methods often experience low learning efficiency. Collaborative training requires data sharing and transmission, which poses significant challenges and data security risks. To address these issues, this research introduces Federated Deep Reinforcement Learning (FDRL) into LEO satellite communication networks. By leveraging FDRL’s capabilities in distributed perception, decision-making, and training, it facilitates the efficient learning of global routing strategies. Through local model aggregation and global model sharing among satellite nodes, FDRL dynamically adapts to topology changes while ensuring data privacy, thereby generating optimal routing decisions and enhancing the overall routing performance of LEO satellite networks. Furthermore, integrating Federated Learning (FL) into the LEO satellite network enables autonomous constellation training within regions, eliminating the need to transmit raw data to Ground Stations (GS), thus reducing reliance on GS and minimizing communication overhead during collaborative training.  Methods  A novel FDRL-based intelligent routing method for LEO satellite communication networks is proposed. This method develops a routing model that integrates network, communication, and computational energy consumption, with the optimization objective focused on maximizing the energy efficiency of the LEO satellite network. Utilizing a satellite clustering algorithm, the entire LEO satellite network is partitioned into multiple clusters. Within each cluster, the FDRL framework is implemented, where each LEO satellite uses the Advantage Actor-Critic (A2C) algorithm for local reinforcement learning. The policy network generates efficient routing actions, while the value network dynamically evaluates state values to reduce variance in policy updates. After a specified number of training rounds, the Federated Proximal Algorithm (FedProx) is applied at the cluster head satellite to conduct federated aggregation within the cluster. By collaboratively sharing model parameters among satellites, a global model is jointly trained, enhancing the generalization capability to optimize the network's energy efficiency.  Results and Discussions  To validate the effectiveness of the proposed method, the LEO satellite constellation is first clustered using the suggested clustering algorithm. The number of Cluster Member (CM) nodes within each cluster ranges from 6 to 8 (Fig. 5), with the variation in the CM node count not exceeding 5, indicating relatively stable clustering. FDRL training is then conducted within each cluster. Simulation results show that when the aggregation frequency is set to 400 (i.e., aggregation occurs every 400 time slots), training energy consumption is minimized (Fig. 6), and the reward is most stable (Fig. 7) compared to other aggregation frequencies. Next, the performance of the designed FL-A2C algorithm is compared to other baseline algorithms. The results demonstrate that the FL-A2C algorithm exhibits better convergence and higher total reward values than the benchmarks, namely Sarsa, MAD2QN, and REINFORCE (Fig. 8), although its total reward is slightly lower than that of A2C. 
Compared to Sarsa, REINFORCE, and MAD2QN, the designed method improves average network throughput by 83.7%, 19.8%, and 14.1%, respectively (Fig. 9); reduces average hop count by 25.0%, 18.9%, and 9.1%, respectively (Fig. 10); and enhances energy efficiency by 55.6%, 42.9%, and 45.8%, respectively (Fig. 11).  Conclusions  To address the challenges posed by the highly dynamic network topology of LEO satellite networks and the limitations of traditional terrestrial routing methods, this research presents a multi-agent FDRL routing method combined with satellite clustering. Comprehensive simulations are conducted to evaluate the intelligent routing method, and the results demonstrate that: (1) The designed FL-A2C algorithm achieves better convergence and enhances the energy efficiency of LEO satellite networks; (2) The stability of LEO satellite clustering is ensured by the proposed scheme; (3) The intelligent routing method outperforms benchmark schemes (Sarsa, REINFORCE, MAD2QN) with triple advantages, achieving 83.7%/19.8%/14.1% higher network throughput, 25.0%/18.9%/9.1% lower hop counts, and 55.6%/42.9%/45.8% better energy efficiency, respectively.
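The FedProx aggregation used at the cluster head can be sketched as a proximal term added to each member's local loss plus parameter averaging; the placeholder loss, μ value, and model sizes below are illustrative assumptions.

```python
# FedProx local objective (base loss + (mu/2)*||w - w_global||^2) and a simple
# cluster-head aggregation by parameter averaging across member satellites.
import torch
import torch.nn as nn

def fedprox_loss(local_model, global_params, base_loss, mu=0.01):
    prox = sum((p - g.detach()).pow(2).sum()
               for p, g in zip(local_model.parameters(), global_params))
    return base_loss + 0.5 * mu * prox

def aggregate(global_model, member_models):
    # Cluster-head step: average each parameter over the member models.
    with torch.no_grad():
        for name, p in global_model.named_parameters():
            p.copy_(torch.stack([dict(m.named_parameters())[name]
                                 for m in member_models]).mean(0))

net = nn.Linear(16, 4)
members = [nn.Linear(16, 4) for _ in range(3)]
loss = fedprox_loss(members[0], list(net.parameters()), base_loss=torch.tensor(0.0))
aggregate(net, members)
print(float(loss))
```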
Swin Transformer-based Wideband Wireless Image Transmission Semantic Joint Encoding and Decoding Method
SHEN Bin, LI Xuan, LAI Xuebing, YANG Shuhan
2025, 47(8): 2665-2674. doi: 10.11999/JEIT250039
Abstract:
  Objective  Conventional studies on image semantic communication primarily address simplified channel models, such as Gaussian and Rayleigh fading channels. However, real-world wireless communication environments are characterized by complex multipath fading, which necessitates advanced signal processing at both the transmitter and receiver. To address this challenge, this paper proposes a Wideband Wireless Image Transmission Semantic Communication (WWIT-SC) system based on the Swin Transformer. The proposed method enhances image transmission performance in multipath fading channels through end-to-end semantic joint encoding and decoding.  Methods  The WWIT-SC system adopts the Swin Transformer as the core architecture for semantic encoding and decoding. This network not only processes semantic image representations but also improves adaptability to complex channel conditions through a joint mechanism based on Channel State Information (CSI) and Coordinate Attention (CA). CSI, a key signal in wireless systems, enables accurate estimation of channel conditions. However, due to temporal variations in wireless channels, CSI is often subject to attenuation and distortion, reducing its effectiveness when used in isolation. To address this limitation, the system incorporates a CSI-guided CA mechanism that enables fine-grained mapping and adjustment of semantic features across subcarriers. This mechanism integrates spatial and channel-domain features to localize critical information adaptively, thereby accommodating the channel’s time-varying behavior. A Channel Estimation Subnetwork (CES) is further implemented at the receiver to correct CSI estimation errors introduced by noise and dynamic channel variations. The CES enhances CSI accuracy during decoding, resulting in improved semantic image reconstruction quality.  Results and Discussions   The WWIT-SC and CA-JSCC models are trained under fixed Signal-to-Noise Ratio (SNR) conditions and evaluated at the same SNR values. Across all SNR levels, the WWIT-SC model consistently outperforms CA-JSCC. Specifically, Peak Signal-to-Noise Ratio (PSNR) improves by 6.4%, 8.5%, and 9.3% at different bandwidth ratios (R=1/12, 1/6, 1/3)(Fig.4). Both models are also trained using SNR values randomly selected from the range [0, 15] dB and tested at various SNR levels. Although random SNR training leads to reduced overall performance compared to fixed SNR training, WWIT-SC maintains superior performance over CA-JSCC across all conditions. Under these settings, PSNR gains of up to 6.8%, 8.3%, and 9.8% are achieved at different bandwidth ratios (R=1/12, 1/6, 1/3)(Fig. 4). Further evaluation is conducted by training both models on randomly cropped ImageNet images and testing them on the Kodak dataset. The WWIT-SC model trained on the larger dataset achieves up to a 4% PSNR improvement over CA-JSCC on Kodak (Fig. 6). A series of ablation experiments are conducted to assess the contributions of each module in WWIT-SC. First, the Swin Transformer is replaced with the Feature Learning (FL) module from CA-JSCC. Across all three bandwidth ratios, PSNR values for WWIT-SC exceed those of the modified WWIT-SC-FL variant at all SNR levels (Fig. 5(a)), confirming the importance of multi-scale feature extraction. Next, the CSI-CA module is replaced with the Channel Learning (CL) module from CA-JSCC. Again, WWIT-SC outperforms the modified WWIT-SC-CL model across all bandwidth ratios and SNR values (Fig. 
5(b)), highlighting the role of the long-range dependency mechanism in enhancing feature localization and adaptation. Finally, the CES is removed to assess its contribution. The original WWIT-SC model consistently achieves higher PSNR values than the variant without CES at all bandwidth ratios and SNR levels (Fig. 5(c)), demonstrating that the inclusion of CES substantially improves channel decoding accuracy.  Conclusions  This paper proposes a Swin Transformer-based WWIT-SC system, integrating Orthogonal Frequency Division Multiplexing (OFDM) technology to enhance semantic image transmission under multipath fading channels. The scheme employs the Swin Transformer as the backbone for the semantic encoder-decoder and incorporates a CSI-assisted CA mechanism to accurately map critical semantic features to subcarriers, adapting to time-varying channel conditions. In addition, a CES at the receiver compensates for channel estimation errors, improving CSI accuracy. Experimental results show that, compared to CA-JSCC, the WWIT-SC system achieves up to a 9.8% PSNR improvement. This work presents a novel solution for semantic image transmission in complex broadband wireless communication environments.
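A simplified sketch of CSI-guided feature weighting: per-subcarrier CSI magnitudes gate the semantic feature map before subcarrier mapping. This squeeze-excitation-style gate is a stand-in for the coordinate-attention module described above; shapes and names are illustrative.

```python
# CSI-conditioned gating of semantic features (a simplified stand-in for CSI-guided
# coordinate attention): pooled feature statistics are concatenated with CSI
# magnitudes and mapped to per-channel weights.
import torch
import torch.nn as nn

class CSIGate(nn.Module):
    def __init__(self, channels, n_subcarriers):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels + n_subcarriers, channels // 2), nn.ReLU(),
            nn.Linear(channels // 2, channels), nn.Sigmoid())

    def forward(self, feats, csi_mag):               # feats: (B,C,H,W), csi_mag: (B,S)
        pooled = feats.mean(dim=(2, 3))              # channel descriptor, (B,C)
        gate = self.fc(torch.cat([pooled, csi_mag], dim=1))
        return feats * gate.unsqueeze(-1).unsqueeze(-1)

gate = CSIGate(channels=64, n_subcarriers=16)
out = gate(torch.randn(2, 64, 8, 8), torch.rand(2, 16))
print(out.shape)    # torch.Size([2, 64, 8, 8])
```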
Radar, Navigation and Array Signal Processing
Design of a Very Low Frequency Magnetic Induction Communication System Based on a Series-Array Magnetoelectric Antenna
ZHANG Feng, LI Jiaran, TIAN Yuxiao, XU Ziyang, GONG Zhaoqian, ZHUANG Xin
2025, 47(8): 2675-2684. doi: 10.11999/JEIT250065
Abstract:
  Objective  MagnetoElectric (ME) antennas, recognized for their high energy conversion efficiency and compact structure, have gained attention in portable cross-medium communication systems. In the Very Low Frequency (VLF) range, conventional antennas are typically large and difficult to deploy, whereas mechanical antennas—though smaller—exhibit limited radiation intensity, constraining communication range. To address these limitations, this study proposes a portable VLF magnetic induction communication system based on a series-array ME antenna. By connecting seven ME antenna units in series, the radiated field strength is substantially increased. Through the combination of strong ME coupling and an optimized system design, this work offers a practical solution for compact low-frequency communication.  Methods  The radiated magnetic flux density of the antenna is evaluated using a small air-core coil (diameter: 50 mm; length: 120 mm) with a gain-500 preamplifier as the receiving antenna. The conversion coefficient Tr of the receiving antenna is calibrated using a standard Helmholtz coil, enabling conversion of the measured voltage to magnetic flux density. The ME antenna is driven by a signal generator and power amplifier, and the magnetic field strength is measured at a distance of 1.2 m under different drive voltages. To balance hardware simplicity and efficient bandwidth usage, Binary Amplitude Shift Keying (BASK) modulation is employed. On the transmitter side, a computer transmits the bitstream to a Field-Programmable Gate Array (FPGA), which generates the baseband signal and multiplies it by a 27.2 kHz carrier to produce the modulated signal. Following power amplification, the signal directly drives the ME antenna. On the receiver side, the air-core coil receives the transmitted signal, which is subsequently amplified by the preamplifier. A National Instruments (NI) data acquisition module digitizes the signal. Demodulation, including filtering, coherent detection, and symbol decision, is performed on a computer. For laboratory safety and signal stability, the Root Mean Square (RMS) drive voltage is set to 14.8 V, and the symbol rate is fixed at 50 bps. Communication experiments are conducted over distances from 1.2 m to 11.4 m.  Results and Discussions  (1) Antenna radiation intensity. When the RMS drive voltage of the series-array ME antenna is 180.5 V (25.8 V per unit), the measured magnetic field strength reaches 93.6 nT at 1.2 m and 165 nT at 1.0 m. These values indicate strong performance among acoustically driven ME antennas. The results demonstrate that the combination of ME materials with a seven-element series configuration substantially enhances both ME coupling and radiated field strength. (2) System communication performance. The BASK system operates at 50 bps, matching the measured 111 Hz bandwidth of the ME antenna. The receiving antenna exhibits a bandwidth of 851 Hz at 27.6 kHz, which fully covers the transmitted signal. Due to laboratory space constraints, 128-bit random data are transmitted over distances ranging from 1.2 m to 11.4 m. Even at 11.4 m—where the received signal amplitude falls below 0.004 V—the proposed demodulation scheme successfully recovers the transmitted data. To verify these results, a theoretical model of magnetic field attenuation with distance is fitted to the experimental data, showing strong agreement except for minor deviations attributed to environmental noise. 
Noise spectrum analysis within a 100 Hz bandwidth centered at 27.2 kHz indicates a maximum environmental noise level of approximately 4.41 pT, resulting in a Signal-to-Noise Ratio (SNR) of 12.65 dB at 11.4 m. Based on the theoretical relationship between SNR and Bit Error Rate (BER) for coherent ASK, the maximum BER under these conditions is approximately 0.12%, consistent with the measured performance.  Conclusions  This study presents a VLF magnetic induction communication system based on a series-array ME antenna, with the ME antenna serving as the transmitter and an air-core coil as the receiver. A standard Helmholtz coil circuit is used to calibrate the conversion coefficient between received voltage and magnetic flux density. The radiated magnetic field strength is characterized by varying the ME antenna’s drive voltage. Notably, at an RMS drive voltage of 180.5 V, the ME antenna generates a magnetic induction of 165 nT at a distance of 1 m. Laboratory communication experiments confirm that, with a drive voltage of 14.8 V, ASK transmission achieves a range of 11.4 m at a symbol rate of 50 bps. In a high-noise environment with an in-band noise level of 4.41 pT, the system achieves a BER of 0.12%, consistent with theoretical predictions and confirming the reliability of the demodulation process. These results demonstrate the feasibility and efficiency of ME antennas for compact, low-frequency magnetic communication. Further performance improvements may be achieved by (1) operating in low-noise environments and (2) increasing the drive voltage to enhance radiation strength by up to a factor of 6.4.
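The BASK chain summarized above can be reproduced in a few lines: bits switch a 27.2 kHz carrier on and off, and the receiver performs coherent detection by mixing with the carrier, integrating over each symbol, and thresholding. Sample rate, noise level, and threshold rule are illustrative choices.

```python
# Baseband simulation of the BASK link: on-off keying of a 27.2 kHz carrier at
# 50 bit/s, additive noise, coherent detection, and symbol-energy thresholding.
import numpy as np

fs, fc, rb = 200_000, 27_200, 50          # sample rate, carrier, symbol rate (Hz, Hz, bit/s)
sps = fs // rb                            # samples per symbol
rng = np.random.default_rng(0)

bits = rng.integers(0, 2, 128)
t = np.arange(len(bits) * sps) / fs
carrier = np.sin(2 * np.pi * fc * t)
tx = np.repeat(bits, sps) * carrier        # on-off keyed (BASK) waveform

rx = tx + 0.3 * rng.standard_normal(tx.size)          # additive channel noise

mixed = rx * carrier                                   # coherent detection
symbol_energy = mixed.reshape(len(bits), sps).sum(axis=1)
threshold = 0.5 * symbol_energy.max()
decoded = (symbol_energy > threshold).astype(int)
print("bit errors:", int(np.sum(decoded != bits)))
```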
Time-Series Information-Driven Parallel Interactive Multiple Model Algorithm for Underwater Target Tracking
LAN Chaofeng, ZHANG Tongji, CHEN Huan
2025, 47(8): 2685-2693. doi: 10.11999/JEIT250044
Abstract:
  Objective  Accurate underwater target tracking is critical in marine surveillance, military reconnaissance, and resource management. The nonlinear, stochastic, and uncertain motion of underwater targets, exacerbated by complex environmental dynamics, limits the effectiveness of traditional tracking methods. The Interacting Multiple Model (IMM) algorithm addresses this challenge by integrating several motion models and adaptively switching between them based on the target’s dynamic state. Such adaptability enables improved tracking under abrupt motion transitions, such as submarine maneuvers or the irregular paths of unmanned underwater vehicles. However, the classical IMM algorithm relies on a fixed Transition Probability Matrix (TPM), which can delay model switching and reduce tracking accuracy in highly dynamic settings. To overcome these limitations, this paper proposes an adaptive IMM algorithm that incorporates timing information, parallel processing, and information entropy. These enhancements improve model-switching speed and accuracy, increase adaptability to environmental changes, and boost overall tracking performance and stability.  Methods  This study proposes a Temporal Information Parallel Interacting Multiple Model (TIP-IMM) target tracking algorithm that integrates temporal information, information entropy evaluation, and parallel processing to adaptively correct the TPM. At each time step, the algorithm identifies the model with the highest probability and assesses whether this model remains dominant across consecutive time steps. If consistent, it is designated the main model. The algorithm then evaluates changes in the main model’s probability relative to other models and updates the TPM based on defined correction rules. A parallel structure is introduced: Module A performs TPM correction, while Module B executes a standard IMM algorithm. Both modules operate concurrently. Information entropy is employed to quantify the uncertainty of the model probability distribution. When entropy is low in Module A, its corrected results are prioritized; when entropy is high, the system places greater reliance on Module B to ensure stable and robust performance under varying conditions.  Results and Discussions  The proposed TIP-IMM algorithm is evaluated through simulation experiments, demonstrating improved tracking accuracy and stability. True motion and observation trajectories are generated using predefined initial parameters and a motion model. Filtering results from TIP-IMM are compared with those of three benchmark algorithms. The estimated trajectories from TIP-IMM align more closely with the true trajectories, as confirmed by the enlarged views in panels A and B (Fig. 2). Analysis of model probability evolution during filtering indicates that TIP-IMM exhibits smaller fluctuations during model transitions and identifies the dominant model more rapidly than the comparison methods (Fig. 3). To quantify tracking performance, Root Mean Square Error (RMSE) serves as the evaluation metric. TIP-IMM consistently yields lower and smoother RMSE curves across the full trajectory and in both X and Y directions (Figs. 4 and 5). Furthermore, average RMSE (ARMSE) serves as a comprehensive indicator. TIP-IMM achieves lower errors in both position and velocity estimates (Table 2), consistent with the trend observed in RMSE analysis. 
Although the algorithm incurs a slightly higher runtime than the reference methods, its execution time remains within the millisecond range, meeting the real-time requirements of practical applications (Table 3).  Conclusions  This study proposes a TIP-IMM algorithm to address limitations of the classical IMM algorithm, particularly model-switching delays and over-smoothing during abrupt target motion changes. By incorporating temporal correlation of model probabilities, parallel processing, and information entropy, TIP-IMM improves responsiveness and transition smoothness in dynamic environments. Simulation experiments confirm that TIP-IMM achieves faster and more accurate model switching than existing methods. Compared with the traditional IMM and benchmark algorithms, TIP-IMM improves overall tracking accuracy by 3.52% to 7.87% across multiple scenarios. It also reduces estimation error recovery time while maintaining high accuracy during sudden motion transitions. These results demonstrate the algorithm’s enhanced adaptability, robustness, and stability, making it well suited for underwater target tracking applications.
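To make the entropy-gated module selection concrete, the sketch below (Python/NumPy) shows one way the information entropy of the model-probability vector could arbitrate between Module A (TPM-corrected) and Module B (standard IMM). The function names, the entropy threshold, and the two-module interface are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def model_entropy(mu, eps=1e-12):
    """Shannon entropy of a model-probability vector; low entropy means one model dominates."""
    mu = np.clip(np.asarray(mu, dtype=float), eps, 1.0)
    mu = mu / mu.sum()
    return float(-(mu * np.log(mu)).sum())

def fuse_modules(est_a, mu_a, est_b, mu_b, entropy_threshold=0.5):
    """Prefer Module A (TPM-corrected) when its model probabilities are decisive,
    otherwise fall back on Module B (standard IMM)."""
    return (est_a, mu_a) if model_entropy(mu_a) < entropy_threshold else (est_b, mu_b)

# Illustrative use with a three-model bank: Module A is decisive, Module B is diffuse.
est_a, mu_a = np.array([10.2, 1.9]), [0.92, 0.05, 0.03]
est_b, mu_b = np.array([10.5, 1.7]), [0.40, 0.35, 0.25]
state_est, model_probs = fuse_modules(est_a, mu_a, est_b, mu_b)
```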
Trust Adaptive Event-triggered Robust Extended Kalman Fusion Filtering for Target Tracking
ZHU Hongbo, JIN Jiahui
2025, 47(8): 2694-2702. doi: 10.11999/JEIT250103
Abstract:
  Objective  Mobile Wireless Sensor Networks (MWSNs) with dynamic topology exhibit considerable application value across various fields, making target tracking a critical area of research. Although conventional filtering algorithms and event-triggered schemes have enabled basic target tracking, they remain limited in addressing motion modeling errors, Received Signal Strength (RSS) quantization inaccuracies, and adaptation to dynamic network conditions. To overcome these limitations, this study proposes a trust-adaptive event-triggered mechanism combined with an improved Extended Kalman Filter (EKF). The mechanism dynamically schedules a suitable number of trust anchor nodes based on network conditions, while the robust EKF estimates the motion state of the mobile target. This approach ensures stable, accurate, and consistent estimation even under time-varying process and measurement noise covariance. The proposed method offers an effective solution for RSS-based tracking in resource-constrained MWSNs by reducing power, computation, and bandwidth consumption, while improving tracking accuracy and maintaining robustness against measurement uncertainty and faulty nodes.  Methods  In the resource-constrained MWSN environment, a robust extended Kalman fusion filtering method with trust-adaptive event triggering is proposed for target tracking. This method incorporates a trust-adaptive, event-driven anchor node scheduling and information exchange mechanism. It dynamically adjusts to the spatial distribution of trusted anchor nodes near the target, schedules a number of anchor nodes close to the optimal value, and streamlines communication between these nodes and the mobile target. This design substantially reduces power, computational, and bandwidth demands while maintaining measurement credibility. To address uncertainties arising from motion modeling and RSS quantization, a robust extended Kalman trust fusion filtering algorithm based on mean drift is developed. The algorithm estimates the actual covariance by randomly sampling uniformly distributed process and measurement noise covariance matrices, thereby compensating for discrepancies between model predictions and observations. Additionally, only measurements from nodes identified as reliable are incorporated via adaptive weighted fusion, which enhances the stability, robustness, and accuracy of target tracking.  Results and Discussions  The proposed trust-adaptive event-triggered robust extended Kalman fusion filtering method substantially improves target tracking performance in resource-constrained MWSNs. By integrating a dynamic anchor node scheduling mechanism with a dual-layer noise compensation strategy, the method adjusts the response radius in real time through trust-adaptive event triggering. Therefore, the average number of trust response anchors remains stable at a preset target—for example, ANoTRA = 5.0583 when $N_{\text{t}}$ = 5—while reducing communication resource consumption by 53.8% compared with the fixed threshold method (Fig. 2; Table 2). Furthermore, the use of uniformly distributed random sampling enables the algorithm to account for system uncertainty when the process noise covariance q is within [0.25, 1.5]. The introduction of a mean-shift algorithm helps to eliminate abnormal measurements, leading to a 42.6% reduction in tracking Root-Mean-Square Error (RMSE) compared with traditional approaches (Fig. 3, Fig. 4, Fig. 5).
Under complex environmental conditions, with parameters set as q ∈ [0.25, 1.5], H = 10, L = 6, $N_{\text{t}}$ = 5, and m = 8, the method demonstrates high accuracy and robustness. These results indicate that the proposed approach not only enhances tracking precision but also significantly improves the efficiency of resource utilization.  Conclusions  This study addresses the problem of mobile target tracking in resource-constrained MWSNs by integrating a trust-adaptive event-triggering mechanism with a robust extended Kalman fusion filtering algorithm. The proposed method leverages the advantages of trust-based adaptive triggering and robust filtering to achieve high tracking accuracy while reducing power, computational, and communication overhead. Simulation results demonstrate that (1) the robust EKF reduces the tracking root mean square error by 42.6% compared with the conventional EKF, and (2) the trust-adaptive event-triggering mechanism reduces communication resource consumption by 53.8% relative to static schemes such as non-trust-based adaptive triggering. This work focuses on tracking under low-noise conditions. Future research will extend the method to more complex nonlinear systems and explore the integration of statistical approaches and deep learning techniques for enhanced outlier identification and suppression under high interference.
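The trust-adaptive triggering idea, keeping the number of responding trusted anchors near a preset target such as $N_{\text{t}}$ = 5 by adjusting the response radius, can be sketched as follows; the proportional update gain, the trust threshold, and all variable names are illustrative assumptions rather than the paper's actual scheduling rule.

```python
import numpy as np

def adapt_response_radius(anchors, target_xy, trust, radius,
                          n_target=5, gain=0.1, r_min=1.0, r_max=100.0):
    """One trust-adaptive triggering step: grow or shrink the response radius so the
    number of trusted anchors inside it tracks the preset target n_target."""
    dist = np.linalg.norm(anchors - target_xy, axis=1)
    responders = np.where((dist <= radius) & (trust > 0.5))[0]   # trusted anchors that respond
    radius *= 1.0 + gain * (n_target - responders.size) / n_target
    return float(np.clip(radius, r_min, r_max)), responders

rng = np.random.default_rng(0)
anchors = rng.uniform(-50.0, 50.0, size=(30, 2))    # anchor positions
trust = rng.uniform(0.0, 1.0, size=30)              # per-anchor trust scores
radius = 20.0
for _ in range(20):                                  # radius is nudged toward ~5 trusted responders
    radius, responders = adapt_response_radius(anchors, np.zeros(2), trust, radius)
```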
Radar High-speed Target Tracking via Quick Unscented Kalman Filter
SONG Jiazhen, SHI Zhuoyue, ZHANG Xiaoping, LIU Zhenyu
2025, 47(8): 2703-2713. doi: 10.11999/JEIT250010
Abstract:
  Objective  The increasing prevalence of high-speed targets due to advancements in space technology presents new challenges for radar tracking. The pronounced motion of such targets within a single frame induces large variations in range, causing dispersion of echo energy across the range-Doppler plane and invalidating the assumption of concentrated target energy. This results in “range cell migration” and “Doppler cell migration”, both of which degrade tracking accuracy. To address these challenges, this study proposes a Quick Unscented Kalman Filter (Q-UKF) algorithm tailored for high-speed radar target tracking. The Q-UKF performs recursive, pulse-by-pulse state estimation directly from radar echo signals, thereby improving tracking precision and eliminating the need for conventional energy correction and migration compensation. Furthermore, the algorithm employs the Woodbury matrix identity to reduce computational burden while preserving the estimation accuracy of the standard Unscented Kalman Filter (UKF).  Methods  The target state vector at each pulse time is modeled as a three-dimensional random vector representing position, velocity, and acceleration. Target motion is governed by a kinematic model that characterizes its temporal dynamics. A measurement model is formulated based on the radar echo signals received at each pulse, defining a nonlinear relationship between the target state and the observed measurements. This formulation supports recursive state estimation. In the classical UKF, the high dimensionality of radar echo data necessitates frequent inversion of large covariance matrices, imposing a substantial computational burden. To mitigate this issue, the Q-UKF is developed. By incorporating the Woodbury matrix identity, the Q-UKF reduces the computational complexity of matrix inversion without compromising estimation accuracy relative to the classical UKF. Within this framework, Q-UKF performs pulse-by-pulse recursive estimation, integrating all measurements up to the current pulse to improve prediction accuracy. In contrast to conventional radar tracking methods that process complete frame data and apply multiple signal correction steps, Q-UKF operates directly on raw measurements and avoids such corrections, thereby simplifying the processing pipeline. This efficiency makes Q-UKF well suited for real-time tracking of high-speed targets.  Results and Discussions  The performance of the proposed Q-UKF method is assessed using Monte Carlo simulations. Estimation errors of the Q-UKF and Extended Kalman Filter (EKF) are compared over time (Fig. 3). During the effective pulse periods within each frame cycle, both methods yield accurate target state estimates. Estimation errors increase during the delay intervals, but rapidly decrease and stabilize once effective pulse signals resume, forming a periodic error pattern. To evaluate robustness, the Root Mean Square Error (RMSE) of state estimation is examined under varied initial conditions, including different positions, velocities, and accelerations. In all scenarios, both Q-UKF and EKF perform reliably, with Q-UKF consistently demonstrating superior accuracy (Fig. 4). Under Signal-to-Noise Ratios (SNRs) from –15 dB to 0 dB, the RMSEs in both Gaussian and Rayleigh noise environments (Fig. 5a and Fig. 5b) decrease with increasing SNR. Q-UKF maintains high accuracy even under low SNR conditions. 
In the Gaussian noise setting, Q-UKF improves estimation accuracy by an average of 10.60% relative to EKF; in the Rayleigh environment, the average improvement is 9.55%. In terms of computational efficiency, Q-UKF demonstrates the lowest runtime among the evaluated methods (EKF, UKF, and Particle Filter (PF)). The average computation time per effective pulse is reduced by 8.91% compared to EKF, 72.55% compared to UKF, and over 90% compared to PF (Table 2). This efficiency gain results from applying the Woodbury matrix identity, which alleviates the computational load of matrix inversion in high-dimensional radar echo data processing.  Conclusions  This study presents the Q-UKF method for high-speed target tracking in radar systems. The algorithm performs pulse-by-pulse state estimation directly from radar echo signals, advancing estimation granularity from the frame level to the pulse level. By removing the need for energy accumulation and migration correction, Q-UKF simplifies the conventional signal processing pipeline. The method incorporates the Woodbury matrix identity to efficiently invert covariance matrices, substantially reducing computational load. Simulation results show that Q-UKF consistently outperforms the EKF in estimation accuracy under varied initial target states, achieving an average improvement of approximately 10.60% under Gaussian noise and 9.55% under Rayleigh noise. Additionally, Q-UKF improves computational efficiency by 8.91% compared to EKF. Compared to the classical UKF, Q-UKF delivers equivalent accuracy with significantly reduced runtime. Although the PF may yield slightly better accuracy under certain conditions, its computational demand limits its practicality in real-time applications. Overall, Q-UKF provides a favorable balance between accuracy and efficiency, making it a viable solution for real-time tracking of high-speed targets. Its ability to address high-dimensional, nonlinear measurement problems also highlights its potential for broader application.
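The computational saving in Q-UKF comes from the Woodbury matrix identity, which replaces the inversion of a large innovation covariance with a small state-dimension inverse. The sketch below illustrates the identity itself for S = H P Hᵀ + R with diagonal R; it is a generic demonstration under these assumptions, not the paper's UKF implementation.

```python
import numpy as np

def innovation_cov_inv_woodbury(H, P, r_diag):
    """Invert S = H P H^T + R via the Woodbury identity when the measurement dimension
    (rows of H) far exceeds the state dimension (columns of H). R is assumed diagonal
    with entries r_diag, so only a small n x n matrix is actually inverted."""
    R_inv = 1.0 / r_diag                                    # (m,)
    HtRinv = H.T * R_inv                                    # H^T R^{-1}, shape n x m
    inner = np.linalg.inv(np.linalg.inv(P) + HtRinv @ H)    # n x n inverse
    # S^{-1} = R^{-1} - R^{-1} H (P^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1}
    return np.diag(R_inv) - (HtRinv.T @ inner @ HtRinv)

# Consistency check against direct inversion: m = 200 measurements, n = 3 states.
rng = np.random.default_rng(1)
m, n = 200, 3
H = rng.standard_normal((m, n))
P = 0.5 * np.eye(n)
r = np.full(m, 2.0)
S = H @ P @ H.T + np.diag(r)
assert np.allclose(innovation_cov_inv_woodbury(H, P, r), np.linalg.inv(S))
```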
Non-orthogonal Prolate Spheroidal Wave Functions Signal Detection Method with Cross-terms
LU Faping, MAO Zhongyang, XU Zhichao, SHU Yihao, KANG Jiafang, WANG Feng, WANG Mengjiao
2025, 47(8): 2714-2723. doi: 10.11999/JEIT250052
Abstract:
  Objective  Non-orthogonal Shape Modulation based on Prolate Spheroidal Wave Functions (NSM-PSWFs) utilizes PSWFs with high time-frequency energy concentration as basis waveforms. This structure enables high spectral efficiency and time-frequency energy aggregation, making it a promising candidate for B5G/6G waveform design. However, due to the non-orthogonality of the PSWFs used for information transmission in NSM-PSWFs, mutual interference between non-orthogonal signals and poor bit error performance in coherent detection systems significantly limit their practical deployment. This issue is a common challenge in non-orthogonal modulation and access technologies. To address the problem of low detection performance resulting from mutual interference among non-orthogonal PSWFs, this study incorporates time-frequency domain characteristics into signal detection. A novel detection mechanism for non-orthogonal PSWFs in the time-frequency domain is proposed, with the aim of reducing interference between non-orthogonal PSWFs and enhancing detection performance.  Methods  Given the different time-frequency domain energy distribution characteristics of PSWF signals at various stages, particularly the "local" energy concentration in different regions, this study introduces cross-terms between signals. Based on an analysis of non-orthogonal signal time-frequency characteristics, with a focus on innovating detection mechanisms, a combined approach of theoretical modeling and numerical simulation is employed to explore novel methods for detecting non-orthogonal PSWF signals via cross-terms. Specifically: (1) The impact of interference between PSWF signals and Gaussian white noise on the time-frequency distribution of cross-terms is analyzed, demonstrating the feasibility of detecting PSWF signals in the time-frequency domain. (2) Building on this analysis, time-frequency characteristics are integrated into the detection process. A novel method for detecting non-orthogonal PSWFs based on cross-terms is proposed, accompanied by a strategy for selecting time-frequency feature parameters. The "integral value of cross-terms over symmetric time intervals at the frequency corresponding to the peak energy density of cross-terms" is chosen as the feature parameter. This shifts signal detection from the "one-dimensional energy domain (time or frequency)" to the "two-dimensional time-frequency energy domain," enabling detection by exploiting localized energy regions while simultaneously mitigating interference during statistical acquisition.  Results and Discussions  This study demonstrates the feasibility of detecting signals in the two-dimensional time-frequency domain and analyzes the impact of different PSWFs and AWGN on the distribution characteristics of cross-terms. Notably, AWGN interference can be regarded as a special form of “interference between PSWFs” exhibiting a linear superposition with PSWF-induced interference. The interference from PSWFs with time-domain parity opposite to that of the template signal can be eliminated through “symmetric time-interval integration” (Fig. 1, Table 1, Table 2). This establishes a theoretical foundation for the novel detection mechanism based on cross-terms and provides a reference for incorporating other two-dimensional distribution characteristics into signal detection. Additionally, a novel detection mechanism for non-orthogonal PSWFs based on cross-terms is proposed, utilizing time-frequency distribution characteristics for signal detection. 
This method effectively reduces interference between non-orthogonal PSWFs, thereby enhancing detection performance. It also offers valuable insights for exploring detection mechanisms based on other two-dimensional distribution characteristics. For example, compared to conventional coherent detection, the proposed method achieves superior performance, with an improvement of approximately 1 dB at a bit error rate of 4 × 10⁻⁵ (Fig. 4).  Conclusions  This paper demonstrates the feasibility of detecting PSWFs in the two-dimensional time-frequency domain and proposes a novel detection method for non-orthogonal PSWFs based on cross-terms. The proposed method transforms traditional coherent detection from “global energy in the time/frequency domain” to “local energy in the time-frequency domain”, significantly reducing interference between non-orthogonal signals and enhancing detection performance. This approach not only provides a new perspective for developing efficient detection methods for non-orthogonal signals but also serves as a valuable reference for investigating novel signal detection mechanisms in two-dimensional domains.
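A rough illustration of the proposed feature parameter, integrating a cross time-frequency term over a symmetric time interval at the frequency where its energy density peaks, is given below. It uses an STFT-based cross-spectrum as a simplified stand-in for the cross-terms of the time-frequency distribution analysed in the paper, and the template waveform, window length, and integration interval are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

def cross_term_statistic(x, template, fs, nperseg=128):
    """Illustrative detection statistic: form a cross time-frequency map between the
    received signal and the template, find the frequency bin where its energy density
    peaks, and integrate over a time interval symmetric about the record centre."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    _, _, S = stft(template, fs=fs, nperseg=nperseg)
    C = X * np.conj(S)                           # cross time-frequency map
    k = np.argmax(np.abs(C).sum(axis=1))         # frequency of peak cross-term energy
    mid, half = C.shape[1] // 2, C.shape[1] // 4  # symmetric interval around the centre
    return float(np.real(C[k, mid - half: mid + half + 1]).sum())

# Illustrative use: noisy copy of the template versus pure noise.
fs = 1e3
n = np.arange(1024)
template = np.sin(2 * np.pi * 50 * n / fs) * np.hanning(1024)   # stand-in waveform, not a PSWF
noisy = template + 0.5 * np.random.default_rng(2).standard_normal(1024)
noise_only = np.random.default_rng(3).standard_normal(1024)
print(cross_term_statistic(noisy, template, fs), cross_term_statistic(noise_only, template, fs))
```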
Multiple Maneuvering Target Poisson Multi-Bernoulli Mixture Filter for Gaussian Process Cognitive Learning
ZHAO Ziwen, CHEN Hui, LIAN Feng, ZHANG Guanghua, ZHANG Wenxu
2025, 47(8): 2724-2735. doi: 10.11999/JEIT241139
Abstract:
  Objective  Multiple Maneuvering Target Tracking (MMTT) remains a critical yet challenging problem in radar signal processing and sensor fusion, particularly under complex and uncertain conditions. The primary difficulty arises from the unpredictable or highly dynamic nature of target motion. Conventional model-based methods, especially Multiple Model (MM) approaches, rely on predefined motion models to accommodate varying target behaviors. However, these methods face limitations, including sensitivity to initial parameter settings, high computational cost due to model switching, and degraded performance when actual target behavior deviates from the assumed model set. To address these limitations, this study proposes a data-driven MMTT method that combines Gaussian Process (GP) learning with the Poisson Multi-Bernoulli Mixture (PMBM) filter to improve robustness and tracking accuracy in dynamic environments without requiring extensive model assumptions.  Methods  The proposed method exploits the data-driven modeling capability of GP, a non-parametric Bayesian inference approach that learns high-dimensional, nonlinear function mappings from limited historical data without specifying explicit functional forms. In this study, GP models both the state transition and observation processes of multi-target systems, reducing the dependence on predefined motion models. During the offline phase, historical target trajectories and sensor measurements are collected to build a training dataset. The squared exponential kernel is selected for its smoothness and infinite differentiability, which effectively captures the continuity and dynamic characteristics of target state evolution. GP hyperparameters, including length scale, signal variance, and observation noise variance, are jointly optimized by maximizing the log-marginal likelihood, ensuring generalization and expressiveness in complex environments. In the online filtering phase, the trained GP models are incorporated into the PMBM filter, forming a recursive GP-PMBM filtering structure. Within this framework, the PMBM filter employs a Poisson point process to represent undetected targets and a multi-Bernoulli mixture to characterize the posterior state distribution of detected targets. During the prediction step, the GP-derived nonlinear state transition model is propagated using the Cubature Kalman Filter (CKF). In the update step, the GP-learned observation model refines state estimates, enhancing both tracking accuracy and robustness.  Results and Discussions  Extensive simulation experiments under two different MMTT scenarios validate the effectiveness and performance advantages of the proposed method. In Scenario 1, a moderate 2D surveillance environment with clutter and a varying number of targets is constructed. The GP-PMBM filter significantly outperforms existing methods, including LSTM-PMBM, MM-PMBM, MM-GLMB, and MM-PHD filters, based on the Generalized Optimal Sub-Pattern Assignment (GOSPA) metric (Fig. 3). In addition, the GP-PMBM filter achieves the lowest standard deviation in cardinality estimation, demonstrating high accuracy and stability (Fig. 4). Further experiments under different monitoring conditions confirm the robustness of GP-PMBM. When clutter rates vary, the GP-PMBM filter consistently achieves the lowest average GOSPA error, reflecting strong stability under interference (Fig. 5). As detection probability decreases, most algorithms show significant degradation in accuracy. 
However, GP-PMBM maintains superior tracking performance, achieving the lowest GOSPA distance across all detection conditions (Fig. 6). In Scenario 2, target motion becomes more complex, with increased maneuverability and higher–frequency birth–death dynamics. Despite these challenges, the GP-PMBM filter maintains superior tracking performance, even under highly maneuverable conditions and frequent target appearance and disappearance (Fig. 9, Fig. 10).  Conclusions  This study proposes a novel GP-PMBM filtering framework for MMTT in complex environments. By integrating the data-driven learning capability of the GP with the PMBM filter, the proposed method addresses the limitations of conventional model-based tracking approaches. The GP-PMBM filter automatically learns unknown motion and observation models from historical data, eliminating the dependence on predefined model sets and significantly improving adaptability. Simulation results confirm that the GP-PMBM filter achieves superior tracking accuracy, improved cardinality estimation, and enhanced robustness under varying clutter levels and detection conditions. These results indicate that the proposed method is well-suited for environments characterized by frequent maneuvering changes and uncertain target behavior. Future work will focus on extending the GP-PMBM framework to multi-maneuvering extended target tracking tasks to address more challenging scenarios.
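As an illustration of the offline GP learning stage, the sketch below fits a squared-exponential (RBF) kernel GP to state-transition pairs and optimises its hyperparameters by maximising the log-marginal likelihood, as scikit-learn does inside fit(). The toy dynamics and kernel settings are assumptions; in the paper the learned mean and variance would then drive the CKF-based prediction step of the PMBM recursion.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Toy training set: learn x_{k+1} = f(x_k) from historical trajectory pairs.
rng = np.random.default_rng(0)
x_k = rng.uniform(-3, 3, size=(200, 1))
x_next = 0.9 * x_k + 0.5 * np.sin(x_k) + 0.05 * rng.standard_normal((200, 1))

# Squared-exponential kernel with signal variance, length scale and observation noise;
# hyperparameters are tuned by maximising the log-marginal likelihood during fit().
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(x_k, x_next.ravel())

# Predictive mean and uncertainty for a new state, usable in a filter prediction step.
mean, std = gp.predict(np.array([[1.2]]), return_std=True)
```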
Research on Unmanned Aircraft Radio Frequency Signal Recognition Algorithm Based on Wavelet Entropy Features
LIU Bing, SHI Mingxin, LIU Jiaqi
2025, 47(8): 2736-2745. doi: 10.11999/JEIT250051
Abstract:
  Objective   With the rapid development and broad application of Unmanned Aerial Vehicle (UAV) technology, its use in military reconnaissance, agricultural spraying, logistics, and film production presents growing challenges in signal classification and safety supervision. Accurate classification of UAV Radio Frequency (RF) signals in complex electromagnetic environments is critical for real-time flight monitoring, autonomous obstacle avoidance, and communication reliability in multi-agent coordination. However, conventional recognition methods exhibit limitations in both feature extraction and classification accuracy, particularly under interference or multipath propagation, which severely reduces recognition performance and constrains practical implementation. To address this limitation, this study proposes a recognition algorithm based on wavelet entropy features and an optimized Support Vector Machine (SVM). The method enhances classification accuracy and robustness by extracting wavelet entropy features from UAV RF signals and optimizing SVM parameters using the Great Cane Rat Optimization Algorithm (GCRA). The proposed approach offers a reliable strategy for UAV signal identification under complex electromagnetic conditions. The results contribute to UAV airspace regulation and unauthorized flight detection and establish a foundation for future applications, including autonomous navigation and intelligent route planning. This work holds both theoretical value and practical relevance for supporting the secure and standardized advancement of UAV systems.   Methods   This study adopts a systematic approach to achieve accurate classification and recognition of UAV RF signals, including four key stages: data acquisition, feature extraction, classifier design, and performance verification. First, the publicly available DroneRFa dataset is selected as the experimental dataset. It contains RF signals from 24 mainstream UAV models (e.g., DJI Phantom 3, DJI Inspire 2) across three ISM frequency bands—915 MHz, 2.4 GHz, and 5.8 GHz (Fig. 1). Data collection follows a “pick-store-pick-store” protocol to preserve signal integrity and ensure accurate classification. During preprocessing, 50,000 sampling points are extracted from each channel (RF0_I, RF0_Q, RF1_I, RF1_Q), balancing data continuity and feature representativeness under hardware read/write constraints. Signal magnitudes are normalized to eliminate amplitude-related bias. For feature extraction, a three-level wavelet transform using the Daubechies “db4” wavelet is applied to decompose the signal at multiple scales. A four-dimensional feature matrix is constructed by computing wavelet spectral entropy (Figs. 2 and 3), which captures both time-frequency characteristics and energy distribution. Feature differences among UAV models are confirmed by F-test analysis (Table 1), establishing a robust foundation for classification. In the classifier design stage, the GCRA is applied to optimize the penalty parameter C and Gaussian kernel parameter σ of the SVM. The classification error rate serves as the fitness function during optimization (Fig. 5). Inspired by the foraging behavior of cane rats, GCRA offers improved global search performance. Finally, algorithm performance is evaluated using 10-fold cross-validation and benchmarked against unoptimized SVM, PSO-SVM, GA-SVM, and GWO-SVM (Table 3), demonstrating the robustness and reliability of the proposed method.   Results and Discussions   This study yields several key findings. 
For wavelet entropy feature extraction, the F-test confirms that features from all four channels are statistically significant (p < 0.05), demonstrating their effectiveness in distinguishing among UAV models (Table 1). In classifier optimization, the GCRA exhibits strong parameter search capability, with fitness convergence achieved within 50 iterations at approximately 0.03 (Fig. 6). The optimized SVM classifier reaches an average recognition accuracy of 98.5%, representing a 6.8 percentage point improvement over the traditional SVM (Table 3). At the individual model level, the highest recognition accuracy is observed for DJI Inspire 2 (99.0%), with all other models exceeding 97% (Table 2). Confusion matrix analysis indicates that all misclassification rates are below 3% (Table 2, Fig. 7). Notably, under identical experimental conditions, GCRA-SVM outperforms other optimization algorithms—achieving higher accuracy than PSO-SVM (94.7%) and GA-SVM (94.2%)—with lower variance (±0.00032), indicating greater stability (Table 3). These results validate the discriminative power of wavelet entropy features and highlight the enhanced performance and robustness of GCRA-based SVM optimization.   Conclusions   Through systematic theoretical analysis and experimental validation, this study reaches several key conclusions. The wavelet entropy-based feature extraction method effectively captures the time-frequency characteristics of UAV RF signals. By employing multi-scale decomposition and energy distribution analysis, it accurately identifies the unique signal features of various UAV models. Statistical tests confirm significant differences among the features of different UAV categories, providing a solid foundation for feature selection in UAV identification. The optimization of SVM parameters using the GCRA substantially enhances classification performance, achieving an average accuracy of 98.5% and a peak of 99% on the DroneRFa dataset, with excellent stability. This method addresses the technical challenge of UAV RF signal recognition in complex electromagnetic environments, with performance metrics fully meeting practical application needs. The findings offer a reliable technical solution for UAV flight supervision and lay the groundwork for advanced applications such as autonomous obstacle avoidance. Future research may focus on evaluating the method’s performance in high-noise environments and exploring fusion strategies with other models. Overall, this study provides significant contributions both in terms of theoretical innovation and engineering application.
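A minimal sketch of the feature pipeline described above, a 3-level db4 wavelet decomposition followed by a Shannon entropy over the relative sub-band energies and an RBF-kernel SVM, is shown below; the GCRA optimiser is not implemented here, so the SVM parameters C and gamma are hand-picked placeholders rather than the paper's tuned values.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def wavelet_entropy(x, wavelet="db4", level=3):
    """Shannon entropy of the relative wavelet energy across the approximation and
    detail sub-bands of a 3-level db4 decomposition (one value per channel)."""
    coeffs = pywt.wavedec(np.asarray(x, dtype=float), wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    p = energies / energies.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))

def feature_vector(channels):
    """4-dimensional feature vector built from the RF0_I, RF0_Q, RF1_I, RF1_Q channels."""
    return np.array([wavelet_entropy(c) for c in channels])

rng = np.random.default_rng(0)
sample = [rng.standard_normal(50_000) for _ in range(4)]   # stand-in for one capture's channels
print(feature_vector(sample))

# RBF-kernel SVM; in the paper C and the kernel width are tuned by GCRA.
clf = SVC(C=10.0, kernel="rbf", gamma="scale")
# clf.fit(X_train, y_train); clf.predict(X_test)
```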
Real-time Adaptive Suppression of Broadband Noise in General Sensing Signals
WEN Yumei, ZHU Yu
2025, 47(8): 2746-2756. doi: 10.11999/JEIT250018
Abstract:
  Objective  Broadband noise is inevitable in sensing outputs due to thermal noise from the sensing system and various uncorrelated environmental disturbances. Adaptive filtering is a common method for removing such noise. At convergence, the adaptive filter output provides the optimal estimate of the sensing signal. However, during actual sensing, changes in the sensing signal lead to alterations in the statistical characteristics of the output. Therefore, the adaptive process must be re-adjusted to converge to a new steady state. The filter output during this adjustment is not the optimal estimate and introduces distortion, thereby adding extra noise. Fast-converging adaptive algorithms are typically employed to improve the filter’s response speed to such changes. Regardless of the speed of convergence and the method used to update filter coefficients, the adjustment process remains unavoidable, during which the filter output is distorted and additional noise is introduced. To ensure the filter remains at steady state without being influenced by changes in the sensing signal, a new adaptive filtering method is proposed. This method ensures that the input to the adaptive filter remains stationary, thereby preventing output distortion and the introduction of extra noise.  Methods  First, a threshold $R$ and quantization scale $Q$ are defined in terms of the noise standard deviation $\sigma$, where $R = 3\sqrt{2}\sigma$ and $Q = 3\sigma$. A quantization transformation is applied to the sensing output $x(n)$ in real time, with the transformation result $q(n)$ used as the new sequence to be filtered. When the absolute value of the first-order difference of $x(n)$ is no less than $R$, the sensing signal $s(n)$ is considered to have changed, and $p(n)$ is set as the quantization value of $x(n)$ according to $Q$. When the absolute value of the first-order difference of $x(n)$ is less than $R$, $s(n)$ is considered unchanged, and $p(n)$ is equal to the previous value, i.e., $p(n) = p(n - 1)$. Let $q(n) = x(n) - p(n)$; $q(n)$ contains both the information of the sensing signal and the noise. Although its variance may change slightly, the mean of $q(n)$ remains 0, ensuring that $q(n)$ stays relatively stationary. Next, $q(n - n_0)$ is used as the input to the adaptive filter, with $q(n)$ serving as the reference for the adaptive filter. Here, $q(n - n_0)$ represents the time delay of $q(n)$ and $n_0$ denotes the length of the time delay. This method performs adaptive linear prediction of $q(n)$ and filters out broadband noise.
Finally, the output of the adaptive filter, $y(n)$, is compensated with $p(n)$ to obtain a noise-reduced estimate of the sensing signal $s(n)$.  Results and Discussions  The maximum mean square errors produced by the proposed method and conventional adaptive algorithms are compared using computer-simulated noisy band-limited step signals and noisy one-sided sinusoidal signals. Signal-to-Noise Ratio (SNR) improvements obtained during filtering are also evaluated concurrently. For the noisy band-limited step signal (Table 1), the maximum mean square error of the proposed method is only 0.18% of that produced by the Recursive Least Squares (RLS) algorithm and 0.15%~0.19% of those generated by the Least Mean Square (LMS) algorithms. Correspondingly, the SNR improvement is 25.88 dB higher than the RLS algorithm and between 28.65 dB and 32.35 dB greater than the LMS algorithms. In processing a noisy one-sided sinusoidal signal (Table 2), the maximum mean square error generated by the proposed method is 0.3% of that generated by the RLS algorithm and 0.06%~0.08% of that generated by the compared LMS algorithms. The SNR improvement is 10.25 dB higher than that of the RLS algorithm and 26.53 dB~29.61 dB higher than that of the compared LMS algorithms. Figures 3 and 5 illustrate the quantization transformation outcomes for both the noisy band-limited step signal and the noisy sinusoidal signal, demonstrating stability and consistency with theoretical expectations. Real sensing outputs primarily cover static or quasi-static signals (Figures 7 and 8), step or step-like signals (Figures 9 and 10), and periodic or quasi-periodic signals (Figures 11 and 12). Comparative analysis of the proposed method against common adaptive algorithms on varied real sensing outputs consistently shows superior filtering performance by the proposed method, with minimal distortion and no additional noise introduction, regardless of whether the sensing signals undergo changes.  Conclusions  A new adaptive filtering method is proposed in this paper. The proposed method ensures that the adaptive filter always operates at a steady state, avoiding the additional noise caused by distortion during the adjustment to a new steady state. The results from computer simulations and actual signal processing demonstrate that the proposed method provides effective filtering for both dynamic and static sensing signals, indicating that it outperforms commonly used adaptive algorithms.
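The quantization transformation and the subsequent adaptive linear prediction can be sketched directly from the Methods description above; the initialisation of p(0), the use of a plain LMS update as the adaptive filter, and the step size are illustrative assumptions, since the paper does not prescribe a specific adaptive algorithm.

```python
import numpy as np

def quantize_transform(x, sigma):
    """Quantization transformation: R = 3*sqrt(2)*sigma, Q = 3*sigma; p(n) tracks the
    quantized signal level, and q(n) = x(n) - p(n) stays zero-mean and quasi-stationary."""
    R, Q = 3 * np.sqrt(2) * sigma, 3 * sigma
    p = np.empty_like(x)
    p[0] = Q * np.round(x[0] / Q)                    # illustrative initialisation
    for n in range(1, len(x)):
        if abs(x[n] - x[n - 1]) >= R:                # sensing signal judged to have changed
            p[n] = Q * np.round(x[n] / Q)
        else:                                        # unchanged: hold the previous level
            p[n] = p[n - 1]
    return p, x - p

def lms_predict(q, n0=1, taps=16, mu=0.1):
    """Adaptive linear prediction of q(n) from the delayed input q(n - n0) with LMS."""
    w, y = np.zeros(taps), np.zeros_like(q)
    for n in range(n0 + taps, len(q)):
        u = q[n - n0: n - n0 - taps: -1]             # delayed input vector
        y[n] = w @ u
        w += mu * (q[n] - y[n]) * u
    return y

sigma = 0.05
x = np.concatenate([np.zeros(500), np.ones(500)]) + sigma * np.random.default_rng(4).standard_normal(1000)
p, q = quantize_transform(x, sigma)
s_hat = lms_predict(q) + p                           # compensate the filter output with p(n)
```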
Cryptography and Network Information Security
Design of Private Set Intersection Protocol Based on National Cryptographic Algorithms
HUANG Hai, GUAN Zhibo, YU Bin, MA Chao, YANG Jinbo, MA Xiangyu
2025, 47(8): 2757-2767. doi: 10.11999/JEIT250050
Abstract:
  Objective  The rapid development of global digital transformation has exposed Private Set Intersection (PSI) as a key bottleneck constraining the digital economy. Although technical innovations and architectural advances in PSI protocols continue to emerge, current protocols face persistent challenges, including algorithmic vulnerabilities in international cryptographic primitives and limited computational efficiency when applied to large-scale datasets. To address these limitations, this study integrates domestic SM2 elliptic curve cryptography and the SM3 cryptographic hash function to enhance PSI protocol performance and protect sensitive data, providing technical support for China’s cyberspace security. A PSI protocol based on national cryptographic standards (SM-PSI) is proposed, with hardware acceleration of core cryptographic operations implemented using domestic security chips. This approach achieves simultaneous improvements in both security and computational efficiency.  Methods  SM-PSI integrates the domestic SM2 and SM3 cryptographic algorithms to reveal only the intersection results without disclosing additional information, while preserving the privacy of each participant’s input set. By combining SM2 elliptic curve public-key encryption with the SM3 hash algorithm, the protocol reconstructs encryption parameter negotiation, data obfuscation, and ciphertext mapping processes, thereby eliminating dependence on international algorithms such as RSA and SHA-256. An SM2-based non-interactive zero-knowledge proof mechanism is designed to verify the validity of public–private key pairs using a single communication round. This reduces communication overhead, mitigates man-in-the-middle attack risks, and prevents private key exposure. The domestic reconfigurable cryptographic chip RSP S20G is integrated to offload core computations, including SM2 modular exponentiation and SM3 hash iteration, to dedicated hardware. This software-hardware co-acceleration approach significantly improves protocol performance.  Results and Discussions  Experimental results on simulated datasets demonstrate that SM-PSI, through hardware-software co-optimization, significantly outperforms existing protocols at comparable security levels. The protocol achieves an average speedup of 4.2 times over the CPU-based SpOT-Light PSI scheme and 6.3 times over DH-IPP (Table 3), primarily due to offloading computationally intensive operations, including SM2 modular exponentiation and SM3 hash iteration, to dedicated hardware. Under the semi-honest model, SM-PSI reduces both the number of dataset encryption operations and communication rounds, thereby lowering data transmission volume and computational overhead. Its computational and communication complexities are substantially lower than those of SpOT-Light, DH-IPP, and FLASH-RSA, making it suitable for large-scale data processing and low-bandwidth environments (Table 1). Simulation experiments further show that the hardware-accelerated framework consistently outperforms CPU-only implementations, achieving a peak speedup of 9.0 times. The speedup ratio exhibits a near-linear relationship with dataset size, indicating stable performance as the ID data volume increases with minimal efficiency loss (Fig. 3). These results demonstrate SM-PSI’s ability to balance security, efficiency, and scalability for practical privacy-preserving data intersection applications.  
Conclusions  This study proposes SM-PSI, a PSI protocol that integrates national cryptographic algorithms SM2 and SM3 with hardware-software co-optimization. By leveraging domestic security chip acceleration for core operations, including non-interactive zero-knowledge proofs and cryptographic computations, the protocol addresses security vulnerabilities present in international algorithms and overcomes computational inefficiencies in large-scale applications. Theoretical analysis confirms its security under the semi-honest adversary model, and experimental results demonstrate substantial performance improvements, with an average speedup of 4.2 times over CPU-based SpOT-Light and 6.3 times over DH-IPP. These results establish SM-PSI as an efficient and autonomous solution for privacy-preserving set intersection, supporting China’s strategic objective of achieving technical independence and high-performance computation in privacy-sensitive environments.  Prospects   Future work will extend this research by exploring more efficient PSI protocols based on national cryptographic standards, aiming to improve chip-algorithm compatibility, reduce power consumption, and enhance large-scale data processing efficiency. Further efforts will target optimizing protocol scalability in multi-party scenarios and developing privacy-preserving set intersection mechanisms suitable for multiple participants to meet complex practical application demands. In addition, this research will promote integration with other privacy-enhancing technologies, such as federated learning and differential privacy, to support the development of a more comprehensive privacy protection framework.
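For orientation, the sketch below shows the generic blind/re-blind flow that Diffie-Hellman-style PSI protocols follow; SHA-256 and modular exponentiation over a toy prime stand in for SM3 and SM2 point multiplication, so this is not the SM-PSI construction itself, only the shape of the data obfuscation and ciphertext mapping it reorganises.

```python
import hashlib
from secrets import randbelow

P = 2**255 - 19          # toy prime modulus; a real deployment would use the SM2 curve group

def h2g(item: str) -> int:
    """Hash an element into the group (placeholder for hash-to-curve with SM3)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(items, secret):
    """Blind each hashed element with the party's private scalar."""
    return {pow(h2g(x), secret, P) for x in items}

# Each party blinds its own set, exchanges the blinded values, then re-blinds the other
# party's values with its own secret; equal double-blinded values mark the intersection.
a_secret, b_secret = randbelow(P - 2) + 1, randbelow(P - 2) + 1
A, B = {"alice@x.com", "carol@x.com"}, {"alice@x.com", "dave@x.com"}
A_blind, B_blind = blind(A, a_secret), blind(B, b_secret)
A_double = {pow(v, b_secret, P) for v in A_blind}
B_double = {pow(v, a_secret, P) for v in B_blind}
intersection_size = len(A_double & B_double)   # = |A ∩ B| without revealing non-members
```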
One-sided Personalized Differential Privacy Random Response Algorithm Driven by User Sensitive Weights
LIU Zhenhua, WANG Wenxin, DONG Xinfeng, WANG Baocang
2025, 47(8): 2768-2779. doi: 10.11999/JEIT250099
Abstract:
  Objective  One-sided differential privacy has received increasing attention in privacy protection due to its ability to shield sensitive information. This mechanism ensures that adversaries cannot substantially reduce uncertainty regarding record sensitivity, thereby enhancing privacy. However, its use in practical datasets remains constrained. Specifically, the random response algorithm under one-sided differential privacy performs effectively only when the proportion of sensitive records is low, but yields limited results in datasets with high sensitivity ratios. Examples include medical records, financial transactions, and personal data in social networks, where sensitivity levels are inherently high. Existing algorithms often fail to meet privacy protection requirements in such contexts. This study proposes an extension of the one-sided differential privacy random response algorithm by introducing user-sensitive weights. The method enables efficient processing of highly sensitive datasets while substantially improving data utility and maintaining privacy guarantees, supporting secure analysis and application of high-sensitivity data.  Methods  This study proposes a one-sided personalized differential privacy random response algorithm comprising three key stages: sensitivity specification, personalized sampling, and fixed-value noise addition. In the sensitivity specification stage, user data are mapped to sensitivity weight values using a predefined sensitivity function. This function reflects both the relative importance of each record to the user and its quantified sensitivity level. The resulting sensitivity weights are then normalized to compute a comprehensive sensitivity weight for each user. In the personalized sampling stage, the data sampling probability is adjusted dynamically according to the user’s comprehensive sensitivity weight. Unlike uniform-probability sampling employed in conventional methods, this personalized approach reduces sampling bias and improves data representativeness, thereby enhancing utility. In the fixed-value noise addition stage, the noise amount is determined in proportion to the comprehensive sensitivity weight. In high-sensitivity scenarios, a larger noise value is added to reinforce privacy protection; in low-sensitivity scenarios, the noise is reduced to preserve data availability. This adaptive mechanism allows the algorithm to balance privacy protection with utility across different application contexts.  Results and Discussions  The primary innovations of this study are reflected in three areas. First, a one-sided personalized differential privacy random response algorithm is proposed, incorporating a sensitivity specification function to allocate personalized sensitivity weights to user data. This design captures user-specific sensitivity requirements across data attributes and improves system efficiency by minimizing user interaction. Second, a personalized sampling method based on comprehensive sensitivity weights is developed to support fine-grained privacy protection. Compared with conventional approaches, this method dynamically adjusts sampling strategies in response to user-specific privacy preferences, thereby increasing data representativeness while maintaining privacy. Third, the algorithm’s sensitivity shielding property is established through theoretical analysis, and its effectiveness is validated via simulation experiments. 
The results show that the proposed algorithm outperforms the traditional one-sided differential privacy random response algorithm in both data utility and robustness. In high-sensitivity scenarios, improvements in query accuracy and robustness are particularly evident. When the data follow a Laplace distribution, for the sum function, the Root Mean Square Error (RMSE) produced by the proposed algorithm is approximately 76.67% of that generated by the traditional algorithm, with the threshold upper bound set to 0.6 (Fig. 4(c)). When the data follow a normal distribution, in the coefficient of variation function, the RMSE produced by the proposed algorithm remains below 200 regardless of whether the upper bound of the threshold t is 0.7, 0.8, or 0.9, while the RMSE of the traditional algorithm consistently exceeds 200 (Fig. 5(g,h,i)). On real-world datasets, the proposed algorithm achieves higher data utility across all three evaluated functions compared with the traditional approach (Fig. 6).  Conclusions  The proposed one-sided personalized differential privacy random response algorithm achieves effective performance under an equivalent level of privacy protection. It is applicable not only in datasets with a low proportion of sensitive records but also in those with high sensitivity, such as healthcare and financial transaction data. By integrating sensitivity specification, personalized sampling, and fixed-value noise addition, the algorithm balances privacy protection with data utility in complex scenarios. This approach offers reliable technical support for the secure analysis and application of highly sensitive data. Future work may investigate the extension of this algorithm to scenarios involving correlated data in relational databases.
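A toy version of the three-stage pipeline (sensitivity specification, personalized sampling, fixed-value noise addition) might look like the following; the weight normalisation, the sampling-probability formula, and the noise magnitude are illustrative placeholders and do not reproduce the paper's exact mechanism or its privacy accounting.

```python
import numpy as np

def one_sided_personalized_rr(values, sensitivity_weights, epsilon, noise_unit=1.0,
                              rng=np.random.default_rng(0)):
    """Illustrative pipeline: normalise per-record sensitivity weights, sample each record
    with a probability shrunk by its comprehensive weight, and add a fixed-value noise
    term scaled by that weight (all parameter choices are placeholders)."""
    w = np.asarray(sensitivity_weights, dtype=float)
    w = w / w.max()                                        # comprehensive weights in (0, 1]
    base_p = np.exp(epsilon) / (np.exp(epsilon) + 1)       # randomized-response style base rate
    keep_p = base_p * (1 - 0.5 * w)                        # more sensitive -> sampled less often
    kept = rng.random(len(w)) < keep_p
    noise = noise_unit * w * rng.choice([-1.0, 1.0], size=len(w))
    out = np.where(kept, np.asarray(values, dtype=float) + noise, np.nan)  # NaN = suppressed
    return out, keep_p

vals = [120.0, 80.0, 300.0, 45.0]
weights = [0.9, 0.2, 0.7, 0.1]        # user-specified record sensitivities
noisy, keep_p = one_sided_personalized_rr(vals, weights, epsilon=1.0)
```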
Locality-optimized Forward-secure Dynamic Searchable Symmetric Encryption
GUO Yutao, LIU Feng, WANG Feng, XUE Kaiping
2025, 47(8): 2780-2790. doi: 10.11999/JEIT250107
Abstract:
  Objective  With the rapid development of cloud computing, more individuals and enterprises are outsourcing data storage, raising significant concerns regarding data privacy. Traditional encryption methods preserve confidentiality but render the data unsearchable, severely limiting usability. Searchable Symmetric Encryption (SSE) addresses this limitation by enabling efficient keyword searches over encrypted data, and dynamic SSE further enhances practicality by supporting data updates. However, a critical challenge in dynamic SSE is the trade-off between forward security—ensuring that past queries cannot retrieve results from newly added data—and locality. Locality, defined as the number of non-contiguous storage accesses during a search, is a key metric for I/O efficiency and directly affects search performance. Poor locality causes search latency to increase linearly with keyword frequency, creating a significant performance bottleneck. Existing schemes either constrain the number of updates between searches or degrade update and read efficiency, limiting real-world applicability. This study proposes a novel scheme that transforms existing forward-secure dynamic SSE into locality-optimized variants without compromising key performance metrics such as update and read efficiency.  Methods  The proposed scheme improves locality by reorganizing SSE updates into batched operations. Instead of uploading each update individually, the client temporarily stores updates in a local buffer. Once a predefined threshold is reached, the accumulated updates are uploaded as a single package. Within each package, updates corresponding to the same keyword are stored contiguously to minimize non-contiguous storage accesses, thereby enhancing locality. The transformed scheme retains the use of the underlying forward-secure dynamic SSE to store essential metadata required for extracting the contents of each package during a search, thereby preserving forward security. However, search operations may reveal the storage positions of updates for some keywords within a package, potentially constraining the inferred distribution of updates for other keywords. To address this issue, a secure packaging algorithm is designed to mitigate such leakage and maintain the overall security of the scheme.   Results and Discussion   By implementing client-side buffering and batched updating, the proposed scheme transforms compatible forward-secure dynamic SSE schemes into locality-optimized variants. The integration of a secure packaging algorithm into the batching process ensures that forward security is preserved, as confirmed by a formal security proof, without introducing additional information leakage. A comprehensive evaluation is conducted, comparing a typical forward-secure dynamic SSE scheme (referred to as the original scheme), its transformed variants under various buffer sizes, and an existing locality-optimized forward-secure dynamic SSE scheme. Both theoretical and experimental analyses are performed. Theoretical analysis indicates that although the transformed scheme imposes an upper bound on locality, it maintains similar computational complexity to the original scheme in other critical aspects, such as update and read efficiency. Moreover, its update complexity and read performance outperform those of the existing locality-optimized scheme (Table 1). Experimental results yield three main findings. 
(1) Although client-side buffering requires additional storage, the overall client storage remains comparable to that of the original scheme (Table 2). (2) Update times are similar to the original scheme and are reduced to between 1% and 10% of those observed in the existing locality-optimized solution (Fig. 4). (3) For low-frequency keywords, search latency moderately increases—by up to 70%—relative to the original scheme. In contrast, for high-frequency keywords, latency is substantially reduced, ranging from 23.1% to 3.5% of that in the original scheme. Overall, the transformed scheme consistently achieves lower search latency than the existing solution (Fig. 5).  Conclusions  This study proposes a novel scheme that transforms forward-secure dynamic SSE into locality-optimized variants through client-side buffering and batched updating, without degrading core performance metrics (e.g., update and read efficiency). A secure packaging algorithm is introduced, and a formal security proof demonstrates that the scheme preserves forward security without incurring additional information leakage. Both theoretical and experimental results show that the scheme significantly improves locality and search efficiency for high-frequency keywords, while maintaining comparable update and read performance for other keywords. A notable limitation is that the scheme requires predefining the total number or an upper bound on different keywords, which restricts flexibility in dynamic environments. Addressing this limitation remains a key direction for future research. Additionally, extending the scheme to operate under malicious server assumptions or to support further security properties, such as backward security, also warrants investigation.
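The client-side buffering and batched packaging can be sketched as below; the threshold, the upload callback, and the per-keyword contiguous layout are illustrative, and the real scheme would additionally run the secure packaging algorithm (padding and shuffling within the package) before upload.

```python
from collections import defaultdict

class BatchedUpdater:
    """Illustrative client-side buffer: updates are held locally and flushed as one
    package once the threshold is reached, with same-keyword entries laid out
    contiguously so a later search touches few non-contiguous storage regions."""
    def __init__(self, threshold, upload):
        self.threshold = threshold
        self.upload = upload                 # callback standing in for the SSE Update protocol
        self.buffer = defaultdict(list)      # keyword -> [(doc_id, op), ...]
        self.count = 0

    def add(self, keyword, doc_id, op="add"):
        self.buffer[keyword].append((doc_id, op))
        self.count += 1
        if self.count >= self.threshold:
            self.flush()

    def flush(self):
        if not self.count:
            return
        package = []
        for kw, entries in self.buffer.items():          # contiguous per-keyword layout
            package.extend((kw, d, op) for d, op in entries)
        self.upload(package)                              # real scheme pads/shuffles first
        self.buffer.clear()
        self.count = 0

updater = BatchedUpdater(threshold=4, upload=lambda pkg: print(len(pkg), "entries uploaded"))
for kw, doc in [("cat", 1), ("dog", 2), ("cat", 3), ("cat", 4)]:
    updater.add(kw, doc)
```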
A Chosen-Plaintext Method on SM4: Linear Operation Challenges and the Countermeasures
TANG Xiaolin, FENG Yan, LI Zhiqiang, GUO Ye, GONG Guanfei
2025, 47(8): 2791-2799. doi: 10.11999/JEIT250014
Abstract:
  Objective   With increasing concerns over hardware security, techniques for exploiting hardware vulnerabilities have advanced rapidly. Among these, Side-Channel Attacks (SCAs) have received substantial attention for their ability to extract sensitive information via physical leakage. Power analysis, a prominent form of SCA, has been extensively applied to the Advanced Encryption Standard (AES). However, SM4—a block cipher issued by China’s State Cryptography Administration—presents greater challenges due to its unique linear transformation. Existing chosen-plaintext methods for attacking SM4 still encounter key limitations, including difficulty in constructing four-round chosen plaintexts, recovering initial keys from intermediate values, resolving symmetrical attack ambiguities, and filtering highly correlated incorrect guesses. This study systematically analyzes the root causes of these issues and proposes targeted countermeasures, effectively mitigating the constraints imposed by SM4’s linear operations.   Methods   This study systematically investigates the challenges in chosen-plaintext attacks on SM4 and proposes targeted countermeasures. To enable initial key recovery, the inverse transformation is expressed as a system of linear equations, and a new round-key derivation algorithm is developed. To facilitate the construction of four-round chosen plaintexts, additional critical constraints are incorporated into plaintext generation, yielding leak model expressions that are more concise and plaintext-dependent. To resolve symmetrical attack results, the set of 4-byte round-key candidates is reduced, and incorrect candidates are eliminated through analysis in subsequent rounds. To suppress interference from highly correlated false guesses, an average ranking method is applied.   Results and Discussions   The proposed countermeasures collectively resolve key limitations in chosen-plaintext attacks on SM4 and enhance attack efficiency. The key recovery algorithm (Algorithm 1) integrates Gaussian elimination with Boolean operations to extract round keys. For four-round plaintext construction, detailed expressions for the final three rounds are provided for the first time, expanding the number of valid values from 256 to at least $2^{32}$ (Table 1), thereby enabling the recovery of four round keys (Fig. 4). The number of symmetrical attack results is reduced from 16 to 2 (Fig. 5), and subsequent round verification identifies the correct candidate (Fig. 6). The average ranking method yields clearer attack traces when analyzing 50,000 plaintexts across 10 groups (Fig. 7).  Conclusions   The proposed countermeasures effectively address the challenges introduced by linear operations in chosen-plaintext attacks on SM4. Correlation Power Analysis (CPA)-based experiments demonstrate that: (1) the key recovery algorithm and plaintext generation strategy enable successful extraction of round keys and reconstruction of the initial key; (2) symmetrical attack results can be resolved using only seven attack sets; and (3) the average ranking method reduces interference from secondary correlation peaks. This study focuses on unprotected SM4 implementations; future work will extend the analysis to masked versions.
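The average ranking countermeasure can be illustrated in a few lines of NumPy: each key-byte candidate is ranked by its peak CPA correlation within every plaintext group, and the candidate with the best mean rank is retained, which suppresses wrong guesses that win only occasionally. The group count, correlation values, and key bytes below are synthetic, not measurements from the paper.

```python
import numpy as np

def average_rank(corr_by_group):
    """Average-ranking filter for CPA results: corr_by_group has shape
    (n_groups, n_candidates) with each candidate's peak correlation per plaintext
    group; the candidate with the lowest mean rank (rank 0 = best) is returned."""
    corr = np.asarray(corr_by_group)
    ranks = np.argsort(np.argsort(-corr, axis=1), axis=1)   # per-group rank of each candidate
    mean_rank = ranks.mean(axis=0)
    return int(np.argmin(mean_rank)), mean_rank

# Synthetic example: 10 groups x 256 key-byte guesses; guess 0x3C is correct, while a
# strongly correlated false guess 0x5A occasionally beats it within a single group.
rng = np.random.default_rng(5)
corr = rng.uniform(0.0, 0.4, size=(10, 256))
corr[:, 0x3C] = rng.uniform(0.55, 0.75, size=10)
corr[:, 0x5A] = rng.uniform(0.50, 0.72, size=10)
best_candidate, mean_rank = average_rank(corr)
```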
Asymptotically Good Multi-twisted Codes over Finite Chain Rings
GAO Jian, CUI Qingxiang, ZHENG Yuqi
2025, 47(8): 2800-2807. doi: 10.11999/JEIT250032
Abstract:
  Objective  This study aims to address the theoretical gap in the asymptotic analysis of multi-twisted codes over finite chain rings and to provide a foundation for their application in high-efficiency communication and secure data transmission. As modern communication systems demand higher data rates, enhanced error resilience, and robust security, the design of error-correcting codes must balance code rate, error correction capability, and implementation complexity. Finite chain rings, as algebraic structures situated between finite fields and general rings, exhibit a hierarchical ideal structure that enables sophisticated code designs while retaining the algebraic properties of linear codes. Compared to finite fields, codes over finite chain rings achieve flexible error correction and higher information density through homogeneous weights and Gray mapping. However, existing research has focused primarily on multi-twisted codes over finite fields, leaving the asymptotic properties over finite chain rings unexplored. By constructing 1-generator multi-twisted codes, this work is the first to prove their asymptotic goodness over finite chain rings—i.e., the existence of infinite code sequences $\mathcal{C}_i$ with code rate $R(\mathcal{C}_i)$ and relative distance $\Delta(\mathcal{C}_i)$ bounded below as code lengths approach infinity. This result not only demonstrates the attainability of Shannon’s Second Theorem in finite chain ring coding but also offers novel solutions for practical systems, such as quantum-resistant encrypted communication and reliable transmission in high-noise channels.  Methods  In the basic concepts section, the structure of a finite chain ring is defined, and its ideal chain structure is used to study code generation and properties. The concept of homogeneous weight is introduced, and the homogeneous distance $d_{\hom}$ is established to quantify error correction capabilities. A Gray map is constructed to transform distance problems over finite chain rings into Hamming distance problems over finite fields. To study the asymptotic properties of multi-twisted codes, 1-generator multi-twisted codes are defined using the module structure of $R[x]$, and their freeness condition is discussed, as stated in Theorem 1: each subcode $\mathcal{C}_i = \langle a_i(x) \rangle$ must be a free constacyclic code, and the rank of $\mathcal{C}_i$ is determined by the degree of the check polynomial $h(x)$. The analysis considers multi-twisted codes with identical block lengths, which are simpler to analyze than those with varying block lengths. The selection of generators $(a_1(x), a_2(x), \ldots, a_l(x))$ is treated as a random process, defining a probability space. By introducing the $q^s$-ary entropy function $H(x) = x\log_{q^s}(q^s - 1) - x\log_{q^s}x - (1 - x)\log_{q^s}(1 - x)$, the code rate $R(\mathcal{C})$ and the relative distance $\Delta(\mathcal{C})$ are analyzed.
The Chinese Remainder Theorem is applied to decompose the finite chain ring into the direct product of local rings, transforming the global ideal analysis into localized studies to reduce complexity. Finally, it is proven that the relative homogeneous distance and the rate of multi-twisted codes are bounded below by positive constants. As the code length tends to infinity ($i \to \infty$), the relative distance of the code satisfies $\Pr(\Delta(\mathcal{C}^{\prime(i)}) \ge \delta) = 1$ (Theorem 2) and $\Pr(\mathrm{rank}(\mathcal{C}^{\prime(i)}) = m_i - 1) = 1$ (Theorem 3), leading to the conclusion that this class of multi-twisted codes over finite chain rings is asymptotically good.  Results and Discussions  This paper systematically constructs a class of 1-generator multi-twisted codes (Label 1) over finite chain rings and demonstrates that these codes are asymptotically good based on probabilistic methods and the Chinese Remainder Theorem. This constitutes the first analysis of the asymptotic properties of such codes over finite chain rings. Previous studies on the asymptotic properties of codes have primarily focused on codes over finite fields (e.g., cyclic and quasi-cyclic codes). By leveraging the hierarchical ideal structures of rings (e.g., homogeneous weight and the Chinese Remainder Theorem), the analytical complexity inherent to rings is overcome, thereby extending the scope of asymptotically good codes. This work extends classical finite-field random code analysis to finite chain rings, addressing the complexity of distance computation via homogeneous weights and Gray mappings. Additionally, the bijection between q-cyclotomic cosets modulo $M$ and irreducible factors of $x^M - 1$, combined with CRT-based ideal decomposition, significantly simplifies the asymptotic analysis (Lemma 4).  Conclusions  The asymptotic goodness of multi-twisted codes over finite chain rings has been systematically resolved, addressing a critical theoretical gap. By constructing 1-generator free codes and applying probabilistic methods combined with the Chinese Remainder Theorem, this work provides the first proof of the existence of infinite code sequences over finite chain rings that approach Shannon’s theoretical limits in terms of code rate and relative distance. These codes are suitable for high-frequency communications in 5G/6G networks, deep-space links, and other noisy environments, offering enhanced spectral efficiency through high code rates and robust error correction. This result not only extends the algebraic framework of coding theory but also provides a new coding scheme with strong anti-interference capabilities and high security for practical communication systems. Future research may extend these findings to more complex ring structures and practical application scenarios, further advancing the application of coding theory in the information age.
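For illustration only, the $q^s$-ary entropy function quoted above can be evaluated with a short Python sketch; the residue-field parameters q and s and the input x are caller-supplied assumptions, and this is not the authors’ code:

    import math

    def qs_ary_entropy(x: float, q: int, s: int) -> float:
        # q^s-ary entropy H(x) = x*log_{q^s}(q^s - 1) - x*log_{q^s}(x) - (1 - x)*log_{q^s}(1 - x),
        # defined for 0 < x < 1; Q = q**s is the residue-field size.
        Q = q ** s
        return x * math.log(Q - 1, Q) - x * math.log(x, Q) - (1 - x) * math.log(1 - x, Q)

    # Example: entropy of a relative distance delta = 0.1 over a ring with q^s = 4
    print(round(qs_ary_entropy(0.1, q=2, s=2), 4))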
A Black-Box Query Adversarial Attack Method for Signal Detection Networks Based on Sparse Subspace Sampling
LI Dongyang, WANG Linyuan, PENG Jinxian, MA Dekui, YAN Bin
2025, 47(8): 2808-2818. doi: 10.11999/JEIT241019
Abstract:
  Objective  The application of deep neural networks to signal detection has raised concerns regarding their vulnerability to adversarial example attacks. In black-box attack scenarios, where internal model information is inaccessible, this paper proposes a black-box query adversarial attack method based on sparse subspace sampling. The method offers an approach to evaluate the robustness of signal detection networks under black-box conditions, providing theoretical support and practical guidance for improving the reliability and robustness of these networks.  Methods  This study combines the characteristics of signal detection networks with the attack objective of reducing the recall rate of signal targets. The disappearance ratio of detected signal targets is used as the constraint for determining attack success, forming an adversarial attack model for signal detection networks. Based on the HopSkipJumpAttack (HSJA) algorithm, a black-box query adversarial attack method for signal detection networks is designed, which generates adversarial examples by approaching the model’s decision boundary. To further improve query efficiency, a sparse subspace query adversarial attack method is proposed. This approach constructs sparse subspace sampling based on the characteristics of signal adversarial perturbations. Specifically, during the generation of adversarial examples, signal components with large amplitudes are selected in proportion, and only these components are perturbed.  Results and Discussions  Experimental results show that under a decision boundary condition with a signal target disappearance ratio of 0.3, the proposed sparse subspace sampling black-box adversarial attack method reduces the mean Average Precision (mAP) by 43.6% and the recall rate by 41.2%. Under the same number of queries, all performance metrics for the sparse subspace sampling method exceed those of the full-space sampling approach, demonstrating improved attack effectiveness, with the success rate increasing by 2.5% (Table 2). In terms of signal perturbation intensity, the proposed method effectively reduces perturbation intensity through iterative optimization under both sampling spaces. At the beginning of the iterations, the perturbation energies for the two spaces are similar. As the number of query rounds increases, the perturbation energy required for sparse subspace sampling becomes slightly lower than that of full-space sampling, and the difference continues to widen. The average adversarial perturbation energy ratio for full-space sampling is 5.18%, whereas sparse subspace sampling achieves 5.00%, reflecting a 3.47% reduction relative to full-space sampling (Fig. 4). For waveform perturbations, both sampling strategies proposed in this study can generate effective adversarial examples while preserving the primary waveform characteristics of the original signal. Specifically, the full-space query method applies perturbations to every sampling point, whereas the sparse subspace query method selectively perturbs only the large-amplitude signal components, leaving other components unchanged (Fig. 5). This selective approach provides the sparse subspace method with a notable l0-norm control property for adversarial perturbations, minimizing the number of perturbed components without compromising attack performance. In contrast, the full-space sampling method focuses on optimizing the l2-norm of perturbations, without achieving this selective control.  
Conclusions  This study proposes a black-box query adversarial attack method for signal detection networks based on sparse subspace sampling. The disappearance ratio of detected signal targets is used as the success criterion for attacks, and an adversarial example generation model for signal detection networks is established. Drawing on the HSJA algorithm, a decision-boundary-based black-box query attack method is designed to generate adversarial signal examples. To further enhance query efficiency, a sparse subspace sampling strategy is constructed based on the characteristics of signal adversarial perturbations. Experimental results show that under a decision boundary with a target disappearance ratio of 0.3, the proposed sparse subspace sampling black-box attack method reduces the mAP of the signal detection network by 43.6% and the recall rate by 41.2%. Compared with full-space sampling, sparse subspace sampling increases the attack success rate by 2.5% and reduces the average perturbation energy ratio by 3.47%. The sparse subspace method significantly degrades signal detection network performance while achieving superior attack efficiency and lower perturbation intensity relative to full-space sampling. Furthermore, the full-space query method introduces perturbations at all sampling points, whereas the sparse subspace method selectively perturbs only the high-amplitude signal components, leaving other components unchanged. This approach enforces l0-norm sparsity constraints, minimizing the number of perturbed components without compromising attack effectiveness. The proposed method provides a practical solution for evaluating the robustness of signal detection networks under black-box conditions and offers theoretical support for improving the reliability of these networks against adversarial threats.
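As a rough sketch of the sparse subspace idea described above (perturbing only the largest-amplitude signal components), the following Python fragment builds one such perturbation; the selection ratio and noise scale are assumed values, not the paper’s settings:

    import numpy as np

    def sparse_subspace_perturbation(signal, ratio=0.3, scale=0.01, rng=None):
        # Perturb only the top `ratio` fraction of samples by amplitude,
        # leaving all other signal components unchanged.
        rng = np.random.default_rng() if rng is None else rng
        k = max(1, int(ratio * signal.size))
        support = np.argsort(np.abs(signal))[-k:]        # large-amplitude indices
        delta = np.zeros_like(signal)
        delta[support] = rng.standard_normal(k) * scale  # noise restricted to the subspace
        return signal + delta

    x = np.sin(np.linspace(0, 8 * np.pi, 256))           # toy signal
    x_adv = sparse_subspace_perturbation(x)
    print(np.count_nonzero(x_adv - x))                   # only ~30% of samples change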
Image and Intelligent Information Processing
Multi-objective Remote Sensing Product Production Task Scheduling Algorithm Based on Double Deep Q-Network
ZHOU Liming, YU Xi, FAN Minghu, ZUO Xianyu, QIAO Baojun
2025, 47(8): 2819-2829. doi: 10.11999/JEIT250089
Abstract:
  Objective  Remote sensing product generation is a multi-task scheduling problem influenced by dynamic factors, including resource contention and real-time environmental changes. Achieving adaptive, multi-objective, and efficient scheduling remains a central challenge. To address this, a Multi-Objective Remote Sensing scheduling algorithm (MORS) based on a Double Deep Q-Network (DDQN) is proposed. A subset of candidate algorithms is first identified using a value-driven, parallel-executable screening strategy. A deep neural network is then designed to perceive the characteristics of both remote sensing algorithms and computational nodes. A reward function is constructed by integrating algorithm execution time and node resource status. The DDQN is employed to train the model to select optimal execution nodes for each algorithm in the processing subset. This approach reduces production time and enables load balancing across computational nodes.  Methods  The MORS scheduling process comprises two stages: remote sensing product processing and screening, followed by scheduling model training and execution. A time-triggered strategy is adopted, whereby all newly arrived remote sensing products within a predefined time window are collected and placed in a task queue. For efficient scheduling, each product is parsed into a set of executable remote sensing algorithms. Based on the model illustrated in Figure 2, the processing unit extracts all constituent algorithms to form an algorithm set. An optimal subset is then selected using a value-driven parallel-executable screening strategy. The scheduling process is modeled as a Markov decision process, and the DDQN is applied to assign each algorithm in the selected subset to the optimal virtual node.  Results and Discussions  Simulation experiments use varying numbers of tasks and nodes to evaluate the performance of MORS. Comparative analyses are conducted against several baseline scheduling algorithms, including First-Come, First-Served (FCFS), Round Robin (RR), Genetic Algorithm (GA), Deep Q-Network (DQN), and Dueling Deep Q-Network (Dueling DQN). The results demonstrate that MORS outperforms all other algorithms in terms of scheduling efficiency and adaptability in remote sensing task scheduling. The learning rate, a critical hyperparameter in DDQN, influences the step size for parameter updates during training. When the learning rate is set to 0.00001, the model fails to converge even after 5,000 iterations due to extremely slow optimization. A learning rate of 0.0001 achieves a balance between convergence speed and training stability, avoiding oscillations associated with overly large learning rates (Figure 3 and Figure 4). The corresponding DDQN loss values show a steady decline, reflecting effective optimization and gradual convergence. In contrast, the unpruned DDQN initially declines sharply but plateaus prematurely, failing to reach optimal convergence. DDQN without soft updates shows large fluctuations in loss and remains unstable during later training stages, indicating that the absence of soft updates impairs convergence (Figure 5). Regarding decision quality, the reward values of DDQN gradually approach 25 in the later training stages, reflecting stable convergence and strong decision-making performance. Conversely, DDQN models without pruning or soft updates display unstable reward trajectories, particularly the latter, which exhibits pronounced reward fluctuations and slower convergence (Figure 6). 
A comparison of DQN, Dueling DQN, and DDQN reveals that all three show decreasing training loss, suggesting continuous optimization (Figure 7). However, the reward curve of Dueling DQN shows higher volatility and reduced stability (Figure 8). To further assess scalability, four sets of simulation experiments use 30, 60, 90, and 120 remote sensing tasks, with the number of virtual machine nodes fixed at 15. Each experimental configuration is evaluated using 100 Monte Carlo iterations to ensure statistical robustness. DDQN consistently shows superior performance under high-concurrency conditions, effectively managing increased scheduling pressure (Table 7). In addition, DDQN exhibits lower standard deviations in node load across all task volumes, reflecting more balanced resource allocation and reduced fluctuation in system utilization (Table 8 and Table 9).  Conclusions  The proposed MORS algorithm addresses the variability and complexity inherent in remote sensing task scheduling. Experimental results demonstrate that MORS not only improves scheduling efficiency but also significantly reduces production time and achieves balanced allocation of node resources.
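Since the scheduler is trained with a Double Deep Q-Network, the generic DDQN target (the online network selects the action, the target network evaluates it) is sketched below; this is the standard update rule, not the paper’s implementation:

    import numpy as np

    def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
        # Double-DQN bootstrap: argmax from the online net, value from the target net.
        best_action = int(np.argmax(next_q_online))
        bootstrap = 0.0 if done else gamma * next_q_target[best_action]
        return reward + bootstrap

    # Toy example with three candidate compute nodes as actions
    print(ddqn_target(1.0, np.array([0.2, 0.9, 0.4]), np.array([0.3, 0.7, 0.5])))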
A Multi-Agent Path Finding Strategy Combining Selective Communication and Conflict Resolution
WANG Yu, ZHANG Xuxiu
2025, 47(8): 2830-2840. doi: 10.11999/JEIT250122
Abstract:
  Objective  The rapid development of intelligent manufacturing, automated warehousing, and Internet of Things technologies has made Multi-Agent Path Finding (MAPF) a key approach for addressing complex coordination tasks. Traditional centralized methods face limitations in large-scale multi-agent systems due to excessive communication load, high computational complexity, and susceptibility to path conflicts and deadlocks. Existing methods rely on broadcast-based communication, which leads to information overload and poor scalability. Furthermore, current conflict resolution strategies are static and overly simplistic, making them ineffective for dynamically balancing task priorities and environmental congestion. This study proposes an MAPF strategy based on selective communication and hierarchical conflict resolution to optimize communication efficiency, reduce path deviations and deadlocks, and improve path planning performance and task completion rates in complex environments.  Methods  The proposed Decision Causal Communication with Prioritized Resolution (DCCPR) method integrates reinforcement learning and the A* algorithm and introduces selective communication with hierarchical conflict resolution. A dynamic joint masking decision mechanism enables targeted agent selection within the selective communication framework. The model is instantiated and validated using the Dueling Double Deep Q-Network (D3QN) algorithm, which dynamically selects agents for communication, reducing information redundancy, lowering communication overhead, and enhancing computational efficiency. The Q-network reward function incorporates expected paths generated by the A* algorithm, penalizing path deviations and cumulative congestion to guide agents toward low-congestion routes, thereby accelerating task completion. A hierarchical conflict resolution strategy is also proposed, which considers target distance, task Q-values, and task urgency. By combining dynamic re-planning using the A* algorithm with a turn-taking mechanism, this approach effectively resolves conflicts, enables necessary detours to avoid collisions, increases task success rates, and reduces average task completion time.  Results and Discussions  The experimental results show that the DCCPR method outperforms conventional approaches in task success rate, computational efficiency, and path planning performance, particularly in large-scale and complex environments. In terms of task success rate, DCCPR demonstrates superior performance across random maps of different sizes. In the 40 × 40 random map environment (Fig. 6), DCCPR consistently maintains a success rate above 90%, significantly higher than other baseline methods, with no apparent decline as the number of agents increases. In contrast, methods such as DHC and PRIMAL exhibit substantial performance degradation, with success rates dropping below 50% as agent numbers grow. DCCPR reduces communication overhead through its selective communication mechanism, while the hierarchical conflict resolution strategy minimizes path conflicts, maintaining stable performance even in high-density environments. In the 80 × 80 map (Fig. 7), under extreme conditions with 128 agents, DCCPR’s success rate remains above 90%, confirming its applicability to both small-scale and large-scale, complex scenarios. DCCPR also achieves significantly improved computational efficiency. In the 40 × 40 map (Fig. 
8), it records the shortest average episode length among all methods, and the increase in episode length with higher agent numbers is substantially lower than that observed in other approaches. In the 80 × 80 environment (Fig. 9), despite the larger map size, DCCPR consistently maintains the shortest episode length. The hierarchical conflict resolution strategy effectively reduces path conflicts and prevents deadlocks. In environments with dense obstacles and high agent numbers, DCCPR dynamically adjusts task priorities and employs a turn-taking mechanism to mitigate delays caused by path competition. Moreover, in structured map environments not encountered during training, DCCPR maintains high success rates and efficiency (Table 2), demonstrating strong scalability. Compared to baseline methods, DCCPR achieves approximately a 79% improvement in task success rate and a 46.4% reduction in average episode length. DCCPR also performs well in warehouse environments with narrow passages, where congestion typically presents challenges. Through turn-taking and dynamic path re-planning, agents are guided toward previously unused suboptimal paths, reducing oscillatory behavior and lowering the risk of task failure. Overall, DCCPR sustains high computational efficiency while maintaining high success rates, effectively addressing the challenges of multi-agent path planning in complex dynamic environments.  Conclusions  The DCCPR method proposed in this study provides an effective solution for multi-agent path planning. Through selective communication and hierarchical conflict resolution, DCCPR significantly improves path planning efficiency and task success rates. Experimental results confirm the strong adaptability and stability of DCCPR across diverse complex environments, particularly in dynamic scenarios, where it effectively reduces conflicts and enhances system performance. Future work will focus on refining the communication strategy by integrating global and local communication benefits to improve performance in large-scale environments. In addition, real-world factors such as dynamic environmental changes and the energy consumption of intelligent agents will be considered to further enhance the system’s effectiveness.
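One way to picture the reward shaping described above (penalizing deviation from the A*-expected path and local congestion) is the toy function below; the weights and goal bonus are illustrative assumptions only:

    def step_reward(agent_pos, expected_pos, congestion, reached_goal,
                    w_dev=0.5, w_cong=0.2, goal_bonus=10.0):
        # Manhattan deviation from the A*-expected grid cell plus a congestion penalty.
        deviation = abs(agent_pos[0] - expected_pos[0]) + abs(agent_pos[1] - expected_pos[1])
        reward = -w_dev * deviation - w_cong * congestion
        return reward + (goal_bonus if reached_goal else 0.0)

    print(step_reward((3, 4), (3, 3), congestion=2, reached_goal=False))  # -0.9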
Long-Term Trajectory Prediction Model Based on Points of Interest and Joint Loss Function
ZHOU Chuanxin, JIAN Gang, LI Lingshu, YANG Yi, HU Yu, LIU Zhengming, ZHANG Wei, RAO Zhenzhen, LI Yunxiao, WU Chao
2025, 47(8): 2841-2849. doi: 10.11999/JEIT250011
Abstract:
  Objective  With the rapid development of modern maritime and aerospace sectors, trajectory prediction plays an increasingly critical role in applications such as ship scheduling, aviation, and security. Growing demand for higher prediction accuracy exposes limitations in traditional methods, such as Kalman filtering and Markov chains, which struggle with complex, nonlinear trajectory patterns and fail to meet practical needs. In recent years, deep learning techniques, including LSTM, GRU, CNN, and TCN models, have demonstrated notable advantages in trajectory prediction by effectively capturing time series features. However, these models still face challenges in representing the heterogeneity and diversity of trajectory data, with limited capacity to extract features from multidimensional inputs. To address these gaps, this study proposes a long-term trajectory prediction model, PL-Transformer, based on points of interest and a joint loss function.  Methods  Building on the TrAISformer framework, the proposed PL-Transformer incorporates points of interest and a joint loss function to enhance long-term trajectory prediction. The model defines the positions of points of interest within the prediction range using expert knowledge and introduces correlation features between trajectory points and points of interest. These features are integrated into a sparse data representation that improves the model’s ability to capture global trajectory patterns, addressing the limitation of conventional Transformer models, which primarily focus on local feature changes. Additionally, the model employs a joint loss function that links latitude and longitude predictions with feature losses associated with points of interest. This approach leverages inter-feature loss relationships to enhance the model’s capability for accurate long-term trajectory prediction.  Results and Discussions  The convergence performance of the PL-Transformer model is evaluated by analyzing the variation in training and validation losses and comparing them with those of the TrAISformer model. The corresponding loss curves are presented in Fig. 5. The PL-Transformer model exhibits faster convergence and improved training stability on both datasets. These results indicate that the introduction of the joint loss function enhances convergence efficiency and training stability, yielding performance superior to the TrAISformer model. In terms of short-term prediction accuracy, the results in Table 1 show that the PL-Transformer model achieves comparable overall prediction accuracy to the TrAISformer model. The PL-Transformer model performs better in terms of the Mean Absolute Percentage Error (MAPE) metric, while it shows slightly higher errors than the TrAISformer model for Mean Absolute Error (MAE), median Absolute Error (MdAE), and coefficient of determination (R²). For the widely used Mean Squared Error (MSE) metric, both models perform similarly. These results indicate that after incorporating points of interest and optimizing the loss function, the PL-Transformer model retains competitive performance in relative error control and fitting accuracy, while preserving the stability and robustness of the TrAISformer model in complex trajectory prediction tasks. For long-term prediction, Table 2 presents the loss values for both models across medium to long-term prediction horizons (1 to 3 h). The PL-Transformer model achieves better long-term prediction accuracy than the TrAISformer model.
Specifically, the loss for the PL-Transformer model increases from 2.058 (1 h) to 5.561 (3 h), whereas the TrAISformer model’s loss rises from 2.160 to 6.145 over the same period. In terms of time complexity analysis, although the PL-Transformer model incorporates additional feature engineering and joint loss computation steps, these enhancements do not substantially increase the overall time complexity. The total computational complexity of the PL-Transformer model remains consistent with that of the TrAISformer model.  Conclusions  This study proposes the PL-Transformer model, which incorporates points of interest and an optimized loss function to address the challenges posed by complex dynamic features and heterogeneity in trajectory prediction tasks. By introducing distance and bearing angle through feature engineering and designing a joint loss function, the model effectively learns and captures spatial and motion characteristics within trajectory data. Experimental results demonstrate that the PL-Transformer model achieves higher prediction accuracy, faster convergence, and greater robustness than the TrAISformer model and other widely used baseline models, particularly in long-term and complex dynamic trajectory prediction scenarios. Despite the strong performance of the PL-Transformer model in experimental settings, trajectory prediction tasks in real-world applications remain affected by various challenges, including data noise, high-frequency trajectory fluctuations, and the influence of external environmental factors. Future research will focus on improving the model’s adaptability to multimodal trajectory data, integrating multi-source information to enhance generalization capability, and incorporating additional feature engineering and optimization strategies to address more complex prediction tasks. In summary, the proposed PL-Transformer model provides an effective advancement for Transformer-based trajectory prediction frameworks and offers valuable reference for practical applications in trajectory forecasting and related fields.
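The joint loss that couples position error with point-of-interest feature error can be sketched as a weighted sum; the feature layout and weights below are assumptions for illustration, not the PL-Transformer’s exact formulation:

    import numpy as np

    def joint_loss(pred, target, alpha=1.0, beta=0.5):
        # pred/target: dicts with 'latlon' (lat, lon) and 'poi_feat' (distance, bearing to a POI).
        pos_loss = np.mean((pred["latlon"] - target["latlon"]) ** 2)
        poi_loss = np.mean((pred["poi_feat"] - target["poi_feat"]) ** 2)
        return alpha * pos_loss + beta * poi_loss

    pred = {"latlon": np.array([30.01, 122.02]), "poi_feat": np.array([5.1, 0.42])}
    true = {"latlon": np.array([30.00, 122.00]), "poi_feat": np.array([5.0, 0.40])}
    print(joint_loss(pred, true))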
Detection and Interaction Analysis of Place Cell Firing Information in Dual Brain Regions of Awake Active Rats
LI Ming, XU Wei, XU Zhaojie, MO Fan, YANG Gucheng, LV Shiya, LUO Jinping, JIN Hongyan, LIU Juntao, CAI Xinxia
2025, 47(8): 2850-2858. doi: 10.11999/JEIT250024
Abstract:
  Objective  Continuous monitoring of neural activities in free-moving rats is essential for understanding brain function but presents significant challenges regarding the stability and biocompatibility of the detection device. This study aims to provide comprehensive data on brain activity by simultaneously monitoring two brain regions. This approach is crucial for elucidating the neural encoding differences within these regions and the information exchange between them, both of which are integral to spatial memory and navigation processes. Spatial navigation is a fundamental behavior in rats, vital for their survival and interaction with their environment. Central to this behavior are place cells—neurons that selectively respond to an animal’s location, forming the basis of spatial memory and navigation. This study focuses on the hippocampal CA1 region and the Barrel Cortex (BC), both of which are critical for spatial processing. By monitoring these regions simultaneously, the aim is to uncover the neural dynamics underlying spatial memory formation and retrieval. Understanding these dynamics provides insights into the neural mechanisms of spatial cognition and memory, which are fundamental to higher cognitive functions and are often disrupted in neurological disorders such as Alzheimer’s disease and schizophrenia.  Methods  To achieve dual brain region monitoring, a four-electrode MicroElectrode Array (MEA) is designed to conform to the shape of the dual brain regions and is surface-modified with a Polypyrrole/Silver Nanowire (PPy/AgNW) nanocomposite material. Each probe of the MEA consists of eight recording sites with a diameter of 20 μm and one reference site. The MEA is fabricated using Microelectromechanical Systems (MEMS) technology and modified via an electrochemical deposition process. The PPy/AgNW nanocomposite modification is selected for its low impedance and high biocompatibility, which are critical for stable, long-term recordings. The deposition of PPy/AgNW is carried out using cyclic voltammetry. The stability of the modified MEA is assessed by cyclic voltammetry in phosphate-buffered saline to simulate in vivo charge/discharge processes. The MEA is then implanted into the CA1 and BC regions of rats, and neural activities are recorded during a two-week spatial memory task. Spike signals are analyzed to identify place cells and assess their firing patterns, while Local Field Potential (LFP) power is measured to evaluate overall neural activity. Mutual information analysis is performed to quantify the interaction between the two brain regions. The experimental setup includes a behavior arena where rats perform spatial navigation tasks, with continuous neural signal recording using the modified MEA.  Results and Discussions  The PPy/AgNW-modified MEA exhibits low impedance (53.01 ± 2.59 kΩ) at 1 kHz (Fig. 2). This low impedance is critical for high-fidelity signal acquisition, enabling the detection of subtle neural activities. The stability of the MEA is evaluated through 1000 cycles of cyclic voltammetry scanning, demonstrating high capacitance retention (92.51 ± 2.21%) and no significant increase in impedance (Fig. 3). These results suggest that the MEA maintains stable performance over extended periods, which is essential for long-term in vivo monitoring. The modified MEA successfully detects neural activities from the BC and CA1 regions over the two-week period. 
The average firing rates and LFP power in both regions progressively increase, indicating enhanced neural activity as the rats become more familiar with the spatial memory task (Fig. 4). This increase suggests that the rats’ spatial memory and navigation abilities improve over time, likely due to increased familiarity with the environment and task requirements. Place cells are identified in the recorded neurons, confirming the presence of spatially selective neuronal activity (Fig. 5). The identification of place cells is a key finding, as these neurons are fundamental to spatial memory and navigation. Additionally, the spatial stability of place cells in the CA1 region is higher than in the BC region, indicating functional differences between these areas in spatial memory processing (Fig. 5). This suggests that the CA1 region plays a more critical role in spatial memory consolidation. Mutual information analysis reveals significant information exchange between the dual brain regions during the initial memory phase, suggesting a role in memory storage (Fig. 6). This inter-regional communication is crucial for understanding how spatial information is processed and stored in the brain. The observed increase in mutual information over time indicates that the interaction between the BC and CA1 regions becomes more pronounced as the rats engage in spatial navigation, highlighting the dynamic nature of neural interactions during memory formation and retrieval.  Conclusions  This study successfully demonstrated continuous dual brain region monitoring in freely moving rats using a PPy/AgNW-modified MEA. The findings reveal dynamic interactions between the BC and CA1 regions during spatial memory tasks and highlight the importance of place cells in memory formation. Monitoring neural activities in dual brain regions over extended periods provides new insights into the neural basis of spatial memory and navigation. The results suggest that the CA1 region plays a critical role in spatial memory consolidation, while the BC region also contributes to spatial processing. This distinction highlights the value of studying multiple brain regions simultaneously to gain a comprehensive understanding of neural processes. The PPy/AgNW-modified MEA serves as a powerful tool for investigating the complex neural mechanisms underlying spatial cognition and memory, with potential applications in related neurological disorders.
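The inter-regional analysis relies on mutual information between the two recording sites; a minimal histogram-based estimate (in bits) over toy firing-rate series is sketched below, standing in for whatever estimator the authors used:

    import numpy as np

    def mutual_information(x, y, bins=8):
        # Discretize both series, then compute I(X;Y) = sum p(x,y) * log2[p(x,y) / (p(x)p(y))].
        joint, _, _ = np.histogram2d(x, y, bins=bins)
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0
        return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

    rng = np.random.default_rng(0)
    ca1 = rng.poisson(5, 500).astype(float)       # toy CA1 firing rates
    bc = ca1 + rng.normal(0, 1, 500)              # correlated toy BC firing rates
    print(round(mutual_information(ca1, bc), 3))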
Personalized Tensor Decomposition Based High-order Complementary Cloud API Recommendation
SUN Mengmeng, LIU Xiaowei, CHEN Wenhui, SHEN Limin, YOU Dianlong, CHEN Zhen
2025, 47(8): 2859-2871. doi: 10.11999/JEIT250003
Abstract:
  Objective  With the emergence of the cloud era in the Internet of Things, cloud Application Programming Interfaces (APIs) have become essential for managing data element dynamics, facilitating AI algorithm implementation, and coordinating access to computing resources. Cloud APIs have developed into critical digital infrastructure that supports the digital economy and the operation of service-oriented software. However, the rapid expansion of cloud APIs has impacted users’ decision-making processes and complicated the promotion of cloud APIs. This situation underscores the urgent need for effective cloud API recommendation methods to foster the development of the API economy and encourage the widespread adoption of cloud APIs. While existing research has focused on modeling invocation preferences, search keywords, or a combination of both to recommend suitable cloud APIs for a given Mashup, it does not address the need for personalized high-order complementary cloud APIs in practical software development. Personalized high-order complementary cloud API recommendation aims to provide developers with APIs that align with their personalized invocation preferences and complement the other APIs in their query set, thereby addressing the developers’ joint interests.  Methods  To address this issue, a Personalized Tensor Decomposition-based High-order Complementary cloud API Recommendation (PTDHCR) method is proposed. First, the invocation relationships between Mashups and cloud APIs, as well as the complementary relationships between cloud APIs, are represented as a three-dimensional tensor. RECAL tensor decomposition is applied to jointly learn and uncover personalized asymmetric complementary relationships between cloud APIs. Second, a personalized high-order complementary perception network is designed to account for the varying influence of different complementary relationships on recommendations. This network dynamically calculates the attention of a Mashup to the complementary relationships between different query and candidate cloud APIs using the multi-modal features of the Mashup, query cloud APIs, and candidate cloud APIs. Finally, the personalized complementary relationships are extended to higher orders, yielding a comprehensive personalized complementarity between candidate cloud APIs and the query set.  Results and Discussions  Extensive experiments are conducted on two real cloud API datasets. First, PTDHCR is compared with 11 baseline methods suitable for personalized high-order complementary cloud API recommendation. The experimental results (Tables 2 and 3) show that, on the PWA dataset, PTDHCR outperforms the best baseline by 0.12%, 0.14%, 1.46%, and 2.93% in terms of AUC. HR@10 improves by 0.91%, 1.01%, 3.45%, and 10.84%, while RMSE decreases by 0.33%, 0.7%, 1.36%, and 2.67%. PTDHCR also performs well on the HGA dataset, significantly outperforming the baseline methods in AUC, HR@10, and RMSE metrics. Second, experiments are conducted with varying complementary thresholds to evaluate PTDHCR’s performance at different complementary orders. The experimental results (Figure 4) indicate that PTDHCR’s recommendation performance improves progressively as the complementary order increases. This improvement is attributed to the method’s ability to incorporate more complementary information, thereby enhancing its recommendation capability. 
Next, a comparison experiment is performed to assess whether the personalized high-order complementary perception network can better capture high-order complementary relationships than the mean-value and semantic similarity-based methods. The experimental results (Figures 5 and 6) demonstrate that the personalized high-order complementary perception network outperforms other methods. This is due to the network’s ability to consider the contribution of different complementary relationships and dynamically compute the Mashup’s attention to each complementary relationship. Finally, an example is provided, evaluating the predicted probability of a Mashup invoking other candidate cloud APIs, given that it has already invoked the “Google Maps API” and the “Google AdSense API.” This example illustrates the personalized nature of the high-order complementary cloud API recommendation achieved by the PTDHCR method.  Conclusions  Existing methods fail to address the actual needs of developers for personalized high-order complementary cloud APIs in the development of service-oriented software. This paper defines the recommendation problem of personalized high-order complementary cloud APIs and proposes a solution. A personalized high-order complementary cloud API recommendation method based on tensor decomposition is introduced. Initially, the invocation relationships between Mashups and cloud APIs, as well as the complementary relationships between cloud APIs, are modeled as a three-dimensional tensor. RECAL tensor decomposition technology is then applied to jointly learn and explore the personalized asymmetric complementary relationships. Additionally, a high-order complementary perception network is constructed to dynamically compute Mashups’ attention towards various complementary relationships, which extends these relationships to higher orders. Experimental results show that PTDHCR outperforms state-of-the-art cloud API recommendation methods on real cloud API datasets. PTDHCR offers an effective approach to address the cloud API selection problem and contributes to the healthy development and popularization of the cloud API economy.
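As a loose illustration of how pairwise complementarity could be aggregated into a high-order score for one candidate cloud API, consider the toy computation below; the complementarity matrix, attention weights, and aggregation rule are all assumptions, not the PTDHCR model itself:

    import numpy as np

    def high_order_complementarity(comp, attn, query_apis, candidate):
        # Sum attention-weighted asymmetric complementarity comp[q, c] over the query set.
        return float(sum(attn[q, candidate] * comp[q, candidate] for q in query_apis))

    comp = np.array([[0.0, 0.8, 0.1],
                     [0.3, 0.0, 0.6],
                     [0.2, 0.5, 0.0]])            # toy asymmetric complementarity scores
    attn = np.full_like(comp, 0.5)                # uniform attention over two query APIs
    print(high_order_complementarity(comp, attn, query_apis=[0, 1], candidate=2))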
LFTA: Lightweight Feature Extraction and Additive Attention-based Feature Matching Method
GUO Zhiqiang, WANG Zihan, WANG Yongsheng, CHEN Pengyu
2025, 47(8): 2872-2882. doi: 10.11999/JEIT250124
Abstract:
  Objective  With the rapid development of deep learning, feature matching has advanced considerably, particularly in computer vision. This progress has led to improved performance in tasks such as 3D reconstruction, motion tracking, and image registration, all of which depend heavily on accurate feature matching. Nevertheless, current techniques often face a trade-off between accuracy and computational efficiency. Some methods achieve high matching accuracy and robustness but suffer from slow processing due to algorithmic complexity. Others offer faster processing but compromise matching accuracy, especially under challenging conditions such as dynamic scenes, low-texture environments, or large view-angle variations. The key challenge is to provide a balanced solution that ensures both accuracy and efficiency. To address this, this paper proposes a Lightweight Feature exTraction and matching Algorithm (LFTA), which integrates an additive attention mechanism within a lightweight architecture. LFTA enhances the robustness and accuracy of feature matching while maintaining the computational efficiency required for real-time applications.  Methods  LFTA utilizes a multi-scale feature extraction network designed to capture information from images at different levels of detail. A triple-exchange fusion attention mechanism merges information across multiple dimensions, including spatial and channel features, allowing the network to learn more robust feature representations. This mechanism improves matching accuracy, particularly in scenarios with sparse textures or large viewpoint variations. LFTA further integrates an adaptive Gaussian kernel to dynamically generate keypoint heatmaps. The kernel adjusts according to local feature strength, enabling accurate keypoint extraction in both high-response and low-response regions. To improve keypoint precision, a dynamic Non-Maximum Suppression (NMS) strategy is applied, which adapts to varying keypoint densities across different image regions. This approach reduces redundancy and improves detection accuracy. In the final stage, LFTA employs a lightweight module with an additive Transformer attention mechanism to refine feature matching. This module strengthens feature fusion while reducing computational complexity through depthwise separable convolutions. These operations substantially lower parameter count and computational cost without affecting performance. Through this combination of techniques, LFTA achieves accurate pixel-level matching with fast inference times, making it suitable for real-time applications.  Results and Discussions  The performance of LFTA is assessed through extensive experiments conducted on two widely used and challenging datasets: MegaDepth and ScanNet. These datasets offer diverse scenarios for evaluating the robustness and efficiency of feature matching methods, including variations in texture, environmental complexity, and viewpoint changes. The results indicate that LFTA achieves higher accuracy and computational efficiency than conventional feature matching approaches. On the MegaDepth dataset, an AUC@20° of 79.77% is attained, which is comparable to or exceeds state-of-the-art methods such as LoFTR. Notably, this level of performance is achieved while reducing inference time by approximately 70%, supporting the suitability of LFTA for practical, time-sensitive applications. 
When compared with other efficient methods, including Xfeat and Alike, LFTA demonstrates superior matching accuracy with only a marginal increase in inference time, proving its competitive performance in both accuracy and speed. The improvement in accuracy is particularly apparent in scenarios characterized by sparse textures or large viewpoint variations, where traditional methods often fail to maintain robustness. Ablation studies confirm the contribution of each LFTA component. Exclusion of the triple-exchange fusion attention mechanism results in a significant reduction in accuracy, indicating its function in managing complex feature interactions. Similarly, both the adaptive Gaussian kernel and dynamic NMS are found to improve keypoint extraction, emphasizing their roles in enhancing overall matching precision.  Conclusions  The LFTA algorithm addresses the long-standing trade-off between feature extraction accuracy and computational efficiency in feature matching. By integrating the triple-exchange fusion attention mechanism, adaptive Gaussian kernels, and lightweight fine-tuning strategies, LFTA achieves high matching accuracy in dynamic and complex environments while maintaining low computational requirements. Experimental results on the MegaDepth and ScanNet datasets demonstrate that LFTA performs well under typical feature matching conditions and shows clear advantages in more challenging scenarios, including low-texture regions and large viewpoint variations. Given its efficiency and robustness, LFTA is well suited for real-time applications such as Augmented Reality (AR), autonomous driving, and robotic vision, where fast and accurate feature matching is essential. Future work will focus on further optimizing the algorithm for high-resolution images and more complex scenes, with the potential integration of hardware acceleration to reduce computational overhead. The method could also be extended to other computer vision tasks, including image segmentation and object detection, where reliable feature matching is required.
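Keypoint selection in the pipeline above uses non-maximum suppression on a score map; a plain fixed-radius version is sketched below (the paper’s dynamic NMS adapts the radius to local keypoint density, which is not reproduced here):

    import numpy as np

    def keypoint_nms(scores, radius=2):
        # Keep a pixel only if it is the maximum within its (2*radius+1)^2 window.
        h, w = scores.shape
        keep = np.zeros_like(scores, dtype=bool)
        for i in range(h):
            for j in range(w):
                i0, i1 = max(0, i - radius), min(h, i + radius + 1)
                j0, j1 = max(0, j - radius), min(w, j + radius + 1)
                keep[i, j] = scores[i, j] == scores[i0:i1, j0:j1].max()
        return keep

    heat = np.random.default_rng(1).random((8, 8))
    print(int(keypoint_nms(heat).sum()), "local maxima kept")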
Autonomous Teaming and Task Collaboration for Multi-Agent Systems in Dynamic Environments
WANG Chen, ZHU Cheng, LEI Hongtao
2025, 47(8): 2883-2894. doi: 10.11999/JEIT250079
Abstract:
  Objective  In dynamic and volatile battlefield environments, where the command structure of combat units may be disrupted, combat units must autonomously form appropriate tactical groups in edge operational settings, determine group affiliation, and rapidly allocate tasks. This study proposes a combat unit aggregation and planning method based on an adaptive clustering contract network, addressing the real-time limitations of traditional centralized optimization algorithms. The proposed method enables collaborative decision-making for autonomous group formation and supports multi-task optimization and allocation under dynamic battlefield conditions.  Methods  (1) An adaptive combat group division algorithm based on the second-order relative change rate is proposed. The optimal number of groups is determined using the Sum of Squared Errors (SSE) indicator, and spatial clustering of combat units is performed via an improved K-means algorithm. (2) A dual-layer contract network architecture is designed. In the first layer, combat groups participate in bidding by computing the net effectiveness of tasks, incorporating attributes such as attack, defense, and value. In the second layer, individual combat units conduct bidding with a load balancing factor to optimize task selection. (3) Mechanisms for task redistribution and exchange are introduced, improving global utility through a secondary bidding process that reallocates unassigned tasks and replaces those with negative effectiveness.  Results and Discussions  (1) The adaptive combat group division algorithm demonstrates enhanced situational awareness (Algorithm 1). Through dynamic clustering analysis, it accurately captures the spatial aggregation of combat units (Fig. 6 and Fig. 9), showing greater adaptability to environmental variability than conventional fixed-group models. (2) The multi-layer contract network architecture exhibits marked advantages in complex task allocation. The group-level pre-screening mechanism significantly reduces computational overhead, while the unit-level negotiation process improves resource utilization by incorporating load balancing. (3) The dynamic task optimization mechanism enables continuous refinement of the allocation scheme. It resolves unassigned tasks and enhances overall system effectiveness through intelligent task exchanges. Comparative experiments confirm that the proposed framework outperforms traditional approaches in task coverage and resource utilization efficiency (Table 4 and Table 5), supporting its robustness in dynamic battlefield conditions.  Conclusions  This study integrates clustering analysis with contract network protocols to establish an intelligent task allocation framework suited to dynamic battlefield conditions. By implementing dual-layer optimization in combat group division and task assignment, the approach improves combat resource utilization and shortens the kill chain. Future research will focus on validating the framework in multi-domain collaborative combat scenarios, refining bidding strategies informed by combat knowledge, and advancing command and control technologies toward autonomous coordination.
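The group-count selection from the SSE curve can be approximated with a simple elbow rule; the discrete second difference below is a stand-in for the paper’s second-order relative change rate, and the SSE values are toy numbers:

    import numpy as np

    def optimal_k_by_sse(sse):
        # sse[i] is the clustering SSE for k = i + 1 groups; pick the sharpest elbow.
        second_diff = sse[:-2] - 2 * sse[1:-1] + sse[2:]
        return int(np.argmax(second_diff)) + 2        # +2 maps the index back to k

    sse = np.array([400.0, 180.0, 90.0, 70.0, 62.0, 58.0])   # k = 1..6
    print("chosen number of groups:", optimal_k_by_sse(sse))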
YOMANet-Accel: A Lightweight Algorithm Accelerator for Pedestrians and Vehicles Detection at the Edge
CHEN Ningjiang, LU Yaozong
2025, 47(8): 2895-2908. doi: 10.11999/JEIT250059
Abstract:
  Objective  Accurate and real-time detection of pedestrians and vehicles is essential for autonomous driving at the edge. However, deep learning-based object detection algorithms are often challenging to deploy in edge environments due to their high computational demands and complex parameter structures. To address these limitations, this study proposes a soft-hard coordination strategy. A lightweight neural network model, Yolo Model Adaptation Network (YOMANet), is designed, and a corresponding neural network accelerator, YOMANet Accelerator (YOMANet-Accel), is implemented on a heterogeneous Field-Programmable Gate Array (FPGA) platform. This system enables efficient algorithm acceleration for pedestrian and vehicle detection in edge-based autonomous driving scenarios.  Methods  The lightweight backbone of YOMANet adopts MobileNetv2 to reduce the number of network parameters. The neck network incorporates the Spatial Pyramid Pooling (SPP) and Path Aggregation Network (PANet) structures from YOLOv4 to expand the receptive field and accommodate targets of varying sizes. Depthwise separable convolution replaces standard convolution, thereby reducing training complexity and improving convergence speed. To enhance detail extraction, the Normalization-based Attention Module (NAM) is integrated into the head network, allowing suppression of irrelevant feature weights. For deployment on an FPGA platform, parallel computing and data storage schemes are designed. The parallel computing strategy adopts a loop blocking method to reorder inner and outer loops, enabling access to different output array elements through adjacent loop layers and facilitating parallel processing of output feature map pixels. Multiply-add trees are implemented in the Processing Engine (PE) to support efficient task allocation and operation scheduling. A double-buffer mechanism is introduced in the data storage scheme to increase data reuse, minimize transmission latency, and enhance system throughput. In addition, int8 quantization is applied to both weight parameters and activation functions, reducing the overall parameter size and accelerating parallel computation.  Results and Discussions  Experimental results on the training platform indicate that YOMANet achieves the inference speed characteristic of lightweight models while maintaining the detection accuracy of large-scale models, thereby improving overall detection performance (Fig. 12, Table 2). The ablation study demonstrates that the integration of MobileNetv2 and depthwise separable convolution significantly reduces the number of model parameters. Embedding the NAM attention mechanism does not noticeably increase model size but enhances detail extraction and improves detection of small targets (Table 3). Compared with other lightweight algorithms, the enhanced YOMANet shows improved detail extraction and superior detection of small and occluded targets, with substantially lower false and missed detection rates (Fig. 13). Results on the accelerator platform reveal that quantization has minimal effect on accuracy while substantially reducing model size, supporting deployment on resource-constrained edge devices (Table 4). When deployed on the FPGA platform, YOMANet retains detection accuracy comparable to GPU/CPU platforms, while power consumption is reduced by an order of magnitude, meeting the efficiency requirements for edge deployment (Fig. 14).
Compared with related accelerator designs, YOMANet-Accel achieves competitive throughput and the highest Digital Signal Processing (DSP) efficiency, demonstrating the effectiveness of the proposed parallel computing and storage schemes in utilizing FPGA resources (Table 5).  Conclusions  Experimental results demonstrate that YOMANet achieves high detection accuracy and fast inference speed on the training platform, with enhanced performance for small and occluded targets, leading to a reduced missed detection rate. When deployed on the FPGA platform, YOMANet-Accel achieves an effective balance between detection performance and resource efficiency, supporting real-time pedestrian and vehicle detection in edge computing scenarios.
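The int8 quantization step mentioned above can be pictured with a symmetric per-tensor scheme; the scale rule below is a common choice and only an assumption about the accelerator’s actual quantizer:

    import numpy as np

    def quantize_int8(weights):
        # Symmetric quantization: w ≈ scale * q with q an int8 in [-127, 127].
        scale = np.abs(weights).max() / 127.0
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    w = np.random.default_rng(0).normal(0, 0.1, (3, 3)).astype(np.float32)
    q, s = quantize_int8(w)
    print("max reconstruction error:", float(np.abs(w - q.astype(np.float32) * s).max()))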
FSG: Feature-level Semantic-aware Guidance for multi-modal Image Fusion Algorithm
ZHANG Mei, JIN Ye, ZHU Jinhui, HE Lin
2025, 47(8): 2909-2918. doi: 10.11999/JEIT250042
Abstract:
  Objective  Multimodal vision techniques offer greater advantages than unimodal ones in autonomous driving scenarios. Fused images from multiple modalities enhance salient radiation information from targets while preserving background texture and detail. Furthermore, such fused images improve the performance of downstream visual tasks, i.e., semantic segmentation, compared with visible-light images alone, thereby enhancing the decision accuracy of automated driving systems. However, most existing fusion algorithms prioritize visual quality and standard evaluation metrics, often overlooking the requirements of downstream tasks. Although some approaches attempt to integrate task-specific guidance, they are constrained by weak interaction between semantic priors and fusion processes, and fail to address cross-modal feature variability. To address these limitations, this study proposes a multimodal image fusion algorithm, termed Feature-level Semantic-aware Guidance (FSG), which leverages feature-level semantic information from segmentation networks to guide the fusion process. The proposed method aims to enhance the utility of fused images in advanced vision tasks by strengthening the alignment between semantic understanding and feature integration.  Methods  The proposed algorithm adopts a parallel fusion framework integrating a fusion network and a segmentation network. Feature-level semantic prior knowledge from the segmentation network guides the fusion process, aiming to enhance the semantic richness of the fused image and improve performance in downstream visual tasks. The overall architecture comprises a fusion network, a segmentation network, and a feature interaction mechanism connecting the two. Infrared and visible images serve as inputs to the fusion network, whereas only visible images, which are rich in texture and detail, are used as inputs to the segmentation network. The fusion network uses a dual-branch structure for modality-specific feature extraction, with each branch containing two Adaptive Gabor convolution Residual (AGR) modules. A Multimodal Spatial Attention Fusion (MSAF) module is incorporated to effectively integrate features from different modalities. In the reconstruction phase, semantic features from the segmentation network are combined with image features from the fusion network via a Dual Feature Interaction (DFI) module, enhancing semantic representation before generating the final fused image.  Results and Discussions  This study includes fusion experiments and joint segmentation task experiments. For the fusion experiments, the proposed method is compared with seven state-of-the-art algorithms: DenseFuse, DIDFuse, U2Fusion, TarDal, SeAFusion, DIVFusion, and CDDFuse, across three datasets: MFNet, M3FD, and RoadScene. Both subjective and objective evaluations are conducted. For subjective evaluation, the fused images generated by each method are visually compared. For objective evaluation, six metrics are employed: Mutual Information (MI), Visual Information Fidelity (VIF), Average Gradient (AG), Sum of Correlation Differences (SCD), Structural Similarity Index Measure (SSIM), and Gradient-based Similarity Measurement (QAB/F). The results show that the proposed method performs consistently well across all datasets, effectively preserves complementary information from infrared and visible images, and achieves superior scores on all evaluation metrics. In the joint segmentation experiments, comparisons are made on the MFNet dataset. 
Subjective evaluation is presented through semantic segmentation visualizations, and objective evaluation uses Intersection over Union (IoU) and mean IoU (mIoU) metrics. The segmentation results produced by the proposed method more closely resemble ground truth labels and achieve the highest or second-highest IoU scores across all classes. Overall, the proposed method not only yields improved visual fusion results but also demonstrates clear advantages in downstream segmentation performance.  Conclusions  This study proposes an FSG strategy for multimodal image fusion networks, designed to fully leverage semantic information to improve the utility of fused images in downstream visual tasks. The method accounts for the variability among heterogeneous features and integrates the segmentation and fusion networks into a unified framework. By incorporating feature-level semantic information, the approach enhances the quality of the fused images and strengthens their performance in segmentation tasks. The proposed DFI module serves as a bridge between the segmentation and fusion networks, enabling effective interaction and selection of semantic and image features. This reduces the influence of feature variability and enriches the semantic content of the fusion results. In addition, the proposed MSAF module promotes the complementarity and integration of features from infrared and visible modalities while mitigating the disparity between them. Experimental results demonstrate that the proposed method not only achieves superior visual fusion quality but also outperforms existing methods in joint segmentation performance.
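To convey the flavor of spatially weighted fusion of infrared and visible features, a toy per-pixel softmax weighting is sketched below; this only hints at the role of the MSAF module and is not its actual design:

    import numpy as np

    def spatial_attention_fusion(feat_ir, feat_vis):
        # feat_*: (C, H, W) feature maps; per-pixel weights come from a softmax
        # over the mean activation of each modality.
        a_ir, a_vis = feat_ir.mean(axis=0), feat_vis.mean(axis=0)
        e = np.exp(np.stack([a_ir, a_vis]) - np.maximum(a_ir, a_vis))
        w_ir, w_vis = e / e.sum(axis=0)
        return w_ir * feat_ir + w_vis * feat_vis

    f_ir = np.random.default_rng(0).random((4, 8, 8))
    f_vis = np.random.default_rng(1).random((4, 8, 8))
    print(spatial_attention_fusion(f_ir, f_vis).shape)    # (4, 8, 8)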
A Fake News Detection Approach Enhanced by Multi-Source Feature Fusion
HU Ze, CHEN Zhinan, YANG Hongyu
2025, 47(8): 2919-2934. doi: 10.11999/JEIT250041
Abstract:
  Objective  News exhibits multidimensional complexity, comprising structural, temporal, and content features. Structural features are reflected in the propagation path, depth, and breadth. Fake news often exhibits distinctive structural patterns, such as rapid diffusion through a limited number of “opinion leader” nodes or the formation of densely connected propagation clusters. Temporally, fake news tends to spread quickly within short timeframes, characterized by unusually high dissemination speeds and elevated interaction rates. Content features include the information conveyed in headlines and body text; fake news often contains sensationalized headlines, emotive language, inaccurate data, or fabricated claims. Detection models that rely solely on a single feature type often demonstrate limited discriminative performance. Therefore, capturing the hierarchical and heterogeneous attributes of news is critical to improving detection accuracy and remains a major focus of ongoing research. Current approaches predominantly emphasize content features, with limited incorporation of structural characteristics. Although Graph Neural Networks (GNN) have been employed to model propagation structures, their integration of content and temporal information remains inadequate. To address these limitations, this study proposes a fake news detection approach based on multi-source feature fusion, which enables more comprehensive feature representation and substantially enhances detection performance.  Methods  To enhance fake news detection performance, this study proposes a Multi-Source Feature Fusion Enhancement (MSFFE) approach that extracts features from three sources: structural, temporal, and content, and integrates them using an adaptive fusion mechanism. This mechanism dynamically adjusts the weight of each feature type to generate a unified representation with comprehensive expressiveness. The model comprises three core components: propagation tree encoding, multi-source feature extraction, and news classification (Fig. 2). In the propagation tree encoding component, a GNN is employed to represent the news propagation structure. Specifically, the GraphSAGE (Graph SAmple and aggreGatE) algorithm is used to aggregate node information from the propagation tree to the root node, enabling efficient capture of local structural patterns and temporal dynamics. Compared with conventional GNN methods, GraphSAGE improves scalability for large-scale graphs and reduces computational complexity by avoiding full-graph updates. In the multi-source feature extraction component, the model extracts structural, temporal, and content features. For structural features, the encoded propagation tree nodes are organized into a hypergraph. A hypergraph attention mechanism is then applied: first, hyperedge representations are updated via node-level attention; next, node representations are updated via hyperedge-level attention; and finally, structural-level features are obtained. For temporal features, node activity across multiple time windows is modeled using time-scale expansion and compression. A time decay attention mechanism is introduced to extract multi-scale temporal features, which are then fused into a unified temporal representation. For content features, the root node’s associated text is processed using a multi-head self-attention mechanism to capture semantic information, yielding content-level features. 
After extracting the three feature types, an adaptive multi-source fusion mechanism integrates them into a final news representation. This representation is passed through a fully connected layer and activation function for classification. The fully connected layer applies a linear transformation using a learnable weight matrix and bias term to produce predicted scores for each news instance. During training, model parameters are optimized to maximize classification accuracy. The final output is mapped to a probability in [0,1] using a Sigmoid activation function, indicating the likelihood that the news is fake. A threshold of 0.5 is used for binary classification: probabilities above 0.5 are labeled “fake,” and those below are labeled “true.”  Results and Discussions  As shown in Table 3, the ablation experiments demonstrate that incorporating features from different sources into the base model significantly improves fake news detection accuracy. This finding confirms the effectiveness of the core components in the proposed approach. The integration of multi-source features enhances the overall detection performance, highlighting the advantage of the fusion mechanism in identifying fake news. Comparative experiments further support these results. As shown in Table 2, the proposed approach outperforms existing approaches on both the Politifact and Gossipcop datasets. On the Politifact dataset, it improves accuracy by 3.64% and the F1 score by 3.41% compared with the State-Of-The-Art (SOTA) method Robust Trust Evaluation Architecture (RTRUST). On the Gossipcop dataset, the accuracy and F1 score increase by 0.55% and 0.56%, respectively. These improvements are attributed to the approach’s ability to effectively model high-order structural features and integrate temporal and content features, resulting in more comprehensive and discriminative feature representations.  Conclusions  Experimental results demonstrate that the proposed approach effectively extracts and fuses multi-source features, substantially improving the performance of fake news detection. By enhancing the model’s ability to represent structural, temporal, and content characteristics, the approach contributes to more accurate classification. This has the potential to mitigate the societal consequences of fake news, including public misinformation, reputational damage to organizations, and policy misjudgments.
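The fusion-and-classification step described above can likewise be sketched in a few lines. This is a hedged illustration under assumed vector shapes, with a simple softmax gate standing in for the adaptive fusion weights; in the actual MSFFE model these parameters are learned jointly with the rest of the network.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

# Hypothetical structural, temporal, and content feature vectors for one news item.
f_struct, f_temp, f_content = rng.normal(size=(3, dim))

# Adaptive fusion: a softmax over learnable gate parameters yields per-source weights.
gate = rng.normal(size=3)                      # learned jointly in the real model
alpha = np.exp(gate) / np.exp(gate).sum()
fused = alpha[0] * f_struct + alpha[1] * f_temp + alpha[2] * f_content

# Fully connected layer (learnable weight vector + bias) followed by a Sigmoid.
W, b = rng.normal(size=dim), 0.0
prob_fake = 1.0 / (1.0 + np.exp(-(fused @ W + b)))
label = "fake" if prob_fake > 0.5 else "true"  # 0.5 decision threshold
print(f"p(fake) = {prob_fake:.3f} -> {label}")
```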
Research on Deep Learning Analysis Model of Sleep ElectroEncephalography Signals for Anxiety Improvement
HUANG Chen, MA Yaolong, ZHANG Yan, WANG Shihui, YANG Chao, SONG Jianhua, CHEN Kansong, YANG Weiping
2025, 47(8): 2935-2944. doi: 10.11999/JEIT241123
Abstract:
  Objective  Anxiety is a common emotional disorder, characterized by excessive worry and fear, which negatively affects mental, physical, and social well-being. A bidirectional relationship exists between anxiety and sleep: poor sleep quality worsens anxiety symptoms, and anxiety disrupts normal sleep patterns. ElectroEncephaloGraphy (EEG) signals provide a non-invasive and informative means to investigate brain activity, making them useful for studying the neurophysiological mechanisms underlying this association. However, conventional EEG analysis methods often fail to capture the complex, multiscale features needed to assess anxiety modulation during sleep. This study proposes an Improved Feature Pyramid Network (IFPN) model to enhance EEG analysis in sleep settings, with the aim of improving the detection and interpretation of anxiety-related brain activity.  Methods  The IFPN model comprises a preprocessing module, feature extraction module, and classification module, each optimized for analyzing EEG signals related to anxiety during sleep. The preprocessing module applies Z-score normalization to EEG signals from individuals with anxiety to standardize signal amplitude across channels. Noise artifacts are reduced using a denoising process based on a feature pyramid network. Preprocessed signals are then converted into brain entropy topographies using Singular Spectral Entropy (SSE), which quantifies signal complexity. These entropy maps are processed by the IFPN backbone, which incorporates convolutional layers, SSE-guided upsampling, and lateral connections to enable multiscale feature fusion. The resulting features are input to a modified ResNet-50 network for classification, with SSE-based regularization applied to enhance model robustness and accuracy. The model is evaluated using two independent EEG datasets: a sleep deprivation dataset and a cognitive-state EEG dataset, both comprising participants with varying levels of anxiety.  Results and Discussions  The experimental results demonstrate that the IFPN model improves the detection of anxiety-related features in EEG signals during sleep. Spectral power analysis shows a significant reduction in β-band power after sleep, reflecting decreased hyperarousal commonly associated with anxiety. In Dataset 1, β-band power declines from 16% to 13% (p < 0.01), and in Dataset 2, from 19.5% to 15% (p < 0.05). This is accompanied by an increase in the θ/β power ratio, suggesting a shift toward a more relaxed neural state post-sleep. The IFPN model achieves 85% accuracy in identifying severe anxiety, outperforming baseline methods, which reach 78%. This improvement results from the model’s capacity to integrate multiscale features and selectively emphasize anxiety-related patterns, supporting more accurate classification of elevated anxiety states.  Conclusions  This study proposes an IFPN model for EEG analysis during sleep, with a focus on detecting anxiety-related neural activity. Unlike traditional approaches that rely on shallow architectures or frequency-limited metrics, the IFPN model addresses the multiscale and spatially heterogeneous nature of brain activity associated with anxiety. By incorporating SSE as a nonlinear dynamic feature, the model captures subtle regional and frequency-specific variations in EEG complexity. SSE functions as both a signal complexity metric and a functional biomarker of neural disorganization linked to anxiety.
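Because SSE is central to both the preprocessing and the feature pipeline, a brief sketch of how singular spectrum entropy is typically computed for a single EEG channel may help; the embedding dimension and the synthetic signal below are illustrative assumptions, not parameters reported by the authors.

```python
import numpy as np

def singular_spectrum_entropy(signal, embed_dim=20):
    """SSE of a 1-D signal: Hankel embedding -> singular values -> Shannon entropy."""
    n = len(signal) - embed_dim + 1
    traj = np.stack([signal[i:i + n] for i in range(embed_dim)])  # trajectory matrix
    s = np.linalg.svd(traj, compute_uv=False)                     # singular spectrum
    p = s / s.sum()                                               # normalise to a distribution
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Z-score normalisation of a synthetic EEG epoch, mirroring the preprocessing step,
# followed by the SSE of that channel.
rng = np.random.default_rng(2)
eeg = np.sin(0.2 * np.arange(1000)) + 0.5 * rng.normal(size=1000)
eeg = (eeg - eeg.mean()) / eeg.std()
print(f"SSE = {singular_spectrum_entropy(eeg):.3f}")
```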
Integrated with the multiscale fusion capability of the feature pyramid network, SSE enhances the model’s ability to extract salient spatiotemporal features relevant to anxiety states. Experimental results show that the IFPN model outperforms existing methods in both accuracy and robustness, particularly in identifying severe anxiety, where conventional models often struggle due to noise and reduced discriminative performance. These findings highlight the model’s potential utility in clinical assessment of anxiety during sleep.
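The spectral markers discussed in the results, relative β-band power and the θ/β ratio, can be reproduced in outline with a standard Welch estimate; the sampling rate, band edges, and synthetic segment below are conventional assumptions rather than the paper's settings.

```python
import numpy as np
from scipy.signal import welch

def band_power(f, psd, lo, hi):
    """Integrate the PSD over [lo, hi) Hz with a simple rectangle rule."""
    mask = (f >= lo) & (f < hi)
    return psd[mask].sum() * (f[1] - f[0])

fs = 250                                        # assumed sampling rate (Hz)
rng = np.random.default_rng(3)
eeg = rng.normal(size=30 * fs)                  # stand-in 30 s EEG segment

f, psd = welch(eeg, fs=fs, nperseg=4 * fs)      # Welch PSD estimate
total = band_power(f, psd, 0.5, 45.0)
theta = band_power(f, psd, 4.0, 8.0)            # conventional theta band
beta = band_power(f, psd, 13.0, 30.0)           # conventional beta band

print(f"relative beta power: {100 * beta / total:.1f}%")
print(f"theta/beta ratio:    {theta / beta:.2f}")
```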
Circuit and System Design
A 2 pJ/bit, 4×112 Gbps PAM4 Linear Driver for MZM in LPO Application
ZHANG Shu’an, ZHU Wenrui, GU Yuandong, LEI Meng, ZHANG Jianling
2025, 47(8): 2945-2952. doi: 10.11999/JEIT250176
Abstract:
  Objective  The rapid increase in data transmission demands, driven by big data, cloud computing, and Artificial Intelligence (AI), requires advanced optical module technologies capable of supporting higher data rates, such as 800 Gbps. Conventional optical modules depend on power-intensive Digital Signal Processors (DSPs) for signal compensation, which increases cost, complexity, and energy consumption. This study addresses these limitations by proposing a Linear Driver Pluggable Optics (LPO) solution that eliminates the DSP while preserving high performance. The primary objective is to design a low-power, high-efficiency Mach–Zehnder Modulator (MZM) driver using 130 nm SiGe BiCMOS technology for 400 Gbps PAM4 applications. The design integrates Continuous-Time Linear Equalization (CTLE) and gain control to support reliable, cost-effective, and energy-efficient data transmission.  Methods  The proposed quad-channel MZM driver adopts a two-stage architecture: a merged CTLE and Variable Gain Amplifier (VGA) stage (Stage 1), and an output driver (OUTDRV) stage (Stage 2). By integrating CTLE and VGA functions (Fig. 3), the design removes the pre-driver stage, improves current reuse, and enhances drive capability. Stage 1 employs a Gilbert cell-based core amplifier (Fig. 5a) with programmable peaking via Re and Ce, enabling a transfer function with adjustable gain (η) and peaking characteristics (Eq. 1). A novel low-frequency gain adjustment branch (Fig. 6) mitigates nonlinearity induced by conductor loss (Fig. 4), resulting in a flattened frequency response (Eq. 2). Stage 2 uses a cascode open-drain output structure to achieve a 3 Vppd swing at 56 Gbaud while reducing power consumption. Simulations and measurements confirm the design’s performance, with key metrics including S-parameters, Total Harmonic Distortion (THD), and Transmitter and Dispersion Eye Closure Quaternary (TDECQ).  Results and Discussions  The driver achieves a maximum gain of 19.49 dB with 9.2 dB peaking and a 12.57 dB gain control range. Measured S-parameters (Fig. 9) confirm the 19.49 dB gain, 47 GHz bandwidth, and a 4.4 dB programmable peaking range. The low-frequency adjustment circuit reduces gain by 1.6 dB below 3 GHz (Fig. 9c), effectively compensating for distortion caused by the skin effect. THD remains below 3.5% across input swings of 300~800 mVppd (Fig. 10). Eye diagrams (Fig. 11) demonstrate 56 Gbaud PAM4 operation, achieving a 3 Vppd output swing with TDECQ below 2.57 dB. The driver achieves a power efficiency of 2 pJ/bit (225.23 mW per channel), outperforming previous designs (Table 1). The use of a single 3.3 V supply eliminates the need for external DC-DC converters, facilitating system integration. Compared with recent drivers [11,14–16], this work demonstrates the highest data rate (112 Gb/s via PAM4) implemented in a mature 130 nm process while maintaining the lowest power consumption per bit.  Conclusions  This study presents a high-performance, energy-efficient MZM driver designed for LPO-based 400 Gbps optical modules. Key contributions include the merged CTLE–VGA architecture for optimized current reuse, a low-frequency gain adjustment technique that mitigates skin effect distortion, and a cascode output stage that achieves high swing and linearity. Measured results are consistent with simulations, confirming 19.49 dB gain, 3 Vppd output swing, and 2 pJ/bit energy efficiency.
The elimination of DSPs, compatibility with cost-effective BiCMOS technology, and improved power performance highlight the driver’s potential for deployment in next-generation data centers and high-speed optical interconnects.
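As a quick consistency check using only the figures quoted above (56 Gbaud PAM4, i.e., 112 Gb/s per lane, at 225.23 mW per channel), the energy per bit works out to roughly 2 pJ/bit:

```latex
% Energy per bit from the quoted per-channel power and line rate
E_{\mathrm{bit}} = \frac{P_{\mathrm{channel}}}{R_{\mathrm{bit}}}
                 = \frac{225.23\ \mathrm{mW}}{56\ \mathrm{Gbaud} \times 2\ \mathrm{bit/symbol}}
                 = \frac{225.23 \times 10^{-3}\ \mathrm{W}}{112 \times 10^{9}\ \mathrm{bit/s}}
                 \approx 2.0\ \mathrm{pJ/bit}
```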
Calculation and Testing Method of Gap Impedance for X-Band Klystron Multi-Gap Output Cavity
GUO Xin, ZHANG Zhiqiang, GU Honghong, LIANG Yuan, SHEN Bin
2025, 47(8): 2953-2962. doi: 10.11999/JEIT250002
Abstract:
  Objective  From a structural standpoint, the klystron is a narrowband device that realizes beam-wave interaction through a sequence of independent resonant cavities. Advances in simulation techniques and computational methods for klystrons, together with progress in vacuum electronic materials, manufacturing processes, and related technologies, have continuously enhanced their power and bandwidth performance. For example, high-power wideband klystrons are now widely applied in radar and communication systems. X-band wideband klystrons have achieved megawatt-level output pulse power, offering substantial utility in a range of radar applications. To satisfy the bandwidth demands of microwave electronic systems, research into wideband klystron technologies is increasingly prioritized. Expanding the bandwidth of the klystron output section is therefore a critical technology in the development of broadband klystrons.  Methods  Current approaches to expanding the frequency bandwidth of klystrons primarily rely on techniques such as staggered tuning of resonant cavities, integration of waveguide filters at the output cavity, and utilization of overlapping mode configurations in a Multi-Gap Output Cavity (MGOC). The output section of high-frequency structures typically adopts a double-gap coupled cavity, for which gap impedance testing methods are relatively well established. Building on this foundation, a triple-gap coupled output cavity structure is developed, enabling further bandwidth enhancement. The flatness of the gap impedance across the operating band of an MGOC directly determines the gain and bandwidth performance of the klystron. Therefore, accurate calculation and testing of gap impedance are essential. This study proposes a method for calculating MGOC impedance based on cavity equivalent circuit theory. The MGOC is modeled as a resonant circuit comprising capacitive and inductive elements, and the gap impedance matrix is derived using the mesh current method. Based on microwave network theory, a corresponding experimental method for measuring MGOC impedance is also proposed. By analyzing the phase of the reflection coefficient at the output coupling port under various conditions, including all gaps open, single-gap short-circuits, and localized perturbations at individual gaps, the gap impedance of the cold test sample is determined across the operating frequency band. Using this theoretical framework, an X-band four-gap output cavity structure is designed. The gap impedance of the fabricated sample is measured to verify the validity of the proposed method.  Results and Discussions  The form of the MGOC impedance derived using equivalent circuit theory is presented (Equation 6). The experimental model of the X-band four-gap output cavity is constructed, and the optimized electrical parameters for each cavity are listed (Table 1). The calculated frequency bandwidth over which the internal impedance exceeds 3300 Ω in the X-band reaches 1200 MHz (Fig. 3). This represents a 30% increase compared to the triple-gap cavity and a twofold improvement over the double-gap cavity, meeting the expected design performance. The structural dimensions of the X-band four-gap output cavity are summarized (Table 2). A schematic of the MGOC modeled as an (n+1)-port microwave network is shown (Fig. 5). By solving for the impedance at the output coupling port, the relationship between the output port and other ports is obtained (Equation 13). The impedance for the all-gaps-open condition is given (Equation 14).
The impedance for the case where a single gap is short-circuited and all other gaps remain open is derived (Equation 15), and the impedance corresponding to a perturbation in any single gap capacitance, with the remaining gaps open, is expressed (Equation 16). Based on transmission line theory, the impedance at each gap is calculated using these three sets of expressions (Equations 26~29). Using this theoretical framework, the X-band four-gap cavity prototype is fabricated and tested. To support structural optimization, the four fundamental mode field distributions of the four-gap cavity are first analyzed (Figs. 6 and 7). The parameters obtained via the equivalent circuit method are refined and adjusted for the cold test component. The final measured impedance distribution of the X-band four-gap cavity is presented (Fig. 9). The measured bandwidth, with a gap impedance exceeding 3400 Ω, reaches 1185 MHz, which closely agrees with the calculated result based on the equivalent circuit model.  Conclusions  This study proposes a method for calculating the gap impedance of a klystron MGOC based on the mesh current approach within the cavity equivalent circuit framework. A design scheme for an X-band four-gap output cavity is presented, and its impedance bandwidth is compared with those of triple-gap and double-gap cavities. The calculated bandwidth of the four-gap cavity is 33% greater than that of the triple-gap design and twice that of the double-gap counterpart. Building on this, a measurement method for MGOC gap impedance is developed using microwave network theory. Cold test experiments are conducted on an X-band four-gap cavity prototype. The measured results closely match the theoretical predictions, with the impedance exceeding 3400 Ω across nearly 1.2 GHz of bandwidth. Moreover, the proposed cold measurement technique enables the estimation of mutual impedance between cavity gaps by measuring impedance with any two gaps in a short-circuited state. This capability offers important insights into the coupling behavior among cavity modes. These findings provide a robust theoretical and experimental foundation for advancing broadband klystron technologies.
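To make the mesh-current formulation concrete, the sketch below assembles a mesh impedance matrix for a small chain of capacitively coupled LC meshes standing in for the gap circuits and solves for the driving-point impedance at one mesh. The topology and element values are placeholders chosen for illustration; they are not the optimized parameters of Table 1 or the actual cavity equivalent circuit.

```python
import numpy as np

def mesh_impedance_matrix(omega, L, C_shunt, R_loss):
    """Assemble the mesh impedance matrix Z(omega) for a ladder of len(L) meshes.

    Mesh k carries a series inductor L[k] with loss R_loss[k] and shares the shunt
    capacitors C_shunt[k] and C_shunt[k + 1] with its neighbouring meshes.
    """
    n = len(L)
    zc = 1.0 / (1j * omega * np.asarray(C_shunt))     # shunt-capacitor impedances
    Z = np.zeros((n, n), dtype=complex)
    for k in range(n):
        Z[k, k] = R_loss[k] + 1j * omega * L[k] + zc[k] + zc[k + 1]
        if k + 1 < n:
            Z[k, k + 1] = Z[k + 1, k] = -zc[k + 1]    # shared branch couples adjacent meshes
    return Z

def driving_point_impedance(omega, L, C_shunt, R_loss):
    """Impedance seen by a source inserted in mesh 1: solve Z I = V with V = e1."""
    Z = mesh_impedance_matrix(omega, L, C_shunt, R_loss)
    I = np.linalg.solve(Z, np.eye(len(L), dtype=complex)[:, 0])
    return 1.0 / I[0]

# Placeholder element values for a four-mesh chain; each isolated mesh resonates
# near 10 GHz, but these are illustrative numbers, not the cavity parameters.
L = [2.0e-9] * 4            # H
C_shunt = [0.25e-12] * 5    # F (n + 1 shunt capacitors)
R_loss = [2.0] * 4          # ohm

for f in np.linspace(8e9, 12e9, 5):
    zin = driving_point_impedance(2 * np.pi * f, L, C_shunt, R_loss)
    print(f"{f / 1e9:5.2f} GHz  |Zin| = {abs(zin):10.1f} ohm")
```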