Journal of Electronics & Information Technology (电子与信息学报)

AutoPenGPT: Drift-Resistant Penetration Testing Driven by Search-Space Convergence and Dependency Modeling

HUANG Weigang, FU Lirong, LIU Peiyu, DU Linkang, YE Tong, XIA Yifan, WANG Wenhai

2026, 48(4): 1401-1411. doi: 10.11999/JEIT250873

Abstract:
Objective Industrial Control Systems (ICS) are widely deployed in critical sectors and often contain long-standing vulnerabilities due to strict availability requirements and limited patching opportunities. The increasing exposure of external management and access infrastructure has expanded the attack surface and allows adversaries to pivot from boundary components into fragile production networks. Continuous penetration testing of these components is essential but remains costly and difficult to scale when carried out manually. Recent work examines Large Language Models (LLMs) for automated penetration testing; however, existing systems often experience strategy drift and intention drift, which produce incoherent testing behaviors and ineffective exploitation chains. Methods This study proposes AutoPenGPT, a multi-agent framework for automated Web security testing. AutoPenGPT uses an adaptive exploration-space convergence mechanism that predicts likely vulnerability types from target semantics and constrains LLM-driven testing through a dynamically updated payload knowledge base. To reduce intention drift in multi-step exploitation, a dependency-driven strategy module rewrites historical feedback, models step dependencies, and generates coherent, executable strategies in a closed-loop workflow. A semi-structured prompt embedding scheme is also developed to support heterogeneous penetration testing tasks while preserving semantic integrity. Results and Discussions AutoPenGPT is evaluated on Capture-the-Flag (CTF) benchmarks and real-world ICS and Web platforms. On CTF datasets, it achieves 97.62% vulnerability-type detection accuracy and an 80.95% requirement completion rate, exceeding state-of-the-art tools by a wide margin. In real-world deployments, it reaches approximately 70% requirement completion and identifies seven previously undisclosed vulnerabilities, demonstrating practical effectiveness. Conclusions The contributions are threefold. 
(1) Strategy drift and intention drift in LLM-driven penetration testing are examined and addressed through adaptive exploration and dependency-aware strategy mechanisms that stabilize long-horizon testing behaviors. (2) AutoPenGPT is designed and implemented as a multi-agent penetration testing system that integrates semantic vulnerability prediction, closed-loop strategy generation, and semi-structured prompt embedding. (3) Extensive evaluation on CTF and real-world ICS and Web platforms confirms the effectiveness and practicality of the system, including the discovery of previously unknown vulnerabilities.

Resilient Average Consensus for Second-Order Multi-Agent Systems: Algorithms and Application

FANG Chongrong, HUAN Yuehui, ZHENG Wenzhe, BAO Xianchen, LI Zheng

2026, 48(4): 1412-1423. doi: 10.11999/JEIT251155

Abstract:
Objective Multi-Agent Systems (MASs) are central to collaborative tasks in dynamic environments, and consensus algorithms are essential for applications such as formation control. However, MASs are vulnerable to misbehaviors (e.g., malicious attacks or accidental faults) that disrupt consensus and degrade system performance. Existing resilient consensus methods for first-order systems are insufficient for second-order MASs, where both position and velocity states must be considered. This study develops a resilient average consensus framework for second-order MASs that maintains accurate collaboration under misbehaviors. The main challenges are distributed error detection and compensation for two-dimensional state errors (position and velocity) using one-dimensional acceleration inputs. Methods The study derives sufficient conditions for second-order average consensus under misbehaviors using graph theory and Lyapunov stability analysis. The system is modeled as an undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, and agents follow double-integrator dynamics. Two algorithms are proposed. Finite Input-Errors Detection-Compensation (FIDC): For finite control input errors, Detection Strategies 1 and 2 use two-hop communication to detect discrepancies in neighbors’ states or control inputs. Compensation Scheme 1 generates input sequences that satisfy the consensus conditions in Corollary 1. Infinite Attack Detection-Compensation (IADC): For infinite errors in control inputs, velocities, and positions, the detection strategies are extended to identify falsified data. Compensation Schemes 2 and 3 reduce the effect of these errors, and an exponentially decaying error bound isolates persistent attackers. The algorithms are fully distributed and require no global information. Results and Discussions Simulations on a 10-agent network demonstrate the effectiveness of the algorithms. Under FIDC, agents reach exact average consensus despite finite input errors caused by malicious or faulty agents (Fig. 3). IADC ensures consensus among normal agents after isolating malicious agents that exceed the error bound (Fig. 4). Experiments on a multi-robot platform confirm resilience to real-world faults (e.g., actuator failures) and attacks (e.g., false data injection). In fault scenarios, FIDC reduces the deviation of the formation center from 180 mm to 34 mm (Fig. 6). Under attacks, IADC isolates malicious robots, allowing normal agents to converge correctly (Fig. 7). Analyses of relaxed Assumption 1 (non-adjacent misbehaving agents) show that Detection Strategy 3 and majority voting address certain connected malicious topologies (Fig. 2), although complex cases need further study. Conclusions This work presents a resilient average consensus framework for second-order MASs. Theoretically, the study provides sufficient conditions for consensus under misbehaviors. The FIDC and IADC algorithms enable distributed detection, compensation, and isolation of errors. 
Simulations and physical experiments verify that the methods achieve accurate average consensus under both finite and infinite errors. Future research will explore extensions to directed networks, time-varying topologies, and higher-dimensional systems.
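
To make the setting concrete, the following is a minimal sketch of double-integrator agents on an undirected graph, and of how an uncompensated finite input error shifts the agreed value. It is not the FIDC or IADC algorithm: the consensus protocol, the four-agent ring, the gain `gamma`, the Euler step `dt`, and the `fault` tuple are all illustrative assumptions that do not come from the paper.

```python
def simulate(x0, v0, neighbors, steps=2000, dt=0.01, gamma=1.0, fault=None):
    """Euler simulation of double-integrator agents (x' = v, v' = u).

    Assumed protocol: u_i = -sum_j [(x_i - x_j) + gamma * (v_i - v_j)].
    fault = (agent, n_steps, extra_accel) adds an uncompensated input
    error to one agent during the first n_steps steps (hypothetical).
    """
    x, v = list(x0), list(v0)
    n = len(x)
    for k in range(steps):
        u = []
        for i in range(n):
            ui = -sum((x[i] - x[j]) + gamma * (v[i] - v[j])
                      for j in neighbors[i])
            if fault is not None and i == fault[0] and k < fault[1]:
                ui += fault[2]  # injected input error
            u.append(ui)
        # Update all agents simultaneously from the old states.
        x = [x[i] + dt * v[i] for i in range(n)]
        v = [v[i] + dt * u[i] for i in range(n)]
    return x, v


# Undirected ring of four agents (adjacency lists are symmetric).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
x0, v0 = [0.0, 2.0, 4.0, 10.0], [0.0, 0.0, 0.0, 0.0]

x_ok, v_ok = simulate(x0, v0, ring)                      # no misbehavior
x_bad, v_bad = simulate(x0, v0, ring, fault=(0, 100, 1.0))

print("fault-free positions:", x_ok)
print("faulty-run velocities:", v_bad)
```

In the fault-free run the symmetric protocol preserves the average, so positions settle at the initial mean of 4.0 with zero velocity. In the faulty run the injected acceleration changes the total momentum, so the agents agree on a nonzero common velocity and drift away from the true average, which is the kind of deviation that detection and compensation are meant to remove.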

Data-Driven Secure Control for Cyber-Physical Systems under Denial-of-Service Attacks: An Online Mode-Dependent Switching-Q-Learning Algorithm

ZHANG Ruifeng, YANG Rongni

2026, 48(4): 1424-1433. doi: 10.11999/JEIT250746

Abstract:
Objective The open network architecture of Cyber-Physical Systems (CPSs) enables flexibility and scalability, but also increases vulnerability to cyber-attacks. In particular, Denial-of-Service (DoS) attacks represent a predominant threat, causing packet loss and performance degradation by channel jamming. CPSs under dormant and active DoS attacks can be modeled as dual-mode switched systems with stable and unstable subsystems, respectively. Therefore, switched system theory provides a promising framework for secure control design with high degrees of freedom and reduced conservatism. However, exact modeling of practical CPSs remains difficult due to attacks and noise. Although Q-learning-based control shows potential for unknown CPSs, a critical gap persists for switched systems with unstable modes, especially in establishing an evaluable stability criterion. Hence, learning-based secure control design and an evaluable security criterion for unknown CPSs under DoS attacks remain open problems. Methods An online mode-dependent switching-Q-learning algorithm is proposed to study data-driven secure control and an evaluable criterion for unknown CPSs under DoS attacks. First, CPSs under dormant and active DoS attacks are transformed into switched systems with stable and unstable subsystems, respectively. Then, the optimal control problem of the value function is addressed for model-based switched systems by constructing a Generalized Switching Algebraic Riccati Equation (GSARE) and deriving the corresponding mode-dependent optimal security controller. The existence and uniqueness of the GSARE solution are proved. Based on these results, a data-driven optimal security control law is developed through a novel online mode-dependent switching-Q-learning algorithm. Finally, by using the learned control gains and parameter matrices, a data-driven evaluable security criterion related to attack frequency and duration is established under switching and subsystem constraints. 
Results and Discussions Comparative experiments using a wheeled robot are conducted to verify the efficiency and advantages of the proposed methods. First, comparison between the model-based result (Theorem 1) and the data-driven result (Algorithm 1) shows that the optimal control gains and parameter matrices under threshold errors are successfully obtained from both the GSARE and the proposed learning algorithm, as indicated by the iterative curves (Fig. 2 and Fig. 3). Meanwhile, the tracking errors of the CPS converge to zero under the proposed data-driven controller (Fig. 5), ensuring exponential stability and verifying algorithm effectiveness. Second, the learning process curves (Fig. 4) show that although the initial learned control gain is not stabilizing, Algorithm 1 still converges to an optimal stabilizing gain. This result reduces conservatism compared with existing Q-learning approaches that require stabilizing initial gains. Third, comparison between the proposed data-driven evaluable security criterion (Theorem 2) and existing criteria shows that, even when the learned switching parameters do not satisfy conventional dwell-time constraints, the proposed criterion yields attack frequency and duration bounds under new switching and subsystem constraints. As shown in Tab. 1, the proposed criterion is less conservative than existing evaluable criteria. Finally, applying the learned controller and obtained DoS constraints to robot tracking control demonstrates faster and more accurate trajectory tracking compared with existing Q-learning controllers (Fig. 6 and Fig. 7), confirming the advantages of the proposed approach. Conclusions Based on switched system theory and learning-based control, an online mode-dependent switching-Q-learning algorithm and a corresponding evaluable security criterion are presented for unknown CPSs under DoS attacks. 
(1) By representing CPSs under dormant and active DoS attacks as switched systems with stable and unstable subsystems, respectively, the security problem is transformed into a stabilization problem with increased design freedom and reduced conservatism. (2) A novel online mode-dependent switching-Q-learning algorithm is developed for unknown switched systems with unstable modes, and comparative experiments show reduced conservatism relative to existing Q-learning methods. (3) A data-driven evaluable security criterion is established to characterize attack frequency and duration under switching and subsystem constraints, demonstrating lower conservatism than existing criteria based on single-subsystem or dwell-time constraints.

A Learning-Based Security Control Method for Cyber-Physical Systems Based on False Data Detection

MIAO Jinzhao, LIU Jinliang, SUN Le, ZHA Lijuan, TIAN Engang

2026, 48(4): 1434-1443. doi: 10.11999/JEIT250537

Abstract:
Objective Cyber-Physical Systems (CPS) constitute the backbone of critical infrastructures and industrial applications, but the tight coupling of cyber and physical components renders them highly susceptible to cyberattacks. False data injection attacks are particularly dangerous because they compromise sensor integrity, mislead controllers, and can trigger severe system failures. Existing control strategies often assume reliable sensor data and lack resilience under adversarial conditions. Furthermore, most conventional approaches decouple attack detection from control adaptation, leading to delayed or ineffective responses to dynamic threats. To overcome these limitations, this study develops a unified secure learning control framework that integrates real-time attack detection with adaptive control policy learning. By enabling the dynamic identification and mitigation of false data injection attacks, the proposed method enhances both stability and performance of CPS under uncertain and adversarial environments. Methods To address false data injection attacks in CPS, this study proposes an integrated secure control framework that combines attack detection, state estimation, and adaptive control strategy learning. A sensor grouping-based security assessment index is first developed to detect anomalous sensor data in real time without requiring prior knowledge of attacks. Next, a multi-source sensor fusion estimation method is introduced to reconstruct the system’s true state, thereby improving accuracy and robustness under adversarial disturbances. Finally, an adaptive learning control algorithm is designed, in which dynamic weight updating via gradient descent approximates the optimal control policy online. This unified framework enhances both steady-state performance and resilience of CPS against sophisticated attack scenarios. Its effectiveness and security performance are validated through simulation studies under diverse false data injection attack settings. 
Results and Discussions Simulation results confirm the effectiveness of the proposed secure adaptive learning control framework under multiple false data injection attacks in CPS. As shown in Fig. 1, system states rapidly converge to steady values and maintain stability despite sensor attacks. Fig. 2 demonstrates that the fused state estimator tracks the true system state with greater accuracy than individual local estimators. In Fig. 3, the compensated observation outputs align closely with the original, uncorrupted measurements, indicating precise attack estimation. Fig. 4 shows that detection indicators for sensor groups 2–5 increase sharply during attack intervals, while unaffected sensors remain near zero, verifying timely and accurate detection. Fig. 5 further confirms that the estimated attack signals closely match the true injected values. Finally, Fig. 6 compares different control strategies, showing that the proposed method achieves faster stabilization and smaller state deviations. Together, these results demonstrate robust control, accurate state estimation, and real-time detection under unknown attack conditions. Conclusions This study addresses secure perception and control in CPS under false data injection attacks by developing an integrated adaptive learning control framework that unifies detection, estimation, and control. A sensor-level anomaly detection mechanism is introduced to identify and localize malicious data, substantially enhancing attack detection capability. The fusion-based state estimation method further improves reconstruction accuracy of true system states, even when observations are compromised. At the control level, an adaptive learning controller with online weight adjustment enables real-time approximation of the optimal control policy without requiring prior knowledge of the attack model. 
Future research will extend the proposed framework to broader application scenarios and evaluate its resilience under diverse attack environments.

A Two-Stage Framework for CAN Bus Attack Detection by Fusing Temporal and Deep Features

TAN Mingming, ZHANG Heng, WANG Xin, LI Ming, ZHANG Jian, YANG Ming

2026, 48(4): 1444-1453. doi: 10.11999/JEIT250651

Abstract:
Objective The Controller Area Network (CAN), the de facto standard for in-vehicle communication, is inherently vulnerable to cyberattacks. Existing Intrusion Detection Systems (IDSs) face a fundamental trade-off: achieving fine-grained classification of diverse attack types often requires computationally intensive models that exceed the resource limitations of on-board Electronic Control Units (ECUs). To address this problem, this study proposes a two-stage attack detection framework for the CAN bus that fuses temporal and deep features. The framework is designed to achieve both high classification accuracy and computational efficiency, thereby reconciling the tension between detection performance and practical deployability. Methods The proposed framework adopts a “detect-then-classify” strategy and incorporates two key innovations. (1) Stage 1: Temporal Feature-Aware Anomaly Detection. Two custom features are designed to quantify anomalies: Payload Data Entropy (PDE), which measures content randomness, and ID Frequency Mean Deviation (IFMD), which captures behavioral deviations. These features are processed by a Bidirectional Long Short-Term Memory (BiLSTM) network that exploits contextual temporal information to achieve high-recall anomaly detection. (2) Stage 2: Deep Feature-Based Fine-Grained Classification. Triggered only for samples flagged as anomalous, this stage employs a lightweight one-dimensional ParC1D-Net. The core ParC1D Block (Fig. 4) integrates depthwise separable one-dimensional convolution, Squeeze-and-Excitation (SE) attention, and a Feed-Forward Network (FFN), enabling efficient feature extraction with minimal parameters. Stage 1 is optimized using BCEWithLogitsLoss, whereas Stage 2 is trained with Cross-Entropy Loss. Results and Discussions The efficacy of the proposed framework is evaluated on public datasets. (1) State-of-the-art performance. 
On the Car-Hacking dataset (Table 4), an accuracy and F1-score of 99.99% are achieved, exceeding advanced baselines. On the more challenging Challenge dataset (Table 5), superior accuracy (99.90%) and a competitive F1-score (99.70%) are also obtained. (2) Feature contribution analysis. Ablation studies (Tables 6 and 7) confirm the critical role of the proposed features. Removal of the IFMD feature results in the largest performance reduction, highlighting the importance of behavioral modeling. A synergistic effect is observed when PDE and IFMD are applied together. (3) Spatiotemporal efficiency. The complete model remains lightweight at only 0.39 MB. Latency tests (Tables 8 and 9) demonstrate real-time capability, with average detection times of 0.62 ms on a GPU and 0.93 ms on a simulated CPU (batch size = 1). A system-level analysis (Section 3.5.4) further shows that the two-stage framework is approximately 1.65 times more efficient than a single-stage model in a realistic sparse-attack scenario. Conclusions This study establishes the two-stage framework as an effective and practical solution for CAN bus intrusion detection. By decoupling detection from classification, the framework resolves the trade-off between accuracy and on-board deployability. Its strong performance, combined with a minimal computational footprint, indicates its potential for securing real-world vehicular systems. Future research could extend the framework and explore hardware-specific optimizations.
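
The abstract describes Payload Data Entropy (PDE) only as a measure of content randomness. One plausible reading, sketched below, is the Shannon entropy of the byte values in a single CAN payload. This definition, the function name `payload_entropy`, and the sample payloads are assumptions for illustration; the paper's exact feature may be computed differently.

```python
import math
from collections import Counter


def payload_entropy(payload: bytes) -> float:
    """Shannon entropy, in bits per byte, of one CAN data field.

    Assumed definition: H = sum_b p(b) * log2(1 / p(b)), where p(b) is the
    relative frequency of byte value b inside this payload.
    """
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)  # byte value -> number of occurrences
    return sum((c / total) * math.log2(total / c) for c in counts.values())


constant_payload = bytes([0x00] * 8)  # all eight bytes identical
varied_payload = bytes(range(8))      # eight distinct byte values

print(payload_entropy(constant_payload))  # lowest possible value
print(payload_entropy(varied_payload))    # highest value for 8 bytes
```

Under this reading, a payload of repeated bytes scores 0 bits and an 8-byte payload with all-distinct bytes scores 3 bits, so randomized payloads such as those produced by fuzzing would sit near the upper end of the range.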

Modeling, Detection, and Defense Theories and Methods for Cyber-Physical Fusion Attacks in Smart Grid

WANG Wenting, TIAN Boyan, WU Fazong, HE Yunpeng, WANG Xin, YANG Ming, FENG Dongqin

2026, 48(4): 1454-1468. doi: 10.11999/JEIT250659

Abstract:
Significance Smart Grid (SG), the core of modern power systems, enables efficient energy management and dynamic regulation through cyber-physical integration. However, its high interconnectivity makes it a prime target for cyberattacks, including False Data Injection Attacks (FDIAs) and Denial-of-Service (DoS) attacks. These threats jeopardize the stability of power grids and may trigger severe consequences such as large-scale blackouts. Therefore, advancing research on the modeling, detection, and defense of cyber-physical attacks is essential to ensure the safe and reliable operation of SGs. Progress Significant progress has been achieved in cyber-physical security research for SGs. In attack modeling, discrete linear time-invariant system models effectively capture diverse attack patterns. Detection technologies are advancing rapidly, with physical-based methods (e.g., physical watermarking and moving target defense) complementing intelligent algorithms (e.g., deep learning and reinforcement learning). Defense systems are also being strengthened: lightweight encryption and blockchain technologies are applied to prevention, security-optimized Phasor Measurement Unit (PMU) deployment enhances equipment protection, and response mechanisms are being continuously refined. Conclusions Current research still requires improvement in attack modeling accuracy and real-time detection algorithms. Future work should focus on developing collaborative protection mechanisms between the cyber and physical layers, designing solutions that balance security with cost-effectiveness, and validating defense effectiveness through high-fidelity simulation platforms. This study establishes a systematic theoretical framework and technical roadmap for SG security, providing essential insights for safeguarding critical infrastructure. 
Prospects Future research should advance in several directions: (1) deepening synergistic defense mechanisms between the information and physical layers; (2) prioritizing the development of cost-effective security solutions; (3) constructing high-fidelity information-physical simulation platforms to support research; and (4) exploring the application of emerging technologies such as digital twins and interpretable Artificial Intelligence (AI).

ReXNet: A Trustworthy Framework for Space-air Security Integrating Uncertainty Quantification and Explainability

LIU Zhuang, CHEN Yuran, ZHANG Jiatong, JIANG Yujing, WANG Xuhui

2026, 48(4): 1469-1479. doi: 10.11999/JEIT251159

Abstract:
Objective The Space-Air-Ground Integrated Network (SAGIN) has emerged as a strategic infrastructure for national development. However, its security vulnerabilities are increasingly evident. The physical, network, and application layers of SAGIN face different security challenges that require targeted protection strategies. Aerospace scenarios require both high predictive accuracy and transparent decision making. Therefore, more robust, reliable, and interpretable intelligent methods are needed to support network security and system trustworthiness. Methods A detection framework is proposed that integrates Uncertainty Quantification (UQ) and eXplainable Artificial Intelligence (XAI). In the front-end stage, a Bayesian deep learning method based on Monte Carlo Dropout is adopted to enable probabilistic prediction modeling. This approach separates and quantifies epistemic uncertainty and aleatoric uncertainty, which improves model reliability. In the back-end stage, SHAP and LIME are applied to provide feature attribution for each prediction, improving model interpretability and transparency. Moreover, the intermediate layer of the framework allows flexible replacement of deep learning backbones, enabling adaptation to different space and aerospace application scenarios. Results and Discussions Extensive experiments were conducted on representative space-air security datasets, including UAV swarm fault detection, ADS-B injection attacks, and network fraud detection. The experimental results show that the proposed framework achieves high-precision anomaly detection. It also evaluates prediction confidence and identifies unknown samples outside the model knowledge boundary. In addition, the framework generates logically consistent and traceable explanations for model decisions, which improves interpretability and operational reliability. 
The results indicate that the combined use of UQ and XAI improves the robustness and trustworthiness of intelligent models in aerospace security applications. Conclusions This study improves the reliability and transparency of anomaly detection models in the space-air domain. It reflects a transition in artificial intelligence applications from focusing only on prediction accuracy to emphasizing system trustworthiness. Future work will promote practical deployment of the framework. The focus will include real-time processing capability, lightweight implementation, and operation in resource-constrained environments such as onboard and on-orbit systems. These efforts support more secure, autonomous, and efficient operation of SAGIN and contribute to the sustainable development of future space-air information networks.
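
The separation of epistemic and aleatoric uncertainty can be illustrated with a common entropy-based decomposition over Monte Carlo Dropout samples. The sketch below assumes that decomposition (total predictive entropy = expected entropy + mutual information). The abstract does not state which decomposition ReXNet uses, and the function names and toy probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a discrete distribution."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(mc_probs):
    """Split uncertainty for ONE input from T stochastic forward passes.

    mc_probs: list of T class-probability vectors, each assumed to come
    from a forward pass with dropout kept active at inference time.
    Returns (total, aleatoric, epistemic) in nats.
    """
    t = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean_probs = [sum(p[k] for p in mc_probs) / t for k in range(n_classes)]
    total = entropy(mean_probs)                         # predictive entropy
    aleatoric = sum(entropy(p) for p in mc_probs) / t   # expected entropy
    epistemic = total - aleatoric                       # mutual information
    return total, aleatoric, epistemic


# Passes agree that the input is ambiguous: noise in the data dominates.
agree = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
# Passes are individually confident but contradict each other.
disagree = [[0.99, 0.01], [0.01, 0.99]]

print(decompose_uncertainty(agree))
print(decompose_uncertainty(disagree))
```

In the first case nearly all uncertainty is aleatoric, while in the second the epistemic term dominates. A large epistemic term is the usual signal for flagging samples that lie outside the model's knowledge boundary.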

LLM-based Data Compliance Checking for Internet of Things Scenarios

LI Chaohao, WANG Haoran, ZHOU Shaopeng, YAN Haonan, ZHANG Feng, LU Tianyang, XI Ning, WANG Bin

2026, 48(4): 1480-1494. doi: 10.11999/JEIT250704

Abstract:
Objective The implementation of regulations such as the Data Security Law of the People’s Republic of China, the Personal Information Protection Law of the People’s Republic of China, and the European Union General Data Protection Regulation (GDPR) has established data compliance checking as a central mechanism for regulating data processing activities, ensuring data security, and protecting the legitimate rights and interests of individuals and organizations. However, the characteristics of the Internet of Things (IoT), defined by large numbers of heterogeneous devices and the dynamic, extensive, and variable nature of transmitted data, increase the difficulty of compliance checking. Logs and traffic data generated by IoT devices are long, unstructured, and often ambiguous, which results in a high false-positive rate when traditional rule-matching methods are applied. In addition, the dynamic business environments and user-defined compliance requirements further increase the complexity of rule design, maintenance, and decision-making. Methods A large language model-driven data compliance checking method for IoT scenarios is proposed to address the identified challenges. In the first stage, a fast regular expression matching algorithm is employed to efficiently screen potential non-compliant data based on a comprehensive rule database. This process produces structured preliminary checking results that include the original non-compliant content and the corresponding violation type. The rule database incorporates current legislation and regulations, standard requirements, enterprise norms, and customized business requirements, and it maintains flexibility and expandability. By relying on the efficiency of regular expression matching and generating structured preliminary results, this stage addresses the difficulty of reviewing large volumes of long IoT text data and enhances the accuracy of the subsequent large language model review. 
In the second stage, a Large Language Model (LLM) is employed to evaluate the precision of the initial detection results. For different categories of violations, the LLM adaptively selects different prompts to perform differentiated classification detection. Results and Discussions Data are collected from 52 IoT devices operating in a real environment, including log and traffic data (Table 2). A compliance-checking rule library for IoT devices is established in accordance with the Cybersecurity Law, the Data Security Law, other relevant regulations, and internal enterprise information-security requirements. Based on this library, the collected data undergo a first-stage rule-matching process, yielding a false-positive rate of 64.3% and identifying 55 080 potential non-compliant data points. Three aspects are examined: benchmark models, prompt schemes, and role prompts. In the benchmark model comparison, eight mainstream large language models are used to evaluate detection performance (Table 5), including Qwen2.5-32B-Instruct, DeepSeek-R1-70B, and DeepSeek-R1-0528 with different parameter configurations. After review and testing by the large language model, the initial false-positive rate is reduced to 6.9%, which demonstrates a substantial improvement in the quality of compliance checking. The model’s own error rate remains below 0.01%. The prompt-engineering assessment shows that prompt design exerts a strong effect on review accuracy (Table 6). When general prompts are applied, the final false-positive rate remains high at 59%. When only chain-of-thought prompts or concise sample prompts are used, the false-positive rate is reduced to approximately 12% and 6%, respectively, and the model’s own error rate decreases to about 30% and 13%. Combining these strategies further reduces the error rate of the small-sample prompt approach to 0.01%. The effect of system-role prompts on review accuracy is also evaluated (Table 7). 
Simple role prompts yield higher accuracy and F1 scores than the absence of role prompts, whereas detailed role prompts provide a clearer overall advantage than simple role prompts. Ablation experiments (Table 8) further examine the contribution of rule classification and prompt engineering to compliance checking. Knowledge supplementation is applied to reduce interference and misjudgment among rules, lower prompt redundancy, and decrease the false-alarm rate during large language model review. Conclusions A large language model-driven data compliance checking method for IoT scenarios is presented. The method is designed to address the challenge of assessing compliance in large-scale unstructured device data. Its feasibility is verified through rationality analysis experiments, and the results indicate that false-positive rates are effectively reduced during compliance checking. The initial rule-based method yields a false-positive rate of 64.3%, which is reduced to 6.9% after review by the large language model. Additionally, the error introduced by the model itself is maintained below 0.01%.
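
The two-stage flow can be sketched as follows: regular expressions produce structured preliminary findings, and a category-specific prompt is then assembled for LLM review. The rules, violation-type names, prompt wording, and sample log lines are hypothetical placeholders rather than the paper's rule database, and the LLM call itself is omitted.

```python
import re

# Hypothetical rule database: violation type -> compiled pattern.
RULES = {
    "plaintext_password": re.compile(r"password\s*[=:]\s*\S+", re.IGNORECASE),
    "mainland_phone_number": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "ipv4_address": re.compile(r"(?<!\d)(?:\d{1,3}\.){3}\d{1,3}(?!\d)"),
}

# Hypothetical per-category review instructions for the second stage.
REVIEW_PROMPTS = {
    "plaintext_password": "Decide whether the match exposes a real credential.",
    "mainland_phone_number": "Decide whether the match is a personal phone number.",
    "ipv4_address": "Decide whether the match is a network address or another dotted number.",
}


def screen(line: str):
    """Stage 1: return structured preliminary findings for one log line."""
    findings = []
    for violation_type, pattern in RULES.items():
        for match in pattern.finditer(line):
            findings.append({
                "violation_type": violation_type,
                "content": match.group(0),
                "span": match.span(),
            })
    return findings


def build_review_prompt(finding: dict, line: str) -> str:
    """Stage 2 input: pick the prompt that fits the violation category."""
    instruction = REVIEW_PROMPTS[finding["violation_type"]]
    return f"{instruction}\nMatched: {finding['content']}\nContext: {line}"


log_line = "user login password=abc123 from 192.168.1.10"
benign_line = "firmware version 1.2.3.4"

for hit in screen(log_line) + screen(benign_line):
    print(hit["violation_type"], "->", hit["content"])
```

The second sample line shows why a review stage is useful: the version string matches the address pattern even though it is not an address, which is the kind of false positive the LLM review is intended to filter out.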

A Complexity-Reduced Active Interference Cancellation Algorithm in f-OFDM

CHEN Hao, WEN Jiangang, ZOU Yuanping, HUA Jingyu, SHENG Bin

2026, 48(4): 1495-1504. doi: 10.11999/JEIT251172

Abstract:
Objective Due to spectrum scarcity and diverse communication requirements, a waveform technology with high spectral efficiency, flexible subband configuration, and support for asynchronous communication is required for Sixth Generation mobile communication (6G). Among the candidate waveforms, filtered Orthogonal Frequency Division Multiplexing (f-OFDM) is considered a promising solution that satisfies these requirements. By applying subband filtering, f-OFDM enables flexible subband configuration and asynchronous transmission. However, the filtering mechanism inevitably introduces intrinsic interference into the system. A dominant component of this interference is InTer-subBand Interference (ITBI), which is mainly caused by Out-Of-Band Emission (OOBE) leakage from adjacent subbands. Therefore, suppressing subband OOBE is essential for reducing ITBI and improving the performance of f-OFDM systems. Based on the structure of f-OFDM systems, a Complexity-Reduced Active Interference Cancellation (CRAIC) algorithm is proposed to suppress the OOBE of f-OFDM subbands and improve overall system performance. Methods First, based on the spectral structure of f-OFDM, a subset of data subcarriers in the target subband is used to generate Cancellation Carriers (CCs). A CRAIC optimization model for f-OFDM systems is then constructed under the constraint of CC power. The cost function is defined according to the superposed spectrum of data subcarriers and CCs at Desired Frequency Points (DFPs). Second, by introducing a real-complex domain transformation and reformulating the optimization model, the original complex-domain CRAIC programming problem is converted into a real-domain Second-Order Cone Programming (SOCP) problem, which enables efficient computation. Furthermore, computer simulations evaluate the effects of key parameters on CRAIC performance, including the number of CCs ($M$), the number of data subcarriers used to generate CCs ($K$), and the number of DFPs ($Q$). Based on these evaluations, practical recommendations are provided for configuring CRAIC parameters in f-OFDM systems. Results and Discussions Simulation results show that in the edge region of the adjacent subband, the proposed CRAIC algorithm produces the steepest Power Spectral Density (PSD) roll-off compared with the conventional ZP and Origin schemes. This result indicates that CRAIC provides the strongest ITBI suppression in this region and achieves the lowest Bit Error Rate (BER) for Edge Subcarriers (ESs) in the adjacent subband. Specifically, CRAIC achieves a maximum PSD reduction of 4 dB and 12 dB compared with ZP and Origin, respectively (Fig. 2a). This result occurs because the right $Q/2$ DFPs are largely located in the edge region of SB₂, which leads to effective spectral suppression in this area. Therefore, the BER at the edge of SB₂ is significantly lower for CRAIC than for Origin, and a visible performance improvement is also observed compared with ZP (Fig. 3a). Furthermore, the effects of key parameters $M$, $K$, and $Q$ are examined through simulations. The results show that increasing $M$ continuously improves OOBE suppression capability (Fig. 4a), although spectral efficiency gradually decreases. In contrast, increasing $K$ and $Q$ produces only limited performance improvement. When these parameters exceed certain values, further increases do not provide additional gains (Fig. 5a and Fig. 6a). Based on these observations, $M=4$, $K=8$, and $Q=4$ are selected as typical parameter settings for the scenario considered in this study. Under this configuration, CRAIC ($K=8$) achieves significant improvements in ES BER compared with Origin and ZP (Fig. 8a), whereas the BER of Internal Subcarriers (ISs) remains nearly the same as that of the two benchmark schemes (Fig. 8b). Compared with the full-scale CRAIC scheme ($K=20$), CRAIC ($K=8$) reduces the size of the data-subcarrier mapping matrix by 60% while causing only limited BER degradation (Fig. 8a). These results indicate that the proposed algorithm preserves the performance of the full-scale Active Interference Cancellation (AIC) scheme while substantially reducing computational complexity. Conclusions A CRAIC algorithm for filtered OFDM systems is studied. The CRAIC optimization model is constructed under the constraint of CC power, and the cost function is defined based on the superposed spectrum of selected data subcarriers and CCs at DFPs. Through real-imaginary domain conversion and model reformulation, the complex-domain optimization problem is converted into a real-domain SOCP problem. Simulation results show that the CRAIC algorithm effectively reduces the PSD of the target subband, particularly in the transition region of the adjacent subband, which leads to clear improvement in edge BER performance. The effects of key parameters are also evaluated. Increasing $M$ increases the performance gain of CRAIC compared with ZP, although spectral efficiency decreases. Increasing $K$ improves OOBE suppression, although the gain gradually decreases and computational complexity increases. Increasing $Q$ does not continuously reduce PSD. Overall, the CRAIC algorithm improves subband isolation in f-OFDM systems, reduces ITBI, and improves system performance.
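The optimization model and its real-domain conversion can be pictured with a generic AIC-style formulation. This is only a sketch in assumed notation, not the formulation used in the paper: apart from $M$, $K$, and $Q$, every symbol below is hypothetical. Here $\mathbf{d}\in\mathbb{C}^{K}$ holds the selected data symbols, $\mathbf{c}\in\mathbb{C}^{M}$ holds the CC weights, $\mathbf{S}_d\in\mathbb{C}^{Q\times K}$ and $\mathbf{S}_c\in\mathbb{C}^{Q\times M}$ sample the data-subcarrier and CC spectra at the $Q$ DFPs, and $P$ is the CC power budget.

```latex
\begin{aligned}
&\min_{\mathbf{c}\in\mathbb{C}^{M}} \ \lVert \mathbf{S}_d\mathbf{d} + \mathbf{S}_c\mathbf{c} \rVert_2
\quad \text{s.t.} \quad \lVert \mathbf{c} \rVert_2^2 \le P, \\[4pt]
&\tilde{\mathbf{S}}_c =
\begin{bmatrix} \Re\mathbf{S}_c & -\Im\mathbf{S}_c \\ \Im\mathbf{S}_c & \Re\mathbf{S}_c \end{bmatrix},
\quad
\tilde{\mathbf{c}} = \begin{bmatrix} \Re\mathbf{c} \\ \Im\mathbf{c} \end{bmatrix},
\quad
\tilde{\mathbf{b}} = \begin{bmatrix} \Re(\mathbf{S}_d\mathbf{d}) \\ \Im(\mathbf{S}_d\mathbf{d}) \end{bmatrix}, \\[4pt]
&\min_{t,\ \tilde{\mathbf{c}}\in\mathbb{R}^{2M}} \ t
\quad \text{s.t.} \quad
\lVert \tilde{\mathbf{S}}_c\tilde{\mathbf{c}} + \tilde{\mathbf{b}} \rVert_2 \le t,
\quad \lVert \tilde{\mathbf{c}} \rVert_2 \le \sqrt{P}.
\end{aligned}
```

Stacking real and imaginary parts preserves the Euclidean norm, so the first and third problems have the same minimizer, and the third is a standard SOCP with two cone constraints. If the mapping matrix scales linearly with $K$, as $\mathbf{S}_d$ does in this sketch, then moving from $K=20$ to $K=8$ removes $1 - 8/20 = 60\%$ of its entries, which is consistent with the reduction reported above.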

Delay Deterministic Routing Algorithm Based on Inter-controller Cooperation for Multi-layer Low Earth Orbit Satellite Networks

HUANG Longhui, DING Xiaojin, ZHANG Gengxin

2026, 48(4): 1505-1516. doi: 10.11999/JEIT251100

[Abstract](551) [FullText HTML] (367) [PDF 2128KB](80)

Abstract:
Objective The massive scale and large number of satellites in multi-layer Low Earth Orbit (LEO) constellations produce highly dynamic network topologies. Coupled with time-varying traffic loads, this condition causes temporal fluctuations in satellite network resources, such as available link queue size and link bandwidth. These variations make it difficult to establish stable end-to-end transmission paths and guarantee Quality of Service (QoS). To address this problem, Software-Defined Networking (SDN) is applied to multi-layer LEO constellations. SDN controllers collect network state information and enable unified management of network resources. The constellation is divided into multiple regions, with a controller deployed in each region to coordinate the operation of the constellation. A deterministic delay routing algorithm is designed within the SDN controller to compute inter-region transmission paths for traffic and satisfy deterministic delay requirements. Methods A deterministic delay routing algorithm based on controller cooperation is proposed for multi-layer LEO constellations. First, a regional division strategy and controller deployment scheme are designed. The satellite network is partitioned into multiple regions, each managed by a designated controller. Second, criteria are defined for Inter-Satellite Links (ISLs) between satellites within the same layer and across different layers to characterize link communication states. Third, a Time-Varying Graph (TVG) model represents the network topology and link resource attributes, including bandwidth, queue size, and link duration. This model is combined with a multi-destination Lagrange relaxation method to optimize path selection. The resulting paths satisfy both delay and delay jitter constraints. Adjacent regional controllers exchange network state information to support cooperative computation of feasible inter-region transmission paths. 
Results and Discussions To evaluate the proposed method, a simulation system for multi-layer LEO constellations was developed. The performance of the algorithm was tested under different data transmission rates. Compared with IUDR, the proposed method improves network performance by reducing end-to-end delay, delay jitter, and packet loss rate, and by increasing throughput. At a data transmission rate of 3 Mbps, the average end-to-end delay is reduced by 16.0% (Fig. 3(a)), delay jitter by 37.9% (Fig. 3(b)), and packet loss rate by 37.2% (Fig. 3(c)). Throughput increases by approximately 2% (Fig. 3(d)). In terms of signaling overhead, the proposed algorithm achieves a higher Reduction-Improvement Gain Ratio, which increases by approximately 111.8% compared with IUDR. This result indicates superior overall performance of the proposed Delay Deterministic Routing Algorithm based on Inter-controller Cooperation (DDRA-ICC). Additionally, the proposed method shows lower time complexity for route computation than IUDR. Conclusions To address deterministic delay requirements for traffic transmission in multi-layer LEO constellations, a controller cooperation-based deterministic delay routing algorithm is proposed. Performance evaluation under different load conditions shows that: (1) Compared with IUDR, the proposed algorithm reduces the average end-to-end delay, delay jitter, and packet loss rate by 16.0%, 37.9%, and 37.2%, respectively, and increases the average throughput by approximately 2%. (2) Although the additional overhead of DDRA-ICC is comparable to that of IUDR, the packet loss rate decreases further to 2.96%, representing a reduction of 52.49%, and the Reduction-Improvement Gain Ratio reaches 1.97. These results indicate lower packet loss, a higher Reduction-Improvement Gain Ratio, and a better balance between signaling overhead and reliability. Therefore, the proposed method provides advantages in ensuring deterministic traffic transmission. 
Future work may consider additional practical factors, such as satellite node failures and their effects on network performance, to further improve system capability.
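The idea of using Lagrange relaxation to pick a path under a delay bound can be illustrated with a much simpler setting than the one in the abstract. The sketch below assumes a static graph, a single destination, and a delay bound only (no jitter constraint, no TVG, no controller cooperation); the function names, the multiplier-doubling schedule, and the example graph are all hypothetical and are not taken from the paper.

```python
import heapq

def dijkstra(graph, src, dst, lam):
    """Shortest path under the relaxed link weight cost + lam * delay."""
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, (cost, delay) in graph.get(node, {}).items():
            cand = dist + cost + lam * delay
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))
    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def path_delay(graph, path):
    """Sum of link delays along a path."""
    return sum(graph[a][b][1] for a, b in zip(path, path[1:]))

def constrained_path(graph, src, dst, max_delay, lam_max=1024.0):
    """Raise the multiplier until the relaxed shortest path meets the delay bound."""
    lam = 0.0
    while lam <= lam_max:
        path = dijkstra(graph, src, dst, lam)
        if path is None:
            return None
        if path_delay(graph, path) <= max_delay:
            return path
        lam = 1.0 if lam == 0.0 else lam * 2.0
    return None  # no feasible path found within the multiplier range

# Each link maps to (cost, delay); values are illustrative only.
example_graph = {
    "A": {"B": (1, 10), "C": (4, 2)},
    "B": {"D": (1, 10)},
    "C": {"D": (4, 2)},
    "D": {},
}
```

With a loose bound the cheapest path is returned unchanged, and with a tight bound the growing multiplier shifts the choice toward the low-delay path. Doubling the multiplier is a heuristic that can overshoot the best trade-off; a bisection or LARAC-style update would be the usual refinement.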

Two-Channel Joint Coding Detection for Cyber-Physical Systems Against Integrity Attacks

MO Xiaolei, ZENG Weixin, FU Jiawei, DOU Keqin, WANG Yanwei, SUN Ximing, LIN Sida, SUI Tianju

2026, 48(4): 1517-1527. doi: 10.11999/JEIT250729

[Abstract](397) [FullText HTML] (282) [PDF 1979KB](62)

Abstract:
Objective Cyber-Physical Systems (CPS) are widely applied across infrastructure, aviation, energy, healthcare, manufacturing, and transportation, as computing, control, and sensing technologies advance. Due to the real-time interaction between information and physical processes, such systems are exposed to security risks during data exchange. Attacks on CPS can be grouped into availability, integrity, and reliability attacks based on information security properties. Integrity attacks manipulate data streams to disrupt the consistency between system inputs and outputs. Compared with the other two types, integrity attacks are more difficult to detect because of their covert and dynamic nature. Existing detection strategies generally modify control signals, sensing signals, or system models. Although these approaches can detect specific categories of attacks, they may reduce control performance and increase model complexity and response delay. Methods A joint additive and multiplicative coding detection scheme for the two-channel structure of control and output is proposed. Three representative integrity attacks are tested, including a control-channel bias attack, an output-channel replay attack, and a two-channel covert attack. These attacks remain stealthy by partially or fully obtaining system information and manipulating data so the residual-based χ² detector output stays below the detection threshold. The proposed method introduces paired additive watermarking signals with positive and negative patterns, together with paired multiplicative coding and decoding matrices on both channels. These additional unknown signals and parameters introduce information uncertainty to the attacker and cause the residual statistics to deviate from the expected values constructed using known system information. The watermarking pairs and matrix pairs operate through different mechanisms. One uses opposite-sign injection, while the other uses a mutually inverse transformation. 
Therefore, normal control performance is maintained when no attack is present. The time-varying structure also prevents attackers from reconstructing or bypassing the detection mechanism. Results and Discussions Simulation experiments on an aerial vehicle trajectory model are conducted to assess both the influence of integrity attacks on flight paths and the effectiveness of the proposed detection scheme. The trajectory is modeled using Newton’s equations of motion, and attitude dynamics and rotational motion are omitted to focus on positional behavior. Detection performance with and without the proposed method is compared under the three attack scenarios (Fig. 2, Fig. 3, Fig. 4). The results show that the proposed scheme enables effective identification of all attack types and maintains stable system behavior, demonstrating its practical applicability and improvement over existing approaches. Conclusions This study addresses the detection of integrity attacks in CPS. Three representative attack types (bias, replay, and covert attacks) are modeled, and the conditions required for their successful execution are analyzed. A detection approach combining additive watermarking and multiplicative encoding matrices is proposed and shown to detect all three attack types. The design uses paired positive-negative additive watermarks and paired encoding and decoding matrices to ensure accurate detection while maintaining normal control performance. A time-varying configuration is adopted to prevent attackers from reconstructing or bypassing the detection elements. Using an aerial vehicle trajectory simulation, the proposed approach is demonstrated to be effective and applicable to cyber-physical system security enhancement.
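The way paired watermarks and mutually inverse coding cancel in normal operation, yet distort tampered data, can be seen in a scalar toy example. This is a sketch under strong simplifications and is not the authors' scheme: the matrices are reduced to scalar gains, the χ² detector is not modeled, and the names, seed, and value ranges are hypothetical.

```python
import random

def make_codes(seed, steps):
    """Time-varying (gain, watermark) pairs known only to the two channel ends."""
    rng = random.Random(seed)
    return [(rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)) for _ in range(steps)]

def encode(value, code):
    gain, mark = code
    return gain * value + mark       # multiplicative coding, then +watermark

def decode(received, code):
    gain, mark = code
    return (received - mark) / gain  # -watermark, then the inverse coding

codes = make_codes(seed=7, steps=50)
signal = [0.1 * k for k in range(50)]

# No attack: the pairs cancel and the receiver recovers the original signal.
clean = [decode(encode(v, c), c) for v, c in zip(signal, codes)]

# Replay attack: the channel delivers the packet recorded 10 steps earlier,
# but the receiver decodes it with the current, different code.
replayed = [decode(encode(signal[k - 10], codes[k - 10]), codes[k])
            for k in range(10, 50)]

# Bias attack: a constant offset of 0.3 is added to the encoded packet,
# so the receiver sees it scaled by the unknown inverse gain.
biased = [decode(encode(v, c) + 0.3, c) for v, c in zip(signal, codes)]
```

In this toy setting the clean channel is transparent, while the replayed and biased values no longer match what an attacker without the codes would predict, which is the kind of residual deviation a detector can exploit.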

Dynamic State Estimation of Distribution Network by Integrating High-degree Cubature Kalman Filter and Long Short-Term Memory Under False Data Injection Attack

XU Daxing, SU Lei, HAN Heqiao, WANG Hailun, ZHANG Heng, CHEN Bo

2026, 48(4): 1528-1538. doi: 10.11999/JEIT250805

[Abstract](445) [FullText HTML] (387) [PDF 3457KB](83)

Abstract:
Objective Dynamic state estimation of distribution networks is presented as a core technique for maintaining secure and stable operation in cyber-physical power systems. Its practical performance is limited by strong system nonlinearity, high-dimensional state characteristics, and the threat posed by False Data Injection Attack (FDIA). A method that integrates High-degree Cubature Kalman Filter (HCKF) with Long Short-Term Memory network (LSTM) is proposed. HCKF is applied to enhance estimation precision in nonlinear high-dimensional scenarios. The estimation outputs from HCKF and Weighted Least Squares (WLS) are combined for rapid FDIA identification using residual-based analysis. The LSTM model is then employed to reconstruct measurement data of compromised nodes and refine state estimation results. The approach is validated on the IEEE 33-bus distribution system, demonstrating reliable accuracy enhancement and effective attack resilience. Methods The strong nonlinearity of distribution networks limits the estimation accuracy of dynamic methods based on the Cubature Kalman Filter (CKF). A hybrid measurement state estimation model that combines data from Phasor Measurement Unit (PMU) and Supervisory Control And Data Acquisition (SCADA) is established. HCKF is applied to enhance estimation performance in nonlinear, high-dimensional scenarios by generating higher-order cubature points. Under FDIA, the estimation outputs from WLS and HCKF are jointly assessed, allowing rapid intrusion detection through residual evaluation and state consistency checking. Once an attack is identified, an LSTM model performs time-series prediction to reconstruct the measurement data of compromised nodes. The reconstructed data replace abnormal values, enabling correction of the final state estimation. Results and Discussions Experiments on the IEEE 33-bus distribution system show that without FDIA, HCKF achieves higher estimation accuracy for voltage magnitude and phase angle than CKF. 
The Average voltage Relative Error (ARE) of voltage magnitude decreases by 57.9%, and the corresponding phase-angle error decreases by 28.9%, confirming the superiority of the method for strongly nonlinear and high-dimensional state estimation. Under FDIA, residual-based detection effectively identifies cyber attacks and avoids false alarms and missed detections. The prediction error of LSTM for the measurement data of compromised nodes and their associated branches remains on the order of 10⁻⁶, indicating high reconstruction fidelity. The combined HCKF and LSTM maintains stable state tracking after intrusion, and its performance exceeds that of WLS and adaptive Unscented Kalman Filter. Conclusions The dynamic state estimation method that integrates HCKF and LSTM enhances adaptability to strong nonlinearity and high-dimensional characteristics of distribution networks. Rapid and accurate FDIA identification is achieved through residual evaluation, and LSTM reconstructs the measurement data of compromised nodes with high reliability. The method maintains high estimation accuracy under normal operation and preserves stability and precision under cyber intrusion. It offers technical support for secure and stable operation of distribution networks in the presence of malicious attacks.
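The detect-then-reconstruct flow can be outlined in a few lines. The sketch below is only a schematic under stated assumptions: it treats states and measurements as one value per node, takes the two estimates and the predictions as given inputs (no HCKF, WLS, or LSTM is implemented), and uses a hypothetical threshold and made-up numbers.

```python
def flag_attacked(wls_estimate, filter_estimate, threshold):
    """Indices where the static and dynamic estimates disagree beyond the threshold."""
    return [i for i, (w, f) in enumerate(zip(wls_estimate, filter_estimate))
            if abs(w - f) > threshold]

def repair(measurements, predictions, flagged):
    """Replace flagged measurements with time-series predictions."""
    repaired = list(measurements)
    for i in flagged:
        repaired[i] = predictions[i]
    return repaired

# Illustrative per-node voltage magnitudes (p.u.); node 2 carries injected data.
wls = [1.00, 0.98, 1.20, 0.97]
dynamic = [1.00, 0.99, 0.98, 0.97]
predicted = [1.00, 0.99, 0.985, 0.97]

flagged = flag_attacked(wls, dynamic, threshold=0.05)
cleaned = repair(wls, predicted, flagged)
```

The repaired vector would then be fed back into the estimator in place of the abnormal values, as described in the Methods above.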

Security Protection for Vessel Positioning in Smart Waterway Systems Based on Extended Kalman Filter-Based Dynamic Encoding

TANG Fengjian, YAN Xia, SUN Zeyi, ZHU Zhaowei, YANG Wen

2026, 48(4): 1539-1548. doi: 10.11999/JEIT250846

[Abstract](543) [FullText HTML] (221) [PDF 1446KB](77)

Abstract:
Objective With the rapid development of intelligent shipping systems, vessel positioning data face severe privacy leakage risks during wireless transmission. Traditional privacy-preserving methods, such as differential privacy and homomorphic encryption, suffer from data distortion, high computational overhead, or reliance on costly communication links, making it difficult to achieve both data integrity and efficient protection. This study addresses the characteristics of vessel stabilization systems and proposes a dynamic encoding scheme enhanced by time-varying perturbations. By integrating the Extended Kalman Filter (EKF) and introducing unstable temporal perturbations during encoding, the scheme uses receiver-side acknowledgments (ACK feedback) to achieve reference-time synchronization and independently generates synchronized perturbations through a shared random seed. Theoretical analysis and simulations show that the proposed method achieves nearly zero precision loss in state estimation for legitimate receivers, whereas decoding errors of eavesdroppers grow exponentially after a single packet loss, effectively countering both single- and multi-channel eavesdropping attacks. The shared-seed synchronization mechanism avoids complex key management and reduces communication and computational costs, making the scheme suitable for resource-constrained maritime wireless sensor networks. Methods The proposed dynamic encoding scheme introduces a time-varying perturbation term into the encoding process. The perturbation is governed by an unstable matrix to induce exponential error growth for eavesdroppers. The encoded signal is constructed from the difference between the current state estimate and a time-scaled reference state, combined with the perturbation term. A shared random seed between legitimate parties enables deterministic and synchronized generation of the perturbation sequence without online key exchange. 
At the legitimate receiver, the perturbation is canceled during decoding, enabling accurate state recovery. Local state estimation at each sensor node is performed using EKF, and the overall communication process is reinforced by acknowledgment-based synchronization to maintain consistency between the sender and receiver. Results and Discussions Simulations are conducted in a wireless sensor network with four sensors tracking vessel states, including position, velocity, and heading. The results indicate that legitimate receivers achieve nearly zero estimation error (Fig. 3), whereas eavesdroppers experience exponentially growing errors after a single packet loss. The error growth rate correlates with the instability of the perturbation matrix, confirming the theoretical divergence. In multi-channel scenarios, independent perturbation sequences per channel prevent cross-channel correlation attacks. The scheme maintains low communication and computational overhead, making it practical for maritime environments. Furthermore, the method demonstrates strong adaptability to packet loss and channel variations, fulfilling the data integrity and reliability requirements of the International Convention for the Safety of Life at Sea (SOLAS). Conclusions A dynamic encoding scheme with time-varying perturbations is proposed for privacy-preserving vessel state estimation. By integrating EKF with an unstable perturbation mechanism, the method ensures high estimation precision for legitimate users and exponential error growth for eavesdroppers. 
The main contributions are as follows: (1) an encoding framework that achieves zero precision loss for legitimate receivers; (2) a lightweight synchronization mechanism based on shared random seeds, which removes complex key management; and (3) theoretical guarantees of exponential error divergence for eavesdroppers under single- or multi-channel attacks. The scheme is robust to packet loss and channel asynchrony, complies with SOLAS data integrity requirements, and is suitable for resource-limited maritime networks. Future work will extend the method to nonlinear vessel dynamics, adaptive perturbation optimization, and validation in real maritime communication environments.
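The divergence mechanism can be reproduced with a scalar toy version of the encoding. This is a sketch under assumptions, not the paper's scheme: the unstable matrix is reduced to a scalar factor `A`, the EKF is replaced by a given list of estimates, every packet is assumed to be acknowledged by the legitimate receiver, and the eavesdropper is deliberately given the seed so that the only disadvantage is one missed packet. All names and values are hypothetical.

```python
import random

A = 2.0      # hypothetical unstable scaling factor (|A| > 1)
SEED = 2026  # hypothetical seed shared by sender and legitimate receiver

def perturbations(seed, n):
    """Deterministic perturbation sequence regenerated independently at each end."""
    rng = random.Random(seed)
    return [float(rng.randint(-5, 5)) for _ in range(n)]

def encode(estimates, seed):
    noise = perturbations(seed, len(estimates))
    ref, packets = 0.0, []
    for x, d in zip(estimates, noise):
        packets.append(x - A * ref + d)  # estimate minus scaled reference, plus perturbation
        ref = x                          # reference advances after the receiver's ACK
    return packets

def decode(packets, seed, lost=()):
    noise = perturbations(seed, len(packets))
    ref, decoded = 0.0, []
    for k, (z, d) in enumerate(zip(packets, noise)):
        if k not in lost:
            ref = z + A * ref - d        # cancel the perturbation, restore the reference
        decoded.append(ref)              # on a loss, the stale value is kept
    return decoded

estimates = [0.5 * k for k in range(1, 11)]
packets = encode(estimates, SEED)
receiver_err = [abs(x - y) for x, y in zip(estimates, decode(packets, SEED))]
eavesdropper_err = [abs(x - y)
                    for x, y in zip(estimates, decode(packets, SEED, lost={3}))]
```

After the missed packet the listener's reference is stale, and each later decoding step multiplies that mismatch by `A`, so the error doubles per step in this example while the synchronized receiver stays exact. The values are chosen to be exactly representable in floating point; with arbitrary values the same recursion would also amplify rounding error over long horizons.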

Privacy-preserving Computation in Trustworthy Face Recognition: A Comprehensive Survey

YUAN Lin, WU Yanshang, ZHANG Liyuan, ZHANG Yushu, WANG Nannan, GAO Xinbo

2026, 48(4): 1549-1568. doi: 10.11999/JEIT251063

[Abstract](1220) [FullText HTML] (566) [PDF 3747KB](152)

Abstract:
Significance With the widespread deployment of face recognition in Cyber-Physical Systems (CPS), including smart cities, intelligent transportation, and public safety infrastructures, privacy leakage has become a central concern for both academia and industry. Unlike many biometric modalities, face recognition operates in highly visible and loosely controlled environments, such as public spaces, consumer devices, and online platforms, where facial image acquisition is easy and pervasive. This exposure makes facial data especially vulnerable to unauthorized collection and misuse. Insufficient protection may lead to identity theft, unauthorized tracking, and deepfake generation, which threaten individual rights and reduce trust in digital systems. Therefore, facial data protection is not only a technical issue but also a significant societal and ethical challenge. This work integrates fragmented research across computer vision, cryptography, and privacy-preserving computation. It provides a unified perspective that guides the development of trustworthy face recognition ecosystems that balance usability, regulatory compliance, and public trust. Contributions This paper systematically reviews recent advances in privacy-preserving computation for face recognition, covering both theoretical foundations and practical implementations. The architecture and application pipeline of face recognition systems are first examined, and privacy risks at each stage are identified. At the data collection stage, unauthorized or covert capture of facial images introduces immediate risks of misuse. During model training and deployment, gradient leakage, membership inference, and overfitting may expose sensitive information about individuals contained in training data. At the inference stage, adversaries may reconstruct facial images, perform unauthorized recognition, or associate identities across datasets, which compromises anonymity. 
To address these threats, existing approaches are classified into four major privacy-preserving paradigms: data transformation, distributed collaboration, image generation, and adversarial perturbation. Within these paradigms, ten representative techniques are analyzed. Cryptographic computation, including homomorphic encryption and secure multiparty computation, enables recognition without revealing raw data but often introduces substantial computational overhead. Frequency-domain learning converts images into spectral representations to suppress identifiable details while retaining discriminative features. Federated learning decentralizes model training and reduces centralized data exposure, although it remains vulnerable to gradient inversion attacks. Image generation techniques, such as face synthesis and virtual identity modeling, reduce reliance on real facial data during training and evaluation. Differential privacy introduces calibrated noise to provide statistical privacy guarantees, whereas face anonymization obscures identifiable visual traits. Template protection and anti-reconstruction mechanisms defend stored facial features against reverse engineering. Adversarial privacy protection introduces imperceptible perturbations that interfere with machine recognition yet preserve human visual perception. Several representative studies in each category are further examined. Commonly used evaluation datasets are summarized. A comparative analysis is conducted across multiple dimensions, including face recognition performance, privacy protection effectiveness, and practical usability. This analysis systematically identifies the strengths and limitations of different types of methods. Prospects Several research directions are identified for future work. A primary challenge is to achieve a dynamic balance between privacy protection and system utility. 
Excessive protection may degrade recognition accuracy, whereas insufficient safeguards expose users to unacceptable risks. Adaptive mechanisms that adjust privacy levels according to context, task requirements, and user consent are therefore required. Another promising direction is the development of inherently privacy-aware recognition paradigms, such as feature representations that minimize identity leakage by design. The establishment of standardized evaluation frameworks for privacy risk and usability is also essential. Such frameworks would enable reproducible benchmarking and facilitate real-world deployment. The emergence of generative foundation models, including diffusion models and large multimodal models, further changes the research landscape. These models enable synthetic data generation and controllable identity representations. However, they also enable more advanced attacks, such as high-fidelity face reconstruction and identity impersonation. Addressing these dual effects requires interdisciplinary collaboration across computer vision, cryptography, law, and ethics, supported by appropriate regulation and continued methodological development. Conclusions This paper provides a comprehensive reference for researchers and practitioners engaged in trustworthy face recognition. By integrating advances from multiple disciplines, it promotes the development of effective facial privacy protection technologies and supports the secure, reliable, and ethically responsible deployment of face recognition in practical scenarios. The long-term goal is to establish face recognition as a trustworthy component of CPS that balances functionality, privacy protection, and societal trust.

A Review of Causal Feature Learning in Deep Learning Image Classification Models

WANG Xiaodong, JIANG Ling, LI Huihui, WANG Buhong

2026, 48(4): 1569-1590. doi: 10.11999/JEIT250738

[Abstract](931) [FullText HTML] (687) [PDF 2069KB](128)

Abstract:
Significance Deep learning is built on statistical correlations rather than causal relationships. As a result, such models face major challenges in generalization, interpretability, and stability. Unlike human cognition, which mainly depends on causal discovery and use, current deep learning models remain at the bottom of the Pearl Causal Hierarchy (PCH). Integrating causal inference into deep learning has therefore become a major research goal. As a core branch of deep learning, image classification models, represented by Convolutional Neural Networks (CNNs), show these limitations particularly clearly, and causal inference is urgently needed to address this bottleneck. Among the available approaches for incorporating causal inference into these models, Causal Feature Learning (CFL), a framework that combines unsupervised machine learning with causal inference, shows clear advantages. Previous studies have confirmed that causal relationships are implicitly embedded in the pixel information of input images in image classification tasks. According to the Causal Coarsening Theorem (CCT), causal knowledge can be obtained from observed image data at low experimental cost. In classification tasks, the optimal solution is given by the Markov Boundary (MB) of the causal Bayesian network for the class variable. These theories strongly support efforts to connect deep image classification models with causal inference through CFL. Overall, the importance of CFL has become increasingly evident, and it is regarded as a promising breakthrough direction for next-generation models. Progress This paper provides a comprehensive review of CFL in deep learning image classification models from three core aspects: statistical causal inference theory, correlation analysis methods, and CFL implementations. 
First, the relevant definitions of CFL and its two mainstream statistical implementation frameworks are introduced, including causal discovery based on the Structural Causal Model (SCM) and causal effect estimation based on the Rubin Causal Model (RCM). Second, correlation analysis methods for deep learning image classification models, which are located at the threshold of the PCH, are systematically summarized from three perspectives: forward, backward, and horizontal. Third, with these auxiliary tools as a foundation, progress in CFL for image classification is classified into four main directions: Causal Feature Discovery (CFD), Causal Feature Effect Estimation (CFEE), Causal Representation Learning (CRL), and Spurious Correlation Removal (SCR). CFD is based on the SCM framework and aims to derive confounding-free causal graphs through explicit or implicit causal intervention analysis of image data or models. Under the RCM framework, CFEE uses observed image data to quantitatively evaluate the causal effects of features, while addressing the lack of counterfactual samples and confounding bias. CRL focuses on selecting or extracting high-dimensional features from image data to learn causal relationships and identify low-dimensional cross-image representations. SCR removes non-causal features from images and preserves causal features through different methods. In addition, available toolkits, top conference resources, and academic organizations are listed. This paper also discusses key technical issues and future research directions. Conclusions This review summarizes the technological development of CFL. Overall, substantial progress has been made, although challenges remain in different research directions. CFD has the advantage of following the basic logic of causal theory, with clear and simple structures that are easy to understand. However, CFD still faces immature processing methods for high-dimensional image data and limited generalization ability. 
CFEE can effectively distinguish causal features from confounding features. Its evaluation results are closer to real decision-making logic and show strong general applicability. Common limitations of CFEE include the requirement for observable confounders, strong dependence on causal assumptions, and limited computational efficiency. CRL offers greater flexibility in representation dimensions and can identify causal factors that drive classification while excluding non-causal factors. Its main unresolved problems include generalization bias, factor coupling, prior dependence, weak evaluation, and high cost. SCR is highly targeted but has poor generalization ability. From a broader perspective, CFL should not be restricted to specific methods. Any method that aims to construct causal relationships from microvariables, such as image pixels, to causal macrovariables, such as global semantics, can be considered part of this field. Therefore, CFL remains an open research topic. Prospects The goal of causal inference is to move beyond correlation and clarify the causal relationships among variables by designing more rigorous experiments or using more advanced statistical methods. This requires deeper assumptions about feature relationships and broader exploration of underlying causal chains. Both remain highly challenging and are likely to become major focuses of future research in this field. 
To address the technical challenges in CFL, this paper proposes the following future directions: (1) unifying construction paradigms and establishing standards for image-based SCMs to improve the standardization and consistency of causal discovery; (2) developing RCM methods supported by generative artificial intelligence to address sample scarcity in causal effect estimation; (3) reforming models to learn new image causal representations, thereby fundamentally addressing the inherent limitations of CNNs in CFL; and (4) integrating spurious correlation analysis with reinforcement learning, and using reinforcement learning to equip deep learning image classification models with meta-learning capability for causal exploration. It can be expected that, once these key issues in CFL are resolved, the accuracy, generalization, interpretability, and stability of deep learning image classification models will improve substantially.

A Review of Research on Voiceprint Fault Diagnosis of Transformers

GONG Wenjie, LIN Guosong, WEI Xiaoguang

2026, 48(4): 1591-1607. doi: 10.11999/JEIT251076

[Abstract](891) [FullText HTML] (1008) [PDF 6935KB](98)

Abstract:
Significance Voiceprint fault diagnosis of transformers has become an active research area for ensuring safe and reliable operation of power systems. Traditional monitoring methods, such as dissolved gas analysis, infrared temperature measurement, and online partial discharge monitoring, exhibit limited real-time capability and rely heavily on expert experience. These limitations hinder effective detection of early-stage faults. Voiceprint fault diagnosis captures operational voiceprint signals from transformers and enables non-contact monitoring for early anomaly warning. This approach offers advantages in real-time performance, sensitivity, and fault coverage. This review systematically traces the technological evolution from traditional signal analysis to deep learning and compares the advantages, limitations, and application scenarios of different models across multiple dimensions. Key challenges are identified, including limited robustness to noise and imbalanced datasets. Potential research directions are proposed, including integration of physical mechanisms with data-driven methods and improvement of diagnostic transparency and interpretability. These analyses provide theoretical support and practical guidance for promoting the transition of voiceprint fault diagnosis from laboratory research to engineering applications. Progress Research on voiceprint fault diagnosis of transformers has progressed from traditional signal analysis to an intelligent recognition paradigm based on deep learning, reflecting a clear technological evolution. A bibliometric analysis of 188 papers from the CNKI and Web of Science databases shows that annual publications remained at 1～10 papers between 1997 and 2020, corresponding to an exploratory stage. Studies during this period focused mainly on fundamental voiceprint signal processing methods, including acoustic wave detection, wavelet transform, and Empirical Mode Decomposition (EMD). 
After 2020, Variational Mode Decomposition (VMD), Mel spectrum, and Mel Frequency Cepstral Coefficient (MFCC) were gradually applied to voiceprint feature extraction. Since 2021, publication output has increased rapidly and reached a historical peak in 2023. This growth was driven by advances in image and speech processing technologies. Early studies emphasized time-domain and frequency-domain analysis of voiceprint signals. Recent research increasingly converts voiceprint signals into two-dimensional time-frequency spectrogram representations. Model architectures have evolved from single-channel feature inputs with single-model outputs to complex frameworks with multi-channel feature extraction and multi-model fusion. Classical machine learning models, including Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Random Forest (RF), and Back Propagation Neural Network (BPNN), form the foundation of voiceprint fault diagnosis but are limited in handling high-dimensional features. Deep learning models, such as Convolutional Neural Network (CNN), Residual Network (ResNet), Recurrent Neural Network (RNN), and Transformer, demonstrate advantages in automatic feature extraction and complex pattern recognition, although they require substantial computational resources. Conclusions This review summarizes the technological development of transformer voiceprint fault diagnosis from machine learning to deep learning. Although deep learning methods achieve high recognition accuracy for complex voiceprint signals, five major challenges remain. These challenges include limited robustness to noise in non-stationary environments, severe data imbalance caused by scarce fault samples, the black-box nature of deep learning models, fragmented evaluation systems resulting from inconsistent data acquisition standards, and insufficient cross-modal fusion of multi-source data. 
Sensitivity to environmental noise limits diagnostic performance under varying operating conditions. Data imbalance reduces recognition accuracy for rare fault types. Limited interpretability restricts fault mechanism analysis and diagnostic credibility. Inconsistent sensor placement and sampling parameters lead to poor comparability across datasets. Single-modal voiceprint analysis restricts effective utilization of complementary information from other data sources. Addressing these challenges is essential for advancing voiceprint fault diagnosis from laboratory validation to field deployment. Prospects Future research should focus on five directions. First, noise-robust voiceprint feature extraction methods based on physical mechanisms should be developed to address non-stationary interference in complex operating environments. Second, the lack of real-world fault data should be alleviated by constructing electromagnetic field-structural mechanics-acoustic coupling models of transformers to generate high-fidelity voiceprint fault samples, while unsupervised clustering methods should be applied to improve annotation efficiency and quality. Third, explainable deep learning architectures for voiceprint fault diagnosis that incorporate physical mechanisms should be designed. Attention mechanisms combined with SHapley Additive exPlanations, Grad-CAM, and physical equations can support process-level and post hoc interpretation of diagnostic results. Fourth, industry-wide collaboration is required to establish standardized voiceprint data acquisition protocols, benchmark datasets, and unified evaluation systems. Fifth, cross-modal fusion models based on multi-channel and multi-feature analysis should be developed to enable integrated transformer fault diagnosis through comprehensive utilization of multi-source information.

A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents

WANG Peijin, HU Huiyang, FENG Yingchao, DIAO Wenhui, SUN Xian

2026, 48(4): 1608-1622. doi: 10.11999/JEIT250818

[Abstract](1465) [FullText HTML] (529) [PDF 3337KB](238)

Abstract:
Objective The rapid advancement of Remote Sensing (RS) technology has reshaped Earth observation research, shifting the field from static image analysis to intelligent, goal-oriented cognitive decision-making. Modern RS systems are expected to perceive complex scenes, reason over heterogeneous information, decompose high-level objectives into executable subtasks, and make decisions under uncertainty. These requirements motivate the development of RS agents, which extend perception models to include reasoning, planning, and interaction functions. However, existing RS datasets remain task-centric and fragmented, as they are usually designed for single-purpose supervised learning such as object detection or land-cover classification. They seldom support multimodal reasoning, instruction following, or multi-step decision-making, all of which are essential for agentic workflows. Current RS vision-language datasets also have limited scale, constrained modality coverage, and simplified text annotations, with insufficient use of non-optical data such as Synthetic Aperture Radar (SAR) and infrared imagery. They further lack instruction-driven interactions that reflect real human-agent collaboration. This study constructs a large-scale multimodal image-text instruction dataset tailored for RS agents. The objective is to establish a unified data foundation that supports perception, reasoning, planning, and decision-making. By training models on structured instructions across diverse modalities and task categories, the dataset supports the development and evaluation of next-generation RS foundation models with agentic capability. Methods The dataset is built through a systematic and extensible framework that integrates multi-source RS imagery with instruction-oriented textual supervision. A unified input-output paradigm is defined to ensure compatibility across heterogeneous tasks and model architectures. 
This paradigm formalizes interactions between visual inputs and language instructions, allowing models to process image pixels, text descriptions, spatial coordinates, region references, and action-oriented outputs. A standardized instruction schema encodes task objectives, constraints, and expected responses in a consistent format. The construction process includes three stages. (1) Data collection and integration: multimodal RS imagery is aggregated from authoritative sources, covering optical, SAR, and infrared modalities with different spatial resolutions, scene types, and geographic distributions. (2) Instruction generation: a hybrid strategy combines rule-based templates with refinement by Large Language Models (LLMs). Template-based generation ensures task completeness and structural consistency, whereas LLM rewriting improves linguistic diversity and instruction complexity. (3) Task categorization and organization: the dataset is organized into nine core task categories and 21 sub-datasets that span low-level perception, mid-level reasoning, and high-level decision-making. A validation pipeline performs automated syntax and format checks, cross-modal consistency verification, and manual review of representative samples to ensure semantic alignment between images and instructions. Results and Discussions The dataset contains more than 2 million multimodal instruction samples, making it one of the largest and most comprehensive instruction resources in the RS domain. The inclusion of optical, SAR, and infrared imagery supports cross-modal learning and reasoning across heterogeneous sensing mechanisms. Compared with existing RS datasets, this dataset emphasizes instruction diversity, task compositionality, and agent-oriented interaction rather than isolated perception tasks. 
Baseline experiments conducted using state-of-the-art multimodal LLMs and RS foundation models show that the dataset supports evaluation across the full spectrum of agentic capabilities, from visual grounding and reasoning to high-level decision-making. The experiments also highlight challenges inherent to RS data, including extreme scale variation, dense object distributions, and long-range spatial dependencies. These challenges indicate important research directions for improving multimodal reasoning and planning in complex RS environments. Conclusions This work presents a large-scale multimodal image-text instruction dataset designed for RS agents. By organizing data across nine task categories and 21 sub-datasets, it provides a unified and extensible benchmark for agent-centric RS research. The contributions include: (1) a unified multimodal instruction paradigm for RS agents; (2) a 2-million-sample dataset covering optical, SAR, and infrared modalities; (3) empirical validation demonstrating support for end-to-end agentic workflows from perception to decision-making; and (4) a comprehensive evaluation benchmark based on baseline experiments. Future work will extend the dataset to temporal and video-based RS scenarios, integrate dynamic decision-making processes, and further improve reasoning and planning capability in real-world, time-varying environments.

Optimized Implementation of Low-Depth Lightweight S-Boxes

FENG Zixi, LIU Yupeng, DOU Guowei, LIU Chengle

2026, 48(4): 1623-1632. doi: 10.11999/JEIT250690

[Abstract](347) [FullText HTML] (163) [PDF 655KB](50)

Abstract:
Objective With the rapid development and widespread deployment of the Internet of Things (IoT), embedded systems, and mobile computing devices, secure communication and data protection on resource-constrained platforms have become a central focus in information security. These devices are typically characterized by severe limitations in computational capability, storage capacity, and energy consumption. These limitations make traditional cryptographic algorithms inefficient or even infeasible in such environments. In response, lightweight cryptographic algorithms have been proposed as an effective class of solutions. Their primary objective is to achieve security levels comparable to those of traditional algorithms while significantly reducing hardware and computational overhead through algorithmic simplification and structural optimization. These algorithms are designed to operate efficiently under tight resource constraints and are particularly suitable for applications such as sensor networks, smart cards, RFID systems, and wearable devices. From the perspective of hardware implementation, the design of lightweight cryptographic algorithms must consider multiple performance metrics, including throughput, latency, power efficiency, chip area, and circuit depth. Among these metrics, chip area and circuit depth are particularly critical because they directly affect production cost and computational speed. The Substitution-box (S-Box), as the core nonlinear component that provides confusion in most symmetric encryption schemes, plays a decisive role in determining both the security and implementation efficiency of the entire cipher. Therefore, efficient methods for realizing low-area and low-depth S-Boxes are of fundamental importance for the design of secure and practical lightweight cryptographic systems. 
Methods In this work, a novel S-Box optimization algorithm based on Boolean satisfiability (SAT) solving is proposed to optimize two key hardware metrics simultaneously: logic area and circuit depth. A circuit model with depth k and width w is constructed for this purpose. Under a given area constraint, SAT-solving techniques are used to determine whether the circuit model can implement the target S-Box. By iteratively adjusting the circuit depth, width, and area parameters, an optimized S-Box implementation is obtained. The method is specifically developed for 4-bit S-Boxes, which are widely used in many lightweight block ciphers, and it provides implementations that are highly efficient in both structural compactness and computational depth. This dual optimization approach reduces hardware cost while maintaining low latency, making it particularly suitable for scenarios in which both performance and energy efficiency are critical. The proposed method begins by transforming the S-Box implementation problem into a formal SAT problem, which enables the use of powerful SAT solvers to exhaustively explore possible logic-level representations. In this transformation, a diverse set of logic gates, including 2-input, 3-input, and 4-input gates, is used to construct flexible logic networks. To enforce area and depth constraints, arithmetic operations such as binary addition and comparator logic are encoded into SAT-compatible Boolean constraints, which guide the solver toward low-area and low-depth solutions. To further accelerate the solving process and avoid redundant search paths, symmetry-breaking constraints are introduced. These constraints eliminate logically equivalent but structurally different representations, thereby significantly reducing the size of the solution space. The CaDiCaL SAT solver, known for its speed and efficiency in handling large-scale SAT problems, is used to compute optimized S-Box implementations that minimize both depth and area. 
The proposed approach not only generates efficient implementations but also provides a general modeling framework that can be extended to other logic-synthesis problems in cryptographic hardware design. Results and Discussions To validate the effectiveness of the proposed optimization method, a comprehensive set of experiments is conducted on 4-bit S-Boxes from several representative lightweight block ciphers, including Joltik, Piccolo, Rectangle, Skinny, Lblock, Lac, Midori, and Prøst. The results show that the method consistently produces high-quality implementations that are competitive with, or superior to, existing state-of-the-art results in both chip area and circuit depth. Specifically, for the S-Boxes of Joltik and Piccolo, as well as those used in Skinny and Rectangle, the generated implementations match the best known results in both metrics, indicating that the method can successfully reproduce optimal or near-optimal designs. In the cases of Lblock and Lac, although the logic area remains similar to previous results, the circuit depth is significantly reduced from 10 to 3, representing a substantial improvement in processing latency and suitability for real-time applications. For the inverse S-Box of the Rectangle cipher, the proposed implementation achieves the same circuit depth as previous designs but reduces the area from 24.33 Gate Equivalents (GE) to 17.66 GE, yielding a more compact and efficient realization. The optimization results for the Midori S-Box further confirm the effectiveness of the method: the depth is reduced from 4 to 3, and the area is reduced from 20.00 GE to 16.33 GE. For the Prøst cipher’s S-Box, two alternative implementations are presented to illustrate the trade-off between area and depth. The first achieves a depth of 4 with an area of 22.00 GE, matching the best known depth but at a higher area cost. The second increases the depth to 5 but reduces the area significantly to 13.00 GE. 
These results show that the method supports flexible optimization under different design constraints and also provides deeper insight into the complexity and trade-offs of S-Box implementation. Conclusions This paper presents a SAT-based method for jointly optimizing S-Box hardware implementations in terms of area and circuit depth. By modeling S-Box realization as a satisfiability problem and using advanced constraint encoding, multi-input logic gates, and symmetry-breaking techniques, the method effectively reduces hardware complexity while maintaining or improving depth performance. Extensive experiments on various 4-bit S-Boxes show that the proposed approach matches or outperforms existing results, particularly in reducing circuit depth and improving logic compactness. This makes it well suited to lightweight cryptographic systems operating under strict constraints on silicon area, speed, and energy consumption. Despite these advantages, the method still has limitations. Although it achieves optimal or near-optimal results for 4-bit S-Boxes, scalability to larger instances, such as 5-bit or 8-bit S-Boxes, remains challenging because of the exponential growth of the search space and solving time. As model complexity increases, solving becomes computationally expensive and may fail to converge in practice. Future work will focus on improving modeling efficiency and solver performance through refined constraint generation, stronger pruning strategies, and heuristic-guided search, with the goal of extending the method to more complex S-Boxes and other nonlinear components in lightweight and post-quantum cryptographic systems.
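The question that the SAT model answers, namely whether a gate network with a given depth and area realizes a target 4-bit S-Box, can be illustrated without a solver by checking one candidate circuit exhaustively. The following is a minimal sketch under stated assumptions: the gate costs in GE, the toy circuit, and the table `TOY_SBOX` are hypothetical and do not correspond to any cipher or to the authors' encoding. A SAT-based search would explore many such candidates instead of checking a single hand-written one.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical gate library: name -> (Boolean function, area in GE).
# The costs are illustrative only; real values depend on the cell library.
GATES: Dict[str, Tuple[Callable[..., int], float]] = {
    "XOR": (lambda a, b: a ^ b, 2.00),
    "AND": (lambda a, b: a & b, 1.33),
    "OR": (lambda a, b: a | b, 1.33),
}

# A circuit is a list of (gate name, input wire indices).
# Wires 0..3 are the inputs x0..x3 (x0 is the most significant bit);
# every gate appends one new wire.
Gate = Tuple[str, Tuple[int, ...]]


def evaluate(circuit: Sequence[Gate], outputs: Sequence[int], x: int) -> int:
    """Evaluate the circuit on the 4-bit input x and pack the output bits."""
    wires: List[int] = [(x >> (3 - i)) & 1 for i in range(4)]
    for name, ins in circuit:
        fn, _ = GATES[name]
        wires.append(fn(*(wires[i] for i in ins)))
    y = 0
    for o in outputs:
        y = (y << 1) | wires[o]
    return y


def depth(circuit: Sequence[Gate], outputs: Sequence[int]) -> int:
    """Longest input-to-output path, counted in gates."""
    d: List[int] = [0, 0, 0, 0]
    for _, ins in circuit:
        d.append(1 + max(d[i] for i in ins))
    return max(d[o] for o in outputs)


def area(circuit: Sequence[Gate]) -> float:
    """Sum of the (hypothetical) gate costs in GE."""
    return sum(GATES[name][1] for name, _ in circuit)


def implements(circuit: Sequence[Gate], outputs: Sequence[int],
               sbox: Sequence[int]) -> bool:
    """Exhaustive check over all 16 inputs."""
    return all(evaluate(circuit, outputs, x) == sbox[x] for x in range(16))


# Hypothetical toy S-Box and a hand-written circuit for it:
# y0 = x0, y1 = x1, y2 = x2 ^ (x0 & x1), y3 = x3 ^ (x1 | y2).
TOY_SBOX = [0, 1, 3, 2, 5, 4, 7, 6, 8, 9, 11, 10, 15, 14, 13, 12]
TOY_CIRCUIT: List[Gate] = [
    ("AND", (0, 1)),  # wire 4
    ("XOR", (2, 4)),  # wire 5 = y2
    ("OR", (1, 5)),   # wire 6
    ("XOR", (3, 6)),  # wire 7 = y3
]
TOY_OUTPUTS = [0, 1, 5, 7]

if __name__ == "__main__":
    print(implements(TOY_CIRCUIT, TOY_OUTPUTS, TOY_SBOX),
          depth(TOY_CIRCUIT, TOY_OUTPUTS), round(area(TOY_CIRCUIT), 2))
```

The toy circuit is a chain, so its depth equals its gate count; trading area for depth, as in the two Prøst implementations, amounts to replacing such chains with wider, shallower networks.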

Battery Pack Multi-fault Diagnosis Algorithm Based on Dual-Perspective Spectral Attention Fusion

LIU Mingjun, GU Shenyu, YIN Jingde, ZHANG Yifan, DONG Zhekang, JI Xiaoyue

2026, 48(4): 1633-1645. doi: 10.11999/JEIT251156

[Abstract](633) [FullText HTML] (421) [PDF 11488KB](69)

Abstract:
Objective With the rapid growth of electric vehicles and their widespread deployment, battery pack faults have become more frequent, creating an urgent need for efficient fault diagnosis methods. Although deep learning-based approaches have achieved notable progress, existing studies remain limited in addressing multiple fault types, such as Internal Short Circuit (ISC), sensor noise, sensor drift, and State-Of-Charge (SOC) inconsistency, and in modeling the coupling relationships among these faults. To address these limitations, a multi-fault diagnosis algorithm for battery packs based on dual-perspective spectral attention is proposed. A dual-perspective tokenization module is designed to extract spatiotemporal features from battery data, whereas a spectral attention mechanism addresses non-stationary time-series characteristics and captures long-term dependencies, thereby improving diagnostic performance. Methods To improve spatiotemporal feature extraction and fault diagnosis performance, a dual-perspective spectral attention fusion algorithm for battery pack multi-fault diagnosis is proposed. The overall architecture consists of four core modules (Fig. 3): a dual-perspective tokenization module, a spectral attention module, a feature fusion module, and an output module. The dual-perspective tokenization module applies positional encoding to jointly model temporal and spatial dimensions, enabling comprehensive spatiotemporal feature representation. When combined with the spectral attention mechanism, the capability of the model to handle non-stationary characteristics is strengthened, leading to improved diagnostic performance. In addition, to address the lack of comprehensive publicly available datasets for battery pack fault diagnosis, a new dataset is constructed, covering ISC, sensor noise, sensor drift, and SOC inconsistency faults. The dataset includes three operating conditions, FUDS, UDDS, and US06, which alleviates data scarcity in this research field. 
Results and Discussions Experimental results indicate that the proposed method improves average precision, recall, F1 score, and accuracy by 10.98%, 12.64%, 13.84%, and 13.45%, respectively, compared with existing optimal fault diagnosis methods. Comparison experiments under different operating conditions (Table 6) support this conclusion. Conventional convolutional neural network methods perform well in local feature extraction; however, fixed-size convolution kernels are not well suited to time features with varying frequencies, which limits long-term temporal dependency modeling and global feature capture. Recurrent neural network-based methods show reduced computational efficiency when large-scale datasets are processed. Transformer-based models face constraints in spatial feature extraction and in representing temporal variations. By contrast, the proposed algorithm addresses these limitations through an integrated architectural design. Ablation experiments demonstrate the contribution of each module to overall performance (Table 7), and the complete framework improves average F1 score and accuracy by 8.86% and 9.31%, respectively, compared with ablation variants. Robustness analysis under simulated noise conditions (Table 8) shows that the proposed method achieves accuracy improvements ranging from 49.95% to 124.34% over baseline methods at noise levels from –2 dB to –8 dB, indicating strong noise resistance. Conclusions A multi-fault diagnosis algorithm for battery packs is presented that integrates dual-perspective tokenization and spectral attention to combine spatiotemporal and spectral information. The dual-perspective tokenization module performs tokenization and positional encoding along temporal and spatial axes, which improves spatiotemporal representation. The spectral attention mechanism strengthens modeling of non-stationary signals and long-term dependencies. 
Experiments under FUDS, UDDS, and US06 driving cycles show that the proposed method outperforms existing multi-fault diagnosis approaches, with average gains of 13.84% in F1 score and 13.45% in accuracy. Ablation studies confirm that both modules contribute substantially and that their combination enables effective handling of complex time-series features. Under high-noise conditions (–2 dB, –4 dB, –6 dB, and –8 dB), the method also shows improved robustness, with accuracy gains of 49.95%, 90.39%, 112.01%, and 124.34%, respectively, compared with baseline methods. Several limitations remain. First, the data are mainly derived from laboratory simulations, and further validation under real-world operating conditions is required. Second, the effect of fault severity on battery management system hierarchical decision making has not been fully addressed, and future work will focus on establishing a fault severity grading strategy. Third, physical interpretability requires further improvement, and subsequent studies will explore the integration of equivalent circuit models or electrochemical mechanism models to balance diagnostic accuracy and interpretability.
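For readers who want to reproduce a robustness test of this kind, the sketch below shows one common way to corrupt a signal at a prescribed noise level. It assumes that the quoted dB values are signal-to-noise ratios and that the noise is additive white Gaussian; the abstract states neither, and the function names and sample values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Average power of the signal (mean of squared samples)."""
    return sum(v * v for v in signal) / len(signal)


def noise_std_for_snr(signal: Sequence[float], snr_db: float) -> float:
    """Noise standard deviation giving the requested SNR in dB.

    SNR_dB = 10 * log10(P_signal / P_noise), so
    P_noise = P_signal / 10 ** (SNR_dB / 10).
    """
    return math.sqrt(mean_power(signal) / 10 ** (snr_db / 10))


def add_noise(signal: Sequence[float], snr_db: float,
              rng: random.Random) -> List[float]:
    """Return a copy of the signal with zero-mean Gaussian noise added."""
    std = noise_std_for_snr(signal, snr_db)
    return [v + rng.gauss(0.0, std) for v in signal]


if __name__ == "__main__":
    rng = random.Random(0)
    toy = [1.0, -1.0, 1.0, -1.0]  # hypothetical zero-mean test signal
    for level in (-2, -4, -6, -8):
        print(level, round(noise_std_for_snr(toy, level), 4))
    print(add_noise(toy, -2, rng))
```

At negative SNR values the noise power exceeds the signal power, which is why these settings count as high-noise conditions.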

Research on the Architecture of Dual-field Reconfigurable Polynomial Multiplication Unit for Lattice-based Post-quantum Cryptography

CHEN Tao, ZHAO Wangpeng, BIE Mengni, LI Wei, NAN Longmei, DU Yiran, FU Qiuxing

2026, 48(4): 1646-1658. doi: 10.11999/JEIT250929

[Abstract](521) [FullText HTML] (283) [PDF 2471KB](49)

Abstract:
Objective Polynomial multiplication accounts for more than 80% of the computational time in lattice cryptography algorithms. The Number Theoretic Transform (NTT) and Fast Fourier Transform (FFT) reduce the computational complexity of polynomial multiplication from quadratic order, O(n^2), to quasi-linear order, O(n log n). However, mainstream lattice cryptography algorithms, including Kyber, Dilithium, and Falcon, differ considerably in their parameter sets and polynomial multiplication implementations. To support polynomial multiplication under multiple parameter configurations and improve resource utilization, a dual-field reconfigurable polynomial multiplication unit architecture is proposed. Methods First, the computational network for polynomial multiplication is extracted according to the parameter characteristics of Kyber, Dilithium, and Falcon. The internal dual-field multiplication operations are optimized at the algorithm level. Next, a dual-field reconfigurable polynomial multiplication unit architecture is designed for the polynomial multiplication network. The dual-field reconfigurable multiplication unit is further optimized to improve computational speed. Finally, a parallelism analysis is conducted to improve resource utilization of the computational architecture. The proposed architecture achieves the highest area efficiency when supporting 1-lane 64 bit, 2-lane 32 bit, or 4-lane 16 bit operations. Results and Discussions The architecture is experimentally validated on the Xilinx FPGA XC7V2000TFLG1925. It simultaneously supports one channel of complex-form floating-point operations or two channels of 17～32 bit internal NTT operations and four channels of 16 bit internal NTT operations. At an operating frequency of 169 MHz, the architecture reduces the area-time product by more than 50%. Conclusions The proposed dual-field reconfigurable processing unit architecture provides advantages in scalability, area efficiency, and core unit performance. Its configurable bit-width design adapts more easily to traditional cryptographic processors and provides a practical approach for migrating conventional public-key cryptosystems to post-quantum cryptography.
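The role of the NTT in polynomial multiplication can be shown on a toy example. The sketch below is only an illustration under stated assumptions: the modulus `Q = 17`, length `N = 8`, and root `OMEGA = 2` are toy values, not parameters of Kyber, Dilithium, or Falcon, and the product is cyclic (modulo x^N - 1), whereas lattice schemes typically use a negacyclic product (modulo x^N + 1). It does not model the proposed hardware architecture.

```python
from typing import List, Sequence

Q = 17      # toy prime modulus (assumption, chosen so that N divides Q - 1)
N = 8       # transform length, a power of two
OMEGA = 2   # primitive N-th root of unity mod Q: 2**4 % 17 == 16, 2**8 % 17 == 1


def ntt(a: Sequence[int], omega: int) -> List[int]:
    """Recursive radix-2 Cooley-Tukey transform over the integers mod Q."""
    n = len(a)
    if n == 1:
        return [a[0] % Q]
    omega_sq = omega * omega % Q
    even = ntt(a[0::2], omega_sq)
    odd = ntt(a[1::2], omega_sq)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % Q
        out[k] = (even[k] + t) % Q            # butterfly, upper output
        out[k + n // 2] = (even[k] - t) % Q   # butterfly, lower output
        w = w * omega % Q
    return out


def intt(a: Sequence[int]) -> List[int]:
    """Inverse transform: run the NTT with omega^-1 and scale by N^-1."""
    inv_n = pow(N, -1, Q)
    inv_omega = pow(OMEGA, -1, Q)
    return [x * inv_n % Q for x in ntt(a, inv_omega)]


def multiply_ntt(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Cyclic product: transform, multiply pointwise, transform back."""
    fa, fb = ntt(a, OMEGA), ntt(b, OMEGA)
    return intt([x * y % Q for x, y in zip(fa, fb)])


def multiply_schoolbook(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Reference O(N^2) cyclic product, used only to cross-check the result."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] = (c[(i + j) % N] + a[i] * b[j]) % Q
    return c


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    print(multiply_ntt(a, b) == multiply_schoolbook(a, b))
```

The butterfly inside `ntt` (one modular multiplication, one addition, one subtraction) is the kind of operation that a polynomial multiplication unit repeats; supporting several moduli and bit widths in one such unit is what makes reconfigurability useful.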

Physical Layer Key Generation Method for Integrated Sensing and Communication Systems

LIU Kexin, HUANG Kaizhi, PEI Xinglong, JIN Liang, CHEN Yajun

2026, 48(4): 1659-1667. doi: 10.11999/JEIT251034

[Abstract](596) [FullText HTML] (379) [PDF 2417KB](99)

Abstract:
Objective Integrated Sensing And Communication (ISAC) has become a central technology in Sixth-Generation (6G) wireless networks, enabling simultaneous data transmission and environmental sensing. However, the characteristics of ISAC systems, including highly directional sensing signals and the risk of sensitive information leakage to malicious sensing targets, create specific security challenges. Physical layer security provides lightweight methods to enhance confidentiality. In secure transmission, approaches such as artificial noise injection and beamforming can partially improve secrecy, although they may reduce sensing accuracy or communication efficiency. Their effect also depends on the quality advantage of legitimate channels over eavesdropping channels. For Physical Layer Key Generation (PLKG), existing work has only demonstrated basic feasibility. Most current schemes adopt a radar-centric design, which limits compatibility with communication protocols and restricts key generation rates. This paper proposes a PLKG method tailored for ISAC systems. It aims to maximize the Sum Key Generation Rate (SKGR) under sensing accuracy constraints through a Twin Delayed Deep Deterministic policy gradient (TD3)-based joint communication and sensing beamforming algorithm, thereby improving the security performance of ISAC systems. Methods A MIMO ISAC system is considered, where a base station (Alice) equipped with multiple antennas communicates with single-antenna users (Bobs) and senses a malicious target (Eve). The system operates under a TDD protocol to leverage channel reciprocity. A PLKG protocol designed for ISAC systems is developed, including channel estimation, joint communication and sensing beamforming, and key generation. The SKGR is derived in closed form, and sensing accuracy is evaluated using the Cramér-Rao Bound (CRB). 
To maximize the SKGR under CRB constraints, a non-convex optimization problem for the joint design of communication and sensing beamforming matrices is formulated. Given its NP-hardness, an algorithm based on TD3 is proposed. TD3 employs dual critic networks to reduce overestimation, delayed policy updates to enhance stability, and target policy smoothing to improve robustness. The state includes channel state information, the actions correspond to beamforming matrices, and the reward function combines SKGR, CRB, and power constraints. Results and Discussions Simulation results confirm the effectiveness of the proposed design. The TD3-based algorithm achieves a stable SKGR of 18.5 bit/channel use after training (Fig. 4), outperforming benchmark schemes such as Deep Deterministic Policy Gradient (DDPG), greedy search, and random algorithms. The SKGR increases monotonically with transmit power because of reduced noise interference (Fig. 5). Increasing the number of antennas also improves SKGR, although the gain diminishes as power per antenna decreases. The scheme maintains stable SKGR across different distances to the eavesdropper (Fig. 6), demonstrating the robustness of PLKG against eavesdropping attacks. The proposed algorithm manages the complex optimization problem effectively and adapts to dynamic system conditions, offering a practical approach for secure ISAC systems. Conclusions This paper presents a PLKG method for ISAC systems. The proposed protocol generates consistent keys between the base station and communication users. The SKGR maximization problem with sensing constraints is solved using a TD3-based algorithm that jointly optimizes communication and sensing beamforming matrices. Simulation results show that the method outperforms benchmark schemes, with significant gains in SKGR and adaptability to system conditions. The study establishes a basis for integrating PLKG into ISAC to strengthen security without reducing sensing performance. 
Future work will examine real-time implementation and scalability in large networks.
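The three TD3 ingredients named in the abstract (dual critics, delayed policy updates, and target policy smoothing) can be summarized in a few lines. This is a minimal sketch of the generic TD3 target computation, not the authors' implementation: the callables `target_actor`, `target_q1`, and `target_q2` stand in for trained networks, and the hyperparameter defaults are commonly used values assumed here for illustration. The mapping from actions to beamforming matrices and the SKGR/CRB reward are not modeled.

```python
import random
from typing import Callable, List, Sequence

State = Sequence[float]
Action = List[float]


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward: float, next_state: State, done: float,
               target_actor: Callable[[State], Action],
               target_q1: Callable[[State, Action], float],
               target_q2: Callable[[State, Action], float],
               gamma: float = 0.99, noise_std: float = 0.2,
               noise_clip: float = 0.5, act_limit: float = 1.0,
               rng=random) -> float:
    """Bootstrapped critic target used by TD3."""
    # Target policy smoothing: perturb the target action with clipped noise.
    action = target_actor(next_state)
    smoothed = [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
             -act_limit, act_limit)
        for a in action
    ]
    # Dual critics: take the smaller estimate to curb overestimation.
    q_min = min(target_q1(next_state, smoothed),
                target_q2(next_state, smoothed))
    return reward + gamma * (1.0 - done) * q_min


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: refresh the actor every policy_delay critic steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    # Hypothetical stand-ins for the target networks.
    actor = lambda s: [0.5]
    q1 = lambda s, a: 1.0 + a[0]
    q2 = lambda s, a: 2.0
    print(td3_target(1.0, [0.0], 0.0, actor, q1, q2, gamma=0.9, noise_std=0.0))
```

Taking the minimum of the two critics is what distinguishes this target from the single-critic DDPG baseline mentioned in the results.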

UAV-assisted Mobile Edge Computing based on Hybrid Hierarchical DRL in the Internet of Vehicular

YANG Miaoyan, FANG Xuming

2026, 48(4): 1668-1677. doi: 10.11999/JEIT250743

[Abstract](526) [FullText HTML] (345) [PDF 2634KB](66)

Abstract:
Objective In the Internet of Vehicles (IoV), the use of Unmanned Aerial Vehicles (UAVs) to address increasing edge computing demand has become a key direction in 6G research. However, when Deep Reinforcement Learning (DRL) is applied to optimize system latency, the action space grows exponentially with the number of vehicles and causes training difficulty and slow convergence. This study proposes a two-layer hybrid solution for UAV-assisted Mobile Edge Computing (MEC) based on DRL, termed Hybrid Hierarchical Deep Reinforcement Learning (HHDRL). Methods The HHDRL algorithm adopts a two-layer architecture to decompose complex optimization tasks. The upper layer uses an agent based on Proximal Policy Optimization (PPO) and a multi-head actor network to manage user offloading and UAV control policies. The N heads determine offloading decisions for N users, including local processing or offloading to associated CAPs or the UAV. A separate UAV flight-control head selects discrete acceleration actions to satisfy practical control constraints. The lower layer applies a computationally efficient greedy algorithm to prioritize resources based on task characteristics. This hybrid hierarchical design reduces the computational cost associated with DRL-only resource allocation. Results and Discussions The performance of the HHDRL scheme was evaluated through numerical simulations using a Rician fading channel model, a UAV flight energy consumption model, and system parameters such as task data sizes of 9～18 Mbits and task complexities of 2 000～3 000 cycles/bit. Figure 3 shows that HHDRL converges faster than standard DRL, although the final reward is slightly lower. Figure 4 indicates that HHDRL maintains the user delay fairness of DRL. The evaluation in Figure 5 shows that the proposed method reduces system latency by approximately 71～91% compared with a random baseline and by 1～12% compared with the original DRL algorithm. 
Figure 6 shows training time results for different numbers of users; HHDRL consistently achieves shorter training times, and its training time grows more slowly as the number of users increases. This results from the reduced DRL output action space. When the PPO-based upper layer is replaced with other DRL algorithms, the scheme still outperforms the random baseline and achieves performance comparable to non-hierarchical DRL, demonstrating the generality of the architecture. Figure 8 shows that computational resources have the strongest effect on latency because computation typically dominates total task processing time. Figure 9 presents UAV trajectory optimization. Figure 9(a) shows realistic velocity changes under discrete acceleration control. Figure 9(b) shows that the UAV adjusts its position to track dynamic user distribution while maintaining stable flight. Conclusions This study presents an HHDRL algorithm that integrates DRL with a greedy strategy in a hierarchical framework to address the training challenges of UAV-assisted MEC in IoV scenarios. The simulations show that (1) the proposed method accelerates convergence and reduces training time compared with standard DRL; (2) its latency performance is comparable to DRL and significantly better than heuristic and random baselines; and (3) the framework effectively manages task offloading, resource allocation, and UAV trajectory optimization under practical constraints. Future work will extend the framework to multi-UAV collaboration and more complex environments.
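
The lower-layer idea can be illustrated with a small sketch. The abstract does not state the greedy rule, so the code below assumes a hypothetical one: CPU capacity is cut into equal units, and each unit is handed to the task whose processing delay is currently the largest, with delay taken as data size × complexity / allocated frequency. The function name `greedy_allocate`, the unit size, and the 8 GHz budget are illustrative assumptions; only the task sizes (9 and 18 Mbits) and complexities (2000 and 3000 cycles/bit) come from the ranges quoted above.

```python
import heapq

def greedy_allocate(tasks, total_hz, unit_hz):
    """Hand out CPU capacity unit by unit to the currently slowest task.

    tasks: list of (data_bits, cycles_per_bit) pairs.
    Returns the CPU frequency (Hz) assigned to each task.
    Assumed rule for illustration only; not the paper's algorithm.
    """
    cycles = [bits * cpb for bits, cpb in tasks]
    units = int(total_hz // unit_hz)
    if units < len(tasks):
        raise ValueError("need at least one unit per task")
    # Every task starts with one unit so that its delay is finite.
    alloc = [unit_hz] * len(tasks)
    # Max-heap on delay (negated because heapq is a min-heap).
    heap = [(-cycles[i] / alloc[i], i) for i in range(len(tasks))]
    heapq.heapify(heap)
    for _ in range(units - len(tasks)):
        _, i = heapq.heappop(heap)
        alloc[i] += unit_hz
        heapq.heappush(heap, (-cycles[i] / alloc[i], i))
    return alloc

tasks = [(9e6, 2000), (18e6, 3000)]   # (bits, cycles/bit)
alloc = greedy_allocate(tasks, total_hz=8e9, unit_hz=1e9)
delays = [b * c / f for (b, c), f in zip(tasks, alloc)]
```

Under these assumptions the heavier task receives 6 GHz and the lighter one 2 GHz, which equalizes both processing delays at 9 s. A rule of this kind needs no training, which is consistent with the abstract's point that removing resource allocation from the DRL action space reduces training cost.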

Low-Complexity Joint Estimation Algorithm for Carrier Frequency Offset and Sampling Frequency Offset in 5G-NTN Low Earth Orbit Satellite Communications

GONG Xianfeng, LI Ying, LIU Mingyang, ZHAI Shenghua

2026, 48(4): 1678-1687. doi: 10.11999/JEIT251086

[Abstract](686) [FullText HTML] (387) [PDF 4958KB](90)

Abstract:
Objective The Doppler effect is a major impairment in Low Earth Orbit (LEO) satellite communications within 5G Non-Terrestrial Networks (5G-NTN). It introduces Carrier Frequency Offset (CFO), Sampling Frequency Offset (SFO), and Inter-Subcarrier Frequency Offset (ISFO) across subcarriers. Although existing estimation algorithms focus mainly on CFO and SFO, the effect of ISFO is insufficiently addressed. ISFO becomes highly detrimental to receiver performance when Orthogonal Frequency-Division Multiplexing (OFDM) systems use a large number of subcarriers and high-order modulation. Moreover, under joint CFO and SFO conditions, conventional Maximum Likelihood Estimation (MLE) methods often require one- or two-dimensional grid searches. This results in high computational cost. To reduce this cost, two joint estimation algorithms for CFO and SFO are proposed. Methods The influence of non-ideal factors at the transmitter, receiver, and channel, such as local oscillator offset, SFO in Digital-to-Analog Converters (DACs) and Analog-to-Digital Converters (ADCs), and the Doppler effect, is analyzed. A mathematical model for the received OFDM signal is developed, and the mechanism through which SFO and ISFO distort the phase of frequency-domain subcarriers is derived. Leveraging the pilot structure of 5G-NTN, two joint CFO and SFO estimation algorithms are proposed. (1) Algorithm 1 uses the sequence correlation between the received frequency-domain Demodulation Reference Signal (DMRS) vectors. After phase pre-compensation is applied, the normalized cross-correlation vector is computed. An objective function is constructed from this vector, and its unimodal behavior in the main lobe is used to estimate the parameters through a bisection search. (2) Algorithm 2 treats the estimation parameter as analogous to a CFO in single-carrier systems and adopts an L&R-based autocorrelation method to derive approximate closed-form expressions. 
Results and Discussions A computational complexity analysis compares the proposed algorithms with one-dimensional (1D-ML) and two-dimensional (2D-ML) grid-search MLE methods. Numerical results show that Algorithm 1 reduces complexity substantially: its number of complex multiplications, the main computational cost, is 4% of that of the 2D-ML method, 8% of that of Algorithm 2, and 44% of that of the 1D-ML method. Although Algorithm 2 is more computationally demanding, it yields a closed-form estimation expression. The performance of each algorithm is evaluated through the Mean Square Error (MSE) of the estimated parameters. Simulations show that for a subcarrier number of 3072, the 1D-ML algorithm performs slightly better than the others at Signal-to-Noise Ratios (SNRs) below 5 dB. However, because the robust modulation schemes typically used at low SNRs, such as BPSK and QPSK, tolerate larger offsets, the medium-to-high SNR range is of greater practical relevance. In this range, all four algorithms demonstrate comparable estimation performance. Conclusions This study addresses the effect of Doppler in 5G-NTN LEO satellite communications by analyzing the mechanism and influence of ISFO and by proposing two joint estimation algorithms for CFO and SFO. First, a mathematical model of the received signal is established considering non-ideal factors such as CFO, SFO, and ISFO. The combined effect of SFO and ISFO on OFDM signals is shown to be equivalent to their linear superposition, which expands the range of the equivalent SFO. Second, the objective function is defined using the cross-correlation vector of two DMRS sequences. By using its unimodal behavior within the main lobe, a bisection search enables fast convergence. Third, the parameter determined by SFO and ISFO is treated as analogous to the CFO in single-carrier systems, allowing an approximate closed-form estimation solution to be obtained through the L&R method. 
Finally, complexity analysis and performance simulations show that the proposed algorithms provide significant computational savings and strong estimation performance. These results can support the development of 5G-NTN LEO satellite payloads and terminal products.
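
For readers unfamiliar with the L&R method mentioned for Algorithm 2, the following is a minimal sketch of the classical Luise and Reggiannini frequency estimator for a modulation-free single-carrier sequence. It is not the paper's derivation: how the SFO/ISFO-dependent parameter is mapped onto this form is not given in the abstract, and the test tone, sequence length, and function name `lr_cfo_estimate` are illustrative assumptions.

```python
import cmath
import math

def lr_cfo_estimate(z, n_lags, sample_period=1.0):
    """Classical L&R frequency estimate from a modulation-free sequence z.

    Uses autocorrelation lags 1..n_lags; the estimate is unambiguous for
    |f| < 1 / ((n_lags + 1) * sample_period).
    """
    length = len(z)
    acc = 0j
    for m in range(1, n_lags + 1):
        # Sample autocorrelation R(m) of the sequence at lag m.
        r_m = sum(z[k] * z[k - m].conjugate() for k in range(m, length)) / (length - m)
        acc += r_m
    # The phase of the summed autocorrelations is proportional to the offset.
    return cmath.phase(acc) / (math.pi * (n_lags + 1) * sample_period)

true_f = 0.01  # cycles per sample (illustrative value)
z = [cmath.exp(2j * math.pi * true_f * k) for k in range(64)]
f_hat = lr_cfo_estimate(z, n_lags=32)
```

For a noiseless tone the estimate matches the true offset up to floating-point error. The estimator needs only autocorrelations and one phase computation, which is why it yields a closed-form expression rather than a grid search.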

Secrecy Rate Maximization Algorithm for IRS Assisted UAV-RSMA Systems

WANG Zhengqiang, KONG Weidong, WAN Xiaoyu, FAN Zifu, DUO Bin

2026, 48(4): 1688-1697. doi: 10.11999/JEIT250452

[Abstract](519) [FullText HTML] (254) [PDF 1839KB](77)

Abstract:
Objective Under the stringent requirements of Sixth-Generation (6G) mobile communication networks for spectral efficiency, energy efficiency, low latency, and wide coverage, Unmanned Aerial Vehicle (UAV) communication has emerged as a key solution for 6G and beyond, leveraging its Line-of-Sight (LoS) propagation advantages and flexible deployment capabilities. Functioning as aerial base stations, UAVs enhance network performance by improving spectral efficiency and connection reliability, and they are particularly valuable in critical scenarios such as emergency communications, remote area coverage, and maritime operations. However, UAV communication systems face dual challenges in high-mobility environments: severe multi-user interference in dense access scenarios, which substantially degrades system performance, and physical-layer security threats arising from the broadcast nature and spatial openness of wireless channels, which enable malicious interception of transmitted signals. Rate-Splitting Multiple Access (RSMA) mitigates these challenges by decomposing user messages into common and private streams, thereby providing a flexible interference management mechanism that balances decoding complexity with spectral efficiency. This makes RSMA especially suitable for high-density user access scenarios. In parallel, Intelligent Reflecting Surfaces (IRS) have emerged as a promising technology to dynamically reconfigure wireless propagation through programmable electromagnetic unit arrays. IRS improves the quality of legitimate links while reducing the capacity of eavesdropping links, thereby enhancing physical-layer security in UAV communications. Although existing research has centered mainly on conventional multiple access schemes, the potential of RSMA in IRS-assisted UAV communication systems remains relatively unexplored. 
Against this background, this paper investigates secure transmission strategies in IRS-assisted UAV-RSMA systems. Methods The effect of eavesdroppers on the security performance of UAV communication systems is analyzed, and an IRS-assisted RSMA-based UAV communication model is proposed. The system comprises a multi-antenna UAV base station, an IRS mounted on a building, multiple single-antenna legitimate users, and multiple single-antenna eavesdroppers. The optimization problem is formulated to maximize the system secrecy rate by jointly optimizing precoding vectors, common secrecy rate allocation, IRS phase shifts, and UAV positioning. The problem is highly non-convex due to the strong coupling among these variables, rendering direct solutions intractable. To overcome this challenge, a two-layer optimization framework is developed. In the inner layer, with UAV position fixed, an alternating optimization strategy divides the problem into two subproblems: (1) joint optimization of precoding vectors and common secrecy rate allocation and (2) optimization of IRS phase shifts. Non-convex constraints are transformed into convex forms using techniques such as Successive Convex Approximation (SCA), relaxation variables, first-order Taylor expansion, and Semidefinite Relaxation (SDR). In the outer layer, the Particle Swarm Optimization (PSO) algorithm determines the UAV deployment position based on the optimized inner-layer variables. Results and Discussions Simulation results show that the proposed algorithm outperforms RSMA without IRS, NOMA with IRS, and NOMA without IRS in terms of secrecy rate. Fig. 2 illustrates that the secrecy rate increases with the number of iterations and converges under different UAV maximum transmit power levels and antenna configurations. Fig. 3 demonstrates that increasing UAV transmit power significantly enhances the secrecy rate for both the proposed and benchmark schemes. 
This improvement arises because higher transmit power strengthens the signal received by legitimate users, increasing their achievable rates and enhancing system secrecy performance. Fig. 4 indicates that the secrecy rate grows with the number of UAV antennas. This improvement is due to expanded signal coverage and greater spatial degrees of freedom, which amplify effective signal strength in legitimate user channels. Fig. 5 shows that both the proposed scheme and NOMA with IRS achieve a higher secrecy rate as the number of IRS reflecting elements increases. The additional elements provide greater spatial degrees of freedom, improving channel gains for legitimate users and strengthening resistance to eavesdropping. In contrast, benchmark schemes operating without IRS assistance exhibit no performance improvement and maintain a constant secrecy rate. This result highlights the critical role of the IRS in enabling secure communications. Finally, Fig. 6 shows the optimal UAV position when $P_{\max} = 30\text{ dBm}$. Deploying the UAV near the center of legitimate users and adjacent to the IRS minimizes the average distance to users, thereby reducing path loss and fully exploiting IRS passive beamforming. This placement strengthens legitimate signals while suppressing the eavesdropping link, leading to enhanced secrecy performance. Conclusions This study addresses secure communication scenarios with multiple eavesdroppers by proposing an IRS-assisted secure resource allocation algorithm for UAV-enabled RSMA systems. An optimization problem is formulated to maximize the system secrecy rate under multiple constraints, including UAV transmit power, by jointly optimizing precoding vectors, common rate allocation, IRS configurations, and UAV positioning. Due to the non-convex nature of the problem, a hierarchical optimization framework is developed to decompose it into two subproblems. These are effectively solved using techniques such as SCA, SDR, Gaussian randomization, and PSO. Simulation results confirm that the proposed algorithm achieves substantial secrecy rate gains over three benchmark schemes, thereby validating its effectiveness.

Radio Map Enabled Path Planning for Multiple Cellular-Connected Unmanned Aerial Vehicles

ZHOU Decheng, WANG Wei, SHAO Xiang, CHEN Mei, XIAO Jianghao

2026, 48(4): 1698-1707. doi: 10.11999/JEIT250821

[Abstract](526) [FullText HTML] (288) [PDF 2063KB](68)

Abstract:
Objective In collaborative operation scenarios of cellular-connected Unmanned Aerial Vehicles (UAVs), conflict avoidance strategies often cause unbalanced service quality. Traditional schemes focus on reducing total task completion time but do not ensure service fairness. To address this issue, a radio map-assisted cooperative path planning scheme is proposed. The objective is to minimize the maximum weighted sum of task completion time and communication disconnection time across all UAVs to improve service fairness in multi-UAV scenarios. Methods A Signal-to-Interference-plus-Noise Ratio (SINR) map is constructed to assess communication quality. The two-dimensional airspace is discretized into grids, and link gain maps are generated through ray tracing and Axis-Aligned Bounding Box detection to determine Line-of-Sight (LoS) or Non-Line-of-Sight (NLoS) conditions. The SINR map is produced by selecting, for each grid, the base station with the highest expected SINR. To solve the optimization problem, an Improved Conflict-Based Search (ICBS) algorithm with a hierarchical structure is developed. At the high-level stage, proximity conflicts are managed to maintain safety distances, and the cost function is reformulated to emphasize fairness by minimizing the maximum weighted time. The low-level stage applies a bidirectional A* algorithm for single-UAV path planning, using parallel search to improve efficiency while meeting the constraints set by the high-level stage. Results and Discussions The proposed scheme is evaluated through simulations across different scenarios. Building heights and positions are shown, where base station locations are marked by red stars and building heights are represented with color gradients from light to dark to indicate increasing height (Fig. 2). 
The wireless propagation characteristics between UAVs and ground base stations are demonstrated by the SINR map at an altitude of 60 m (Fig. 3), which shows significant SINR degradation in areas affected by building blockage and co-channel interference, resulting in communication blind zones. Trajectory planning results for four UAVs at an altitude of 60 m with an SINR threshold of 2 dB show that all UAVs avoid signal blind zones and complete tasks without collision risks under the proposed scheme (Fig. 4). The trade-off between task completion time and disconnection time is controlled by the weight coefficient (Fig. 5). The maximum weighted time increases monotonically as the weight coefficient increases, whereas the maximum disconnection time decreases. The bidirectional A* algorithm achieves higher computational efficiency than Dijkstra’s and traditional A* algorithms while maintaining optimal solution quality (Table 1). All three algorithms yield identical weighted times, confirming the optimality of the bidirectional A* approach, and its runtime is reduced significantly due to parallel search. Compared with three benchmark schemes, the proposed scheme achieves the lowest maximum weighted time for different SINR thresholds (Fig. 6). Performance analysis at different UAV altitudes shows that the proposed scheme maintains stable maximum weighted time below 75 m, while sharp increases appear above 75 m due to intensified interference from non-serving base stations (Fig. 7). The scalability analysis further shows clear improvements over benchmark schemes, especially when conflicts occur more frequently (Fig. 8). Conclusions To address fairness in cellular-connected multi-UAV systems, a radio map-assisted path planning scheme is proposed to minimize the maximum weighted time. Based on a discretized SINR map, an ICBS algorithm is developed. At the high-level stage, proximity conflicts and a reformulated cost function ensure safety and fairness, and at the low-level stage, a bidirectional A* algorithm increases search efficiency. 
Simulation results show that the proposed scheme lowers the maximum weighted time compared with benchmark schemes and improves fairness and overall multi-UAV collaboration performance.
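
The weighted-time objective described above can be sketched on a toy grid. The code is a simplified stand-in under stated assumptions: each grid move takes one time unit, a cell counts as disconnected when its SINR is below the threshold, and the per-UAV cost is travel steps plus `weight` times disconnected steps. It uses plain Dijkstra search rather than the bidirectional A* of the paper and ignores inter-UAV conflicts; the map values, missions, and the name `weighted_time` are illustrative.

```python
import heapq

def weighted_time(sinr_map, start, goal, threshold_db, weight):
    """Minimum of (travel steps + weight * disconnected steps) on a 4-connected grid."""
    rows, cols = len(sinr_map), len(sinr_map[0])
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return cost
        if cost > best[(r, c)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Entering a cell below the SINR threshold adds a penalty.
                penalty = weight if sinr_map[nr][nc] < threshold_db else 0.0
                new_cost = cost + 1.0 + penalty
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return float("inf")

# Toy SINR map in dB; the middle column is a blind zone except in the bottom row.
sinr_map = [
    [5.0, -3.0, 5.0],
    [5.0, -3.0, 5.0],
    [5.0,  4.0, 5.0],
]
missions = [((0, 0), (0, 2)), ((2, 0), (2, 2))]
costs = [weighted_time(sinr_map, s, g, threshold_db=2.0, weight=5.0) for s, g in missions]
max_weighted_time = max(costs)  # the quantity a min-max planner would try to reduce
```

With a weight of 5, the first UAV detours around the blind zone (six steps, no disconnection) instead of crossing it. With a weight of 1 the direct crossing becomes cheaper, which mirrors the trade-off that the abstract attributes to the weight coefficient.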

Entropy-Enhanced Quantum Ripple Synergy Planning Method for Emergency Path of Unmanned Aerial Vehicles Driven by Survival Probability

WANG Enliang, ZHANG Zhen, SUN Zhixin

2026, 48(4): 1708-1718. doi: 10.11999/JEIT250694

[Abstract](510) [FullText HTML] (290) [PDF 4452KB](45)

Abstract:
Objective Natural disaster emergency rescue places stringent requirements on the timeliness and safety of Unmanned Aerial Vehicle (UAV) path planning. Conventional optimization objectives, such as minimizing total distance, often fail to reflect the critical time-sensitive priority of maximizing the survival probability of trapped victims. Moreover, existing algorithms struggle with the complex constraints of disaster environments, including no-fly zones, caution zones, and dynamic obstacles. To address these challenges, this paper proposes an Entropy-Enhanced Quantum Ripple Synergy Algorithm (E²QRSA). The primary goals are to establish a survival probability maximization model that incorporates time decay characteristics and to design a robust optimization algorithm capable of efficiently handling complex spatiotemporal constraints in dynamic disaster scenarios. Methods E²QRSA enhances the Quantum Ripple Optimization framework through four key innovations: (1) information entropy-based quantum state initialization, which guides population generation toward high-entropy regions; (2) multi-ripple collaborative interference, which promotes beneficial feature propagation through constructive superposition; (3) entropy-driven parameter control, which dynamically adjusts ripple propagation according to search entropy rates; and (4) quantum entanglement, which enables information sharing among elite individuals. The model employs a survival probability objective function that accounts for time-sensitive decay, base conditions, and mission success probability, subject to constraints including no-fly zones, warning zones, and dynamic obstacles. Results and Discussions Simulation experiments are conducted in medium- and large-scale typhoon disaster scenarios. The proposed E²QRSA achieves the highest survival probabilities of 0.847 and 0.762, respectively (Table 1), exceeding comparison algorithms such as SEWOA and PSO by 4.2～16.0%. 
Although the paths generated by E²QRSA are not the shortest, they are the most effective in maximizing survival chances. The ablation study (Table 3) confirms the contribution of each component, with the removal of multi-ripple interference causing the largest performance decrease (9.97%). The dynamic coupling between search entropy and ripple parameters (Fig. 2) is validated, demonstrating the effectiveness of the adaptive control mechanism. The entanglement effect (Fig. 4) is shown to maintain population diversity. In terms of constraint satisfaction, E²QRSA-planned paths consume only 85.2% of the total available energy (Table 5), ensuring a safe return, and all static and dynamic obstacles are successfully avoided, as visually verified in the 3D path plots (Figs. 6 and 7). Conclusions E²QRSA effectively addresses the challenge of UAV path planning for disaster relief by integrating adaptive entropy control with quantum-inspired mechanisms. The survival probability objective captures the essential requirements of disaster scenarios more accurately than conventional distance minimization. Experimental validation demonstrates that E²QRSA achieves superior solution quality and faster convergence, providing a robust technical basis for strengthening emergency response capabilities.

Peak-to-Average Power Ratio Reduction Theory and Method for Orthogonal Time Frequency Space Systems via Nonzero-Unitary Precoding

ZENG Junlong, JIANG Zhanjun, LIU Haoxiang, ZHANG Huawei, LI Cuiran

2026, 48(4): 1719-1728. doi: 10.11999/JEIT250888

[Abstract](363) [FullText HTML] (193) [PDF 2431KB](48)

Abstract:
Objective Orthogonal Time Frequency Space (OTFS) and its variants provide robust performance in high-mobility doubly selective channels. However, their inherently high Peak-to-Average Power Ratio (PAPR) limits power amplifier efficiency and practical implementation. Recent observations have revealed a mismatch between theory and practice. Some OTFS variants obtained by changing the orthogonal basis, such as DCT-based designs, reduce PAPR while maintaining an OTFS-like Bit Error Rate (BER). However, the prevailing explanation mainly attributes reliability to constant-modulus unitary transforms and does not directly account for such non-constant-modulus cases. Therefore, it remains unclear which unitary bases preserve the channel-hardening behavior that stabilizes effective gains and protects BER, and which unitary choices may degrade performance even though they are mathematically unitary. This paper aims to close this gap by establishing a verifiable and more general condition for BER-robust unitary precoding, and by developing a waveform and precoder design approach that suppresses PAPR without sacrificing reliability in OTFS and typical OTFS-like variants. Methods A waveform design framework based on nonzero-unitary precoding is established. An upper bound on effective channel-gain fluctuation is derived. It is shown that when the precoder satisfies a nonzero and near-uniform energy-spreading condition, the variance of the effective channel coefficients decreases as the time-frequency grid grows, indicating the emergence of a channel-hardening effect. On the basis of this result, waveform design is formulated as a peak-power minimization problem over the unitary precoder. The objective is to reduce the maximum instantaneous power while preserving the unitary structure required by the modulation framework. A CVX-based solver is used to provide a performance-reference benchmark for the formulated objective. 
For engineering implementation, an efficient algorithm is developed using the Alternating Direction Method of Multipliers (ADMM). In this method, the original nonconvex design is decomposed into low-cost sub-updates together with a unitary projection step, which enables scalable computation. Results and Discussions Simulation results under representative doubly selective channels with high terminal speeds show that the proposed precoder design achieves noticeable PAPR suppression while maintaining the BER close to that of conventional constant-modulus unitary precoding. In addition, the CVX-based benchmark reveals the attainable performance region, and the ADMM-based implementation approaches this reference with a favorable PAPR-BER trade-off. The computational advantage is also validated. Compared with general-purpose convex optimization, the ADMM solver reduces the overall runtime and complexity by roughly three orders of magnitude for typical OTFS parameter settings, which supports real-time or near-real-time deployment. The observed performance trends are consistent with the theoretical insight that near-uniform energy spreading stabilizes effective channel gains and prevents spiky basis vectors from degrading robustness. Furthermore, the framework is applicable to OTFS variants because basis selection and waveform shaping can be interpreted equivalently as unitary-precoder design within the same optimization architecture. Conclusions A theoretical and algorithmic solution for PAPR suppression in OTFS systems is presented through nonzero-unitary precoding. Channel hardening is established under a nonzero and near-uniform energy-spreading condition, which provides a principled justification for seeking low-PAPR solutions beyond constant-modulus transforms. A peak-power minimization formulation is adopted to translate this insight into waveform optimization, and a CVX benchmark is provided to quantify the achievable performance reference. 
A low-complexity ADMM algorithm is then constructed to enable scalable computation through simple sub-updates and unitary projection, while keeping BER performance essentially unchanged. The proposed approach provides a unified low-PAPR waveform design paradigm for OTFS and its variants, with theoretical generality, computational efficiency, and controllable performance under high-mobility doubly selective channels.
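
As a reminder of the metric being minimized, the sketch below computes the PAPR of a discrete-time signal and contrasts a spiky waveform with a constant-modulus one. It is not an OTFS modulator: a plain unitary inverse DFT stands in for the transform, and the block length, test sequences, and function names are illustrative assumptions.

```python
import cmath
import math

def idft(symbols):
    """Unitary inverse DFT (1/sqrt(N) scaling), so average power is preserved."""
    n = len(symbols)
    return [
        sum(s * cmath.exp(2j * math.pi * k * t / n) for k, s in enumerate(symbols)) / math.sqrt(n)
        for t in range(n)
    ]

def papr_db(signal):
    """Peak-to-average power ratio of a discrete-time signal, in dB."""
    powers = [abs(x) ** 2 for x in signal]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))

n = 8
spiky = idft([1.0 + 0j] * n)  # identical symbols add coherently at t = 0
flat = [cmath.exp(1j * math.pi * t * t / n) for t in range(n)]  # constant-modulus sequence
papr_spiky = papr_db(spiky)
papr_flat = papr_db(flat)
```

The all-equal symbol block concentrates its energy in one sample, giving a PAPR of 10·log10(8) ≈ 9.03 dB, whereas the constant-modulus sequence sits at 0 dB. A peak-power minimization over the precoder, as formulated in the abstract, pushes the transmitted waveform toward the second case.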

Research on Recognition Method in Mixture Scenarios of Ships and Floating Targets

DING Hao, LI Ao, CAO Zheng, LIU Ningbo, WANG Guoqing, SUN Dianxing

2026, 48(4): 1729-1739. doi: 10.11999/JEIT251119

[Abstract](454) [FullText HTML] (305) [PDF 3787KB](54)

Abstract:
Objective In radar maritime target detection scenarios, when two or more targets are located within the same range cell, mixture echoes are generated, such as echoes containing both ships and floating targets. Existing target recognition methods exhibit notable limitations in these scenarios because they typically focus on the Doppler channel with the strongest energy in the time-frequency domain. To address this issue, a target recognition method that integrates mode reconstruction and time-frequency features is proposed. The aim is to distinguish individual targets without prior knowledge of whether the received echoes contain mixture targets, thereby avoiding reliance on high range resolution or multipolarization information. Methods The core idea is to introduce Variational Mode Decomposition (VMD) to decompose radar echoes into multiple modal components, thereby enabling Doppler-channel separation. To address spurious modes and the fragmented representation of a single target across multiple modes after decomposition, an energy-constrained mode filtering method and a spectral-consistency-based mode clustering method are proposed for effective mode selection and reconstruction. Based on the reconstructed signals, time-frequency differences between ships and floating targets are analyzed in terms of micromotion and signal complexity. Features are extracted from two perspectives: motion stability and the disorder degree of energy distribution, referred to as VF and REDDC features, respectively. These features enable accurate identification of individual targets. Results and Discussions Experiments are conducted using X-band radar measured data under sea states 2～4 (Table 1 and Table 2). The results show that the proposed method achieves an average recognition accuracy of 97.32% in mixture scenarios. This performance significantly exceeds that of the existing four-feature recognition method (Table 3) and other advanced methods (Fig. 9). 
The effect of frequency separation between different targets is further examined. When the time-frequency ridge spacing exceeds 70 Hz, the recognition accuracy reaches 97.93% (Fig. 11). This result also provides empirical guidance for selecting an appropriate clustering threshold during the mode reconstruction stage. When mixture scenarios change to single-target scenarios due to relative motion, the proposed method achieves an average recognition accuracy of 93.34%. This value is 4.62 percentage points higher than that of the existing four-feature method (88.72%) (Table 4). Additional analysis indicates that the observation duration used for feature extraction should be no less than 0.25 s to maintain the expected recognition accuracy (Fig. 12). Conclusions This study examines recognition problems in maritime multi-target mixture scenarios. VMD is applied to separate the constituent components of mixture echoes. To address spurious modes and fragmented representation of target information across multiple modes, an energy-constrained mode filtering method and a spectral-consistency-based mode clustering method are proposed. VF and REDDC features are extracted from the perspectives of structural characteristics and signal complexity. A Support Vector Machine (SVM) classifier is then used for target recognition. Performance analysis confirms that the proposed method effectively identifies each constituent target in mixture echoes and maintains strong recognition performance in single-target scenarios. Future work will improve computational efficiency and real-time capability by optimizing the stopping criteria of VMD iterations and will further examine the application boundaries of the method using measured data under higher sea states.

DGCN-MFW: A Lightweight Human Action Recognition Network for Millimeter-Wave Radar 3D Point Clouds

DING Xuanyu, JIN Biao, ZHANG Zhenkai

2026, 48(4): 1740-1750. doi: 10.11999/JEIT251087

[Abstract](555) [FullText HTML] (412) [PDF 4318KB](55)

Abstract:
Objective Millimeter-wave radar 3D point clouds provide important spatial cues for human action recognition. However, their inherent disorder complicates feature extraction, and actions rely on temporal correlations across multiple frames, which makes single-frame analysis prone to error. In this paper, a dynamic graph convolutional network is proposed for long 3D point-cloud sequences to improve recognition performance and efficiency through multi-scale feature fusion, adaptive frame weighting, and cross-attention. Methods A dynamic graph convolutional network solution, DGCN-MFW, is proposed with three core components: dynamic graph convolution feature extraction, multi-scale feature fusion, and adaptive temporal frame weighting. In Step 1, dynamic graph convolution is used to automatically construct spatial geometry through local directed neighborhood graphs, and the neighborhoods are updated online. This design avoids manual graph construction and improves feature robustness. In Step 2, multi-scale feature fusion is applied to jointly extract and integrate point-cloud features across spatial and temporal dimensions, thereby capturing local details and global semantics. In Step 3, adaptive frame weighting is introduced to learn the importance of each frame, emphasize discriminative key frames, and suppress noisy or unimportant frames. Cross-attention is further used to enable information exchange between the center frame and its context, compensating for the limitations of single-frame analysis caused by motion blur, occlusion, or pose ambiguity. Results and Discussions The proposed network extracts features through dynamic graph convolution, performs multi-scale feature fusion and adaptive frame weighting, and ultimately completes human action recognition. It achieves strong performance on the public TI and Vayyar millimeter-wave radar point-cloud datasets. With only 2.06M parameters and 4.51 GFLOPs, it outperforms existing methods (Tables 2, 3, and 4). 
Ablation experiments confirm that both core modules substantially improve recognition accuracy (Table 1). The confusion matrices indicate accuracy above 99% for most actions on the two datasets, demonstrating superior recognition performance (Figs. 10 and 11). However, its scalability, parameter efficiency, and processing efficiency for large-scale data still require improvement. Future work will therefore focus on further lightweight design and architectural optimization to improve efficiency. Conclusions To address the two main challenges in mmWave radar 3D point-cloud-based human action recognition, an action recognition algorithm based on a dynamic graph convolutional network and multi-feature fusion is proposed. A multi-scale feature fusion module and cross-scale interaction are used to extract local and global features, which improves spatial representation. An adaptive frame-weighting module and a cross-attention mechanism are adopted to capture the temporal evolution of actions. The method achieves accuracies of 98.32% and 99.48% on two datasets with 2.06M parameters and 4.51 GFLOPs, outperforming mainstream models. It provides a new solution for high-precision, low-resource mmWave radar action recognition and is suitable for real-time scenarios such as industrial human-machine interaction, intelligent security, and healthcare.
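
Two of the building blocks named above, neighborhood graph construction and frame weighting, can be sketched in a few lines. This is a simplified illustration, not the DGCN-MFW network: neighbors are found in raw coordinate space with a fixed `k`, and the frame scores are hand-set, whereas in a trained model the graph would be rebuilt in feature space and the scores would be learned. All names and values are illustrative assumptions.

```python
import math

def knn_edges(points, k):
    """For each point, return indices of its k nearest neighbours (directed edges)."""
    edges = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.append([j for _, j in dists[:k]])
    return edges

def weight_frames(frame_features, scores):
    """Softmax the per-frame scores and return the weighted sum of frame features."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, frame_features)) for d in range(dim)]
    return weights, fused

points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
edges = knn_edges(points, k=2)
weights, fused = weight_frames([[1.0, 0.0], [0.0, 1.0]], scores=[0.0, 0.0])
```

Because the neighbor lists are recomputed from the current point set, no manual graph definition is needed, which is the property the abstract highlights. Equal scores give equal frame weights; raising one score shifts the fused feature toward that frame.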

Multi-Scale Deformable Alignment-Aware Bidirectional Gated Feature Aggregation for Stereoscopic Image Generation from a Single Image

ZHANG Chunlan, QU Yuwei, NIE Lang, LIN Chunyu

2026, 48(4): 1751-1762. doi: 10.11999/JEIT250760

[Abstract](415) [FullText HTML] (291) [PDF 12920KB](38)

Abstract:
Objective The generation of stereoscopic images from a single image usually relies on depth as a prior, which often leads to geometric misalignment, occlusion artifacts, and texture blurring. Recent studies have therefore shifted toward end-to-end learning of alignment transformation and rendering within the image or feature domain. By adopting a content-based feature transformation and alignment mechanism, high-quality novel images can be generated without explicit geometric information. However, three main challenges remain. First, fixed convolution has limited ability to model large-scale geometric and disparity changes, which restricts feature alignment performance. Second, texture and structural information are tightly coupled in network representations, and hierarchical modeling and dynamic fusion mechanisms are often absent. This limitation makes it difficult to preserve fine details while maintaining semantic consistency. Third, existing supervision strategies mainly focus on reconstruction errors and provide limited constraints on the intermediate alignment process, which reduces the efficiency of cross-view feature consistency learning. To address these challenges, a Multi-Scale Deformable Alignment-Aware Bidirectional Gated Feature Aggregation network is proposed for stereoscopic image generation from a single image. Methods First, to address image misalignment and distortion caused by the inability of fixed convolution to adapt to geometric deformation and disparity changes, a Multi-Scale Deformable Alignment (MSDA) module is proposed. This module employs multi-scale deformable convolution to adaptively adjust sampling positions based on image content, enabling effective alignment between source and target features across different scales. 
Second, to address texture blurring and structural distortion in synthesized images, a feature decoupling strategy is adopted to guide shallow layers to learn texture information and deeper layers to model structural information. A Texture-Structure Bidirectional Gating Feature Aggregation (Bi-GFA) module is designed to achieve dynamic complementarity and efficient fusion of texture and structural features. Third, to improve cross-view feature alignment accuracy, a Learnable Alignment-Guided Loss (LAG) function is proposed. This loss guides the alignment network to adaptively refine the offset field at the feature level, thereby improving the fidelity and semantic consistency of the synthesized images. Results and Discussions This study focuses on scene-level image synthesis from a single image. Quantitative results show that the proposed method performs better than all compared methods in terms of PSNR, SSIM, and LPIPS. The method also maintains stable performance across different dataset sizes and scene complexities, indicating strong generalization ability and robustness (Tab. 1 and Tab. 2). Qualitative comparisons indicate that the generated images are visually closest to the ground-truth images and exhibit high overall sharpness and detail fidelity. In the outdoor KITTI dataset, pixel alignment errors of foreground objects are effectively reduced (Fig. 4). In indoor scenes, facial and hair textures are clearly reconstructed. High-frequency regions, such as champagne towers and balloon edges, present sharp contours and accurate color reproduction without visible artifacts or blurring. Both global illumination and local structural details are well preserved, producing high perceptual quality (Fig. 5). Ablation experiments further confirm the effectiveness of the proposed MSDA, Bi-GFA, and LAG modules (Tab. 3). 
Conclusions A Multi-Scale Deformable Alignment-Aware Bidirectional Gated Feature Aggregation network is proposed to address strong dependence on ground-truth depth, geometric misalignment and distortion, texture blurring, and structural distortion in stereoscopic image generation from a monocular image. The MSDA module improves the flexibility and accuracy of cross-view feature alignment. The Texture-Structure Bi-GFA module enables complementary fusion of texture details and structural information. The LAG further refines offset field estimation and improves the fidelity and semantic consistency of the synthesized images. Experimental results show that the proposed method performs better than existing advanced methods in structural reconstruction, texture clarity, and viewpoint consistency, while maintaining strong generalization ability and robustness. Future work will examine the effect of different depth estimation strategies on system performance and investigate more efficient network architectures and model compression methods to reduce computational cost and support real-time stereoscopic image generation.
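PSNR, one of the metrics reported in this abstract, is computed from the mean squared error between a reference image and a generated image. The sketch below shows the standard definition on a toy 2x2 grayscale image. The pixel values and the 8-bit peak value of 255 are assumptions for illustration and are not taken from the paper.

```python
import math


def mse(image_a, image_b):
    """Mean squared error between two equally sized 2D images (lists of rows)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(reference, generated, max_value=255.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE); infinite for identical images."""
    error = mse(reference, generated)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


# Toy 2x2 images (hypothetical values); two pixels differ slightly.
reference = [[0, 128], [255, 64]]
generated = [[0, 130], [250, 64]]
score = psnr(reference, generated)
```

A higher PSNR means a smaller pixel-wise error. SSIM and LPIPS, also cited in the abstract, measure structural and perceptual similarity and are not covered by this sketch.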

Small Object Detection Algorithm for UAV Aerial Images in Complex Environments

LIU Jie, LIU Shuhao, TIAN Ming, CUI Zhigang

2026, 48(4): 1763-1773. doi: 10.11999/JEIT251126

[Abstract](866) [FullText HTML] (649) [PDF 6050KB](132)

Abstract:
Objective Small object detection is critical in applications such as UAV (Unmanned Aerial Vehicle) inspection and intelligent transportation systems, where accurate perception of diminutive targets is essential for operational reliability and safety. It supports automated identification and tracking of challenging targets. However, the limited pixel size of small objects, combined with frequent occlusion and background integration, introduces strong background noise and leads to poor performance and high false-negative rates in existing detection models. To address these issues and to achieve high-performance and high-precision detection of small objects in complex scenes, this study proposes HAR-DETR, an enhanced version of the RT-DETR baseline model, designed to improve detection accuracy for small objects. Methods HAR-DETR is designed for small object detection in aerial images and integrates three major improvements: Aggregated Attention, RFF-FPN (Recalibrated Feature Fusion Network-FPN), and a high-resolution detection branch. In the backbone, Aggregated Attention strengthens the model’s focus on relevant features of small objects. By expanding the receptive field, the model captures detailed edge and texture information, improving multi-scale feature extraction. During feature fusion, RFF-FPN selectively integrates high- and low-level features to retain critical spatial information and context. This supports better reconstruction of edges and contours of small objects and improves localization and recognition, particularly when object details are partially obscured by cluttered backgrounds or variable lighting. The high-resolution detection branch (HRDB) emphasizes edge features of small objects, enhancing perception and improving robustness and precision. 
Results and Discussions The model is compared with commonly used object detection models, including YOLOv5, YOLOv8, and YOLOv10, using precision, recall, and mAP metrics to assess performance in small object detection. Experimental results show that HAR-DETR outperforms the comparative models on the VisDrone2019 dataset (Table 1). The mAP50 and mAP50-95 increase by 3.8% and 3.2%, respectively, relative to the baseline model (Table 2). These results demonstrate superior detection performance in aerial images under complex conditions. GradCAM heatmaps are used for comparative analysis and show consistent improvements across all proposed components compared with the baseline model (Fig. 6). In the generalization experiment, the VisDrone2019 validation set and RSOD dataset are evaluated under identical training settings. The results confirm that HAR-DETR maintains strong generalization across heterogeneous tasks (Tables 3 and 4). Conclusions This work addresses false positives and false negatives in small object detection for aerial images captured in complex environments by using HAR-DETR. Aggregated Attention is used in the backbone to expand the receptive field and improve global feature extraction. During feature fusion, the RFF-FPN structure strengthens feature representation. A high-resolution detection head further increases sensitivity to edge textures of small objects. Evaluation on the VisDrone2019 and RSOD datasets shows: (1) mAP50 and mAP50-95 improve by 3.8% and 3.2%, respectively, reaching 51.2% and 32.1%, which reduces false negatives and false positives; (2) HAR-DETR outperforms mainstream object detection models, confirming its effectiveness; (3) the model achieves high accuracy in cross-dataset training, demonstrating strong generalization. 
These results show that HAR-DETR has stronger semantic representation and spatial awareness, adapts well to varied aerial perspectives and target distributions, and provides a more versatile solution for UAV visual perception in complex environments.
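The mAP metrics in this abstract rest on Intersection over Union (IoU). Under the usual convention, a detection counts as correct at the 50% level when its IoU with a ground-truth box is at least 0.5, and the 50-95 variant averages over thresholds from 0.5 to 0.95. The Python sketch below uses hypothetical box coordinates to show why small objects are hard: a shift of a few pixels is enough to push a 10x10 box below the threshold.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(prediction, ground_truth, threshold=0.5):
    """A prediction matches when its IoU reaches the chosen threshold."""
    return iou(prediction, ground_truth) >= threshold


# Hypothetical boxes: a 10x10 px object and two horizontally shifted predictions.
small_truth = (10, 10, 20, 20)
shift_3px = (13, 10, 23, 20)
shift_4px = (14, 10, 24, 20)

# The same 4 px shift applied to a 100x100 px object barely changes the IoU.
large_truth = (0, 0, 100, 100)
large_shift_4px = (4, 0, 104, 100)
```

With these values the 3-pixel shift still passes the 0.5 threshold while the 4-pixel shift fails, whereas the large box stays well above it. This is one reason localization precision matters so much for small-object benchmarks.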

Research on Ultrasound Imaging Algorithm Fused with Diffusion Model

YUAN Ye, HUANG Minshang, YANG Weifeng

2026, 48(4): 1774-1784. doi: 10.11999/JEIT251083

[Abstract](486) [FullText HTML] (247) [PDF 7319KB](80)

Abstract:
Objective Medical ultrasound imaging uses ultrasonic waves to probe human tissues and forms images by processing returning echoes. It has become an essential clinical diagnostic tool because it is noninvasive, safe, and capable of real-time imaging. However, conventional ultrasound imaging remains fundamentally limited by factors such as the finite width of ultrasonic pulses, variations in tissue acoustic impedance, and the complexity of echo signals. These factors lead to persistent challenges, including limited spatial resolution, severe speckle noise, and off-axis artifacts. These limitations directly reduce lesion detectability and diagnostic accuracy. Traditional approaches based on hardware optimization and signal processing algorithms, such as adaptive beamforming, have provided only incremental improvement. Their performance is often constrained by physical laws, computational complexity, and dependence on manual parameter tuning. Recent deep learning methods, particularly those based on Generative Adversarial Networks (GANs), have shown promising performance, but they suffer from training instability and limited interpretability. The diffusion model, an emerging state-of-the-art generative framework, has shown strong robustness and generalization in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) reconstruction. However, its application in ultrasound imaging remains largely unexplored. This study aims to address this gap by developing a novel diffusion model-based framework for high-quality ultrasound image formation and to provide a stable, efficient, and interpretable solution for improving ultrasound image quality. Methods A novel ultrasound imaging method based on a Denoising Diffusion Probabilistic Model (DDPM) is proposed. 
The core of the method is a multi-scale diffusion network architecture designed to progressively refine a low-quality ultrasound image, such as one generated by a simple Delay-And-Sum (DAS) beamformer, into a high-quality image. The process includes forward and reverse stages. In the forward stage, Gaussian noise is gradually added to a high-quality ground-truth image over a series of time steps. In the reverse stage, the model is trained to learn the conditional denoising function. A custom denoising network takes a low-resolution DAS image as conditional input and fuses it with the noisy image at each denoising step through residual connections and feature-wise transformations at multiple scales. This deep fusion mechanism enables the network to incorporate the underlying anatomical structure from the low-quality input while iteratively removing noise and artifacts through the diffusion process. The model is trained on a dataset of paired low-quality and high-quality ultrasound images, in which the high-quality images serve as the training target. The training objective is to maximize the variational lower bound of the likelihood, thereby enabling the network to reverse the noising process. The proposed method is quantitatively compared with traditional DAS, Minimum Variance (MV) beamforming, and a representative GAN-based super-resolution method using Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity index (SSIM). Results and Discussions The proposed diffusion model demonstrates superior performance in improving ultrasound image quality. Quantitatively, the method achieves a mean PSNR of 35.2 dB and an SSIM of 0.933, with a PSNR improvement of 4.5 dB over conventional beamforming methods, while maintaining excellent structural fidelity. The method also consistently outperforms adaptive MV beamforming and GAN-based methods across all evaluation metrics, including contrast-to-noise ratio. Visual assessment supports these quantitative results. 
The generated images show markedly reduced speckle noise and substantially improved boundary definition of anatomical structures. Notably, these improvements are achieved without the blurring or artificial textures commonly observed in other deep learning-based methods. The multi-scale architecture with conditional feature injection effectively preserves structural integrity, as shown by the clear and continuous edges in the output images. The progressive denoising nature of the method also provides inherent interpretability for the image refinement process. Unlike the opaque single-step generation used in many other deep learning models, this method provides a transparent, stepwise enhancement pathway from the initial input to the final output. In addition, the training process remains stable and convergent, avoiding the instability that frequently affects adversarial training methods. Ablation experiments confirm the critical role of the deep fusion mechanism, and resolution analysis verifies substantial improvement in both lateral and axial resolution compared with all baseline methods. Conclusions This study develops and validates a novel ultrasound imaging method based on a diffusion model. The proposed framework effectively addresses key limitations of conventional methods and existing deep learning-based approaches. It avoids the complex matrix computations and manual parameter tuning required by adaptive beamformers and provides a more stable training framework than GAN-based methods. The results show that the method can substantially improve image quality by increasing PSNR and maintaining excellent structural similarity, thereby producing images with suppressed noise, reduced artifacts, and improved resolution. The multi-scale diffusion process preserves anatomical structures and provides a degree of interpretability for the image generation process. 
This work establishes diffusion models as a promising new framework for advanced ultrasound imaging and provides a robust, high-performance technical route for overcoming current bottlenecks in ultrasound image quality, with broad potential clinical value.
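The forward stage described in this abstract, in which Gaussian noise is gradually added to a clean image, has a closed form in the standard DDPM formulation: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The Python sketch below implements only that step. The linear beta schedule, the 1000 steps, and the toy four-pixel "image" are assumptions for illustration. The paper's conditional multi-scale denoising network is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(clean_pixels, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in clean_pixels]
    noisy = [signal_scale * x + noise_scale * e
             for x, e in zip(clean_pixels, noise)]
    return noisy, noise


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
clean = [0.2, -0.5, 0.9, 0.0]  # toy "image" scaled to [-1, 1]
noisy_early, _ = forward_noise(clean, 10, alpha_bars, rng)
noisy_late, _ = forward_noise(clean, 999, alpha_bars, rng)
```

Early steps keep most of the signal, while the last step is almost pure noise. In the reverse stage a network is trained to undo this corruption. According to the abstract, the proposed method also conditions that network on the low-quality DAS image.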

A Long-Short Term Fusion Spiking Neural Network for Detecting Tiny Moving Targets in Dynamic Vision

LI Miao, ZHANG Heng, CHEN Nuo, SHI Yangsi, HE Shiman, AN Wei

2026, 48(4): 1785-1794. doi: 10.11999/JEIT250785

[Abstract](512) [FullText HTML] (301) [PDF 2950KB](53)

Abstract:
Objective Long-distance electro-optical surveillance systems are widely used for applications such as space debris monitoring and unauthorized drone flight warning. In such systems, targets appear randomly and move rapidly. Because of the long detection distance, targets appear extremely small in the optical sensor and lack obvious morphological or texture features; therefore, they are classified as tiny moving targets. Conventional tiny-target perception mechanisms adopt the “image frame imaging + artificial neural network processing” paradigm. This approach generates large data volumes and requires high computational power and energy consumption, which restricts system lightweight deployment. In recent years, inspired by bionic perception and brain-like processing, the paradigm of “dynamic vision detection + brain-like processing” has emerged as a new direction. Dynamic vision provides low redundancy and high temporal resolution. However, its output is not regular image frames but sparse event streams, which require new processing methods. The Spiking Neural Network (SNN) is regarded as the third-generation neural network. It uses sparse connections and spike-based representations and naturally matches the asynchronous event triggering and bright-dark pulse output of dynamic vision sensors. Existing SNN-based methods mainly focus on targets with clear shapes in scenarios such as autonomous driving and are not well suited for tiny moving targets in long-distance electro-optical surveillance systems. To address this problem, a Long-Short Term Fusion SNN is proposed to support the application of dynamic vision in tiny moving target detection. Methods The proposed network architecture contains four main components. First, a short-term feature extraction module, the Spiking Swin Transformer (SST), is designed to capture the morphological expansion characteristics of tiny targets. 
This module focuses on spatiotemporal correlations across adjacent time steps and spatial regions. It integrates a spiking self-attention mechanism to enhance the learning of irregular pixel correlations and temporal dependencies. Second, a long-term feature extraction module, the spiking ConvLSTM (SCL), is proposed to learn motion continuity embedded in long temporal sequences. A longer temporal range provides richer learnable motion features. The SCL is designed based on the ANN-style ConvLSTM architecture and takes advantage of the inherent temporal processing capability of spiking recurrent neural networks to strengthen long-term temporal memory. Third, features from the SST and SCL branches are aligned and integrated through tensor alignment and additive fusion, forming the Spiking Feature Pyramid Network (SFPN). This module performs spiking pyramid operations to fuse cross-scale spatiotemporal features across different network depths. Finally, a detection head is used to extract and identify tiny targets. Results and Discussions The proposed algorithm is validated using real dynamic vision data for drone detection. Experimental results show clear performance improvements across several evaluation metrics. Compared with methods that rely only on short-term temporal features, the proposed method increases recall by about 1.3% and improves precision by 0.9%, which allows more reliable detection of tiny moving targets. Analysis of the F1-score further indicates that recall improves by 1.3% while false alarms are reduced. These results confirm that the dual-path spiking memory network for long-term feature extraction strengthens the ability of the model to identify subtle target characteristics. In particular, the integration of long-term temporal features improves discrimination between noise events and genuine tiny targets. 
Conclusions This study addresses tiny moving target detection under dynamic vision and proposes a method based on Long-Short Term Fusion SNN. Considering the morphological expansion characteristics and motion continuity of tiny targets, the SST module and the SCL module are designed to extract short-term and long-term temporal features. Multi-scale dual-path features are fused through a spiking pyramid module. By learning high-dimensional features across different temporal windows, the method enables deeper mining and automatic learning of limited surface features of tiny targets. Experiments on real dynamic vision data verify the performance advantage of the proposed method, achieving a recall rate above 95% and outperforming comparison algorithms. Ablation experiments further demonstrate that long-term temporal feature learning and larger temporal data ranges improve tiny target detection performance. The proposed method enables natural integration between sparse event streams from dynamic vision sensors and spiking neural mechanisms. It provides algorithmic support for applying the “bionic detection + brain-like processing” perception paradigm in long-distance electro-optical surveillance systems.
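The spike-based processing that this abstract relies on can be illustrated with a single leaky integrate-and-fire (LIF) neuron, a common building block of spiking neural networks. The Python sketch below is generic. The threshold, decay factor, hard reset, and input sequence are hypothetical, and the abstract does not state which neuron model the SST and SCL modules use.

```python
def lif_neuron(input_current, threshold=1.0, decay=0.9, reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    At each step the membrane potential leaks by `decay`, integrates the
    input, and emits a spike (1) followed by a hard reset once it reaches
    `threshold`; otherwise it emits 0.
    """
    potential = reset
    spikes = []
    for current in input_current:
        potential = decay * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = reset
        else:
            spikes.append(0)
    return spikes


# Sparse event-like input: mostly zeros with a short burst of activity.
events = [0.0, 0.0, 0.6, 0.6, 0.0, 0.0, 0.3, 0.0]
spike_train = lif_neuron(events)
```

Only the burst of two consecutive inputs accumulates enough potential to fire, while the isolated weak input leaks away. This temporal filtering is the kind of behavior that makes spiking neurons a natural fit for sparse event streams.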

Multi-scale Frequency Adapter and Dual-path Attention for Time Series Forecasting

YANG Zhenzhen, XU Yi, WAN Chengye, YANG Yongpeng

2026, 48(4): 1795-1805. doi: 10.11999/JEIT251188

[Abstract](483) [FullText HTML] (249) [PDF 4941KB](58)

Abstract:
Objective With the rapid development of big data technology, time series data are increasingly used in meteorology, power systems, finance, and other fields. However, mainstream forecasting methods face challenges in multi-scale modeling and frequency-domain feature extraction, which limit the ability to capture dynamic properties and periodic patterns in complex datasets. Traditional statistical approaches, such as AutoRegressive Integrated Moving Average (ARIMA), rely on assumptions of linear relationships and therefore perform poorly when applied to nonlinear or high-dimensional time series data. Although deep learning methods, particularly those based on convolutional neural networks and Transformer architectures, improve forecasting accuracy through advanced feature extraction and long-range dependency modeling, limitations remain in efficiently extracting and integrating multi-scale features in both temporal and frequency domains. These limitations reduce stability and forecasting accuracy, especially in dynamic and heterogeneous applications. This study proposes an intelligent forecasting framework that models multi-scale information and improves prediction accuracy across different scenarios. Methods A Multi-scale Frequency Adapter and Dual-path Attention (MFADA) framework is proposed for time series forecasting. The framework integrates two key modules: the Multi-scale Frequency Adapter (MFA) and the Multi-scale Dual-path Attention (MDA). The MFA module captures multi-scale frequency features through adaptive pooling and deep convolution operations. This design improves sensitivity to different frequency components and supports modeling of both short-term and long-term dependencies. The MDA module applies a multi-scale attention mechanism to strengthen fine-grained modeling across temporal and feature dimensions. It enables effective extraction and fusion of comprehensive time-domain and frequency-domain information. 
The framework is designed with computational efficiency to ensure scalability. Experiments on eight public datasets verify the effectiveness and robustness of the proposed method compared with existing time series forecasting approaches. Results and Discussions Extensive experiments were conducted on eight publicly available multivariate datasets, including ECL, Weather, ETT (ETTm1, ETTm2, ETTh1, ETTh2), Solar-Energy, and Traffic. Evaluation metrics include Mean Absolute Error (MAE) and Mean Squared Error (MSE). Model complexity was assessed through parameter count, FLoating Point Operations (FLOPs), and training time. Comparisons were performed with state-of-the-art models, including Fredformer, Peri-midFormer, iTransformer, TFformer, PatchTST, MSGNet, TimesNet, and TCM. Results show that MFADA achieves superior forecasting performance on most datasets and forecasting horizons (Table 1). The model obtains the best average MSE and MAE of 0.163 and 0.261 on ECL, representing decreases of 13.2% and 17.3% compared with TimesNet for forecasting length 96. On the periodic ETTm1 dataset, the average MSE reaches 0.377, which is 5.3% lower than MSGNet. Ablation experiments (Table 2) confirm the contributions of the MFA and MDA modules. Removing MFA or replacing MDA with standard self-attention increases forecasting errors on ECL, Weather, ETTh1, and ETTh2. These results indicate the complementary roles of both modules in modeling complex temporal patterns. Complexity analysis (Fig. 2) shows that MFADA achieves a balanced trade-off among forecasting accuracy, parameter efficiency, and training time, outperforming Fredformer, MSGNet, and TimesNet. Visualization results for ECL and ETTh2 (Fig. 3, Fig. 4) demonstrate that MFADA effectively follows ground-truth trends, captures turning points, and improves prediction accuracy at both global and local levels. 
Performance on the Traffic dataset is relatively weaker because of strong spatial correlations in the data, which indicates potential directions for future research. Conclusions This paper proposes MFADA, a time series forecasting method that integrates multi-scale frequency adaptation and dual-path attention mechanisms. MFADA presents four main advantages: (1) The MFA module effectively extracts and integrates multi-scale frequency-domain features through pyramid pooling and channel gating, which improves representation across different temporal scales. (2) The MDA module captures multi-scale dependencies in both temporal and feature dimensions, enabling fine-grained dynamic modeling. (3) The architecture maintains computational efficiency through lightweight convolution and pooling operations. (4) Experimental results across eight datasets and multiple forecasting horizons demonstrate strong generalization ability, particularly for multivariate and long-term forecasting tasks. These results show that MFADA improves both accuracy and efficiency in time series forecasting and provides useful directions for research and practical applications. Future work will explore the integration of spatial correlation information to further improve model applicability.
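The two evaluation metrics named in this abstract, MSE and MAE, and the relative error reduction used for model comparisons can be written in a few lines. The Python sketch below uses made-up series values and a hypothetical helper name, `relative_decrease`. It does not reproduce any number from the paper's tables.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between two equally long series."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of absolute differences between two equally long series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_decrease(baseline_error, model_error):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline_error - model_error) / baseline_error


# Toy ground truth and forecast (hypothetical values).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.0, 4.5]
mse_value = mean_squared_error(actual, predicted)
mae_value = mean_absolute_error(actual, predicted)
```

MSE penalizes large deviations more heavily than MAE, so reporting both gives a fuller picture of forecasting error, for example whether a model occasionally misses turning points by a wide margin.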

Research on Proximal Policy Optimization for Autonomous Long-Distance Rapid Rendezvous of Spacecraft

LIN Zheng, HU Haiying, DI Peng, ZHU Yongsheng, ZHOU Meijiang

2026, 48(4): 1806-1819. doi: 10.11999/JEIT250844

[Abstract](681) [FullText HTML] (479) [PDF 3329KB](79)

Abstract:
Objective With increasing demands from deep-space exploration, on-orbit servicing, and space debris removal missions, autonomous long-distance rapid rendezvous capabilities are required for future space operations. Traditional trajectory planning approaches based on analytical methods or heuristic optimization show limitations when complex dynamics, strong disturbances, and uncertainties are present, which makes it difficult to balance efficiency and robustness. Deep Reinforcement Learning (DRL) combines the approximation capability of deep neural networks with reinforcement learning-based decision-making, which supports adaptive learning and real-time decisions in high-dimensional continuous state and action spaces. In particular, Proximal Policy Optimization (PPO) is a representative policy gradient method because of its training stability, sample efficiency, and ease of implementation. Integration of DRL with PPO for spacecraft long-distance rapid rendezvous is therefore expected to overcome the limits of conventional methods and provide an intelligent, efficient, and robust solution for autonomous guidance in complex orbital environments. Methods A spacecraft orbital dynamics model is established by incorporating J2 perturbation, together with uncertainties arising from position and velocity measurement errors and actuator deviations during on-orbit operations. The long-distance rapid rendezvous problem is formulated as a Markov Decision Process, in which the state space includes position, velocity, and relative distance, and the action space is defined by impulse duration and direction. Fuel consumption and terminal position and velocity constraints are integrated into the model. On this basis, a DRL framework based on PPO is constructed. The policy network outputs maneuver command distributions, whereas the value network estimates state values to improve training stability. 
To address convergence difficulties caused by sparse rewards, an enhanced dense reward function is designed by combining a position potential function with a velocity guidance function. This design guides the agent toward the target while enabling gradual deceleration and improved fuel efficiency. The optimal maneuver strategy is obtained through simulation-based training, and robustness is evaluated under different uncertainty conditions. Results and Discussions Based on the proposed DRL framework, comprehensive simulations are conducted to assess effectiveness and robustness. In Case 1, three reward structures are examined: sparse reward, traditional dense reward, and an improved dense reward that integrates a relative position potential function with a velocity guidance term. The results show that reward design strongly affects convergence behavior and policy stability. Under sparse rewards, insufficient process feedback limits exploration of feasible actions. Traditional dense rewards provide continuous feedback and enable gradual convergence, but terminal velocity deviations are not fully corrected at later stages, which leads to suboptimal convergence and incomplete satisfaction of terminal constraints. In contrast, the improved dense reward guides the agent toward favorable behaviors from early training stages while penalizing undesirable actions at each step, which accelerates convergence and improves robustness. The velocity guidance term allows anticipatory adjustments during mid-to-late approach phases rather than delaying corrections to the terminal stage, resulting in improved fuel efficiency. Simulation results show that the maneuvering spacecraft performs 10 impulsive maneuvers, achieving a terminal relative distance of 21.326 km, a relative velocity of 0.005 0 km/s, and a total fuel consumption of 111.212 3 kg. To evaluate robustness under realistic uncertainties, 1 000 Monte Carlo simulations are performed. 
As summarized in Table 6, the mission success rate reaches 63.40%, and fuel consumption in all trials remains within acceptable bounds. In Case 2, PPO performance is compared with that of Deep Deterministic Policy Gradient (DDPG) for a multi-impulse fast-approach rendezvous mission. PPO results show five impulsive maneuvers, a terminal separation of 2.281 8 km, a relative velocity of 0.003 8 km/s, and a total fuel consumption of 4.148 6 kg. DDPG results show a fuel consumption of 4.322 5 kg, a final separation of 4.273 1 km, and a relative velocity of 0.002 0 km/s. Both methods satisfy mission requirements with comparable fuel use. However, DDPG requires a training time of 9 h 23 min, whereas PPO converges within 6 h 4 min, indicating lower computational cost. Overall, the improved PPO framework provides better learning efficiency, policy stability, and robustness. Conclusions The problem of autonomous long-distance rapid rendezvous under J2 perturbation and uncertainties is investigated, and a PPO-based trajectory optimization method is proposed. The results demonstrate that feasible maneuver trajectories satisfying terminal constraints can be generated under limited fuel and transfer time, with improved convergence speed, fuel efficiency, and robustness. The main contributions include: (1) development of an orbital dynamics framework that incorporates J2 perturbation and uncertainty modeling, with formulation of the rendezvous problem as a Markov Decision Process; (2) design of an enhanced dense reward function that combines position potential and velocity guidance, which improves training stability and convergence efficiency; and (3) simulation-based validation of PPO robustness in complex orbital environments. Future work will address sensor noise, environmental disturbances, and multi-spacecraft cooperative rendezvous in more complex mission scenarios to further improve practical applicability and generalization.
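The training stability that this abstract attributes to PPO comes largely from its clipped surrogate objective, which limits how far a single update can move the policy. The Python sketch below shows the standard per-sample form, min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio between new and old policies and A is the advantage estimate. The clip range of 0.2 and the sample values are assumptions. The paper's hyperparameters and reward function are not given in the abstract.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective for one sample.

    ratio:     pi_new(a|s) / pi_old(a|s)
    advantage: advantage estimate for the sampled action
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_objective(ratios, advantages, clip_eps=0.2):
    """Average clipped surrogate over a batch (to be maximized)."""
    terms = [clipped_surrogate(r, a, clip_eps)
             for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


# Hypothetical batch: (ratio, advantage) pairs covering the clipping cases.
ratios = [1.5, 0.5, 1.5, 1.0]
advantages = [1.0, -1.0, -1.0, 2.0]
batch_objective = ppo_objective(ratios, advantages)
```

When the advantage is positive, the gain from raising an action's probability is capped at 1 + eps. When it is negative, the objective keeps the more pessimistic of the two terms, so the update has no incentive to move the policy far from the one that collected the data.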

Image Deraining Driven by CLIP Visual Embedding

SUN Jin, CUI Yuntong, TIAN Hongwei, HUANG Changcheng, WANG Jigang

2026, 48(4): 1820-1831. doi: 10.11999/JEIT251066

[Abstract](508) [FullText HTML] (332) [PDF 5379KB](71)

Abstract:
Objective Rain streaks introduce visual distortions that degrade image quality and significantly impair downstream vision tasks such as feature extraction and object detection. This work addresses the problem of single-image rain streak removal. Existing methods often rely heavily on restrictive priors or synthetic datasets. This dependence limits robustness and generalization because such data differ from complex and unstructured real-world scenarios. Contrastive Language-Image Pre-training (CLIP) demonstrates strong zero-shot generalization through large-scale image-text contrastive learning. Motivated by this property, this study proposes FCLIP-UNet, a visual-semantic-driven deraining architecture designed to improve rain removal and generalization in real-world rainy environments. Methods FCLIP-UNet adopts a U-Net encoder-decoder architecture and formulates deraining as pixel-level detail regression guided by high-level semantic features. During the encoding stage, textual queries are omitted. Instead, the first four layers of a frozen CLIP-RN50 are employed to extract robust features that are decoupled from rain distribution. These features exploit the semantic representation capability of CLIP to suppress diverse rain patterns. To guide accurate image restoration, a collaborative decoding architecture that integrates ConvNeXt-T and an Upsampling DepthWise convolution Block (UpDWBlock) is adopted. The decoder employs ConvNeXt-T in place of conventional convolution modules to expand the receptive field and capture global contextual information. It parses rain streak patterns by using semantic priors extracted from the encoder. Under the constraint of these priors, UpDWBlock reduces information loss during upsampling and reconstructs fine-grained image details. Multi-level skip connections compensate for information loss introduced during encoding.
In addition, a Layer-wise Differentiated Feature Perturbation Strategy (LDFPS) is incorporated to enhance robustness and adaptability in complex real-world rainy scenes. Results and Discussions Comprehensive evaluations are conducted on the Rain13K composite dataset by comparing the proposed model with ten state-of-the-art deraining algorithms. FCLIP-UNet shows consistently superior performance across all five testing subsets of Rain13K. In particular, the method outperforms the second-best approach on both datasets: on Test100 by 0.32 dB in Peak Signal-to-Noise Ratio (PSNR) and 0.006 in Structural Similarity Index Measure (SSIM); on Test2800 by 0.14 dB and 0.002, respectively. On Rain100H and Rain100L, FCLIP-UNet achieves competitive results, including the best SSIM on Rain100H and comparable results on other metrics (Table 3). To evaluate model generalization, the Rain13K-pretrained FCLIP-UNet is further tested on three datasets with different rainfall distribution characteristics: SPA-Data, HQ-RAIN, and MPID (Table 4, Fig. 7). Qualitative and quantitative evaluations are also conducted on the real-world NTURain-R dataset (Table 5, Figs. 8～10). These results consistently demonstrate the strong generalization capability of FCLIP-UNet. Ablation experiments on Rain100H validate the proposed encoder design and confirm the effectiveness of both UpDWBlock and LDFPS (Tables 6, 8～11). Additional ablation studies show that the use of LDFPS, combined with a 1:1 weighting ratio between L₁ loss and perceptual loss, provides the best performance for FCLIP-UNet (Tables 9～11). Conclusions This study proposes FCLIP-UNet, a deraining network designed for real-world generalization by leveraging the CLIP paradigm. Three main contributions are presented. First, image deraining is formulated as a pixel-level regression task that reconstructs rain-free images from high-level semantic features. A frozen CLIP image encoder extracts representations that remain stable across different rain distributions, thereby reducing domain shifts caused by diverse rain models. Second, a decoder that integrates ConvNeXt-T with an UpDWBlock is designed, and an LDFPS is proposed to improve robustness to unseen rain distributions. Third, a composite loss function jointly optimizes pixel-level accuracy and perceptual consistency. Experiments on both synthetic and real-world rainy datasets show that FCLIP-UNet effectively removes rain streaks, preserves fine image details, and achieves strong deraining performance with reliable generalization capability.
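To put the reported PSNR margins in context, the sketch below uses the standard PSNR definition, 10·log10(MAX² / MSE), on flat lists of pixel values. The helper names `psnr` and `mse_ratio_for_gain` are hypothetical, and the peak value of 255 assumes 8-bit images; the abstract does not state the evaluation protocol (for example, colour space or channel handling), so this illustrates the metric only and does not reproduce the paper's evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(restored):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE changes for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, a 0.32 dB PSNR gain corresponds to a mean squared error about 7% lower than that of the compared method, since `mse_ratio_for_gain(0.32)` is roughly 0.929.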

Genetic-algorithm-optimized All-metal Metasurface for Cross-band Stealth via Low-cost Computer Numerical Control Fabrication

ZHANG Ming, ZHANG Najiao, LI Jialei, LI Kang, Vazgen MELIKYAN, YANG Lin, HOU Weimin

2026, 48(4): 1832-1842. doi: 10.11999/JEIT251080

[Abstract](588) [FullText HTML] (401) [PDF 5952KB](59)

Abstract:
Objective Traditional electromagnetic stealth materials face the practical challenge of achieving both microwave absorption and infrared stealth. Conventional solutions, including geometric optimization and multilayer composite coatings, often suffer from narrow bandwidth, complex fabrication, and limited cross-band compatibility. This study proposes a genetic algorithm-optimized all-metal random coding metasurface that enables concurrent broadband Radar Cross Section (RCS) reduction and low infrared emissivity on a monolithic metallic platform, thereby addressing these practical limitations. Methods Monolithic all-metal C-shaped resonant units are employed. The design is based on the Pancharatnam-Berry geometric phase, in which the reflection phase is regulated by the rotation angle of the unit. Coding schemes of 2-bit, 3-bit, and 4-bit are implemented, corresponding to 4, 8, and 16 discrete phase states. A MATLAB-CST co-simulation framework is established. CST extracts unit responses using the Finite Element Method (FEM), whereas MATLAB applies a genetic algorithm to optimize the phase distribution for scattering energy diffusion. All-metal metasurface prototypes (150×150 mm², 10×10 array) are fabricated using Computer Numerical Control (CNC) cutting. Results and Discussions Genetic algorithm optimization converges within 6～8 generations. Increasing the number of coding bits enhances phase randomness. The 4-bit metasurface achieves an average 10 dB RCS reduction over 11～18.4 GHz. Simulation results agree with anechoic chamber measurements under oblique incidence angles from 0° to 60°. Infrared imaging confirms the low emissivity of the metallic surface. Compared with conventional composite or multilayer structures, the all-metal design simplifies fabrication, prevents interfacial mismatch, and improves structural stability. The metasurface demonstrates broadband, wide-angle, and cross-band stealth performance. Conclusions This study presents a genetic algorithm-optimized all-metal random coding metasurface that achieves cross-band stealth compatibility. The design addresses the persistent challenge of realizing both microwave performance and thermal management in conventional stealth materials. Three main technical contributions are demonstrated. (1) The monolithic copper structure provides greater than 99.9% infrared reflectivity in the 8～14 μm band, verified by FLIR imaging, and achieves an average 10 dB RCS reduction over 11～18.4 GHz. (2) The single-material configuration removes the risk of delamination. The CNC-fabricated prototype maintains structural integrity under 60° oblique incidence and reduces fabrication cost by approximately 78% compared with lithographic processing. (3) The co-simulation optimization framework converges within eight generations for 4-bit coding, enabling broadband scattering manipulation over 7.4 GHz. The proposed metasurface combines fabrication reliability, cost efficiency, and dual-band stealth capability. These characteristics provide a practical basis for large-scale deployment in military stealth systems and satellite platforms that require multispectral concealment and long-term structural durability.
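The mapping from coding bits to discrete phase states can be sketched as follows. The sketch assumes the usual Pancharatnam-Berry relation for circularly polarized incidence, in which the cross-polarized reflection phase equals twice the unit's rotation angle (the sign depends on handedness), and it assumes uniformly spaced states. The function name `coding_states` is hypothetical, and the abstract does not list the actual rotation angles used.

```python
def coding_states(bits):
    """Return (reflection_phase_deg, rotation_angle_deg) pairs for an
    N-bit coding scheme with 2**N uniformly spaced phase states.

    Assumption: phase = 2 * rotation angle (Pancharatnam-Berry phase),
    so each rotation angle is half of the target phase.
    """
    if bits < 1:
        raise ValueError("bits must be a positive integer")
    n_states = 2 ** bits
    step = 360.0 / n_states
    return [(i * step, i * step / 2.0) for i in range(n_states)]
```

For example, `coding_states(2)` yields four states with a 90° phase step, and `coding_states(4)` yields sixteen states with a 22.5° phase step, matching the 4, 8, and 16 state counts given above for 2-bit, 3-bit, and 4-bit coding.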

One-pass Architectural Synthesis for Continuous-Flow Microfluidic Biochips Based on Deep Reinforcement Learning

LIU Genggeng, JIAO Xinyue, PAN Youlin, HUANG Xing

2026, 48(4): 1843-1852. doi: 10.11999/JEIT251058

[Abstract](583) [FullText HTML] (332) [PDF 3358KB](74)

Abstract:
Continuous-Flow Microfluidic Biochips (CFMBs) are widely applied in biomedical research because of miniaturization, high reliability, and low sample consumption. As integration density increases, design complexity significantly rises. Conventional stepwise design methods treat binding, scheduling, layout, and routing as separate stages, with limited information exchange across stages, which leads to reduced solution quality and extended design cycles. To address this limitation, a one-pass architectural synthesis method for CFMBs is proposed based on Deep Reinforcement Learning (DRL). Graph Convolutional Neural Networks (GCNs) are used to extract state features, capturing structural characteristics of operations and their relationships. Proximal Policy Optimization (PPO), combined with the A* algorithm and list scheduling, ensures rational layout and routing while providing accurate information for operation scheduling. A multiobjective reward function is constructed by normalizing and weighting biochemical reaction time, total channel length, and valve count, enabling efficient exploration of the decision space through policy gradient updates. Experimental results show that the proposed method achieves a 2.1% reduction in biochemical reaction time, a 21.3% reduction in total channel length, and a 65.0% reduction in valve count on benchmark test cases, while maintaining feasibility for larger-scale chips. Objective CFMBs have gained sustained attention in biomedical applications because of miniaturization, high reliability, and low sample consumption. With increasing integration density, design complexity escalates substantially. Traditional stepwise design methods often yield suboptimal solutions, extended design cycles, and feasibility limitations for large-scale chips. To address these challenges, a one-pass architectural synthesis framework is proposed that integrates DRL to achieve coordinated optimization of binding, scheduling, layout, and routing.
Methods All CFMB design tasks are integrated into a unified optimization framework formulated as a Markov decision process. The state space includes device binding information, device locations, operation priorities, and related parameters, whereas the action space adjusts device placement, operation-to-device binding, and operation priority. High-dimensional state features are extracted using GCNs. PPO is applied to iteratively update policies. The reward function accounts for biochemical reaction time, total flow-channel length, and the number of additional valves. These metrics are evaluated using the A* algorithm and list scheduling, normalized, and weighted to balance trade-offs among objectives. Results and Discussions Based on the current state and candidate actions, architectural solutions are generated iteratively through PPO-guided policy updates combined with the A* algorithm and list scheduling. The defined reward function enables the generation of CFMB architectures with improved overall quality. Experimental results show an average reduction of 2.1% in biochemical reaction time, an average reduction of 21.3% in total flow-channel length, and an average reduction of 65.0% in additional valve count compared with existing methods. These improvements reduce manufacturing cost and operational risk. Conclusions A one-pass architectural synthesis method for CFMBs based on DRL is proposed to address flow-layer design challenges. By applying GCN-based state feature extraction and PPO-based policy optimization, the multiobjective design problem is transformed into a sequential decision-making process that enables joint optimization of binding, scheduling, layout, and routing. Experimental results obtained from multiple benchmark test cases confirm improved performance in biochemical reaction completion time, total channel length, and valve count, while preserving scalability for larger chip designs.
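The multiobjective reward described above normalizes and weights three metrics. A minimal sketch of one way to do this is given below, assuming each metric is normalized by a reference (baseline) value and the reward is the negative weighted sum, so that smaller metrics yield a larger reward. The function name `multiobjective_reward`, the metric keys, the baseline normalization, and the equal weights are all assumptions; the abstract does not specify them.

```python
def multiobjective_reward(metrics, baseline, weights):
    """Negative weighted sum of baseline-normalized metrics.

    metrics, baseline, weights: dicts keyed by metric name.
    A value of -sum(weights) means 'same as baseline'; anything
    higher means the weighted cost dropped below the baseline.
    """
    cost = 0.0
    for name, weight in weights.items():
        if baseline[name] <= 0:
            raise ValueError(f"baseline for {name!r} must be positive")
        cost += weight * metrics[name] / baseline[name]
    return -cost


# Hypothetical baseline and equal weights, for illustration only.
BASELINE = {"reaction_time": 100.0, "channel_length": 100.0, "valves": 20.0}
WEIGHTS = {"reaction_time": 1 / 3, "channel_length": 1 / 3, "valves": 1 / 3}
```

As a usage illustration, a candidate with metrics `{"reaction_time": 97.9, "channel_length": 78.7, "valves": 7.0}`, which mirrors the average reductions of 2.1%, 21.3%, and 65.0% against this made-up baseline, scores higher than the baseline itself.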

A Triple Modular Redundancy Voter Insertion Algorithm Utilizing Stagnation-Aware Probabilistic Reordering

LIU Zhaoting, LIU Peng

2026, 48(4): 1853-1862. doi: 10.11999/JEIT250825

[Abstract](300) [FullText HTML] (697) [PDF 1224KB](58)

Abstract:
Objective With the rapid development of integrated circuit technology, performance degradation and failure of electronic devices in high-energy particle radiation environments become increasingly prominent. High reliability is required in applications such as aerospace, the nuclear industry, petroleum exploration, and deep-sea detection. Among the available reliability enhancement techniques, Triple Modular Redundancy (TMR) is widely regarded as one of the most effective methods. In TMR, three identical copies of a digital circuit operate in parallel with the same input, and the correct output is obtained through majority voting when one copy fails. Common implementation methods include fine-grained TMR, system-level partitioning, and state synchronization. State synchronization is a key step in TMR-based radiation hardening because it restores registers to the correct state after a fault. This process is achieved by inserting synchronization voters, but the resulting resource overhead is often high. This study proposes a new synchronization voter insertion algorithm to reduce hardware cost. The objective is to develop and validate an algorithm that avoids exponential runtime complexity and, relative to existing methods, reduces the number of required synchronization voters. Methods After circuit preprocessing, the synchronization voter insertion task is formulated as a Feedback Vertex Set Problem (FVSP). The memory circuit is first extracted from the digital circuit to exclude nodes outside the candidate range and reduce circuit size. A Feedback Vertex Set (FVS) is then solved to identify the flip-flop nodes at which synchronization voters should be inserted. By inserting voters at the outputs of these flip-flops, all cycles containing memory elements are broken, and state synchronization is ensured. In implementation, a Simulated Annealing (SA) algorithm is used. 
Topological ordering is adopted to avoid direct loop detection and to reduce the time complexity of cycle checking. To improve search efficiency and solution quality, a Stagnation-Aware Probabilistic Reordering (SAPR) scheme is incorporated into the SA framework. A priority-based mechanism is applied during topological reordering to reduce conflicts and false conflict judgments in critical search steps. The candidate-set update strategy is also refined so that insertion positions with the fewest conflicts are selected in the topological ordering. When the FVS is not improved over multiple iterations, reordering is triggered with a certain probability to balance computational cost and the ability to escape local optima. Results and Discussions The quality of the FVS obtained by the SAPR-SA-FVSP algorithm is evaluated by comparison with three other methods. The proposed method shows higher probabilities of achieving the minimum average, best, and worst values, which indicates better overall solution quality (Table 3). Furthermore, SAPR-SA-FVSP shows a smaller mean standard deviation, which indicates better stability. The average standard deviations over all test graphs are 0.596 34 for SA-FVSP, 0.667 55 for the Nonuniform Neighborhood Sampling (NNS)-based SA method, 0.651 93 for dynamic-threshold reordering, and 0.562 17 for SAPR-SA-FVSP, confirming the superior stability of the proposed method (Table 4). Using the ISCAS89 and ITC'99 benchmark circuits, the proposed voter insertion algorithm is further compared with the critical path-based voter insertion algorithm and the highest-fanout flip-flop algorithm. Across all test cases, SAPR-SA-FVSP yields the smallest number of synchronization voters. The maximum reduction reaches 78.88% relative to the critical path-based method and 74.05% relative to the highest-fanout flip-flop algorithm (Table 5). The proposed algorithm also shows better speed and robustness. It runs successfully on all test cases without failure. 
The average execution times on the circuits for which all three algorithms complete successfully are 9 880.19 ms for the critical path-based algorithm, 9 625.04 ms for the highest-fanout flip-flop algorithm, and 3 389.73 ms for the proposed algorithm. Conclusions The proposed SAPR strategy improves the conventional SA-FVSP method and yields better solution quality and greater stability. On this basis, a resource-efficient synchronization voter insertion algorithm is proposed for restoring correct register states in TMR-hardened digital circuits. The algorithm divides the task into memory-circuit extraction and FVSP solving. Its completeness and efficiency are demonstrated theoretically, and substantial reductions in synchronization voter insertion are verified on benchmark circuits relative to the critical path-based and highest-fanout flip-flop methods. The proposed method therefore provides an effective approach for reducing hardware overhead while maintaining high reliability in TMR hardening of digital circuits.
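The link between a feedback vertex set and topological ordering can be illustrated with a small check: a set of flip-flops is a valid FVS exactly when the graph left after removing them admits a topological order. The sketch below uses Kahn's algorithm for that check. It is not the SAPR-SA-FVSP search itself, which the abstract describes only at a high level; the function name `is_feedback_vertex_set` and the adjacency-dict graph format are assumptions.

```python
from collections import deque


def is_feedback_vertex_set(graph, removed):
    """Return True if deleting `removed` leaves `graph` acyclic.

    graph: dict mapping each node to an iterable of successor nodes.
    Uses Kahn's algorithm: the remaining graph is acyclic exactly when
    every remaining node can be placed in a topological order.
    """
    removed = set(removed)
    nodes = set(graph) | {m for succ in graph.values() for m in succ}
    nodes -= removed

    indegree = {n: 0 for n in nodes}
    for n in nodes:
        for m in graph.get(n, ()):
            if m in indegree:
                indegree[m] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    ordered = 0
    while queue:
        n = queue.popleft()
        ordered += 1
        for m in graph.get(n, ()):
            if m in indegree:
                indegree[m] -= 1
                if indegree[m] == 0:
                    queue.append(m)
    return ordered == len(nodes)
```

For a hypothetical memory circuit `{"ff1": ["ff2"], "ff2": ["ff3"], "ff3": ["ff1", "ff4"], "ff4": []}`, removing `ff2` breaks the only cycle, so a single synchronization voter at its output would suffice, whereas removing `ff4` leaves the cycle intact.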

Optimal Federated Average Fusion of Gaussian Mixture-Probability Hypothesis Density Filters

XUE Yu, XU Lei

2026, 48(4): 1863-1874. doi: 10.11999/JEIT250759

[Abstract](458) [FullText HTML] (255) [PDF 2571KB](60)

Abstract:
Objective To realize optimal decentralized fusion tracking of uncertain targets, this study proposes a federated average fusion algorithm for Gaussian Mixture-Probability Hypothesis Density (GM-PHD) filters, designed with a hierarchical structure. Each sensor node operates a local GM-PHD filter to extract multi-target state estimates from sensor measurements. The fusion node performs three key tasks: (1) maintaining a master filter that predicts the fusion result from the previous iteration; (2) associating and merging the GM-PHDs of all filters; and (3) distributing the fused result and several parameters to each filter. The association step decomposes multi-target density fusion into four categories of single-target estimate fusion. We derive the optimal single-target estimate fusion both in the absence and presence of missed detections. Information assignment applies the covariance upper-bounding theory to eliminate correlation among all filters, enabling the proposed algorithm to achieve the accuracy of Bayesian fusion. Simulation results show that the federated fusion algorithm achieves optimal tracking accuracy and consistently outperforms the conventional Arithmetic Average (AA) fusion method. Moreover, the relative reliability of each filter can be flexibly adjusted. Methods The multi-sensor multi-target density fusion is decomposed into multiple groups of single-target component merging through the association operation. Federated filtering is employed as the merging strategy, which achieves the Bayesian optimum owing to its inherent decorrelation capability. Section 3 rigorously extends this approach to scenarios with missed detections. To satisfy federated filtering’s requirement for prior estimates, a master filter is designed to compute the predicted multi-target density, thereby establishing a hierarchical architecture for the proposed algorithm. 
In addition, auxiliary measures are incorporated to compensate for the observed underestimation of cardinality. Results and Discussions modified Mahalanobis distance (Fig. 3). The precise association and the single-target decorrelation capability together ensure the theoretical optimality of the proposed algorithm, as illustrated in Fig. 2. Compared with conventional density fusion, the Optimal Sub-Pattern Assignment (OSPA) error is reduced by 8.17% (Fig. 4). The advantage of adopting a small average factor for the master filter is demonstrated in Figs. 5 and 6. The effectiveness of the measures for achieving cardinality consensus is also validated (Fig. 7). Another competitive strength of the algorithm lies in the flexibility of adjusting the average factors (Fig. 8). Furthermore, the algorithm consistently outperforms AA fusion across all missed detection probabilities (Fig. 9). Conclusions This paper achieves theoretically optimal multi-target density fusion by employing federated filtering as the merging method for single-target components. The proposed algorithm inherits the decorrelation capability and single-target optimality of federated filtering. A hierarchical fusion architecture is designed to satisfy the requirement for prior estimates. Extensive simulations demonstrate that: (1) the algorithm can accurately associate filtered components belonging to the same target, thereby extending single-target optimality to multi-target fusion tracking; (2) the algorithm supports flexible adjustment of average factors, with smaller values for the master filter consistently preferred; and (3) the superiority of the algorithm persists even under sensor malfunctions and high missed detection rates. Nonetheless, this study is limited to GM-PHD filters with overlapping Fields Of View (FOVs). Future work will investigate its applicability to other filter types and spatially non-overlapping FOVs.
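The claim that federated filtering reaches Bayesian accuracy through covariance upper bounding can be checked on a toy case. The sketch below assumes a scalar, single-target, linear-Gaussian setting with direct observations and no missed detections: the shared prior variance is inflated by 1/β for each filter (with the factors summing to 1), each sensor filter performs its own update, the master filter carries only the prediction, and the results are fused by summing information. All function names are hypothetical, and the paper's association step and GM-PHD machinery are not modelled.

```python
def local_update(prior_mean, prior_var, z, r):
    """Scalar Kalman measurement update for a direct observation z with variance r."""
    info = 1.0 / prior_var + 1.0 / r
    mean = (prior_mean / prior_var + z / r) / info
    return mean, 1.0 / info


def federated_cycle(prior_mean, prior_var, measurements, noise_vars, factors):
    """One information-sharing and fusion cycle.

    factors[0] belongs to the master filter (prediction only);
    factors[1:] belong to the sensor filters. Factors must sum to 1.
    """
    if abs(sum(factors) - 1.0) > 1e-9:
        raise ValueError("factors must sum to 1")
    # Master filter: inflated prior, no measurement.
    posteriors = [(prior_mean, prior_var / factors[0])]
    # Sensor filters: inflated prior, then a local measurement update.
    for z, r, beta in zip(measurements, noise_vars, factors[1:]):
        posteriors.append(local_update(prior_mean, prior_var / beta, z, r))
    # Fusion: sum the information contributed by every filter.
    info = sum(1.0 / var for _, var in posteriors)
    mean = sum(m / var for m, var in posteriors) / info
    return mean, 1.0 / info


def centralized_update(prior_mean, prior_var, measurements, noise_vars):
    """Bayesian update that uses all measurements at once, for comparison."""
    info = 1.0 / prior_var + sum(1.0 / r for r in noise_vars)
    mean = (prior_mean / prior_var
            + sum(z / r for z, r in zip(measurements, noise_vars))) / info
    return mean, 1.0 / info
```

In this toy setting the fused result coincides with the centralized update for any positive factors that sum to 1, because the inflated prior information adds back up to the original prior information while each measurement is counted exactly once.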

2026 Vol. 48, No. 4