Latest Articles

Articles in press have been peer-reviewed and accepted but are not yet assigned to volumes/issues; they are citable by Digital Object Identifier (DOI).
Construction of Entanglement-Assisted Quantum MDS Codes
QU Yuanyue, GAO Jian
Available online  , doi: 10.11999/JEIT251251
Abstract:
  Objective  Entanglement-assisted quantum error-correcting codes (EAQECCs) provide a powerful mechanism for protecting quantum information through the use of entanglement pre-shared between sender and receiver. Traditional constructions of EAQECCs rely mainly on classical cyclic or constacyclic codes and often require strong algebraic constraints that limit the range of achievable parameters. This paper aims to develop a general and systematic framework for constructing new families of EAQECCs derived from twisted Reed–Solomon (TRS) codes over finite fields. The motivation is twofold: first, to extend classical Reed–Solomon-based code design to its twisted form so as to capture richer algebraic structures; and second, to determine the exact number of maximally entangled pairs required to attain the quantum Singleton bound. The ultimate goal is to produce maximum-distance separable (MDS) EAQECCs that outperform existing constructions in flexibility and parameter diversity.  Methods  The proposed method begins with the definition of TRS codes over finite fields, which introduce a “twist” parameter into the generator matrix and thereby alter the structure of their parity-check matrices. By systematically analyzing the coset-sum matrices associated with the twisted and untwisted cases, the rank of their product is determined. This rank equals the number of required entangled states, which forms the theoretical basis of the proposed EAQECC design. A detailed algebraic analysis shows that this product matrix contains a submatrix with entries $ {M}_{l,j}=\displaystyle\sum\nolimits_{y\in W}{\left({\xi }^{j}y\right)}^{tl} $, which simplifies to a closed form under certain group-theoretic conditions. The resulting matrix is a Vandermonde matrix, which ensures full rank and thus provides an explicit characterization of the entanglement structure. This establishes the rank-preserving property crucial to constructing MDS EAQECCs. 
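The rank argument rests on two standard finite-field facts, which can be checked numerically. For a multiplicative subgroup $ W $ of order $ m $, the power sum $ \sum_{y\in W}y^{k} $ equals $ m $ when $ m\mid k $ and $ 0 $ otherwise; so, in one simple case, $ {M}_{l,j}={\xi}^{jtl}\sum_{y\in W}y^{tl} $ collapses to $ m\,{\xi}^{jtl} $ whenever $ m\mid tl $, which is $ m $ times a Vandermonde matrix in the nodes $ {\xi}^{tl} $, and a Vandermonde matrix with distinct nodes has full rank. The sketch below is a hypothetical illustration over the prime field GF(13), not the paper's construction: the field, the subgroup order $ m=4 $, $ t=4 $, taking $ \xi $ to be a primitive element, and all helper names are assumptions.

```python
# Hypothetical check over GF(p) with p prime; p, W, t, and xi are
# illustrative assumptions, not parameters taken from the paper.

def prime_factors(n):
    """Distinct prime factors of n."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def primitive_element(p):
    """Smallest generator of the cyclic group GF(p)*."""
    for g in range(2, p):
        if all(pow(g, (p - 1) // r, p) != 1 for r in prime_factors(p - 1)):
            return g
    raise ValueError("p must be a prime greater than 2")

def subgroup(p, m):
    """Multiplicative subgroup of GF(p)* of order m (m must divide p - 1)."""
    if (p - 1) % m:
        raise ValueError("m must divide p - 1")
    h = pow(primitive_element(p), (p - 1) // m, p)  # an element of order m
    return [pow(h, i, p) for i in range(m)]

def power_sum(W, k, p):
    """sum_{y in W} y^k over GF(p)."""
    return sum(pow(y, k, p) for y in W) % p

def m_matrix(xi, W, t, rows, cols, p):
    """Entries M[l][j] = sum_{y in W} (xi^j * y)^(t*l) over GF(p)."""
    return [[sum(pow(pow(xi, j, p) * y % p, t * l, p) for y in W) % p
             for j in range(cols)] for l in range(rows)]

def rank_mod_p(rows, p):
    """Rank over GF(p) by Gauss-Jordan elimination."""
    A = [list(r) for r in rows]
    rank = 0
    n_cols = len(A[0]) if A else 0
    for c in range(n_cols):
        pivot = next((i for i in range(rank, len(A)) if A[i][c] % p), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        inv = pow(A[rank][c], p - 2, p)  # inverse via Fermat's little theorem
        A[rank] = [x * inv % p for x in A[rank]]
        for i in range(len(A)):
            if i != rank and A[i][c] % p:
                f = A[i][c]
                A[i] = [(a - f * b) % p for a, b in zip(A[i], A[rank])]
        rank += 1
    return rank

p = 13
W = subgroup(p, 4)           # [1, 8, 12, 5]
xi = primitive_element(p)    # 2
M = m_matrix(xi, W, t=4, rows=3, cols=3, p=p)
print(W, power_sum(W, 1, p), power_sum(W, 4, p), rank_mod_p(M, p))
```

Here every entry equals $ 4\cdot 3^{lj} $, a scaled Vandermonde matrix in the distinct nodes 1, 3, 9, so the computed rank is 3; repeating a node would drop the rank.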
Based on these results, we derive two families of EAQECCs characterized by the number of entangled pairs. The corresponding parameters are tabulated and satisfy the quantum Singleton bound with equality, confirming the MDS nature of the constructed codes.  Results and Discussions  Comprehensive parameter analyses and explicit examples verify the theoretical findings. Comparative studies further demonstrate the flexibility of the proposed framework. Unlike previous constructions that require divisibility conditions such as $ a\mid (q+1) $ and $ a\mid (q-1) $, our approach remains valid under broader algebraic configurations, thereby significantly extending the feasible range of code parameters. This difference is summarized conceptually in the remark section and verified numerically. A systematic comparison of our results with existing MDS EAQECCs (Table 4) reveals several new parameter regimes previously inaccessible to classical or cyclic-code-based constructions. In particular, our method yields larger code lengths and more adaptable entanglement consumption rates $ \dfrac{c}{n} $, improving both the efficiency and generality of EAQECCs. The algebraic consistency across all tested cases confirms the correctness and universality of the TRS-based framework.  Conclusions  This study establishes a comprehensive algebraic framework for constructing MDS EAQECCs derived from twisted Reed–Solomon codes. By rigorously analyzing the rank properties of coset-sum matrices, we precisely determine the entanglement requirement and identify conditions under which the constructed codes achieve the quantum Singleton bound. 
Two broad classes of MDS EAQECCs are obtained, corresponding to $ a\mid \left(q+1\right) $ and $ a\mid \left(q-1\right) $, respectively, both verified through explicit examples and tabulated results. Compared with existing work, the proposed approach not only generalizes prior constructions but also extends the achievable parameter space to cases not covered by Reed–Solomon or cyclic-code frameworks. The derived codes exhibit improved structural flexibility, theoretical clarity, and potential applicability to high-performance quantum information systems. This work thus provides a novel and unified perspective for developing algebraically optimized EAQECCs, laying the foundation for future research on TRS-based quantum code families and their efficient encoding implementations.
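For an $ [[n,k,d;c]] $ EAQECC with $ d\le (n+2)/2 $, the entanglement-assisted quantum Singleton bound reads $ n+c-k\ge 2(d-1) $. The sketch below checks that arithmetic under one stated assumption: that the quantum code comes from the standard Hermitian construction, in which a classical $ [n,k,n-k+1] $ MDS code whose parity-check matrix $ H $ satisfies $ \mathrm{rank}(HH^{\dagger})=c $ yields $ [[n,\,2k-n+c,\,n-k+1;\,c]] $. The abstract does not spell out the construction, and the helper names and sample parameters are hypothetical.

```python
def ea_singleton_slack(n, k, d, c):
    """n + c - k - 2(d - 1) for an [[n, k, d; c]] EAQECC.

    The bound n + c - k >= 2(d - 1) is used here only in the regime
    d <= (n + 2)/2; a slack of 0 means the code meets it with equality.
    """
    if 2 * d > n + 2:
        raise ValueError("this form of the bound assumes d <= (n + 2)/2")
    return n + c - k - 2 * (d - 1)

def ea_params_from_mds(n, k_cl, c):
    """Hypothetical helper: [[n, 2k - n + c, n - k + 1; c]] from a classical
    [n, k, n - k + 1] MDS code whose parity-check matrix H has rank(H H^dagger) = c."""
    if not 0 <= c <= n - k_cl:
        raise ValueError("c must lie in [0, n - k]")
    k_q = 2 * k_cl - n + c
    if k_q < 0:
        raise ValueError("quantum dimension would be negative")
    return n, k_q, n - k_cl + 1, c

params = ea_params_from_mds(n=10, k_cl=7, c=1)   # (10, 5, 4, 1)
print(params, ea_singleton_slack(*params))
```

Substituting the construction gives $ n+c-(2k-n+c)-2(n-k)=0 $ for every admissible $ c $, so under this assumption the bound is met with equality whatever the rank turns out to be; what the rank determines is how many entangled pairs are consumed.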
Design of an Aerospace-grade Radiation-hardened SRAM Cell for High-speed Read/Write Applications
CAI Shuo, SHUAI Wei, HU Xing, LIANG Xinjie, HUANG Zhu, YU Fei
Available online  , doi: 10.11999/JEIT251287
Abstract:
  Objective  With the continued scaling of Complementary Metal-Oxide-Semiconductor (CMOS) technology nodes and the reduction in supply voltage, Static Random Access Memory (SRAM) in aerospace environments becomes increasingly sensitive to high-energy particle radiation and is prone to Single-Node Upset (SNU) and Double-Node Upset (DNU). This sensitivity poses a serious challenge to the reliability of spaceborne Systems-on-Chip (SoC). Existing Radiation-Hardened-By-Design (RHBD) structures, however, usually cannot balance strong radiation tolerance with high-speed access performance. This work therefore aims to design an aerospace-grade radiation-hardened SRAM cell for high-speed read/write applications that provides both strong radiation resistance and fast access performance.  Methods  The proposed Read Fast and Write Fast 16-Transistor (RFWF16T) SRAM is built on a dual-source isolation architecture composed of 16 transistors (8 PMOS and 8 NMOS) (Fig. 1, Fig. 2). By using a symmetric recovery mechanism, the RFWF16T reduces the number of key sensitive nodes to only two. Redundant transistors (P2 and P6) are used to establish a stable high-level isolation state, which isolates the storage nodes from potential disturbances during the non-access phase. To achieve high-speed operation, the RFWF16T combines a short feedback path with a low-impedance voltage discharge loop. Unlike conventional hardened cells that rely on stacked transistors, which increase resistance and delay, the RFWF16T adopts a parallel access topology connected to word lines and bit lines. This configuration forms a low-impedance path during write operations and significantly accelerates node voltage switching (Fig. 3). Performance verification confirms the self-recovery capability of the four data nodes. A comprehensive variation analysis is conducted, including Process-Voltage-Temperature (PVT) variations and 2,000-point Monte Carlo simulations. 
Additionally, an improved Electrical Quality Metric (EQM) is proposed to evaluate multidimensional performance quantitatively.  Results and Discussions  The RFWF16T exhibits strong overall performance, particularly in overcoming the speed bottleneck of hardened SRAM cells. In terms of access speed, the RFWF16T performs substantially better than typical models such as S8P8N, SAW16T, and RH20T. Under standard conditions (28 nm CMOS process, 1.0 V, 25 °C, TT corner), the RFWF16T achieves a Read Access Time (RAT) of 20.97 ps and a Write Access Time (WAT) of 2.72 ps. These values correspond to average speed improvements of 46.65% and 14.77%, respectively, over eight comparable hardened structures (Table 2). PVT analysis confirms that the RFWF16T maintains the lowest latency across voltages from 0.7 V to 1.1 V and temperatures from -25 °C to 125 °C (Fig. 6). This write-speed advantage is attributed to the removal of write contention through optimized discharge paths. In terms of noise margin and stability, the RFWF16T demonstrates strong robustness and achieves the highest Write Word-line Toggle Voltage (WWTV) among nine comparative structures. Its Hold Static Noise Margin (HSNM) and Read Static Noise Margin (RSNM) also rank among the best, which ensures stability under disturbances (Fig. 7). In radiation hardening, the RFWF16T achieves a 100% self-recovery rate for SNUs and an 83.3% recovery rate for DNUs, reaching the state-of-the-art level among DNU-recoverable units (Table 1). Monte Carlo simulations confirm that the average recovery times of the internal nodes range from 1.09 ns to 1.19 ns (Fig. 4, Fig. 5). In terms of overhead, the RFWF16T maintains a normalized area of 1.00× (4.3 μm × 1.9 μm) (Table 3, Fig. 2) and an average power consumption of 23.45 nW (Table 4). Although the power consumption is slightly higher, this increase is a reasonable trade-off for the substantial speed advantage. 
In the EQM evaluation, the RFWF16T obtains the highest score, which confirms its overall advantage in balancing reliability, speed, and stability (Fig. 7).  Conclusions  A radiation-hardened SRAM cell, RFWF16T, is proposed for aerospace-grade high-speed read/write applications. The cell contains only two sensitive nodes and achieves 100% self-recovery for SNUs and an 83.3% recovery rate for DNUs, which demonstrates strong radiation tolerance. Compared with eight other SRAM cells, the RFWF16T significantly reduces read and write delay with only a slight increase in area and power consumption, while maintaining good noise immunity and the best electrical quality metric. PVT and Monte Carlo simulations further confirm the stability and robustness of the proposed cell under different operating conditions. Future work will focus on array-level integration and tape-out verification, and on its application in satellite-borne high-speed data processing.
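The abstract reports average speed improvements over eight reference cells without stating how the average is taken. The sketch below shows two plausible definitions, which generally give different values for the same data; the reference delays are made-up placeholders, not values from Table 2, and which definition the paper uses is an open assumption.

```python
def relative_improvement(ref, ours):
    """Percentage delay reduction of the proposed cell versus one reference cell."""
    return 100.0 * (ref - ours) / ref

def mean_of_improvements(ref_delays, ours):
    """Definition A: average the per-reference percentage improvements."""
    return sum(relative_improvement(r, ours) for r in ref_delays) / len(ref_delays)

def improvement_over_mean(ref_delays, ours):
    """Definition B: improvement relative to the mean reference delay."""
    return relative_improvement(sum(ref_delays) / len(ref_delays), ours)

refs_ps = [40.0, 30.0]  # hypothetical reference delays in ps, not data from Table 2
print(mean_of_improvements(refs_ps, 20.0), improvement_over_mean(refs_ps, 20.0))
```

With these placeholders, definition A gives about 41.67% and definition B about 42.86%, so the definition matters when comparing reported percentages across papers.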
A Closed-loop Feedback Adaptive Beam Alignment Algorithm for Shipborne Low Earth Orbit Satellite Communication Terminals
CHEN Haotian, MA Zixian, XIE Xinhong, LI Nayu, LI Baozhu, SONG Chunyi, XU Zhiwei
Available online  , doi: 10.11999/JEIT251324
Abstract:
  Objective  The 6G-based SATellite COMmunication (SATCOM) network has become a primary solution for ubiquitous and oceanic communications. Compared with traditional Geostationary Earth Orbit (GEO) satellites, the latest generation of Low Earth Orbit (LEO) satellites offers higher throughput, lower end-to-end latency, and lower deployment cost. Phased arrays are therefore widely used in LEO SATCOM because of their beam agility. However, maritime wind-wave disturbances cause nonlinear relative motion between shipborne terminals and LEO satellites, which creates major challenges for high-precision satellite acquisition and tracking. To address this issue, a new beam alignment algorithm is required for LEO SATCOM systems. Such an algorithm should first obtain the instantaneous target state and motion characteristics through target acquisition, and then use a multi-target tracking method to predict satellite trajectories on the basis of the target states, thereby compensating for estimation errors caused by severe coupled motions.  Methods  The proposed closed-loop feedback adaptive beam alignment algorithm consists of two tightly coupled components: target acquisition and target state updating. In the target acquisition stage, a RAnk Reduction Estimator (RARE) is first used to decompose the array factor matrix and convert the original two-dimensional Direction Of Arrival (DOA) estimation problem into two sequential one-dimensional estimation problems. This process greatly reduces the computational complexity of each Sparse Bayesian Learning (SBL) iteration. On the basis of the coarse grid generated by RARE, an Adaptive Newton Sparse Bayesian Learning (ANSBL) method is developed. ANSBL uses block-sparse Bayesian learning to achieve initial target acquisition on the coarse grid, and then performs two-stage Newton refinement to reduce off-grid mismatch. 
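The coarse-grid-then-Newton idea can be illustrated in one dimension. The sketch below is not ANSBL: it replaces block-sparse Bayesian learning with a plain matched-filter (Bartlett) spectrum and assumes a single source, a 16-element uniform linear array with half-wavelength spacing, and a noise-free snapshot. It runs two Newton iterations with finite-difference derivatives (a loose analogue of the two-stage refinement) to move the on-grid peak toward the true off-grid angle. All names and parameter values are hypothetical.

```python
import cmath
import math

def steering(theta, n_elems):
    """ULA steering vector with half-wavelength spacing (assumed geometry)."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_elems)]

def spectrum(theta, y):
    """Bartlett (matched-filter) power |a(theta)^H y|^2."""
    a = steering(theta, len(y))
    return abs(sum(ai.conjugate() * yi for ai, yi in zip(a, y))) ** 2

def newton_refine(theta, y, iters=2, h=1e-4):
    """Newton steps toward d(spectrum)/d(theta) = 0, using central differences."""
    for _ in range(iters):
        f_plus, f_0, f_minus = spectrum(theta + h, y), spectrum(theta, y), spectrum(theta - h, y)
        grad = (f_plus - f_minus) / (2 * h)
        curv = (f_plus - 2 * f_0 + f_minus) / (h * h)
        if curv >= 0:  # not inside the concave region of a peak; stop
            break
        theta -= grad / curv
    return theta

def coarse_then_newton(y, step_deg=2.0, iters=2):
    """Pick the best point on a coarse grid over [-60, 60] deg, then refine it off-grid."""
    n_pts = int(120 / step_deg) + 1
    grid = [math.radians(-60.0 + i * step_deg) for i in range(n_pts)]
    theta0 = max(grid, key=lambda t: spectrum(t, y))
    return theta0, newton_refine(theta0, y, iters)

y = steering(math.radians(10.7), 16)  # noise-free snapshot; true DOA lies off the grid
coarse, fine = coarse_then_newton(y)
print(round(math.degrees(coarse), 3), round(math.degrees(fine), 3))
```

The grid search alone is limited to the 2° spacing; the Newton steps exploit the smooth curvature of the main lobe, which is why refinement is applied only after a coarse estimate has landed inside that lobe.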
This strategy provides high-accuracy DOA estimation in both $ \theta $ and $ \varphi $ and improves angular observation precision. In the target state updating stage, an Unscented Kalman Filter (UKF)-based ternary joint prediction mechanism is proposed. The UKF simultaneously predicts the target motion state, signal variance, and noise variance for the next target acquisition process. These predicted probability distributions are then used to update the initial grid and hyperparameters of the subsequent SBL acquisition stage, providing more consistent and comprehensive initial values. Through this closed-loop interaction, target acquisition and state tracking are deeply integrated, which substantially reduces the number of SBL iterations required for convergence. This advantage is particularly evident under high sea-state conditions, where reduced beam alignment time is critical.  Results and Discussions  The proposed closed-loop feedback adaptive beam alignment algorithm first uses on-grid DOA estimation to reduce array factor correlation and improve target acquisition efficiency, and then uses Newton iteration to achieve higher off-grid accuracy (Fig. 3). The proposed method is subsequently validated using real ship attitude data collected from a 28,000-DWT bulk carrier under actual sea conditions (Fig. 4). The UKF refines the DOA results through state updating. Its predictions of signal position, signal variance, and noise variance provide accurate initial values for the hyperparameters, thereby reducing the number of iterations and enabling faster convergence than other algorithms (Fig. 5). Under low sea-state conditions, the proposed method not only achieves satellite alignment in less than 0.2 s, but also reduces the satellite position estimation error from ±1° to ±0.5° (Fig. 6(a)). 
Under high sea-state conditions, the UKF effectively predicts satellite positions and reduces the satellite position estimation error from ±2.5° to ±0.65°, which verifies the robust tracking accuracy and error mitigation capability of the proposed method in harsh marine environments (Fig. 6(b)).  Conclusions  To meet the performance requirements of beam alignment algorithms for LEO communication satellites, this paper proposes a closed-loop feedback adaptive beam alignment algorithm. The algorithm first uses a block-based SBL algorithm to obtain grid-based DOA estimation results, and then achieves super-resolution direction estimation under off-grid conditions through adaptive Newton iteration. Through the UKF, the estimation results are dynamically calibrated in real time. The UKF further predicts the target motion state, signal variance, and noise variance for the next target acquisition process, thereby improving tracking continuity and alignment accuracy. Numerical simulations show that the proposed algorithm outperforms traditional beam alignment methods in both numerical accuracy and robustness, and effectively mitigates the effects of severe terminal shaking under complex sea conditions.
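At the core of any UKF prediction step is the unscented transform, which propagates a mean and covariance through a nonlinear model using $ 2n+1 $ sigma points. The sketch below implements the standard scaled transform. The paper's ternary state (motion state, signal variance, noise variance) and its dynamics are not given in the abstract, so the two-dimensional constant-velocity model used here is a hypothetical stand-in; with a linear model the transform reproduces the exact mean and covariance, which makes a convenient sanity check.

```python
import math

def cholesky(P):
    """Lower-triangular L with L L^T = P (P symmetric positive definite)."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(P[i][i] - s)
            else:
                L[i][j] = (P[i][j] - s) / L[j][j]
    return L

def unscented_transform(mean, P, f, alpha=1.0, beta=2.0, kappa=0.0):
    """Propagate (mean, P) through a possibly nonlinear f with 2n + 1 sigma points."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    L = cholesky([[(n + lam) * p for p in row] for row in P])
    sigmas = [list(mean)]
    for j in range(n):  # columns of the scaled matrix square root
        col = [L[i][j] for i in range(n)]
        sigmas.append([m + c for m, c in zip(mean, col)])
        sigmas.append([m - c for m, c in zip(mean, col)])
    wm = [lam / (n + lam)] + [1.0 / (2 * (n + lam))] * (2 * n)
    wc = [wm[0] + (1 - alpha ** 2 + beta)] + wm[1:]
    ys = [f(s) for s in sigmas]
    m_out = len(ys[0])
    y_mean = [sum(w * y[i] for w, y in zip(wm, ys)) for i in range(m_out)]
    y_cov = [[sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j]) for w, y in zip(wc, ys))
              for j in range(m_out)] for i in range(m_out)]
    return y_mean, y_cov

def constant_velocity(x, dt=0.1):
    """Hypothetical motion model: [position, velocity] advanced by dt."""
    return [x[0] + dt * x[1], x[1]]

m_pred, P_pred = unscented_transform([1.0, 2.0], [[1.0, 0.5], [0.5, 2.0]], constant_velocity)
print(m_pred, P_pred)
```

For a nonlinear model (for example, a polar-to-Cartesian pointing geometry) the same function applies unchanged; only `f` differs, which is the main reason a UKF avoids the Jacobians an extended Kalman filter would need.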
Research on Monophonic Speech Separation Method Using Time-Frequency Domain Multi-scale Information Interaction Strategy
LAN Chaofeng, YANG Guotao, CHEN Yingqi, GUO Xiaoxia
Available online  , doi: 10.11999/JEIT251340
Abstract:
  Objective  Monaural speech separation aims to extract individual speaker signals from a single-channel mixture. It is a core technology for addressing the “cocktail party problem” and has substantial application value in low-resource, low-latency scenarios such as mobile voice assistants, teleconferencing, and hearing aids. However, the lack of spatial cues in single-channel signals, together with the substantial overlap of multiple speakers in both time-domain waveforms and frequency-domain spectra, makes accurate separation highly challenging, especially when the integrity and clarity of the target speech must be preserved. Current deep learning-based models often show limitations in three closely related aspects: effective coordination of multi-scale dependencies, efficient fusion of time-frequency information, and control of computational complexity. To address these challenges, a novel Multi-Scale Attention model integrating Time-Frequency domain information (MSA-TF) is proposed to improve separation performance, computational efficiency, and generalization capability.  Methods  The MSA-TF model contains three key components. First, a lightweight Time-Frequency fusion module is designed. The module first divides the frequency band into four subbands on the basis of speech priors, such as low-frequency energy concentration and high-frequency detail sensitivity, to extract spectral features efficiently. A dynamic gating mechanism with decomposed convolutions and SiLU activation is then applied to adaptively enhance speaker-discriminative features and suppress redundant channels associated with noise. Finally, a cross-attention mechanism is used to promote deep interaction between time-domain and frequency-domain features during the encoding stage. Global semantic information from the time domain guides the selection and weighting of useful frequency-domain features, allowing mutual correction and complementarity. This module adds only 0.8 M parameters. 
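The time-to-frequency cross-attention step can be sketched as standard scaled dot-product attention in which queries come from time-domain features and keys and values come from frequency-domain features. This is a minimal illustration under stated assumptions: the learned query/key/value projections, multiple heads, and the subband split and gating stages are omitted, and the feature sizes are toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(q_time, k_freq, v_freq):
    """Scaled dot-product cross-attention: each time-domain query row attends
    over frequency-domain key/value rows (learned projections omitted)."""
    d_k = len(k_freq[0])
    out = []
    for q in q_time:
        w = softmax([dot(q, k) / math.sqrt(d_k) for k in k_freq])
        out.append([sum(wi * v[c] for wi, v in zip(w, v_freq))
                    for c in range(len(v_freq[0]))])
    return out

values = [[2.0, 0.0], [0.0, 4.0]]
uniform = cross_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], values)
focused = cross_attention([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], values)
print(uniform, focused)
```

Identical keys spread the weight evenly and return the mean of the values, while a query aligned with one key selects that key's value; this selection is what lets time-domain context decide which frequency-domain features to emphasize.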
Second, a Multi-scale Interaction Separator is proposed to address the limitations of sequential or loosely coupled multi-scale processing in models such as SepFormer. Multi-granularity features, ranging from frame-level F1 to syllable-level semantic F4, are extracted through cascaded dilated convolutions. Its core is the “GF-LF Iterative Feedback” mechanism. The Global Flash module, based on efficient FLASH attention, captures long-range dependencies and syllable-level context. This global information is upsampled and injected into local features (Fk) through residual connections. Local Flash modules, also based on FLASH attention, then process the enhanced local features (F'k) to model fine-grained structures and suppress frame-level noise. The updated local features are subsequently fed back through adaptive pooling to refine the global representation in the next iteration. This closed-loop bidirectional flow enables deep synergy between global semantics and local details. A gated fusion mechanism at the end dynamically balances the contributions of different scales. Third, to control computational complexity, an efficient hierarchical grouped attention mechanism is adopted, reducing the complexity from quadratic to nearly linear with sequence length. The overall MSA-TF architecture is end-to-end and consists of a 1D convolutional encoder, the integrated time-frequency and multi-scale modules, a mask network, and a symmetric decoder.  Results and Discussions  Extensive experiments are conducted on the standard WSJ0-2mix and Libri-2mix datasets, with Scale-Invariant Signal-to-Noise Ratio (SI-SNR) and Signal-to-Distortion Ratio (SDR) used as evaluation metrics. Ablation studies (Table 1) confirm the individual and joint contributions of the proposed modules. 
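The abstract does not detail the hierarchical grouped attention, so the sketch below only counts query-key score evaluations for one hypothetical two-level grouping: full attention inside groups of $ g $ frames plus attention among the $ n/g $ group summaries. For a fixed $ g $ the within-group term grows linearly in $ n $, while the summary term grows as $ (n/g)^2 $ and stays small as long as $ n $ is not much larger than $ g^{2} $.

```python
def full_attention_scores(n):
    """Query-key score evaluations for full self-attention over n frames."""
    return n * n

def grouped_attention_scores(n, g):
    """Hypothetical two-level scheme: full attention inside n/g groups of g frames,
    plus full attention among the n/g group summaries."""
    if n % g:
        raise ValueError("this sketch assumes g divides n")
    n_groups = n // g
    return n_groups * g * g + n_groups * n_groups

for n in (1024, 4096):
    print(n, full_attention_scores(n), grouped_attention_scores(n, 64))
```

Quadrupling the sequence length multiplies the full-attention count by 16 but the grouped count by only about 4 in this configuration.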
When only the time-frequency module is added to the TDAnet baseline, SI-SNR increases by 0.3 dB and SDR by 0.4 dB with only a small increase in parameters, confirming its contribution to signal structure modeling, particularly for high-frequency details. When only the multi-scale interaction module is incorporated, SI-SNR increases by 2.5 dB and SDR by 2.7 dB, highlighting its central role in modeling long-term dependencies. When the time-frequency and multi-scale modules are combined in the complete MSA-TF core, a synergistic effect is obtained: the model reaches 17.6 dB SI-SNR, and its gain over the baseline exceeds the sum of the two individual gains. This result indicates that the dual-dimensional features provided by time-frequency fusion and the deep dependency modeling enabled by multi-scale interaction strengthen each other. Spectrogram analysis (Fig. 4) further shows that the time-frequency module effectively suppresses residual high-frequency noise and produces clearer spectral contours for the target speech. On the WSJ0-2mix test set (Table 2), MSA-TF achieves state-of-the-art performance, with 17.6 dB SI-SNR and 17.8 dB SDR. It matches the performance of SuperFormer and substantially outperforms strong baselines such as Conv-Tasnet by 2.3 dB SI-SNR, while maintaining a reasonable parameter count of 15.6 M. Compared with models with larger parameter sizes, such as SignPredictionNet at 55.2 M, MSA-TF shows more efficient modeling. For generalization evaluation on the completely unseen Libri-2mix dataset (Table 3), MSA-TF, trained only on WSJ0-2mix, achieves 14.2 dB SI-SNR and 14.7 dB SDR. Its performance is comparable to that of Conv-Tasnet models trained specifically on Libri-2mix, which achieve 14.4 dB SI-SNR, and it outperforms BLSTM-Tasnet trained on Libri-2mix. This strong cross-dataset adaptability suggests that the model captures general time-frequency characteristics and multi-scale dependency structures in speech signals rather than overfitting to a specific dataset distribution.  
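SI-SNR, the main metric in these comparisons, projects the zero-mean estimate $ \hat{s} $ onto the zero-mean reference $ s $, giving $ s_{\text{target}}=\frac{\langle\hat{s},s\rangle}{\lVert s\rVert^{2}}s $ and $ e_{\text{noise}}=\hat{s}-s_{\text{target}} $, and reports $ 10\log_{10}\left(\lVert s_{\text{target}}\rVert^{2}/\lVert e_{\text{noise}}\rVert^{2}\right) $. A minimal sketch of that definition follows; the small `eps` guard is an implementation choice, and SDR, which is computed differently, is not shown.

```python
import math

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB."""
    n = len(ref)
    est_mean, ref_mean = sum(est) / n, sum(ref) / n
    est = [x - est_mean for x in est]  # remove DC offsets first
    ref = [x - ref_mean for x in ref]
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    s_target = [scale * r for r in ref]  # projection of est onto ref
    e_noise = [e - s for e, s in zip(est, s_target)]
    ratio = sum(s * s for s in s_target) / (sum(e * e for e in e_noise) + eps)
    return 10.0 * math.log10(ratio + eps)

ref = [1.0, -1.0, 1.0, -1.0]
noise = [1.0, 1.0, -1.0, -1.0]  # orthogonal to ref, 20 dB below it after scaling by 0.1
est = [r + 0.1 * v for r, v in zip(ref, noise)]
print(si_snr(est, ref), si_snr([3.0 * x for x in est], ref))
```

Both calls give about 20 dB: rescaling the estimate does not change the score, which is the property that distinguishes SI-SNR from a plain SNR.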
Conclusions  An MSA-TF model is presented to address key challenges in monaural speech separation through deep integration of multi-scale time-frequency information interaction. The proposed lightweight Time-Frequency fusion module efficiently supplements time-domain features with discriminative frequency-domain information. The Multi-scale Interaction Separator, with its iterative feedback mechanism, enables dynamic bidirectional information flow across scales and substantially improves the joint modeling of short-term details and long-term dependencies. Combined with an efficient attention design, the model achieves superior performance without excessive computational cost. Experimental results show that MSA-TF achieves leading separation performance on standard benchmarks and shows strong generalization ability on unseen data distributions, confirming the effectiveness of this comprehensive design. The model provides an efficient, robust, and generalizable solution for practical low-resource application scenarios. Future work may examine advanced cross-modal fusion techniques and dynamic scale adjustment strategies to further improve robustness and performance in more complex and variable acoustic environments.
Intelligent Sorting Algorithm for Multi-station Radar Signals Based on Federated Learning
YE Chengji, XIE Jian, ZHANG Zhaolin, WANG Ling
Available online  , doi: 10.11999/JEIT251355
Abstract:
  Objective  Radar signal sorting is a critical step in electronic reconnaissance and battlefield situational awareness. It is used to accurately separate interleaved pulse streams in complex electromagnetic environments. Although multi-station cooperative reconnaissance systems provide spatial diversity gains that can mitigate the parameter ambiguity and aliasing problems of single-station systems, their practical deployment faces major challenges. Traditional centralized processing architectures require massive volumes of raw Pulse Description Word (PDW) data to be transmitted to a central server. This requirement leads to prohibitive communication bandwidth costs and increases the risk of leakage of sensitive electromagnetic spectrum intelligence. In addition, because stations are geographically distributed and differ in antenna scanning patterns, the data collected at different stations often show significant Non-Independent and Identically Distributed (Non-IID) characteristics. Such heterogeneity reduces the generalization ability of local models trained on isolated data islands. To resolve the conflict between data isolation and the need for collaborative intelligence, a multi-station collaborative radar signal sorting method is proposed based on a Federated Learning (FL) framework. Collaborative model training is enabled without exchanging raw data, so that data privacy is preserved, communication overhead is reduced, and sorting robustness is improved in heterogeneous and noisy battlefield environments.  Methods  A centralized federated sorting framework is constructed to coordinate multiple reconnaissance stations. The method contains three main components: feature preprocessing, a lightweight local temporal model, and a heterogeneity-aware aggregation strategy. First, in data preprocessing, the raw PDW parameters, including Time of Arrival (TOA), Carrier Frequency (CF), and Pulse Width (PW), are normalized to address substantial differences in scale. 
Specifically, TOA is transformed into first-order difference values to extract Pulse Repetition Interval (PRI) information, which prevents numerical saturation and captures periodic patterns effectively (Fig. 3). Second, a local time-series sorting model is designed for the resource constraints of edge devices. A bidirectional Long Short-Term Memory (LSTM) network is used as the backbone to capture long-range dependencies and dynamic patterns in pulse sequences from both forward and backward directions. To accelerate convergence and mitigate vanishing gradients, residual connections are added to fuse static and dynamic features. The extracted features are then mapped to the radiation source category space through a cascaded linear classification layer. Third, to address model drift caused by Non-IID data, including feature distribution shift and label distribution shift, a new aggregation strategy is proposed based on parameter decomposition and proximal regularization. Model parameters are decoupled into a feature extractor and a classifier. During federated aggregation, only the parameters of the generic feature extractor are uploaded and globally averaged, whereas the personalized classifier parameters are retained locally to adapt to the class distribution of each station. Furthermore, a proximal regularization term is added to the local loss function (Eq. 20). This constraint limits the deviation of local updates from the global model and ensures that the optimization direction does not diverge substantially because of local data heterogeneity, thereby improving the stability and convergence speed of the global model.  Results and Discussions  Extensive simulation experiments are conducted on core datasets with 3 stations and 5 radars, and on extended datasets with 9 stations and 12 radars, including complex modulation patterns such as jitter, sliding, and staggering. 
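The TOA preprocessing step can be illustrated on a staggered pulse train. First-order differencing turns the ever-growing arrival times into bounded inter-pulse intervals in which the repeating PRI pattern is directly visible. The sketch assumes min-max normalization (the abstract does not name the scheme) and a hypothetical three-position staggered PRI of 100, 150, and 120 μs; the function names are illustrative.

```python
def staggered_toas(pattern, n_pulses, t0=0):
    """Hypothetical pulse train whose PRI cycles through `pattern`."""
    toas, t = [t0], t0
    for i in range(n_pulses - 1):
        t += pattern[i % len(pattern)]
        toas.append(t)
    return toas

def toa_to_pri(toas):
    """First-order difference of time-ordered TOAs gives the inter-pulse intervals."""
    return [b - a for a, b in zip(toas, toas[1:])]

def min_max_normalize(xs):
    """Scale a feature to [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

toas = staggered_toas([100, 150, 120], 7)  # microseconds
pri = toa_to_pri(toas)
print(toas, pri, min_max_normalize(pri))
```

The raw TOAs keep growing with observation time, while the differenced sequence stays in a fixed range and repeats with the stagger period, which is the periodic structure a sequence model such as an LSTM can exploit.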
Quantitative analysis shows that the proposed method achieves sorting performance comparable to that of Centralized Learning (CL). On the core dataset, the Precision, Recall, and F1-score of the proposed method reach 96.51%, 96.35%, and 96.42%, respectively, exceeding those of FedAvg by approximately 0.67% in F1-score. On the more challenging extended dataset, the performance advantage becomes more significant, with an F1-score improvement of 3.86% over FedAvg (Table 4). These results indicate that the parameter decomposition strategy effectively balances common feature learning with personalized decision-making. Analysis by class further shows that, for categories that are difficult to distinguish, such as Radar 7 and Radar 10, the proposed method improves recognition accuracy by up to 15% and 6%, respectively, compared with FedAvg (Fig. 7 and Fig. 8). Robustness tests further demonstrate the adaptability of the method. When the number of participating stations increases from 3 to 9 (Fig. 9), the F1-score rises steadily from 73.53% to 83.75%. This result confirms that enlarging node scale in the FL framework produces collaborative gains through more diverse samples and reduced geographic statistical heterogeneity, which substantially improve model generalization and robustness. Under severe class skew conditions, the method maintains an F1-score above 80% on the core dataset (Fig. 10 and Fig. 11). Furthermore, under extreme electromagnetic conditions characterized by high pulse loss rates of 70% and spurious pulse rates of 70%, the model maintains sorting performance above 75%, which demonstrates strong robustness against noise and interference (Fig. 12).  Conclusions  An FL-based framework is proposed for multi-station collaborative radar signal sorting to address data privacy and transmission constraints in distributed reconnaissance. 
By integrating a lightweight LSTM with a heterogeneity-aware aggregation mechanism, the method effectively captures temporal pulse features and mitigates model drift caused by Non-IID data. Experimental results verify that the approach achieves accuracy comparable to that of centralized methods and shows superior robustness under label skew and severe data degradation, including high pulse loss and spurious pulse rates. This study provides a privacy-preserving and efficient solution for intelligent signal processing in distributed electronic warfare systems.
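The parameter-decomposition aggregation can be sketched as follows: the server averages only the shared feature-extractor parameters, each station keeps its own classifier, and a proximal penalty discourages local drift from the global extractor. The abstract describes the proximal term (Eq. 20) only qualitatively, so this sketch assumes the FedProx-style form $ \frac{\mu}{2}\lVert w-w^{t}\rVert^{2} $ applied to the shared parameters, equal aggregation weights (sample-size weighting is a common alternative), and flat lists as stand-ins for model tensors. Names such as `aggregate_shared` are hypothetical.

```python
def aggregate_shared(client_params, shared_keys, weights=None):
    """Weighted average of the shared (feature-extractor) parameters only."""
    n = len(client_params)
    if weights is None:
        weights = [1.0 / n] * n
    return {key: [sum(w * p[key][i] for w, p in zip(weights, client_params))
                  for i in range(len(client_params[0][key]))]
            for key in shared_keys}

def broadcast_shared(client_params, global_shared):
    """Overwrite each client's shared parameters; personalized ones stay local."""
    for p in client_params:
        for key, value in global_shared.items():
            p[key] = list(value)

def proximal_term(local_params, global_shared, mu):
    """(mu / 2) * ||w_local - w_global||^2 over the shared parameters; added to the local loss."""
    return 0.5 * mu * sum((a - b) ** 2
                          for key in global_shared
                          for a, b in zip(local_params[key], global_shared[key]))

clients = [
    {"extractor": [1.0, 2.0], "classifier": [0.1]},  # hypothetical station A
    {"extractor": [3.0, 4.0], "classifier": [0.9]},  # hypothetical station B
]
global_shared = aggregate_shared(clients, ["extractor"])
broadcast_shared(clients, global_shared)
print(global_shared, [c["classifier"] for c in clients])
```

Only the extractor crosses the network, which keeps communication small and leaves each station's decision layer free to match its local class distribution.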
Dynamic Scale Perception-Driven Multi-UAV Collaborative 3D Object Detection Method
DUAN Shujing, WANG Zhirui, CHENG Peirui, FU Kun
Available online  , doi: 10.11999/JEIT251378
Abstract:
  Objective  Multi-UAV collaborative 3D object detection is a core technology for low-altitude intelligent perception, and the Bird’s-Eye View (BEV) feature representation paradigm provides support for global spatial consistency. However, in practical UAV remote-sensing scenarios, targets are extremely small, sparsely distributed, and embedded in a large proportion of background regions. Existing Transformer-based BEV perception methods adopt a homogeneous full-image feature-processing strategy. This strategy not only wastes computing resources because of excessive computation in large background areas, but also tends to dilute small-target features with background noise, making it difficult to balance computational efficiency and detection accuracy. Meanwhile, multi-UAV collaboration requires cross-device information interaction to achieve view complementarity and information gain, but this process is prone to redundant information and even feature conflicts. Traditional fixed-weight aggregation methods cannot accurately identify effective information or suppress redundancy, resulting in poor consistency of global BEV features and reduced collaborative detection accuracy. Therefore, the development of a detection network that is adaptive to multi-UAV aerial scenarios is of clear practical value.  Methods  A dynamic scale-aware detection network is proposed for efficient and accurate 3D object detection through two core modules: the Dynamic Scale-aware BEV Generation (DSBG) module and the Adaptive Collaborative BEV-Feature Aggregation (ACFA) module. The network establishes an end-to-end pipeline of “multi-view image input → dynamic scale-adaptive feature encoding → BEV-space 3D detection” (Fig. 1). First, the observed images collected by each UAV are processed independently by a parameter-sharing ResNet-50 backbone network to generate feature maps with a consistent structure. 
The DSBG module then takes these feature maps as input, calculates the amplitude of feature responses in each spatial region through the Local Scale-Aware Unit, and estimates the target distribution. On this basis, differentiated BEV grid encoding is dynamically allocated: high-resolution dense grids are assigned to high-response target regions to preserve fine-grained features, whereas low-resolution sparse grids are assigned to low-response background regions to avoid unnecessary computation. At the same time, target query vectors with spatial position priors are generated. The ACFA module receives the multi-resolution BEV features generated by the DSBG module, concatenates the dual-resolution features from different UAVs in the channel dimension, upsamples the low-resolution features to align them with the high-resolution features, models the local correlations of two-scale features through 3×3 convolution, and obtains a globally consistent BEV feature map through element-wise weighted summation. Finally, the global BEV features are fed into the DETR decoder for 3D target prediction, with Focal Loss used for classification and Smooth L1 Loss used for regression (Eqs. 5 and 6).  Results and Discussions  Extensive experiments are conducted on two public multi-UAV collaborative simulation datasets, AeroCollab3D and Air-Co-Pred. The results show that the proposed method achieves strong performance on both datasets. Compared with current state-of-the-art methods and baseline models, it not only improves mean Average Precision (mAP) by up to 7.2 percentage points, but also substantially reduces the key error metrics, cutting mean size error by more than 48% and lowering mean localization error and mean orientation error. In particular, clear advantages are observed in small-target detection and fine-grained category recognition, with pedestrian detection accuracy improved by nearly 10 percentage points. 
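The two training losses named above have standard forms: focal loss $ -\alpha_t(1-p_t)^{\gamma}\log p_t $, which down-weights well-classified examples, and Smooth L1, which is quadratic for small residuals and linear for large ones. The defaults $ \alpha=0.25 $, $ \gamma=2 $, and $ \beta=1 $ below are common choices, not values reported in this abstract, and the exact forms of Eqs. 5 and 6 in the paper may differ.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

def smooth_l1(x, beta=1.0):
    """Smooth L1 loss on a regression residual x (quadratic below beta, linear above)."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

print(focal_loss(0.9, 1), focal_loss(0.1, 1), smooth_l1(0.5), smooth_l1(2.0))
```

A confidently correct prediction (p = 0.9) contributes over a thousand times less loss than a badly wrong one (p = 0.1), which keeps the abundant easy background cells from dominating training on sparse small targets.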
Ablation experiments verify the effectiveness of both the DSBG and ACFA modules. The proposed method steadily improves detection accuracy while reducing computational cost by up to 41.6%, thereby jointly optimizing accuracy and efficiency. Visualization results (Fig. 3) show that the predicted bounding boxes align more closely with the ground truth, effectively alleviating the target overlap and missed detections common in traditional methods. Fig. 4 further illustrates the advantages of multi-UAV collaborative detection: even targets occluded by obstacles are detected effectively, which enhances perception of the global region.  Conclusions  A dynamic scale-aware detection network is proposed for multi-UAV collaborative 3D object detection to address the core challenges of the efficiency-accuracy tradeoff and poor feature consistency in traditional methods. The DSBG module dynamically matches the BEV encoding scale to the target distribution, thereby reducing redundant computation, whereas the ACFA module improves multi-scale and multi-view feature aggregation to ensure global feature consistency and accuracy. Experimental results on two datasets confirm that the proposed method outperforms existing advanced methods in detection accuracy, computational efficiency, and robustness. Future work will focus on optimizing dynamic scale-adjustment strategies with temporal information and exploring multi-sensor fusion with lightweight LiDAR data to improve detection stability in complex scenarios.
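The DSBG allocation step and the ACFA fusion step can be illustrated with a minimal NumPy sketch. It assumes a single feature map of shape (C, H, W), uses the channel-averaged absolute activation as the "response amplitude", keeps a fixed fraction of regions as dense-grid regions, and replaces the learned 3×3-convolution weighting of ACFA with a fixed scalar weight. The function names and the `region`, `keep_ratio`, and `weight` parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def allocate_dense_regions(features, region=4, keep_ratio=0.25):
    """Mark high-response regions that would receive dense (high-resolution) BEV grids.

    features: array of shape (C, H, W); H and W must be divisible by `region`.
    Returns a boolean mask of shape (H // region, W // region); True = dense grid.
    """
    _, h, w = features.shape
    # Response amplitude per pixel: mean absolute activation over channels.
    amplitude = np.abs(features).mean(axis=0)
    # Average the amplitude inside each region x region block.
    blocks = amplitude.reshape(h // region, region, w // region, region).mean(axis=(1, 3))
    # Keep the top-k blocks (k = keep_ratio of all blocks, at least one) as target regions.
    flat = blocks.ravel()
    k = max(1, int(round(keep_ratio * flat.size)))
    mask = np.zeros(flat.size, dtype=bool)
    mask[np.argsort(flat)[-k:]] = True
    return mask.reshape(blocks.shape)


def fuse_dual_resolution(high, low, weight=0.5):
    """Upsample a low-resolution BEV map by nearest-neighbour repetition and blend it
    with the high-resolution map (a fixed-weight stand-in for the learned ACFA fusion).

    high: (C, H, W); low: (C, H / s, W / s) for an integer scale factor s.
    """
    scale = high.shape[-1] // low.shape[-1]
    up = np.repeat(np.repeat(low, scale, axis=-2), scale, axis=-1)
    return weight * high + (1.0 - weight) * up
```

Top-k selection is used instead of a fixed amplitude threshold so that the share of dense grids, and hence the compute budget, stays predictable regardless of the overall activation scale.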
Privacy-preserving Computation in Trustworthy Face Recognition: A Comprehensive Survey
YUAN Lin, WU Yanshang, ZHANG Liyuan, ZHANG Yushu, WANG Nannan, GAO Xinbo
Available online  , doi: 10.11999/JEIT251063
Abstract:
  Significance   With the widespread deployment of face recognition in Cyber-Physical Systems (CPS), including smart cities, intelligent transportation, and public safety infrastructures, privacy leakage has become a central concern for both academia and industry. Unlike many biometric modalities, face recognition operates in highly visible and loosely controlled environments, such as public spaces, consumer devices, and online platforms, where facial image acquisition is easy and pervasive. This exposure makes facial data especially vulnerable to unauthorized collection and misuse. Insufficient protection may lead to identity theft, unauthorized tracking, and deepfake generation, which threaten individual rights and reduce trust in digital systems. Therefore, facial data protection is not only a technical issue but also a significant societal and ethical challenge. This work integrates fragmented research across computer vision, cryptography, and privacy-preserving computation. It provides a unified perspective that guides the development of trustworthy face recognition ecosystems that balance usability, regulatory compliance, and public trust.  Contributions   This paper systematically reviews recent advances in privacy-preserving computation for face recognition, covering both theoretical foundations and practical implementations. The architecture and application pipeline of face recognition systems are first examined, and privacy risks at each stage are identified. At the data collection stage, unauthorized or covert capture of facial images introduces immediate risks of misuse. During model training and deployment, gradient leakage, membership inference, and overfitting may expose sensitive information about individuals contained in training data. At the inference stage, adversaries may reconstruct facial images, perform unauthorized recognition, or associate identities across datasets, which compromises anonymity. 
To address these threats, existing approaches are classified into four major privacy-preserving paradigms: data transformation, distributed collaboration, image generation, and adversarial perturbation. Within these paradigms, ten representative techniques are analyzed. Cryptographic computation, including homomorphic encryption and secure multiparty computation, enables recognition without revealing raw data but often introduces substantial computational overhead. Frequency-domain learning converts images into spectral representations to suppress identifiable details while retaining discriminative features. Federated learning decentralizes model training and reduces centralized data exposure, although it remains vulnerable to gradient inversion attacks. Image generation techniques, such as face synthesis and virtual identity modeling, reduce reliance on real facial data during training and evaluation. Differential privacy introduces calibrated noise to provide statistical privacy guarantees, whereas face anonymization obscures identifiable visual traits. Template protection and anti-reconstruction mechanisms defend stored facial features against reverse engineering. Adversarial privacy protection introduces imperceptible perturbations that interfere with machine recognition yet preserve human visual perception. Several representative studies in each category are further examined. Commonly used evaluation datasets are summarized. A comparative analysis is conducted across multiple dimensions, including face recognition performance, privacy protection effectiveness, and practical usability. This analysis systematically identifies the strengths and limitations of different types of methods.   Prospects   Several research directions are identified for future work. A primary challenge is to achieve a dynamic balance between privacy protection and system utility. 
Excessive protection may degrade recognition accuracy, whereas insufficient safeguards expose users to unacceptable risks. Adaptive mechanisms that adjust privacy levels according to context, task requirements, and user consent are therefore required. Another promising direction is the development of inherently privacy-aware recognition paradigms, such as feature representations that minimize identity leakage by design. The establishment of standardized evaluation frameworks for privacy risk and usability is also essential. Such frameworks would enable reproducible benchmarking and facilitate real-world deployment. The emergence of generative foundation models, including diffusion models and large multimodal models, further changes the research landscape. These models enable synthetic data generation and controllable identity representations. However, they also enable more advanced attacks, such as high-fidelity face reconstruction and identity impersonation. Addressing these dual effects requires interdisciplinary collaboration across computer vision, cryptography, law, and ethics, supported by appropriate regulation and continued methodological development.  Conclusions  This paper provides a comprehensive reference for researchers and practitioners engaged in trustworthy face recognition. By integrating advances from multiple disciplines, it promotes the development of effective facial privacy protection technologies and supports the secure, reliable, and ethically responsible deployment of face recognition in practical scenarios. The long-term goal is to establish face recognition as a trustworthy component of CPS that balances functionality, privacy protection, and societal trust.
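Of the ten techniques listed, differential privacy is the easiest to show concretely. The sketch below applies the standard Laplace mechanism to a single face embedding. The vector is first clipped to an L1 norm of at most `bound`, so any two clipped embeddings differ by at most 2·bound in L1 norm, and Laplace noise with scale 2·bound/ε is then added. The clipping bound, ε, and the function names are illustrative assumptions, not a method taken from any of the surveyed papers.

```python
import numpy as np


def clip_l1(vector, bound):
    """Scale `vector` down so that its L1 norm is at most `bound`."""
    norm = np.abs(vector).sum()
    return vector if norm <= bound else vector * (bound / norm)


def laplace_privatize(embedding, epsilon, bound=1.0, rng=None):
    """Release a face embedding under epsilon-differential privacy (Laplace mechanism).

    Replacing one person's embedding changes the clipped vector by at most
    2 * bound in L1 norm, so Laplace noise with scale 2 * bound / epsilon suffices.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = clip_l1(np.asarray(embedding, dtype=float), bound)
    scale = 2.0 * bound / epsilon
    return clipped + rng.laplace(0.0, scale, size=clipped.shape)
```

A smaller ε adds more noise, which strengthens the privacy guarantee but degrades matching accuracy; this is the privacy-utility balance discussed under Prospects.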
ReXNet: A Trustworthy Framework for Space-air Security Integrating Uncertainty Quantification and Explainability
LIU Zhuang, CHEN Yuran, ZHANG Jiatong, JIANG Yujing, WANG Xuhui
Available online  , doi: 10.11999/JEIT251159
Abstract:
  Objective  The Space-Air-Ground Integrated Network (SAGIN) has emerged as a strategic infrastructure for national development. However, its security vulnerabilities are increasingly evident. The physical, network, and application layers of SAGIN face different security challenges that require targeted protection strategies. Aerospace scenarios require both high predictive accuracy and transparent decision making. Therefore, more robust, reliable, and interpretable intelligent methods are needed to support network security and system trustworthiness.  Methods  A detection framework is proposed that integrates Uncertainty Quantification (UQ) and eXplainable Artificial Intelligence (XAI). In the front-end stage, a Bayesian deep learning method based on Monte Carlo Dropout is adopted to enable probabilistic prediction modeling. This approach separates and quantifies epistemic uncertainty and aleatoric uncertainty, which improves model reliability. In the back-end stage, SHAP and LIME are applied to provide feature attribution for each prediction, improving model interpretability and transparency. Moreover, the intermediate layer of the framework allows flexible replacement of deep learning backbones, enabling adaptation to different space and aerospace application scenarios.  Results and Discussions  Extensive experiments were conducted on representative space-air security datasets, including UAV swarm fault detection, ADS-B injection attacks, and network fraud detection. The experimental results show that the proposed framework achieves high-precision anomaly detection. It also evaluates prediction confidence and identifies unknown samples outside the model knowledge boundary. In addition, the framework generates logically consistent and traceable explanations for model decisions, which improves interpretability and operational reliability. 
The results indicate that the combined use of UQ and XAI improves the robustness and trustworthiness of intelligent models in aerospace security applications.  Conclusions  This study improves the reliability and transparency of anomaly detection models in the space-air domain. It reflects a transition in artificial intelligence applications from focusing only on prediction accuracy to emphasizing system trustworthiness. Future work will promote practical deployment of the framework. The focus will include real-time processing capability, lightweight implementation, and operation in resource-constrained environments such as onboard and on-orbit systems. These efforts support more secure, autonomous, and efficient operation of SAGIN and contribute to the sustainable development of future space-air information networks.
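The separation into epistemic and aleatoric uncertainty can be sketched with the entropy-based decomposition commonly used with Monte Carlo Dropout. The predictive entropy of the averaged softmax (total uncertainty) equals the expected entropy of the individual passes (aleatoric part) plus the mutual information between prediction and weights (epistemic part). The sketch assumes that the T stochastic forward passes, with dropout kept active at inference, have already been collected into an array. Whether ReXNet uses exactly this decomposition is an assumption.

```python
import numpy as np


def entropy(probs, axis=-1):
    """Shannon entropy in nats; zero probabilities contribute zero."""
    p = np.asarray(probs, dtype=float)
    return -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)), axis=axis)


def mc_dropout_uncertainty(mc_probs):
    """Decompose uncertainty from Monte Carlo Dropout samples.

    mc_probs: array of shape (T, C) holding the softmax outputs of T stochastic
    passes for one input. Returns (total, aleatoric, epistemic) in nats.
    """
    mc_probs = np.asarray(mc_probs, dtype=float)
    total = entropy(mc_probs.mean(axis=0))         # entropy of the mean prediction
    aleatoric = entropy(mc_probs, axis=-1).mean()  # expected entropy of each pass
    epistemic = total - aleatoric                  # mutual information (>= 0)
    return total, aleatoric, epistemic
```

A large epistemic term means the stochastic passes disagree with one another. This is the signal that could flag samples lying outside the model's knowledge boundary.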
A Fast and Accurate Programming Strategy for Analog In-Memory Computing Validated With a Transposable RRAM Macro and 0.64% Fully-Parallel RMS Error
XIE Lifan, WEI Songtao, YAO Peng, WU Dong, TANG Jianshi, QIAN He, GAO Bin, WU Huaqiang
Available online  , doi: 10.11999/JEIT251174
Abstract:
  Objective  Non-Volatile Memory (NVM)-based Compute-in-Memory (CIM) is considered a promising candidate for next-generation artificial intelligence accelerators because of its high energy efficiency and instant wake-up capability. However, the conventional Write-and-Verify (W&V) scheme cannot satisfy the speed and precision requirements of highly parallel CIM macros. The main limitation arises from the inefficient verification stage. Cell-by-cell reading must be repeated for the entire array, which significantly increases programming time. In addition, switching from the verify state, where only one row is active, to the compute state, where all rows are active, introduces systematic errors such as reference drift and IR-drop-induced weight inaccuracy. Analog CIM macros with on-chip programming must also tolerate large and non-uniform offsets under massively parallel operation. This work proposes three techniques: (1) a Back-Propagation-Assisted Programming (BPAP) scheme that rapidly and accurately locates failing cells without full-array verification; (2) an Analog-domain Offset-Canceling Structure (AOSC) that compensates channel-wise offsets in situ; and (3) a transposable Resistive Random-Access Memory (RRAM) macro equipped with parallel Two-Channel current-domain Analog-to-Digital Converters (TC-ADC), which doubles the effective sampling rate with only 15% additional ADC area.  Methods  As shown in Fig. 2, the transposable RRAM macro contains two processing elements (PEs) and a shared backward-processing ADC (BP-ADC). Each PE includes an input loader (IL), a Digital-to-Analog Converter (DAC) array, a Bit-Line (BL) buffer and switch array, and 32 TC-ADCs. This configuration supports fully parallel forward computation. An Error Loader (EL) and a Source-Line (SL) buffer are also included to provide an error input vector for transposed matrix-vector multiplication (MVM). Fig. 3 illustrates the programming flow of the BPAP scheme. 
After AOSC calibration, a forward calculation is first executed. The differences between the expected outputs ($y_{\mathrm{exp}}$) and the measured outputs ($y_{\mathrm{real}}$) are then computed on chip and used as inputs for the subsequent back-propagation phase. The derivatives with respect to the RRAM weights are calculated using several validation patterns. This training-like process adapts to the actual RRAM states and detects programming failures under highly parallel computing conditions. Weights whose derivatives exceed a predefined error threshold are selected for remapping. This approach enables accurate programming without cell-by-cell verification across the entire array. In the forward phase (Fig. 4a), each 2T2R cell is configured as a signed weight, and the SLs are clamped at $V_{\mathrm{CM}}$ by the TC-ADCs. For each PE, a fully parallel 4b-IN/4b-W MVM operation is completed with 320 active rows of 2T2R cells, and 32 ADCs perform simultaneous conversions. In the backward phase (Fig. 4b), only the upper half of the reference voltages drives the SL buffers, and each weight is configured in 1T1R mode. The differential computation between the positive and negative 1T1R cells is performed by an external processor. Fig. 5 shows the operation of the AOSC scheme. Redundant rows in the RRAM array are programmed to compensate the analog computing offsets in situ. Offset currents are first measured by applying an all-zero input pattern to the regular weights. The redundant RRAM weights are then programmed to minimize the offset currents under a constant input voltage. During normal computation, these programmed redundant rows receive the same input voltage to cancel the offsets. The macro supports this AOSC operation with only about 1% additional array area. Fig. 6 shows the TC-ADC architecture. A class-AB output stage, together with associated switches and capacitors, enables two-channel conversion and halves the computation latency. 
This design increases the ADC area by only about 15% while doubling the sampling rate.  Conclusions  Replacing the conventional W&V procedure with BPAP, together with AOSC calibration and TC-ADC acceleration, enables reliable and high-precision programming of analog RRAM-CIM macros under massively parallel operation. The measured results show 96.5% classification accuracy on MNIST and a 4.8% improvement on ImageNet. The proposed techniques are compatible with standard 2T2R and 1T1R RRAM bit cells and can be extended to larger arrays and deeper neural networks.
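The idea behind BPAP, locating failing cells from output errors instead of reading every cell, can be reproduced in an idealized, noise-free numerical model. Assume the array computes y = xW exactly. The gradient of 0.5·‖XW − Y_exp‖² with respect to W, evaluated at the programmed weights, is then Xᵀ(Y_real − Y_exp). With enough random ±1 validation patterns, this gradient (averaged over patterns) approximates the per-cell weight error, so cells above a threshold can be flagged for remapping. The pattern count, the threshold, and all names are hypothetical. The real macro computes the transposed product on the array itself rather than in software.

```python
import numpy as np


def locate_failing_cells(w_target, w_actual, num_patterns=256, threshold=0.5, seed=0):
    """Flag mis-programmed weights from output errors (idealized BPAP-style sketch).

    w_target, w_actual: (N, M) intended and actually programmed weight matrices.
    Only matrix-vector products with w_actual are used, mimicking an array that can
    be driven in parallel but is never read cell by cell.
    """
    rng = np.random.default_rng(seed)
    n = w_target.shape[0]
    x = rng.choice([-1.0, 1.0], size=(num_patterns, n))  # validation input patterns
    y_exp = x @ w_target                                  # expected outputs
    y_real = x @ w_actual                                 # "measured" forward outputs
    error = y_real - y_exp
    # Back-propagation: gradient of 0.5 * ||x W - y_exp||^2 w.r.t. W, averaged over patterns.
    grad = x.T @ error / num_patterns
    return np.abs(grad) > threshold                       # cells to remap
```

Because ±1 patterns satisfy XᵀX/P ≈ I, a single faulty cell produces a gradient close to its weight error. The other entries stay near zero, with a spread of roughly 1/√P.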
Multi-Scale Deformable Alignment-Aware Bidirectional Gated Feature Aggregation for Stereoscopic Image Generation from a Single Image
ZHANG Chunlan, QU Yuwei, NIE Lang, LIN Chunyu
Available online  , doi: 10.11999/JEIT250760
Abstract:
  Objective  The generation of stereoscopic images from a single image usually relies on depth as a prior, which often leads to geometric misalignment, occlusion artifacts, and texture blurring. Recent studies have therefore shifted toward end-to-end learning of alignment transformation and rendering within the image or feature domain. By adopting a content-based feature transformation and alignment mechanism, high-quality novel images can be generated without explicit geometric information. However, three main challenges remain. First, fixed convolution has limited ability to model large-scale geometric and disparity changes, which restricts feature alignment performance. Second, texture and structural information are tightly coupled in network representations, and hierarchical modeling and dynamic fusion mechanisms are often absent. This limitation makes it difficult to preserve fine details while maintaining semantic consistency. Third, existing supervision strategies mainly focus on reconstruction errors and provide limited constraints on the intermediate alignment process, which reduces the efficiency of cross-view feature consistency learning. To address these challenges, a Multi-Scale Deformable Alignment-Aware Bidirectional Gated Feature Aggregation network is proposed for stereoscopic image generation from a single image.  Methods  First, to address image misalignment and distortion caused by the inability of fixed convolution to adapt to geometric deformation and disparity changes, a Multi-Scale Deformable Alignment (MSDA) module is proposed. This module employs multi-scale deformable convolution to adaptively adjust sampling positions based on image content, enabling effective alignment between source and target features across different scales. 
Second, to address texture blurring and structural distortion in synthesized images, a feature decoupling strategy is adopted to guide shallow layers to learn texture information and deeper layers to model structural information. A Texture-Structure Bidirectional Gating Feature Aggregation (Bi-GFA) module is designed to achieve dynamic complementarity and efficient fusion of texture and structural features. Third, to improve cross-view feature alignment accuracy, a Learnable Alignment-Guided (LAG) loss function is proposed. This loss guides the alignment network to adaptively refine the offset field at the feature level, thereby improving the fidelity and semantic consistency of the synthesized images.  Results and Discussions  This study focuses on scene-level image synthesis from a single image. Quantitative results show that the proposed method outperforms all compared methods in terms of PSNR, SSIM, and LPIPS. The method also maintains stable performance across different dataset sizes and scene complexities, indicating strong generalization ability and robustness (Tab. 1 and Tab. 2). Qualitative comparisons indicate that the generated images are visually closest to the ground-truth images and exhibit high overall sharpness and detail fidelity. On the outdoor KITTI dataset, pixel alignment errors of foreground objects are effectively reduced (Fig. 4). In indoor scenes, facial and hair textures are clearly reconstructed. High-frequency regions, such as champagne towers and balloon edges, present sharp contours and accurate color reproduction without visible artifacts or blurring. Both global illumination and local structural details are well preserved, producing high perceptual quality (Fig. 5). Ablation experiments further confirm the effectiveness of the proposed MSDA module, Bi-GFA module, and LAG loss (Tab. 3).  
Conclusions  A Multi-Scale Deformable Alignment-Aware Bidirectional Gated Feature Aggregation network is proposed to address strong dependence on ground-truth depth, geometric misalignment and distortion, texture blurring, and structural distortion in stereoscopic image generation from a monocular image. The MSDA module improves the flexibility and accuracy of cross-view feature alignment. The Texture-Structure Bi-GFA module enables complementary fusion of texture details and structural information. The LAG loss further refines offset field estimation and improves the fidelity and semantic consistency of the synthesized images. Experimental results show that the proposed method outperforms existing advanced methods in structural reconstruction, texture clarity, and viewpoint consistency, while maintaining strong generalization ability and robustness. Future work will examine the effect of different depth estimation strategies on system performance and investigate more efficient network architectures and model compression methods to reduce computational cost and support real-time stereoscopic image generation.
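The core operation behind deformable alignment, sampling features at content-dependent fractional offsets, can be sketched with plain bilinear interpolation. This is a single-channel, single-scale NumPy illustration under several assumptions: the offset field is given rather than predicted, and out-of-range positions are clamped to the border. The `alignment_l1` helper is a hypothetical feature-level L1 alignment term, not the paper's learnable LAG loss.

```python
import numpy as np


def bilinear_warp(feat, offset):
    """Sample `feat` (H, W) at positions (y + dy, x + dx) given `offset` of shape (2, H, W).

    Out-of-range positions are clamped to the border.
    """
    h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij")
    sy = np.clip(ys + offset[0], 0, h - 1)
    sx = np.clip(xs + offset[1], 0, w - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = sy - y0, sx - x0
    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom


def alignment_l1(source, target, offset):
    """Mean absolute difference between the warped source and the target features."""
    return float(np.mean(np.abs(bilinear_warp(source, offset) - target)))
```

In a deformable convolution, a small network would predict `offset` from the features themselves, and a separate set of offsets would be learned at each scale.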
Spherical Geometry-guided and Frequency-Enhanced Segment Anything Model for 360° Salient Object Detection
CHEN Xiaolei, SHEN Yujie, ZHONG Zhihua
Available online  , doi: 10.11999/JEIT251254
Abstract:
  Objective  With the rapid development of Virtual Reality (VR) and Augmented Reality (AR) technologies and the increasing demand for omnidirectional visual applications, accurate salient object detection in complex 360° scenes has become critical for system stability and intelligent decision-making. The Segment Anything Model (SAM) demonstrates strong transferability across two-dimensional vision tasks. However, it is primarily designed for planar images and lacks explicit modeling of spherical geometry, which limits its direct application to 360° Salient Object Detection (360° SOD). To address this limitation, this study integrates the generalization capability of SAM with spherical-aware multi-scale geometric modeling to improve 360° SOD. Specifically, a Multi-Cognitive Adapter (MCA), Spherical Geometry Guided Attention (SGGA), and Spatial-Frequency Joint Perception Module (SFJPM) are proposed to enhance multi-scale structural representation, mitigate projection-induced geometric distortions and boundary discontinuities, and strengthen joint global and local feature modeling.  Methods  The proposed 360° SOD framework is built on SAM and consists of an image encoder and a mask decoder. During encoding, spherical geometry modeling is incorporated into patch embedding by mapping image patches onto a unit sphere and explicitly modeling spatial relationships between patch centers. This strategy injects geometric priors into the attention mechanism, which improves sensitivity to non-uniform geometric characteristics and reduces information loss caused by omnidirectional projection distortion. The encoder adopts a partial freezing strategy and is organized into four stages, each containing three encoder blocks. Each block integrates the MCA for multi-scale contextual fusion and the SGGA to model long-range dependencies in spherical space. Multi-level features are concatenated along the channel dimension to form a unified representation. 
The representation is then refined by the SFJPM, which jointly captures spatial structures and frequency-domain global information. The fused features are subsequently fed into the SAM mask decoder. Saliency maps are optimized under ground-truth supervision to achieve accurate object localization and boundary refinement.  Results and Discussions  Experiments are conducted using the PyTorch framework on an RTX 3090 GPU with an input resolution of 512 × 512. Evaluations are performed on two public datasets, 360-SOD and 360-SSOD, and the method is compared with 14 state-of-the-art methods. The proposed approach consistently achieves superior performance across six evaluation metrics. On the 360-SOD dataset, the model achieves a Mean Absolute Error (MAE) of 0.0152 and a maximum F-measure of 0.8492, outperforming representative methods such as MDSAM and DPNet. Qualitative results show that the proposed method produces saliency maps that are highly consistent with ground-truth annotations. The model handles challenging scenarios effectively, including projection distortion, boundary discontinuity, multi-object scenes, and complex backgrounds. Ablation studies further show that MCA, SGGA, and SFJPM each contribute to performance improvement and operate complementarily.  Conclusions  This study proposes an SAM-based framework for 360° salient object detection that jointly addresses multi-scale representation, spherical distortion awareness, and spatial-frequency feature modeling. The MCA improves multi-scale feature fusion, the SGGA compensates for Equirectangular Projection (ERP)-induced geometric distortion, and the SFJPM enhances long-range dependency modeling. Extensive experiments verify the effectiveness and feasibility of applying SAM to 360° SOD. Future research will extend this framework to omnidirectional video and multi-modal scenarios to further improve spatiotemporal modeling and scene understanding.
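The spherical patch embedding described in the Methods can be sketched as follows. ERP patch centers are mapped to unit vectors, and pairwise great-circle distances are computed; these distances could then be turned into an additive attention bias. The linear bias `-distance / tau` and the function names are assumptions for illustration, since the abstract does not specify exactly how SGGA injects the geometric prior.

```python
import numpy as np


def erp_patch_centers_on_sphere(rows, cols):
    """Unit vectors (rows * cols, 3) for the centers of an equirectangular patch grid."""
    lat = np.pi / 2 - (np.arange(rows) + 0.5) / rows * np.pi   # +90 deg (top) to -90 deg
    lon = (np.arange(cols) + 0.5) / cols * 2 * np.pi - np.pi   # -180 deg to +180 deg
    lat, lon = np.meshgrid(lat, lon, indexing="ij")
    xyz = np.stack([np.cos(lat) * np.cos(lon),
                    np.cos(lat) * np.sin(lon),
                    np.sin(lat)], axis=-1)
    return xyz.reshape(-1, 3)


def great_circle_distances(xyz):
    """Pairwise angular distances (radians) between unit vectors."""
    return np.arccos(np.clip(xyz @ xyz.T, -1.0, 1.0))


def spherical_attention_bias(rows, cols, tau=1.0):
    """Hypothetical additive bias: patches closer on the sphere get larger logits."""
    return -great_circle_distances(erp_patch_centers_on_sphere(rows, cols)) / tau
```

On the sphere, the leftmost and rightmost patches of a row are exactly as close as two horizontally adjacent patches. Planar distances on the ERP image miss this, and it is one source of the boundary discontinuity mentioned above.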
Construction Methods of Two-Dimensional Golay-Zero Correlation Zone Array Sets with Flexible Parameters
WANG Meiyue, LIU Tao, CHEN Xiaoyu, LI Yubo
Available online  , doi: 10.11999/JEIT251360
Abstract:
  Objective  Sequences with good correlation properties are widely used in wireless communications, cryptography, and radar systems. However, a sequence set cannot simultaneously achieve ideal autocorrelation and ideal cross-correlation. This limitation has led to the study of two signal classes with ideal correlation properties: Zero Correlation Zone (ZCZ) sequences and Golay Complementary Sets (GCS). A Golay-ZCZ sequence set combines the advantages of both. Its constituent sequences exhibit ideal periodic autocorrelation and cross-correlation within the ZCZ, and the sums of their aperiodic autocorrelations are zero at all nonzero shifts. Therefore, a Golay-ZCZ set is both a ZCZ set and a GCS. It can thus be used in the applications of both sequence classes. An array set is a two-dimensional extension of a sequence set. Although Golay-ZCZ sequence sets have been widely studied and constructed, research on Two-Dimensional (2D) Golay-ZCZ array sets remains limited. This study proposes three constructions of 2D Golay-ZCZ array sets based on 2D multivariable functions and the concatenation operator. These array sets can be used as precoding matrices for massive Multiple Input Multiple Output (MIMO) omnidirectional transmission.  Methods  Three construction methods for 2D Golay-ZCZ array sets are proposed, including one direct construction and two indirect constructions. The resulting parameters have not been reported in existing studies. In the first construction, a 2D Golay-ZCZ array set is generated using 2D multivariable functions, with parameters expressed as prime powers. This direct function-based approach enables efficient synthesis of the target arrays. The second and third constructions generate 2D Golay-ZCZ array sets through horizontal and vertical concatenation of Two-Dimensional Complete Complementary Codes (2D CCC), respectively. In these indirect constructions, the parameters are not restricted to prime powers. 
This property broadens the applicability of the methods and increases parameter flexibility.  Results and Discussions  The first construction generates a 2D Golay-ZCZ array set with array size $p_1^{m_1}\times p_2^{m_2}$ and ZCZ size $(p_1-1)p_1^{\pi_1(2)-1}\times(p_2-1)p_2^{\sigma_1(2)-1}$ through a direct function-based method, where $p_1$ and $p_2$ are prime numbers. For clarity, the magnitudes of the 2D periodic cross-correlation function of the constructed array set are illustrated in Example 1 (Fig. 1). The second construction generates a ZCZ array set with array size $L_1\times N^2L_2$ and ZCZ size $(L_1-1)\times(N-1)L_2$ based on the horizontal concatenation of $(N,N,L_1,L_2)$ 2D CCC. The third construction generates a ZCZ array set with array size $N^2L_1\times L_2$ and ZCZ size $(N-1)L_1\times(L_2-1)$ based on the vertical concatenation of $(N,N,L_1,L_2)$ 2D CCC. An illustrative example of Construction 2 is provided, and the corresponding correlation magnitudes are shown in Figs. 2 and 3. As summarized in Table 1, the construction methods proposed in this paper generate parameter sets that have not been reported in the existing literature. The constructed array sets provide considerable flexibility in array dimensions and ZCZ sizes. This flexibility is valuable for the design of precoding matrices in MIMO omnidirectional transmission systems. 
In practical implementations, the dimension of a precoding matrix is typically determined by the number of transmit antennas, whereas the ZCZ size must match the maximum multipath delay spread of the channel. Owing to this parameter flexibility, the proposed 2D Golay-ZCZ array sets support adaptive selection under different antenna configurations and channel conditions.  Conclusions  Three construction methods for 2D Golay-ZCZ array sets are proposed. These methods generate array sets with flexible array sizes and large ZCZ widths. The first construction is based on a 2D multivariable function and can include previous results as special cases without using kernels. The second and third constructions rely on the concatenation operator and provide greater parameter flexibility. The proposed 2D Golay-ZCZ arrays have potential applications in MIMO omnidirectional transmission. The parameter-flexible array sets can be selected according to different antenna configurations and channel conditions. This property suppresses multi-antenna interference within the zero-correlation zone and maintains uniform transmitted energy.
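The two defining properties of a 2D Golay-ZCZ array set can be checked numerically. First, the aperiodic autocorrelations of all arrays must sum to zero at every nonzero 2D shift (the complementary property). Second, all periodic auto- and cross-correlations must vanish inside the zero correlation zone. The checker below is an illustrative sketch that takes the ZCZ as the shifts with |τ1| < Z1 and |τ2| < Z2. This zone convention and the function names are assumptions rather than the paper's definitions, and the sketch does not implement the three constructions.

```python
import itertools

import numpy as np


def periodic_corr(a, b, t1, t2):
    """2D periodic correlation: sum of a[i, j] * conj(b[(i + t1) % L1, (j + t2) % L2])."""
    return np.sum(a * np.conj(np.roll(b, shift=(-t1, -t2), axis=(0, 1))))


def _overlap(n, t):
    """Index slices for a and b such that b's index equals a's index + t (|t| < n)."""
    return (slice(0, n - t), slice(t, n)) if t >= 0 else (slice(-t, n), slice(0, n + t))


def aperiodic_corr(a, b, t1, t2):
    """2D aperiodic correlation over the overlapping region only."""
    (ra, rb), (ca, cb) = _overlap(a.shape[0], t1), _overlap(a.shape[1], t2)
    return np.sum(a[ra, ca] * np.conj(b[rb, cb]))


def is_complementary(arrays, tol=1e-9):
    """Sum of aperiodic autocorrelations is zero at every nonzero 2D shift."""
    l1, l2 = arrays[0].shape
    for t1, t2 in itertools.product(range(1 - l1, l1), range(1 - l2, l2)):
        if (t1, t2) != (0, 0) and abs(sum(aperiodic_corr(x, x, t1, t2) for x in arrays)) > tol:
            return False
    return True


def has_zcz(arrays, zone, tol=1e-9):
    """All periodic auto/cross-correlations vanish for |t1| < zone[0], |t2| < zone[1]
    (the in-phase autocorrelation at shift (0, 0) is excluded)."""
    shifts = list(itertools.product(range(1 - zone[0], zone[0]), range(1 - zone[1], zone[1])))
    for (k, a), (m, b) in itertools.product(enumerate(arrays), repeat=2):
        for t1, t2 in shifts:
            if k == m and (t1, t2) == (0, 0):
                continue
            if abs(periodic_corr(a, b, t1, t2)) > tol:
                return False
    return True
```

For instance, the binary Golay pair [1, 1, 1, −1] and [1, 1, −1, 1], viewed as 1×4 arrays, passes `is_complementary`. However, its periodic cross-correlation is 4 at shift (0, −1), so `has_zcz` holds only for the trivial zone (1, 1). A genuine Golay-ZCZ set must satisfy both checks for a nontrivial zone.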
TTSPD: A Multimodal Traffic Scene Perception Dataset Integrating Tire Data
YING Zongchen, GUI Lin, YANG Jiahan, ZHANG Fangwei, WANG Junfan, DONG Zhekang
Available online  , doi: 10.11999/JEIT260022
Abstract:
  Objective  With the rapid development of Intelligent Transportation Systems (ITS) and autonomous driving technologies, accurate traffic environment perception is a fundamental prerequisite for vehicle safety and decision making. Current perception frameworks primarily rely on high-resolution cameras and LiDAR sensors. Although these sensors provide rich information, they create severe challenges across the Perception-Storage-Calculation pipeline. High acquisition costs limit large-scale deployment. In addition, the massive data volume produced by high-dimensional sensors places heavy pressure on onboard storage and computational resources, often exceeding the power and thermal budgets of vehicle-grade edge platforms. These constraints motivate the exploration of alternative sensing paradigms that are cost-effective, compact, and computationally efficient while maintaining reliable perception accuracy. In response, the present study shifts the perception perspective from conventional external sensors to the tire-road contact interface, where abundant physical interaction information naturally exists. The objective is to construct a novel multimodal dataset, termed the Tire-integrated Traffic Scene Perception Dataset (TTSPD), which combines internal tire dynamics with external visual observations. This dataset is used to examine whether low-dimensional tire sensing data can complement or partially substitute high-dimensional visual data for accurate road surface classification. The study also aims to establish a new data morphology that balances perception performance and system efficiency for future intelligent vehicles.  Methods  To construct a high-quality and practically usable multimodal dataset, an integrated hardware-software acquisition framework is developed. From a hardware perspective, a specialized sensing system is designed by coupling tire-mounted multi-parameter sensors with a vehicle-mounted camera. 
To ensure reliable operation under the harsh mechanical conditions of a rotating tire, sensing nodes are encapsulated using a rubber-based composite material that provides mechanical protection and long-term stability. Wireless transmission is implemented using Bluetooth Low Energy (BLE) 5.0 with an adaptive frequency-hopping mechanism, enabling low-power and reliable communication during high-speed rotation. During data acquisition, the system synchronously collects six types of internal tire signals, including radial acceleration, tire temperature, and tire pressure, producing approximately 1.8 million sampling points. In parallel, a dashboard-mounted camera records high-resolution traffic scene images totaling 309 GB across four representative road surface conditions. To address the heterogeneity between high-frequency one-dimensional tire signals and two-dimensional visual data, a timestamp-based association strategy is adopted to achieve scene-level temporal alignment rather than strict frame-by-frame correspondence. Sensor sequences and image segments are grouped according to shared temporal windows and driving scenarios. This approach ensures semantic and temporal consistency at the scene level. The alignment strategy reflects practical deployment conditions and forms the basis of the final TTSPD dataset for multimodal fusion research.  Results and Discussions  The effectiveness of the proposed TTSPD is evaluated through comprehensive road surface classification experiments using mainstream deep learning models. Initial experiments based solely on visual data demonstrate strong baseline performance, with classification accuracies ranging from 87.25% to 93.75% (Table 7). These results confirm the quality and diversity of the visual modality in the dataset. The primary contribution of this study is the quantification of efficiency gains enabled by tire-based sensing. 
Comparative experiments progressively reduce the amount of visual data while integrating low-dimensional tire signals, particularly radial acceleration (Table 9). The results show that the multimodal model achieves approximately 95% of the full-data baseline accuracy while using only about 38.75% of the original data volume. This reduction in data dependency produces significant system-level benefits. Storage requirements decrease by approximately 61.25%, and overall model training time decreases by about 54.10% (Fig. 8). These findings indicate that tire dynamics encode high-value physical features related to road texture and surface conditions that complement visual cues. The proposed dataset therefore supports the development of lighter perception pipelines that retain most of the recognition performance.  Conclusions  This study addresses the long-standing Perception-Storage-Calculation bottleneck in vision-dominated autonomous driving systems by proposing the TTSPD. Multi-parameter sensors are embedded within tires using rubber-based encapsulation, and stable wireless communication is achieved through BLE 5.0. A robust tire-camera data acquisition system is therefore established. The resulting dataset covers four common and safety-critical road surface types: cement, asphalt, damaged, and water-covered roads. It provides a comprehensive foundation for multimodal perception research. Experimental results show that combining low-dimensional tire sensing data with visual information significantly improves perception efficiency. Approximately 95% of peak classification accuracy is achieved using only about 38.75% of the original data volume. This reduction lowers storage pressure and computational cost, as reflected in a 61.25% reduction in data storage and a 54.10% reduction in training time. The TTSPD dataset therefore offers a practical data morphology that supports efficient and high-performance perception under vehicle-grade computational constraints. 
It also provides valuable resources for the future development of ITS.
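The scene-level, timestamp-based association described in the Methods can be sketched as follows. This is a minimal illustration, assuming every tire sample and camera frame carries a timestamp in seconds and a scene label, and that windows have a fixed length; the function `align_by_window`, the `window_s` parameter, and the record layout are hypothetical and not taken from the TTSPD toolchain.

```python
from collections import defaultdict

def align_by_window(tire_samples, frames, window_s=1.0):
    """Scene-level pairing of tire samples and camera frames.

    tire_samples: iterable of (timestamp_s, scene_label, value)
    frames:       iterable of (timestamp_s, scene_label, frame_id)
    Records are grouped by (scene_label, window index); only groups that contain
    both modalities are kept. No frame-by-frame matching is attempted.
    """
    groups = defaultdict(lambda: {"tire": [], "frames": []})
    for t, scene, value in tire_samples:
        groups[(scene, int(t // window_s))]["tire"].append(value)
    for t, scene, frame_id in frames:
        groups[(scene, int(t // window_s))]["frames"].append(frame_id)
    # Sort by (scene, window) so the output order is deterministic.
    return {key: g for key, g in sorted(groups.items()) if g["tire"] and g["frames"]}

# Toy records (synthetic, for illustration only)
tire = [(0.1, "asphalt", 0.8), (0.6, "asphalt", 1.1), (1.2, "asphalt", 0.9), (2.5, "cement", 0.4)]
frames = [(0.5, "asphalt", "f000"), (1.7, "asphalt", "f001"), (3.4, "cement", "f002")]
for key, group in align_by_window(tire, frames).items():
    print(key, group)
```

With a 1 s window the two asphalt windows are paired, while the cement tire sample (window 2) and cement frame (window 3) have no partner and are dropped. A real pipeline would also have to handle clock offsets between the BLE tire nodes and the camera, which this sketch ignores.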
Image Deraining Driven by CLIP Visual Embedding
SUN Jin, CUI Yuntong, TIAN Hongwei, HUANG Changcheng, WANG Jigang
Available online  , doi: 10.11999/JEIT251066
Abstract:
  Objective  Rain streaks introduce visual distortions that degrade image quality and significantly impair downstream vision tasks such as feature extraction and object detection. This work addresses the problem of single-image rain streak removal. Existing methods often rely heavily on restrictive priors or synthetic datasets. This dependence limits robustness and generalization because such data differ from complex and unstructured real-world scenarios. Contrastive Language-Image Pre-training (CLIP) demonstrates strong zero-shot generalization through large-scale image-text contrastive learning. Motivated by this property, this study proposes FCLIP-UNet, a visual-semantic-driven deraining architecture designed to improve rain removal and generalization in real-world rainy environments.  Methods  FCLIP-UNet adopts a U-Net encoder-decoder architecture and formulates deraining as pixel-level detail regression guided by high-level semantic features. No textual queries are used during the encoding stage. Instead, the first four layers of a frozen CLIP-RN50 are employed to extract robust features that are decoupled from the rain distribution. These features exploit the semantic representation capability of CLIP to suppress diverse rain patterns. To guide accurate image restoration, a collaborative decoding architecture that integrates ConvNeXt-T and an Upsampling DepthWise Convolution Block (UpDWBlock) is adopted. The decoder employs ConvNeXt-T in place of conventional convolution modules to expand the receptive field and capture global contextual information. It parses rain streak patterns by using semantic priors extracted from the encoder. Under the constraint of these priors, UpDWBlock reduces information loss during upsampling and reconstructs fine-grained image details. Multi-level skip connections compensate for information loss introduced during encoding. 
In addition, a Layer-wise Differentiated Feature Perturbation Strategy (LDFPS) is incorporated to enhance robustness and adaptability in complex real-world rainy scenes.  Results and Discussions  Comprehensive evaluations are conducted on the Rain13K composite dataset by comparing the proposed model with ten state-of-the-art deraining algorithms. FCLIP-UNet shows consistently strong performance across all five testing subsets of Rain13K. In particular, it outperforms the second-best approach on Test100 by 0.32 dB in Peak Signal-to-Noise Ratio (PSNR) and 0.06 in Structural Similarity Index Measure (SSIM), and on Test2800 by 0.14 dB in PSNR and 0.002 in SSIM. On Rain100H and Rain100L, FCLIP-UNet achieves competitive results, including the best SSIM on Rain100H and comparable results on the other metrics (Table 3). To evaluate model generalization, the Rain13K-pretrained FCLIP-UNet is further tested on three datasets with different rainfall distribution characteristics: SPA-Data, HQ-RAIN, and MPID (Table 4, Fig. 7). Qualitative and quantitative evaluations are also conducted on the real-world NTURain-R dataset (Table 5, Figs. 8 to 10). These results consistently demonstrate the strong generalization capability of FCLIP-UNet. Ablation experiments on Rain100H validate the proposed encoder design and confirm the effectiveness of both UpDWBlock and LDFPS (Tables 6 to 8). Additional ablation studies show that the use of LDFPS, combined with a 1:1 weighting ratio between L1 loss and perceptual loss, provides the best performance for FCLIP-UNet (Tables 9 to 11).  Conclusions  This study proposes FCLIP-UNet, a deraining network designed for real-world generalization by leveraging the CLIP paradigm. Three main contributions are presented. 
First, image deraining is formulated as a pixel-level regression task that reconstructs rain-free images from high-level semantic features. A frozen CLIP image encoder extracts representations that remain stable across different rain distributions, thereby reducing domain shifts caused by diverse rain models. Second, a decoder that integrates ConvNeXt-T with an UpDWBlock is designed, and an LDFPS is proposed to improve robustness to unseen rain distributions. Third, a composite loss function jointly optimizes pixel-level accuracy and perceptual consistency. Experiments on both synthetic and real-world rainy datasets show that FCLIP-UNet effectively removes rain streaks, preserves fine image details, and achieves strong deraining performance with reliable generalization capability.
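The PSNR gains quoted above follow the standard definition PSNR = 10·log10(MAX² / MSE). A minimal, dependency-free sketch for images stored as nested lists is shown below; the function names and the default peak value of 255 (8-bit images) are assumptions, and published deraining benchmarks differ in details such as whether PSNR is computed on RGB or on the luminance (Y) channel only.

```python
import math

def _flatten(x):
    """Yield scalar pixel values from arbitrarily nested lists/tuples."""
    if isinstance(x, (list, tuple)):
        for item in x:
            yield from _flatten(item)
    else:
        yield x

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal size."""
    ref = list(_flatten(reference))
    out = list(_flatten(restored))
    if not ref or len(ref) != len(out):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)

clean = [[0, 0], [0, 0]]
derained = [[0, 0], [0, 20]]  # one pixel off by 20 -> MSE = 400 / 4 = 100
print(round(psnr(clean, derained), 2))  # 28.13
```

Because PSNR is logarithmic in MSE, a 0.32 dB gain corresponds to roughly a 7% lower mean squared error (10^(-0.032) ≈ 0.929).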
PSAQNet: A Perceptual Structure Adaptive Quality Network for Authentic Distortion Oriented No-reference Image Quality Assessment
JIA Huizhen, ZHAO Yuxuan, FU Peng, WANG Tonghan
Available online  , doi: 10.11999/JEIT251220
Abstract:
  Objective  No-Reference Image Quality Assessment (NR-IQA) is critical for practical imaging systems when pristine reference images are unavailable. However, many existing methods face three major challenges: limited robustness under complex distortions, weak generalization when distortion distributions shift (e.g., from synthetic to real-world settings), and insufficient modeling of geometric or structural degradations such as spatially varying blur, misalignment, and texture-structure coupling. These limitations cause models to rely excessively on dataset-specific statistics and reduce their effectiveness when applied to diverse scenes with mixed degradations. To address these issues, the Perceptual Structure Adaptive Quality Network (PSAQNet) is proposed to improve the accuracy and adaptability of NR-IQA under complex distortion conditions.  Methods  PSAQNet is designed as a unified CNN-Transformer framework that preserves hierarchical perceptual cues and supports global context reasoning. Instead of relying on late-stage pooling, distortion evidence is progressively enhanced throughout the network. The architecture contains several key components. The Advanced Distortion Enhanced Module (ADEM) operates on multi-scale features extracted from a pre-trained backbone. It adopts multi-branch gating and a distortion-aware adapter to emphasize degradation-related signals and reduce interference from dominant image content. This mechanism dynamically selects feature branches that correspond to perceptual degradation patterns, which is beneficial for spatially non-uniform or mixed distortions. To model geometric degradations, PSAQNet integrates Spatial-Guided Convolution (SGC) and Channel-Aware Adaptive Kernel convolution (CA_AK). SGC improves spatial sensitivity by guiding convolutional responses with structure-aware cues and focusing on regions where geometric distortions are prominent. 
CA_AK further improves geometric modeling by adaptively adjusting receptive behavior and recalibrating channels to preserve distortion-sensitive components. Additionally, PSAQNet incorporates efficient feature fusion strategies. Group Convolutional Block Attention Module (GroupCBAM) enables lightweight attention-based fusion of multi-level CNN features, whereas AttInjector selectively injects local distortion cues into global Transformer representations. This design allows global semantic reasoning to be guided by localized degradation evidence without introducing redundancy or instability.  Results and Discussions  Extensive experiments on six benchmark datasets containing both synthetic and real-world distortions demonstrate that PSAQNet achieves strong performance and stable agreement with human subjective judgments. The proposed method outperforms several recent approaches, particularly on real-world distortion datasets. These results indicate that PSAQNet effectively enhances distortion evidence, models geometric degradation, and integrates local distortion cues with global semantic representations. Such capabilities improve robustness under distribution shifts and reduce reliance on narrow distortion priors. Ablation studies confirm the contribution of each module. ADEM increases distortion saliency, SGC and CA_AK improve sensitivity to geometric degradations, and GroupCBAM and AttInjector strengthen the interaction between local and global features. Cross-dataset evaluations further demonstrate the generalization capability of PSAQNet across different content categories and distortion types. Scalability experiments also show that the framework benefits from stronger pretrained backbones without compromising its modular design.  Conclusions  PSAQNet addresses several key limitations in NR-IQA by integrating local distortion enhancement, geometric-aware feature modeling, and global semantic fusion within a unified framework. 
The modular architecture improves robustness and generalization across diverse distortion conditions and supports practical deployment in real-world scenarios. Future work will explore vision–language pre-training to improve cross-scene adaptability.
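Agreement with human subjective judgments in NR-IQA is commonly quantified with the Spearman rank-order (SRCC) and Pearson linear (PLCC) correlation coefficients between predicted scores and mean opinion scores. The abstract does not name the criteria used to evaluate PSAQNet, so the following dependency-free sketch reflects an assumption about typical practice. It omits the nonlinear logistic mapping that many IQA studies apply before computing PLCC.

```python
def pearson(x, y):
    """PLCC: linear correlation between predicted scores x and subjective scores y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """SRCC: Pearson correlation of the rank vectors (monotonic agreement)."""
    return pearson(average_ranks(x), average_ranks(y))

predicted = [1, 2, 3, 4]
mos = [1, 4, 9, 16]  # monotonic but nonlinear relation
print(round(spearman(predicted, mos), 4), round(pearson(predicted, mos), 4))
```

Here SRCC is 1.0 because the ordering is perfectly preserved, while PLCC is about 0.984 because the relation is not linear. This gap is why IQA papers typically report both.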
A Channel Phase Self-compensation Method for Active-Integrated Arrays
SUN Liying, LU Yunlong, XU Jun, HU Yang
Available online  , doi: 10.11999/JEIT251325
Abstract:
The seamless integration of active circuitry and antennas can effectively improve link performance and system integration. At present, active-integrated antennas are mainly designed by adjusting the antenna impedance while maintaining the desired radiation characteristics to achieve direct matching with active transistors. However, the effect of the antenna’s complex impedance on the phase response of the active channel, as well as its potential application in active-integrated phased arrays, has not been thoroughly studied. This paper proposes a channel phase self-compensation method for active-integrated arrays. For each active channel, the active transistor is directly integrated with the radiating element, where the load impedance at the transistor drain is matched to the input impedance of the antenna element. Under a constant active gain, the required complex load impedance is solved to establish an explicit mapping between the phase response of each active channel and its corresponding load impedance. According to the phase-shift requirements among array channels, appropriate load impedances are selected as the input impedances of the corresponding radiating elements. This approach applies a predefined phase distribution to each channel without using external phase-shifting structures. It can control the initial beam direction or compensate for the path difference between elements in conformal arrays. An active-integrated phased-array antenna with a preset beam direction is designed as a demonstration example to verify the effectiveness of the proposed method. The method provides an efficient design approach for next-generation active-integrated arrays.  Objective  In the traditional design approach, active circuit channels and antenna arrays are matched to 50 Ω before interconnection. This configuration occupies considerable physical space and limits system-level integration. 
In addition, insertion loss in passive matching networks and mismatch loss at the interconnections reduce overall link performance. Direct co-integration of active circuitry and antenna elements can address these limitations. However, multi-channel active-integrated antenna arrays often require one or multiple superimposed phase distributions across the channels to satisfy different application requirements, such as initial beam offset in fuze systems, wavefront compensation in conformal active phased arrays, and wide-angle beam scanning. These phase gradients are typically realized through backend phase-shifting networks. In this work, the complex impedance characteristics of the antenna are adjusted when it is directly integrated with the active circuitry. The phase response of the active-integrated channels can therefore be tuned within a certain range without using complex matching networks or additional phase shifters. This strategy reduces the complexity and performance requirements of the backend phase-shifting network. The advantages are more evident in millimeter-wave, high-frequency, and terahertz systems, where the available phase-shift range of phase shifters is limited.  Methods  Phase self-compensation of the active channels is achieved through the direct integration of the active transistor and the radiating element. In this configuration, the drain output of the transistor is directly connected to the input of the radiating element, and impedance transformation is realized within the antenna element. The proposed method includes three main steps. (1) The active transistor is first modeled as a two-port network. By evaluating the antenna element’s complex impedance as the load on different constant-gain circles, the mapping between the phase response of the active channel and the load impedance is established. The achievable phase-shift range of the active channel is then determined. 
(2) According to the required phase-shift distribution among the array channels, suitable combinations of active gain and corresponding complex load impedances are selected; these combinations are not unique. (3) The realizability of the selected impedances is examined according to the characteristics of the radiating element. The most readily realizable impedance values are then implemented by optimizing the radiating element, including fine adjustment of its geometry and feed position to meet the target impedance. When the radiating element is modified, particularly for circularly polarized elements, desirable radiation characteristics must also be preserved, including good axial ratio and beam-scanning performance.  Results and Discussions  The proposed phase self-compensation mechanism enables the array to achieve initial beam pointing and compensate for path-length differences caused by special array geometries, such as conformal or curved surfaces, without using additional phase-shifting structures. Therefore, the performance requirements of the backend phase-shifting network in active phased arrays can be reduced. To verify the effectiveness of the proposed method, a 1×4 circularly polarized active-integrated linear array (Fig. 9) is designed and demonstrated. Based on channel-level impedance calculations (Fig. 6) and an analysis of the antenna-element impedance characteristics (Fig. 8), a phase gradient of 38° between adjacent channels is synthesized and applied to the circularly polarized active-integrated array. Without degrading the circular polarization performance and without external phase-shifting circuitry, the initial beam direction of the active-integrated phased array is shifted to the desired angle of θ0 = 12° (Fig. 13). The phase self-compensation design does not degrade the beam-scanning capability of the array. After an additional phase gradient is applied for beam steering, the array achieves a scanning range of up to 50°. 
The gain reduction remains within 2 dB relative to the initial pointing direction, and the axial ratio remains below 4 dB throughout the scanning range.  Conclusions  Within the framework of active-integrated arrays, this work uses the phase-tuning effect produced by the complex impedance at the antenna port when the radiating element is directly matched to the active transistor. A desired phase-gradient distribution can therefore be synthesized among the channels of an active-integrated phased array within an achievable range. This capability enables compensation for required phase distributions, such as preset beam direction and path-length equalization in conformal-array applications, without relying on additional phase shifters. Therefore, the complexity and performance requirements of the backend phase-shifting circuitry are reduced. The effectiveness of the proposed method is validated through a multi-channel circularly polarized active-integrated phased-array prototype with a preset beam direction. Both full-wave simulations and experimental measurements confirm that the phase self-compensation mechanism provides the required initial beam pointing while preserving beam-scanning capability and polarization performance. This study provides a new approach for the design of high-efficiency next-generation active-integrated phased arrays.
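The load-to-phase mapping of Methods step (1) can be illustrated with textbook two-port relations. The sketch below assumes a source-matched transistor (Γ_S = 0), takes the channel phase to be the phase of b2/a1 = S21/(1 − S22·Γ_L), and grid-searches load reflection coefficients that lie on a constant transducer-gain contour G_T = |S21|²(1 − |Γ_L|²)/|1 − S22·Γ_L|². The S-parameters, the 10 dB target, and helper names such as `phase_map_on_gain_contour` are hypothetical. The authors' transistor model, phase definition, and impedance solver may differ.

```python
import cmath
import math

Z0 = 50.0  # reference impedance in ohms

def channel_transfer(s21, s22, gamma_l):
    """Forward wave ratio b2/a1 of a two-port driven from a matched source (Gamma_S = 0)."""
    return s21 / (1 - s22 * gamma_l)

def transducer_gain(s21, s22, gamma_l):
    """Transducer gain G_T (linear) for Gamma_S = 0 and load reflection Gamma_L."""
    return abs(s21) ** 2 * (1 - abs(gamma_l) ** 2) / abs(1 - s22 * gamma_l) ** 2

def gamma_to_impedance(gamma_l, z0=Z0):
    """Load impedance that presents reflection coefficient gamma_l in a z0 system."""
    return z0 * (1 + gamma_l) / (1 - gamma_l)

def phase_map_on_gain_contour(s21, s22, target_db, tol_db=0.2, steps=201, rim=0.98):
    """Return (gamma_l, gain_db, phase_deg) for grid loads with |G_T - target| <= tol_db."""
    points = []
    for ix in range(steps):
        for iy in range(steps):
            g = complex(-1 + 2 * ix / (steps - 1), -1 + 2 * iy / (steps - 1))
            if abs(g) >= rim:  # stay inside the Smith chart, away from the rim
                continue
            gain_db = 10 * math.log10(transducer_gain(s21, s22, g))
            if abs(gain_db - target_db) <= tol_db:
                phase_deg = math.degrees(cmath.phase(channel_transfer(s21, s22, g)))
                points.append((g, gain_db, phase_deg))
    return points

def pick_load(points, phase_deg):
    """Contour point whose channel phase is closest to the requested phase."""
    return min(points, key=lambda p: abs(p[2] - phase_deg))

# Hypothetical transistor S-parameters at the design frequency (not from the paper).
S21 = cmath.rect(3.0, math.radians(60))
S22 = cmath.rect(0.5, math.radians(-40))

contour = phase_map_on_gain_contour(S21, S22, target_db=10.0)
phases = [p[2] for p in contour]
print(f"{len(contour)} loads, phase {min(phases):.1f} to {max(phases):.1f} deg")
ref = pick_load(contour, min(phases))
step = pick_load(contour, min(phases) + 10.0)  # e.g. a 10 deg channel-to-channel offset
print(gamma_to_impedance(ref[0]), gamma_to_impedance(step[0]))
```

With these illustrative S-parameters, the phase span available on the 10 dB contour is only about 20 to 25 degrees. In a real design the span depends on the device, the chosen gain, and how the channel phase is defined, which is why step (1) determines the achievable range before step (2) allocates loads to channels.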
Design of a CNN Accelerator Based on Systolic Array Collaboration with Inter-Layer Fusion
LU Di, WANG Zhen Fa
Available online  , doi: 10.11999/JEIT250867
Abstract:
  Objective  With the rapid deployment of deep learning in edge computing, the demand for efficient Convolutional Neural Network (CNN) accelerators continues to increase. Although traditional CPUs and GPUs provide strong computational capability, they incur high power consumption, long latency, and limited scalability in real-time embedded scenarios. FPGA-based accelerators, due to their reconfigurability and parallelism, provide a viable alternative. However, current designs often show low resource utilization, memory access bottlenecks, and difficulty in balancing throughput and energy efficiency. To address these issues, a systolic array–based CNN accelerator with inter-layer fusion optimization is proposed. The design integrates an enhanced memory hierarchy and optimized computation scheduling. Hardware-oriented convolution mapping and lightweight quantization are adopted to improve computational efficiency and reduce resource consumption, while meeting real-time inference requirements for applications such as intelligent surveillance and autonomous driving.  Methods  This study addresses core challenges in FPGA-based CNN accelerators, including data transfer overhead, insufficient resource utilization, and low processing unit efficiency. A hybrid accelerator architecture based on systolic array–assisted inter-layer fusion is proposed. Computation-intensive adjacent layers are tightly coupled and executed sequentially within a single systolic array, which reduces frequent off-chip memory accesses for intermediate results. This reduces data transfer overhead and power consumption and improves computation speed and overall energy efficiency. A dynamically reconfigurable systolic array is further developed to support multi-dimensional matrix multiplications with varying scales. This design avoids resource waste caused by fixed-function hardware and reduces FPGA logic consumption, thereby improving hardware adaptability and flexibility. 
A streaming systolic array computation scheme is also introduced through coordinated computation flow and control logic. Processing elements are kept continuously busy, and data flows through the computation engine in a pipelined and parallel manner. This improves processing unit utilization, reduces idle cycles, and increases overall throughput.  Results and Discussions  To determine appropriate quantization precision, experiments are conducted on the MNIST dataset using VGG16 and ResNet50 under fixed-point quantization with 12-bit, 10-bit, 8-bit, and 6-bit precision. As shown in Table 1, inference accuracy decreases significantly when precision falls below 8 bits, indicating that excessively low precision weakens model representational capacity. On the proposed accelerator, VGG16, ResNet50, and YOLOv8n achieve peak computational performance of 390.25 GOPS, 360.27 GOPS, and 348.08 GOPS, respectively. Performance comparisons with FPGA accelerators reported in the literature are summarized in Table 4. Table 5 presents comparisons with CPU and GPU platforms in terms of throughput and energy efficiency. For VGG16, ResNet50, and YOLOv8n, the proposed accelerator delivers 1.76×, 3.99×, and 2.61× the throughput of the corresponding CPU platforms. Its energy efficiency is 3.1× (VGG16), 2.64× (ResNet50), and 2.96× (YOLOv8n) that of the GPU platforms, demonstrating superior energy utilization.  Conclusions  A systolic array–assisted inter-layer fusion CNN accelerator architecture is proposed. A theoretical analysis of computational density confirms the performance advantages of the design. To address variation in convolution window sizes in the second layer, a dynamically reconfigurable systolic array method is developed. A streaming systolic array scheme is also implemented to sustain pipelined and parallel data flow within the computation engine. This design reduces idle cycles and improves throughput. 
Experimental results show that the accelerator achieves high computational performance with minimal loss in inference accuracy. Peak performances of 390.25 GOPS, 360.27 GOPS, and 348.08 GOPS are achieved for VGG16, ResNet50, and YOLOv8n, respectively. Compared with CPU and GPU platforms, the proposed accelerator shows superior energy efficiency and is suitable for resource-constrained and energy-sensitive edge computing scenarios.
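The streaming, pipelined data movement described above builds on the basic systolic dataflow. The following sketch is a cycle-level model of a generic output-stationary systolic array computing a matrix product, with skewed operand feeds and register forwarding between neighboring processing elements. It is an illustrative assumption, not the authors' reconfigurable array or inter-layer fusion scheme, and it ignores quantization, tiling, and the memory hierarchy.

```python
def systolic_matmul(A, B):
    """Cycle-level model of an output-stationary systolic array computing A @ B.

    PE(i, j) keeps a running sum for C[i][j]. Rows of A enter at the left edge and
    columns of B at the top edge, each skewed by one cycle per row/column, so
    PE(i, j) sees A[i][k] and B[k][j] together at cycle t = i + j + k.
    """
    n, depth, m = len(A), len(B), len(B[0])
    if any(len(row) != depth for row in A):
        raise ValueError("inner dimensions do not match")
    acc = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # operand each PE forwards to the right
    b_reg = [[0] * m for _ in range(n)]  # operand each PE forwards downward
    cycles = depth + n + m - 2  # last product reaches PE(n-1, m-1) at t = depth+n+m-3
    for t in range(cycles):
        next_a = [[0] * m for _ in range(n)]
        next_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:  # skewed feed at the left edge
                    k = t - i
                    a_in = A[i][k] if 0 <= k < depth else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:  # skewed feed at the top edge
                    k = t - j
                    b_in = B[k][j] if 0 <= k < depth else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # multiply-accumulate in place
                next_a[i][j], next_b[i][j] = a_in, b_in
        a_reg, b_reg = next_a, next_b  # all registers update on the same clock edge
    return acc, cycles

def reference_matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C, cycles = systolic_matmul(A, B)
print(C, cycles)  # [[58, 64], [139, 154]] 5
```

For an n×k by k×m product this model needs k + n + m - 2 cycles, of which only k do useful work in any single PE. The fill and drain cycles caused by the skew are the idle periods that a streaming scheme tries to overlap across consecutive tiles or fused layers.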
Design of a Narrowband Energy Selective Protective Antenna Integrating Electromagnetic Protection and Anti-interference Capabilities
GAI Longjie, XU Yanlin, WANG Sijun, LIU Peiguo, HU Ning, HE Zhengwei
Available online  , doi: 10.11999/JEIT251363
Abstract:
  Objective  With the rapid advancement of wireless communication technologies, the electromagnetic (EM) environment has become increasingly complex. Electronic information equipment is facing growing challenges from high-intensity radiation fields (HIRFs) and out-of-band interference, making the co-design of EM protection and out-of-band interference suppression for electronic information systems a critical and urgent issue. As the front-end of the radio frequency channel, antennas serve as the primary pathway for converting EM waves in free space into guided waves within microwave circuits. High-power EM waves can couple into the system through antennas, causing EM damage. Moreover, in single-frequency application scenarios, if the antenna lacks narrowband characteristics, out-of-band interference signals may also enter the system via the antenna, disrupting normal operation. Therefore, it is essential to design a narrowband energy-selective protective antenna that simultaneously achieves out-of-band interference suppression and in-band protection against strong EM threats, thereby enhancing the operational stability and environmental adaptability of electronic information equipment in complex EM environments.  Methods  This paper focuses on a coaxial-fed microstrip patch antenna and carries out structural design and simulation-based optimization for this antenna type. To meet the requirement of a specific operating frequency of 915 MHz, the antenna structure itself is given both narrowband characteristics and EM protection capabilities, achieving an integrated design of EM protection and anti-interference. A high dielectric constant contributes simultaneously to antenna miniaturization and narrowband operation. Therefore, a TP-2 substrate with a dielectric constant of 20 is selected in this work to achieve the desired narrowband performance. 
In traditional coaxial-fed microstrip patch antennas, the probe structure passes directly through the dielectric substrate and connects to the radiating patch, leaving insufficient space for integrating protective structures. To address this limitation, a layered substrate with a central hollow cavity is adopted, forming a layered-cavity protective structure that gives the antenna itself energy-selective protection characteristics.  Results and Discussions  To verify the performance of the proposed antenna, a prototype was fabricated and measured (Fig. 14). The measured center frequency of the antenna is 928.5 MHz, with an operating bandwidth from 927.0 MHz to 930.0 MHz. Although the measured center frequency is shifted by 12.8 MHz from the simulated design value, the antenna still demonstrates favorable narrowband characteristics (Fig. 15). The measured radiation pattern agrees well with the simulated results. In the φ = 0° plane, the antenna exhibits stable omnidirectional radiation characteristics, with a measured maximum gain of 2.5 dBi (Figs. 11, 16). The shielding effectiveness (SE) of the antenna was measured using a high-power injection test method. As the injected power increased, the radiated power initially grew linearly. When the injected power reached 22 dBm, the radiated power began to saturate, indicating that the diodes in the protection structure had started to conduct and the energy-selection mechanism was activated. As the injected power was increased further, the SE gradually rose. When the injected power reached 48 dBm, the radiated power jumped back to the level predicted by the original linear trend and the SE dropped sharply, indicating that the diodes had broken down and the protection structure had failed. In summary, the activation threshold of the antenna’s protection function is 26 dBm, and the device damage threshold is 48 dBm. 
Within this range, a maximum SE of 26 dB is achieved (Fig. 18).  Conclusions  Based on the structure of a coaxial-fed microstrip patch antenna, this paper designs and implements a narrowband energy-selective protective antenna with integrated EM protection and anti-interference capabilities. The study covers the complete research process, from theoretical analysis and structural simulation optimization to physical fabrication and experimental verification. First, the characteristic mode analysis (CMA) method was employed to qualitatively investigate the potential operating modes of the microstrip patch antenna. By analyzing the electric and magnetic field modal distributions, the impedance matching characteristics were examined, and the optimal position for the coaxial feed point was determined accordingly. Next, the use of a high-permittivity substrate enabled both antenna miniaturization and narrowband performance, achieving an interference suppression capability (ISC) of better than 22.1 dB. Furthermore, a structural design featuring a layered substrate with a central hollow cavity was proposed, creating a cavity-based protective structure integrated into the feed probe region. An equivalent circuit model was also established to explain the operating mechanisms of the antenna in the normal and protective states. Finally, the antenna prototype was fabricated and its performance was measured. Measured results demonstrate that the antenna exhibits favorable narrowband characteristics and that its radiation pattern agrees well with the simulated results, with a measured maximum gain of 2.5 dBi. Moreover, applying the reciprocity principle and using a high-power injection method for SE testing, a maximum SE of 26 dB was recorded, confirming the antenna's EM protection capability. 
Compared with existing protective antennas, the proposed structure simultaneously achieves out-of-band interference suppression and EM protection within the antenna's own architecture, effectively advancing the integrated design of “frequency-domain interference suppression and energy-domain protection.” Nonuniformity in the substrate's dielectric constant and fabrication tolerances caused some deviation between the measured and simulated center frequencies, which reflects the sensitivity of narrowband antennas to structural parameters. In future research, a tuning mechanism could be introduced to develop a frequency-reconfigurable narrowband energy-selective protective antenna, thereby dynamically compensating for frequency deviations and enhancing design robustness and environmental adaptability.
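The SE values reported from the high-power injection test can be read as the gap between the radiated power predicted by the low-power linear trend and the radiated power actually measured. A minimal sketch of that bookkeeping follows. It assumes a 1 dB/dB linear reference whose offset is averaged over points injected at or below a chosen threshold; the function name and the sample numbers are hypothetical and synthetic, not the measured data of Fig. 18.

```python
def shielding_effectiveness(p_inj_dbm, p_rad_dbm, linear_max_dbm):
    """SE in dB at each drive level, relative to a 1 dB/dB linear reference.

    The reference offset (radiated minus injected power) is averaged over points
    injected at or below linear_max_dbm, where the limiter diodes are assumed off.
    """
    lin = [r - i for i, r in zip(p_inj_dbm, p_rad_dbm) if i <= linear_max_dbm]
    if not lin:
        raise ValueError("need at least one point in the linear region")
    offset = sum(lin) / len(lin)
    # Extrapolated linear radiated power minus measured radiated power.
    return [(i + offset) - r for i, r in zip(p_inj_dbm, p_rad_dbm)]

# Synthetic sweep: linear up to 20 dBm, then the radiated power saturates.
p_inj = [0, 10, 20, 30, 40]
p_rad = [-10, 0, 10, 12, 13]
print(shielding_effectiveness(p_inj, p_rad, linear_max_dbm=20))
```

In this synthetic sweep SE is zero in the linear region and grows as the radiated power saturates. If the protection diodes break down, the measured radiated power returns to the linear trend and the computed SE collapses toward zero, which matches the failure signature described in the Results.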
Shallow-Water Geoacoustic Parameter Inversion with Stokes Parameters and a Multi-Task Attention U-Net
HUANG Qianzhuo, LI Xiaoman, BI Xuejie, ZHANG Zishi, TONG Han, LI Fei
Available online  , doi: 10.11999/JEIT251085
Abstract:
  Objective  Geoacoustic parameters in shallow water are crucial for analyzing the characteristics of underwater acoustic propagation. However, traditional inversion methods face challenges such as high computational complexity, significant cost, and strong dependence on the accuracy of environmental models. To address these issues, this study proposes an efficient and robust inversion approach designed to overcome the limitations of conventional methods. The proposed method aims to provide more reliable and stable estimation of shallow-water geoacoustic parameters, enabling improved performance in practical applications while maintaining computational efficiency and robustness.  Methods  This study is based on the Stokes polarization parameters of the vector acoustic field. Signals received by a single vector hydrophone are processed using a warping transformation to separate and extract normal modes propagating in a shallow-water waveguide. The extracted signals are subsequently used to compute the Stokes parameters, which are normalized and employed as input features for the inversion model. An attention-enhanced multi-task U-Net neural network is constructed, adopting a shared encoder and multiple decoder branches to predict key geoacoustic parameters, including compressional wave velocity, shear wave velocity, density, and attenuation coefficients. In addition, channel and spatial attention mechanisms, together with a multi-task loss function incorporating uncertainty weighting, are applied to optimize feature extraction and achieve adaptive balancing among the different parameter inversion tasks.  
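The Stokes-parameter features described in the Methods can be sketched as follows. This sketch assumes that the Stokes parameters are formed per time sample from the complex (analytic) horizontal and vertical particle-velocity components, vx and vz, of one warping-separated mode, and that they are normalized by S0. The abstract does not give the authors' exact definition, sign convention for S3, or normalization. Obtaining the analytic signal (for example, via a Hilbert transform) and the warping-based mode separation are outside this sketch, and the function name is hypothetical.

```python
def stokes_parameters(vx, vz):
    """Per-sample normalized Stokes parameters (s1, s2, s3) from complex vx, vz.

    S0 = |vx|^2 + |vz|^2, S1 = |vx|^2 - |vz|^2,
    S2 = 2 Re(vx * conj(vz)), S3 = 2 Im(vx * conj(vz)); each is divided by S0.
    """
    out = []
    for a, b in zip(vx, vz):
        s0 = abs(a) ** 2 + abs(b) ** 2
        if s0 == 0:
            out.append((0.0, 0.0, 0.0))  # no energy: leave the sample undefined
            continue
        cross = a * b.conjugate()
        s1 = abs(a) ** 2 - abs(b) ** 2
        s2 = 2 * cross.real
        s3 = 2 * cross.imag
        out.append((s1 / s0, s2 / s0, s3 / s0))
    return out

# Linear (x), linear (45 deg), circular, and an arbitrary elliptical sample
features = stokes_parameters([1, 1, 1, 0.3 + 0.4j], [0, 1, 1j, -0.2 + 0.1j])
for s in features:
    print([round(v, 3) for v in s])
```

For a single fully polarized sample the normalized vector (s1, s2, s3) has unit length, so the features describe the shape and handedness of the particle-motion ellipse independently of the received amplitude. This amplitude independence is one motivation for normalizing them before feeding the network.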
  Results and Discussions  The introduced attention mechanism effectively suppresses fluctuations in model predictions, thereby significantly improving the accuracy and stability of geoacoustic parameter inversion. On a test set of 200 cases, the mean absolute percentage errors for both compressional and shear wave velocities were consistently below 5% (Table 2). With the incorporation of attention mechanisms, the errors in shear wave velocity and seabed density were further reduced to less than 3% (Table 3), demonstrating enhanced precision in predicting these key parameters. The proposed method is not only insensitive to parameter mismatch but also exhibits strong robustness against environmental variations. Furthermore, the approach was validated using real measurement data from a shallow-water region in the northern South China Sea (Fig. 16), confirming both the effectiveness and reliability of the method in practical scenarios (Table 4 and Fig. 18). These results collectively demonstrate that the attention-enhanced multi-task U-Net framework can effectively capture critical features from Stokes polarization parameters, leading to more stable and accurate geoacoustic parameter estimation in shallow-water environments.  Conclusions  The inversion method based on Stokes parameters and an attention-enhanced multi-task U-Net can effectively improve the accuracy and stability of shallow-water geoacoustic parameter estimation, showing particularly strong performance in predicting compressional wave velocity, shear wave velocity, and density. However, it still has limitations in the inversion of seabed attenuation coefficients. Future research should further improve feature extraction methods and network architecture, and explore the applicability of this approach under more complex marine conditions.
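The abstract does not give the exact form of the uncertainty-weighted multi-task loss mentioned in the Methods. A widely used formulation is homoscedastic uncertainty weighting, after Kendall, Gal, and Cipolla. It combines the task losses as the sum over tasks of 0.5·exp(−s_i)·L_i + 0.5·s_i, where s_i = log σ_i² is a learnable parameter per task. The sketch below assumes that form; the function names and the per-task loss values are hypothetical.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Total = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i, with s_i = log(sigma_i ** 2).

    A large s_i (high assumed task noise) down-weights task i; the 0.5 * s_i
    term penalizes inflating s_i to ignore a task entirely.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task")
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(task_losses, log_vars))

def optimal_log_vars(task_losses):
    """For fixed losses, setting d(total)/d(s_i) = 0 gives s_i = log(L_i)."""
    return [math.log(loss) for loss in task_losses]

# Hypothetical per-task losses: Vp, Vs, density, attenuation
losses = [0.04, 0.02, 0.01, 0.30]
s_star = optimal_log_vars(losses)
weights = [0.5 * math.exp(-s) for s in s_star]  # effective weight per task = 0.5 / L_i
print([round(w, 2) for w in weights])  # [12.5, 25.0, 50.0, 1.67]
```

In training, the s_i would be optimized jointly with the network weights by gradient descent. The closed-form optimum shown here only illustrates that each task's effective weight 0.5/L_i increases as that task's loss decreases, so a hard-to-fit task such as attenuation receives a smaller weight.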
A Testability Evaluation Method Based on Reconvergent Fan-out
WU Wenjun, LIANG Huaguo, YOU Chang, DOU Xianrui, XIAO Jiahui, LU Yingchun
Available online  , doi: 10.11999/JEIT251286
Abstract:
  Objective  As the scale and structural complexity of integrated circuits continue to increase, accurate testability evaluation has become essential for Trojan detection, fault diagnosis, and test-point optimization in modern Design-For-Testability (DFT) flows. Metrics such as controllability, observability, and fault coverage rely heavily on reliable probabilistic modeling of signal propagation. However, existing analytical and learning-based approaches often exhibit degraded accuracy in circuits containing dense Reconvergent Fan-Out (RFO) structures, where strong signal correlation invalidates classical independence assumptions and introduces significant estimation bias. Although several enhanced techniques attempt to incorporate structural information, many suffer from high computational cost or limited scalability when applied to deeper or more reconvergent logic networks. This work aims to address these limitations by proposing a testability evaluation method that incorporates RFO structural characteristics to improve modeling accuracy while maintaining practical computational efficiency.  Methods  The proposed approach begins with a structural-analysis algorithm that identifies RFO regions through a topological traversal of the circuit. A dedicated RFO-recognition mechanism maps each root fan-out node to its corresponding reconvergent fan-out nodes, capturing the structural dependencies that govern correlated signal behavior and providing the foundation needed for accurate probabilistic modeling. Building on this structural extraction, a weighted conditional-probability model is formulated to correct testability distortion within reconvergent regions. Unlike prior optimization schemes, the weighting strategy assigns influence-based weights derived from the contribution of each root node to the target node, yielding probability estimates that better reflect real testability behavior. 
Furthermore, an efficient computational framework is developed that integrates conditional-probability propagation and weight selection within a single topological traversal, maintaining low algorithmic complexity while improving accuracy.  Results and Discussions  The proposed method is evaluated on representative benchmark circuits from the ISCAS-85, ISCAS-89, ITC’99, and EPFL suites. Performance is assessed in terms of controllability accuracy, ordering consistency, fault-coverage estimation, and runtime efficiency. For controllability prediction, the method achieves an average RMSE of 0.0568, an average reduction of 25% compared with existing techniques (Table 2). Ordering consistency also improves: the average Spearman correlation coefficient reaches 0.935, outperforming existing techniques. Fault-coverage estimation performs similarly well, with an average relative error of 3.64%, lower than that of previously reported methods (Table 1). Runtime analysis further indicates that the proposed framework remains computationally practical: across all benchmark circuits, the method achieves an average speedup of 7× while preserving high accuracy (Figure 5).  Conclusions  This work addresses the degradation of testability-evaluation accuracy caused by reconvergent fan-out structures in integrated circuits by proposing a reconvergent-fan-out-aware testability analysis method. The proposed RFO-structure identification algorithm extracts reconvergent information at the topology level and establishes explicit mappings between root nodes and reconvergent fan-out nodes. On this structural foundation, a weighted conditional-probability model is constructed to mitigate the probability distortion induced by signal correlation in RFO regions. 
An efficient computational framework is further developed to integrate the entire computation within a streamlined traversal-based process. Experimental results demonstrate that the proposed technique closely fits simulation-based ground truth, with low controllability RMSE and high ordering consistency. In testability estimation, the predicted fault-coverage values also closely match simulation results. The method maintains this high accuracy with low computational overhead.
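A three-gate toy circuit shows why reconvergent fan-out biases independence-based estimates and how conditioning on a root node corrects the bias. The Python sketch below compares three estimates: a COP-style single-pass propagation, an estimate conditioned on one fan-out stem, P(n) = sum_v P(root = v)·P(n | root = v), and exhaustive enumeration. The netlist, the function names, and the plain (unweighted) single-root conditioning are illustrative assumptions. The authors' influence-based weighting over multiple roots is not reproduced, and `fanout_stems` only finds nets with several fan-out branches rather than verifying that those branches reconverge.

```python
import itertools

# Hypothetical toy netlist in topological order: net -> (gate type, fan-in nets).
# Primary input "a" fans out to "na" and "y"; the two branches reconverge at "y".
PRIMARY_INPUTS = ["a", "b"]
NETLIST = {
    "na": ("NOT", ["a"]),
    "y": ("AND", ["a", "na"]),
    "z": ("OR", ["y", "b"]),
}


def fanout_stems():
    """Nets that drive more than one gate input: candidate roots of reconvergent fan-out."""
    count = {}
    for _, fanin in NETLIST.values():
        for net in fanin:
            count[net] = count.get(net, 0) + 1
    return [net for net, c in count.items() if c > 1]


def gate_prob(kind, ps):
    """P(output = 1) assuming statistically independent inputs (COP-style)."""
    if kind == "NOT":
        return 1.0 - ps[0]
    prod_hi, prod_lo = 1.0, 1.0
    for p in ps:
        prod_hi *= p
        prod_lo *= 1.0 - p
    if kind == "AND":
        return prod_hi
    if kind == "OR":
        return 1.0 - prod_lo
    raise ValueError(f"unsupported gate {kind}")


def propagate(input_probs):
    """Single topological pass computing the 1-controllability of every net."""
    prob = dict(input_probs)
    for net, (kind, fanin) in NETLIST.items():
        prob[net] = gate_prob(kind, [prob[i] for i in fanin])
    return prob


def conditioned(input_probs, root):
    """P(net = 1) = sum over v of P(root = v) * P(net = 1 | root = v)."""
    p = input_probs[root]
    given_1 = propagate({**input_probs, root: 1.0})
    given_0 = propagate({**input_probs, root: 0.0})
    return {net: p * given_1[net] + (1.0 - p) * given_0[net] for net in given_1}


def exact(input_probs):
    """Ground truth by enumerating every input vector (feasible only for tiny circuits)."""
    ops = {"NOT": lambda v: 1 - v[0], "AND": lambda v: int(all(v)), "OR": lambda v: int(any(v))}
    total = {}
    for bits in itertools.product([0, 1], repeat=len(PRIMARY_INPUTS)):
        val = dict(zip(PRIMARY_INPUTS, bits))
        weight = 1.0
        for pi in PRIMARY_INPUTS:
            weight *= input_probs[pi] if val[pi] else 1.0 - input_probs[pi]
        for net, (kind, fanin) in NETLIST.items():
            val[net] = ops[kind]([val[i] for i in fanin])
        for net, v in val.items():
            total[net] = total.get(net, 0.0) + weight * v
    return total


probs = {"a": 0.5, "b": 0.5}
independent = propagate(probs)        # y = 0.25: biased by the a / NOT a correlation
corrected = conditioned(probs, "a")   # y = 0.0, z = 0.5
truth = exact(probs)                  # y = 0.0, z = 0.5
```

With a single stem, conditioning is exact. With many overlapping reconvergent regions, full conditioning grows exponentially, which is the cost that a weighted, traversal-integrated scheme such as the one described aims to avoid.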
Optimal Federated Average Fusion of Gaussian Mixture–Probability Hypothesis Density Filters
XUE Yu, XU Lei
Available online  , doi: 10.11999/JEIT250759
Abstract:
  Objective  To realize optimal decentralized fusion tracking of uncertain targets, this study proposes a federated average fusion algorithm for Gaussian Mixture–Probability Hypothesis Density (GM-PHD) filters, designed with a hierarchical structure. Each sensor node operates a local GM-PHD filter to extract multi-target state estimates from sensor measurements. The fusion node performs three key tasks: (1) maintaining a master filter that predicts the fusion result from the previous iteration; (2) associating and merging the GM-PHDs of all filters; and (3) distributing the fused result and several parameters to each filter. The association step decomposes multi-target density fusion into four categories of single-target estimate fusion. We derive the optimal single-target estimate fusion both in the absence and presence of missed detections. Information assignment applies the covariance upper-bounding theory to eliminate correlation among all filters, enabling the proposed algorithm to achieve the accuracy of Bayesian fusion. Simulation results show that the federated fusion algorithm achieves optimal tracking accuracy and consistently outperforms the conventional Arithmetic Average (AA) fusion method. Moreover, the relative reliability of each filter can be flexibly adjusted.  Methods  The multi-sensor multi-target density fusion is decomposed into multiple groups of single-target component merging through the association operation. Federated filtering is employed as the merging strategy, which achieves the Bayesian optimum owing to its inherent decorrelation capability. Section 3 rigorously extends this approach to scenarios with missed detections. To satisfy federated filtering’s requirement for prior estimates, a master filter is designed to compute the predicted multi-target density, thereby establishing a hierarchical architecture for the proposed algorithm. 
In addition, auxiliary measures are incorporated to compensate for the observed underestimation of cardinality.  Results and Discussions  Filtered components belonging to the same target are accurately associated using a modified Mahalanobis distance (Fig. 3). The precise association and the single-target decorrelation capability together ensure the theoretical optimality of the proposed algorithm, as illustrated in Fig. 2. Compared with conventional density fusion, the Optimal Sub-Pattern Assignment (OSPA) error is reduced by 8.17% (Fig. 4). The advantage of adopting a small average factor for the master filter is demonstrated in Figs. 5 and 6. The effectiveness of the measures for achieving cardinality consensus is also validated (Fig. 7). Another strength of the algorithm is the flexibility to adjust the average factors (Fig. 8). Furthermore, the algorithm consistently outperforms AA fusion across all missed detection probabilities (Fig. 9).  Conclusions  This paper achieves theoretically optimal multi-target density fusion by employing federated filtering as the merging method for single-target components. The proposed algorithm inherits the decorrelation capability and single-target optimality of federated filtering. A hierarchical fusion architecture is designed to satisfy the requirement for prior estimates. Extensive simulations demonstrate that: (1) the algorithm can accurately associate filtered components belonging to the same target, thereby extending single-target optimality to multi-target fusion tracking; (2) the algorithm supports flexible adjustment of average factors, with smaller values for the master filter consistently preferred; and (3) the superiority of the algorithm persists even under sensor malfunctions and high missed detection rates. Nonetheless, this study is limited to GM-PHD filters with overlapping Fields Of View (FOVs). Future work will investigate its applicability to other filter types and spatially non-overlapping FOVs.
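The federated merging step can be sketched for a set of associated single-target Gaussian components. The NumPy code below fuses matched means and covariances in information form. It also implements the information-distribution step, in which each filter restarts from the fused estimate with covariance P/β_i, where sum β_i = 1. Re-fusing those shares returns the original estimate, which is the sense in which the covariance upper bound prevents double counting. Treating the abstract's average factors as the β_i is an assumption, as are the names `federated_fuse` and `share_information`. GM-PHD component weights, the master filter's prediction, and the modified Mahalanobis association are not shown.

```python
import numpy as np


def federated_fuse(means, covs):
    """Fuse matched single-target Gaussian components in information form:
    P = (sum_i P_i^-1)^-1,  x = P * sum_i P_i^-1 x_i."""
    infos = [np.linalg.inv(P) for P in covs]
    P = np.linalg.inv(sum(infos))
    x = P @ sum(I @ m for I, m in zip(infos, means))
    return x, P


def share_information(x_fused, P_fused, betas):
    """Information-distribution step: filter i restarts from the fused estimate with
    covariance P_fused / beta_i (sum of beta_i = 1). Inflating the covariances
    upper-bounds the cross-correlation that the shared estimate introduces."""
    betas = np.asarray(betas, dtype=float)
    if not np.isclose(betas.sum(), 1.0):
        raise ValueError("information-sharing factors must sum to 1")
    return [(x_fused.copy(), P_fused / b) for b in betas]


# Hypothetical fused estimate shared among a master filter and two sensor filters.
x_g = np.array([0.0, 1.0])
P_g = np.diag([1.0, 4.0])
shares = share_information(x_g, P_g, [0.2, 0.3, 0.5])
x_back, P_back = federated_fuse([x for x, _ in shares], [P for _, P in shares])
# With no new measurements, re-fusing recovers (x_g, P_g): nothing is double-counted.
```

A smaller β_i gives filter i a larger restarted covariance and therefore less influence at the next fusion. This is one way to read the abstract's finding that a small average factor for the master filter is preferable.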
A Causality-Guided KAN Attention Framework for Brain Tumor Classification
FAN Yawen, WANG Xiang, YUE Zhen, YU Xiaofan
Available online  , doi: 10.11999/JEIT250865
Abstract:
  Objective  Convolutional Neural Network (CNN)-based Computer-Aided Diagnosis (CAD) systems have advanced brain tumor classification in recent years. However, their performance remains limited by feature confusion and insufficient modeling of high-order interactions. This study proposes a framework that integrates causal feature guidance with a KAN attention mechanism. A Confusion Balance Index (CBI) is developed to quantify the true label distribution within clusters. A causal intervention mechanism then incorporates confused samples to strengthen discrimination between causal variables and confounding factors. A spline-based KAN attention module is further constructed to model high-order feature interactions and enhance focus on critical lesion regions and discriminative features. Combining causal modeling with nonlinear interaction enhancement improves robustness and addresses the inability of traditional architectures to capture complex pathological feature relationships.  Methods  A pre-trained CLIP model is used for feature extraction to obtain semantically rich visual representations. K-means clustering and the CBI are applied to identify images affected by confounding factors, after which a causal intervention mechanism incorporates these samples into the training process. A causal-enhanced loss function is then designed to strengthen discrimination between causal variables and confounding factors. To address limited high-order interaction modeling, a Kolmogorov-Arnold Network (KAN)-based attention mechanism is integrated. This spline-based module constructs flexible nonlinear attention representations and refines high-order feature interactions. When fused with the backbone network, it improves discriminative performance and generalization.  Results and Discussions  The proposed method achieves superior performance across three datasets. 
On DS1, the model reaches 99.92% accuracy, 99.98% specificity, and 99.92% precision, outperforming RanMerFormer (+0.15%) and SAlexNet (+0.23%) and exceeding traditional CNNs (95%~97%) by more than 2%. Swin Transformers reach 98.08% accuracy but only 91.75% precision, indicating that the proposed model is more robust in reducing false detections. On DS2, the method achieves 98.86% accuracy and 98.80% precision, exceeding the next-best model, RanMerFormer. On a more challenging in-house dataset, it maintains 90.91% accuracy and 95.45% specificity, demonstrating generalization in complex settings. The gains result from the KAN attention mechanism’s ability to model high-order interactions and from the causal reasoning module’s decoupling of confounding factors. These components improve focus on lesion regions and stabilize decision-making in complex scenarios. The results indicate reliable performance for precise clinical diagnosis.  Conclusions  The findings confirm that the proposed framework improves brain tumor classification. The combined effect of the causal intervention mechanism and the KAN attention module is the primary contributor to the performance gains. These improvements require only minimal increases in model parameters and inference latency, preserving efficiency and practicality. The study offers a methodological direction for medical image classification and shows potential utility in few-shot learning and clinical decision support.
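To illustrate what a spline-based KAN attention module computes, the NumPy sketch below implements a minimal Kolmogorov-Arnold layer in which every input-output edge has its own learnable piecewise-linear function. The layer replaces the MLP of a squeeze-and-excitation-style channel attention. This is a hypothetical simplification. The class and function names are illustrative, published KAN layers typically use B-splines with a base activation, and no training loop is shown. The abstract also does not specify where the module sits in the backbone.

```python
import numpy as np


class PiecewiseLinearKAN:
    """Minimal Kolmogorov-Arnold layer: every input-output edge (i, j) has its own
    learnable 1-D function phi_ij, here a piecewise-linear spline on a fixed grid.
    (Published KAN layers usually use B-splines plus a base activation.)"""

    def __init__(self, n_in, n_out, grid_size=8, x_range=(-3.0, 3.0), seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(x_range[0], x_range[1], grid_size)        # knot positions
        self.coef = rng.normal(0.0, 0.1, size=(n_in, n_out, grid_size))   # phi_ij at the knots

    def __call__(self, x):
        """x: (n_in,) -> y: (n_out,), with y_j = sum_i phi_ij(x_i).
        Inputs outside the grid are clamped to the end values by np.interp."""
        n_in, n_out, _ = self.coef.shape
        y = np.zeros(n_out)
        for i in range(n_in):
            for j in range(n_out):
                y[j] += np.interp(x[i], self.grid, self.coef[i, j])
        return y


def kan_channel_attention(features, kan):
    """Channel attention with a KAN in place of the usual MLP: pool a (C, H, W) map
    to a C-vector, map it through the KAN, squash with a sigmoid, rescale channels."""
    descriptor = features.mean(axis=(1, 2))               # (C,)
    weights = 1.0 / (1.0 + np.exp(-kan(descriptor)))      # (C,), each in (0, 1)
    return features * weights[:, None, None], weights


rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 7, 7))                       # hypothetical backbone feature map
out, w = kan_channel_attention(feats, PiecewiseLinearKAN(16, 16))
```

Unlike an MLP with fixed activations, every edge function here is learnable, which is the property the abstract credits for modeling high-order feature interactions.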
Multi-path Resource Allocation for Confidential Services Based on Network Coding and Fragmentation Awareness in EONs
LIU Huanlin, AN Dongxin, CHEN Yong, CHEN Haonan, MA Bing, ZOU Jiachen
Available online  , doi: 10.11999/JEIT251222
Abstract:
  Objective  Each fiber in Elastic Optical Networks (EONs) provides enormous bandwidth capacity and carries a large volume of services and data. If any element in EONs is eavesdropped on or attacked, even for a short period, a large amount of data may be leaked or lost, which significantly degrades network performance. Moreover, confidential services are increasingly sensitive to data leakage and loss during transmission. Network attacks may therefore compromise a large number of confidential services. Network Coding (NC) combines data from different services using the XOR operation and transmits the coded data through EONs. Decoding is then performed at the receiver to recover the original information, providing a potential method to mitigate data eavesdropping during transmission. However, applying NC in EONs imposes an encryption constraint: the routing and Frequency Slot (FS) allocation of the other services involved must overlap with those of the confidential service to be encrypted. Therefore, routing and spectrum allocation for confidential services should consider both NC constraints and the efficiency of resource allocation.  Methods  A Multi-path Resource Allocation based on Network Coding and Fragmentation Awareness (MRA-NCFA) method is proposed to support secure and reliable transmission of confidential services under eavesdropping attacks. First, the proposed method applies NC to encrypt service data and adopts multi-path protection to improve transmission reliability. Second, in the routing stage, different strategies are designed for confidential and non-confidential services. For non-confidential services, the objective is to balance network load and improve resource utilization. A path weight function based on path load is designed. This function considers path hop count, the maximum idle spectrum block on the path, and the required FS of the service. The path with the largest function value is selected as the transmission path. 
For confidential services, routing selection focuses on preventing information leakage while considering path resource availability. Therefore, a path cost function based on eavesdropping probability is designed, and a routing strategy that considers this probability is adopted. Finally, different resource allocation strategies are applied. For non-confidential services, the objective is to maximize spectrum efficiency. Spectrum fragmentation should be minimized to maintain resource continuity and consistency. Therefore, a fragmentation-aware spectrum allocation strategy is designed. A fragmentation measurement formula evaluates the effect of service allocation on link resources. For confidential services, encryption constraints and FS matching must be satisfied. Therefore, a spectrum allocation strategy based on FS and fragmentation sensing is designed. This strategy considers both the effect of spectrum fragments and the effect of established service resources, which improves transmission security for confidential services.  Results and Discussions  The proposed MRA-NCFA algorithm achieves the lowest service blocking probability (Fig. 2). During routing selection, both confidential and non-confidential services consider path resource conditions. During resource allocation, fragmentation effects are also considered, which preserves idle resources for subsequent services as much as possible. In addition, confidential services adopt a multi-path transmission method. Large services can be divided into multiple sub-services, which improves spectrum resource utilization. As the number of services increases, the spectrum utilization of the MRA-NCFA algorithm improves significantly. This improvement results from the multi-path transmission mechanism, which divides large services into smaller ones and allows efficient use of small spectrum fragments. 
In addition, both confidential and non-confidential services consider path resource quantity during routing and prefer paths with lower spectrum consumption. During resource allocation, fragmentation effects are considered to avoid generating new fragments, which improves spectrum utilization (Fig. 3). As the number of services increases, the proposed MRA-NCFA algorithm shows the slowest and smallest increase in spectrum fragmentation ratio compared with the other two algorithms. This result occurs because the algorithm combines multi-path transmission with fragmentation-aware resource allocation, which improves the utilization of small spectrum fragments and reduces fragmentation in EONs. Moreover, both confidential and non-confidential services consider fragmentation effects during resource allocation and apply strategies to reduce fragmentation. Therefore, the proposed algorithm performs better than the Survivable Multipath Fragmentation-Sensitive Fragmentation-Aware Routing and Spectrum Assignment (SM-FSFA-RSA) algorithm and the Network Coding-based Routing and Spectrum Allocation (NC-RSA) algorithm (Fig. 4).  Conclusions  This study examines resource allocation for services that require protection against eavesdropping attacks in elastic optical networks. The objective is to satisfy the security requirements of confidential services and reduce spectrum fragmentation. The proposed MRA-NCFA algorithm applies NC to encrypt confidential services and adopts multi-path protection to improve transmission reliability. For non-confidential services, a path weight function based on path resources is designed for routing selection, and fragmentation-aware spectrum metrics are used for resource allocation. For confidential services, a path cost function that considers both path resources and eavesdropping probability is designed for routing selection. 
A bandwidth segmentation strategy based on eavesdropping probability supports multi-path transmission, and an FS and fragmentation sensing function based on encryption constraints is used for spectrum allocation. These mechanisms improve both reliability and security for confidential services. As the number of security-sensitive services on the Internet increases, the proposed MRA-NCFA algorithm can effectively reduce traffic blocking probability and improve spectrum resource utilization.
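Two mechanisms in this abstract lend themselves to a small sketch. The first is XOR network coding of a confidential payload with another service's data. The second is scoring candidate spectrum placements by the fragmentation they leave behind. The Python code below uses the common 1 - (largest free block / total free slots) fragmentation ratio on a single link's slot bitmap. That metric, the exhaustive placement search, and all function names are assumptions standing in for the paper's own fragmentation formula and FS-matching strategy. Multi-path splitting is not shown.

```python
def xor_encode(confidential: bytes, cover: bytes) -> bytes:
    """Network-coded payload: bytewise XOR of the confidential data with another
    service's data occupying the same route and frequency slots."""
    if len(confidential) != len(cover):
        raise ValueError("coded services must occupy matching slot widths")
    return bytes(a ^ b for a, b in zip(confidential, cover))


def xor_decode(coded: bytes, cover: bytes) -> bytes:
    """A receiver holding the cover data recovers the confidential data (XOR is self-inverse)."""
    return xor_encode(coded, cover)


def free_blocks(slots):
    """Lengths of maximal runs of free slots (0 = free, 1 = occupied)."""
    blocks, run = [], 0
    for s in slots:
        if s == 0:
            run += 1
        elif run:
            blocks.append(run)
            run = 0
    if run:
        blocks.append(run)
    return blocks


def fragmentation_ratio(slots):
    """1 - largest free block / total free slots; 0 when the free spectrum is contiguous."""
    blocks = free_blocks(slots)
    total = sum(blocks)
    return 0.0 if total == 0 else 1.0 - max(blocks) / total


def best_fit_start(slots, demand):
    """Hypothetical fragmentation-aware placement: among all feasible contiguous
    positions, return the start index that leaves the lowest fragmentation ratio."""
    best = None
    for start in range(len(slots) - demand + 1):
        if all(s == 0 for s in slots[start:start + demand]):
            trial = list(slots)
            trial[start:start + demand] = [1] * demand
            score = fragmentation_ratio(trial)
            if best is None or score < best[0]:
                best = (score, start)
    return None if best is None else best[1]
```

For example, on the bitmap [0, 0, 1, 0, 0, 0, 1, 0] a 2-slot request is placed at index 0. This fills the small edge gap and keeps the 3-slot block intact for later, larger services.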
Joint Power Allocation and AP On-Off Control for Long-Term Energy Efficient Cell-Free Massive MIMO Systems
WEI Siqi, GUO Fengqian, CHONG Baolin, CHENG Guo, LU Hancheng
Available online  , doi: 10.11999/JEIT260014
Abstract:
  Objective   With the rapid development of wireless communication technologies, Cell-Free Massive Multiple-Input Multiple-Output (CF-mMIMO) has emerged as an effective paradigm to overcome the limitations of traditional cell-centric networks, such as limited performance for edge users. By deploying a large number of distributed Access Points (APs) connected to a Central Processing Unit (CPU) to cooperatively serve users, CF-mMIMO improves spectral efficiency and macro-diversity gain. However, dense AP deployment also introduces a critical challenge: high energy consumption. In practical systems, if all APs remain continuously active, especially during periods of low traffic load, substantial and unnecessary energy consumption occurs. This behavior reduces network sustainability and conflicts with global “dual-carbon” goals. Existing studies on energy efficiency in CF-mMIMO systems mainly focus on short-term performance optimization. These short-term approaches often ignore long-term traffic dynamics and the requirement of queue stability. Therefore, they lack robustness under time-varying traffic conditions and may cause queue congestion and significant performance fluctuations, which are unacceptable for next-generation wireless networks with strict reliability requirements. Although several recent studies examine long-term energy efficiency optimization, most assume that all APs remain active at all times. Therefore, the energy-saving potential of adaptive AP on-off control is not fully utilized.  Methods   To address these issues, a joint power allocation and AP on-off control strategy is proposed for downlink CF-mMIMO systems. The optimization problem aims to maximize long-term energy efficiency subject to user queue stability and AP power constraints. 
Because the problem has stochastic and long-term characteristics, the Lyapunov optimization framework is applied to transform the original long-term fractional programming problem into a sequence of deterministic drift-plus-penalty minimization problems solved in each time slot. The resulting per-slot problems remain nonconvex. Therefore, each problem is decomposed into two subproblems: power allocation and AP on-off control. The Successive Convex Approximation (SCA) method is used to convert the nonconvex formulations into solvable convex problems. An alternating optimization algorithm is then developed to jointly solve the two subproblems, which enables adaptive resource configuration under dynamic network conditions and stochastic traffic arrivals.  Results and Discussions   The proposed algorithm is evaluated through extensive simulations. First, the convergence behavior is examined. Numerical results (Fig. 2) show that per-slot energy efficiency increases rapidly and stabilizes after several iterations, which verifies the convergence of the alternating optimization procedure. Second, the effect of the control parameter is analyzed. As the parameter increases, the algorithm places greater emphasis on energy efficiency. Average power consumption decreases and then stabilizes (Fig. 3), whereas long-term energy efficiency increases and eventually stabilizes (Fig. 4). These results confirm the trade-off between energy efficiency and queue stability. Third, the proposed scheme is compared with three baseline methods. The results (Fig. 5) show that the proposed joint optimization approach consistently achieves higher long-term energy efficiency than the baseline methods. Fourth, the necessity of long-term optimization is demonstrated by comparing queue lengths with a short-term baseline (Fig. 6). 
Under the same traffic arrival rate, the short-term method shows cumulative queue growth, whereas the Lyapunov-based approach maintains queue lengths within a stable range and ensures network stability. Finally, robustness under imperfect Channel State Information (CSI) is evaluated (Fig. 7). Although energy efficiency decreases as channel uncertainty increases, the proposed method consistently outperforms the baseline approaches, which demonstrates strong robustness to channel estimation errors.  Conclusions   A long-term energy efficiency optimization framework is proposed for CF-mMIMO systems with stochastic traffic arrivals. By applying Lyapunov optimization theory, the stochastic long-term problem is transformed into slot-level drift-plus-penalty problems based on queue states. This transformation enables per-slot resource scheduling decisions while maintaining queue stability. On this basis, an efficient joint resource scheduling algorithm that integrates power allocation and AP on-off control is developed. The original problem is decomposed into power allocation and AP on-off control subproblems and solved through alternating optimization. Simulation results show that the proposed method adapts to dynamic traffic conditions. By placing underutilized APs into sleep mode, the algorithm improves long-term system energy efficiency and maintains queue stability. These results provide guidance for the design of green and sustainable wireless networks.
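The Lyapunov drift-plus-penalty idea can be shown on a toy problem. In each slot, the controller below picks the configuration that minimizes sum_k Q_k(A_k - R_k) - V·EE(t), then updates the queues with Q_k ← max(Q_k - R_k, 0) + A_k. The two-user system, its two candidate configurations, and the brute-force choice are hypothetical simplifications. They replace the paper's SCA-based power allocation and alternating optimization and serve only to expose the role of the control parameter V.

```python
import numpy as np


def queue_update(Q, rates, arrivals):
    """Per-user queue dynamics: Q_k(t+1) = max(Q_k(t) - R_k(t), 0) + A_k(t)."""
    return np.maximum(Q - rates, 0.0) + arrivals


def drift_plus_penalty(Q, rates, arrivals, power, V):
    """Bound minimized each slot: sum_k Q_k (A_k - R_k) - V * EE(t),
    where EE(t) = total rate / total power is the slot's energy efficiency."""
    return float(np.dot(Q, arrivals - rates) - V * rates.sum() / power)


def choose_action(Q, arrivals, actions, V):
    """Brute force over a small candidate set of (AP on-off pattern, power) settings."""
    scores = [drift_plus_penalty(Q, a["rates"], arrivals, a["power"], V) for a in actions]
    return int(np.argmin(scores))


def simulate(V, actions, arrivals, slots=200):
    """Run the per-slot controller; return (time-averaged EE, peak total backlog)."""
    Q = np.zeros_like(arrivals)
    ee, backlog = [], []
    for _ in range(slots):
        a = actions[choose_action(Q, arrivals, actions, V)]
        ee.append(a["rates"].sum() / a["power"])
        Q = queue_update(Q, a["rates"], arrivals)
        backlog.append(Q.sum())
    return float(np.mean(ee)), float(np.max(backlog))


# Hypothetical two-user system with two candidate configurations (rates in bits/slot, power in W).
ACTIONS = [
    {"name": "some APs asleep", "rates": np.array([1.0, 1.0]), "power": 2.0},   # EE = 1.0
    {"name": "all APs active", "rates": np.array([3.0, 3.0]), "power": 10.0},   # EE = 0.6
]
ARRIVALS = np.array([1.5, 1.5])
ee_lo, backlog_lo = simulate(10.0, ACTIONS, ARRIVALS)
ee_hi, backlog_hi = simulate(100.0, ACTIONS, ARRIVALS)
```

With these toy numbers, raising V from 10 to 100 increases the time-averaged efficiency but lets the total backlog settle at a higher, still bounded level. This mirrors the trade-off between energy efficiency and queue stability reported for the control parameter.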
Phase Shift-Based Covert Backdoor Attack Strategy in Deep Neural Networks
ZHANG Heng, XIA Yu, REN Yan, DU Linkang, ZHANG Zhikun
Available online  , doi: 10.11999/JEIT251145
Abstract:
  Objective  The proliferation of deep neural networks (DNNs) in safety-critical domains such as autonomous driving and biomedical diagnostics has heightened concerns about their vulnerability to adversarial threats, particularly backdoor attacks. These attacks embed hidden triggers during training, causing models to behave normally on clean inputs while executing malicious actions when specific triggers are present. Existing backdoor methods predominantly operate in the spatial or frequency domain, but they face a fundamental trade-off between attack success rate (ASR) and stealthiness. Spatial triggers often introduce visible artifacts, while frequency-based amplitude perturbations disrupt the energy distribution, making them detectable by advanced defenses such as spectral anomaly detection. This work addresses the need for a backdoor paradigm that simultaneously achieves high attack performance, minimal perceptual distortion, and robustness against state-of-the-art defenses. Our objective is to develop a frequency-domain backdoor attack based on phase manipulation, which inherently aligns with human visual perception and structural coherence, thereby overcoming the limitations of existing methods.  Methods  FDPS integrates frequency-domain phase manipulation with perceptual similarity screening and standard data poisoning. The method first converts input images from RGB to the YCrCb color space, which isolates the chrominance channels while leaving the luminance information intact. Next, the Discrete Fourier Transform is applied to the chrominance components, producing complex frequency spectra. The phase is computed with the atan2 function, and selected high-frequency components are phase-shifted. The image is then reconstructed through the inverse Fourier transform. The framework incorporates Learned Perceptual Image Patch Similarity (LPIPS) filtering. 
This filter discards generated instances that fall below the similarity threshold, ensuring that all retained triggers remain visually imperceptible. Accepted poisoned samples are relabeled with the target class and combined with clean training data following standard protocols.  Results and Discussions  FDPS achieves attack success rates of about 99% while maintaining benign accuracy across three datasets and two network architectures (Table 1). The method manipulates phase information in the chrominance channels via Fourier transforms, with LPIPS filtering ensuring visual stealth. Grad-CAM visualizations of poisoned images align with clean patterns, confirming that poisoned images retain their semantic focus (Fig. 4). The approach also evades defenses effectively, yielding an anomaly index of 1.73 under Neural Cleanse, below the detection threshold of 2 (Figs. 3-5). Ablation studies show that high-frequency phase perturbations achieve over 90% attack success with a poisoning rate of just 2% while minimally affecting model utility (Fig. 6; Table 3).  Conclusions  An end-to-end frequency-domain strategy was developed to embed covert triggers in image classifiers while maintaining clean-data fidelity. By shifting selected phase components in the chrominance channels and filtering with LPIPS, FDPS achieves 99% ASR with negligible Benign Accuracy (BA) loss and minimal visible artifacts. It also evades leading detection tools, including Grad-CAM, Neural Cleanse, ANP, and STRIP. The findings indicate that phase-centric, high-frequency perturbations constitute an especially potent and stealthy backdoor mechanism. Future work should explore broader modality coverage and develop frequency-domain anomaly detectors as principled countermeasures.
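The phase-shifting step can be sketched in NumPy in four stages. First, convert RGB to YCrCb. Second, take the 2-D FFT of each chrominance channel and read the phase with np.angle, which is an atan2 of the imaginary and real parts. Third, add a constant offset to components above a radial frequency cutoff. Finally, invert the transform. The shift size `delta`, the `cutoff`, the BT.601-style conversion coefficients, and the function names are illustrative assumptions. LPIPS screening, clipping and quantization to valid pixel values, and the poisoning pipeline are omitted.

```python
import numpy as np

# BT.601-style RGB -> YCrCb matrix (the coefficients OpenCV documents), rows Y, Cr, Cb;
# images are floats in [0, 1], so the chroma offset is 0.5.
_LUMA = np.array([0.299, 0.587, 0.114])
_M = np.stack([
    _LUMA,
    0.713 * (np.array([1.0, 0.0, 0.0]) - _LUMA),   # Cr = 0.713 (R - Y)
    0.564 * (np.array([0.0, 0.0, 1.0]) - _LUMA),   # Cb = 0.564 (B - Y)
])
_OFFSET = np.array([0.0, 0.5, 0.5])


def rgb_to_ycrcb(img):
    return img @ _M.T + _OFFSET


def ycrcb_to_rgb(ycc):
    return (ycc - _OFFSET) @ np.linalg.inv(_M).T   # exact inverse of the forward map


def phase_shift_trigger(img, delta=0.3, cutoff=0.25):
    """Shift the phase of high-frequency components of Cr and Cb by `delta` radians,
    leaving each component's amplitude and the Y (luminance) channel untouched."""
    h, w, _ = img.shape
    ycc = rgb_to_ycrcb(img.astype(np.float64))
    fy = np.fft.fftfreq(h)[:, None]                # cycles/pixel along rows
    fx = np.fft.rfftfreq(w)[None, :]               # non-negative column frequencies
    high = np.sqrt(fy ** 2 + fx ** 2) > cutoff     # radial high-frequency mask
    for c in (1, 2):                               # chrominance channels only
        spec = np.fft.rfft2(ycc[..., c])
        amplitude = np.abs(spec)
        phase = np.angle(spec)                     # atan2(imag, real)
        phase = np.where(high, phase + delta, phase)
        # irfft2 returns a real array, so the reconstructed channel is real-valued.
        ycc[..., c] = np.fft.irfft2(amplitude * np.exp(1j * phase), s=(h, w))
    return ycrcb_to_rgb(ycc)


rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))                      # stand-in for a normalized RGB image
poisoned = phase_shift_trigger(img)
```

Only Cr and Cb are modified and the color transform is inverted exactly, so the luminance of the unclipped output is unchanged. Clipping to [0, 1] would perturb it slightly.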
Blind Parameter Estimation Method for PSK Modulated Frequency-Hopping Signals Based on Improved Maximum Likelihood
ZHANG Tianhao, ZHANG Yushu, XU Zhongqiu, TANG Xinyi, DANG Wenhua, LI Guangzuo
Available online  , doi: 10.11999/JEIT260005
Abstract:
  Objective  Blind parameter estimation of non-cooperative Frequency-Hopping (FH) signals is a critical task in electronic reconnaissance and countermeasures. Estimation methods based on time-frequency analysis typically suffer from limited resolution or high computational complexity. Furthermore, methods based on compressive sensing rely heavily on consistency between the predefined dictionary and the actual signal characteristics, and their estimation precision is significantly degraded by grid mismatch or modulation-induced energy dispersion. Maximum Likelihood (ML)-based methods offer high theoretical estimation accuracy with relatively low computational complexity. However, existing studies typically assume an ideal unmodulated signal model with a single frequency transition. Consequently, these ML-based methods suffer from severe model mismatch when processing FH signals with digital modulation, such as Phase Shift Keying (PSK), or multi-hop signals. Moreover, the conventional iterative solution of ML-based methods is prone to divergence or trapping in local optima. To address these limitations, this paper proposes an improved ML-based method for the blind parameter estimation of PSK-modulated FH signals.  Methods  To handle received multi-hop signals, a signal slicing technique based on the Short-Time Fourier Transform (STFT) is proposed to extract slices containing individual frequency transitions. Subsequently, to mitigate the model mismatch caused by digital modulation in conventional ML-based methods, a model-matching signal extraction approach based on the ML objective function is developed for PSK-modulated FH signals. Furthermore, a weighted iterative solving algorithm for ML estimation is designed to enhance convergence, thereby achieving robust and accurate estimation of frequency-hopping parameters.  
Results and Discussions  To validate the effectiveness of the model-matching signal extraction approach, ablation experiments were carried out under various modulation schemes, including binary PSK (BPSK), quadrature PSK (QPSK), and 8-ary PSK (8PSK). The results indicate that the proposed approach (Group D) significantly reduces the Mean Square Error (MSE) of hopping frequency estimation compared with the variant without the proposed extraction (Group ND), demonstrating that the method effectively mitigates the model mismatch (Fig. 5). Simulation results also show that the designed weighted iterative algorithm converges better than linear weighting and non-weighting schemes (Fig. 6). Moreover, the experiments verify the algorithm's insensitivity to initial frequency offsets: it tolerates offsets of up to 2 MHz at an SNR of -10 dB with little performance degradation (Fig. 7). Finally, a comparison with representative existing methods indicates that the proposed method achieves higher estimation accuracy (Fig. 8).  Conclusions  To achieve blind parameter estimation for PSK-modulated FH signals, this paper proposes an improved ML-based method. By using an STFT-based signal slicing technique, the proposed method extends the applicability of the ML-based estimator to continuous multi-hop signals. To mitigate the model mismatch induced by PSK modulation, a model-matching signal extraction approach is developed to isolate valid signal segments that conform to the ML model. Furthermore, a weighted iterative algorithm incorporating a dynamic weighting function is introduced to address the instability of the conventional iterative ML solver. Simulation results confirm that the proposed method effectively eliminates model mismatch, converges well, and is insensitive to initial frequency offsets. 
Moreover, it is shown to achieve high estimation precision for both hopping frequencies and hopping times.
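For a slice containing one frequency transition, an ML estimator of the hop instant and the two frequencies can be written as a search. For each candidate hop index τ, one complex tone with unknown amplitude and phase is fitted to each side. Under white Gaussian noise, this reduces to maximizing |X_1(f_1)|²/τ + |X_2(f_2)|²/(N - τ). The NumPy sketch below performs an exhaustive search over τ with zero-padded FFT peaks. It assumes an unmodulated, noiseless test signal and normalized frequencies, and it substitutes brute-force search for the authors' weighted iterative solver. STFT slicing and the PSK model-matching extraction are not shown.

```python
import numpy as np


def tone_fit(seg, nfft):
    """Concentrated ML fit of one complex tone with unknown amplitude and phase:
    returns (|X(f)|^2 / L at the periodogram peak, peak frequency in cycles/sample)."""
    spec = np.abs(np.fft.fft(seg, nfft)) ** 2
    k = int(np.argmax(spec))
    return spec[k] / len(seg), k / nfft


def ml_single_hop(x, min_len=16, nfft=4096):
    """Grid-search ML estimate (hop index, f1, f2) for a slice with one frequency transition."""
    best = None
    for tau in range(min_len, len(x) - min_len + 1):
        g1, f1 = tone_fit(x[:tau], nfft)
        g2, f2 = tone_fit(x[tau:], nfft)
        if best is None or g1 + g2 > best[0]:
            best = (g1 + g2, tau, f1, f2)
    return best[1:]


n = np.arange(512)
x = np.where(n < 200, np.exp(2j * np.pi * 0.10 * n), np.exp(2j * np.pi * 0.30 * n))
tau_hat, f1_hat, f2_hat = ml_single_hop(x)
```

The exhaustive search costs two FFTs per candidate τ. This cost is one reason practical estimators, including the one described, use an iterative solver instead, which in turn makes convergence and sensitivity to initial frequency offsets the main concerns.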
A Semantic-Enhanced Cybersecurity Named Entity Recognition Approach Oriented to Lightweight Adaptation of Large Language Models
HU Ze, XU Tongwu, YANG Hongyu
Available online  , doi: 10.11999/JEIT251260
Abstract:
  Objective  Named Entity Recognition (NER) in the field of cybersecurity is a fundamental technology supporting threat intelligence analysis, vulnerability management, and security incident response. However, this field generally faces dense technical terminology, scarce labeled data, dynamically changing entity categories, and highly complex semantic features, which leave both traditional deep learning models and existing Large Language Models (LLMs) with inadequate domain adaptability and semantic fusion capability. To address these issues while meeting the need for lightweight model deployment, this paper aims to construct a cybersecurity NER approach that enhances domain semantic representation, improves the recognition of rare entities, and remains applicable in low-resource environments, providing a reliable technical path for intelligent threat analysis in cybersecurity scenarios.  Methods  To handle the complex semantic features of cybersecurity texts, this paper proposes a semantically enhanced, lightweight, LLM-adaptable cybersecurity NER approach. The approach uses LLM2Vec to reconstruct the decoder of a large model for bidirectional semantic encoding and combines it with Low-Rank Adaptation (LoRA) for low-rank fine-tuning, maintaining deep semantic encoding capability while greatly reducing the number of updated parameters. To address sparse keywords and severe noise interference in cybersecurity texts, a sparse gated attention mechanism is introduced that strengthens keyword-focused feature extraction by dynamically selecting high-contribution cybersecurity terms through global gating and sparse inference. 
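LoRA itself is a standard technique: the pretrained weight W stays frozen and only a rank-r update BA is trained, which is why the number of updated parameters drops. The pure-Python sketch below illustrates this on a single linear layer; the layer size, rank r = 4, scaling alpha = 8, and toy weights are assumptions, and the abstract does not say which LLM2Vec layers are adapted.

```python
import random


def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update.

    Effective weight: W + (alpha / r) * B @ A with A (r x d_in) and B (d_out x r).
    B starts at zero, so the adapted layer initially reproduces the frozen layer.
    """

    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        self.W = W  # pretrained weight, never updated
        d_out, d_in = len(W), len(W[0])
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [y + self.scale * dy for y, dy in zip(base, delta)]

    def trainable_params(self):
        """Only A and B are trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


d = 64  # hypothetical hidden size
W = [[0.01 * ((i + j) % 7) for j in range(d)] for i in range(d)]
layer = LoRALinear(W, r=4, alpha=8)
x = [1.0] * d
print(layer.forward(x) == matvec(W, x))       # True: identical before training
print(layer.trainable_params(), "vs", d * d)  # 512 vs 4096
```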
A SecRoBERTa-based semantic enhancement component is introduced, which uses a domain-pretrained model to generate embeddings of similar words, improves feature robustness in small-sample scenarios, and alleviates the difficulty of identifying out-of-vocabulary words and low-frequency terms. Finally, a masked conditional random field (MCRF) is employed to constrain label transitions and guarantee BIO-compliant output sequences, achieving robust and consistent entity boundary prediction.  Results and Discussions  Extensive experiments were conducted on two public cybersecurity datasets, DNRTI and APTNER. The proposed approach achieved an F1 score of 91.91% on DNRTI, surpassing the previous state-of-the-art model by 2.14%. On APTNER, it reached an F1 score of 80.37%, outperforming the best baseline by 2.97%. Ablation studies confirmed the contribution of each key component: the sparse gated attention mechanism improved F1 by 3.57% over standard multi-head attention on DNRTI; the semantic enhancement module contributed a 2.32% F1 gain; and the MCRF layer provided a 10.63% F1 improvement over a traditional Conditional Random Field (CRF). The model also demonstrated efficient training and inference, consistent with its lightweight design goals.  Conclusions  This paper proposes a lightweight adaptation approach based on LLMs for NER in the cybersecurity domain, which addresses the limitations of existing LLM-based NER methods in domain adaptation and rare entity recognition. By integrating LLM2Vec and LoRA for lightweight fine-tuning, a sparse gated attention mechanism for domain feature fusion, and a SecRoBERTa-based semantic enhancement component for precomputing similar words, the proposed approach achieves high performance on the DNRTI and APTNER datasets. 
The research provides an efficient technical path for NER tasks in low-resource cybersecurity scenarios and offers strong support for downstream tasks such as automated threat intelligence analysis.
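The BIO constraint enforced by the masked CRF can be illustrated at decoding time: transitions that would yield an invalid sequence (an I-X tag not preceded by B-X or I-X) receive a score of -inf before the Viterbi search. This is a sketch under assumptions: the entity types and emission scores are hypothetical, learned transition scores default to zero, and any training-time masking the paper uses is not shown.

```python
NEG_INF = float("-inf")


def allowed(prev, cur):
    """BIO rule: I-X may only follow B-X or I-X (and never start a sequence)."""
    if cur.startswith("I-"):
        return prev is not None and prev[0] in "BI" and prev[2:] == cur[2:]
    return True


def masked_viterbi(emissions, tags, transitions=None):
    """Viterbi decoding in which illegal BIO transitions are masked to -inf.

    emissions: one dict {tag: score} per token; transitions: optional learned
    scores {(prev_tag, cur_tag): score}, defaulting to 0.0.
    """
    transitions = transitions or {}
    score = {t: emissions[0][t] if allowed(None, t) else NEG_INF for t in tags}
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for cur in tags:
            best_prev, best = None, NEG_INF
            for prev in tags:
                if score[prev] == NEG_INF or not allowed(prev, cur):
                    continue  # masked transition or unreachable previous tag
                s = score[prev] + transitions.get((prev, cur), 0.0)
                if s > best:
                    best_prev, best = prev, s
            new_score[cur] = best + em[cur] if best_prev is not None else NEG_INF
            ptr[cur] = best_prev
        score = new_score
        backpointers.append(ptr)
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


TAGS = ["O", "B-Malware", "I-Malware", "B-Org", "I-Org"]  # hypothetical entity types


def row(scores):
    """Emission scores for one token; unspecified tags get 0.0."""
    return {t: scores.get(t, 0.0) for t in TAGS}


emissions = [
    row({"O": 2.0}),
    row({"B-Malware": 2.5, "I-Malware": 3.0}),
    row({"O": 1.0, "I-Malware": 3.0}),
]
greedy = [max(TAGS, key=lambda t: em[t]) for em in emissions]
print(greedy)                           # ['O', 'I-Malware', 'I-Malware'] (invalid BIO)
print(masked_viterbi(emissions, TAGS))  # ['O', 'B-Malware', 'I-Malware']
```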
A High-Performance Eye Tracking Method Based on Event Camera and Dual-Channel Differential Illumination
SONG Sishun, FENG Junchi, PU Chengyu, GUO Yu, LIU Shijie, HE Xin, CHENG Yuwei
Available online  , doi: 10.11999/JEIT251162
Abstract:
  Objective  Eye tracking has become an essential technology in human–computer interaction, medical diagnostics, cognitive neuroscience, and augmented/virtual reality applications. However, traditional eye tracking systems often suffer from two major limitations: low spatial accuracy and restricted temporal resolution, particularly in high-speed eye movement scenarios. These limitations hinder precise gaze estimation and reduce the reliability of real-time interactive systems. To address these challenges, this research integrates an event camera with a dual-channel differential illumination strategy to enhance the signal-to-noise ratio of corneal reflection events. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is introduced to localize corneal reflection points accurately. The resulting corneal reflection point coordinates are then combined with Singular Value Decomposition (SVD) and the least-squares method to determine the corneal curvature center, significantly improving the accuracy of gaze direction estimation. This research provides an efficient technical pathway for next-generation eye tracking systems and offers theoretical support for their deployment in complex interactive environments.  Methods  The proposed event-camera-based gaze tracking method integrates asynchronous eye movement event data through a dual-channel differential illumination framework, thereby enhancing gaze direction estimation accuracy under high-speed and dynamic conditions. First, the event camera asynchronously captures brightness-change events with microsecond-level temporal resolution, enabling precise tracking of rapid eye movements, while the dual-channel differential illumination mechanism suppresses redundant reflections and enhances the contrast of corneal reflection points. 
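For the glint-localization step, a minimal DBSCAN over event pixel coordinates is sketched below; each dense cluster of events is treated as one corneal reflection and summarized by its centroid. The event coordinates, `eps = 2.0` pixels, and `min_pts = 4` are hypothetical, and the sketch ignores timestamps and polarity, which the paper may also use.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(j_nbrs)
    return labels


def glint_centers(events, eps=2.0, min_pts=4):
    """Centroid of each DBSCAN cluster of event (x, y) pixel coordinates."""
    groups = {}
    for (x, y), lab in zip(events, dbscan(events, eps, min_pts)):
        if lab != -1:
            groups.setdefault(lab, []).append((x, y))
    return sorted((sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
                  for g in groups.values())


# Hypothetical events: two glints plus two isolated noise events
events = [(10, 10), (11, 10), (10, 11), (11, 11), (9, 10),
          (40, 25), (41, 25), (40, 26), (41, 26),
          (70, 5), (25, 60)]
print(glint_centers(events))  # [(10.2, 10.4), (40.5, 25.5)]
```

Unlike K-Means, nothing here fixes the number of clusters in advance, and isolated events are labeled as noise rather than forced into a glint.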
Second, the DBSCAN algorithm is employed to process event data, effectively removing noise and improving the spatial localization accuracy of corneal reflection features. Finally, a ray-tracing model is reconstructed using SVD and least-squares fitting to determine the corneal curvature center, thereby achieving robust and high-precision gaze direction estimation. Experimental results on a biomimetic eye movement dataset demonstrate that the proposed method achieves high temporal resolution, localization accuracy, and robustness in dynamic tracking scenarios.  Results and Discussions  Experiments demonstrate that the proposed method achieves a temporal resolution of 25 kHz (Fig. 6), far exceeding that of conventional cameras. Differential illumination significantly improves the signal-to-noise ratio of corneal reflection events. DBSCAN localizes corneal reflection points more efficiently than K-Means, Agglomerative Clustering, Mean Shift, and OPTICS, producing accurate results within 10 ms without requiring the number of clusters to be predefined (Fig. 8, Table 3). For gaze estimation, the proposed method maintains stable accuracy across sampling frequencies from 2 kHz to 25 kHz. At a 15° cone angle, the mean error (ME) and root mean square error (RMSE) are approximately 0.66° and 0.67°, respectively, while at 25° they increase slightly to 0.87° and 0.90° (Table 4). Compared with existing state-of-the-art (SOTA) gaze tracking methods, the proposed approach demonstrates superior overall performance in terms of both temporal resolution and accuracy (Table 5). Trajectory results (Fig. 9) show close alignment between estimated and ground-truth gaze paths, and distribution analyses (Fig. 10) confirm that errors are concentrated below 1°.  Conclusions  This paper presents a novel eye tracking method that integrates an event camera with dual-channel differential illumination. 
The method achieves high temporal resolution (25 kHz), enhances event signal quality, and reduces localization errors, yielding gaze estimation errors of less than 1°. The proposed approach provides a reliable technical pathway for next-generation high-performance eye tracking systems. Future work should consider sensor noise modeling and computational optimization to further improve real-world applicability.
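The abstract does not give the ray-tracing equations, so the sketch below shows only a generic least-squares building block: if each reflection constraint is expressed as a 3D line that should pass through the corneal curvature center (an assumption for illustration), the center is the point minimizing the summed squared distance to those lines. It solves the 3x3 normal equations with Cramer's rule rather than SVD, and the lines are synthetic.

```python
def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def closest_point_to_lines(points, directions):
    """Point c minimizing sum_i ||(I - u_i u_i^T)(c - p_i)||^2 for lines p_i + t*u_i.

    Normal equations: (sum_i P_i) c = sum_i P_i p_i with P_i = I - u_i u_i^T.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, d in zip(points, directions):
        norm = sum(v * v for v in d) ** 0.5
        u = [v / norm for v in d]
        # Projector onto the plane orthogonal to the line direction
        P = [[(1.0 if r == k else 0.0) - u[r] * u[k] for k in range(3)] for r in range(3)]
        for r in range(3):
            for k in range(3):
                A[r][k] += P[r][k]
            b[r] += sum(P[r][k] * p[k] for k in range(3))
    D = det3(A)
    if abs(D) < 1e-12:
        raise ValueError("lines are (nearly) parallel; the point is not identifiable")
    # Cramer's rule: replace column k of A by b
    return [det3([[b[r] if col == k else A[r][col] for col in range(3)] for r in range(3)]) / D
            for k in range(3)]


center = (1.0, 2.0, 3.0)  # synthetic ground truth
dirs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
bases = [tuple(c - 2.0 * d for c, d in zip(center, u)) for u in dirs]  # lines through center
print(closest_point_to_lines(bases, dirs))  # approximately [1.0, 2.0, 3.0]
```

With noisy, non-intersecting lines the same routine returns the least-squares compromise point; an SVD-based solver would add robustness when the system is close to singular.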
Hierarchical Fusion Multi-Instance Learning for Weakly Supervised Pathological Image Classification
CHEN Xiaohe, ZHANG Jiaang, LI Lingzhi, LI Guixiu, OU Zirong, BAO Yuehua, LIU Xinxin, YU Qiuchen, MA Yuhan, ZHAO Keyu, BAI Hua
Available online  , doi: 10.11999/JEIT250726
Abstract:
  Objective  Cancer mortality in China continues to rise, and pathological image classification has become central to diagnosis. Pathological images have a multilevel structure, yet many existing methods focus only on the highest resolution or use simple feature concatenation for multi-scale fusion, and therefore do not make effective use of hierarchical information. In addition, most approaches rely on random pseudo-bag division to handle high-resolution images. Because cancerous regions in positive slides are sparse, random sampling often produces incorrect pseudo-labels and low signal-to-noise ratios, which reduce classification accuracy. This study proposes a Hierarchical Fusion Multi-Instance Learning (HFMIL) method that integrates multilevel feature fusion with a pseudo-bag division strategy based on an attention evaluation function to improve accuracy and interpretability in pathological image classification.  Methods  A weakly supervised multilevel classification method is proposed that exploits the hierarchical characteristics of pathological images to improve cancer image classification performance. The method has three main steps. First, multilevel features are extracted. Blank regions are removed, low-resolution images are divided into patches, and these patches are indexed to their corresponding high-resolution regions. The extracted semantic features capture tissue structure at low resolution and cellular detail at high resolution. Second, pseudo-bags are constructed using an attention-based evaluation function. Class activation mapping is used to compute patch-level scores. Patches are ranked, and high-scoring ones are selected as potential positive samples, while low-scoring patches are discarded to maintain pseudo-label relevance. High-resolution pseudo-bags are then generated using index mapping, which reduces incorrect pseudo-labels and improves the signal-to-noise ratio. Third, a two-stage classification model is developed. 
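The second step (score the patches, rank them, keep the top ones, then map them to high-resolution indices) can be sketched as below. The scores, the 2x2 index map, `k`, and the chunking of kept patches into fixed-size pseudo-bags are assumptions; the abstract does not state how selected patches are grouped.

```python
def build_pseudo_bags(scores, low_to_high, k, bag_size):
    """Keep the top-k low-resolution patches by score and group the high-resolution
    patches they index into fixed-size pseudo-bags.

    scores:      one attention/CAM score per low-resolution patch
    low_to_high: low-res patch index -> list of high-res patch indices (index map)
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = ranked[:k]  # low-scoring patches are discarded, not sampled at random
    high = [h for i in kept for h in low_to_high[i]]
    return [high[s:s + bag_size] for s in range(0, len(high), bag_size)]


# Hypothetical slide: 5 low-res patches, each covering a 2x2 block of high-res patches
scores = [0.10, 0.90, 0.40, 0.80, 0.05]
low_to_high = {i: [4 * i, 4 * i + 1, 4 * i + 2, 4 * i + 3] for i in range(5)}
bags = build_pseudo_bags(scores, low_to_high, k=2, bag_size=4)
print(bags)  # [[4, 5, 6, 7], [12, 13, 14, 15]]
```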
Low-resolution pseudo-bags are aggregated with a gated attention mechanism for preliminary classification. A cross-attention mechanism then fuses the most informative low-resolution features with their corresponding high-resolution features. The fused representation is concatenated with aggregated high-resolution pseudo-bags to form an image-level feature vector for final prediction. Training uses a two-stage loss that combines low-resolution and overall cross-entropy losses. Experiments on three pathological image datasets confirm the effectiveness of the method in weakly supervised settings.  Results and Discussions  The proposed method is compared with several recent weakly supervised classification approaches, including ABMIL, CLAM, TransMIL, and DTFD, using three pathological image datasets: the publicly available Camelyon16 and TCGA-LUNG datasets and a private skin cancer dataset, NBU-Skin. The results show clear performance gains. On Camelyon16, the method achieves 88.3% accuracy and an AUC of 0.979 (Table 2). On TCGA-LUNG, accuracy reaches 86.0% and AUC 0.931 (Table 2), exceeding the comparative methods. On the NBU-Skin dataset, accuracy reaches 90.5% and AUC 0.976 for multiclass tasks (Table 2). Ablation studies further examine the necessity of the multilevel feature fusion and pseudo-bag division modules. The combination of these modules improves classification performance. On the skin cancer dataset, removing the pseudo-bag division module reduces accuracy from 93.8% to 90.7%, and removing the multilevel feature fusion module reduces accuracy further to 80.0% (Table 3). These results confirm that each component contributes to the effectiveness of the method.  Conclusions  A weakly supervised pathological image classification method that integrates multilevel feature fusion and an attention-based pseudo-bag division strategy is proposed. 
The method uses hierarchical information effectively and reduces errors caused by incorrect pseudo-labels and low signal-to-noise ratios. Experiments show consistent improvements in accuracy and AUC across three datasets. The main contributions are: (1) a multilevel feature extraction and fusion strategy that uses a cross-attention mechanism to combine features across scales; (2) an attention-based pseudo-bag division method that identifies potential positive regions and improves pseudo-label correctness through a top-k strategy while reducing background noise; and (3) superior performance compared with recent weakly supervised classifiers. Future work may include optimizing cross-level attention mechanisms, extending the framework to prognosis prediction or lesion segmentation, and developing more efficient feature extraction and fusion modules for broader clinical use.
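The gated attention used to aggregate low-resolution pseudo-bags is commonly written as in ABMIL: a tanh branch multiplied by a sigmoid gate, projected to a scalar, and normalized with a softmax over instances. A toy sketch of that standard formulation follows; the matrices V and U, the vector w, and the three 2-D instance features are hypothetical, and the paper's layer sizes are not given in the abstract.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gated_attention_pool(H, V, U, w):
    """Gated attention pooling over a bag of instance features H.

    logit_k = w . (tanh(V h_k) * sigmoid(U h_k)); weights a = softmax(logits);
    bag embedding z = sum_k a_k h_k.
    """
    logits = []
    for h in H:
        t = [math.tanh(s) for s in matvec(V, h)]
        g = [1.0 / (1.0 + math.exp(-s)) for s in matvec(U, h)]
        logits.append(sum(wi * ti * gi for wi, ti, gi in zip(w, t, g)))
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    a = [e / total for e in exps]
    z = [sum(ak * h[d] for ak, h in zip(a, H)) for d in range(len(H[0]))]
    return z, a


# Hypothetical toy bag of three 2-D patch features and hand-picked parameters
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 1.0], [1.0, 1.0]]
w = [1.0, -1.0]
z, a = gated_attention_pool(H, V, U, w)
print([round(v, 3) for v in a], [round(v, 3) for v in z])
```

The attention weights double as patch-importance scores, which is what makes this pooling useful for the interpretability goal stated in the Objective.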
Multi-UAV RF Signals CNN|Triplet-DNN Heterogeneous Network Feature Extraction and Type Recognition
ZHAO Shen, LI Guangxuan, ZHOU Xiancheng, HUANG Wendi, YANG Lingling, GAO Liping
Available online  , doi: 10.11999/JEIT250757
Abstract:
  Objective  This study addresses the detection requirements of simultaneous Unmanned Aerial Vehicle (UAV) operations. The strategy is based on extracting model-specific features from Radio Frequency (RF) time-frequency spectra. A CNN|Triplet-DNN heterogeneous network is developed to optimize feature extraction and classification. The method addresses the problem of identifying individual UAV models within coexisting RF signals and supports efficient multi-UAV management in complex environments.  Methods  The CNN|Triplet-DNN architecture uses a parallel-branch structure that integrates a Convolutional Neural Network (CNN) and a Triplet Convolutional Neural Network (Triplet-CNN). Branch 1 employs a lightweight CNN to extract global features from RF time-frequency diagrams while reducing computational cost. Branch 2 adds an enhanced center-loss function to strengthen feature discrimination and address ambiguous feature boundaries under complex conditions. Branch 3, based on a Triplet-CNN framework, applies Triplet Loss to capture local and global features of RF time-frequency diagrams. The complementary features from the three branches are fused and processed through a fully connected DNN with a Softmax activation function to generate probability distributions for UAV signal classification. This structure improves UAV type recognition performance.  Results and Discussions  RF signals from the open-source DroneRFa dataset were superimposed to simulate multi-UAV coexistence, and real-world drone signals were collected through controlled flights to build a comprehensive signal database. (1) Each model was trained on single-UAV RF time-frequency diagrams from the open-source dataset, and ablation experiments on the three-branch CNN|Triplet-DNN structure validated its design (Fig. 7). (2) The simulated multi-UAV coexistence dataset was used for identification tasks to evaluate recognition performance under coexistence conditions. 
Results (Fig. 10) show that recognition accuracy for four or fewer UAV types ranges from 83% to 100%, confirming the effectiveness of the CNN|Triplet-DNN model. (3) Each model was trained using the flight dataset and then applied to real multi-UAV coexistence identification. The CNN|Triplet-DNN achieved recognition accuracies of 86%, 57%, and 73% for two, three, and four UAV types, respectively (Fig. 13). Comparison with the CNN, Triplet-CNN, and Transformer models shows that the CNN|Triplet-DNN has stronger generalizability. All models exhibited performance degradation on real-world data relative to the open-source dataset, mainly because drones dynamically adjust communication frequency bands, which reduces recognition performance under coexistence scenarios.  Conclusions  A CNN|Triplet-DNN heterogeneous network is proposed for identifying RF signals emitted by multiple UAVs. The three-branch structure and backpropagation algorithm improve the extraction of discriminative aircraft-model features, and the DNN enhances model generalization. Experiments using open-source datasets and real flight scenarios verify the method’s effectiveness and practical value. Future work will address dataset expansion, model optimization for dynamic frequency-band adaptation, and improved recognition under complex coexistence conditions.
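Branches 2 and 3 rely on two metric-learning losses. The sketch below gives their standard forms on plain feature vectors: a triplet loss with Euclidean distances and a margin, and a center loss that pulls features toward their class centers. The margin, the toy vectors, and the use of a batch mean in the center loss are assumptions; the paper's enhanced center loss is not specified in the abstract.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def center_loss(features, labels, centers):
    """0.5 * mean squared distance between each feature and its class center."""
    total = sum(math.dist(f, centers[y]) ** 2 for f, y in zip(features, labels))
    return 0.5 * total / len(features)


# Hypothetical 2-D embeddings of RF time-frequency diagrams
a, p = (0.0, 0.0), (1.0, 0.0)               # same UAV model
print(triplet_loss(a, p, (3.0, 0.0)))       # 0.0: negative already far enough away
print(triplet_loss(a, p, (1.5, 0.0)))       # 0.5: negative inside the margin
print(center_loss([(1.0, 0.0), (0.0, 1.0)], [0, 0], {0: (0.0, 0.0)}))  # 0.5
```

The triplet term separates different UAV models relative to each other, while the center term tightens each model's cluster, which is the complementary behavior the branch design relies on.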
Wave-MambaCT: Low-dose CT Artifact Suppression Method Based on Wavelet Mamba
CUI Xueying, WANG Yuhang, LIU Bin, SHANGGUAN Hong, ZHANG Xiong
Available online  , doi: 10.11999/JEIT250489
Abstract:
  Objective  Low-Dose Computed Tomography (LDCT) reduces patient radiation exposure but introduces substantial noise and artifacts into reconstructed images. Convolutional Neural Network (CNN)-based denoising approaches are limited by local receptive fields, which restrict their ability to capture long-range dependencies. Transformer-based methods alleviate this limitation but incur computational complexity that grows quadratically with image size. In contrast, State Space Model (SSM)–based Mamba frameworks achieve linear complexity for long-range interactions. However, existing Mamba-based methods often suffer from information loss and insufficient noise suppression. To address these limitations, we propose the Wave-MambaCT model.  Methods  The proposed Wave-MambaCT model adopts a multi-scale framework that integrates Discrete Wavelet Transform (DWT) with a Mamba module based on the SSM. First, DWT performs a two-level decomposition of the LDCT image, decoupling noise from Low-Frequency (LF) content. This design directs denoising primarily toward the High-Frequency (HF) components, facilitating noise suppression while preserving structural information. Second, a residual module combined with a Spatial-Channel Mamba (SCM) module extracts both local and global features from LF and HF bands at different scales. The noise-free LF features are then used to correct and enhance the corresponding HF features through an attention-based Cross-Frequency Mamba (CFM) module. Finally, the inverse wavelet transform is applied in stages to progressively reconstruct the image. To further improve denoising performance and network stability, multiple loss functions are employed, including L1 loss, wavelet-domain LF loss, and adversarial loss for HF components.  
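The two-level decomposition can be illustrated with the Haar wavelet (an assumption; the abstract does not name the wavelet). Each level splits the image into one LL band and three HF detail bands, and the next level is applied to the LL band only.

```python
def haar2d(img):
    """One level of the orthonormal 2D Haar DWT (image sides must be even).

    Returns (LL, (LH, HL, HH)); naming of the three detail bands varies by library.
    """
    h, w = len(img), len(img[0])
    LL, LH, HL, HH = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            LL[i // 2][j // 2] = (a + b + c + d) / 2  # local average (LF)
            LH[i // 2][j // 2] = (a + b - c - d) / 2  # row-to-row difference
            HL[i // 2][j // 2] = (a - b + c - d) / 2  # column-to-column difference
            HH[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal difference
    return LL, (LH, HL, HH)


def dwt2_levels(img, levels=2):
    """Multi-level decomposition: each level re-decomposes the previous LL band."""
    ll, details = img, []
    for _ in range(levels):
        ll, high = haar2d(ll)
        details.append(high)
    return ll, details


img = [[float((3 * i + 5 * j) % 7) for j in range(4)] for i in range(4)]  # toy 4x4 "slice"
ll, details = dwt2_levels(img, levels=2)
print(ll)                                      # [[11.25]]
print([len(level[0]) for level in details])    # [2, 1]: side length of each level's bands
```

Because the transform is orthonormal, white noise spreads evenly across all coefficients, whereas the energy of smooth anatomy concentrates in LL; this is why the HF bands are where noise dominates relative to signal and where targeted denoising pays off.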
Results and Discussions  Extensive experiments on the simulated Mayo Clinic datasets, the real Piglet datasets, and the hospital clinical dataset DeepLesion show that Wave-MambaCT provides superior denoising performance and generalization. On the Mayo dataset, it achieves a PSNR of 31.6528 dB, higher than that of the second-best method, DenoMamba (31.4219 dB), while MSE is reduced to 0.00074 and SSIM and VIF are improved to 0.8851 and 0.4629, respectively (Table 1). Visual results (Figs. 4–6) demonstrate that edges and fine details such as abdominal textures and lesion contours are preserved, with minimal blurring or residual artifacts compared with competing methods. Computational efficiency analysis (Table 2) indicates that Wave-MambaCT maintains low FLOPs (17.2135 G) and a small parameter count (5.3913 M). Its FLOPs are lower than those of all networks except RED-CNN, and its parameter count is higher only than those of RED-CNN and CTformer. Training takes 4.12 minutes per epoch, and only RED-CNN trains faster; testing takes 0.1463 seconds per image, which is mid-range among the compared methods. Generalization tests on the Piglet datasets (Figs. 7, 8, Tables 3, 4) and DeepLesion (Fig. 9) further confirm the robustness and generalization capacity of Wave-MambaCT. In the proposed design, HF sub-bands are grouped, and noise-free LF information is used to correct and guide their recovery. This strategy is based on two considerations. First, it reduces network complexity and parameter count. Second, although the sub-bands correspond to HF information in different orientations, they are correlated and complementary as components of the same image. Joint processing enhances the representation of HF content, whereas processing them separately would require a multi-branch architecture, inevitably increasing complexity and parameters. 
Future work will explore approaches to reduce complexity and parameters when processing HF sub-bands individually, while strengthening their correlations to improve recovery. For structural simplicity, SCM is applied to both HF and LF feature extraction. However, redundancy exists when extracting LF features, and future studies will explore the use of different Mamba modules for HF and LF features to further optimize computational efficiency.  Conclusions  Wave-MambaCT integrates DWT for multi-scale decomposition, a residual module for local feature extraction, and an SCM module for efficient global dependency modeling to address the denoising challenges of LDCT images. By decoupling noise from LF content through DWT, the model enables targeted noise removal in the HF domain, facilitating effective noise suppression. The designed RSCM, composed of residual blocks and SCM modules, captures fine-grained textures and long-range interactions, enhancing the extraction of both local and global information. In parallel, the Cross-band Enhancement Module (CEM) employs noise-free LF features to refine HF components through attention-based CFM, ensuring structural consistency across scales. Ablation studies (Table 5) confirm the essential contributions of both SCM and CEM modules to maintaining high performance. Importantly, the model’s staged denoising strategy achieves a favorable balance between noise reduction and structural preservation, yielding robustness to varying radiation doses and complex noise distributions.
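The linear complexity attributed to SSM-based Mamba frameworks comes from evaluating the model as a recurrence in a single pass over the sequence. The sketch assumes a fixed (non-selective) diagonal SSM with zero-order-hold discretization; Mamba additionally makes the step size and the B and C parameters input-dependent, and the spatial-channel scanning of the SCM module is not reproduced.

```python
import math


def ssm_scan(x, a, b, c, delta):
    """Sequential scan of a diagonal linear SSM with one input channel.

    Continuous model: h'(t) = a*h(t) + b*x(t), y(t) = c.h(t) (a, b, c of length N).
    Zero-order-hold discretization with step delta (each a_i must be nonzero):
      a_bar = exp(delta*a),  b_bar = (a_bar - 1)/a * b
    Recurrence: h_k = a_bar*h_{k-1} + b_bar*x_k,  y_k = c.h_k
    """
    a_bar = [math.exp(delta * ai) for ai in a]
    b_bar = [(ab - 1.0) / ai * bi for ab, ai, bi in zip(a_bar, a, b)]
    h = [0.0] * len(a)
    y = []
    for xk in x:  # one pass: O(L*N) time and O(N) state for a length-L sequence
        h = [ab * hi + bb * xk for ab, hi, bb in zip(a_bar, h, b_bar)]
        y.append(sum(ci * hi for ci, hi in zip(c, h)))
    return y


# Hypothetical single-state system: impulse response decays geometrically
impulse = ssm_scan([1.0] + [0.0] * 4, a=[-1.0], b=[1.0], c=[1.0], delta=0.5)
print([round(v, 4) for v in impulse])
```

Doubling the sequence length doubles the work, in contrast to the quadratic cost of full self-attention mentioned in the Objective.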
A Multi-scale Spatiotemporal Correlation Attention and State Space Modeling-based Approach for Precipitation Nowcasting
ZHENG Hui, CHEN Fu, HE Shuping, QIU Xuexing, ZHU Hongfang, WANG Shaohua
Available online  , doi: 10.11999/JEIT250786
Abstract:
  Objective  Precipitation nowcasting is a representative task in meteorological forecasting. It uses radar echoes or precipitation sequences to predict the precipitation distribution over the next 0–2 hours, supporting disaster warning and key decision-making and protecting lives and property. Current mainstream methods lose local detail, represent conditional information only to a limited extent, and adapt poorly to complex regions. This study proposes PredUMamba, a model based on a diffusion model. The model introduces a Mamba block with an adaptive zigzag scanning mechanism that extracts key local detail information and reduces computational complexity. A multi-scale spatiotemporal correlation attention module is also designed to enhance interactions across spatiotemporal hierarchies and to achieve a comprehensive representation of conditional information. In addition, a radar echo dataset tailored for complex regions is constructed for the southern Anhui mountainous area to evaluate the model's ability to predict sudden and extreme rainfall. This work provides an intelligent solution and theoretical support for precipitation nowcasting.  Methods  The PredUMamba model adopts a two-stage diffusion network. In the first stage, a frame-by-frame Variational AutoEncoder (VAE) is trained to map precipitation data from pixel space to a low-dimensional latent space. In the second stage, a diffusion network is built on the encoded latent space. An adaptive zigzag Mamba module with a spatiotemporal alternating scanning strategy is proposed. Scanning proceeds sequentially within each row and turns back at the end of the row, so that consecutive rows are traversed in opposite directions. This design captures detailed precipitation-field features while maintaining low computational complexity. A multi-scale spatiotemporal correlation attention module is further introduced on temporal and spatial scales. 
On the temporal scale, adaptive convolution kernels and attention-based convolution layers extract local and global information. On the spatial scale, a lightweight correlation attention mechanism aggregates spatial information and strengthens the representation of historical conditional information. A radar dataset for the southern Anhui mountainous area is constructed to evaluate model adaptability in complex terrain.  Results and Discussions  The adaptive zigzag Mamba module and multi-scale spatiotemporal correlation attention module strengthen the model's ability to capture intrinsic spatiotemporal dependencies. They extract conditional information more accurately and yield predictions closer to observed conditions. Experiments show that PredUMamba achieves the best performance on all metrics on the southern Anhui mountainous area and Shanghai radar datasets. On the SEVIR dataset, PredUMamba outperforms other methods in FVD, CSI_pool4, and CSI_pool16 and achieves competitive CSI and CRPS. Visualization results further show that PredUMamba does not produce temporal blurring (Fig. 4), indicating stronger stability and clear advantages in detail generation and motion-trend prediction. The model preserves edge details aligned with real precipitation fields and maintains accurate motion patterns.  Conclusions  This study proposes an innovative PredUMamba model based on a diffusion network architecture. Model performance is improved through a Mamba module with an adaptive zigzag scanning mechanism and a multi-scale spatiotemporal correlation attention module. The adaptive zigzag module captures fine-grained spatiotemporal features and reduces computational complexity. The multi-scale attention module strengthens historical conditional information extraction through temporal dual-branch processing and a lightweight spatial correlation mechanism, enabling joint representation of local and global features. 
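The row-wise turn-back scanning described in the Methods amounts to a boustrophedon ordering of latent-grid tokens, so consecutive tokens in the 1D sequence are always spatial neighbors. The sketch covers only the spatial order within one frame; the adaptive and spatiotemporal alternating aspects of the module are not specified in the abstract and are omitted.

```python
def zigzag_order(h, w):
    """Boustrophedon order of an h x w grid: even rows left-to-right, odd rows
    right-to-left, so the scan turns back at each row end instead of jumping."""
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend(r * w + c for c in cols)
    return order


def flatten(grid, order):
    """Serialize a 2D grid of tokens into a 1D sequence following `order`."""
    w = len(grid[0])
    return [grid[i // w][i % w] for i in order]


def unflatten(seq, order, h, w):
    """Inverse of flatten: put each token back at its grid position."""
    grid = [[None] * w for _ in range(h)]
    for token, i in zip(seq, order):
        grid[i // w][i % w] = token
    return grid


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy 3x3 latent grid
order = zigzag_order(3, 3)
print(order)                 # [0, 1, 2, 5, 4, 3, 6, 7, 8]
print(flatten(grid, order))  # [1, 2, 3, 6, 5, 4, 7, 8, 9]
```

A plain raster scan would jump from the last token of one row to the first token of the next, separating neighbors that are far apart in space; the turn-back order avoids that discontinuity.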
A radar dataset for the southern Anhui mountainous area is also constructed to validate model applicability in complex terrain. The dataset covers precipitation under various terrain conditions and supports extreme rainfall prediction. Comparative experiments on the constructed dataset and on public datasets show that PredUMamba achieves the best results on the southern Anhui mountainous area and Shanghai datasets. On the SEVIR dataset, it outperforms other methods in FVD, CSI_pool4, and CSI_pool16 and achieves competitive CRPS and CSI. As this work focuses on a data-driven forecasting approach, future research will integrate physical-condition constraints to improve interpretability and enhance prediction accuracy for small- and medium-scale convective systems.
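The pooled CSI scores can be read as CSI computed after spatial max pooling, which tolerates small displacement errors. The sketch assumes non-overlapping max pooling (stride equal to the kernel size) followed by a fixed threshold; the threshold, the 4x4 fields, and the kernel size of 2 are hypothetical (CSI_pool4 and CSI_pool16 would use larger kernels on full-size fields).

```python
def csi(pred, obs, thr):
    """Critical Success Index: hits / (hits + misses + false alarms) at threshold thr."""
    hits = misses = false_alarms = 0
    for p_row, o_row in zip(pred, obs):
        for p, o in zip(p_row, o_row):
            p_event, o_event = p >= thr, o >= thr
            hits += p_event and o_event
            misses += (not p_event) and o_event
            false_alarms += p_event and not o_event
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")


def max_pool(field, k):
    """Non-overlapping k x k max pooling (stride k); trailing rows/cols are dropped."""
    h, w = len(field), len(field[0])
    return [[max(field[i + di][j + dj] for di in range(k) for dj in range(k))
             for j in range(0, w - k + 1, k)]
            for i in range(0, h - k + 1, k)]


def csi_pool(pred, obs, thr, k):
    return csi(max_pool(pred, k), max_pool(obs, k), thr)


# Hypothetical 4x4 fields: the forecast places the rain cell one pixel off diagonally
obs = [[5, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [0, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(csi(pred, obs, thr=1))            # 0.0
print(csi_pool(pred, obs, thr=1, k=2))  # 1.0
```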
Integrating Representation Learning and Knowledge Graph Reasoning for Diabetes and Complications Prediction
WANG Yuao, HUANG Yeqi, LI Qingyuan, LIU Yun, JING Shenqi, SHAN Tao, GUO Yongan
Available online  , doi: 10.11999/JEIT250798
Abstract:
  Objective  Diabetes mellitus and its complications are recognized as major global health challenges, causing severe morbidity, high healthcare costs, and reduced quality of life. Accurate joint prediction of these conditions is essential for early intervention but is hindered by data heterogeneity, sparsity, and complex inter-entity relationships. To address these challenges, a Representation Learning Enhanced Knowledge Graph-based Multi-Disease Prediction (REKG-MDP) model is proposed. Electronic Health Records (EHRs) are integrated with supplementary medical knowledge to construct a comprehensive Medical Knowledge Graph (MKG), and higher-order semantic reasoning combined with relation-aware representation learning is applied to capture complex dependencies and improve predictive accuracy across multiple diabetes-related conditions.  Methods  The REKG-MDP framework consists of three modules. First, an MKG is constructed by integrating structured EHR data from the MIMIC-IV dataset with external disease knowledge. Patient-side features include demographics, laboratory indices, and medical history, whereas disease-side attributes cover comorbidities, susceptible populations, etiological factors, and diagnostic criteria. This integration mitigates data sparsity and enriches semantic representation. Second, a relation-aware embedding module captures four relational patterns: symmetric, antisymmetric, inverse, and compositional. These patterns are used to optimize entity and relation embeddings for semantic reasoning. Third, a Hierarchical Attention-based Graph Convolutional Network (HA-GCN) aggregates multi-hop neighborhood information. Dynamic attention weights capture both local and global dependencies, and a bidirectional mechanism enhances the modeling of patient–disease interactions.  
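The abstract does not name the relation-aware embedding model. RotatE is one well-known model that covers exactly the four patterns listed (symmetric, antisymmetric, inverse, and compositional) by treating each relation as an element-wise rotation of complex entity embeddings; the sketch below, with hypothetical embeddings and phases, shows how each pattern arises under that assumption.

```python
import cmath
import math


def rotate(h, phases):
    """Relation as element-wise rotation in the complex plane: h * exp(i*theta)."""
    return [hi * cmath.exp(1j * th) for hi, th in zip(h, phases)]


def score(h, phases, t):
    """Plausibility of (h, r, t): negative L2 distance between rotated head and tail."""
    return -math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(rotate(h, phases), t)))


def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))


h = [1 + 0j, 0.5 + 0.5j, -1j]  # hypothetical 3-dim complex entity embedding
r1 = [0.3, 1.2, -0.7]          # hypothetical relation phases
r2 = [0.5, -0.4, 2.0]

# Symmetric relation: phases in {0, pi}, so applying it twice returns the head.
sym = [0.0, math.pi, math.pi]
print(close(rotate(rotate(h, sym), sym), h))                      # True
# Inverse relation: negated phases undo r1.
print(close(rotate(rotate(h, r1), [-p for p in r1]), h))          # True
# Composition: r1 followed by r2 equals one rotation by r1 + r2.
print(close(rotate(rotate(h, r1), r2), rotate(h, [a + b for a, b in zip(r1, r2)])))  # True
# Antisymmetry: a generic r1 applied twice does not return the head.
print(close(rotate(rotate(h, r1), r1), h))                        # False
```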
Results and Discussions  Experiments demonstrate that REKG-MDP consistently outperforms four baselines: two machine learning models (DCKD-RF and bSES-AC-RUN-FKNN) and two graph-based models (KGRec and PyRec). Compared with the strongest baseline, REKG-MDP achieves average improvements in P, F1, and NDCG of 19.39%, 19.67%, and 19.39% for single-disease prediction ($n=1$); 16.71%, 21.83%, and 23.53% for $n=3$; and 22.01%, 20.34%, and 20.88% for $n=5$ (Table 4). Ablation studies confirm the contribution of each module. Removing relation-pattern modeling reduces performance metrics by approximately 12%, removing hierarchical attention decreases them by 5–6%, and excluding disease-side knowledge produces the largest decline of up to 20% (Fig. 5). Sensitivity analysis indicates that increasing the embedding dimension from 32 to 128 enhances performance by more than 11%, whereas excessive dimensionality (256) leads to over-smoothing (Fig. 6). Adjusting the $\beta$ parameter strengthens sample discrimination, improving P, F1, and NDCG by 9.28%, 27.9%, and 8.08%, respectively (Fig. 7).  Conclusions  REKG-MDP integrates representation learning with knowledge graph reasoning to enable multi-disease prediction. The main contributions are as follows: (1) integrating heterogeneous EHR data with disease knowledge mitigates data sparsity and enhances semantic representation; (2) modeling diverse relational patterns and applying hierarchical attention improves the capture of higher-order dependencies; and (3) extensive experiments confirm the model’s superiority over state-of-the-art baselines, with ablation and sensitivity analyses validating the contribution of each module. Remaining challenges include managing extremely sparse data and ensuring generalization across broader populations. 
Future research will extend REKG-MDP to model temporal disease progression and additional chronic conditions.
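The top-n metrics reported above can be computed as follows, assuming binary relevance (a predicted disease either is or is not among the patient's true diagnoses) and the standard log2 discount for NDCG; the disease names are hypothetical.

```python
import math


def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n predicted diseases that are true diagnoses."""
    return sum(item in relevant for item in ranked[:n]) / n


def ndcg_at_n(ranked, relevant, n):
    """NDCG@n with binary relevance and the usual log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:n]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), n)))
    return dcg / ideal if ideal else 0.0


ranked = ["retinopathy", "nephropathy", "neuropathy"]  # hypothetical model ranking
relevant = {"nephropathy"}                               # hypothetical ground truth
print(precision_at_n(ranked, relevant, 1), ndcg_at_n(ranked, relevant, 1))  # 0.0 0.0
print(round(precision_at_n(ranked, relevant, 3), 4),
      round(ndcg_at_n(ranked, relevant, 3), 4))                             # 0.3333 0.6309
```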
Research on Collaborative Reasoning Framework and Algorithms of Cloud-Edge Large Models for Intelligent Auxiliary Diagnosis Systems
HE Qian, ZHU Lei, LI Gong, YOU Zhengpeng, YUAN Lei, JIA Fei
Available online  , doi: 10.11999/JEIT250828
Abstract:
  Objective  The deployment of Large Language Models (LLMs) in intelligent auxiliary diagnosis is constrained by limited computing resources for local hospital deployment and by privacy risks related to the transmission and storage of medical data in cloud environments. Low-parameter local LLMs show 20%–30% lower accuracy in medical knowledge question answering and 15%–25% lower medical knowledge coverage than full-parameter cloud LLMs, whereas cloud-based systems face inherent data security concerns. To address these issues, a cloud-edge LLM collaborative reasoning framework and related algorithms are proposed for intelligent auxiliary diagnosis systems. The objective is to design a cloud-edge collaborative reasoning agent equipped with intelligent routing and dynamic semantic desensitization to enable adaptive task allocation between the edge (hospital side) and the cloud (regional cloud). The framework is intended to balance diagnostic accuracy, data privacy protection, and resource efficiency, providing a practical technical path for the development of medical artificial intelligence systems.  Methods  The proposed framework adopts a layered design composed of a four-tier progressive architecture on the edge side and a four-tier service-oriented architecture on the cloud side (Fig. 1). The edge side consists of resource, data, model, and application layers, with the model layer hosting lightweight medical LLMs and the cloud-edge collaborative agent. The cloud side comprises AI IaaS, AI PaaS, AI MaaS, and AI SaaS layers, functioning as a center for computing power and advanced models. The collaborative reasoning process follows a structured workflow (Fig. 2): the agent first parses user input to extract key clinical features and then decides which reasoning node (edge or cloud) handles the request. 
Two core technologies support the agent: 1) Intelligent routing: This mechanism defaults to edge-side processing and dynamically selects the reasoning path (edge or cloud) through a dual-driven weight update strategy. It integrates semantic feature similarity computed through Chinese word segmentation and pre-trained medical language models and incorporates historical decision data, with an exponential moving average used to update feature libraries for adaptive optimization. 2) Dynamic semantic desensitization: Employing a three-stage architecture (sensitive entity recognition, semantic correlation analysis, and hierarchical desensitization decision-making), this technology identifies sensitive entities through a domain-enhanced Named Entity Recognition (NER) model, calculates entity sensitivity and desensitization priority, and applies a semantic similarity constraint to prevent excessive desensitization. Three desensitization strategies (complete deletion, general replacement, partial masking) are used based on entity sensitivity. Experimental validation is conducted with two open-source Chinese medical knowledge graphs (CMeKG and CPubMedKG) containing more than 2.7 million medical entities. The experimental environment (Fig. 3) deploys a qwen3:1.7b model on the edge and the Jiutian LLM on the cloud, with a 5,000-sample evaluation dataset divided into entity-level, relation-level, and subgraph-level questions. Performance is assessed with three metrics: answer accuracy, average token consumption, and average response time.  Results and Discussions  Experimental results show that the proposed framework achieves strong performance across the main evaluation dimensions. For answer accuracy, the intelligent routing mechanism attains 72.44% on CMeKG (Fig. 4) and 66.20% on CPubMedKG (Fig. 5), which are higher than the edge-side LLM alone (60.73% and 54.18%) and close to the cloud LLM (72.68% and 66.49%). 
These results indicate that the framework maintains diagnostic consistency with cloud-based systems while taking advantage of edge-side capabilities. For resource use, the intelligent routing model reduces average token consumption to 61.27, representing 46.53% of the cloud LLM’s token usage (131.68) (Fig. 6), which supports substantial cost reduction. For response time, the edge-side LLM shows latency greater than 6 s because of limited computing power, whereas the cloud LLM reaches 0.44 s latency through dedicated line access (8% of the 5.46 s latency under internet access). The intelligent routing model produces average latency values between those of the edge and cloud LLMs under both access modes (Fig. 7), consistent with expected trade-offs. The framework also shows applicability across common medical scenarios (Table 1), including outpatient triage, chronic disease management, medical image analysis, intensive care, and health consultation, by combining local real-time processing with cloud-based deep reasoning. Limitations appear in emergency rescue settings with weak network conditions because of latency constraints and in rare disease diagnosis because of limited edge-side training samples and potential loss of specific features during desensitization. Overall, the results verify that the cloud-edge collaborative reasoning mechanism reduces computing resource overhead while preserving consistency in diagnostic results.  Conclusions  This study constructs a cloud-edge LLM collaborative reasoning framework for intelligent auxiliary diagnosis systems, addressing the challenges of limited local computing power and cloud data privacy risks. Through the integration of intelligent routing, prompt engineering adaptation, and dynamic semantic desensitization, the framework achieves balanced optimization of diagnostic accuracy, data security, and resource economy. 
Experimental validation shows that its accuracy is comparable to cloud-only LLMs while resource consumption is substantially reduced, providing a feasible technical path for medical intelligence development. Future work focuses on three directions: intelligent on-demand scheduling of computing and network resources to mitigate latency caused by edge-side computing constraints; collaborative deployment of localized LLMs with Retrieval-Augmented Generation (RAG) to raise edge-side standalone accuracy above 90%; and expansion of diagnostic evaluation indicators to form a three-dimensional scenario–node–indicator system incorporating sensitivity, specificity, and AUC for clinical-oriented assessment.
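The intelligent routing idea described above (edge by default, escalation driven by semantic similarity plus historical decisions, a feature library refreshed by an exponential moving average) can be illustrated with a minimal Python sketch. It assumes query embeddings have already been computed, for example by a pre-trained medical language model, and that a feedback signal reports whether a query ultimately needed the cloud. The class name `EMARouter`, the weights `w_sem` and `w_hist`, the smoothing factor `alpha`, and the 0.5 threshold are all hypothetical and are not the authors' parameters.

```python
import numpy as np


class EMARouter:
    """Hypothetical edge/cloud router; all names and default values are illustrative."""

    def __init__(self, dim, alpha=0.1, w_sem=0.7, w_hist=0.3, threshold=0.5):
        self.alpha = alpha          # EMA smoothing factor (assumed)
        self.w_sem = w_sem          # weight of semantic similarity (assumed)
        self.w_hist = w_hist        # weight of historical decisions (assumed)
        self.threshold = threshold  # escalation threshold (assumed)
        self.cloud_proto = np.zeros(dim)  # feature library: EMA of embeddings that needed the cloud
        self.cloud_rate = 0.0             # EMA of past "needed the cloud" outcomes

    @staticmethod
    def _cosine(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return 0.0 if na == 0.0 or nb == 0.0 else float(a @ b / (na * nb))

    def score(self, x):
        # Dual-driven score: similarity to the cloud feature library plus the historical escalation rate
        return self.w_sem * self._cosine(x, self.cloud_proto) + self.w_hist * self.cloud_rate

    def route(self, x):
        # Default to the edge; escalate only when the combined score exceeds the threshold
        return "cloud" if self.score(x) > self.threshold else "edge"

    def update(self, x, needed_cloud):
        # Exponential moving average updates driven by the observed outcome
        a = self.alpha
        self.cloud_rate = (1 - a) * self.cloud_rate + a * float(needed_cloud)
        if needed_cloud:
            self.cloud_proto = (1 - a) * self.cloud_proto + a * np.asarray(x, dtype=float)
```

In this configuration a query whose embedding resembles earlier escalated queries is sent to the cloud, while an unrelated query stays on the edge, because the historical term alone contributes at most `w_hist` (0.3), which is below the threshold.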
Key Technologies for Low-Altitude Intelligent Networks: Architecture, Security, and Optimization
WANG Yuntao, SU Zhou, GAO Yuan, BA Jianle
Available online  , doi: 10.11999/JEIT250947
Abstract:
Low-Altitude Intelligent Networks (LAINs) function as core infrastructure for the emerging low-altitude digital economy by connecting humans, machines, and physical objects through the integration of manned and unmanned aircraft with ground networks and facilities. This paper provides a comprehensive review of recent research on LAINs from four perspectives: network architecture, resource optimization, security threats and protection, and large model-enabled applications. First, existing standards, general architecture, key characteristics, and networking modes of LAINs are investigated. Second, critical issues related to airspace resource management, spectrum allocation, computing resource scheduling, and energy optimization are discussed. Third, existing and emerging security threats across sensing, network, application, and system layers are assessed, and multi-layer defense strategies in LAINs are reviewed. Fourth, the integration of large model technologies with LAINs is analyzed, highlighting their potential in task optimization and security enhancement. Finally, future research directions are discussed to provide theoretical foundations and technical guidance for the development of efficient, secure, and intelligent LAINs.  Significance   LAINs support the low-altitude economy by enabling the integration of manned and unmanned aircraft with ground communication, computing, and control networks. By providing real-time connectivity and collaborative intelligence across heterogeneous platforms, LAINs support applications such as precision agriculture, public safety, low-altitude logistics, and emergency response. However, LAINs continue to face challenges created by dynamic airspace conditions, heterogeneous platforms, and strict real-time operational requirements. 
The development of large models also presents opportunities for intelligent resource coordination, proactive defense, and adaptive network management, which signals a shift in the design and operation of low-altitude networks.  Progress  Recent studies on LAINs have reported progress in network architecture, resource optimization, security protection, and large model integration. Architecturally, hierarchical and modular designs are proposed to integrate sensing, communication, and computing resources across air, ground, and satellite networks, which enables scalable and interoperable operations. In system optimization research, attention is given to airspace resource management, spectrum allocation, computing offloading, and energy-efficient scheduling through distributed optimization and AI-driven orchestration methods. In security research, multi-layer defense frameworks are developed to address sensing-layer spoofing, network-layer intrusions, and application-layer attacks through cross-layer threat intelligence and proactive defense mechanisms. Large Language Models (LLMs), Vision-Language Models (VLMs), and Multimodal LLMs (MLLMs) also support intelligent task planning, anomaly detection, and autonomous decision-making in complex low-altitude environments, which enhances the resilience and operational efficiency of LAINs.  Conclusions  This survey provides a comprehensive review of the architecture, security mechanisms, optimization techniques, and large model applications in LAINs. The challenges in multi-dimensional resource coordination, cross-layer security protection, and real-time system adaptation are identified, and existing or potential approaches to address these challenges are analyzed. By synthesizing recent research on architectural design, system optimization, and security defense, this work offers a unified perspective for researchers and practitioners aiming to build secure, efficient, and scalable LAIN systems. 
The findings emphasize the need for integrated solutions that combine algorithmic intelligence, system engineering, and architectural innovation to meet future low-altitude network demands.  Prospects  Future research on LAINs is expected to advance the integration of architecture design, intelligent optimization, security defense, and privacy preservation technologies to meet the demands of rapidly evolving low-altitude ecosystems. Key directions include developing knowledge-driven architectures for cross-domain semantic fusion, service-oriented network slicing, and distributed autonomous decision-making. Research should also focus on proactive cross-layer security mechanisms supported by large models and intelligent agents, efficient model deployment through AI-hardware co-design and hierarchical computing architectures, and improved multimodal perception and adaptive decision-making to strengthen system resilience and scalability. In addition, establishing standardized benchmarks, open-source frameworks, and realistic testbeds is essential to accelerate innovation and ensure secure, reliable, and intelligent deployment of LAIN systems in real-world environments.
Research on Low Leakage Current Voltage Sampling Method for Multi-cell Series Battery Packs
GUO Zhongjie, GAO Yuyang, DONG Jianfeng, BAI Ruokai
Available online  , doi: 10.11999/JEIT250733
Abstract:
  Objective  The battery voltage sampling circuit is a key component of the Battery Management Integrated Circuit (BMIC). It performs real-time monitoring of cell voltages, and its performance directly affects the safety of series battery packs. Traditional resistive voltage sampling circuits exhibit channel leakage current, which affects cell-voltage consistency and sampling accuracy. In addition, the level-shifting circuit in the high-voltage domain contains high-voltage operational amplifiers, and the use of many high-voltage MOSFETs increases area overhead.  Methods  This study proposes a low-leakage-current battery voltage sampling circuit for 14-series lithium batteries. Based on the traditional resistive sampling structure, channel leakage current is reduced to the pA level by designing an operational-amplifier-isolated active-drive technique. Voltage conversion methods are selected according to the voltage domain of each cell group. The first section of the battery uses a unity-gain buffer for isolation and then performs voltage conversion through resistive division. Sections 2 to 13 use operational-amplifier-isolated active driving to follow each cell voltage synchronously, after which the followed voltage is converted to a ground-referenced level through a level-shifting circuit. The voltage sampling process of the highest-section battery draws power from the entire battery stack and does not affect pack consistency; therefore, this section directly adopts the level-shifting circuit for voltage conversion.  Results and Discussions  The circuit was designed and verified using a 0.35 µm high-voltage BCD process. The overall layout area of the proposed sampling circuit is 3 105 µm × 638 µm (Fig. 10). Verification results show that, across different process corners and temperatures, the maximum channel leakage current after applying the isolated active-drive technique is only 48.9 pA. 
In contrast, the minimum leakage current of the traditional sampling circuit is 1.169 × 10⁶ pA (Fig. 12, Fig. 13). The effect of the sampling process on cell-voltage inconsistency is reduced from 18.56% to 2.122 ppm (Fig. 14). Under full PVT verification, the maximum measurement error of the proposed sampling circuit is 0.9 mV (Fig. 15, Fig. 16, Fig. 17).  Conclusions  This study proposes an operational-amplifier-isolated active-drive technique to address the channel leakage issue in traditional resistive voltage sampling circuits, which affects cell-voltage consistency and measurement accuracy. Using the proposed circuit, the maximum channel leakage current is 48.9 pA, the cell-voltage inconsistency is 2.122 ppm, and the maximum measurement error is 1.25 mV. The circuit achieves very low leakage current while maintaining sampling accuracy. The proposed low-leakage-current sampling circuit is suitable for 14-series lithium battery management chips.
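Why per-channel leakage matters for pack consistency can be shown with a simplified Python model. It assumes a traditional topology in which every cell tap feeds a resistive divider to pack ground, so the current drawn at the top of cell k returns through all cells below it, and it treats each buffered tap in the proposed scheme as drawing only a fixed small input current. The 1 MΩ divider, the 50 pA buffer current, and the 3.7 V cell voltage are hypothetical values rather than figures from the paper, and the model ignores the special handling of the first and highest sections.

```python
def cell_drain_currents(cell_voltages, tap_current):
    """Current drawn through each cell (index 0 = bottom) by the sampling taps.

    tap_current(v_tap) returns the current (A) drawn from a tap at voltage v_tap
    (referenced to pack ground). A tap at the top of cell k returns its current to
    ground through cells 0..k, so cell j carries the sum of all tap currents k >= j.
    """
    taps, v = [], 0.0
    for vc in cell_voltages:
        v += vc
        taps.append(v)
    tap_i = [tap_current(vt) for vt in taps]
    return [sum(tap_i[j:]) for j in range(len(cell_voltages))]


CELLS = [3.7] * 14   # 14-series pack at a nominal 3.7 V per cell (assumed)
R_DIV = 1e6          # hypothetical 1 MOhm divider per tap (traditional scheme)
I_BUF = 50e-12       # hypothetical 50 pA drawn by each buffered tap (proposed scheme)

resistive = cell_drain_currents(CELLS, lambda v: v / R_DIV)
buffered = cell_drain_currents(CELLS, lambda v: I_BUF)

for name, drain in (("resistive", resistive), ("buffered", buffered)):
    spread = max(drain) - min(drain)  # unequal discharge is what erodes cell consistency
    print(f"{name:9s}: bottom {drain[0]:.3e} A, top {drain[-1]:.3e} A, spread {spread:.3e} A")
```

Under these assumed values the spread of per-cell discharge currents drops from about 337 µA to 650 pA, which is the qualitative effect the abstract attributes to the isolated active-drive technique; the numbers themselves are not comparable with the reported measurements.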
Vision-Guided and Force-Controlled Method for Robotic Screw Assembly
ZHANG Chunyun, MENG Xintong, TAO Tao, ZHOU Huaidong
Available online  , doi: 10.11999/JEIT251193
Abstract:
  Objective  With the rapid development of intelligent manufacturing and industrial automation, robots are increasingly applied to high-precision assembly tasks, especially screw assembly. However, current systems still face several challenges. The pose of assembly objects is often uncertain, which makes initial localization difficult. Small features such as threaded holes are blurred and difficult to identify accurately. Conventional vision-based open-loop control may also cause assembly deviation or jamming. This study proposes a vision–force cooperative method for robotic screw assembly. The method establishes a closed-loop assembly system that covers coarse positioning and fine alignment. A semantic-enhanced 6D pose estimation algorithm and a lightweight hole detection model are used to improve perception accuracy. Force-feedback control then adjusts the end-effector posture dynamically. This approach improves the accuracy and stability of screw assembly.  Methods  The proposed screw-assembly method is based on a vision–force cooperative strategy that forms a closed-loop process. In the visual perception stage, a semantic-enhanced 6D pose estimation algorithm addresses disturbances and pose uncertainty in complex industrial environments. During initial pose estimation, Grounding DINO and SAM2 generate pixel-level masks that provide semantic priors for the FoundationPose module. In the continuous tracking stage, semantic cues from Grounding DINO support translational correction. To detect small threaded holes, an improved lightweight hole detection algorithm based on NanoDet is designed. It uses MobileNetV3 as the backbone and adds a CircleRefine module in the detection head to estimate hole centers precisely. In the assembly positioning stage, a hierarchical vision-guided strategy is used. The global camera performs coarse positioning for overall guidance, while the hand–eye camera conducts local correction using hole detection results. 
In the closed-loop assembly stage, force-feedback control adjusts the posture to achieve accurate alignment between the screw and the threaded hole.  Results and Discussions  The method is validated experimentally in robotic screw assembly scenarios. The improved 6D pose estimation algorithm reduces the average position error by 18% and the orientation error by 11.7% compared with the baseline (Table 1). The tracking success rate in dynamic sequences increases from 72% to 85% (Table 2). For threaded hole detection, the lightweight NanoDet-based algorithm is evaluated on a dataset collected from assembly environments. It achieves 98.3% precision, 99.2% recall, and 98.7% mAP (Table 3). The model size is 11.7 MB and the computational cost is 2.9 GFLOPs, which are both lower than most benchmark models while maintaining high accuracy. A circular branch is introduced to fit hole edges (Fig. 8), providing accurate center predictions for visual guidance. Under different inclination angles (Fig. 10), the assembly success rate remains above 91.6% (Table 4). For screws of different sizes (M4, M6, and M8), the success rate remains above 90% (Table 5). Under small external disturbances (Fig. 12), the success rates reach 93.3%, 90%, and 83.3% for translational, rotational, and mixed disturbances, respectively (Table 6). Force-feedback comparison experiments show that the success rate is 66.7% under visual guidance alone. With force-feedback control, the rate increases to 96.7% (Table 7). The system maintains stable performance throughout complete screw-assembly cycles and achieves an average cycle time of 9.53 s (Table 8), meeting industrial assembly requirements.  Conclusions  This study presents a vision–force cooperative method that addresses key challenges in robotic screw assembly. The approach enhances target localization accuracy through a semantic-enhanced 6D pose estimation algorithm and a lightweight threaded hole detection network. 
The integration of hierarchical vision guidance and force-feedback control enables precise alignment between screws and threaded holes. Experimental results show that the method ensures reliable assembly under varied conditions, providing a practical solution for intelligent robotic assembly. Future work will focus on adaptive force control, multimodal perception fusion, and intelligent task planning to further improve generalization and self-optimization in complex industrial environments.
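The abstract does not detail the internals of the CircleRefine module, so the following Python sketch swaps in a classical alternative: an algebraic least-squares (Kåsa) circle fit that recovers a hole center and radius from edge points, such as points sampled along a detected hole contour. It illustrates the kind of sub-pixel center estimate that visual guidance consumes; the function name `fit_circle` and the example coordinates are hypothetical, and the method is not the authors' learned refinement branch.

```python
import numpy as np


def fit_circle(points):
    """Algebraic (Kasa) least-squares circle fit.

    Solves x^2 + y^2 + D*x + E*y + F = 0 for (D, E, F) in the least-squares sense,
    then converts to center (cx, cy) and radius r. Needs at least 3 non-collinear points.
    """
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x ** 2 + y ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    r = np.sqrt(cx ** 2 + cy ** 2 - F)
    return float(cx), float(cy), float(r)


# Edge points on a partial arc (e.g., a hole edge partly occluded), center (120, 80), radius 6 px
theta = np.linspace(0.0, 1.5 * np.pi, 40)
arc = np.column_stack([120 + 6 * np.cos(theta), 80 + 6 * np.sin(theta)])
print(fit_circle(arc))  # approximately (120.0, 80.0, 6.0)
```

The algebraic fit is fast and closed-form, but with noisy points covering only a short arc it is known to underestimate the radius; a geometric (orthogonal-distance) refinement is a common follow-up when that matters.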
A Review of Joint EEG-fMRI Methods for Visual Evoked Response Studies
WEI Zhiwei, XIAO Xiaolin, XU Minpeng, MING Dong
Available online  , doi: 10.11999/JEIT250781
Abstract:
  Significance   The study of Visual Evoked Responses (VERs) using non-invasive neuroimaging is central to understanding human visual information processing. Electroencephalography (EEG) provides millisecond temporal resolution but has limited spatial precision. Functional Magnetic Resonance Imaging (fMRI) offers millimeter spatial resolution based on the blood-oxygen-level-dependent signal, although its temporal resolution is constrained by delayed hemodynamic responses. This trade-off limits the ability of any single modality to characterize complex visual processes such as attentional modulation, motion perception, and multisensory integration. Joint EEG-fMRI acquisition has therefore become an effective multimodal approach. By recording both modalities synchronously, this technique combines their complementary strengths and yields a unified spatiotemporal representation of visual neural dynamics. Despite increasing use, the literature lacks a focused review that summarizes core methods, representative applications, and continuing challenges in joint EEG-fMRI research on VERs. This review addresses this need by providing a structured overview for researchers working on visual system investigation.  Progress   The review first introduces the foundational technologies that support joint EEG-fMRI studies, beginning with synchronous data acquisition using MR-compatible EEG systems and dedicated synchronization hardware. The core data fusion methods are grouped into asymmetric and symmetric approaches. Asymmetric strategies use one modality to constrain analyses of the other. EEG-informed fMRI analysis models fMRI activity using single-trial EEG features, whereas fMRI-informed EEG source imaging uses fMRI activation maps as spatial priors to improve source localization. Symmetric fusion treats both modalities equally. Data-driven methods such as joint independent component analysis identify shared neural sources without imposing strong biophysical assumptions. 
These methods have contributed to advances in several areas. In visual mechanism studies, joint EEG-fMRI has clarified feedforward and feedback interactions in visual cortical networks. In clinical diagnosis and evaluation, it offers objective physiological markers for disorders such as amblyopia and epilepsy by revealing altered activation patterns and network dysfunction. In Brain-Computer Interface (BCI) research, multimodal feature fusion improves the accuracy and robustness of decoding visual intentions.  Conclusions  This review examines joint EEG-fMRI methods for VER studies, classifying major acquisition and fusion strategies and summarizing representative applications. The choice of fusion framework depends on the research objective, data quality, and underlying assumptions. Although joint EEG-fMRI benefits basic neuroscience, clinical diagnosis, and BCI development, several issues limit broader use. System-level obstacles include hardware-induced artifacts, particularly severe electromagnetic interference in ultra-high-field MRI, which degrades EEG data quality. Algorithmic challenges arise from the mismatch in spatiotemporal scales between rapid EEG signals and delayed hemodynamic responses. Inter-subject variability further reduces the generalizability of analytical and decoding models. Continued innovation in hardware engineering and computational methods is required to address these limitations.  Prospects   Future work in joint EEG-fMRI for VER studies is expected to progress gradually and will be shaped by advances in artificial intelligence. System-level developments include next-generation hardware combining ultra-high-field MRI systems with artifact-resilient EEG sensors and real-time correction algorithms. The creation of open, multi-center EEG-fMRI databases that use standardized formats such as the Brain Imaging Data Structure (BIDS), together with shared analysis pipelines, will improve reproducibility and comparability. 
Algorithmic progress is likely to focus on artificial intelligence and deep learning. End-to-end neural architectures with spatiotemporal attention mechanisms may learn nonlinear transformations between EEG and fMRI directly, addressing limitations of conventional linear models. Transfer learning and personalized modeling may mitigate inter-subject variability and support adaptive decoding and clinical applications. As clinical and BCI uses expand, balancing model complexity with interpretability and computational efficiency will remain essential. These developments are expected to advance understanding of visual neural computation, improve diagnostic and therapeutic strategies, and support more effective BCI systems.
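EEG-informed fMRI analysis, as summarized in the Progress section, usually means turning single-trial EEG features into a regressor for the fMRI general linear model. The Python sketch below shows one common recipe under stated assumptions: trial onsets are known, the EEG feature is a single amplitude per trial that is mean-centered and used as a parametric modulator, and the hemodynamic response is approximated by an SPM-style double-gamma function. The function names, the 2 s repetition time, and the synthetic noiseless data are illustrative only.

```python
import math

import numpy as np


def double_gamma_hrf(tr, duration=32.0):
    """SPM-style double-gamma HRF (peak shape 6, undershoot shape 16, ratio 1/6), sampled every tr s."""
    t = np.arange(0.0, duration, tr)

    def gamma_pdf(t, shape):
        return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)

    h = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0
    return h / h.max()


def event_regressor(onsets_s, weights, n_scans, tr):
    """Weighted stick functions at trial onsets, convolved with the HRF and cut to n_scans."""
    sticks = np.zeros(n_scans)
    for onset, w in zip(onsets_s, weights):
        sticks[int(round(onset / tr))] += w
    return np.convolve(sticks, double_gamma_hrf(tr))[:n_scans]


def fit_glm(y, regressors):
    """Ordinary least squares with an intercept appended as the last column."""
    X = np.column_stack(list(regressors) + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Synthetic example: 19 trials, one EEG amplitude per trial
tr, n_scans = 2.0, 200
onsets = np.arange(10.0, 390.0, 20.0)
eeg_amp = np.random.default_rng(0).normal(5.0, 1.0, onsets.size)
main = event_regressor(onsets, np.ones(onsets.size), n_scans, tr)         # average evoked response
param = event_regressor(onsets, eeg_amp - eeg_amp.mean(), n_scans, tr)   # EEG-informed modulation
bold = 2.0 * main + 0.5 * param + 10.0   # noiseless voxel time series for illustration
print(fit_glm(bold, [main, param]))      # recovers approximately [2.0, 0.5, 10.0]
```

Real data would also need a noise model, drift regressors, and in-scanner EEG artifact correction; the sketch only shows how the EEG feature enters the design matrix alongside the unmodulated task regressor.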
A Novel Prognostic Model Establishment and Treatment Efficacy Analysis for Primary Pulmonary Non-Hodgkin’s Lymphoma
LI Hui, LI Jiancheng, LIU Feng, WU Di, CHEN Chuanben, LI Jinluan
Available online  , doi: 10.11999/JEIT250874
Abstract:
  Objective  At present, few studies have examined Primary Pulmonary non-Hodgkin’s Lymphoma (PPL). Most available reports are single-center retrospective studies. Therefore, no widely accepted prognostic index or treatment strategy for PPL has been established. This study aims to develop and validate a novel prognostic index based on the International Prognostic Index (IPI) for PPL using data from the United States cancer population and Chinese multicenter cohorts. The study also compares the therapeutic effects of different treatment approaches to predict clinical prognosis and provide evidence to support treatment decision-making for PPL.  Methods  Clinical data from patients diagnosed with PPL were collected from two sources. The first source was the Surveillance, Epidemiology, and End Results (SEER) database of the United States, covering the period from 2000 to 2019. The second source included patients treated between 2010 and 2021 at three tertiary hospitals in China. Independent prognostic factors were identified using the Cox proportional hazards regression model. A nomogram was constructed to predict Cancer-Specific Survival (CSS). Model performance was evaluated using the Concordance index (C-index) and calibration curves. The nomogram was combined with the IPI to develop a novel prognostic index. Risk stratification was performed, and the 3-year Overall Survival (OS) rate was calculated for each risk group. The Inverse Probability of Treatment Weighting (IPTW) method was applied to adjust for confounding between treatment groups. Survival analysis was conducted using Kaplan-Meier curves and the log-rank test.  Results and Discussions  A total of 4 313 cases from the SEER database and 107 cases from the Chinese multicenter cohort were included. 
Multivariate Cox regression analysis showed that independent prognostic factors for PPL included age (p < 0.001; Hazard Ratio (HR), 1.078; 95% Confidence Interval (CI), 1.072–1.084), Ann Arbor stage (p < 0.001), sex (p < 0.001; HR, 0.719; 95% CI, 0.624–0.829), primary site (p = 0.037), pathological type (p < 0.001), B symptoms (p = 0.012; HR, 0.944; 95% CI, 0.773–0.997), surgery (p < 0.001; HR, 1.453; 95% CI, 1.221–1.728), chemotherapy (p < 0.001; HR, 0.742; 95% CI, 0.631–0.872), and marital status (p < 0.001). Based on these factors, a nomogram predicting 3-, 5-, and 10-year CSS was established. By integrating the nomogram with the IPI, a prognostic model for PPL was developed with a C-index of 0.932. Using defined risk parameters, a novel prognostic index for PPL was constructed. The risk parameters included age > 60 years, Ann Arbor stage III/IV, serum Lactate Dehydrogenase (LDH) level above the normal level, performance status score > 2, number of extranodal sites > 1, male sex, pathological type other than Mucosa-Associated Lymphoid Tissue (MALT) lymphoma, presence of B symptoms, and absence of cancer treatment. Risk stratification was defined as follows: low-risk group (0–2 risk factors), low-intermediate-risk group (3–4 risk factors), high-intermediate-risk group (5 risk factors), and high-risk group (6–9 risk factors). The corresponding 3-year OS rates were 96.97%, 82.61%, 50.00%, and 11.11%, respectively (p < 0.001). In the analysis of treatment efficacy, both the United States and Chinese datasets showed that chemotherapy significantly reduced CSS in patients with primary pulmonary MALT lymphoma (p < 0.001). 
No significant difference was observed between surgery and radiotherapy in patients with either primary pulmonary MALT lymphoma or diffuse large B-cell lymphoma (p > 0.05).  Conclusions  This study develops a novel prognostic index for PPL based on data from the United States cancer population and a Chinese multicenter cohort. The index combines the five IPI factors (age, disease stage, serum LDH level, performance status score, and number of extranodal sites) with sex, pathological type, B symptoms, and receipt of cancer treatment. The index demonstrates strong predictive performance and accuracy. Risk stratification based on this index provides estimated 3-year OS rates for different risk groups. Treatment efficacy analysis indicates that chemotherapy may reduce CSS in patients with primary pulmonary MALT lymphoma. In addition, no significant difference is observed between surgery and radiotherapy in patients with primary pulmonary MALT lymphoma or diffuse large B-cell lymphoma.
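The risk stratification rule stated in the Results can be written as a short Python function. The nine risk parameters, the group cut-offs, and the 3-year OS rates follow the abstract; the class and field names, the integer encoding of Ann Arbor stage, and the reading of the LDH criterion as a ratio to the normal reference level are assumptions made for illustration. The sketch only demonstrates the counting rule.

```python
from dataclasses import dataclass


@dataclass
class PPLPatient:
    """Hypothetical record holding the nine risk parameters named in the abstract."""
    age: int
    ann_arbor_stage: int        # 1-4 (assumed integer encoding)
    ldh_ratio: float            # serum LDH divided by the normal reference level (assumed)
    performance_status: int     # performance status score
    extranodal_sites: int
    male: bool
    malt_lymphoma: bool         # pathological type is MALT lymphoma
    b_symptoms: bool
    received_treatment: bool    # any cancer-directed treatment


# 3-year OS rates reported in the abstract for each group
REPORTED_3Y_OS = {"low": 0.9697, "low-intermediate": 0.8261,
                  "high-intermediate": 0.5000, "high": 0.1111}


def risk_factor_count(p: PPLPatient) -> int:
    factors = (
        p.age > 60,
        p.ann_arbor_stage >= 3,          # stage III/IV
        p.ldh_ratio > 1.0,
        p.performance_status > 2,
        p.extranodal_sites > 1,
        p.male,
        not p.malt_lymphoma,
        p.b_symptoms,
        not p.received_treatment,
    )
    return sum(factors)  # True counts as 1


def risk_group(count: int) -> str:
    if count <= 2:
        return "low"
    if count <= 4:
        return "low-intermediate"
    if count == 5:
        return "high-intermediate"
    return "high"  # 6-9 risk factors


p = PPLPatient(age=67, ann_arbor_stage=4, ldh_ratio=1.3, performance_status=1,
               extranodal_sites=1, male=True, malt_lymphoma=True,
               b_symptoms=False, received_treatment=True)
n = risk_factor_count(p)
print(n, risk_group(n), REPORTED_3Y_OS[risk_group(n)])  # 4 low-intermediate 0.8261
```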
Intelligent Analysis Technologies for Encrypted Traffic: Current Status, Advances, and Challenges
GONG Bi, LIU Jian, TANG Xiaomei, YU Meiting, GONG Hang, HUANG Meigen
Available online  , doi: 10.11999/JEIT250416
Abstract:
  Significance   Encrypted traffic enables secure and reliable data transmission but also introduces challenges to network security. These include the covert spread of malicious attacks, reduced effectiveness of security tools, and increased network resource overhead. Encrypted traffic analysis technologies are therefore essential. Traditional port filtering and deep packet inspection are inadequate in increasingly complex network environments. Intelligent encrypted traffic analysis integrates feature engineering, deep learning, Transformer architectures, federated learning, multimodal feature fusion, and generative models. These approaches address network security management from multiple perspectives. They support efficient detection of hidden attacks, improve network resource allocation, balance system security and privacy protection, enhance security defenses, and improve user experience.  Progress   Intelligent encrypted traffic analysis technologies provide new methods for network security. (1) Feature engineering: (a) Statistical features: Basic statistical features of encrypted traffic, such as packet size, count, arrival time, and rate, are selected through feature selection techniques so that the processed data reflect internal traffic characteristics. (b) Behavioral features: Observation and analysis of network traffic identify behavioral patterns such as access frequency and protocol usage habits. (2) Deep learning methods: (a) Convolutional Neural Network (CNN): Convolution and pooling layers automatically extract local features from encrypted traffic and capture key information. An improved multi-scale CNN achieves 86.77% accuracy on the ISCXVPN2016 dataset. (b) Recurrent Neural Network (RNN): RNNs process time-series data through memory units and capture long-term dependencies, enabling analysis of temporal features such as connection duration and traffic trends. 
(c) Graph Neural Network (GNN): GNNs are suited to relational data and model the graph structures of encrypted traffic to identify potential node relationships. (d) Transformer architectures: With parallel processing and support for long sequences, attention mechanisms capture long-distance dependencies. A traffic Transformer method using masked autoencoders reaches 98.07% accuracy on the ISCXVPN2016 dataset. (3) Other advanced methods: (a) Federated learning: Participants train a shared global model by exchanging sub-model parameters rather than raw traffic data, which protects privacy and improves performance. Reported results show the performance gap relative to centralized learning reduced to 0.8%. (b) Multimodal feature fusion: Features extracted from multiple traffic modalities are fused into a unified representation to build a comprehensive analysis architecture. This integration of heterogeneous features improves model performance, raising accuracy and F1-score for multitask classification to 93.75% and 91.95%, respectively. (c) Generative model-driven methods: Generative Adversarial Networks (GAN) and diffusion models learn real traffic distributions to generate synthetic samples, which mitigate data scarcity and class imbalance. Diffusion-based traffic generation increases similarity to real traffic in packet size and inter-arrival time by up to 43.4% and 39.02%, respectively, compared with baseline models.  Conclusions  This paper explains the necessity of intelligent encrypted traffic analysis technologies and summarizes key methods and related research. Remaining challenges include: (1) Network complexity: Modern networks are heterogeneous and dynamic, using diverse encryption algorithms and producing inconsistent traffic structures to which traditional rules do not adapt. Network adjustments and behavior changes also shift traffic features over time, which complicates analysis. (2) Insufficient model robustness: Encrypted traffic features depend strongly on the network environment. 
Accuracy decreases after model migration, and models remain sensitive to non-ideal inputs and adversarial examples, which affect model decisions. (3) Privacy protection and compliance: Encrypted traffic carries sensitive information, and conventional analysis risks exposing original features. Even metadata can be associated with identities, which complicates compliance with anonymization requirements.  Prospects   Future work may focus on: (1) Dynamic adaptability: Full-link adaptive mechanisms that integrate multi-dimensional information may support dynamic context awareness. Incremental learning frameworks may help models respond in real time to feature drift. Genetic algorithms and reinforcement learning may also support dynamic detection strategies. (2) Anti-attack capability: A comprehensive protection system that includes adversarial sample detection, model defense, and attack traceability may be established by designing monitoring modules and applying adversarial training. (3) Privacy protection and compliance: Differential privacy can be applied by adding controlled noise during feature extraction or to model parameters. Homomorphic encryption may support analytical tasks directly on ciphertext. (4) Synergy between reverse engineering and Explainable AI (XAI): Reverse engineering may deepen protocol analysis and enhance the quality of inputs for XAI, and XAI may improve model transparency. This supports closed-loop optimization between protocol analysis and model interpretation.
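The statistical features listed under Feature engineering (packet size, count, arrival time, and rate) can be computed from packet metadata alone, which is why they remain available when payloads are encrypted. A minimal Python sketch follows, assuming each flow is given as a time-ordered list of (timestamp in seconds, packet size in bytes) pairs; the function and feature names are illustrative, and practical pipelines typically add packet direction, TCP flags, and many more statistics.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Per-flow statistics from (timestamp_s, size_bytes) pairs sorted by time."""
    if not packets:
        raise ValueError("empty flow")
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    iats = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    duration = times[-1] - times[0]
    return {
        "pkt_count": len(packets),
        "bytes_total": sum(sizes),
        "size_mean": mean(sizes),
        "size_std": pstdev(sizes),
        "size_min": min(sizes),
        "size_max": max(sizes),
        "iat_mean": mean(iats) if iats else 0.0,
        "iat_std": pstdev(iats) if iats else 0.0,
        "duration_s": duration,
        "pkt_rate": len(packets) / duration if duration > 0 else 0.0,
        "byte_rate": sum(sizes) / duration if duration > 0 else 0.0,
    }


flow = [(0.00, 517), (0.05, 1460), (0.10, 1460), (0.40, 90)]
print(flow_features(flow)["pkt_count"])  # 4
```

The resulting fixed-length vector is the kind of input that feature selection, classical classifiers, or the deep models surveyed above would consume.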
Hybrid PUF Tag Generation Technology for Battery Anti-counterfeiting
HE Zhangqing, LUO Siyu, ZHANG Junming, ZHANG Yin, WAN Meilin
Available online  , doi: 10.11999/JEIT250967
Abstract:
  Objective  A global shift toward a low-carbon economy has increased the importance of power batteries as energy storage devices. The traceability and security of their life cycle are central to industrial governance. In 2023, the Global Battery Alliance (GBA) proposed the Battery Passport, which requires each battery to carry a unique, tamper-resistant, and verifiable digital identity. Conventional digital tags, such as QR codes and RFID, rely on static pre-written storage and remain vulnerable to physical cloning, data extraction, and environmental degradation. This study proposes a battery anti-counterfeiting tag generation technology based on a hybrid Physical Unclonable Function (PUF). The method applies physical coupling among the battery, PCB, and IC to generate a unique battery ID, and ensures strong physical binding and system-level anti-counterfeiting performance.  Methods  The tag includes four modules: an off-chip RC battery fingerprint extraction circuit, an on-chip arbiter PUF module, an on-chip delay compensation module, and a reliability enhancement module. The off-chip RC circuit uses the physical coupling between the battery negative tab and the PCB copper-clad area to form a capacitor structure that introduces manufacturing variation as an entropy source. The arbiter PUF converts these deviations into a unique digital signature. To reduce bias caused by asymmetric routing and off-circuit delay, a programmable delay compensation module with coarse and fine-tuning stages is used. The reliability enhancement module filters unstable response bits by tracking delay deviation, and improves response reliability without complex error-correcting codes.  Results and Discussions  The structure was implemented and tested using an FPGA Spartan-6 chip, a custom PCB, and 100 Ah blade batteries. The randomness reached 48.85%, and uniqueness averaged 49.15% under normal conditions (Fig. 11). 
Stability (RA) reached 99.98% at room temperature and nominal voltage, and remained above 98% at 100 ℃ and 1.05 V (Fig. 12). To evaluate anti-desoldering performance, three tampering scenarios were tested: battery replacement, PCB replacement, and IC replacement. The average response change rates were 14.86%, 24.58%, and 41.66%, respectively (Fig. 13). These results show strong physical binding among the battery, PCB, and chip, and confirm that the triple physical coupling mechanism resists counterfeiting and tampering.  Conclusions  This study presents a battery anti-counterfeiting tag generation technology based on a triple physical coupling mechanism. By binding the battery tab, PCB, and chip into a unified physical structure and extracting fingerprints from manufacturing variation, the method provides high randomness, uniqueness, and stability. The tag is highly sensitive to physical tampering and supports reliable battery authentication across its life cycle. Future work will examine the structure using more advanced fabrication processes and different PCB manufacturers, and will further refine the design for broader application.
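To make the arbiter PUF and reliability-filtering steps concrete, the sketch below uses the standard linear additive delay model of an n-stage arbiter PUF and computes uniqueness as the mean pairwise fractional Hamming distance between devices. It is a simulation under assumptions: random Gaussian stage delays stand in for manufacturing variation, the off-chip RC battery fingerprint and the delay compensation module are not modeled, and the fixed `threshold` margin is a hypothetical stand-in for the paper's delay-deviation tracking.

```python
import numpy as np

def parity_features(challenges):
    """Map 0/1 challenge bits to the +/-1 parity features of the linear model."""
    signs = 1 - 2 * challenges                                   # 0 -> +1, 1 -> -1
    phi = np.cumprod(signs[:, ::-1], axis=1)[:, ::-1]            # product over stages i..n-1
    return np.hstack([phi, np.ones((challenges.shape[0], 1))])   # bias term

def delay_difference(weights, challenges):
    """Top-minus-bottom path delay seen by the arbiter (additive delay model)."""
    return parity_features(challenges) @ weights

def responses(weights, challenges):
    return (delay_difference(weights, challenges) > 0).astype(np.int8)

def stable_mask(weights, challenges, threshold):
    """Keep bits whose delay difference is far from the arbiter's decision point.

    threshold is a hypothetical margin, not a parameter from the paper.
    """
    return np.abs(delay_difference(weights, challenges)) > threshold

def uniqueness(all_responses):
    """Mean pairwise fractional Hamming distance between devices (ideal: 0.5)."""
    k = len(all_responses)
    dists = [np.mean(all_responses[i] != all_responses[j])
             for i in range(k) for j in range(i + 1, k)]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
n_stages, n_chal, n_devices = 64, 1024, 8
challenges = rng.integers(0, 2, size=(n_chal, n_stages))
# Per-device stage delay differences: a stand-in for manufacturing variation.
devices = [rng.normal(0.0, 1.0, n_stages + 1) for _ in range(n_devices)]
resp = [responses(w, challenges) for w in devices]
print(f"uniqueness = {uniqueness(resp):.3f}")
```

Bits that pass `stable_mask` have a delay margin large enough that small re-measurement noise cannot flip them, which is the intuition behind filtering unstable bits instead of applying error-correcting codes.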
Multi-projection plane InISAR 3D reconstruction method for complex moving ship targets
LI Ning, NIU Jinfa, WANG Weibin, HU Xingwang, WU Lin
Available online  , doi: 10.11999/JEIT251268
Abstract:
  Objective  Interferometric Inverse Synthetic Aperture Radar (InISAR) is a three-dimensional (3D) reconstruction technique for non-cooperative targets. However, the complex 3D rotational motion of a ship target causes unstable Doppler frequency variation, and Inverse Synthetic Aperture Radar (ISAR) imaging inevitably suffers from overlap and occlusion of target scatterers, which makes high-precision, complete 3D reconstruction difficult within a single projection plane. This paper therefore proposes a multi-projection-plane InISAR 3D reconstruction method for complex moving ship targets based on point cloud fusion. Efficient, high-precision point cloud registration and fusion supplement the target's 3D information and significantly improve reconstruction quality.  Methods  First, the method exploits the multi-plane observations made available by the severe motion of ship targets. It extracts the ship's centerline and estimates the vertical rotation vector via Principal Component Analysis (PCA) to select the optimal imaging times corresponding to different Imaging Projection Planes (IPPs), and then performs ISAR imaging and InISAR 3D reconstruction at each selected time. Second, a point cloud fusion algorithm combining weighted Random Sample Consensus (RANSAC) and hierarchical Iterative Closest Point (ICP) is proposed. A feature-stability weighting strategy optimizes the random sampling process, so that corresponding feature points in the InISAR images are extracted and matched efficiently, achieving high-precision multi-IPP point cloud fusion.  Results and Discussions  Experimental results demonstrate that the proposed method significantly enhances reconstruction accuracy and target completeness. For simulated ship point target data, Fig. 7 shows excellent results, with a significant reduction in reconstruction error. 
Signal-to-noise ratio (SNR) analysis shows that 3D fusion imaging quality improves steadily as SNR increases from –10 dB to 10 dB, and fusion performance remains robust even at low SNR. For simulated destroyer radar cross section (RCS) data, the method achieves good registration, and the fused image shows markedly better detail recovery and structural integrity, effectively resolving the incomplete 3D reconstruction caused by overlapping and occluded scattering points.  Conclusions  To address the low reconstruction accuracy and information loss caused by target rotation, overlap, and occlusion when traditional InISAR methods reconstruct complex moving ship targets in 3D, this paper proposes a multi-IPP InISAR 3D reconstruction method based on point cloud fusion. The method uses a PCA-based optimal imaging time selection strategy and employs weighted RANSAC and hierarchical ICP algorithms to register and fuse InISAR point clouds from multiple IPPs efficiently and accurately, yielding high-quality 3D reconstruction results. Multi-scenario experiments on a ship model with ideal scattering points and on an electromagnetic simulation RCS model with occlusion effects verify the accuracy of the proposed method under ideal conditions and its applicability to complex real-world scenarios.
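The registration stage can be illustrated with plain point-to-point ICP built on the SVD (Kabsch) solution for the best rigid transform. This is a minimal sketch under assumptions: it omits the paper's feature-stability-weighted RANSAC initialization and hierarchical scheme, uses brute-force nearest neighbours (O(N·M), fine only for small clouds), and aligns a synthetic cloud rather than real InISAR output.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Kabsch/SVD solution of min over (R, t) of sum ||R @ src_i + t - dst_i||^2
    for point pairs that are already in correspondence."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def nearest_neighbors(a, b):
    """Brute-force nearest neighbour in b for every point of a."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return b[idx], d2[np.arange(len(a)), idx]

def icp(src, dst, iters=50):
    """Point-to-point ICP; returns the accumulated (R, t) and the moved cloud."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        matched, _ = nearest_neighbors(cur, dst)
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t   # compose transforms
    return R_total, t_total, cur

rng = np.random.default_rng(0)
src = rng.random((200, 3))                         # stand-in for one IPP's point cloud
theta = np.deg2rad(3.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.02, 0.005])
dst = src @ R_true.T + t_true                      # synthetic second-IPP cloud
R_est, t_est, aligned = icp(src, dst)
```

ICP only converges locally, which is why a robust coarse alignment such as RANSAC over matched feature points is normally run first.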
PAPR Reduction Theory and Method for OTFS Systems via Nonzero-Unitary Precoding
ZENG Junlong, JIANG Zhanjun, LIU Haoxiang, ZHANG Huawei, LI Cuiran
Available online  , doi: 10.11999/JEIT250888
Abstract:
  Objective  OTFS and its variants provide robust performance in high-mobility doubly selective channels, yet their inherently high peak-to-average power ratio (PAPR) limits power-amplifier efficiency and practical implementation. Recent observations reveal a theory–practice mismatch: some OTFS variants achieved by changing the orthogonal basis (e.g., DCT-based designs) can reduce PAPR while maintaining an OTFS-like bit error rate (BER), although the prevailing explanation mainly attributes reliability to constant-modulus unitary transforms and does not directly justify such non-constant-modulus cases. As a result, it remains unclear which unitary bases preserve the channel-hardening behavior that stabilizes effective gains and protects BER, and which unitary choices may degrade performance even though they are mathematically unitary. The objective of this paper is to close this gap by establishing a verifiable and more general condition that characterizes BER-robust unitary precoding, and by developing a waveform/precoder design approach that suppresses PAPR without sacrificing reliability for OTFS and typical OTFS-like variants.  Methods  A nonzero-unitary precoding based waveform design framework is established. An upper-bound characterization of the effective channel-gain fluctuation is derived, and it is shown that, when the precoder satisfies a nonzero and near-uniform energy-spreading condition, the variance of the effective channel coefficients decreases with the growth of the time–frequency grid, which indicates the emergence of a channel-hardening effect. Motivated by this result, the waveform design is formulated as a peak-power minimization problem over the unitary precoder, where the objective is to reduce the maximum instantaneous power while preserving the unitary structure required by the modulation framework. A CVX-based solver is employed to provide a performance reference benchmark for the formulated objective. 
For engineering implementation, an efficient algorithm based on the Alternating Direction Method of Multipliers (ADMM) is developed, in which the original nonconvex design is decomposed into low-cost sub-updates together with a unitary projection step, enabling scalable computation.  Results and Discussions  Simulation results under representative doubly selective channels with high terminal speeds indicate that the proposed precoder design achieves noticeable PAPR suppression while keeping the BER close to that of conventional constant-modulus unitary precoding. In addition, the CVX-based benchmark is used to reveal the attainable performance region, and the ADMM-based implementation is shown to approach this reference with a favorable PAPR–BER trade-off. The computational advantage is also validated: compared with general-purpose convex optimization, the ADMM solver reduces the overall runtime/complexity by roughly three orders of magnitude for typical OTFS parameter settings, which supports real-time or near-real-time deployment. The observed performance trends are consistent with the theoretical insight that near-uniform energy spreading stabilizes effective channel gains and prevents “spiky” basis vectors from degrading robustness. Furthermore, the framework is applicable to OTFS variants, since basis selection and waveform shaping can be equivalently interpreted as unitary-precoder design within the same optimization architecture.  Conclusions  A theoretical and algorithmic solution for PAPR suppression in OTFS systems is presented via nonzero-unitary precoding. Channel hardening is established under a nonzero and near-uniform energy-spreading condition, providing a principled justification for searching low-PAPR solutions beyond constant-modulus transforms. A peak-power minimization formulation is adopted to translate this insight into waveform optimization, and a CVX benchmark is provided to quantify the achievable performance reference. 
A low-complexity ADMM algorithm is then constructed to deliver scalable computation through simple sub-updates and unitary projection, while keeping BER performance essentially unchanged. The proposed approach offers a unified low-PAPR waveform design paradigm for OTFS and its variants, featuring theoretical generality, computational efficiency, and controllable performance under high-mobility doubly selective channels.
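Two building blocks named above can be sketched directly: the PAPR of a precoded OTFS frame, and the projection onto unitary matrices that an ADMM iteration would use. The sketch assumes rectangular pulses and no cyclic prefix, so the transmit samples are an N-point IDFT along the Doppler axis of each delay row, and a precoder acting on the column-stacked delay-Doppler grid. It does not implement the paper's peak-power objective; the DFT-along-Doppler precoder is chosen only because its PAPR is easy to verify, and says nothing about BER or channel hardening.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a sampled baseband signal, in dB."""
    p = np.abs(x) ** 2
    return 10.0 * np.log10(p.max() / p.mean())

def otfs_tx(dd_symbols, precoder):
    """Precode a vectorised M x N delay-Doppler grid, then form time samples.

    Assumes rectangular pulses and no cyclic prefix: the time samples are an
    N-point IDFT along the Doppler axis of each delay row, column-stacked.
    """
    M, N = dd_symbols.shape
    d = precoder @ dd_symbols.reshape(-1, order="F")   # column-stacked vec()
    D = d.reshape((M, N), order="F")
    S = np.fft.ifft(D, axis=1, norm="ortho")
    return S.reshape(-1, order="F")

def nearest_unitary(A):
    """Frobenius-nearest unitary matrix: the polar factor U @ Vh of A = U S Vh."""
    U, _, Vh = np.linalg.svd(A)
    return U @ Vh

rng = np.random.default_rng(1)
M, N = 8, 16
qpsk = (rng.choice([-1.0, 1.0], (M, N)) + 1j * rng.choice([-1.0, 1.0], (M, N))) / np.sqrt(2)

plain = otfs_tx(qpsk, np.eye(M * N))               # no precoding
F_N = np.fft.fft(np.eye(N), norm="ortho")          # unitary, constant-modulus DFT
dft_pre = np.kron(F_N, np.eye(M))                  # applies F_N along the Doppler axis
print(f"plain: {papr_db(plain):.2f} dB, "
      f"DFT-precoded: {papr_db(otfs_tx(qpsk, dft_pre)):.2f} dB")
```

In an ADMM loop, `nearest_unitary` would play the role of the projection step applied after an unconstrained update of the precoder; the peak-power sub-problem itself is left out here.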
Research on Ultrasound Imaging Algorithm Fused with Diffusion Model
YUAN Ye, HUANG Minshang, YANG Weifeng
Available online  , doi: 10.11999/JEIT251083
Abstract:
  Objective   Medical ultrasound imaging, which utilizes ultrasonic waves to probe human tissues and generates images via signal processing of the returning echoes, has become a vital clinical diagnostic tool due to its non-invasive, safe, and real-time nature. However, conventional ultrasound imaging is fundamentally limited by factors such as the finite width of ultrasonic pulses, variations in tissue acoustic impedance, and the complexity of echo signals, leading to pervasive challenges including insufficient spatial resolution, significant speckle noise, and off-axis artifacts. These limitations directly impair the detection of lesions and diagnostic accuracy. While traditional approaches focusing on hardware optimization and signal processing algorithms like adaptive beamforming have made incremental improvements, they are often constrained by physical laws, computational complexity, and reliance on manual parameter tuning. Recent deep learning-based methods, particularly those using generative adversarial networks (GANs), offer promising results but suffer from training instability and poor interpretability. The emerging diffusion model, a state-of-the-art generative paradigm, has demonstrated superior robustness and generalization in computed tomography (CT) and magnetic resonance imaging (MRI) reconstruction, yet its application in ultrasound imaging remains largely unexplored. This study aims to fill this critical gap by developing a novel diffusion model-based framework for high-quality ultrasound image formation, seeking to overcome the inherent limitations of existing methods and provide a stable, efficient, and interpretable solution for enhancing ultrasound image quality.  Methods   This research proposes a novel ultrasound imaging method based on a denoising diffusion probabilistic model (DDPM). 
The core of our approach is a multi-scale diffusion network architecture designed to progressively refine a low-quality ultrasound image (e.g., one formed by a simple Delay-and-Sum, DAS, beamformer) into a high-quality image. The process consists of a forward and a reverse process. In the forward process, Gaussian noise is gradually added to a high-quality ground-truth image over a series of timesteps. The reverse process is trained to learn the conditional denoising function. Our custom-designed denoising network takes a low-resolution DAS image as a conditional input and fuses it with the noisy image at each denoising step through residual connections and feature-wise transformations at multiple scales. This deep fusion mechanism allows the network to effectively incorporate the underlying anatomical structure from the low-quality input while iteratively removing noise and artifacts through the diffusion process. The model was trained using a dataset of paired low-quality and high-quality ultrasound images, with the high-quality images serving as the training target. The training objective was to maximize the variational lower bound on the likelihood, effectively teaching the network to reverse the noising process. The performance of the proposed method was quantitatively evaluated against traditional DAS, minimum variance (MV) beamforming, and a leading GAN-based super-resolution method using metrics including Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM).  Results and Discussions   The proposed diffusion model demonstrated superior performance in enhancing ultrasound image quality. Quantitatively, our method achieved a mean PSNR of 35.2 dB and an SSIM of 0.933, representing a significant improvement of 4.5 dB in PSNR while maintaining exceptional structural fidelity compared to conventional beamforming approaches. 
The method also consistently outperformed adaptive minimum variance beamforming and GAN-based approaches across all evaluation metrics, including contrast-to-noise ratio. Visual assessment confirms these quantitative findings. The generated images exhibit markedly reduced speckle noise and significantly enhanced boundary clarity of anatomical structures. Critically, these improvements were achieved without introducing the blurring or artificial textures commonly observed in other deep learning-based methods. The multi-scale architecture with conditional feature injection effectively preserved structural integrity, as evidenced by the clear and continuous edges in the output. The progressive denoising nature of our approach provides inherent interpretability to the image refinement process. Unlike the opaque single-step generation of other deep learning models, our method offers transparent, step-wise enhancement from initial input to final output. Furthermore, the training process remained stable and convergent, avoiding the instability issues that frequently plague adversarial training methods. Ablation studies confirmed the critical importance of the deep fusion mechanism, while resolution analysis verified substantial improvements in both lateral and axial resolution compared with all baseline methods.  Conclusions   This study successfully developed and validated a novel ultrasound imaging method based on a diffusion model. The proposed framework effectively addresses key limitations of conventional and existing deep learning-based approaches. It bypasses the complex matrix computations and manual parameter tuning required by adaptive beamformers and offers a more stable training paradigm than GANs. The results demonstrate that the method can significantly enhance image quality by substantially improving the PSNR and maintaining excellent structural similarity, leading to images with suppressed noise, reduced artifacts, and improved resolution. 
The multi-scale diffusion process ensures the preservation of anatomical structures while providing a degree of interpretability to the image generation process. This work establishes diffusion models as a powerful and promising new paradigm for advanced ultrasound imaging, offering a robust and high-performance technical pathway to break through the current bottlenecks in ultrasound image quality, with potential for broad clinical impact.
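The forward (noising) process and the training target described in the Methods can be written in a few lines. This sketch assumes the common linear beta schedule and the simplified epsilon-prediction loss, which is a standard reweighting of the variational bound; the paper's exact schedule, loss weighting, and multi-scale network are not given, so `eps_model` is a hypothetical conditional denoiser that receives the DAS image as its conditioning input. PSNR is included because it is the main reported metric.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (a common DDPM default, assumed here)."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, rng):
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def training_loss(eps_model, x0, das_img, alpha_bar, rng):
    """Simplified epsilon-prediction loss for one (high-quality, DAS) image pair."""
    t = int(rng.integers(0, len(alpha_bar)))
    xt, eps = q_sample(x0, t, alpha_bar, rng)
    return float(np.mean((eps_model(xt, t, das_img) - eps) ** 2))

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def zero_model(xt, t, cond):
    """Trivial placeholder denoiser that always predicts zero noise."""
    return np.zeros_like(xt)

T = 1000
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)               # cumulative signal retention
rng = np.random.default_rng(0)
x0 = rng.random((64, 64))                         # stand-in for a high-quality target
das = x0 + 0.1 * rng.standard_normal(x0.shape)    # stand-in for a DAS input
print(training_loss(zero_model, x0, das, alpha_bar, rng))
```

A trained `eps_model` would replace `zero_model`; sampling then runs the learned reverse process from pure noise while conditioning every step on the same DAS image.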
DGCN-MFW: A Lightweight Human Action Recognition Network for Millimeter-Wave Radar 3D Point Clouds
DING Xuanyu, JIN Biao, ZHANG Zhenkai
Available online  , doi: 10.11999/JEIT251087
Abstract:
  Objective  Millimeter-wave radar 3D point clouds offer key spatial cues for human action recognition, but their unordered nature complicates feature extraction, and because actions depend on multi-frame temporal correlations, single-frame analysis is error-prone. This paper proposes a dynamic graph convolutional network, tailored to long 3D point-cloud sequences, that fuses multi-scale features, adaptively weights frames, and uses cross-attention to improve recognition performance and efficiency.  Methods  The proposed network (DGCN-MFW) has three core components: dynamic graph convolution feature extraction, multi-scale feature fusion, and adaptive temporal frame weighting. Step 1: Dynamic graph convolution automatically builds the spatial geometry via local directed neighborhood graphs and updates the neighborhoods online, avoiding manual graph construction and improving feature robustness. Step 2: Multi-scale feature fusion jointly extracts and integrates point-cloud features across space and time, capturing local details and global semantics. Step 3: Adaptive frame weighting learns per-frame importance, highlighting discriminative key frames and suppressing noisy or unimportant frames; cross-attention enables information exchange between the center frame and its context, compensating for single-frame deficits caused by motion blur, occlusion, or pose ambiguity.  Results and Discussions  The proposed network extracts features via dynamic graph convolution, then performs multi-scale feature fusion and adaptive frame weighting, and finally recognizes human actions. It performs well on the public TI and Vayyar millimeter-wave radar point cloud datasets, with only 2.06M parameters and 4.51 GFLOPs of computation, outperforming existing methods (Tables 2, 3, 4). Ablation experiments show that both core modules significantly improve recognition accuracy (Table 1). 
Confusion matrices show accuracy above 99% for most actions on both datasets (Figs. 10 and 11). Nevertheless, the model's scale, parameter count, and efficiency in large-scale data processing still need improvement, and future work will focus on model lightweighting and architectural optimization to enhance efficiency.  Conclusions  To address the two major challenges in mmWave radar 3D point-cloud human action recognition, this paper proposes an action recognition algorithm based on a dynamic graph convolutional network and multi-feature fusion. It uses a multi-scale feature fusion module and cross-scale interaction to extract local and global features, improving spatial representation. An adaptive frame-weighting module and a cross-attention mechanism capture the temporal evolution of actions. The method achieves 98.32% and 99.48% accuracy on the two datasets with 2.06M parameters and 4.51 GFLOPs, outperforming mainstream models. It provides a new solution for high-precision, low-resource mmWave radar action recognition, suitable for real-time scenarios such as industrial human–machine interaction, intelligent security, and healthcare.
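Step 1 (dynamic graph convolution with online neighbourhood updates) can be illustrated with an EdgeConv-style layer of the kind popularized by DGCNN: build a k-nearest-neighbour graph in the current feature space, form edge features `[x_i, x_j - x_i]`, apply a shared transform, and max-pool over neighbours. This is a sketch under assumptions: a single random linear layer with ReLU stands in for the learned MLP, only x, y, z coordinates are used, and the multi-scale fusion and frame-weighting modules are not shown.

```python
import numpy as np

def knn_indices(feats, k):
    """Indices of the k nearest neighbours of every point (self excluded)."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def edge_conv(feats, weight, k):
    """EdgeConv-style layer: directed edge features, shared linear map + ReLU,
    then max over each point's neighbourhood."""
    idx = knn_indices(feats, k)                              # graph rebuilt from current features
    neighbors = feats[idx]                                   # (P, k, C)
    center = np.repeat(feats[:, None, :], k, axis=1)         # (P, k, C)
    edge = np.concatenate([center, neighbors - center], axis=-1)   # (P, k, 2C)
    h = np.maximum(edge @ weight, 0.0)                       # shared transform
    return h.max(axis=1)                                     # (P, C_out)

rng = np.random.default_rng(0)
points = rng.standard_normal((128, 3))        # one hypothetical radar frame: x, y, z
W1 = rng.standard_normal((6, 32)) * 0.1       # random stand-ins for learned weights
W2 = rng.standard_normal((64, 64)) * 0.1
f1 = edge_conv(points, W1, k=16)
f2 = edge_conv(f1, W2, k=16)                  # neighbourhoods recomputed in 32-D feature space
print(f2.shape)  # (128, 64)
```

Because the second layer searches neighbours in feature space rather than coordinate space, points that move alike can become neighbours even when they are spatially distant, which is what "dynamic" refers to.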
Construction and Scene Classification Research of Entropy-Driven Adaptive Fusion Networks for High-Resolution Remote Sensing Images
SONG Wanying, LIU Yuchen, WANG Jie, WANG Anyi
Available online  , doi: 10.11999/JEIT251147
Abstract:
  Objective  Remote sensing image scene classification aims to assign semantic labels to aerial or satellite imagery. With the rapid development of earth observation technologies, high-resolution remote sensing images contain abundant details but also pose significant challenges, including complex spatial structures, large scale variations, high intra-class variance, and strong inter-class similarity. Traditional Convolutional Neural Networks (CNNs) achieve notable success in local spatial modeling but struggle to model long-range dependencies adequately because of their fixed receptive fields. To overcome this, CNN-Transformer hybrid architectures have been proposed to balance local details and global semantics. However, such models typically use simple concatenation to fuse multi-scale features, which introduces redundancy and weakens discriminability. Furthermore, although the Swin Transformer uses window-based self-attention to capture contextual information, it has clear limitations when processing complex high-resolution images. Specifically, cross-window long-range dependency modeling is restricted by the fixed window size. The extraction of fine-grained local features is also limited, as deep networks tend to ignore the fine texture cues supplied by low- and mid-level features. Moreover, existing multi-level feature fusion strategies lack semantic guidance and easily introduce background noise. Therefore, constructing a network that balances global contextual modeling with local discriminability while realizing adaptive fusion remains a critical problem.  Methods  To address the limitations of cross-window interaction and the lack of semantic guidance during multi-level feature fusion, an Entropy-driven Adaptive Fusion Swin Transformer Network (E-AF-ST) is proposed. 
The architecture utilizes a lightweight Swin-Tiny backbone and embeds two key innovative modules: the Attention-guided Region Selection and Feature Optimization Module (ASO) and the Entropy-driven Gated Fusion Module (EGF) (Fig. 1). The ASO module resolves the weak cross-window interaction and insufficient fine-grained feature extraction of the Swin Transformer through three consecutive stages (Fig. 2a). First, a cross-window sparse attention computation eliminates physical window boundaries. By expanding the patch partition size, sparse attention is applied across the entire image sequence, capturing global contextual correlations spanning the whole image. Second, dynamic region selection is executed. Based on a pixel-level entropy measurement, a Multilayer Perceptron maps entropy features into attention scores, and a Top-K masking strategy dynamically screens the most informative discriminative regions. Third, feature recursive optimization applies multi-head self-attention and layer normalization at the local scale to progressively enhance boundaries and micro-structural information. Subsequently, the EGF module integrates the Swin Transformer output features, the globally enhanced context features, and the locally optimized features to mitigate semantic discrepancies (Fig. 2b). Initially, energy normalization is conducted using the Frobenius norm to obtain a probabilized energy distribution. Then, an entropy-driven gated fusion mechanism computes the Shannon entropy for each branch. A learnable soft-normalization gating function maps the entropy information into normalized fusion weights, automatically reducing the weight of branches exhibiting high entropy due to cluttered backgrounds. Finally, the fused representations undergo lightweight recursive optimization utilizing depth-wise separable convolutions and GELU activation functions with residual connections to suppress redundant information. 
The forward propagation process is summarized in Algorithm 1.  Results and Discussions  To validate the discriminative capability of the proposed network, extensive experiments were conducted on two widely adopted public datasets: AID and NWPU-RESISC45. The proposed E-AF-ST network demonstrates superior classification performance compared with existing advanced methods (Table 1). On the AID dataset, the model achieves state-of-the-art overall accuracies of 95.56% and 97.21% under 20% and 50% training ratios, respectively. On the challenging NWPU-RESISC45 dataset, it achieves the highest accuracies, 92.45% and 94.59%, under 10% and 20% training ratios, respectively. The confusion matrices reveal that the recognition accuracy for most categories exceeds 95% (Fig. 7), and misclassification proportions in classes with complex backgrounds are significantly lower than those of the baseline model (Fig. 8). Grad-CAM visualization confirms the advantages of the E-AF-ST network in global contextual modeling and critical region screening. Compared with the Swin-Tiny baseline, the proposed network focuses more precisely on semantically relevant regions (Fig. 10). In "airport" and "port" scenes, the model suppresses background noise and accurately highlights key targets. In structurally complex scenes such as "viaducts" and "railway stations", it comprehensively captures extension directions and textures. Ablation experiments confirm that the cross-window sparse attention in the ASO module and the dynamic weight allocation in the EGF module are highly complementary. Furthermore, the E-AF-ST network achieves this performance with a minimal parameter increase, totaling only 30.45M parameters and 4.72G FLOPs.  
Conclusions  This paper proposes an Entropy-driven Adaptive Fusion Swin Transformer Network (E-AF-ST) to tackle insufficient local discriminative information extraction, cross-scale feature inconsistency, and semantic redundancy in high-resolution remote sensing image scene classification. By introducing information entropy as a guiding metric, the ASO module achieves precise screening and recursive optimization of discriminative regions, while the EGF module realizes adaptive, redundancy-free integration of multi-source features. Experimental and visual results demonstrate that the proposed method effectively overcomes complex background interference, outperforming existing mainstream CNN and Transformer hybrid architectures. This work provides a novel theoretical perspective and technical pathway for addressing multi-scale target perception and feature semantic alignment.
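The EGF idea (energy normalization by the Frobenius norm, Shannon entropy per branch, lower fusion weight for higher entropy) can be sketched as follows. This is an illustration under assumptions: a fixed-temperature softmax over negative entropies replaces the paper's learnable soft-normalization gate, `tau` is a hypothetical hyperparameter, and the subsequent depth-wise separable refinement is omitted.

```python
import numpy as np

def energy_distribution(x, eps=1e-12):
    """Squared activations divided by the squared Frobenius norm (sums to ~1)."""
    e = np.square(x)
    return e / (e.sum() + eps)

def shannon_entropy(p, eps=1e-12):
    return float(-(p * np.log(p + eps)).sum())

def entropy_gated_fusion(branches, tau=1.0):
    """Fuse same-shaped feature maps with weights that fall as entropy rises.

    The fixed softmax over -H / tau stands in for the learnable gate.
    """
    H = np.array([shannon_entropy(energy_distribution(b)) for b in branches])
    logits = -H / tau
    w = np.exp(logits - logits.max())            # numerically stable softmax
    w /= w.sum()
    fused = sum(wi * b for wi, b in zip(w, branches))
    return fused, w

rng = np.random.default_rng(0)
focused = np.zeros((8, 8))
focused[3, 4] = 5.0                              # energy concentrated on one target
cluttered = rng.standard_normal((8, 8))          # energy spread over background
fused, weights = entropy_gated_fusion([focused, cluttered])
print(weights)  # the focused branch receives the larger weight
```

A low-entropy branch concentrates its energy on a few positions, which is treated here as a proxy for a clean, target-focused response; a cluttered background spreads energy out and is down-weighted.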
Multi-dimensional Resource Joint Optimization Algorithm for UAV Inspection of Collaborative Tasks of Perception and AI
LI Shiyang, ZHU Xiaorong
Available online  , doi: 10.11999/JEIT251284
Abstract:
  Objective  With the increasing demand for aerial activities, the operational capabilities of various aircraft are gradually expanding to all airspace and multiple industries. UAV applications now span altitude layers from low to high and platform sizes from micro to large, and UAVs are widely used in public safety, transportation, emergency management, logistics distribution, geographic surveying and mapping, and other fields, continuously driving innovation in production and daily life. Compared with traditional manual inspection, UAV inspection, as an emerging business, can capture image information that is difficult for the human eye to obtain, which not only significantly reduces labor costs but also improves the accuracy and efficiency of inspection operations. However, UAV inspection also poses new challenges for multi-dimensional resource allocation and task scheduling. Taking power system inspection as an example, transmission lines are exposed outdoors for long periods, are prone to corrosion, aging, and even damage, and rely on regular inspections to ensure operational safety.  Methods  The four-stage multi-dimensional resource inspection and scheduling collaborative optimization algorithm decomposes the original optimization problem into four sub-problems based on the inspection process. After mathematical analysis of each sub-problem, a corresponding solution method is proposed: a dual-aided MILP transformation method for the node selection problem; a data-driven boundary learning method for the UAV data acquisition problem; a bandwidth-power joint optimization algorithm based on successive convex approximation (SCA) for the UAV communication resource allocation problem; and a lower-bound analytical allocation method for the node computing power allocation problem. 
Finally, the original problem is solved by alternating optimization over the sub-problems, forming the complete algorithm.  Results and Discussions  Simulation results show that the proposed algorithm reduces the overall energy consumption of the UAV compared with the comparative algorithms. Simulation training is conducted on visual positioning and fault detection services to investigate how the accuracy of both services depends on compression ratio and data volume. Figures 2-5 show that fault detection accuracy is optimal at 60% data volume and a 60% compression ratio, whereas visual positioning accuracy is optimal at 80% data volume and an 80% compression ratio. Figure 6 shows that the proposed algorithm outperforms the comparative algorithms in accuracy for AI services. As shown in Figures 7 and 8, as bandwidth, computing power, and other resources vary, the proposed algorithm consistently achieves lower energy consumption than the comparative algorithms.  Conclusions  This paper proposes a multi-dimensional resource joint optimization algorithm for intelligent UAV inspection, focusing on the collaborative optimization of perception and AI. The optimization problem minimizes UAV energy consumption, with bandwidth, power, computing power, node selection, data volume, and actual compression ratio as variables. The algorithm minimizes UAV energy consumption while serving two AI services, fault detection and visual localization. Simulation results show that the algorithm can reduce the total energy consumption of the UAV and improve the accuracy of model training. This research focuses on single-UAV inspection; future research can explore more complex multi-UAV collaborative inspection scenarios and incorporate more services.
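To show why power, bandwidth, data volume, and compression ratio interact in the energy objective, the sketch below uses a generic textbook uplink model, not the paper's formulation: transmit energy equals power times the airtime needed at the Shannon rate. Under this model the energy grows with transmit power, so with a fixed bandwidth and a delivery deadline the energy-minimizing power is the smallest one that meets the deadline, which has a closed form. All numeric values are hypothetical.

```python
import math

def shannon_rate(power_w, bandwidth_hz, gain, noise_psd):
    """Achievable uplink rate in bit/s for transmit power power_w."""
    snr = power_w * gain / (noise_psd * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)

def upload_energy(bits, power_w, bandwidth_hz, gain, noise_psd):
    """Transmit energy = power x airtime needed to send `bits`."""
    return power_w * bits / shannon_rate(power_w, bandwidth_hz, gain, noise_psd)

def min_power_for_deadline(bits, deadline_s, bandwidth_hz, gain, noise_psd):
    """Smallest power whose Shannon rate delivers `bits` within deadline_s.

    Energy increases with power in this model, so this is also the
    energy-minimising power for a fixed bandwidth and deadline.
    """
    spectral_eff = bits / (deadline_s * bandwidth_hz)
    return (2.0 ** spectral_eff - 1.0) * noise_psd * bandwidth_hz / gain

# Hypothetical numbers: 16 Mbit raw image batch, 60% kept after compression,
# 1 MHz bandwidth, -60 dB channel gain, -174 dBm/Hz noise density, 2 s deadline.
BANDWIDTH, GAIN, NOISE_PSD, DEADLINE = 1e6, 1e-6, 10 ** (-17.4) / 1000, 2.0
bits = 16e6 * 0.6
p_min = min_power_for_deadline(bits, DEADLINE, BANDWIDTH, GAIN, NOISE_PSD)
print(p_min, upload_energy(bits, p_min, BANDWIDTH, GAIN, NOISE_PSD))
```

Lowering the kept data fraction shrinks `bits` and therefore the required power and energy, which is the lever that trades energy against AI-service accuracy in the results above.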
Reconfigurable Intelligent Surface Assisted Key Generation Resistant to Signal Injection Attacks
YANG Lijun, WANG Haomin, ZHU Tiancheng, WU Meng
Available online  , doi: 10.11999/JEIT251281
Abstract:
  Objective  This study examines the potential threat of signal injection attacks to Physical Layer Key Generation (PLKG) in Reconfigurable Intelligent Surface (RIS)-assisted wireless systems. The threat is especially pronounced in quasi-static channels, where the channel state remains highly correlated across multiple probing rounds. From both attack and defense perspectives, the study clarifies how spatial correlation between RIS reflection channels and eavesdropping channels can be exploited to improve key inference. A channel-randomization mechanism is designed that uses the controllability of RIS to suppress key leakage, reduce the eavesdropper’s key capacity, and improve the security of RIS-assisted PLKG in future 6G scenarios. Quantitative analysis further examines the relationships among injection power, Signal-to-Noise Ratio (SNR), and spatial correlation. These results provide reference guidance for robust RIS configuration and secure system design.  Methods  An RIS-assisted Time-Division Duplex (TDD) system is considered. Single-antenna Alice and Bob generate symmetric keys from a reciprocal channel, whereas a two-antenna active eavesdropper, Eve, injects signals using previously observed Channel State Information (CSI) (Fig. 1). The links follow quasi-static Rayleigh block fading. CSI for Alice, Bob, and Eve is defined for each time slot within a coherence interval. A conventional injection attack is first modeled. Eve estimates the eavesdropping channel in one slot, precodes an injected waveform, and contaminates the subsequent probing at Alice and Bob, partially steering their key source. A joint key inference strategy is then proposed. This strategy exploits the spatial correlation between RIS reflection channels and eavesdropping channels, as well as the common RIS-induced subchannel shared by legitimate and eavesdropping links (Table 1). As a defense, a channel-randomization PLKG scheme is proposed. 
Alice randomly reconfigures the RIS coefficients at each probing round. As a result, the effective channels of Alice-Bob, Alice-Eve, and Bob-Eve vary independently across rounds, whereas Alice-Bob reciprocity within a single round is preserved. Injection signals precoded with outdated CSI therefore appear as uncorrelated interference at the legitimate nodes. Secret-key capacities are derived from mutual-information-based bounds. The eavesdropper’s Key Recovery Rate (KRR) is defined for performance evaluation. The theoretical results are validated through MATLAB Monte Carlo simulations with 10,000 trials using an information-theoretic estimator toolbox. The simulations examine different SNR levels, injection power values, and spatial correlation conditions (Figs. 2~5, Table 2).  Results and Discussions  Analysis of the conventional injection attack without RIS defense shows that at high SNR, Alice and Bob observe nearly identical channels owing to reciprocity. Eve’s estimate, derived from injected signals, follows a similar trend but shows noticeable mismatch (Fig. 2). Eve can therefore recover some key bits, but errors remain and the KRR stays moderate. When the proposed joint key inference strategy is applied, Eve’s reconstructed channel more closely matches the legitimate response (Fig. 3). This effect arises because, in RIS-assisted PLKG, the legitimate and eavesdropping links share an RIS-induced subchannel. The resulting spatial correlation provides exploitable information beyond the known injected signal. Consequently, Eve’s key capacity and KRR increase significantly, which indicates a stronger RIS-specific security threat. At fixed SNR (Fig. 4), Eve’s key capacity without defense increases rapidly with injection power and may approach or exceed the legitimate key capacity. 
Under RIS randomization, the legitimate capacity decreases slightly, whereas Eve’s capacity remains small and nearly constant. This result indicates that randomization converts structured injection signals into noise. Spatial-correlation analysis in Fig. 5 shows that Eve’s capacity without defense increases rapidly and becomes critical as the correlation approaches one. In contrast, under RIS randomization the increase is gradual, and the capacity may remain near zero at moderate correlation levels. Table 2 confirms these trends in terms of KRR. The KRR is about 50% without correlation and injection. It increases to about 62.5% when injection is applied but spatial correlation is zero, whereas the defense keeps the value close to random guessing. When spatial correlation and injection power are higher, the KRR exceeds 80%. The proposed defense reduces this value to approximately 57%~66%.  Conclusions  This study examines the dual role of RIS in PLKG security. RIS can increase vulnerability but can also serve as an effective defensive mechanism. By exploiting the correlation between RIS reflection channels and eavesdropping channels, a joint key inference attack is developed that increases the eavesdropper’s key capacity and recovery rate compared with conventional injection attacks. This result reveals a new attack vector in RIS-assisted systems. A channel-randomization PLKG scheme is then proposed by exploiting the dynamic controllability of RIS. The scheme shortens the effective coherence time to a single probing round and decorrelates successive channel realizations from the attacker’s perspective. Theoretical analysis and Monte Carlo simulations show that the proposed scheme converts malicious injection signals into uncorrelated interference, reduces the eavesdropping key capacity, and pushes the eavesdropper’s KRR close to random guessing. This protection remains effective even under high SNR, strong spatial correlation, and high injection power. 
The scheme achieves these security improvements with low hardware overhead compared with reconfigurable antenna-based solutions, because RIS devices are expected to serve as infrastructure elements in future 6G networks. The results provide guidance for the secure design of RIS-assisted PLKG systems and suggest that the controllable characteristics of RIS should be used for both performance improvement and security protection.
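The KRR trend above can be reproduced qualitatively with a toy Monte Carlo model. The sketch below is a simplification under stated assumptions, not the authors' MATLAB simulator: channels are real-valued Gaussian, key bits come from one-bit sign quantization, signal injection is not modeled, and Eve's observation is correlated with the channel she previously saw through a hypothetical coefficient `rho`. In the quasi-static case the key is drawn from that same channel; with RIS randomization, Alice's reconfiguration yields an independent realization each round. For sign quantization of jointly Gaussian variables the agreement probability is 1/2 + arcsin(r)/π, which `theory_krr` uses as a cross-check.

```python
import math
import random


def simulated_krr(rho, snr_db, randomize, trials=20000, seed=0):
    """Fraction of key bits Eve guesses correctly (toy sign-quantization model)."""
    rng = random.Random(seed)
    sigma = 10 ** (-snr_db / 20)  # noise std relative to a unit-power channel
    hits = 0
    for _ in range(trials):
        h_seen = rng.gauss(0.0, 1.0)  # channel in the slot Eve exploited
        # Quasi-static: the key comes from the same channel.
        # RIS randomization (assumed ideal): an independent channel each probing round.
        h_key = rng.gauss(0.0, 1.0) if randomize else h_seen
        # Eve's observation, spatially correlated with the channel she saw
        eve = rho * h_seen + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        alice = h_key + sigma * rng.gauss(0.0, 1.0)  # Alice's noisy probe
        hits += (alice > 0) == (eve > 0)  # one-bit quantization by sign
    return hits / trials


def theory_krr(rho, snr_db):
    """Closed form for the quasi-static case: P(same sign) = 1/2 + asin(r)/pi."""
    sigma = 10 ** (-snr_db / 20)
    r = rho / math.sqrt(1.0 + sigma ** 2)  # correlation of Eve's and Alice's samples
    return 0.5 + math.asin(r) / math.pi


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(rho, simulated_krr(rho, 20, False), simulated_krr(rho, 20, True))
```

With `rho = 0.9` at 20 dB, `theory_krr` gives roughly 0.85 for the quasi-static case, whereas the randomized case stays near 0.5. The absolute values are not comparable with Table 2; only the direction of the effect is illustrated.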
Genetic-algorithm-optimized All-metal Metasurface for Cross-band Stealth via Low-cost Computer Numerical Control Fabrication
ZHANG Ming, ZHANG Najiao, LI Jialei, LI Kang, Vazgen MELIKYAN, YANG Lin, HOU Weimin
Available online  , doi: 10.11999/JEIT251080
Abstract:
  Objective  Traditional electromagnetic stealth materials face the practical challenge of achieving both microwave absorption and infrared stealth. Conventional solutions, including geometric optimization and multilayer composite coatings, often suffer from narrow bandwidth, complex fabrication, and limited cross-band compatibility. This study proposes a genetic-algorithm-optimized all-metal random coding metasurface that enables concurrent broadband Radar Cross Section (RCS) reduction and low infrared emissivity on a monolithic metallic platform, thereby addressing these limitations.  Methods  Monolithic all-metal C-shaped resonant units are employed. The design is based on the Pancharatnam-Berry geometric phase, in which the reflection phase is controlled by the rotation angle of the unit. Coding schemes of 2-bit, 3-bit, and 4-bit are implemented, corresponding to 4, 8, and 16 discrete phase states. A MATLAB-CST co-simulation framework is established: CST extracts unit responses using the Finite Element Method (FEM), whereas MATLAB applies a genetic algorithm to optimize the phase distribution for scattering energy diffusion. All-metal metasurface prototypes (150×150 mm², 10×10 array) are fabricated by Computer Numerical Control (CNC) cutting.  Results and Discussions  Genetic algorithm optimization converges within 6~8 generations. Increasing the number of coding bits enhances phase randomness. The 4-bit metasurface achieves an average 10 dB RCS reduction over 11~18.4 GHz. Simulation results agree with anechoic chamber measurements for oblique incidence angles from 0° to 60°. Infrared imaging confirms the low emissivity of the metallic surface. Compared with conventional composite or multilayer structures, the all-metal design simplifies fabrication, avoids interfacial mismatch, and improves structural stability. The metasurface demonstrates broadband, wide-angle, and cross-band stealth performance.  
Conclusions  This study presents a genetic-algorithm-optimized all-metal random coding metasurface that achieves cross-band stealth compatibility. The design addresses the persistent challenge of combining microwave performance with thermal management in conventional stealth materials. Three main technical contributions are demonstrated. (1) The monolithic copper structure provides greater than 99.9% infrared reflectivity in the 8~14 μm band, verified by FLIR imaging, and achieves an average 10 dB RCS reduction over 11~18.4 GHz. (2) The single-material configuration removes the risk of delamination. The CNC-fabricated prototype maintains structural integrity under 60° oblique incidence and reduces fabrication cost by approximately 78% compared with lithographic processing. (3) The co-simulation optimization framework converges within eight generations for 4-bit coding, enabling broadband scattering manipulation over a 7.4 GHz bandwidth. The proposed metasurface combines fabrication reliability, cost efficiency, and dual-band stealth capability. These characteristics provide a practical basis for large-scale deployment in military stealth systems and satellite platforms that require multispectral concealment and long-term structural durability.
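The genetic-algorithm step can be illustrated with a small pure-Python sketch. It is not the MATLAB-CST framework: instead of CST-extracted unit responses, every cell is assumed to be an ideal reflector whose phase follows the idealized Pancharatnam-Berry rule (reflection phase equals twice the rotation angle), the far field is approximated by an array factor sampled on a coarse grid of direction cosines with an assumed half-wavelength period, and the fitness is the reduction of the strongest lobe relative to a metal plate of the same size. Population size, mutation rate, tournament size, and grid are illustrative choices.

```python
import cmath
import math
import random

BITS = 4                 # 4-bit coding: 16 phase states
STATES = 2 ** BITS
N = 10                   # 10 x 10 array, as in the prototype
# Idealized Pancharatnam-Berry rule: rotating a unit by theta adds a 2*theta reflection phase.
ROTATIONS = [s * math.pi / STATES for s in range(STATES)]
PHASORS = [cmath.exp(1j * 2 * theta) for theta in ROTATIONS]

# Coarse sampling of the visible region in direction cosines (u, v); half-wavelength period assumed.
GRID = [i * 0.15 for i in range(-6, 7)]
DIRS = [(u, v) for u in GRID for v in GRID if u * u + v * v <= 1.0]
STEER = [[cmath.exp(1j * math.pi * (m * u + n * v)) for m in range(N) for n in range(N)]
         for u, v in DIRS]


def peak_reduction_db(code):
    """Reduction (dB) of the strongest scattered lobe relative to a same-size metal plate."""
    phasors = [PHASORS[s] for s in code]
    peak = max(abs(sum(s * p for s, p in zip(row, phasors))) ** 2 for row in STEER)
    return -10 * math.log10(peak / (N * N) ** 2)


def genetic_search(pop_size=16, generations=12, p_mut=0.02, seed=1):
    """Elitist GA over N*N coding states; returns the best coding and per-generation best fitness."""
    rng = random.Random(seed)
    pop = [[rng.randrange(STATES) for _ in range(N * N)] for _ in range(pop_size)]
    scored = sorted(((peak_reduction_db(c), c) for c in pop), reverse=True)
    history = [scored[0][0]]
    for _ in range(generations):
        elite = [c for _, c in scored[:2]]  # keep the two best codings unchanged
        children = []
        while len(children) < pop_size - len(elite):
            # Tournament selection: each parent is the best of three random individuals.
            a, b = (max(rng.sample(scored, 3))[1] for _ in range(2))
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [rng.randrange(STATES) if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        scored = sorted(((peak_reduction_db(c), c) for c in elite + children), reverse=True)
        history.append(scored[0][0])
    return scored[0][1], history


if __name__ == "__main__":
    best, history = genetic_search()
    print([round(h, 2) for h in history])
```

Because the two best codings are carried over unchanged, the best fitness in `history` never decreases between generations. Setting `BITS` to 2 or 3 gives a rough feel for how fewer phase states restrict the search, although this toy model says nothing about the measured 11~18.4 GHz bandwidth.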
Research on the Architecture of Dual-field Reconfigurable Polynomial Multiplication Unit for Lattice-based Post-quantum Cryptography
CHEN Tao, ZHAO Wangpeng, BIE Mengni, LI Wei, NAN Longmei, DU Yiran, FU Qiuxing
Available online  , doi: 10.11999/JEIT250929
Abstract:
  Objective  Polynomial multiplication accounts for more than 80% of the computational time in lattice cryptography algorithms. The Number Theoretic Transform (NTT) and Fast Fourier Transform (FFT) reduce the computational complexity of polynomial multiplication from O(n²) to O(n log n). However, mainstream lattice cryptography algorithms, including Kyber, Dilithium, and Falcon, differ considerably in their parameter sets and polynomial multiplication implementations. To support polynomial multiplication under multiple parameter configurations and improve resource utilization, a dual-field reconfigurable polynomial multiplication unit architecture is proposed.  Methods  First, the computational network for polynomial multiplication is extracted according to the parameter characteristics of Kyber, Dilithium, and Falcon. The internal dual-field multiplication operations are optimized at the algorithm level. Next, a dual-field reconfigurable polynomial multiplication unit architecture is designed for the polynomial multiplication network. The dual-field reconfigurable multiplication unit is further optimized to improve computational speed. Finally, a parallelism analysis is conducted to improve resource utilization of the computational architecture. The proposed architecture achieves the highest area efficiency when supporting 1-lane 64 bit, 2-lane 32 bit, or 4-lane 16 bit operations.  Results and Discussions  The architecture is experimentally validated on the Xilinx FPGA XC7V2000TFLG1925. It supports one channel of complex floating-point operations, two channels of 17~32 bit internal NTT operations, or four channels of 16 bit internal NTT operations. At an operating frequency of 169 MHz, the architecture reduces the area-time product by more than 50%.  
Conclusions  The proposed dual-field reconfigurable processing unit architecture provides advantages in scalability, area efficiency, and core unit performance. Its configurable bit-width design adapts more easily to traditional cryptographic processors and provides a practical approach for migrating conventional public-key cryptosystems to post-quantum cryptography.
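The complexity reduction mentioned in the Objective comes from replacing the O(n²) schoolbook product with transform-domain pointwise multiplication. The sketch below is a minimal illustration, not the proposed hardware architecture: it uses one radix-2 Cooley-Tukey butterfly loop for both fields (modular arithmetic gives an NTT, complex arithmetic gives an FFT), which is the kind of shared structure a dual-field unit can exploit. The negacyclic product mod (x^256 + 1) uses Dilithium's modulus q = 8380417 as an example field. Kyber's q = 3329 has no 512th root of unity, so Kyber relies on an incomplete NTT that is not shown here.

```python
import cmath
import random

Q = 8380417  # Dilithium modulus 2**23 - 2**13 + 1 (example field)
N = 256      # polynomial degree used by Kyber and Dilithium


def transform(a, omega, reduce):
    """Iterative radix-2 Cooley-Tukey transform; len(a) must be a power of two.

    `omega` is a primitive len(a)-th root of unity in the chosen field and
    `reduce` maps results back into it: `x % q` gives an NTT, identity gives an FFT.
    """
    n = len(a)
    a = list(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = reduce(omega ** (n // length))  # primitive length-th root of unity
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = reduce(a[start + k + half] * w)
                a[start + k] = reduce(u + v)         # butterfly output u + w*v
                a[start + k + half] = reduce(u - v)  # butterfly output u - w*v
                w = reduce(w * w_len)
        length <<= 1
    return a


def primitive_root(q):
    """Smallest generator of the multiplicative group modulo a prime q."""
    phi, m, factors, p = q - 1, q - 1, [], 2
    while p * p <= m:
        if m % p == 0:
            factors.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        factors.append(m)
    g = 2
    while any(pow(g, phi // f, q) == 1 for f in factors):
        g += 1
    return g


def negacyclic_mul(a, b, q=Q):
    """a*b mod (x^n + 1, q) via psi-weighting and a length-n cyclic NTT."""
    n = len(a)
    psi = pow(primitive_root(q), (q - 1) // (2 * n), q)  # primitive 2n-th root, psi^n = -1
    omega = psi * psi % q                                  # primitive n-th root
    red = lambda x: x % q
    A = transform([x * pow(psi, i, q) % q for i, x in enumerate(a)], omega, red)
    B = transform([x * pow(psi, i, q) % q for i, x in enumerate(b)], omega, red)
    c = transform([x * y % q for x, y in zip(A, B)], pow(omega, -1, q), red)  # inverse NTT (unscaled)
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [x * n_inv * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def schoolbook(a, b, q=Q):
    """Reference O(n^2) negacyclic product using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1
            c[(i + j) % n] = (c[(i + j) % n] + sign * a[i] * b[j]) % q
    return c
```

Each transform needs (n/2)·log2(n) butterflies and the pointwise stage n multiplications, compared with n² multiplications in `schoolbook`.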
Delay Deterministic Routing Algorithm Based on Inter-controller Cooperation for Multi-layer Low Earth Orbit Satellite Networks
HUANG Longhui, DING Xiaojin, ZHANG Gengxin
Available online  , doi: 10.11999/JEIT251100
Abstract:
  Objective  The large number of satellites in multi-layer Low Earth Orbit (LEO) constellations produces highly dynamic network topologies. Combined with time-varying traffic loads, this causes temporal fluctuations in satellite network resources, such as available link queue size and link bandwidth. These variations make it difficult to establish stable end-to-end transmission paths and guarantee Quality of Service (QoS). To address this problem, Software-Defined Networking (SDN) is applied to multi-layer LEO constellations. SDN controllers collect network state information and enable unified management of network resources. The constellation is divided into multiple regions, with a controller deployed in each region to coordinate the operation of the constellation. A deterministic delay routing algorithm is designed in the SDN controllers to compute inter-region transmission paths for traffic and satisfy deterministic delay requirements.  Methods  A deterministic delay routing algorithm based on controller cooperation is proposed for multi-layer LEO constellations. First, a regional division strategy and controller deployment scheme are designed. The satellite network is partitioned into multiple regions, each managed by a designated controller. Second, criteria are defined for Inter-Satellite Links (ISLs) between satellites within the same layer and across different layers to characterize link communication states. Third, a Time-Varying Graph (TVG) model represents the network topology and link resource attributes, including bandwidth, queue size, and link duration. This model is combined with a multi-destination Lagrange relaxation method to optimize path selection. The resulting paths satisfy both delay and delay jitter constraints. Adjacent regional controllers exchange network state information to support cooperative computation of feasible inter-region transmission paths. 
Results and Discussions  To evaluate the proposed method, a simulation system for multi-layer LEO constellations is developed, and the performance of the algorithm is tested under different data transmission rates. Compared with IUDR, the proposed method reduces end-to-end delay, delay jitter, and packet loss rate, and increases throughput. At a data transmission rate of 3 Mbit/(s·Hz), the average end-to-end delay is reduced by 16.0% (Fig. 3(a)), delay jitter by 37.9% (Fig. 3(b)), and packet loss rate by 37.2% (Fig. 3(c)). Throughput increases by approximately 2% (Fig. 3(d)). In terms of signaling overhead, the proposed algorithm achieves a higher Reduction-Improvement Gain Ratio, approximately 111.8% higher than that of IUDR. This result indicates superior overall performance of the proposed Delay Deterministic Routing Algorithm based on Inter-Controller Cooperation (DDRA-ICC). Additionally, the proposed method has lower route-computation time complexity than IUDR.  Conclusions  To address deterministic delay requirements for traffic transmission in multi-layer LEO constellations, a controller cooperation-based deterministic delay routing algorithm is proposed. Performance evaluation under different load conditions shows that: (1) Compared with IUDR, the proposed algorithm reduces the average end-to-end delay, delay jitter, and packet loss rate by 16.0%, 37.9%, and 37.2%, respectively, and increases the average throughput by approximately 2%. (2) Although the additional overhead of DDRA-ICC is comparable to that of IUDR, the packet loss rate decreases further to 2.96%, representing a reduction of 52.49%, and the Reduction-Improvement Gain Ratio reaches 1.97. These results indicate lower packet loss, a higher Reduction-Improvement Gain Ratio, and a better balance between signaling overhead and reliability. Therefore, the proposed method provides advantages in ensuring deterministic traffic transmission. 
Future work may consider additional practical factors, such as satellite node failures and their effects on network performance, to further improve system capability.
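The Lagrange relaxation idea in the Methods can be sketched for a single source-destination pair with the classic LARAC scheme: minimize a path cost subject to an end-to-end delay bound by repeatedly running Dijkstra on the aggregated weight cost + λ·delay. This is a hedged, single-snapshot illustration rather than DDRA-ICC itself: the time-varying graph, the jitter constraint, multiple destinations, and inter-controller exchange are omitted, and the toy graph, its node names, and its (cost, delay) values are hypothetical.

```python
import heapq


def dijkstra(graph, src, dst, weight):
    """Shortest path under a scalar edge weight; graph[u] = [(v, cost, delay), ...]."""
    heap = [(0.0, src, [src])]
    seen = set()
    while heap:
        d, u, path = heapq.heappop(heap)
        if u == dst:
            return path
        if u in seen:
            continue
        seen.add(u)
        for v, cost, delay in graph.get(u, []):
            if v not in seen:
                heapq.heappush(heap, (d + weight(cost, delay), v, path + [v]))
    return None


def path_metric(graph, path, index):
    """Sum of edge attribute `index` (1 = cost, 2 = delay) along a path."""
    total = 0.0
    for u, v in zip(path, path[1:]):
        total += next(e[index] for e in graph[u] if e[0] == v)
    return total


def larac(graph, src, dst, max_delay, tol=1e-9):
    """Lagrangian relaxation (LARAC) for min-cost routing under a delay bound."""
    cost = lambda p: path_metric(graph, p, 1)
    delay = lambda p: path_metric(graph, p, 2)
    p_c = dijkstra(graph, src, dst, lambda c, d: c)  # cheapest path
    if p_c is None or delay(p_c) <= max_delay:
        return p_c
    p_d = dijkstra(graph, src, dst, lambda c, d: d)  # fastest path
    if delay(p_d) > max_delay:
        return None  # no path satisfies the bound
    while True:
        lam = (cost(p_c) - cost(p_d)) / (delay(p_d) - delay(p_c))
        r = dijkstra(graph, src, dst, lambda c, d: c + lam * d)
        aggregate = lambda p: cost(p) + lam * delay(p)
        if abs(aggregate(r) - aggregate(p_c)) < tol:
            return p_d  # multiplier has converged
        if delay(r) <= max_delay:
            p_d = r
        else:
            p_c = r


if __name__ == "__main__":
    toy = {  # hypothetical (cost, delay) per link
        "S": [("A", 1, 10), ("B", 3, 3), ("C", 2, 5)],
        "A": [("T", 1, 10)],
        "B": [("T", 3, 3)],
        "C": [("T", 2, 5)],
    }
    print(larac(toy, "S", "T", 12))
```

With a delay bound of 12 (units hypothetical), the cheapest path S-A-T violates the bound, the fastest path S-B-T is expensive, and the multiplier update settles on S-C-T.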
A Complexity-Reduced Active Interference Cancellation Algorithm in f-OFDM
CHEN Hao, WEN Jiangang, ZOU Yuanping, HUA Jingyu, SHENG Bin
Available online  , doi: 10.11999/JEIT251172
Abstract:
  Objective  Due to spectrum scarcity and diverse communication requirements, a waveform technology with high spectral efficiency, flexible subband configuration, and support for asynchronous communication is required for Sixth Generation mobile communication (6G). Among the candidate waveforms, filtered Orthogonal Frequency Division Multiplexing (f-OFDM) is considered a promising solution that satisfies these requirements. By applying subband filtering, f-OFDM enables flexible subband configuration and asynchronous transmission. However, the filtering mechanism inevitably introduces intrinsic interference into the system. A dominant component of this interference is InTer-subBand Interference (ITBI), which is mainly caused by Out-Of-Band Emission (OOBE) leakage from adjacent subbands. Therefore, suppressing subband OOBE is essential for reducing ITBI and improving the performance of f-OFDM systems. Based on the structure of f-OFDM systems, a Complexity-Reduced Active Interference Cancellation (CRAIC) algorithm is proposed to suppress the OOBE of f-OFDM subbands and improve overall system performance.  Methods  First, based on the spectral structure of f-OFDM, a subset of data subcarriers in the target subband is used to generate Cancellation Carriers (CCs). A CRAIC optimization model for f-OFDM systems is then constructed under the constraint of CC power. The cost function is defined according to the superposed spectrum of data subcarriers and CCs at Desired Frequency Points (DFPs). Second, by introducing a real-complex domain transformation and reformulating the optimization model, the original complex-domain CRAIC programming problem is converted into a real-domain Second-Order Cone Programming (SOCP) problem, which enables efficient computation. 
Furthermore, computer simulations evaluate the effects of key parameters on CRAIC performance, including the number of CCs ($M$), the number of data subcarriers used to generate CCs ($K$), and the number of DFPs ($Q$). Based on these evaluations, practical recommendations are provided for configuring CRAIC parameters in f-OFDM systems.  Results and Discussions  Simulation results show that in the edge region of the adjacent subband, the proposed CRAIC algorithm produces a steeper Power Spectral Density (PSD) roll-off than the conventional ZP and Origin schemes. This indicates that CRAIC provides the strongest ITBI suppression in this region and achieves the lowest Bit Error Rate (BER) for Edge Subcarriers (ESs) in the adjacent subband. Specifically, CRAIC achieves a maximum PSD reduction of 4 dB and 12 dB compared with ZP and Origin, respectively (Fig. 2a). This is because the right-hand $Q/2$ DFPs lie largely in the edge region of SB2, so spectral suppression is concentrated in this area. Consequently, the BER at the edge of SB2 is significantly lower for CRAIC than for Origin, and a visible improvement over ZP is also observed (Fig. 3a). The effects of the key parameters $M$, $K$, and $Q$ are further examined through simulations. The results show that increasing $M$ continuously improves OOBE suppression (Fig. 4a), although spectral efficiency gradually decreases. In contrast, increasing $K$ and $Q$ produces only limited improvement: beyond certain values, further increases provide no additional gain (Fig. 5a and Fig. 6a). 
Based on these observations, $M=4$, $K=8$, and $Q=4$ are selected as typical parameter settings for the scenario considered in this study. Under this configuration, CRAIC ($K=8$) achieves significant improvements in ES BER compared with Origin and ZP (Fig. 8a), whereas the BER of Internal Subcarriers (ISs) remains nearly the same as that of the two benchmark schemes (Fig. 8b). Compared with the full-scale CRAIC scheme ($K=20$), CRAIC ($K=8$) reduces the size of the data-subcarrier mapping matrix by 60% with only limited BER degradation (Fig. 8a). These results indicate that the proposed algorithm retains most of the performance of the full-scale Active Interference Cancellation (AIC) scheme while substantially reducing computational complexity.  Conclusions  A CRAIC algorithm for filtered OFDM systems is studied. The CRAIC optimization model is constructed under the constraint of CC power, and the cost function is defined based on the superposed spectrum of selected data subcarriers and CCs at DFPs. Through a real-complex domain transformation and model reformulation, the complex-domain optimization problem is converted into a real-domain SOCP problem. Simulation results show that the CRAIC algorithm effectively reduces the PSD of the target subband, particularly in the transition region of the adjacent subband, which leads to a clear improvement in edge BER. The effects of key parameters are also evaluated. Increasing $M$ increases the performance gain of CRAIC over ZP, although spectral efficiency decreases. Increasing $K$ improves OOBE suppression, although the marginal gain diminishes and computational complexity increases. 
Increasing $Q$ beyond a certain value does not further reduce the PSD. Overall, the CRAIC algorithm improves subband isolation in f-OFDM systems, reduces ITBI, and improves system performance.
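The real-complex transformation and the reduced mapping matrix can be illustrated with a small sketch. It is a hedged simplification: subcarrier spectra are rectangular-pulse OFDM kernels without the f-OFDM subband filter, and the CC power constraint is replaced by a Tikhonov penalty `mu`, so the problem becomes regularized least squares with a closed-form solution instead of an SOCP. The complex problem is solved in the real domain using the [[Re, -Im], [Im, Re]] stacking. Subcarrier indices, DFP locations, and `N_FFT` are hypothetical. The M×K matrix `W` maps the K selected data symbols to the M CC weights, so K = 8 instead of K = 20 shrinks it by 60%.

```python
import cmath
import random

N_FFT = 64  # hypothetical OFDM symbol length (samples)


def leak(f, k, n_fft=N_FFT):
    """Spectrum of rectangular-pulse subcarrier k at frequency f (units of subcarrier spacing)."""
    return sum(cmath.exp(-2j * cmath.pi * (f - k) * n / n_fft) for n in range(n_fft)) / n_fft


def to_real(A):
    """Real-complex transformation: complex Q x M -> real 2Q x 2M matrix [[Re, -Im], [Im, Re]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in A]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in A]
    return top + bottom


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square real system A x = b."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def mapping_matrix(data_idx, cc_idx, dfps, mu=1e-3):
    """Complex M x K matrix W such that g = W d minimises ||P_d d + P_c g||^2 + mu ||g||^2."""
    Ar = to_real([[leak(f, k) for k in cc_idx] for f in dfps])
    m2 = len(Ar[0])
    # Real normal matrix Ar^T Ar + mu I (2M x 2M)
    G = [[sum(row[i] * row[j] for row in Ar) + (mu if i == j else 0.0) for j in range(m2)]
         for i in range(m2)]
    W = [[0j] * len(data_idx) for _ in cc_idx]
    for col, k in enumerate(data_idx):
        pd = [leak(f, k) for f in dfps]  # leakage of a unit symbol on data subcarrier k
        br = [z.real for z in pd] + [z.imag for z in pd]
        rhs = [-sum(row[i] * b for row, b in zip(Ar, br)) for i in range(m2)]
        x = solve(G, rhs)
        for m in range(len(cc_idx)):
            W[m][col] = complex(x[m], x[m + len(cc_idx)])  # back to the complex domain
    return W


def dfp_power(data_idx, d, cc_idx, g, dfps):
    """Total power of the data-plus-CC spectrum at the DFPs."""
    total = 0.0
    for f in dfps:
        s = sum(leak(f, k) * x for k, x in zip(data_idx, d))
        s += sum(leak(f, k) * x for k, x in zip(cc_idx, g))
        total += abs(s) ** 2
    return total
```

For a QPSK block `d` on 20 data subcarriers, CC weights `g = W d` from `mapping_matrix(list(range(20)), [20, 21, 22, 23], [24.5, 25.5, 26.5, 27.5])` cannot give a higher `dfp_power` than switching the CCs off, because the penalized optimum is never worse than `g = 0`.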
Research on Time Slots Aggregation and Topology Aggregation Model for Unmanned Aerial Vehicle Swarm Overall Time Synchronization
WANG Zhenling, TAO Haihong, WEI Haitao, WANG Zhengyong
Available online  , doi: 10.11999/JEIT251274
Abstract:
  Objective  Unmanned Aerial Vehicle (UAV) swarms overcome the technical and performance limitations of individual UAVs and enable complex missions that cannot be accomplished by a single platform. High-precision time synchronization among swarm nodes serves as a fundamental requirement for key swarm operations, including resource scheduling, cooperative positioning, and multi-node data fusion. Existing research on UAV time synchronization mainly focuses on improving the accuracy of basic synchronization approaches. However, limitations remain in adapting to topological changes during swarm formation flights and in achieving global synchronization among multiple nodes. As the scale of UAV swarms increases, the connectivity of time-comparison links between nodes during formation flights exhibits clear time-varying characteristics. These characteristics create challenges for maintaining continuous, reliable, and precise overall time synchronization. To address stable formation flight and formation transformation scenarios in different mission stages of UAV swarms, an Observation Time Slots Aggregation (OTSA) model and a Time-Varying Topology Aggregation (TVTA) model are proposed to enhance the robustness of global time synchronization among swarm nodes and to improve Time Synchronization Accuracy (TSA). This study proposes an effective solution for Leader-Following Consistency Time Synchronization (LFCTS) in UAV swarms and provides references for time synchronization applications in heterogeneous and distributed systems.  Methods  Compared with the traditional Quasi Real-time Bidirectional Time Comparison (QRBTC) scheme, the time synchronization method based on the OTSA model fully uses all synchronization signal transmission and reception link resources within each time slot of the system synchronization period. 
Based on the “one transmission and multiple receptions” mechanism of all nodes, the Follower Node (FN) performs direct synchronization or single-hop indirect synchronization with the Leader Node (LN) in each time slot according to the OTSA model. This process produces tens of times more clock-skew observation samples than the traditional QRBTC scheme. The OTSA method improves the robustness of global time synchronization. It also enables secondary data processing using multi-slot synchronization samples, which further improves TSA compared with the QRBTC method. Based on the LFCTS results obtained during the system signal synchronization period, the TVTA model extends the direct comparison and single-hop indirect comparison mechanism of the OTSA model to cross-period multi-hop comparison. This extension addresses overall time synchronization instability caused by the time-varying characteristics of synchronization link relationships during UAV swarm takeoff, assembly, and formation transformation.  Results and Discussions  In the OTSA method, all time-comparison link resources across all time slots are used during the synchronization period (Fig. 2). Based on the constructed error model and simulation analysis, for a UAV swarm with 50 nodes and a time slot allocation of 20 ms, time synchronization using the OTSA model achieves a single-slot TSA of 4.10~4.27 ns (Fig. 6). Within a complete time synchronization period, the overall TSA reaches 2.46~2.56 ns, which is better than the QRBTC scheme under the same conditions (Fig. 5(a)). The TVTA method uses cross-period synchronization comparison relationships to construct multi-hop time comparison links (Figs. 3 and 4). When the FN obtains external comparison relationships of other nodes through aggregation processing, one-way or two-way Dijkstra’s algorithm is applied to determine the multi-hop comparison link with optimal connectivity. 
Time tracing and comparison with the LN are then completed through edge computing. Error analysis indicates that during UAV swarm takeoff, assembly, and transitions to triangle or rhombus formations, time synchronization based on the TVTA model achieves an overall TSA better than 8.6 ns, which provides stronger global time synchronization capability.  Conclusions  This study addresses the robustness of time synchronization in UAV swarm formation flights. For stable formation flight and formation transformation scenarios during different mission stages, the OTSA and TVTA models are proposed. An error model is constructed and performance is analyzed. The results show the following. (1) The OTSA model improves the robustness of overall time synchronization through direct comparison and single-hop indirect comparison across multiple time slots within one synchronization period. The model achieves an overall TSA better than 2.56 ns and performs better than the traditional QRBTC method. (2) The TVTA model achieves overall UAV swarm time synchronization through multi-hop relay between nodes. Even when time-comparison links change, the model maintains global TSA better than 8.6 ns. (3) These two methods consider the time-varying characteristics of comparison links among UAV swarm nodes and have been verified through small-scale UAV swarm flight tests. They maintain synchronization robustness and performance and provide necessary support for coordinated UAV swarm operations. Future work will focus on practical flight verification, adaptation in complex scenarios, and further improvement of overall synchronization accuracy.
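The two mechanisms can be sketched with simplified stand-ins, under stated assumptions: a textbook two-way time comparison with symmetric path delay provides each offset observation, OTSA's multi-slot processing is reduced to averaging per-slot estimates, and TVTA's multi-hop tracing is modeled as a Dijkstra search that minimizes accumulated error variance and sums the per-hop offsets. Node names, link variances, delays, and the turnaround time are hypothetical, and the paper's error model is not reproduced.

```python
import heapq
import random


def two_way_offset(t1, t2, t3, t4):
    """Offset of node B's clock relative to node A from one two-way exchange.

    t1: A transmits (A's clock)   t2: B receives (B's clock)
    t3: B transmits (B's clock)   t4: A receives (A's clock)
    Assumes the propagation delay is the same in both directions.
    """
    return ((t2 - t1) - (t4 - t3)) / 2


def aggregate_slots(samples):
    """OTSA-style aggregation, simplified here to averaging the per-slot estimates."""
    return sum(samples) / len(samples)


def chained_offset(links, leader, follower):
    """TVTA-style multi-hop tracing (simplified).

    links[u] = [(v, offset_of_v_relative_to_u, error_variance), ...]
    Dijkstra selects the path with the smallest accumulated error variance and
    the per-hop offsets along it are summed. Returns (offset, variance) or None.
    """
    heap = [(0.0, leader, 0.0)]
    done = set()
    while heap:
        var, node, offset = heapq.heappop(heap)
        if node == follower:
            return offset, var
        if node in done:
            continue
        done.add(node)
        for nxt, hop_offset, hop_var in links.get(node, []):
            if nxt not in done:
                heapq.heappush(heap, (var + hop_var, nxt, offset + hop_offset))
    return None


def simulate_exchange(theta, delay, sigma, rng):
    """One noisy two-way exchange between clocks that differ by theta (all values in ns)."""
    t1 = 0.0
    t2 = t1 + delay + theta + rng.gauss(0.0, sigma)
    t3 = t2 + 5.0  # hypothetical turnaround time on B's clock
    t4 = t3 - theta + delay + rng.gauss(0.0, sigma)
    return two_way_offset(t1, t2, t3, t4)
```

Averaging n independent per-slot estimates reduces the random error roughly by a factor of √n, which is one way to read why more observation samples per synchronization period help TSA.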
Communication, Computation, and Caching Resource Collaboration for Heterogeneous Artificial Intelligence Generated Content Service Provisioning
WU Mengru, GAO Yu, ZHAO Bo, XU Bo, SUN Hao, GUO Lei
Available online  , doi: 10.11999/JEIT251300
Abstract:
  Objective  In the Artificial Intelligence of Things (AIoT), Edge Servers (ESs) provide intelligent content generation services to AIoT devices by utilizing cached Artificial Intelligence Generated Content (AIGC) models. However, the limited computing resources and caching capacity of ESs make it difficult to support the large-scale caching demands of heterogeneous AIGC services. To address this issue, a communication, computation, and caching resource collaboration scheme is proposed based on a combined cloud-edge and edge-edge collaborative framework. The scheme considers three representative AIGC services: lightweight AIGC services, computation-intensive AIGC services, and preprocessing-based AIGC services. The objective is to minimize the total AIGC service latency through joint optimization of transmit power, computing resource allocation, model caching strategies, and offloading decisions.  Methods  Communication, computation, and caching resource collaboration for heterogeneous AIGC services is investigated. First, an AIGC service-oriented AIoT system model is established to incorporate both cloud-edge and edge-edge collaboration. An optimization problem is then formulated to minimize the total latency of AIGC services through joint optimization of transmit power, computing resource allocation, model caching strategies, and offloading decisions. Because the formulated problem is non-convex, an Alternating Optimization (AO) algorithm is proposed. The original problem is decomposed into three subproblems. These subproblems are solved using the Successive Convex Approximation (SCA) method, Karush-Kuhn-Tucker (KKT) conditions, and an improved Harris Hawks Optimization (HHO) algorithm.  Results and Discussions  Simulation experiments compare the proposed joint optimization scheme with three baseline methods: Particle Swarm Optimization (PSO), fixed resource allocation, and random offloading and caching. 
First, the convergence of the proposed AO algorithm is verified (Fig. 2). The results show that the algorithm converges rapidly within a limited number of iterations across different subproblems. Second, increasing transmission bandwidth significantly reduces the total AIGC service latency (Fig. 3). This occurs because each device obtains more bandwidth resources for task transmission, and the ES can allocate more bandwidth to deliver generated content in the downlink. Furthermore, the total AIGC service latency decreases as the ES storage capacity increases for all schemes (Fig. 4). Greater storage capacity enables the ES to store more AIGC models, which reduces the transmission delay between the ES and the cloud server. Moreover, when the required floating-point operations per bit increase, the total AIGC service latency rises significantly across all schemes (Fig. 5). Finally, the total AIGC service latency decreases as the maximum transmit power of the Base Station (BS) increases (Fig. 6). This occurs because higher BS transmit power improves the downlink signal-to-noise ratio, which increases the downlink transmission rate and reduces overall service latency. The proposed scheme demonstrates better performance than the baseline schemes, particularly under high computational demand.  Conclusions  Communication, computation, and caching resource collaboration for heterogeneous AIGC services is investigated. The objective is to minimize total AIGC service latency through joint optimization of the transmit power of AIoT devices and BSs, computing resource allocation, AIGC model deployment, and service offloading decisions under computation and caching resource constraints. Because the formulated problem is a mixed-integer nonlinear programming problem, an efficient AO algorithm is developed. The original optimization problem is decomposed into three subproblems, which are solved using the SCA algorithm, KKT conditions, and the HHO algorithm, respectively. 
Simulation results show that the proposed algorithm reduces the total AIGC service latency compared with the baseline schemes.
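The KKT-based subproblem can be illustrated on its simplest form. Assuming, hypothetically, that task i needs c_i CPU cycles and experiences computing latency c_i/f_i on the ES, minimizing the sum of these latencies under the capacity constraint Σf_i = F has a closed-form KKT solution: stationarity gives c_i/f_i² = ν, so f_i ∝ √c_i and the minimum total latency is (Σ√c_i)²/F. The sketch ignores transmission, caching, and offloading, which the abstract handles with SCA and HHO.

```python
import math


def kkt_allocation(cycles, f_total):
    """Minimise sum(c_i / f_i) subject to sum(f_i) = f_total, f_i > 0.

    Stationarity of the Lagrangian, -c_i / f_i**2 + nu = 0, gives f_i = sqrt(c_i / nu);
    the budget constraint then fixes nu, so f_i is proportional to sqrt(c_i).
    """
    roots = [math.sqrt(c) for c in cycles]
    s = sum(roots)
    return [f_total * r / s for r in roots]


def compute_latency(cycles, freqs):
    """Total computing latency sum(c_i / f_i) for a given allocation."""
    return sum(c / f for c, f in zip(cycles, freqs))


if __name__ == "__main__":
    cycles = [2e9, 8e9, 5e8]  # hypothetical CPU cycles per task
    F = 20e9                  # hypothetical ES capacity (cycles/s)
    print(compute_latency(cycles, kkt_allocation(cycles, F)))
    print(compute_latency(cycles, [F / 3] * 3))  # equal split for comparison
```

The square-root rule gives tasks with larger workloads more resources, but less than proportionally, which is why it beats both equal and workload-proportional splits for this objective.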
SAR Saturated Interference Suppression Method Guided by Precise Saturation Model
DUAN Lunhao, LU Xingyu, TAN Ke, LIU Yushuang, YANG Jianchao, YU Jing, GU Hong
Available online  , doi: 10.11999/JEIT251283
Abstract:
  Objective  With the increasing number of electromagnetic devices, Synthetic Aperture Radar (SAR) is highly susceptible to Radio Frequency Interference (RFI) within the same frequency band. RFI typically appears as bright streaks in SAR images and severely degrades image quality. Considerable research has been conducted on interference suppression, and many effective methods have been proposed. However, most existing approaches do not consider the nonlinear saturation of interfered echoes. In practical scenarios, the interference power is usually high, and the gain controller in the SAR receiver cannot effectively regulate the amplitude of interfered echoes. Therefore, the input signal amplitude of the Analog-to-Digital Converter (ADC) exceeds its dynamic range. This condition drives the SAR receiver into saturation and leads to nonlinear distortion in the interfered echoes. Such phenomena have been observed in multiple SAR systems. Documented cases include receiver saturation in the LuTan-1 satellite and several airborne SAR platforms. Analyses of SAR data further confirm the presence of saturated interference in systems such as Sentinel-1, Gaofen-3, and other spaceborne SAR platforms. After saturation occurs, the echo spectrum exhibits spurious components and spectral artifacts. These effects cause a mismatch between existing suppression methods and the actual characteristics of saturated interference. Therefore, many current methods cannot effectively mitigate this type of interference. Moreover, accurate models that precisely describe the output components of saturated interfered echoes remain limited. To address these issues, a precise analytical model for saturated interference is established, and an effective saturated interference suppression method is proposed based on this model.  
Methods  Starting from a basic saturation model, a mathematical model is first developed to accurately characterize the output components of saturated interference. The accuracy of the model in describing amplitude and phase is validated through simulations. A detailed analysis of the output components of interfered echoes under saturation conditions is also conducted. Compared with the one-bit sampling model and the traditional tanh saturation model, the proposed model describes amplitude information more accurately. In addition, the model is not limited by the sampling bit width of ADCs and can theoretically be extended to describe saturation outputs in other radar receivers. Based on the observation that harmonic phases can be expressed as linear combinations of the phases of the original signal components, and by exploiting the high power of the interference fundamental harmonic, a saturated interference suppression method is proposed. First, because the interference fundamental harmonic has relatively high power, it is extracted using eigen-subspace decomposition. Then, based on the harmonic phase relationships, the extracted interference fundamental harmonic, and the SAR transmitted signal, the harmonic components are systematically constructed, including higher-order interference harmonics, target harmonics, and intermodulation harmonics, which together form a complete dictionary. Finally, a sparse optimization problem is solved to separate and suppress the saturated interference. The effectiveness of the proposed method is verified using measured Gaofen-3 data.  Results and Discussions  Experiments are conducted using both simulated and measured data to verify the effectiveness of the proposed method in suppressing saturated interference. For simulated data, the proposed method completely removes interference stripes in the SAR image (Fig. 7). 
Analysis of the time-frequency spectra of the processed echoes (Fig. 8 and Fig. 9) shows that traditional methods cannot effectively eliminate higher-order harmonics. By contrast, the proposed method improves the Target-to-Background Ratio (TBR) by 1.76 dB and achieves the lowest Root Mean Square Error (RMSE) of 0.0783 (Table 3). For the measured Gaofen-3 data, analysis of the processed images and the time-frequency spectra of echoes confirms that the proposed method effectively suppresses interference, whereas conventional methods still exhibit residual interference in the processed results (Fig. 10 and Fig. 11).  Conclusions  With the increasing deployment of electromagnetic devices, SAR systems are increasingly susceptible to in-band interference. High-power interference can drive the SAR receiver into saturation and cause nonlinear distortion, which reduces the effectiveness of traditional interference suppression methods. To address this issue, a model that precisely characterizes the saturated output components of interfered echoes is established. Based on this model, an interference suppression method for saturated interference is proposed. Simulation and experimental results show that the model accurately describes saturation behavior and that the proposed method effectively suppresses saturated interference.
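The final separation step (dictionary construction followed by sparse optimization) can be sketched as an l1-regularized least-squares problem solved by ISTA (iterative soft thresholding). ISTA is a standard solver chosen here for illustration; the abstract does not say which solver the authors use. The dictionary holds unit-norm cosine atoms at odd harmonics of an assumed interference fundamental plus two hypothetical target atoms. The zero-phase, noise-free, orthonormal setup is a deliberate simplification.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def ista(dictionary, y, lam, n_iter=200, step=1.0):
    """ISTA for  min_x 0.5*||y - D x||^2 + lam*||x||_1.
    `dictionary` is a list of atoms (columns of D). step=1.0 assumes orthonormal
    atoms; for a general D use 1/L, with L the largest eigenvalue of D^T D."""
    m, n = len(dictionary), len(y)
    x = [0.0] * m
    for _ in range(n_iter):
        recon = [sum(x[j] * dictionary[j][i] for j in range(m)) for i in range(n)]
        resid = [y[i] - recon[i] for i in range(n)]
        grad = [sum(dictionary[j][i] * resid[i] for i in range(n)) for j in range(m)]
        x = [soft_threshold(x[j] + step * grad[j], step * lam) for j in range(m)]
    return x

N = 128
K0 = 5                                               # interference fundamental (assumed already extracted)
interf_bins = [K0 * (2 * m + 1) for m in range(4)]   # 5, 15, 25, 35: odd harmonics
target_bins = [12, 20]                               # hypothetical target components
all_bins = interf_bins + target_bins
scale = math.sqrt(N / 2)                             # makes each cosine atom unit-norm
D = [[math.cos(2 * math.pi * k * i / N) / scale for i in range(N)] for k in all_bins]

true_coef = {5: 8.0, 15: 3.0, 25: 1.0, 12: 2.0}      # made-up amplitudes
y = [sum(true_coef.get(k, 0.0) * D[j][i] for j, k in enumerate(all_bins)) for i in range(N)]
target = [true_coef[12] * D[all_bins.index(12)][i] for i in range(N)]

x = ista(D, y, lam=0.2)
support = [j for j, v in enumerate(x) if v != 0.0]
# Debias on the detected support (least squares = projection for orthonormal atoms),
# then subtract only the interference part of the reconstruction.
interf_support = [j for j in support if j < len(interf_bins)]
coef = {j: sum(D[j][i] * y[i] for i in range(N)) for j in interf_support}
interf_est = [sum(coef[j] * D[j][i] for j in interf_support) for i in range(N)]
cleaned = [y[i] - interf_est[i] for i in range(N)]

print("detected bins:", [all_bins[j] for j in support])
print("max deviation from target:", max(abs(c - t) for c, t in zip(cleaned, target)))
```

With real intermodulation atoms the dictionary is no longer orthonormal, so the step size must be reduced to 1/L as noted in the docstring, and the debiasing step becomes a genuine least-squares solve on the detected support.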
A Class of Double-twisted Generalized Reed-Solomon Codes and Their Extended Codes
CHENG Hongli, ZHU Shixin
Available online  , doi: 10.11999/JEIT251045
Abstract:
  Objective  Twisted Generalized Reed-Solomon (TGRS) codes have attracted considerable attention in coding theory due to their flexible structural properties. However, studies on their extended codes remain limited. Existing results indicate that only a small number of works examine extended TGRS codes, leaving gaps in the understanding of their error-correcting capability, duality properties, and applications. In addition, previously proposed parity-check matrix forms for TGRS codes lack clarity and do not cover all parameter ranges. In particular, the case h = 0 is not addressed, which limits applicability in scenarios requiring diverse parameter settings. Constructing non-Generalized Reed-Solomon (non-GRS) codes is of interest because such codes resist Sidelnikov-Shestakov and Wieschebrink attacks, whereas GRS codes are vulnerable. Maximum Distance Separable (MDS) codes, self-orthogonal codes, and almost self-dual codes are valued for their error-correcting efficiency and structural properties. MDS codes achieve the Singleton bound and are essential for distributed storage systems that require data reliability under node failures. Self-orthogonal and almost self-dual codes, due to their duality structures, are applied in quantum coding, secret sharing schemes, and secure multi-party computation. 
Accordingly, this paper aims to: (1) characterize the MDS and Almost MDS (AMDS) properties of double-twisted GRS codes \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} and their extended codes \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document}; (2) derive explicit and unified parity-check matrices for all valid parameter ranges, including h = 0; (3) establish non-GRS properties under specific parameter conditions; (4) provide necessary and sufficient conditions for self-orthogonality of the extended codes and almost self-duality of the original codes; and (5) construct a class of almost self-dual double-twisted GRS codes with flexible parameters for secure and reliable communication systems.  Methods   The study is based on algebraic coding theory and finite field methods. Explicit parity-check matrices are derived using properties of polynomial rings over \begin{document}$ {F}_{q} $\end{document}, Vandermonde matrix structures, and polynomial interpolation. The Schur product method is applied to determine non-GRS properties by comparing the dimensions of the Schur squares of the codes and their duals with those of GRS codes. Linear algebra and combinatorial techniques are used to characterize MDS and AMDS properties. Conditions are obtained by analyzing the nonsingularity of generator-matrix submatrices and solving systems involving symmetric sums of finite field elements. These conditions are expressed using the sets \begin{document}$ {S}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document},\begin{document}$ {L}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}, and \begin{document}$ {D}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. Duality theory is used to study orthogonality. 
A code C is self-orthogonal if \begin{document}$ C\subseteq {C}^{\bot } $\end{document}, or equivalently if its generator matrix satisfies \begin{document}$ {\boldsymbol{G}}{{\boldsymbol{G}}}^{\rm T}=\boldsymbol{O} $\end{document}. For almost self-dual codes, which have odd length n and dimension (n-1)/2, this condition is combined with the structure of the dual code and symmetric-sum relations of the \begin{document}$ {\alpha }_{i} $\end{document} to obtain necessary and sufficient conditions.  Results and Discussions   For MDS and AMDS properties, the following results are obtained. The extended double-twisted GRS code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} is MDS if and only if \begin{document}$ 1\notin {S}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document} and \begin{document}$ 1\notin {L}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. The double-twisted GRS code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} is AMDS if and only if \begin{document}$ 1\in {S}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document} and \begin{document}$ (0,1)\notin {D}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. The code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document}\begin{document}$ (0,1)\in {D}_{k}(\boldsymbol{\alpha },\boldsymbol{\eta }) $\end{document}. Unified parity-check matrices of \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} and \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} are derived for all \begin{document}$ 0\leq h\leq k-1 $\end{document}, removing previous restrictions that exclude h = 0. 
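The MDS criterion and the Schur-product test described in the Methods can be checked by brute force on small codes over a prime field. The sketch below relies on two standard facts: a linear [n, k] code is MDS if and only if every k columns of its generator matrix are linearly independent, and a GRS code with 2k - 1 ≤ n has a Schur square of dimension 2k - 1. The single-twist toy code is hypothetical and much simpler than the double-twisted codes studied in the paper.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field F_p, by Gauss-Jordan elimination."""
    m = [[v % p for v in r] for r in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)          # inverse via Fermat's little theorem
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def is_mds(G, p):
    """An [n, k] code with generator G is MDS iff every k columns of G are independent."""
    k, n = len(G), len(G[0])
    return all(rank_mod_p([[G[i][j] for j in cols] for i in range(k)], p) == k
               for cols in combinations(range(n), k))

def schur_square_dim(G, p):
    """dim of C*C: span of coordinatewise products of all pairs of rows of G."""
    k = len(G)
    prods = [[(a * b) % p for a, b in zip(G[i], G[j])] for i in range(k) for j in range(i, k)]
    return rank_mod_p(prods, p)

p, k = 11, 3
alphas = list(range(1, 11))                                # n = 10 distinct points in F_11
G_rs = [[pow(a, i, p) for a in alphas] for i in range(k)]  # RS code: rows x^0, x^1, x^2

# Hypothetical single-twist toy code: the x^2 row becomes x^2 + eta*x^3 (eta = 1).
eta = 1
G_tw = [row[:] for row in G_rs]
G_tw[2] = [(a * a + eta * pow(a, 3, p)) % p for a in alphas]

print("RS MDS:", is_mds(G_rs, p), " Schur-square dim:", schur_square_dim(G_rs, p))
print("twisted toy Schur-square dim:", schur_square_dim(G_tw, p))
```

For the RS code the Schur-square dimension equals min(n, 2k - 1) = 5. The toy twisted code gives 6 instead, which rules out a GRS structure; this is the same kind of dimension argument the paper uses for its non-GRS results.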
For non-GRS properties, when \begin{document}$ k\geq 4 $\end{document} and \begin{document}$ n-k\geq 4 $\end{document}, both \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} and its extended code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} are non-GRS whether \begin{document}$ 2k\geq n $\end{document} or \begin{document}$ 2k \lt n $\end{document}. This conclusion follows from the fact that the dimensions of their Schur squares exceed those of the corresponding GRS codes, which ensures resistance to Sidelnikov-Shestakov and Wieschebrink attacks. Regarding orthogonality, the extended code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v},{\boldsymbol{\infty}}) $\end{document} with \begin{document}$ h=k-1 $\end{document} is self-orthogonal under specific algebraic conditions. The code \begin{document}$ {C}_{k,\boldsymbol{h},\boldsymbol{\eta }}(\boldsymbol{\alpha },\boldsymbol{v}) $\end{document} with \begin{document}$ h=k-1 $\end{document} and \begin{document}$ n=2k+1 $\end{document} is almost self-dual if and only if there exists \begin{document}$ \lambda \in F_{q}^{*} $\end{document} such that \begin{document}$ \lambda {u}_{j}=v_{j}^{2} (j=1,2,\cdots ,2k+1) $\end{document} together with a symmetric sum condition on \begin{document}$ {\alpha }_{i} $\end{document} involving \begin{document}$ {\eta }_{1} $\end{document} and \begin{document}$ {\eta }_{2} $\end{document}. For an odd prime power \begin{document}$ q $\end{document}, an almost self-dual code with parameters \begin{document}$ [q-t-1,(q-t-2)/2,\geq (q-t-2)/2] $\end{document} is constructed using the roots of \begin{document}$ m(x)=({x}^{q}-x)/f(x) $\end{document} where \begin{document}$ f(x)={x}^{t+1}-x $\end{document}. 
An example over \begin{document}$ {F}_{11} $\end{document} yields a \begin{document}$ [5,2,\geq 2] $\end{document} code.  Conclusions   The study advances the theory of double-twisted GRS codes and their extensions through five contributions: (1) complete characterization of MDS and AMDS properties using sets \begin{document}$ {S}_{k} $\end{document},\begin{document}$ {L}_{k} $\end{document},\begin{document}$ {D}_{k} $\end{document}; (2) unified parity-check matrices for all \begin{document}$ 0\leq h\leq k-1 $\end{document}; (3) non-GRS properties are established for \begin{document}$ k\geq 4 $\end{document}, ensuring resistance to known structural attacks; (4) necessary and sufficient conditions for self-orthogonal extended codes and almost self-dual original codes are obtained; (5) a flexible construction of almost self-dual double-twisted GRS codes is proposed. These results extend the theoretical understanding of TGRS-type codes and support the design of secure and reliable coding systems.
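The evaluation points of the \begin{document}$ {F}_{11} $\end{document} example can be listed directly. In the sketch below, t = 5 is inferred from the stated length q - t - 1 = 5, and only the length, dimension, and evaluation set are checked. The almost self-dual property itself also depends on the column multipliers and twist parameters, which are not reproduced here.

```python
q, t = 11, 5   # q = 11; t = 5 follows from the stated length q - t - 1 = 5
# f(x) = x^(t+1) - x = x(x^t - 1) divides x^q - x = x(x^(q-1) - 1) only when t | q - 1
assert (q - 1) % t == 0

# q is prime here, so arithmetic in F_q is integer arithmetic mod q.
f_roots = [a for a in range(q) if (pow(a, t + 1, q) - a) % q == 0]  # 0 and the t-th roots of unity
alphas = [a for a in range(q) if a not in f_roots]                  # roots of m(x) = (x^q - x)/f(x)

n = len(alphas)           # code length q - t - 1
k = (q - t - 2) // 2      # dimension (q - t - 2)/2
print("roots of f:", f_roots)                    # [0, 1, 3, 4, 5, 9]
print("evaluation points (roots of m):", alphas) # [2, 6, 7, 8, 10]
print(f"[n, k] = [{n}, {k}]")                    # [n, k] = [5, 2]
```

The five evaluation points are exactly the quadratic non-residues modulo 11, because the fifth roots of unity in \begin{document}$ {F}_{11} $\end{document} are the nonzero squares.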
Tri-Frequency Wearable Antenna Loaded with Artificial Magnetic Conductors
JIN Bin, ZHANG Jialin, DU Chengzhu, CHU Jun
Available online  , doi: 10.11999/JEIT251050
Abstract:
  Objective   The rapid advancement of 5G communication technology has expanded the use of antennas in aviation, radar, medical, and other wireless systems. Wearable antennas have gained attention because of their conformability. Loading an Artificial Magnetic Conductor (AMC) is an effective way to enhance wearable antenna performance. This method increases gain, improves the Front-to-Back Ratio (FBR), and provides radiation isolation between the antenna and the human body. This study presents a tri-band wearable antenna loaded with an AMC for the ISM band and 5G frequency bands.  Methods   A trident-structured tri-band monopole antenna operating at 2.5 GHz, 3.5 GHz, and 5.8 GHz is designed together with a ring-shaped tri-band AMC tuned to the same frequency bands. Both structures use a semi-flexible Rogers 4003 substrate. A 4×5 AMC array is placed on the back of the antenna to form a wearable integrated antenna. Simulation, physical measurement, and human safety assessment are performed.  Results and Discussions  The integrated antenna shows simulated operating bands of 2.40\begin{document}$ \sim $\end{document}2.50 GHz, 3.15\begin{document}$ \sim $\end{document}3.80 GHz, and 5.56\begin{document}$ \sim $\end{document}6.02 GHz, and measured bands of 2.38\begin{document}$ \sim $\end{document}2.52 GHz, 3.30\begin{document}$ \sim $\end{document}3.86 GHz, and 5.54\begin{document}$ \sim $\end{document}7.86 GHz (Fig. 4). These bands cover the ISM band (2.40\begin{document}$ \sim $\end{document}2.4835 GHz), the 5G-n78 band (3.3\begin{document}$ \sim $\end{document}3.8 GHz), and the 5G-WiFi 5.8 GHz band (5.725\begin{document}$ \sim $\end{document}5.875 GHz). Measured gains at 2.4 GHz, 3.5 GHz, and 5.8 GHz increase by 5.3 dB, 4.6 dB, and 2.2 dB compared with the antenna without the AMC (Fig. 15). The FBR values reach 20.8 dB, 18.0 dB, and 18.8 dB, corresponding to improvements of 19.8 dB, 16.7 dB, and 12.4 dB relative to the antenna without the AMC (Table 4). 
The AMC reflector reduces the Specific Absorption Rate (SAR), and all SAR values of the integrated antenna are below 0.025 W/kg (averaged over 1 g of tissue) (Table 6), well below the FCC and ETSI limits. Performance is also measured with the antenna placed on the chest, back, and thigh (Fig. 16), confirming safe and flexible on-body use.  Conclusions   A tri-band wearable antenna incorporating an AMC array is developed using a semi-flexible Rogers 4003 substrate. With a 4×5 AMC array integrated behind the antenna, the measured operating bands cover the ISM band (2.40\begin{document}$ \sim $\end{document}2.4835 GHz), the 5G-n78 band (3.3\begin{document}$ \sim $\end{document}3.8 GHz), and the 5G-WiFi 5.8 GHz band (5.725\begin{document}$ \sim $\end{document}5.875 GHz). The results confirm high gain, high FBR, and stable wearable performance suitable for human-worn devices.
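The band-coverage and FBR figures reported above can be cross-checked with simple arithmetic. The sketch uses only numbers stated in the abstract. Fractional bandwidth (bandwidth divided by center frequency) is a standard convention added for context, and the FBR values without the AMC are derived here by subtraction rather than reported.

```python
# Band edges in GHz, copied from the abstract.
measured = [(2.38, 2.52), (3.30, 3.86), (5.54, 7.86)]
targets = {"ISM 2.4 GHz": (2.40, 2.4835), "5G n78": (3.3, 3.8), "5G-WiFi 5.8 GHz": (5.725, 5.875)}

def covers(band, target):
    """True if the target band lies entirely inside the measured band."""
    return band[0] <= target[0] and target[1] <= band[1]

def fractional_bandwidth(band):
    """(f_high - f_low) / f_center."""
    lo, hi = band
    return (hi - lo) / ((hi + lo) / 2)

for band, (name, target) in zip(measured, targets.items()):
    print(f"{name:16s} covered: {covers(band, target)}, "
          f"measured FBW: {100 * fractional_bandwidth(band):.1f}%")

# FBR without the AMC implied by "loaded FBR minus reported improvement"
fbr_loaded = [20.8, 18.0, 18.8]
fbr_improvement = [19.8, 16.7, 12.4]
fbr_unloaded = [round(a - b, 1) for a, b in zip(fbr_loaded, fbr_improvement)]
print("implied FBR without AMC (dB):", fbr_unloaded)
```

The derived values (about 1.0, 1.3, and 6.4 dB) show that the bare monopole radiates almost equally toward the body, which is why the AMC reflector matters for on-body use.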
Auxiliary Screening for Hypertrophic Cardiomyopathy With Heart Failure with Preserved Ejection Fraction Utilizing Smartphone-Acquired Heart Sound Analysis
DONG Xianpeng, MENG Xiangbin, ZHANG Kuo, FANG Guanchen, GAI Weihao, WANG Wenyao, WANG Jingjia, GAO Jun, PAN Junjun, TANG Zhenchao, SONG Zhen
Available online  , doi: 10.11999/JEIT250830
Abstract:
  Objective  Heart Failure with preserved Ejection Fraction (HFpEF) is highly prevalent among patients with Hypertrophic CardioMyopathy (HCM), and early identification is critical for improving disease management. However, early screening for HFpEF remains challenging because symptoms are non-specific, diagnostic procedures are complex, and follow-up costs are high. Smartphones, owing to their wide accessibility, low cost, and portability, provide a feasible means to support heart sound-based screening. In this study, smartphone-acquired heart sounds from patients with HCM are used to develop and train an ensemble learning classification model for early detection and dynamic self-monitoring of HFpEF in the HCM population.  Methods  The proposed HFpEF screening framework consists of three components: preprocessing, feature extraction, and model training and fusion based on ensemble learning (Fig. 1). During preprocessing, smartphone-acquired heart sounds are subjected to bandpass filtering and wavelet denoising to improve signal quality, followed by segmentation into individual cardiac cycles. For feature extraction, Mel-Frequency Cepstral Coefficients (MFCCs) and Short-Time Fourier Transform (STFT) time-frequency spectra are calculated (Fig. 3). For classification, a stacking ensemble strategy is applied. Base learners, including a Support Vector Machine (SVM) and a Convolutional Neural Network (CNN), are trained, and their predicted probabilities are combined to construct a new feature space. A Logistic Regression (LR) meta-learner is then trained on this feature space to identify HFpEF in patients with HCM.  Results and Discussions  The classification performance of the three models is evaluated using the same patient-level independent test set. The SVM base learner achieves an Area Under the Curve (AUC) of 0.800, with an accuracy of 0.766, sensitivity of 0.659, and specificity of 0.865 (Table 5). 
The CNN base learner attains an AUC of 0.850, with an accuracy of 0.789, sensitivity of 0.622, and specificity of 0.944 (Table 5). By comparison, the ensemble-based LR classifier demonstrates superior performance, reaching an AUC of 0.900, with an accuracy of 0.813, sensitivity of 0.768, and specificity of 0.854 (Table 5). Relative to the base learners, the ensemble model exhibits a significant overall performance improvement after probability-based feature fusion (Fig. 5). Compared with existing clinical HFpEF risk scores, the proposed method shows higher predictive performance and stronger dynamic monitoring capability, supporting its suitability for risk stratification and follow-up warning in home settings. Compared with professional heart sound acquisition devices, the smartphone-acquired approach provides greater accessibility and cost efficiency, supporting its application in auxiliary HFpEF screening for high-risk HCM populations.  Conclusions  The challenges of clinical HFpEF screening in patients with HCM are addressed by proposing a smartphone-acquired heart sound analysis approach combined with an ensemble learning prediction model, resulting in an accessible and easily implemented auxiliary screening pipeline. The effectiveness of smartphone-based heart sound analysis for initial HFpEF screening in patients with HCM is validated, demonstrating its feasibility as an economical auxiliary tool for early HFpEF detection. This approach provides a non-invasive, convenient, and efficient screening strategy for patients with HCM complicated by HFpEF.
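The stacking step can be sketched as follows: each base learner's predicted probability becomes one column of a new feature space, and a logistic-regression meta-learner is fitted on it. The probabilities are hypothetical, and the SVM and CNN themselves are not reproduced. In practice the meta-learner is usually fitted on out-of-fold (cross-validated) base-learner predictions to avoid leakage, a detail the abstract does not specify. The metric function uses the usual definitions of accuracy, sensitivity, and specificity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta_lr(features, labels, lr=0.5, epochs=2000):
    """Logistic-regression meta-learner fitted by batch gradient descent on the
    stacked feature space (one column per base learner's predicted probability)."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dLoss/dlogit
            for i in range(d):
                gw[i] += err * x[i] / n
            gb += err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x, threshold=0.5):
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)

def binary_metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# Hypothetical base-learner probabilities (p_svm, p_cnn) and HFpEF labels
X = [(0.9, 0.8), (0.7, 0.9), (0.6, 0.7), (0.4, 0.8),
     (0.3, 0.2), (0.2, 0.4), (0.6, 0.3), (0.1, 0.1)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = train_meta_lr(X, labels)
y_hat = [predict(w, b, x) for x in X]
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print(binary_metrics(labels, y_hat))
```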
Cross Modal Hashing of Medical Image Semantic Mining for Large Language Model
LIU Qinghai, WU Qianlin, LUO Jia, TANG Lun, XU Liming
Available online  , doi: 10.11999/JEIT250529
Abstract:
  Objective  A novel cross-modal hashing framework driven by Large Language Models (LLMs) is proposed to address the semantic misalignment between medical images and their corresponding textual reports. The objective is to enhance cross-modal semantic representation and improve retrieval accuracy by effectively mining and matching semantic associations between modalities.  Methods  The generative capacity of LLMs is first leveraged to produce high-quality textual descriptions of medical images. These descriptions are integrated with diagnostic reports and structured clinical data using a dual-stream semantic enhancement module, designed to reinforce inter-modality alignment and improve semantic comprehension. A structural similarity-guided hashing scheme is then developed to encode both visual and textual features into a unified Hamming space, ensuring semantic consistency and enabling efficient retrieval. To further enhance semantic alignment, a prompt-driven attention template is introduced to fuse image and text features through fine-tuned LLMs. Finally, a contrastive loss function with hard negative mining is employed to improve representation discrimination and retrieval accuracy.  Results and Discussions  Experiments are conducted on a multimodal medical dataset to compare the proposed method with existing cross-modal hashing baselines. The results indicate that the proposed method significantly outperforms baseline models in terms of precision and Mean Average Precision (MAP) (Table 3; Table 4). On average, a 7.21% improvement in retrieval accuracy and a 7.72% increase in MAP are achieved across multiple data scales, confirming the effectiveness of the LLM-driven semantic mining and hashing approach.
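Retrieval in a shared Hamming space and a hard-negative objective can be illustrated with a toy sketch. Sign binarization stands in for the learned hash layer, and a max-violation triplet loss (a hinge against the single most similar negative) is used as one common form of contrastive loss with hard negative mining. The paper's exact loss, the structural-similarity guidance, and the LLM-based fusion are not modeled, and all embeddings are made-up four-dimensional vectors.

```python
import math

def to_hash(vec):
    """Sign binarization: one bit per embedding dimension."""
    return tuple(1 if v >= 0 else 0 for v in vec)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, db_codes, top_k=3):
    """Rank database items by Hamming distance to the query (ties broken by index)."""
    order = sorted(range(len(db_codes)), key=lambda i: (hamming(query_code, db_codes[i]), i))
    return order[:top_k]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def hardest_negative_triplet_loss(anchor, positive, negatives, margin=0.2):
    """Hinge loss against the most similar (hardest) negative only."""
    hardest = max(cosine(anchor, n) for n in negatives)
    return max(0.0, margin - cosine(anchor, positive) + hardest)

# Made-up embeddings: one image query and three report texts (index 0 is the match)
image_query = [0.9, -0.2, 0.4, -0.7]
reports = [[0.8, -0.1, 0.5, -0.6],
           [-0.5, 0.3, 0.2, 0.9],
           [0.7, 0.4, -0.3, -0.8]]

codes = [to_hash(r) for r in reports]
print("ranking:", retrieve(to_hash(image_query), codes))
for margin in (0.2, 0.5):
    loss = hardest_negative_triplet_loss(image_query, reports[0], reports[1:], margin=margin)
    print(f"loss (margin {margin}): {loss:.3f}")
```

With margin 0.2 the matching report already beats the hardest negative by enough, so the loss is zero; with margin 0.5 the same pair produces a positive loss, which is the signal that pushes hard negatives further away during training.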
Comparison of DeepSeek-V3.1 and ChatGPT-5 in Multidisciplinary Team Decision-making for Colorectal Liver Metastases
ZHANG Yangzi, XU Ting, GAO Zhaoya, SI Zhenduo, XU Weiran
Available online  , doi: 10.11999/JEIT250849
Abstract:
  Objective   ColoRectal Cancer (CRC) is the third most commonly diagnosed malignancy worldwide. Approximately 25% to 50% of patients with CRC develop liver metastases during the course of their disease, which increases the disease burden. Although the MultiDisciplinary Team (MDT) model improves survival in ColoRectal Liver Metastases (CRLM), its broader implementation is limited by delayed knowledge updates and regional differences in medical standards. Large Language Models (LLMs) can integrate multimodal data, clinical guidelines, and recent research findings, and can generate structured diagnostic and therapeutic recommendations. These features suggest a potential role in supporting MDT-based care. However, the actual effectiveness of LLMs in MDT decision-making for CRLM has not been systematically evaluated. This study assesses the performance of DeepSeek-V3.1 and ChatGPT-5 in supporting MDT decisions for CRLM and examines the consistency of their recommendations with MDT expert consensus. The findings provide evidence-based guidance and identify directions for optimizing LLM applications in clinical practice.  Methods   Six representative virtual CRLM cases are designed to capture key clinical dimensions, including colorectal tumor recurrence risk, resectability of liver metastases, genetic mutation profiles (e.g., KRAS/BRAF mutations, HER2 amplification status, and microsatellite instability), and patient functional status. Using a structured prompt strategy, MDT treatment recommendations are generated separately by the DeepSeek-V3.1 and ChatGPT-5 models. Independent evaluations are conducted by four MDT specialists from gastrointestinal oncology, gastrointestinal surgery, hepatobiliary surgery, and radiation oncology. The model outputs are scored using a 5-point Likert scale across seven dimensions: accuracy, comprehensiveness, frontier relevance, clarity, individualization, hallucination risk, and ethical safety. 
Statistical analysis is performed to compare the performance of DeepSeek-V3.1 and ChatGPT-5 across individual cases, evaluation dimensions, and clinical disciplines.  Results and Discussions   Both LLMs, DeepSeek-V3.1 and ChatGPT-5, show robust performance across all six virtual CRLM cases, with an average overall score of ≥ 4.0 on a 5-point scale, indicating that both provide clinically acceptable decision support within a complex MDT framework. DeepSeek-V3.1 shows superior overall performance compared with ChatGPT-5 (4.27±0.77 vs. 4.08±0.86, P=0.03). Case-by-case analysis shows that DeepSeek-V3.1 performs significantly better in Cases 1, 4, and 6 (P=0.04, P<0.01, and P=0.01, respectively), whereas ChatGPT-5 receives higher scores in Case 2 (P<0.01), and no significant differences are observed in Cases 3 and 5 (P=0.12 and P=1.00, respectively). Together, these case-level results suggest complementary strengths across clinical scenarios (Table 3). In the multidimensional assessment, both models receive high scores (range: 4.12\begin{document}$ \sim $\end{document}4.87) in clarity, individualization, hallucination risk, and ethical safety, indicating that their recommendations are readable, patient-tailored, reliable, and ethically sound. Improvements are still needed in accuracy, comprehensiveness, and frontier relevance (Fig. 1). DeepSeek-V3.1 shows a significant advantage in frontier relevance (3.90±0.65 vs. 3.24±0.72, P=0.03) and ethical safety (4.87±0.34 vs. 4.58±0.65, P=0.03) (Table 4), indicating more effective incorporation of recent evidence and more consistent delivery of ethically robust guidance. For the case with concomitant BRAF V600E and KRAS G12D mutations, DeepSeek-V3.1 accurately references a phase III randomized controlled study published in the New England Journal of Medicine in 2025 and recommends a triple regimen consisting of a BRAF inhibitor + EGFR monoclonal antibody + FOLFOX. 
By contrast, ChatGPT-5 follows conventional recommendations for RAS/BRAF mutant populations (FOLFOXIRI+bevacizumab) without integrating recent evidence on targeted combination therapy. This difference shows the effect of timely knowledge updates on the clinical value of LLM-generated recommendations. For MSI-H CRLM, ChatGPT-5’s recommendation of “postoperative immunotherapy” is not supported by phase III evidence or existing guidelines. Direct use of such recommendations may lead to overtreatment or ineffective therapy, representing a clear ethical concern and illustrating hallucination risks in LLMs. Discipline-specific analysis shows notable variation. In radiation oncology, DeepSeek-V3.1 provides significantly more precise guidance on treatment timing, dosage, and techniques than ChatGPT-5 (4.55±0.67 vs. 3.38±0.91, P<0.01), demonstrating closer alignment with clinical guidelines. In contrast, ChatGPT-5 performs better in gastrointestinal surgery (4.48±0.67 vs. 4.17±0.85, P=0.02), with experts rating its recommendations on surgical timing and resectability as more concise and accurate. No significant differences are identified in gastrointestinal oncology and hepatobiliary surgery (P=0.89 and P=0.14, respectively), indicating comparable performance in these areas (Table 5). These findings reveal performance differences across medical subspecialties, suggesting that LLM effectiveness may depend on the distribution and quality of training data.  Conclusions   Both DeepSeek-V3.1 and ChatGPT-5 demonstrate strong capabilities in providing reliable recommendations for CRLM-MDT decision-making. Specifically, DeepSeek-V3.1 shows notable advantages in integrating cutting-edge knowledge, ensuring ethical safety, and supporting radiation oncology decisions, whereas ChatGPT-5 excels in gastrointestinal surgery, reflecting complementary strengths of the two models. 
This study confirms the feasibility of leveraging LLMs as “MDT collaborators”, offering a readily applicable and robust technical solution to bridge regional disparities in clinical expertise and enhance the efficiency of decision-making. However, model hallucination and insufficient evidence grading remain key limitations. Moving forward, mechanisms such as real-world clinical validation, evidence traceability, and reinforcement learning from human feedback are expected to further advance LLMs into more powerful auxiliary tools for CRLM-MDT decision support.
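The abstract reports P values but does not name the statistical test. For paired Likert ratings (the same items scored for both models), one distribution-free option is an exact sign-flip permutation test, sketched below with hypothetical scores; it is not necessarily the test used in the study.

```python
from itertools import product
from statistics import mean, stdev

def paired_sign_flip_p(a, b):
    """Exact two-sided paired permutation test on the summed (equivalently, mean)
    difference: under H0 each paired difference is equally likely to have either
    sign, so all 2^n sign patterns are enumerated."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    patterns = list(product((1, -1), repeat=len(diffs)))
    extreme = sum(abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
                  for signs in patterns)
    return extreme / len(patterns)

# Hypothetical 5-point Likert ratings of the same 10 items for two models
model_a = [5, 4, 5, 4, 4, 5, 3, 5, 4, 5]
model_b = [4, 4, 4, 4, 3, 5, 3, 4, 4, 4]
print(f"A: {mean(model_a):.2f}±{stdev(model_a):.2f}  B: {mean(model_b):.2f}±{stdev(model_b):.2f}")
print("exact paired permutation p =", paired_sign_flip_p(model_a, model_b))
```

Exact enumeration is practical only for small numbers of pairs (2^n patterns); for larger samples a random subset of sign patterns, or a Wilcoxon signed-rank test, is the usual substitute.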
Research on ECG Pathological Signal Classification Empowered by Diffusion Generative Data
GE Beining, CHEN Nuo, JIN Peng, SU Xin, LU Xiaochun
Available online  , doi: 10.11999/JEIT250404
Abstract:
  Objective  ElectroCardioGram (ECG) signals are key indicators of human health. However, their complex composition and diverse features make visual recognition prone to errors. This study proposes a classification algorithm for ECG pathological signals based on data generation. A Diffusion Generative Network (DGN), also known as a diffusion model, progressively adds noise to real ECG signals until they approach a noise distribution and learns to reverse this process, so that new ECG samples can be generated from noise. To improve generation speed and reduce memory usage, a Knowledge Distillation-Diffusion Generative Network (KD-DGN) is proposed, which demonstrates superior memory efficiency and generation performance compared with the traditional DGN. This work compares the memory usage, generation efficiency, and classification accuracy of DGN and KD-DGN, and analyzes the characteristics of the generated data after lightweight processing. In addition, classification performance on the original MIT-BIH dataset and on an extended dataset (MIT-BIH-PLUS) is evaluated. Experimental results show that convolutional networks extract richer feature information from the extended dataset generated by DGN, leading to improved recognition performance of ECG pathological signals.  Methods  The generative network-based ECG signal generation algorithm is designed to enhance the performance of convolutional networks in ECG signal classification. The process begins with a Gaussian noise-based image perturbation algorithm, which obscures the original ECG data by introducing controlled randomness. This step simulates real-world variability, enabling the model to learn more robust representations. A diffusion generative algorithm is then applied to reconstruct and reproduce the data, generating synthetic ECG signals that preserve the essential characteristics of the original categories despite the added noise. 
This reconstruction ensures that the underlying features of ECG signals are retained, allowing the convolutional network to extract more informative features during classification. To improve efficiency, the approach incorporates knowledge distillation. A teacher-student framework is adopted in which a lightweight student model is trained to reproduce the behavior of the original, more complex teacher model for ECG data generation. This strategy reduces computational requirements and accelerates the data generation process, improving suitability for practical applications. Finally, two comparative experiments are designed to validate the effectiveness and accuracy of the proposed method. These experiments evaluate classification performance against existing approaches and provide quantitative evidence of its advantages in ECG signal processing.  Results and Discussions  The data generation algorithm yields ECG signals with a Signal-to-Noise Ratio (SNR) comparable to that of the original data, while presenting more discernible signal features. The student model constructed through knowledge distillation produces ECG samples with the same SNR as those generated by the teacher model, but with substantially reduced complexity. Specifically, the student model achieves a 50% reduction in size, 37.5% lower memory usage, and a 57% shorter runtime compared with the teacher model (Fig. 6). When the convolutional network is trained with data generated by the KD-DGN, its classification performance improves across all metrics compared with a convolutional network trained without KD-DGN-generated data. Precision reaches 95.7%, and the misidentification rate is reduced to approximately 3% (Fig. 9).  Conclusions  The DGN provides an effective data generation strategy for addressing the scarcity of ECG datasets. By supplying additional synthetic data, it enables convolutional networks to extract more diverse class-specific features, thereby improving recognition performance and reducing misidentification rates. 
Optimizing DGN with knowledge distillation further improves efficiency while maintaining SNR equivalence with the original DGN. This optimization reduces computational cost, conserves machine resources, and supports simultaneous task execution. Moreover, it allows new data to be generated without loss of SNR, allowing convolutional networks to learn from larger datasets at lower cost. Overall, the proposed approach markedly improves the classification performance of convolutional networks on ECG signals. Future work will focus on further algorithmic optimization for real-world applications.
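The forward noising process mentioned in the Objective and Methods has a standard closed form in DDPM-style diffusion models: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε, with ᾱ_t the running product of (1 - β_s). The sketch below applies it to a synthetic ECG-like beat. The linear β schedule, T = 1000, and the Gaussian-bump waveform are assumptions rather than values from the paper, and neither the learned reverse (denoising) network nor the teacher-student distillation is shown.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def cumulative_alpha_bar(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x0]

def snr_db(alpha_bar_t):
    """SNR of x_t for a unit-power clean signal: alpha_bar / (1 - alpha_bar), in dB."""
    return 10.0 * math.log10(alpha_bar_t / (1.0 - alpha_bar_t))

# Toy ECG-like beat: Gaussian bumps standing in for P, Q, R and T waves (assumed shape)
L = 200
beat = [0.15 * math.exp(-((i - 50) / 8) ** 2) + 1.0 * math.exp(-((i - 100) / 3) ** 2)
        - 0.2 * math.exp(-((i - 92) / 3) ** 2) + 0.3 * math.exp(-((i - 150) / 12) ** 2)
        for i in range(L)]
mu = sum(beat) / L
sd = math.sqrt(sum((v - mu) ** 2 for v in beat) / L)
x0 = [(v - mu) / sd for v in beat]        # zero mean, unit power

T = 1000
alpha_bar = cumulative_alpha_bar(linear_beta_schedule(T))
rng = random.Random(0)
for t in (0, 250, 500, T - 1):
    xt = q_sample(x0, t, alpha_bar, rng)
    power = sum(v * v for v in xt) / L    # stays near 1: signal shrinks as noise grows
    print(f"t={t:4d}  alpha_bar={alpha_bar[t]:.2e}  SNR={snr_db(alpha_bar[t]):6.1f} dB  "
          f"sample power={power:.2f}")
```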
A Study on Lightweight Method of TCM Structured Large Model Based on Memory-Constrained Pruning
LU Jiafa, TANG Kai, ZHANG Guoming, YU Xiaofan, GU Wenqi, LI Zhuo
Available online  , doi: 10.11999/JEIT250909
Abstract:
  Objective  The structuring of Traditional Chinese Medicine (TCM) Electronic Medical Records (EMRs) is essential for knowledge discovery, clinical decision support, and intelligent diagnosis. However, two major barriers remain. First, TCM EMRs are primarily unstructured free text and often paired with tongue images, which complicates automated processing. Second, grassroots hospitals usually have limited GPU resources, which restricts the deployment of large pretrained models. This study aims to address these challenges by proposing a lightweight multimodal model based on memory-constrained pruning. The method is designed to preserve near–state-of-the-art accuracy while sharply reducing memory consumption and computation cost, ensuring practical use in resource-limited healthcare settings.  Methods  A three-stage architecture is used, comprising an encoder, a multimodal fusion module, and a decoder. For text, a distilled TinyBERT encoder is combined with a BiLSTM-CRF decoder to extract 23 categories of TCM clinical entities, including symptoms, syndromes, prescriptions, and herbs. For images, a ResNet-50 encoder processes tongue diagnosis photographs. A memory-constrained pruning strategy is introduced in which an LSTM decision network observes convolutional feature maps and adaptively prunes redundant channels while retaining key diagnostic information. Gradient reparameterization and dynamic channel grouping improve pruning flexibility, and a reinforcement-learning controller stabilizes training. INT8 mixed-precision quantization, gradient accumulation, and Dynamic Batch Pruning (DBP) further reduce memory usage. A TCM terminology-enhanced lexicon is integrated into the encoder embeddings to improve recognition of rare entities. The system is trained end-to-end on paired EMR–tongue datasets (Fig. 1) to optimize multimodal information flow.  
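The abstract does not specify how the LSTM decision network trades accuracy against memory. The sketch below is a simpler, clearly different baseline that conveys the "memory-constrained" idea. It ranks channels by importance per unit memory and keeps them greedily until a hypothetical memory budget is filled. The inputs `importance`, `mem_cost`, and `budget` are illustrative assumptions, and the greedy rule is a swapped-in heuristic rather than the paper's learned controller.

```python
def select_channels(importance, mem_cost, budget):
    """Greedy stand-in for memory-constrained channel pruning (hypothetical).

    Keeps channels in descending order of importance per unit memory
    while the running memory total stays within `budget`.
    Returns the sorted indices of the kept channels.
    """
    if any(c <= 0 for c in mem_cost):
        raise ValueError("memory costs must be positive")
    order = sorted(range(len(importance)),
                   key=lambda i: importance[i] / mem_cost[i], reverse=True)
    kept, used = [], 0.0
    for i in order:
        if used + mem_cost[i] <= budget:
            kept.append(i)
            used += mem_cost[i]
    return sorted(kept)
```

For example, `select_channels([0.9, 0.1, 0.5, 0.7], [2, 1, 1, 2], 3)` keeps channels 0 and 2.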
Results and Discussions  Experiments are performed on 10,500 de-identified EMRs paired with tongue images from 21 tertiary hospitals. On an RTX 3060 GPU, the model achieves an F1-score of 91.7%, reduces peak GPU memory to 3.8 GB, and reaches an inference speed of 22 records per second (Table 1). Compared with BERT-Large, memory consumption decreases by 75%, throughput increases 2.7×, and accuracy remains comparable. Ablation studies confirm the contributions of each component. The adaptive attention gating mechanism increases F1 by 2.8% (Table 2). DBP reduces memory usage by 40%–62% with minimal accuracy loss and improves performance on EMRs exceeding 5,000 characters (Fig. 2). The terminology-enhanced lexicon improves recognition of rare entities such as “blood stasis” by 6.2%. Structured EMR fields also support association rule mining, and the confidence of syndrome–symptom relationships increases by 18% (Algorithm 1). These findings highlight three observations: (1) multimodal fusion with lightweight design provides clinical advantages over unimodal models; (2) memory-constrained pruning achieves stable channel reduction under strict hardware limits and outperforms magnitude-based pruning; and (3) pruning, quantization, and dynamic batching show strong synergy when jointly designed. The results support the feasibility of deploying high-performing TCM EMR structuring systems in real-world environments with limited computational capacity.  Conclusions  This study proposes a lightweight multimodal framework for structuring TCM EMRs. Memory-constrained pruning, combined with quantization and DBP, substantially compresses the visual encoder while maintaining text–image fusion accuracy. The approach reaches near–state-of-the-art performance with sharply reduced hardware requirements, enabling deployment in regional hospitals and clinics. 
Beyond efficiency gains, the structured multimodal outputs enhance TCM knowledge graphs and improve downstream tasks such as syndrome classification and treatment recommendation. The framework narrows the gap between powerful pretrained models and limited hardware resources in grassroots institutions and provides a scalable direction for lightweight multimodal NLP in medical informatics. Future work includes integrating modalities such as pulse-wave signals, extending pruning strategies with graph neural networks, and exploring adaptive cross-modal attention to strengthen clinical applicability.
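Algorithm 1 is not reproduced in the abstract, but the confidence it reports is the standard association-rule measure conf(A → B) = support(A ∪ B) / support(A). The minimal sketch below computes it over hypothetical structured EMR records, each represented as a set of extracted syndrome and symptom entities.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    if not records:
        return 0.0
    items = set(items)
    return sum(1 for r in records if items <= set(r)) / len(records)


def confidence(records, antecedent, consequent):
    """conf(A -> B) = support(A | B) / support(A); 0.0 if A never occurs."""
    s_a = support(records, antecedent)
    if s_a == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / s_a


# Hypothetical structured records (syndrome and symptom entities per visit).
records = [
    {"qi deficiency", "fatigue", "pale tongue"},
    {"qi deficiency", "fatigue"},
    {"qi deficiency", "insomnia"},
    {"blood stasis", "purple tongue"},
]
```

Here the rule "qi deficiency → fatigue" has confidence 2/3, because two of the three records containing the syndrome also list the symptom.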
Unsupervised 3D Medical Image Segmentation With Sparse Radiation Measurement
YU Xiaofan, ZOU Lanlan, GU Wenqi, CAI Jun, KANG Bin, DING Kang
Available online  , doi: 10.11999/JEIT250841
Abstract:
  Objective  Three-dimensional medical image segmentation is a central task in medical image analysis. Compared with two-dimensional imaging, it captures organ and lesion morphology more completely and provides detailed structural information, supporting early disease screening, personalized surgical planning, and treatment assessment. With advances in artificial intelligence, three-dimensional segmentation is viewed as a key technique for diagnostic support, precision therapy, and intraoperative navigation. However, methods such as SwinUNETR-v2 and UNETR++ depend on extensive voxel-level annotations, which create high annotation costs and restrict clinical use. High-quality segmentation also often requires multi-view projections to recover full volumetric information, increasing radiation exposure and patient burden. Segmentation under sparse radiation measurements is therefore an important challenge. Neural Attenuation Fields (NAF) have recently been introduced for low-dose reconstruction by recovering linear attenuation coefficient fields from sparse views, yet their suitability for three-dimensional segmentation remains insufficiently examined. To address this limitation, a unified framework termed NA-SAM3D is proposed, integrating NAF-based reconstruction with interactive segmentation to enable unsupervised three-dimensional segmentation under sparse-view conditions, reduce annotation dependence, and improve boundary perception.  Methods  The framework is designed in two stages. In the first stage, sparse-view reconstruction is performed with NAF to generate a continuous three-dimensional attenuation coefficient tensor from sparse X-ray projections. Ray sampling and positional encoding are applied to arbitrary three-dimensional points, and the encoded features are forwarded to a Multi-Layer Perceptron (MLP) to predict linear attenuation coefficients that serve as input for segmentation. In the second stage, interactive segmentation is performed. 
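The first stage can be illustrated with the two quantities a neural attenuation field works with:

- an encoding of a 3D sample point;
- the discrete line integral of attenuation along a ray, which under the Beer-Lambert law equals the log-attenuation -ln(I/I0) recorded by one projection pixel.

The following are all assumptions for illustration:

- the sinusoidal encoding (NAF implementations may use other encoders);
- the midpoint sampling rule;
- the toy sphere field standing in for the MLP.

```python
import math


def positional_encoding(p, num_freqs=4):
    """Sinusoidal encoding: sin/cos(2^k * pi * x) for each coordinate and frequency."""
    feats = []
    for k in range(num_freqs):
        w = (2.0 ** k) * math.pi
        for x in p:
            feats.append(math.sin(w * x))
            feats.append(math.cos(w * x))
    return feats


def ray_projection(origin, direction, near, far, n_samples, mu_fn):
    """Midpoint-rule line integral of attenuation along a ray.

    With I = I0 * exp(-integral), the result is the log-attenuation -ln(I / I0).
    In NAF, mu_fn would be the MLP applied to the encoded sample point.
    """
    step = (far - near) / n_samples
    total = 0.0
    for i in range(n_samples):
        t = near + (i + 0.5) * step
        point = [o + t * d for o, d in zip(origin, direction)]
        total += mu_fn(point) * step
    return total


def toy_sphere_mu(point, radius=1.0, mu=0.5):
    """Hypothetical attenuation field: constant mu inside a sphere, zero outside."""
    return mu if sum(x * x for x in point) <= radius ** 2 else 0.0
```

A ray through the centre of the unit sphere crosses 2 units of material with mu = 0.5, so its projection value is approximately 1.0.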
A three-dimensional image encoder extracts high-dimensional features from the attenuation coefficient tensor, and clinician-provided point prompts specify regions of interest. These prompts are embedded into semantic features by an interactive user module and fused with image features to guide the mask decoder in producing initial masks. Because point prompts provide only local positional cues, boundary ambiguity and mask expansion may occur. To address these issues, a Density-Guided Module (DGM) is introduced at the decoder output stage. NAF-derived attenuation coefficients are transformed into a density-aware attention map, which is fused with the initial masks to strengthen tissue-boundary perception and improve segmentation accuracy in complex anatomical regions.  Results and Discussions  NA-SAM3D is evaluated on a self-constructed colorectal cancer dataset comprising 299 patient cases (collected in collaboration with Nanjing Hospital of Traditional Chinese Medicine) and on two public benchmarks: the Lung CT Segmentation Challenge (LCTSC) and the Liver Tumor Segmentation Challenge (LiTS). The results show that NA-SAM3D generally outperforms mainstream unsupervised three-dimensional segmentation methods that rely on full radiation observation (the SAM-MED series). It also reaches accuracy comparable to, or in some cases higher than, that of the fully supervised SwinUNETR-v2. Compared with SAM-MED3D, NA-SAM3D increases the Dice score on the LCTSC dataset by more than 3%, while HD95 and ASD decrease by 5.29 mm and 1.32 mm, respectively, indicating improved boundary localization and surface consistency. Compared with the sparse-field-based method SA3D, NA-SAM3D achieves higher Dice scores on all three datasets (Table 1). Compared with the fully supervised SwinUNETR-v2, NA-SAM3D reduces HD95 by 1.28 mm, while its average Dice score is only 0.3% lower. 
Compared with SA3D, NA-SAM3D increases the average Dice by about 6.6% and reduces HD95 by about 11 mm, further confirming its capacity to restore structural details and boundary information under sparse-view conditions (Table 2). Although the overall performance remains slightly lower than that of the fully supervised UNETR++ model, NA-SAM3D still shows strong competitiveness and good generalization under label-free inference. Qualitative analysis shows that in complex pelvic and intestinal regions, NA-SAM3D produces clearer boundaries and higher contour consistency (Fig. 3). On public datasets, segmentation of the lung and liver also shows superior boundary localization and contour integrity (Fig. 4). Three-dimensional visualization further confirms that in colorectal, lung, and liver regions, NA-SAM3D achieves stronger structural continuity and boundary preservation than SAM-MED2D and SAM-MED3D (Fig. 5). The DGM further enhances boundary sensitivity, increasing Dice and mIoU by 1.20% and 3.31% on the self-constructed dataset, and by 4.49 and 2.39 percentage points on the LiTS dataset (Fig. 6).  Conclusions  An unsupervised three-dimensional medical image segmentation framework, NA-SAM3D, is presented, integrating NAF-based reconstruction with interactive segmentation to achieve high-precision segmentation under sparse radiation measurements. The DGM effectively uses attenuation coefficient priors to enhance boundary recognition in complex lesion regions. Experimental results show that the framework approaches the performance of fully supervised methods under unsupervised inference and yields an average Dice improvement of 2.0%, indicating strong practical value and clinical potential for low-dose imaging and complex anatomical segmentation. Future work will refine the model for additional anatomical regions and assess its practical use in preoperative planning.
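The DGM is described only at a high level. The sketch below is one hypothetical way to turn attenuation coefficients into a density-aware attention map. Voxels whose attenuation is close to the value at the clinician's point prompt keep their decoder probability, while voxels of clearly different density are suppressed. This counters the mask expansion described in the Methods.

The following are assumptions:

- the Gaussian weighting;
- the width `sigma`;
- the 0.5 threshold;
- representing the volume as a flattened voxel list.

```python
import math


def density_attention(mu, prompt_index, sigma):
    """Hypothetical density-aware attention over a flattened voxel list.

    Weight is 1 at the prompt's attenuation value and decays as a Gaussian
    in the attenuation difference, so voxels of different tissue density are damped.
    """
    mu_ref = mu[prompt_index]
    return [math.exp(-((m - mu_ref) ** 2) / (2 * sigma ** 2)) for m in mu]


def fuse(prob, attention, threshold=0.5):
    """Multiply decoder probabilities by the attention map and re-binarize."""
    return [1 if p * a >= threshold else 0 for p, a in zip(prob, attention)]
```

With attenuations `[0.20, 0.21, 0.19, 0.05, 0.04]` and a prompt on the first voxel, the two low-density voxels are removed even though the decoder assigned them probabilities above 0.5.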
Dynamic Wavelet Multi-Directional Perception and Geometry Axis-Solution Guided 3D CT Fracture Image Segmentation
ZHANG Yinhui, LIU Kai, HE Zifen, ZHANG Jinkai, CHEN Guangchen, MA Zhijian
Available online  , doi: 10.11999/JEIT250732
Abstract:
  Objective  Accurate segmentation of fracture surfaces in three-dimensional computed tomography (3D CT) images is essential for orthopedic surgical planning, particularly for determining nail insertion angles perpendicular to fracture planes. However, existing approaches have three major limitations: limited capture of deep global volumetric context, directional texture ambiguity in low-contrast fracture regions, and insufficient decoding of geometric features. To address these limitations, a Dynamic Wavelet Multi-Directional Perception and Geometry Axis-Solution Guided Network (DWAG-Net) is proposed to improve segmentation accuracy for complex tibial fractures and to provide reliable 3D digital guidance for preoperative planning.  Methods  The proposed architecture extends 3D nnU-Netv2 through three core components. First, a Dynamic Multi-View Aggregation (DMVA) module adaptively fuses tri-planar views (axial, sagittal, and coronal) with full-volume features using learnable parameter interpolation with an optimized kernel size of 2×2×2 and a channel-wise Hadamard product, thereby strengthening global context representation. Second, a Wavelet Direction Perception Enhancement (WDPE) module applies a 3D Symlets discrete wavelet transform to decompose inputs into eight subbands, followed by direction-specific enhancement. Adaptive convolutional kernels (e.g., [5, 3, 3] for depth-dominant fractures) reinforce texture information in high-frequency subbands, whereas cross-subband fusion integrates complementary features. Third, a Geometry Axis-Solution Guided (GASG) module is embedded in the decoder to maintain anatomical consistency. It constructs axis-level affinity maps along depth, height, and width that combine geometric similarity with spatial distance decay, and it refines boundary delineation using rotational positional encoding and multi-axis attention. 
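The GASG affinity described above can be sketched for a single axis. For each pair of positions along the axis, the sketch multiplies two factors:

- the cosine similarity between their feature vectors, standing in for "geometric similarity";
- an exponential decay in their index distance along the axis.

The following are assumptions:

- the cosine measure;
- the decay form exp(-|i - j| / tau);
- clipping negative affinities;
- the row-normalized aggregation.

The actual module also uses rotational positional encoding and multi-axis attention, which are omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def axis_affinity(features, tau=2.0):
    """Affinity along one axis: similarity damped by exp(-|i - j| / tau)."""
    n = len(features)
    return [[cosine(features[i], features[j]) * math.exp(-abs(i - j) / tau)
             for j in range(n)] for i in range(n)]


def aggregate(features, tau=2.0):
    """Row-normalized, affinity-weighted average of features along the axis."""
    A = axis_affinity(features, tau)
    out = []
    for i, row in enumerate(A):
        w = [max(a, 0.0) for a in row]          # ignore negative affinities
        s = sum(w)
        if s == 0:
            out.append(list(features[i]))       # nothing to mix with
            continue
        out.append([sum(wj * f[c] for wj, f in zip(w, features)) / s
                    for c in range(len(features[0]))])
    return out
```

Applying `aggregate` successively along the depth, height, and width axes mirrors the sequential axis order that the ablation reports as best.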
The network is trained on the YN-TFS dataset, which contains 110 tibial fracture CT scans with spatial resolutions ranging from 0.39 to 1.00 mm. Stochastic gradient descent is used with a learning rate of 0.01 and a momentum of 0.99. A class-weighted loss function with weights of 0.5 for background, 1 for bone, and 5 for fracture is adopted to address severe voxel-level class imbalance.  Results and Discussions  DWAG-Net achieves state-of-the-art performance, with a mean Dice score of 71.20% (Table 1), exceeding that of nnU-Netv2 by 5.06%. For fracture surfaces, the Dice score reaches 69.48%, an improvement of 7.12%. Boundary accuracy also improves substantially, with a mean 95th percentile Hausdorff distance (HD95) of 1.38 mm and a fracture surface HD95 of 1.54 mm, a reduction of 3.70 mm. Ablation studies (Table 2) confirm the contribution of each component. DMVA increases the Dice score by 2.40% through adaptive multi-view fusion. WDPE reduces directional ambiguity and yields a 5.84% gain in fracture surface Dice. GASG provides an additional 1.20% improvement by enforcing geometric consistency. Optimal performance is obtained with a DMVA kernel size of 2×2×2, Symlets wavelets, and sequential axis processing in the order depth, height, width. Qualitative comparisons indicate that DWAG-Net preserves fracture continuity in cases where U-Mamba and nnWNet fail, and reduces over-segmentation relative to nnFormer and UNETR++ (Fig. 4).  Conclusions  DWAG-Net establishes a state-of-the-art framework for 3D fracture segmentation by integrating multi-directional wavelet perception with geometry-guided decoding. The coordinated use of DMVA, directional texture enhancement, and geometry axis-solution guidance achieves clinically relevant precision, with a Dice score of 71.20% and an HD95 of 1.38 mm. These results support accurate, data-driven surgical planning. 
Future work will focus on refining loss design to further mitigate severe class imbalance.
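The class weights above (0.5 background, 1 bone, 5 fracture) can be illustrated with a weighted cross-entropy. The abstract does not name the exact loss form, so this sketch assumes per-voxel weighted cross-entropy. It normalizes by the summed weights of the true classes, which is the convention used by PyTorch's `CrossEntropyLoss` with `weight` and mean reduction.

```python
import math

# Class weights from the abstract: 0 = background, 1 = bone, 2 = fracture.
CLASS_WEIGHTS = {0: 0.5, 1: 1.0, 2: 5.0}


def weighted_cross_entropy(probs, labels, weights=CLASS_WEIGHTS):
    """Weighted mean of -log p(true class) over voxels.

    probs: per-voxel lists of class probabilities; labels: true class ids.
    Dividing by the summed weights gives a weighted mean rather than a sum.
    """
    num, den = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        num += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        den += w
    return num / den
```

With these weights, a poorly predicted fracture voxel contributes ten times as much as an equally poor background voxel. This keeps the sparse fracture class from being swamped.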
An Optimized Multi-Layer Equivalent Source Method for Spatial Continuation of Magnetic Anomalies in the Geomagnetic Background
GUAN Yu, ZHANG Huiqiang
Available online  , doi: 10.11999/JEIT250958
Abstract:
  Objective  Spatial continuation of magnetic anomalies is a key technique in potential field data processing and supports geological interpretation and geomagnetic navigation. Existing methods remain limited: frequency-domain approaches are severely ill-posed and amplify high-frequency noise during downward continuation, whereas traditional single-layer equivalent source methods often fail to fit multi-scale anomalies generated by sources at different depths. Although the Multilayer Equivalent Source (MES) model improves depth resolution, its performance is constrained by subjective parameter selection and instability in large-scale inversion, which can lead to the loss of high-frequency structural information. This study proposes an optimized MES method for high-precision continuation in complex geological environments. The method establishes an objective parameterization scheme by combining Radially Averaged Power Spectrum (RAPS) analysis with Variational Mode Decomposition (VMD) to separate sources. It also introduces a collaborative inversion scheme based on the Fungal Growth Optimizer (FGO) and the Preconditioned Conjugate Gradient (PCG) method to adaptively optimize regularization parameters, suppress ill-posedness, and improve reconstruction robustness under noise.  Methods  A four-step technical framework is developed. (1) Model construction: An MES model is formed using uniformly magnetized rectangular prisms to represent subsurface sources. (2) Parameter configuration: An objective scheme combining RAPS and VMD is applied. RAPS estimates average source-layer depths from slope variations in the logarithmic power spectrum. VMD then decomposes the magnetic signal into intrinsic mode functions representing different depths, enabling layer thickness to be calculated from the ratio of the Mean Total Horizontal Gradient (MTHD). (3) Collaborative inversion: A robust inversion strategy incorporates FGO into the PCG algorithm. 
Tikhonov regularization forms the objective function to mitigate ill-posedness, and FGO adaptively searches for optimal hyperparameters, including the regularization parameter, step-size scaling factor, and preconditioner weights, improving solution stability and convergence efficiency. (4) Comprehensive validation: Three evaluations are conducted. A five-prism theoretical model is used to benchmark performance against single-layer, double-layer, and frequency-domain methods. The global EMAG2 magnetic anomaly model with 5% Gaussian noise is used to assess robustness. Finally, real aeromagnetic data from the Australian magnetic anomaly grid are tested in two sub-regions, a complex tectonic zone (Area A) and a sedimentary basin (Area B), for downward continuation from 2 000 m to 0 m, with the root-mean-square error (RMSE) and goodness of fit (GOF) as indicators.  Results and Discussions  The performance of the proposed method is validated in three stages. (1) Theoretical model verification: The radially averaged logarithmic power spectrum (Fig. 3) and VMD analysis (Fig. 4) identify three equivalent source layers, demonstrating the objectivity of the parameter configuration framework. The FGO-optimized inversion converges approximately 5 to 6 times faster and reduces the residual norm by 13% compared with the traditional Conjugate Gradient (CG) method (Fig. 7). In the 100 m upward continuation (Fig. 8, Table 4) and downward continuation (Fig. 9, Table 5) tests, the proposed method attains the lowest RMSE and highest GOF, overcoming the ill-posedness of frequency-domain methods and the large fitting errors of single- and double-layer models. (2) Robustness analysis: Using the EMAG2 data (Fig. 10), the method demonstrates strong noise resistance. With 5% Gaussian noise added to the 1 000 m observation data, the downward continuation results remain stable and free of noticeable artifacts. 
Quantitative evaluation (Table 6) yields an RMSE of 7.36 nT and a GOF of 82.65%, confirming robustness under low signal-to-noise conditions. (3) Generalization verification: The method is applied to two geologically distinct regions of the Australian magnetic anomaly grid (Fig. 11, Fig. 12). In Area B (sedimentary basin), which has smooth gradients, the method achieves high-fidelity reconstruction with a GOF of 84.28% and an RMSE of 29.06 nT. In Area A (complex tectonic zone), despite the exponential decay of high-frequency signals, the method recovers key structural features (GOF = 76.14%), although localized residuals appear in high-gradient areas because of physical limits in field transformation. These findings support the method’s applicability across varied geological settings.  Conclusions  This study proposes a robust spatial continuation method for magnetic anomalies based on an optimized MES framework. By integrating RAPS analysis with VMD, the method establishes an objective parameterization scheme that reduces subjectivity in model construction. Incorporating FGO into the inversion algorithm improves convergence speed and stability, mitigating the ill-posedness inherent in downward continuation. Experimental results show that: (1) the method is robust, maintaining high signal fidelity under 5% Gaussian noise, as confirmed by the EMAG2 model tests; and (2) the method has broad geological applicability. On real Australian aeromagnetic grid data, it achieves high-precision reconstruction in deep sedimentary basins (Area B) and recovers major structural features in complex tectonic zones (Area A), outperforming traditional single-layer and frequency-domain methods. A remaining limitation is the high memory demand of storing large dense kernel matrices. Future work will explore matrix compression or matrix-free inversion strategies to improve computational efficiency for large-scale geomagnetic data processing.
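The abstract gives neither the RAPS fitting details nor the inversion code. The sketch below shows two standard building blocks the Methods rely on.

- **Depth estimate.** A Spector-Grant-style estimate from the slope of one straight segment of the log power spectrum. It assumes k is angular wavenumber, so ln P(k) ≈ const - 2hk.
- **Tikhonov solve.** The regularized normal equations (GᵀG + λI)m = Gᵀd, solved by conjugate gradient with a Jacobi (diagonal) preconditioner. The choice of preconditioner is an assumption.

FGO, which the authors use to tune λ, step scaling, and preconditioner weights, is not reproduced, so `lam` is a fixed input here.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def raps_depth(k, power, k_min, k_max):
    """Mean source depth from one straight segment of ln P(k).

    Assumes k in rad per unit length (ln P ~ const - 2*h*k);
    for k in cycles per unit length use -slope / (4*pi) instead.
    """
    pts = [(kk, math.log(p)) for kk, p in zip(k, power) if k_min <= kk <= k_max and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two spectral points in the band")
    slope, _ = linear_fit([a for a, _ in pts], [b for _, b in pts])
    return -slope / 2.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def tikhonov_system(G, d, lam):
    """A = G^T G + lam*I and b = G^T d for min ||G m - d||^2 + lam*||m||^2."""
    rows, n = len(G), len(G[0])
    A = [[sum(G[r][i] * G[r][j] for r in range(rows)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(G[r][i] * d[r] for r in range(rows)) for i in range(n)]
    return A, b


def pcg(A, b, tol=1e-10, max_iter=500):
    """Jacobi-preconditioned conjugate gradient for SPD A, starting from m = 0."""
    n = len(b)
    m = [0.0] * n
    r = list(b)                                  # residual b - A m at m = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        m = [mi + alpha * pk for mi, pk in zip(m, p)]
        r = [ri - alpha * ak for ri, ak in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pk for zi, pk in zip(z, p)]
        rz = rz_new
    return m
```

Each break in slope of ln P(k) defines a wavenumber band. Calling `raps_depth` on each band gives one depth per equivalent-source layer. Increasing `lam` shrinks the solution norm, trading data fit for stability.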
Federated Semi-Supervised Image Segmentation with Dynamic Client Selection
LIU Zhenbing, LI Huanlan, WANG Baoyuan, LU Haoxiang, PAN Xipeng
Available online  , doi: 10.11999/JEIT250834
Abstract:
  Objective  Multicenter validation is a growing requirement in clinical research, yet strict privacy regulations, heterogeneous cross-institutional data distributions, and scarce pixel-level annotations limit the use of conventional centralized medical image segmentation models. This study develops a federated semi-supervised framework that uses labeled and unlabeled prostate MRI data from multiple hospitals, considers dynamic client participation and Non-Independent and Identically Distributed (Non-IID) data, and aims to improve segmentation accuracy and robustness under real-world constraints.  Methods  A cross-silo Federated Semi-Supervised Learning (FSSL) paradigm is used. Clients with pixel-wise annotations act as labeled clients, and those without annotations act as unlabeled clients. Each client maintains a local student network for prostate segmentation. On unlabeled clients, a teacher network with the same architecture is updated using the exponential moving average of student parameters and generates perturbed pseudo-labels to supervise the student through a hybrid consistency loss that combines Dice and binary cross-entropy terms. To reduce the effect of heterogeneous and low-quality updates, a performance-driven dynamic client selection and aggregation strategy is applied. At each communication round, clients are evaluated on their local validation sets, and only those whose Dice scores exceed a threshold are retained. A top-K subset is then aggregated with normalized contribution weights derived from validation Dice, with bounds to avoid gradient vanishing and single-client dominance. For unlabeled clients, a penalty factor down-weights unreliable pseudo-labeled updates. The segmentation backbone is a Multi-scale Feature Fusion U-Net (MFF-UNet). 
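The selection and aggregation rule can be sketched as follows.

1. Filter clients by validation Dice.
2. Keep the top K.
3. Weight each kept client by its Dice.
4. Down-weight unlabeled clients with a penalty factor.
5. Clip the normalized weights to bounds.
6. Average the parameters.

The following values are hypothetical, since the abstract does not report them:

- the threshold;
- K;
- the bounds;
- the penalty value.

Renormalizing after clipping can move a weight slightly outside the bounds, and a fuller implementation might iterate this step.

```python
def select_and_weight(clients, dice_threshold=0.6, top_k=3,
                      w_min=0.1, w_max=0.5, unlabeled_penalty=0.5):
    """Hypothetical performance-driven client selection and weighting.

    clients: dicts with 'id', 'val_dice' (local validation Dice), 'labeled' (bool).
    Returns {client_id: weight} with weights summing to 1.
    """
    eligible = [c for c in clients if c["val_dice"] > dice_threshold]
    chosen = sorted(eligible, key=lambda c: c["val_dice"], reverse=True)[:top_k]
    if not chosen:
        return {}
    raw = {}
    for c in chosen:
        score = c["val_dice"]
        if not c["labeled"]:
            score *= unlabeled_penalty   # damp pseudo-label-driven updates
        raw[c["id"]] = score
    total = sum(raw.values())
    # Bound each share to avoid vanishing contributions and single-client dominance.
    clipped = {k: min(max(v / total, w_min), w_max) for k, v in raw.items()}
    norm = sum(clipped.values())
    return {k: v / norm for k, v in clipped.items()}


def aggregate(updates, weights):
    """Weighted average of flat parameter vectors from the selected clients."""
    ids = list(weights)
    n = len(updates[ids[0]])
    return [sum(weights[i] * updates[i][j] for i in ids) for j in range(n)]
```

For example, a client whose validation Dice falls below the threshold is excluded from the round entirely. A well-performing unlabeled client is kept, but at half its Dice-based share.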
Starting from a standard encoder–decoder U-Net, an FPN-like pyramid is added to the encoder, where multi-level feature maps are channel-aligned using 1×1 convolutions, fused in a top–down pathway through upsampling and element-wise addition, and refined using 3×3 convolutions. The decoder upsamples these fused features and combines them with encoder features through skip connections, enabling joint modeling of global semantics and fine-grained boundaries. The framework is evaluated on T2-weighted prostate MRI from six centers, comprising three labeled and three unlabeled clients. All 3D volumes are resampled, sliced into 2D axial images, resized, and augmented. The Dice coefficient and 95th percentile Hausdorff distance (HD95) are used as evaluation metrics.  Results and Discussions  On the six-center dataset, the method achieves average Dice scores of 0.840 5 on labeled clients and 0.786 8 on unlabeled clients, with corresponding HD95 values of 8.04 and 8.67 pixels. These results are superior to or comparable with several representative federated semi-supervised or mixed-supervision methods, with the largest gains on distribution-shifted unlabeled centers. Visualization shows that the method generates more complete and smoother prostate contours with fewer false positives in low-contrast or small-volume cases than the baselines. Attention heatmaps from the final decoder layer indicate that UNet exhibits attention drift, SegMamba produces diffuse responses, and nnU-Net shows weak activations for small lesions, whereas MFF-UNet focuses more precisely on the prostate region with stable high responses, indicating improved discriminative capability and interpretability.  Conclusions  A federated semi-supervised prostate MRI segmentation framework that integrates teacher–student consistency learning, multi-scale feature fusion, and performance-driven dynamic client selection is presented. 
The method preserves privacy by keeping data local, reduces annotation scarcity by using unlabeled clients, and addresses client heterogeneity through reliability-aware aggregation. Experiments on a six-center dataset show that the framework achieves competitive or superior overlap and boundary accuracy compared with state-of-the-art federated semi-supervised methods, particularly on distribution-shifted unlabeled centers. The framework is model-agnostic and can be applied to other organs, imaging modalities, and cross-institutional segmentation tasks under strict privacy and regulatory constraints.
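The two evaluation metrics can be made concrete with a small pure-Python sketch on 2D masks stored as sets of foreground pixel coordinates. Dice follows its standard definition. For HD95, the sketch takes the 95th percentile, with linear interpolation, of the pooled surface-to-surface distances in both directions. Implementations differ in how surfaces are extracted and how the percentile is pooled, so values may not match the authors' toolkit exactly.

```python
import math


def dice(a, b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for sets of foreground pixel coordinates."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Foreground pixels with at least one 4-neighbour outside the mask."""
    return {(r, c) for (r, c) in mask
            if any(n not in mask for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))}


def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def hd95(a, b):
    """95th percentile of symmetric surface distances, in pixels."""
    sa, sb = surface(a), surface(b)
    if not sa or not sb:
        return math.inf
    d_ab = [min(math.dist(p, q) for q in sb) for p in sa]
    d_ba = [min(math.dist(q, p) for p in sa) for q in sb]
    return percentile(d_ab + d_ba, 95)
```

Shifting a 4×4 square by one column gives a Dice of 0.75 and an HD95 of 1 pixel.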
Split-architecture Non-contact Optical Seismocardiography Triggering System for Cardiac Magnetic Resonance Imaging
GAO Qiannan, ZHANG Jiayu, ZHU Yingen, WANG Wenjin, JI Jiansong, JI Xiaoyue
Available online  , doi: 10.11999/JEIT251098
Abstract:
  Objective  Cardiac-cycle synchronization is required in Cardiovascular Magnetic Resonance (CMR) to reduce motion artifacts and preserve quantitative accuracy. At high field strengths, the ElectroCardioGram (ECG) trigger is affected by magnetohydrodynamic effects and scanner-generated ElectroMagnetic Interference (EMI). Electrode placement and lead routing add setup burden. Contact-based mechanical sensors still require skin contact, and optical photoplethysmography introduces long physiological delay. A fully contactless and EMI-robust mechanical surrogate is therefore needed. This study develops a split-architecture, non-contact optical SeismoCardioGraphy (SCG) triggering system for CMR and evaluates its availability, beatwise detection performance, and timing characteristics under practical body-coil coverage.  Methods  The split-architecture system consists of a near-magnet optical acquisition unit and a far-magnet computation-and-triggering unit connected by fiber-optic links to minimize conductive pathways near the scanner (Fig. 2). The acquisition unit uses a defocused industrial camera and laser illumination to record speckle-pattern dynamics on the anterior chest without physical contact (Fig. 3). Dense optical flow is computed in a chest region of interest, and the displacement field is projected onto a principal motion direction to form a one-dimensional SCG sequence (Fig. 4). Drift suppression, smoothing, and short-window normalization are applied. Trigger timing is refined with a valley-constrained gradient search within a physiologically bounded window to reduce spurious detections and improve temporal consistency (Fig. 4). A benchmark dataset is acquired from 20 healthy volunteers under three coil configurations: no body coil, an ultra-flexible body coil, and a rigid body coil (Fig. 5, Fig. 6, Table 3). ECG serves as the reference, and CamPPG and radar are recorded for comparison. 
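The projection step can be sketched under the assumption that per-frame mean displacement vectors (dx, dy) over the chest region of interest are already available from a dense optical-flow routine. The principal motion direction is taken as the major axis of the 2×2 covariance of those vectors. Each frame is projected onto it to form the one-dimensional SCG trace. Using the ROI mean instead of the full field, and the closed-form eigenvector, are simplifications.

```python
import math


def principal_direction(vectors):
    """Unit major-axis direction of the 2x2 covariance of (dx, dy) vectors."""
    n = len(vectors)
    mx = sum(v[0] for v in vectors) / n
    my = sum(v[1] for v in vectors) / n
    sxx = sum((v[0] - mx) ** 2 for v in vectors) / n
    syy = sum((v[1] - my) ** 2 for v in vectors) / n
    sxy = sum((v[0] - mx) * (v[1] - my) for v in vectors) / n
    # Orientation of the largest-eigenvalue eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def project(vectors, direction):
    """One-dimensional trace: displacement component along `direction`."""
    ux, uy = direction
    return [v[0] * ux + v[1] * uy for v in vectors]
```

The eigenvector's sign is arbitrary, so a real pipeline would fix the polarity, for example so that a known cardiac deflection is positive, before triggering.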
Beatwise precision, recall, and F1 score are computed against ECG R peaks, and availability is reported as the fraction of usable segments under unified quality criteria (Table 4). Backward and forward physiological delays and delay variability are summarized across subjects and coil conditions (Table 5, Table 6). Key windowing and refractory parameters are tested for sensitivity (Table 2). Runtime is measured to assess real-time feasibility, including the cost of dense optical flow and the overhead of one-dimensional processing and triggering (Table 7).  Results and Discussions  Under no-coil and ultra-flexible-coil conditions, the optical SCG trigger achieves high availability (about 97.6%) and strong beatwise performance, with F1 reaching about 0.91 under the ultra-flexible coil (Table 4, Table 5). The backward physiological delay remains on the order of several tens of milliseconds, and delay jitter is generally within a few tens of milliseconds (Table 5, Table 6). Under the rigid body coil, performance decreases markedly. Mechanical decoupling between the coil surface and the chest wall weakens and distorts the vibration signature. This blurs features associated with aortic valve opening (AO) and increases false triggers (Fig. 1). This effect appears as lower precision and F1 and as a shift toward longer and more variable delays compared with the other conditions (Table 4, Table 6). Compared with CamPPG, which reflects peripheral blood-volume dynamics and typically lags further behind the ECG R peak, the optical SCG surrogate provides a more proximal mechanical marker with reduced trigger phase lag (Fig. 8, Table 5). EMI robustness is supported by representative segments: ECG waveforms show visible distortion under interference, whereas the optical SCG surrogate remains interpretable because acquisition and transmission near the scanner are fully optical and electrically isolated (Fig. 8). 
Parameter analysis supports a moderate processing window and a 0.5 s minimum interbeat interval as a stable choice across subjects (Table 2). Runtime analysis shows that dense optical flow dominates computational cost, whereas one-dimensional processing and triggering add little overhead. Throughput exceeds the acquisition frame rate, supporting real-time triggering (Table 7).  Conclusions  A split-architecture, non-contact optical SCG triggering system is developed and validated under three representative body-coil configurations. Fiber-optic separation between near-magnet acquisition and far-magnet processing improves EMI robustness while maintaining real-time trigger output. High availability, strong beatwise performance, and short physiological delay are demonstrated under no-coil and ultra-flexible-coil conditions (Table 4, Table 5). Rigid-coil coverage exposes a clear limitation caused by reduced mechanical coupling, which motivates further optimization for mechanically decoupled or heavily occluded scenarios (Fig. 1, Table 6).
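The triggering and evaluation logic can be sketched end to end. The sketch has two parts:

- a simplified detector that enforces the 0.5 s minimum interbeat interval reported above;
- a beatwise scorer that matches triggers one-to-one to ECG R peaks.

The following are assumptions, not the authors' algorithm:

- the gradient threshold;
- the backward valley search, standing in for the valley-constrained gradient search;
- the symmetric matching tolerance.

```python
def detect_triggers(signal, fs, threshold, min_interval=0.5, valley_window=0.1):
    """Simplified SCG trigger detector (hypothetical).

    1. Flag samples where the first difference rises through `threshold`.
    2. Enforce a refractory period of `min_interval` seconds between triggers.
    3. Move each trigger back to the latest signal minimum within `valley_window` s.
    Returns trigger times in seconds.
    """
    grad = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]
    refractory = int(round(min_interval * fs))
    back = int(round(valley_window * fs))
    triggers, last = [], None
    for i in range(1, len(grad)):
        if grad[i - 1] < threshold <= grad[i] and (last is None or i - last >= refractory):
            # Scan backwards so ties resolve to the latest minimum.
            valley = min(range(i, max(0, i - back) - 1, -1), key=lambda k: signal[k])
            triggers.append(valley / fs)
            last = i
    return triggers


def beat_scores(detected, reference, tolerance):
    """Greedy one-to-one matching of triggers to R-peak times within +/- tolerance s."""
    unmatched = list(reference)
    tp = 0
    for t in sorted(detected):
        candidates = [r for r in unmatched if abs(t - r) <= tolerance]
        if candidates:
            unmatched.remove(min(candidates, key=lambda r: abs(t - r)))
            tp += 1
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because SCG triggers lag the R peak, an asymmetric window that accepts only triggers after each R peak may be a better fit in practice.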
Progress in Modeling Cardiac Myocyte Calcium Cycling and Investigating Arrhythmia Mechanisms: A Study Focused on the Ryanodine Receptor
GAO Ying, ZHANG Yucheng, WANG Wenyao, SU Xuanyi, SONG Zhen
Available online  , doi: 10.11999/JEIT250957
Abstract:
  Significance   The Ryanodine Receptor (RyR) is a central regulator of intracellular calcium (Ca2+) homeostasis in cardiomyocytes through its control of Ca2+ release from the Sarcoplasmic Reticulum (SR). Abnormal RyR activity, including excessive activation or impaired gating, is a key mechanism underlying Early Afterdepolarizations (EADs) and Delayed Afterdepolarizations (DADs), thereby increasing arrhythmia risk. The coupling between membrane electrophysiology and Ca2+ cycling in cardiomyocytes depends on spatially organized and rapidly evolving processes that are difficult to resolve experimentally. Conventional approaches, including animal models and pharmacological interventions, are constrained by high cost and limited control of experimental variables. Mathematical modeling and computer simulation of the RyR have therefore become essential tools for studying RyR regulation under physiological and pathological conditions and for elucidating arrhythmogenic mechanisms. This review provides an overview of RyR biology and modeling. It first summarizes structural features and core functional properties to establish the mechanistic basis of RyR gating and regulation. It then evaluates current and emerging modeling approaches, outlining their strengths and limitations. The review next describes the integration of RyR models into cardiomyocyte Ca2+ cycling frameworks and their application across different cardiomyocyte subtypes. It further examines arrhythmogenic mechanisms arising from RyR dysfunction and assesses drug strategies designed to stabilize RyR activity. Finally, it highlights artificial intelligence and cardiac digital twins as emerging directions for advancing RyR modeling and therapeutic development.  Progress   The growing availability of RyR structural data has enabled continued refinement of modeling strategies. Early RyR models relied primarily on phenomenological formulations that were computationally practical but limited in mechanistic detail. 
Markov models have become the dominant framework for simulating RyR gating behavior and enable detailed representation of Ca2+ sparks and related events through discrete state transitions. Deterministic integration of Markov models offers high computational efficiency and adaptability across different cardiomyocyte types. However, this approach neglects the stochastic nature of RyR opening and fails to reproduce random fluctuations in intracellular Ca2+ concentration, which can lead to discrepancies between simulations and physiological behavior. Stochastic Markov models capture these random processes and are therefore essential for investigating arrhythmogenic phenomena such as Ca2+ waves. Their application, however, requires extensive experimental data and substantial computational resources, which limits large-scale implementation. Recent artificial intelligence approaches, including deep neural networks that compress Markov models into single governing equations, have improved computational efficiency. Advances in structural biology have further clarified RyR conformational dynamics and subunit cooperativity during gating, particularly in relation to diastolic Ca2+ leak. These insights have motivated more detailed models that incorporate subunit interactions or molecular dynamics. Numerous RyR models have been incorporated into cardiac action potential frameworks and applied to the study of EADs and DADs. These integrated models enhance understanding of electrical disturbances caused by RyR dysfunction and provide a useful platform for drug screening and mechanistic investigation.  Conclusion  Multiple RyR models have been developed that successfully reproduce key physiological processes, including Ca2+ sparks, and are widely applied in studies of cardiomyocyte Ca2+ cycling. Nevertheless, several challenges remain. (1) A unified modeling framework is still lacking. 
No single RyR model can accurately simulate Ca2+ dynamics across the full spectrum of physiological and pathological conditions. Careful evaluation is therefore required when selecting models for intracellular Ca2+ handling. (2) Computational burden limits multiscale integration. Multiscale models are necessary to connect cellular Ca2+ dynamics with tissue-level electrical propagation by incorporating spatial heterogeneity, but their high computational cost restricts their application in clinically relevant scenarios. (3) Pacemaker cell models remain underdeveloped. Current research focuses primarily on ventricular and atrial cardiomyocytes, whereas pacemaker cell models are less mature and often rely on common-pool formulations that do not represent spatial Ca2+ gradients. Future studies should prioritize the development of detailed pacemaker cell models that explicitly represent Ca2+ release unit networks and incorporate realistic RyR dynamics. Artificial intelligence and cardiac digital twins, although still at an early stage in RyR modeling, offer substantial potential to advance mechanistic research and support precision-medicine applications.  Prospects   Future RyR research will increasingly depend on integrating advances from structural biology, biophysics, and computational science. Such efforts are required to connect molecular-scale RyR conformational changes with organ-level cardiac function and to enable scalable, clinically actionable models. These models can strengthen mechanistic understanding and accelerate translational progress in precision cardiology. Artificial intelligence and cardiac digital twins provide a pathway toward multiscale cardiac models that incorporate patient-specific electrophysiology and Ca2+ cycling. These approaches may substantially improve understanding of arrhythmia mechanisms and heart failure pathophysiology and serve as predictive platforms for the development of mechanism-based personalized antiarrhythmic therapies.
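The contrast between deterministic and stochastic integration of Markov gating models can be illustrated with a minimal sketch. It assumes a hypothetical two-state (closed/open) channel with constant rates `k_open` and `k_close` (per ms). Real RyR Markov models have more states and Ca2+-dependent rates, and this toy is not taken from any model covered by the review. The deterministic function integrates the mean open probability in closed form. The stochastic function tracks individual channels using a fixed-time-step approximation of the underlying continuous-time Markov chain.

```python
import math
import random


def deterministic_open_fraction(k_open, k_close, t, p0=0.0):
    """Closed-form solution of dP/dt = k_open * (1 - P) - k_close * P."""
    p_inf = k_open / (k_open + k_close)  # steady-state open probability
    return p_inf + (p0 - p_inf) * math.exp(-(k_open + k_close) * t)


def stochastic_open_fraction(k_open, k_close, t_end, n_channels, dt, seed=0):
    """Simulate n_channels independent two-state channels, all initially closed.

    In each time step a closed channel opens with probability 1 - exp(-k_open*dt)
    and an open channel closes with probability 1 - exp(-k_close*dt).
    Returns the fraction of channels that are open at t_end.
    """
    rng = random.Random(seed)
    p_open_step = 1.0 - math.exp(-k_open * dt)
    p_close_step = 1.0 - math.exp(-k_close * dt)
    is_open = [False] * n_channels
    for _ in range(round(t_end / dt)):
        is_open = [
            (rng.random() >= p_close_step) if state else (rng.random() < p_open_step)
            for state in is_open
        ]
    return sum(is_open) / n_channels


if __name__ == "__main__":
    k_open, k_close = 0.5, 1.5  # hypothetical rates, 1/ms
    print("deterministic:", deterministic_open_fraction(k_open, k_close, t=20.0))
    for seed in range(3):
        frac = stochastic_open_fraction(k_open, k_close, 20.0, n_channels=50, dt=0.05, seed=seed)
        print(f"stochastic (50 channels, seed {seed}):", frac)
```

With only a few dozen channels, as in a single release unit, the stochastic open fraction scatters noticeably from run to run around the deterministic value. The deterministic form discards exactly these fluctuations.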
A Large-Scale Multimodal Instruction Dataset for Remote Sensing Agents
WANG Peijin, HU Huiyang, FENG Yingchao, DIAO Wenhui, SUN Xian
Available online  , doi: 10.11999/JEIT250818
Abstract:
  Objective   The rapid advancement of Remote Sensing (RS) technology has reshaped Earth observation research, shifting the field from static image analysis to intelligent, goal-oriented cognitive decision-making. Modern RS systems are expected to perceive complex scenes, reason over heterogeneous information, decompose high-level objectives into executable subtasks, and make decisions under uncertainty. These requirements motivate the development of RS agents, which extend perception models to include reasoning, planning, and interaction functions. However, existing RS datasets remain task-centric and fragmented, as they are usually designed for single-purpose supervised learning such as object detection or land-cover classification. They seldom support multimodal reasoning, instruction following, or multi-step decision-making, all of which are essential for agentic workflows. Current RS vision-language datasets also have limited scale, constrained modality coverage, and simplified text annotations, with insufficient use of non-optical data such as Synthetic Aperture Radar (SAR) and infrared imagery. They further lack instruction-driven interactions that reflect real human-agent collaboration. This study constructs a large-scale multimodal image-text instruction dataset tailored for RS agents. The objective is to establish a unified data foundation that supports perception, reasoning, planning, and decision-making. By providing structured instructions across diverse modalities and task categories for model training, the dataset supports the development and evaluation of next-generation RS foundation models with agentic capability.  Methods   The dataset is built through a systematic and extensible framework that integrates multi-source RS imagery with instruction-oriented textual supervision. A unified input-output paradigm is defined to ensure compatibility across heterogeneous tasks and model architectures. 
This paradigm formalizes interactions between visual inputs and language instructions, allowing models to process image pixels, text descriptions, spatial coordinates, region references, and action-oriented outputs. A standardized instruction schema encodes task objectives, constraints, and expected responses in a consistent format. The construction process includes three stages. (1) Data collection and integration: multimodal RS imagery is aggregated from authoritative sources, covering optical, SAR, and infrared modalities with different spatial resolutions, scene types, and geographic distributions. (2) Instruction generation: a hybrid strategy combines rule-based templates with refinement by Large Language Models (LLMs). Template-based generation ensures task completeness and structural consistency, whereas LLM rewriting improves linguistic diversity and instruction complexity. (3) Task categorization and organization: the dataset is organized into nine core task categories and 21 sub-datasets that span low-level perception, mid-level reasoning, and high-level decision-making. A validation pipeline performs automated syntax and format checks, cross-modal consistency verification, and manual review of representative samples to ensure semantic alignment between images and instructions.  Results and Discussions   The dataset contains more than 2 million multimodal instruction samples, making it one of the largest and most comprehensive instruction resources in the RS domain. The inclusion of optical, SAR, and infrared imagery supports cross-modal learning and reasoning across heterogeneous sensing mechanisms. Compared with existing RS datasets, this dataset emphasizes instruction diversity, task compositionality, and agent-oriented interaction rather than isolated perception tasks. 
Baseline experiments conducted using state-of-the-art multimodal LLMs and RS foundation models show that the dataset supports evaluation across the full spectrum of agentic capabilities, from visual grounding and reasoning to high-level decision-making. The experiments also highlight challenges inherent to RS data, including extreme scale variation, dense object distributions, and long-range spatial dependencies. These challenges indicate important research directions for improving multimodal reasoning and planning in complex RS environments.  Conclusions   This work presents a large-scale multimodal image-text instruction dataset designed for RS agents. By organizing data across nine task categories and 21 sub-datasets, it provides a unified and extensible benchmark for agent-centric RS research. The contributions include: (1) a unified multimodal instruction paradigm for RS agents; (2) a 2-million-sample dataset covering optical, SAR, and infrared modalities; (3) empirical validation demonstrating support for end-to-end agentic workflows from perception to decision-making; and (4) a comprehensive evaluation benchmark based on baseline experiments. Future work will extend the dataset to temporal and video-based RS scenarios, integrate dynamic decision-making processes, and further improve reasoning and planning capability in real-world, time-varying environments.
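The abstract does not give the instruction schema. The following is therefore a hypothetical sketch of what a standardized instruction record and the automated format-check stage of the validation pipeline might look like. All field names (`image_id`, `modality`, `task`, `boxes`, and so on) and the specific checks are assumptions for illustration only.

```python
from dataclasses import dataclass, field

# Assumed modality vocabulary, mirroring the three modalities the dataset covers.
MODALITIES = {"optical", "sar", "infrared"}


@dataclass
class InstructionSample:
    """Hypothetical record: one image, one instruction, one expected response."""
    image_id: str
    modality: str
    width: int
    height: int
    task: str
    instruction: str
    response: str
    # Referenced regions as (x_min, y_min, x_max, y_max) in pixel coordinates.
    boxes: list = field(default_factory=list)


def validate(sample: InstructionSample) -> list:
    """Return a list of format problems; an empty list means the sample passes."""
    errors = []
    if sample.modality not in MODALITIES:
        errors.append(f"unknown modality: {sample.modality!r}")
    if not sample.task:
        errors.append("missing task category")
    if not sample.instruction.strip():
        errors.append("empty instruction")
    if not sample.response.strip():
        errors.append("empty response")
    for box in sample.boxes:
        if len(box) != 4:
            errors.append(f"box must have 4 coordinates: {box}")
            continue
        x_min, y_min, x_max, y_max = box
        # Boxes must be non-degenerate and lie inside the image.
        if not (0 <= x_min < x_max <= sample.width and 0 <= y_min < y_max <= sample.height):
            errors.append(f"box outside image or degenerate: {box}")
    return errors


if __name__ == "__main__":
    ok = InstructionSample(
        image_id="img_0001", modality="sar", width=512, height=512,
        task="visual_grounding",
        instruction="Locate the ship closest to the harbor entrance.",
        response="The ship is at the marked region.",
        boxes=[(100, 120, 160, 180)],
    )
    print(validate(ok))  # []
```

Returning a list of problems, rather than raising on the first one, lets a batch job report every defect in a sample at once. Cross-modal consistency checks and manual review would sit on top of this syntactic layer.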
CaRS-Align: Channel Relation Spectra Alignment for Cross-Modal Vehicle Re-identification
SA Baihui, ZHUANG Jingyi, ZHENG Jinjie, ZHU Jianqing
Available online  , doi: 10.11999/JEIT250917
Abstract:
  Objective  Visible and infrared images are two commonly used modalities in intelligent transportation scenarios and play a key role in vehicle re-identification. However, differences in imaging mechanisms and spectral responses lead to inconsistent visual characteristics between these modalities, which limits cross-modal vehicle re-identification. To address this problem, this paper proposes a Channel Relation Spectra Alignment (CaRS-Align) method that uses channel relation spectra, rather than channel-wise features, as the alignment target. This strategy reduces interference caused by imaging style differences at the relational-structure level. Within each modality, a channel relation spectrum is constructed to capture stable and semantically coordinated channel-to-channel relationships through correlation modeling. At the cross-modal level, the correlation between the corresponding channel relation spectra of the two modalities is maximized to achieve consistent alignment of relational structures. Experiments on the public MSVR310 and RGBN300 datasets show that CaRS-Align outperforms existing state-of-the-art methods. For example, on MSVR310, under infrared-to-visible retrieval, CaRS-Align achieves a Rank-1 accuracy of 64.35%, which is 2.58% higher than advanced existing methods.  
Methods  CaRS-Align adopts a hierarchical optimization paradigm: (1) for each modality, a channel–channel relation spectrum is constructed by mining inter-channel dependencies, yielding a semantically coordinated relation matrix that preserves the organizational structure of semantic cues; (2) cross-modal consistency is achieved by maximizing the correlation between the relation spectra of the two modalities, enabling progressive optimization from intra-modal construction to cross-modal alignment; and (3) relation spectrum alignment is integrated with the standard classification and retrieval objectives commonly used in re-identification to supervise backbone training for the vehicle re-identification model.  Results and Discussions  Compared with several state-of-the-art cross-modal re-identification methods on the RGBN300 and MSVR310 datasets, CaRS-Align demonstrates strong performance, achieving the best or second-best results in both retrieval modes. As shown in Table 1, on RGBN300 it attains 75.09% Rank-1 accuracy and 55.45% mean Average Precision (mAP) in the infrared-to-visible mode, and 76.60% Rank-1 accuracy and 56.12% mAP in the visible-to-infrared mode. As shown in Table 2, similar advantages are observed on MSVR310, with 64.54% Rank-1 accuracy and 41.25% mAP in the visible-to-infrared mode, and 64.35% Rank-1 accuracy and 40.99% mAP in the infrared-to-visible mode. Fig. 4 presents Top-10 retrieval results, in which CaRS-Align reduces identity mismatches in both directions. Fig. 5 illustrates feature distance distributions: intra-class and inter-class distances overlap substantially without CaRS-Align (Fig. 5(a)), whereas they are more clearly separated with CaRS-Align (Fig. 5(b)), confirming improved feature discrimination. These results indicate that modeling channel-level relational structures improves both retrieval modes, increases adaptability to modality shifts, and effectively reduces mismatches caused by cross-modal differences.  
Conclusions  This paper proposes a visible–infrared cross-modal vehicle re-identification method based on CaRS-Align. Within each modality, a channel relation spectrum is constructed to preserve semantic co-occurrence structures. A CaRS-Align objective is then designed to maximize the correlation between the relation spectra of the two modalities, thereby achieving consistent alignment and improving cross-modal performance. Experiments on the MSVR310 and RGBN300 datasets demonstrate that CaRS-Align outperforms existing state-of-the-art methods on key metrics, including Rank-1 accuracy and mAP.
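The idea of aligning relation spectra rather than raw channel features can be sketched briefly. The abstract states only that spectra are built by "correlation modeling" and aligned by "maximizing the correlation". This sketch therefore assumes Pearson correlation between channels across spatial positions for the spectrum, and Pearson correlation between the off-diagonal spectrum entries of the two modalities for the alignment, expressed as a loss of 1 - corr. The function names are hypothetical. The code uses pure Python on small lists for readability, whereas a real implementation would operate on batched tensors inside a deep learning framework.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def relation_spectrum(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """features[c] holds channel c's activations over N spatial positions.
    Returns the C x C matrix of channel-to-channel correlations."""
    c = len(features)
    return [[pearson(features[i], features[j]) for j in range(c)] for i in range(c)]


def upper_triangle(matrix: List[List[float]]) -> List[float]:
    """Off-diagonal upper-triangular entries; the diagonal is always 1 and uninformative."""
    c = len(matrix)
    return [matrix[i][j] for i in range(c) for j in range(i + 1, c)]


def relation_alignment_loss(feat_vis, feat_ir) -> float:
    """1 - corr(spectrum_vis, spectrum_ir): 0 when the relational structures agree."""
    u_vis = upper_triangle(relation_spectrum(feat_vis))
    u_ir = upper_triangle(relation_spectrum(feat_ir))
    return 1.0 - pearson(u_vis, u_ir)


if __name__ == "__main__":
    vis = [[1, 2, 3, 4], [2, 4, 6, 9], [4, 3, 2, 1], [1, 3, 2, 4]]  # toy: 4 channels x 4 positions
    # Per-channel positive rescaling and offset: a crude stand-in for an imaging-style shift.
    ir = [[(k + 2) * x + 10 * k for x in ch] for k, ch in enumerate(vis)]
    print(relation_alignment_loss(vis, ir))
```

The demo prints a value that is zero up to floating-point rounding. Per-channel positive scaling and offset leave every channel-to-channel Pearson correlation unchanged, so the relational view is insensitive to that kind of shift. Swapping channels, by contrast, changes the structure and produces a large loss.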
A Review of Causal Feature Learning in Deep Learning Image Classification Models
WANG Xiaodong, JIANG Ling, LI Huihui, WANG Buhong
Available online  , doi: 10.11999/JEIT250738
Abstract:
  Significance   Deep Learning models are built on statistical correlations rather than causal relationships. Consequently, such models inevitably face severe challenges in generalization, interpretability, and stability. In contrast to human cognition, which relies mainly on discovering and exploiting causal relationships, current Deep Learning models remain confined to the bottom of the Pearl Causal Hierarchy (PCH). The integration of causal inference into Deep Learning is therefore highly anticipated. As one of the most important branches of Deep Learning, image classification models (represented by Convolutional Neural Networks, CNNs) exhibit particularly prominent shortcomings, and causal inference is urgently needed to overcome this bottleneck. Among the various approaches to integrating causal inference into these models, Causal Feature Learning (CFL), a framework that combines unsupervised machine learning and causal inference, offers significant advantages. Studies have confirmed that causal relationships are implicitly embedded in the pixel information of input images in image classification tasks. According to the proven Causal Coarsening Theorem (CCT), causal knowledge can be acquired from observed image data at minimal experimental cost. In classification tasks, the optimal solution is given by the Markov Boundary (MB) of the class variable in the causal Bayesian network. These theories strongly support efforts to connect deep image classification models with causal inference via CFL. Overall, the research significance of CFL has become increasingly prominent, and it is regarded as one of the potential breakthrough directions for next-generation models.  
Progress   This paper presents a comprehensive survey of CFL in Deep Learning image classification models, organized around three core issues: statistical causal inference theory, correlation analysis methods, and CFL implementations. First, the relevant definitions of CFL and its two mainstream statistical implementation frameworks are introduced: causal discovery based on the Structural Causal Model (SCM) and causal effect estimation based on the Rubin Causal Model (RCM). Second, correlation analysis methods for Deep Learning image classification models, which lie at the threshold of the PCH, are systematically summarized from three perspectives: forward, backward, and horizontal. Third, building on these auxiliary tools, progress in CFL for image classification is grouped into four main areas: Causal Feature Discovery (CFD), Causal Feature Effect Estimation (CFEE), Causal Representation Learning (CRL), and Spurious Correlation Removal (SCR). CFD is grounded in the SCM framework and aims to derive confounding-free causal graphs through explicit or implicit causal intervention analyses on image data or models. Under the RCM framework, CFEE uses observed image data to quantitatively evaluate the causal effects of features while overcoming the impacts of unknown counterfactual samples and confounding biases. CRL focuses on selecting or extracting high-dimensional features from image data to learn causal relationships and mine low-dimensional cross-image representations. SCR eliminates non-causal features from images and preserves causal ones through diverse methods. In addition, available toolkits, resources from top conferences, and academic organizations are listed. Furthermore, this paper discusses key technical issues and future research directions.  Conclusions  This review summarizes the technological development of CFL. 
In general, considerable progress has been made, but difficulties in different research directions still need to be overcome. CFD has the advantage of following the basic logic of causal theory, with a clear and simple structure that is intuitive to accept. However, CFD suffers from immature methods for processing high-dimensional image data and insufficient generalization ability. CFEE can effectively distinguish causal features from confounding features; its evaluation results are closer to real decision-making logic and show strong universality. Common problems of CFEE include the requirement for observable confounding factors, high dependence on causal assumptions, and insufficient computational efficiency. CRL offers more selectable dimensions and the ability to discover the causal factors that drive classification while excluding non-causal factors. The core problems currently to be solved include generalization bias, factor coupling, prior dependence, weak evaluation, and high cost. SCR is highly targeted but generalizes poorly. From a macro perspective, the implementation of CFL should not be limited to specific methods. Any method that aims to build causal relationships from micro-variables, such as image pixels, to causal macro-variables, such as global semantics, can be included, so CFL remains an open research topic.  Prospects   The goal of causal inference is to go beyond correlation and clarify the causal relationships between variables by designing more rigorous experiments or employing advanced statistical methods. This requires deeper assumptions about feature relationships and more generalizable exploration of underlying causal chains, both of which are highly challenging and will become the main focus of future researchers in this field. 
To address the technical challenges in CFL, this paper proposes that future research focus on the following directions: (1) unifying the construction paradigms and establishing standards for image-based Structural Causal Models (SCMs), so as to improve the standardization and consistency of causal discovery; (2) developing the RCM with the support of generative artificial intelligence to address sample scarcity in causal effect estimation; (3) redesigning models to learn novel image causal representations, thereby fundamentally resolving the inherent deficiencies of CNNs in CFL; and (4) integrating spurious correlation analysis with reinforcement learning, using reinforcement learning to endow Deep Learning image classification models with meta-learning capabilities for causal exploration. Once these key issues in CFL are resolved, a qualitative improvement in the accuracy, generalization, interpretability, and stability of Deep Learning image classification models can be expected.
Optimized Implementation of Low-Depth Lightweight S-Boxes
FENG Zixi, LIU Yupeng, DOU Guowei, LIU Chengle
Available online  , doi: 10.11999/JEIT250690
Abstract:
  Objective  With the rapid development and widespread deployment of the Internet of Things (IoT), embedded systems, and mobile computing devices, ensuring secure communication and data protection on resource-constrained platforms has become a central focus in the field of information security. These devices are typically characterized by severe limitations in computational capability, storage capacity, and energy consumption, which render traditional cryptographic algorithms inefficient or even infeasible in such environments. In response to these constraints, lightweight cryptographic algorithms have been proposed as an effective class of solutions. Their primary objective is to achieve security comparable to that of traditional algorithms while significantly reducing hardware and computational overhead through deliberate algorithmic simplifications and structural optimizations. These algorithms are designed to operate efficiently within tight resource bounds and are especially suitable for applications such as sensor networks, smart cards, RFID systems, and wearable devices. From the perspective of hardware implementation, the design of lightweight cryptographic algorithms must account for multiple performance indicators, including throughput, latency, power efficiency, chip area, and circuit depth. Among these, chip area and depth are considered particularly critical, as they directly influence the physical cost of production and the speed of computation. The Substitution-box (S-Box), as the core nonlinear component responsible for providing confusion in most symmetric encryption schemes, plays a decisive role in determining the security strength and implementation efficiency of the entire cipher. Therefore, exploring efficient methods to realize low-area and low-depth implementations of S-Boxes is of fundamental importance to the design of secure and practical lightweight cryptographic systems.  
Methods  In this work, a novel S-Box optimization algorithm based on Boolean satisfiability (SAT) solving is proposed to simultaneously optimize two key hardware metrics: logic area and circuit depth. To this end, a circuit model with depth k and width w is constructed. Under a given area constraint, SAT solving techniques are employed to determine whether the circuit model can implement the target S-Box. By iteratively adjusting the circuit depth, width, and area parameters, an optimized implementation of the S-Box is eventually obtained. The method is specifically developed for 4-bit S-Boxes, which are widely adopted in lightweight block ciphers, and it provides implementations that are highly efficient in both structural compactness and computational depth. This dual optimization approach helps reduce hardware costs while maintaining low latency, making it especially suitable for scenarios where performance and energy efficiency are both critical. The proposed method begins by transforming the S-Box implementation problem into a formal SAT problem, enabling the use of powerful SAT solvers to exhaustively explore possible logic-level representations. In this transformation, a diverse set of logic gates, including 2-input, 3-input, and 4-input gates, is used to construct flexible logic networks. To enforce area and depth constraints, arithmetic operations such as binary addition and comparator logic are encoded into SAT-compatible Boolean constraints, which guide the solver toward low-area and low-depth solutions. To further accelerate the solving process and avoid redundant search paths, symmetry-breaking constraints are introduced. These constraints help eliminate logically equivalent but structurally different representations, thereby significantly reducing the size of the solution space. 
The CaDiCaL SAT solver, known for its speed and efficiency in handling large-scale SAT problems, is employed to compute optimized S-Box implementations that minimize both depth and area. The proposed approach not only generates efficient implementations but also provides a general modeling framework that can be extended to other logic synthesis problems in cryptographic hardware design.  Results and Discussions  To validate the effectiveness of the proposed optimization method, a comprehensive set of experiments was conducted on 4-bit S-Boxes from several representative lightweight block ciphers, including Joltik, Piccolo, Rectangle, Skinny, Lblock, Lac, Midori, and Prøst. The results demonstrate that the method consistently produces high-quality implementations that are competitive with or superior to existing state-of-the-art results in terms of both chip area and circuit depth. Specifically, for the S-Boxes of Joltik and Piccolo, as well as for those used in Skinny and Rectangle, the generated implementations match the best-known results in both metrics, indicating that the method can successfully reproduce optimal or near-optimal designs. For Lblock and Lac, although the logic area remains similar to prior results, the circuit depth is significantly reduced, from 10 down to 3, which represents a substantial improvement in processing latency and suitability for real-time applications. For the inverse S-Box of the Rectangle cipher, the proposed implementation achieves the same circuit depth as previous designs but reduces the area from 24.33 gate equivalents (GE) to 17.66 GE, yielding a more compact and efficient realization. The optimization results for the Midori S-Box further confirm the effectiveness of the method, as both depth and area are improved: depth is reduced from 4 to 3, and area is brought down from 20.00 GE to 16.33 GE. 
For the Prøst cipher’s S-Box, two alternative implementations are presented to illustrate the trade-off between area and depth. The first achieves a depth of 4 with an area of 22.00 GE, matching the best-known depth but at a higher area cost, while the second increases the depth to 5 but reduces the area significantly to 13.00 GE. These results demonstrate that the method not only supports flexible optimization under different design constraints but also contributes to a deeper understanding of the complexity and trade-offs involved in S-Box implementation.  Conclusions   This paper presents a SAT-based method for jointly optimizing S-box hardware implementations in terms of area and circuit depth. By modeling the S-box realization as a satisfiability problem and exploiting advanced constraint encoding, multi-input logic gates, and symmetry-breaking techniques, the method effectively reduces hardware complexity while maintaining or improving depth performance. Extensive experiments on various 4-bit S-boxes demonstrate that the proposed approach matches or outperforms existing results, particularly in reducing circuit depth and improving logic compactness. This makes it well suited for lightweight cryptographic systems operating under strict constraints on silicon area, speed, and energy consumption. Despite these advantages, the method still has limitations. While it achieves optimal or near-optimal results for 4-bit S-boxes, scalability to larger instances such as 5-bit or 8-bit S-boxes remains challenging due to the exponential growth of the search space and solving time. As model complexity increases, solving becomes computationally expensive and may not finish within practical time limits. 
Future work will focus on improving modeling efficiency and solver performance through refined constraint generation, stronger pruning strategies, and heuristic-guided search, with the goal of extending the method to more complex S-boxes and other nonlinear components in lightweight and post-quantum cryptographic systems.
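Whatever circuit the SAT model returns must satisfy three checks: it must reproduce the S-box lookup table, and its depth and area must be measured. The following hypothetical sketch evaluates a netlist of 2-input gates for those three quantities. It does not perform the SAT search itself, and it covers only 2-input gates, whereas the paper also uses 3- and 4-input gates. The per-gate GE costs in `GATE_AREA_GE` are assumed illustrative values, not figures from the paper or a specific standard-cell library. The toy 4-bit mapping in `TOY_NETLIST` is likewise hypothetical and is not any of the listed ciphers' S-boxes.

```python
# Two-input gate functions on single bits.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
    "XNOR": lambda a, b: 1 - (a ^ b),
}
# ASSUMED per-gate area in gate equivalents (illustrative only).
GATE_AREA_GE = {"AND": 1.33, "OR": 1.33, "XOR": 2.67, "NAND": 1.0, "XNOR": 2.67}


def evaluate(netlist, outputs, n_inputs=4):
    """Evaluate a topologically ordered netlist of (name, op, in_a, in_b) gates.

    Primary inputs are named 'x0'..'x{n-1}'; output bit i is the signal outputs[i].
    Returns (lookup_table, depth, area_ge).
    """
    # Depth of each signal = longest gate path from the primary inputs.
    depth = {f"x{i}": 0 for i in range(n_inputs)}
    for name, op, a, b in netlist:
        depth[name] = 1 + max(depth[a], depth[b])
    table = []
    for x in range(2 ** n_inputs):
        val = {f"x{i}": (x >> i) & 1 for i in range(n_inputs)}
        for name, op, a, b in netlist:
            val[name] = OPS[op](val[a], val[b])
        table.append(sum(val[s] << i for i, s in enumerate(outputs)))
    circuit_depth = max(depth[s] for s in outputs)
    area = sum(GATE_AREA_GE[op] for _, op, _, _ in netlist)
    return table, circuit_depth, area


# Hypothetical toy bijection: y0 = x0^x1, y1 = x1^(x2&x3), y2 = x2^x3, y3 = x3.
TOY_NETLIST = [
    ("g0", "XOR", "x0", "x1"),
    ("g1", "AND", "x2", "x3"),
    ("g2", "XOR", "x1", "g1"),
    ("g3", "XOR", "x2", "x3"),
]
TOY_OUTPUTS = ["g0", "g2", "g3", "x3"]

if __name__ == "__main__":
    table, d, area = evaluate(TOY_NETLIST, TOY_OUTPUTS)
    print("table:", table)
    print("bijective:", sorted(table) == list(range(16)))
    print("depth:", d, "area (GE):", round(area, 2))
```

In a SAT-based flow, an evaluator like this is a cheap independent check that a decoded solution really implements the target table and meets the claimed depth and area bounds.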
A Clipped NMS List Decoding Algorithm of LDPC Codes for 5G URLLC
ZHANG Xiaojun, SONG Xin, GAO Jian, MI Yonghao, NIU Kai
Available online  , doi: 10.11999/JEIT250853
Abstract:
  Objective  As one of the channel coding schemes in fifth-generation (5G) wireless communication systems, Low-Density Parity-Check (LDPC) codes can achieve performance close to the Shannon limit through iterative decoding. However, in practical wireless transmission environments, the decoding performance of LDPC codes is susceptible to burst interference. The Normalized Min-Sum (NMS) decoding algorithm is highly sensitive to the distribution of its input log-likelihood ratios (LLRs), and burst interference causes the LLRs to deviate from a Gaussian distribution, degrading decoding performance. Meanwhile, 5G LDPC decoders are often equipped with a fixed number of processing elements (PEs), dimensioned for the maximum lifting size so as to cover the full range of code lengths. In URLLC (Ultra-Reliable Low-Latency Communications) short-code transmission scenarios, the lifting size is much smaller than the maximum, leaving a large number of PEs idle for long periods and underutilizing hardware resources. To address these issues, this paper proposes a Clipped Normalized Min-Sum List (CNMSL) decoding algorithm. By jointly designing burst interference smoothing and idle resource reuse, it improves hardware resource utilization while enhancing decoding performance.  Methods  The statistical characteristics of LLRs over AWGN and interference channels are first analyzed, and it is qualitatively shown that the negative impact of burst interference on decoding performance stems from the increased proportion of saturated LLRs it induces. Next, the dependence of the optimal clipping threshold on the channel noise variance, burst interference variance, and burst probability is verified; when the channel parameters vary within a limited range, the optimal threshold converges to a finite interval, referred to as the optimal threshold interval. On this basis, the CNMSL decoding algorithm is proposed. 
This algorithm constructs a list decoding architecture by reusing idle processing elements in 5G LDPC decoders: each decoding path performs independent, synchronous decoding to generate a candidate codeword, and the optimal decoding result is selected via a CRC check. In addition, each path is equipped with an independent clipper whose parameters are set according to the optimal threshold interval, thereby effectively suppressing the adverse effects of burst interference.  Results and Discussions  Experimental results show that, without a clipping mechanism, the layered NMS algorithm almost completely fails to decode over interference channels. With a single clipping threshold, the algorithm works normally, and as the clipping threshold decreases, its BLER first decreases and then increases, following a U-shaped trend. Under various channel conditions for both short and long codes, the single-clipping layered NMS algorithm with a clipping threshold of 3.5 achieves a gain of about 1 dB at \begin{document}$ BLER={10}^{-2} $\end{document} over a threshold of 10, and the CNMSL algorithm yields an additional gain of about 0.5 dB relative to the single-clipping NMS algorithm. In terms of hardware efficiency, when the lifting factor is less than 192, the PE utilization of the CNMSL algorithm is significantly higher than that of the layered NMS algorithm, and the improvement grows as the lifting factor decreases; on average, the CNMSL algorithm increases PE utilization by 69% compared with the layered NMS algorithm.  Conclusions  This paper proposes the CNMSL decoding algorithm to improve the error correction performance of the traditional layered NMS decoding algorithm over interference channels. Because it reuses idle PEs for list decoding to generate multiple candidate paths, the algorithm incurs no additional hardware overhead. 
In addition, an optimal threshold interval is defined to configure the clipper for each decoding path, which limits the proportion of saturated LLRs and makes the input LLRs follow a Gaussian or near-Gaussian distribution. Experimental results show that compared with the layered NMS decoding algorithm with a single clipper, the proposed CNMSL algorithm achieves a gain of approximately 0.5 dB for both short and long codes. Meanwhile, it increases the PE utilization by an average of 69%.
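The three decoder-side ingredients described above can be sketched as follows: per-path LLR clipping, the normalized min-sum check-node update, and CRC-based selection among list candidates. This is a hypothetical illustration, not the authors' decoder. The normalization factor `alpha = 0.75`, the per-path thresholds in the demo, and the `crc_ok` callback are assumptions. The abstract reports 3.5 as a good single threshold but does not list the interval used for each path. The full layered iteration over the 5G base graph is omitted.

```python
from typing import Callable, List, Optional, Sequence


def clip_llrs(llrs: Sequence[float], threshold: float) -> List[float]:
    """Limit each channel LLR to [-threshold, threshold] before decoding."""
    return [max(-threshold, min(threshold, x)) for x in llrs]


def nms_check_node_update(incoming: Sequence[float], alpha: float = 0.75) -> List[float]:
    """Normalized min-sum update for one check node.

    For each edge i the outgoing message is
        alpha * prod_{j != i} sign(m_j) * min_{j != i} |m_j|,
    computed with the usual two-minimum trick instead of a nested loop.
    A zero message is treated as having positive sign.
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two edges")
    mags = [abs(m) for m in incoming]
    signs = [-1 if m < 0 else 1 for m in incoming]
    total_sign = 1
    for s in signs:
        total_sign *= s
    idx_min = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx_min]
    min2 = min(m for k, m in enumerate(mags) if k != idx_min)
    # Multiplying total_sign by signs[k] removes edge k's own sign (signs are +/-1).
    return [
        alpha * total_sign * signs[k] * (min2 if k == idx_min else min1)
        for k in range(len(incoming))
    ]


def select_by_crc(candidates: Sequence[List[int]],
                  crc_ok: Callable[[List[int]], bool]) -> Optional[List[int]]:
    """Return the first candidate codeword whose CRC check passes, or None."""
    for cand in candidates:
        if crc_ok(cand):
            return cand
    return None


if __name__ == "__main__":
    channel_llrs = [12.0, -0.8, 2.5, -9.4, 4.1]  # toy LLRs with burst-like outliers
    thresholds = [3.0, 3.5, 4.0]  # hypothetical per-path clipping thresholds
    for t in thresholds:
        print(t, clip_llrs(channel_llrs, t))
    print(nms_check_node_update([2.0, -1.0, 3.0]))
```

Each list path would run the same layered NMS iterations on its own clipped copy of the channel LLRs, and `select_by_crc` would pick the output. Because the paths differ only in the clipper, they map naturally onto otherwise idle PEs.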
Drug Response Prediction Based on Graph Topology Attention Network
XU Peng, XU Hao, BAO Zhenshen, ZHOU Chi, LIU Wenbin
Available online  , doi: 10.11999/JEIT251099
Abstract:
  Objective  A core goal in modern cancer research is to understand why patients respond differently to the same therapy. Achieving this requires computational tools that combine genetic information and drug properties to forecast treatment outcomes, which is essential for advancing personalized oncology. Although existing methods have made progress in predicting cancer drug responses, effectively extracting drug features and integrating multi-omics data from cell lines remain challenging. Using Graph Neural Networks (GNNs) to process drug molecular graphs has emerged as a promising strategy for addressing these challenges. This research proposes a model that uses a graph topology attention network to capture features from drug molecular graphs and applies an attention mechanism to integrate multi-omics data.  Methods  In this study, a drug response prediction method based on the Graph Topology Attention Network (GTAT) is proposed. The model integrates topological graph information to predict drug responses in cell lines. It uses drug SMILES strings to generate two distinct drug representations and incorporates multi-omics data to characterize cell lines (Fig. 1). For drug feature extraction, SMILES strings are first parsed into molecular graphs, which are then processed by the GTAT. This network captures both graph-level topological information and atom-level features of each molecule, producing structured molecular representations. In parallel, Extended Connectivity Fingerprints are computed from the same SMILES strings and transformed into continuous feature vectors by a Multi-Layer Perceptron (MLP). The graph-based and fingerprint-based drug representations are then concatenated to form a comprehensive drug feature vector. For cell line representation, multi-omics data are processed through omics-specific neural networks.
The resulting features are fused by multi-head self-attention, enabling the model to capture contextual interactions across omics modalities and generate an integrated cell line representation. Finally, the drug and cell line features are combined and fed into an MLP classifier to predict drug response. By integrating heterogeneous biological data sources through multi-modal learning and attention-based feature fusion, the model substantially improves prediction accuracy.  Results and Discussions  The proposed method achieves competitive performance on both the GDSC and CCLE benchmark datasets (Table 2). On the GDSC dataset, our approach outperforms all competing methods on all four metrics (AUC, AUPR, F1-score, and Accuracy). Notably, it improves AUPR by approximately 1.92% over the second-best method, MOFGCN, demonstrating its advantage in handling class imbalance. On the CCLE dataset, our method still achieves the best AUC and Accuracy. Although it is marginally lower than GADRP in AUPR and F1-score, the gap is minimal, and our approach exhibits more robust overall discriminative ability, as reflected by AUC. These results validate the effectiveness and generalizability of our method for drug sensitivity prediction. The differences in AUPR and F1-score between datasets can be attributed to inherent differences in sample size and class distribution. The limited scale of the CCLE dataset, combined with its class imbalance (a ratio of approximately 4:1 resistant to sensitive samples), may limit the model's capacity to fully learn the underlying data distribution, particularly for the minority class.
In contrast, the GDSC dataset exhibits greater heterogeneity and a more pronounced class imbalance (approximately 8:1), which together increase prediction difficulty and lower performance on certain metrics.  Conclusions  Accurately predicting drug response in cell lines remains a central challenge in precision medicine, with significant implications for accelerating drug development and advancing personalized treatment. However, building a high-accuracy predictive model that effectively integrates multi-source biological information is difficult because of the complexity of drug molecular structures and the inherent heterogeneity of cell lines. To address this, a cell line drug response prediction model based on the Graph Topology Attention Network is proposed. The model uses the graph topology attention network to extract molecular graph features of drugs, which are then fused with molecular fingerprint features, and it integrates the multi-omics features of cell lines through an attention mechanism. Experimental results demonstrate that the proposed model outperforms existing state-of-the-art methods on the datasets used. This study provides a new perspective on predicting cell line drug response. Certain limitations remain, such as the use of only three types of omics features for cell line representation and the influence of sample size on predictive outcomes. Future work will focus on integrating more diverse omics features, applying pre-trained large-scale models, and translating the approach to clinical personalized medicine.
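A rough NumPy sketch of the cell-line fusion and prediction head described above. Everything here is an assumption for illustration: `drug_vec` stands in for the concatenated GTAT and fingerprint representation, the rows of `omics_tokens` stand in for the outputs of the omics-specific networks, and mean pooling over tokens, a single ReLU hidden layer and random weights are arbitrary choices. The GTAT encoder itself is not reproduced.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """X: (num_tokens, d) -> fused tokens (num_tokens, d), attention (heads, t, t)."""
    t, d = X.shape
    dh = d // num_heads
    def split(M):  # (t, d) -> (heads, t, dh)
        return M.reshape(t, num_heads, dh).transpose(1, 0, 2)
    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(dh))   # scaled dot-product weights
    heads = A @ V                                        # (heads, t, dh)
    return heads.transpose(1, 0, 2).reshape(t, d) @ Wo, A

def predict_response(omics_tokens, drug_vec, params, num_heads=2):
    """Fuse omics embeddings, concatenate with drug features, score with an MLP."""
    fused, A = multi_head_self_attention(omics_tokens, *params["attn"], num_heads)
    cell_vec = fused.mean(axis=0)                    # pool the omics tokens
    z = np.concatenate([drug_vec, cell_vec])
    hidden = np.maximum(0.0, z @ params["W1"])       # ReLU hidden layer
    logit = hidden @ params["w2"]
    return 1.0 / (1.0 + np.exp(-logit)), A           # probability of the positive class

def init_params(d, drug_dim, hidden, rng):
    """Random (untrained) weights, for shape-checking only."""
    attn = tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
    return {"attn": attn,
            "W1": rng.normal(scale=0.1, size=(drug_dim + d, hidden)),
            "w2": rng.normal(scale=0.1, size=hidden)}
```

Each attention row sums to one, so every omics token becomes a weighted mixture of all three modalities before pooling, which is the "contextual interaction across omics" the abstract refers to.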
Multi-dimensional Spatio-temporal Features Enhancement for Lip reading
MA JinLin, ZHONG YaoWei, MA RuiShi
Available online  , doi: 10.11999/JEIT251111
Abstract:
  Objective  Lip reading is a challenging yet vital frontier in computer vision, dedicated to decoding spoken language solely from visual lip movements. The difficulty arises primarily from inherent ambiguities in the visual speech signal. On one hand, articulatory movements for different visemes can be extremely subtle; for instance, lip displacement differences can be as small as 0.3–0.7 mm for confusable pairs such as /p/–/b/ and /m/–/n/. These fine-grained spatial variations often lie below the effective resolution of conventional 3D convolutional neural networks. On the other hand, natural co-articulation introduces temporal ambiguity: mouth shapes transiently blend multiple phonemes, making it difficult to isolate distinct visual units. Real-world variables such as uneven lighting and substantial inter-speaker articulation differences compound these challenges. As a result, current lip reading models often fail to capture discriminative spatiotemporal features, leading to suboptimal performance, especially for phonemes with minimal visual distinctions. Motivated by these issues, this work aims to develop a robust lip reading framework that effectively captures and exploits fine-grained spatiotemporal dependencies to improve recognition accuracy under diverse and realistic conditions.  Methods  To address these limitations, this study proposes a novel lip reading framework named the Multi-dimensional Spatio-Temporal Enhancement Network (MSTEN), which enhances spatial and temporal representations through integrated attention mechanisms and advanced residual learning. The framework incorporates three core components that jointly model the interdependencies between spatial and temporal features, an aspect often underused in conventional architectures.
The first component, the Self-adjusting Spatio-temporal Attention (SaSTA) module, applies a self-adjusting mechanism concurrently across the height, width, and temporal dimensions. It generates query, key, and value tensors with 1×1×1 3D convolutions, flattens them across the spatial and temporal dimensions, and computes attention weights by multiplying the query with the transposed key, followed by softmax normalization. The resulting attention map is multiplied with the value tensor, and the output is combined with the original input through learnable parameters and a residual connection to preserve contextual information, yielding globally enhanced features. The second component, the Three-dimensional Enhanced Residual Block (TE-ResBlock), strengthens spatiotemporal feature extraction through temporal shift, multi-scale convolution, and channel shuffle. The temporal shift operation moves a quarter of the feature channels along the time axis, fusing information from adjacent frames without adding parameters, while multi-scale convolution uses parallel branches with kernel sizes of 3×3, 3×1, 1×3, and 1×1 to capture diverse receptive fields. The branch outputs are concatenated and passed through channel shuffle to improve cross-group information flow, and four TE-ResBlocks are stacked for progressive feature refinement. The third component, the Multi-dimensional Adaptive Fusion (MDAF) module, integrates the spatial, temporal, and channel dimensions through three sub-modules: a Channel Enhancement Module (CEM) that recalibrates features using max pooling, temporal convolution, and sigmoid activation; a Spatial Enhancement Module (SEM) that expands the receptive field via identity mapping, standard convolution, and dilated convolution; and an Adaptive Temporal Capture Module (ATCM) that emphasizes dynamic movements using frame difference features and temporal weight maps. MDAF modules are inserted between TE-ResBlock stacks for iterative refinement.
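The SaSTA computation described above (1×1×1 convolutions for query, key and value, flattening over time and space, softmax attention, and a learnable residual blend) can be sketched in NumPy as follows. This is an illustrative reading rather than the authors' module: a single scalar `gamma` stands in for the learnable parameters, the channel counts are hypothetical, and no batch dimension, normalization or 1/√d scaling is included.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pointwise_conv3d(x, W):
    """1x1x1 3D convolution: a linear map over channels at every (t, h, w)."""
    return np.einsum("oc,cthw->othw", W, x)

def spatiotemporal_self_attention(x, Wq, Wk, Wv, gamma):
    """x: (C, T, H, W). Global attention over all T*H*W positions plus a residual."""
    C, T, H, W = x.shape
    N = T * H * W
    q = pointwise_conv3d(x, Wq).reshape(-1, N)     # (Cq, N) queries
    k = pointwise_conv3d(x, Wk).reshape(-1, N)     # (Cq, N) keys
    v = pointwise_conv3d(x, Wv).reshape(C, N)      # (C, N) values
    attn = softmax(q.T @ k, axis=-1)               # (N, N): position i attends to j
    out = (v @ attn.T).reshape(C, T, H, W)         # weighted sum of values
    return gamma * out + x, attn                   # learnable scale + residual
```

With `gamma = 0` the block reduces to an identity mapping, a common way to start training such residual attention layers; as `gamma` grows, globally aggregated context is blended into every position.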
Finally, features from the MSTEN front-end are fed into a Densely Connected Temporal Convolutional Network (DC-TCN) back-end, which comprises four blocks, each containing three densely connected temporal convolutional layers, to model long-range phonological dependencies.  Results and Discussions  The proposed framework is comprehensively evaluated on the widely used LRW and GRID datasets. LRW comprises over 500,000 video clips from more than 1,000 speakers, while GRID consists of video clips from 34 speakers, each with 1,000 utterances, for a total duration of 28 hours. On LRW, the model achieves an accuracy of 91.18%, an absolute improvement of 2.82 percentage points over a strong ResNet18 baseline. Ablation studies dissect the contribution of each key component, and the results show that every proposed module brings a significant performance gain. Specifically, introducing the SaSTA module alone improves accuracy by 2.09%, highlighting the crucial role of global spatiotemporal attention. The TE-ResBlock contributes a 1.73% increase, confirming its efficacy in multi-scale local feature extraction and inter-frame information fusion. The MDAF module further improves performance by 1.74%, showing the benefit of adaptive multi-dimensional feature fusion (Table 2).  Conclusions  This study advances lip reading through the MSTEN front-end network, which is built upon three core contributions. First, the SaSTA module introduces a global context aggregation mechanism that performs multi-dimensional feature weighting across the height, width, and temporal dimensions.
Second, the TE-ResBlock addresses fundamental challenges in spatiotemporal modeling through a combination of temporal shift, multi-scale convolution, and enhanced channel-wise interaction. Third, the MDAF module enables deep integration of information from the spatial, temporal, and channel dimensions. Together, these components achieve state-of-the-art performance, reaching an accuracy of 91.18% on the challenging LRW dataset and 97.82% on the GRID dataset. Ablation studies further validate the individual and collective efficacy of each component. Future work will extend this framework to audio-visual speech recognition under noisy conditions and develop domain adaptation strategies to improve robustness in low-resolution or resource-constrained scenarios.
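Two of the parameter-free operations in the TE-ResBlock, temporal shift and channel shuffle, are easy to show in isolation. In this NumPy sketch, splitting the shifted quarter of the channels into a forward half and a backward half follows the common Temporal Shift Module convention and is an assumption, as are the zero padding at the sequence ends and the `(C, T, H, W)` layout; the multi-scale convolution branches are omitted.

```python
import numpy as np

def temporal_shift(x, fold_div=4):
    """x: (C, T, H, W). Shift 1/fold_div of the channels along time, zero-padded.

    Assumption: half of the shifted channels move forward in time and half
    backward, as in the usual Temporal Shift Module convention.
    """
    C = x.shape[0]
    fold = C // fold_div
    half = fold // 2
    out = np.zeros_like(x)
    out[:half, 1:] = x[:half, :-1]              # frame t receives frame t-1
    out[half:fold, :-1] = x[half:fold, 1:]      # frame t receives frame t+1
    out[fold:] = x[fold:]                       # remaining channels untouched
    return out

def channel_shuffle(x, groups):
    """Interleave channels across groups: (g * n, ...) -> (n * g, ...)."""
    C = x.shape[0]
    grouped = x.reshape(groups, C // groups, *x.shape[1:])
    return grouped.swapaxes(0, 1).reshape(x.shape)
```

With `groups=2`, channel order `[0, 1, 2, 3, 4, 5]` becomes `[0, 3, 1, 4, 2, 5]`, so the next grouped convolution sees channels from both previous groups.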
Multi-scale Frequency Adapter and Dual-path Attention for Time Series Forecasting
YANG Zhenzhen, XU Yi, WANG Chengye, YANG Yongpeng
Available online  , doi: 10.11999/JEIT251188
Abstract:
  Objective  With the rapid development of big data technology, time series data have been increasingly applied in areas such as meteorology, power systems, and finance. Nonetheless, mainstream time series forecasting methods face notable challenges in multi-scale modeling and frequency-domain feature extraction, which prevent them from fully capturing the crucial dynamic properties and periodic patterns of complex datasets. Traditional statistical approaches such as ARIMA assume linear relationships and therefore perform poorly on nonlinear or high-dimensional time series. Although deep learning methods, notably those based on convolutional neural networks and Transformers, have improved forecasting accuracy through advanced feature extraction and long-range dependency modeling, they remain limited in their ability to efficiently extract and fuse multi-scale features in both the temporal and frequency domains. These deficiencies lead to instability and suboptimal accuracy, particularly in dynamic and highly varied applications. This paper addresses these challenges by proposing an intelligent forecasting framework that effectively models multi-scale information and improves prediction accuracy across diverse scenarios.  Methods  The proposed method introduces a multi-scale frequency adapter and dual-path attention (MFADA) framework for time series forecasting. The framework integrates two key modules: the multi-scale frequency adapter (MFA) and the multi-scale dual-path attention (MDA). The MFA module efficiently captures multi-scale frequency features using adaptive pooling and deep convolutions, which increases sensitivity to various frequency components and supports modeling of both short-term and long-term dependencies.
The MDA module applies a multi-scale attention mechanism to strengthen fine-grained modeling across both the temporal and feature dimensions, enabling effective extraction and fusion of time and frequency information. The framework is designed for computational efficiency to ensure scalability. Experiments on 8 public datasets demonstrate superior performance and robustness compared with existing mainstream time series forecasting approaches.  Results and Discussions  Extensive experiments were conducted on 8 publicly available multivariate datasets: ECL, Weather, ETT (ETTm1, ETTm2, ETTh1, ETTh2), Solar-Energy, and Traffic. Mean absolute error (MAE) and mean squared error (MSE) were used as evaluation metrics, and parameter count, FLOPs, and training time were used to assess computational efficiency. Compared with state-of-the-art models including Fredformer, Peri-midFormer, iTransformer, TFformer, PatchTST, MSGNet, TimesNet, and TCM, the proposed MFADA achieves superior forecasting performance on most datasets and forecasting horizons (Table 1). On ECL, it achieves the best average MSE and MAE of 0.163 and 0.261, and at a forecasting length of 96 it reduces MSE and MAE by 13.2% and 17.3%, respectively, relative to TimesNet. On the periodic ETTm1 dataset, the average MSE reaches 0.377, 5.3% lower than that of MSGNet. Ablation studies (Table 2) demonstrate the importance of both the MFA and MDA modules: removing MFA or reverting MDA to standard self-attention increases errors on ECL, Weather, ETTh1, and ETTh2, indicating that the two modules contribute synergistically. Complexity analysis (Fig. 2) shows that MFADA achieves the best balance among forecasting accuracy, parameter efficiency, and training time, outperforming Fredformer, MSGNet, and TimesNet. Visualization results for ECL and ETTh2 (Fig. 3, Fig.
4) confirm the ability of MFADA to track ground-truth trends, forecast turning points, and outperform baselines in both global and local prediction fidelity. Notably, MFADA lags on the Traffic dataset owing to its high spatial correlation, which points to integrating spatial structure as a future direction.  Conclusions  This paper proposes MFADA, a novel time series forecasting method that integrates multi-scale frequency adaptation and dual-path attention mechanisms. MFADA has four key strengths: (1) the MFA module effectively extracts and merges multi-scale frequency-domain features, emphasizing diverse temporal scales through pyramid pooling and channel gating; (2) the MDA module captures multi-scale dependencies along both the temporal and feature dimensions, enabling fine-grained dynamic modeling; (3) the architecture maintains computational efficiency through lightweight convolution and pooling operations; and (4) superior results across 8 datasets and various forecasting lengths demonstrate robust generalization, especially in multivariate and long-term forecasting scenarios. The extensive experiments confirm that MFADA advances the state of the art in accurate and efficient time series forecasting, with promising prospects for both academic research and practical deployment. Future work will explore integrating spatial correlation to further broaden the model's applicability.
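To make the idea of multi-scale frequency adaptation concrete, here is a hypothetical NumPy block in the spirit of the MFA description (pooling to several scales, working in the FFT domain, and gating the spectrum). It is not the MFADA module: the scales, the parameter-free sigmoid gate computed from each bin's standardized amplitude (where a trained model would learn the gate), nearest-neighbour upsampling and averaging across scales are all illustrative assumptions.

```python
import numpy as np

def adaptive_avg_pool1d(x, out_len):
    """x: (channels, length) -> (channels, out_len).

    Segment i covers [floor(i*L/out_len), ceil((i+1)*L/out_len)).
    """
    L = x.shape[-1]
    cols = []
    for i in range(out_len):
        start = (i * L) // out_len
        end = -((-(i + 1) * L) // out_len)            # ceiling division
        cols.append(x[:, start:end].mean(axis=1))
    return np.stack(cols, axis=1)

def multiscale_frequency_block(x, scales=(96, 48, 24)):
    """x: (channels, length). Returns a same-shaped multi-scale frequency feature."""
    L = x.shape[1]
    outputs = []
    for s in scales:
        pooled = adaptive_avg_pool1d(x, s)            # coarser view at scale s
        spec = np.fft.rfft(pooled, axis=1)            # complex spectrum per channel
        amp = np.abs(spec)
        z = (amp - amp.mean(axis=1, keepdims=True)) / (amp.std(axis=1, keepdims=True) + 1e-8)
        gate = 1.0 / (1.0 + np.exp(-z))               # per-frequency gate in (0, 1)
        back = np.fft.irfft(spec * gate, n=s, axis=1) # filtered signal at scale s
        idx = (np.arange(L) * s) // L                 # nearest-neighbour upsampling
        outputs.append(back[:, idx])
    return np.mean(outputs, axis=0)
```

Bins whose amplitude is above the channel's average get a gate above 0.5, so dominant periodic components are emphasized at each scale while weak bins are attenuated.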
Identification of Novel Protein Drug Targets for Respiratory Diseases by Integrating Human Plasma Proteome with Genome
MA Xinqian, NI Wentao
Available online  , doi: 10.11999/JEIT250796
Abstract:
  Objective  Respiratory diseases are a major cause of global morbidity and mortality and place a heavy socioeconomic burden on healthcare systems. Epidemiological data indicate that Chronic Obstructive Pulmonary Disease (COPD), pneumonia, asthma, lung cancer, and tuberculosis are the five most significant pulmonary diseases worldwide. The COronaVIrus Disease 2019 (COVID-19) pandemic has introduced additional challenges for respiratory health and has underscored the need for new diagnostic and therapeutic strategies. Integrating proteomics with Genome-Wide Association Studies (GWAS) provides a framework for connecting genetic variation to clinical phenotypes. Genetic variants associated with plasma protein levels, known as protein Quantitative Trait Loci (pQTLs), link the genome to complex respiratory phenotypes. This study evaluates the causal effects of druggable proteins on major respiratory diseases through proteome-wide Mendelian Randomization (MR) and colocalization analyses. The aim is to identify causal associations that can guide biomarker development and drug discovery and to prioritize candidates for therapeutic repurposing.  Methods  Summary-level data for circulating protein levels are obtained from two large pQTL studies: the deCODE study and the UK Biobank Pharma Proteomics Project (UKB-PPP). Strictly defined cis-pQTLs are selected as robust genetic instruments, yielding 2,918 proteins for downstream analyses. For disease outcomes, large-scale GWAS summary statistics for 27 respiratory phenotypes are collected from previously published studies and international consortia. A two-sample MR design is applied to estimate the effects of plasma proteins on these phenotypes. To reduce confounding driven by Linkage Disequilibrium (LD), Bayesian colocalization analysis is used to assess whether the genetic signals for protein levels and respiratory outcomes share a causal variant.
The Posterior Probability of hypothesis 4 (PP4) serves as the primary metric, with PP4 > 0.8 considered strong evidence of shared causality. Summary-data-based Mendelian Randomization (SMR) and the HEterogeneity In Dependent Instruments (HEIDI) test are used to validate the causal associations. Bidirectional MR and the Steiger test are applied to evaluate potential reverse causality. Protein-Protein Interaction (PPI) networks are constructed with the STRING database to visualize the functional connectivity and biological pathways associated with the causal proteins.  Results and Discussions  The causal effects of 2,918 plasma proteins on 27 respiratory phenotypes are evaluated (Fig. 1). A total of 694 protein–trait associations meet the Bonferroni-corrected threshold (\begin{document}$ P<1.7\times {10}^{-5} $\end{document}) when cis-instrumental variables are used (Fig. 2). The MR-Egger intercept test identifies 94 protein–disease associations with evidence of directional pleiotropy, which are excluded. Colocalization analysis indicates that 29 protein–phenotype associations show high-confidence evidence of a shared causal variant (PP4 > 0.8) and 39 show medium-level evidence (0.5 < PP4 < 0.8). SMR validation confirms 26 associations (\begin{document}$ P<1.72\times {10}^{-3} $\end{document}), and 21 pass the HEIDI test (P > 0.05). The findings provide insights into several respiratory diseases. For COPD, five proteins (NRX3A, NRX3B, ERK-1, COMMD1, and PRSS27) are identified as causal. The association between NRXN3 and COPD suggests a genetic connection between nicotine-addiction pathways and chronic lung decline. For asthma, TEF, CASP8, and IL7R show causal evidence, and the robust association between IL7R and asthma suggests that modulating T-cell homeostasis may offer a therapeutic opportunity. The FUT3_FUT5 complex is uniquely associated with Idiopathic Pulmonary Fibrosis (IPF). CSF3 and LTBP2 are significantly associated with severe COVID-19.
For lung cancer, subtype-specific causal proteins are identified, including BTN2A1 for squamous cell lung cancer, BTN1A1 for small cell lung carcinoma, and EHBP1 for lung adenocarcinoma. These findings provide a basis for the development of subtype-specific precision therapies.  Conclusions  This study identifies 29 plasma proteins with high-confidence causal associations across major respiratory diseases. Using MR and colocalization, a comprehensive map of molecular drivers of respiratory conditions is generated. These findings may support precision medicine strategies. However, the findings are limited by the focus on European populations and potential heterogeneity arising from different proteomic platforms. The associations are based on computational analysis, and further validation in independent cohorts and animal models is needed. Additional experimental studies and clinical trials are required to clarify the pathogenic roles and biological mechanisms of the identified proteins to support therapeutic innovation in respiratory medicine.
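The core arithmetic behind two-sample MR with a single cis-pQTL, and the thresholds quoted above, can be sketched in standard-library Python. These are textbook formulas (the Wald ratio with a first-order standard error, fixed-effect inverse-variance weighting, and a normal-approximation P value), not the study's pipeline, and the evidence labels simply mirror the PP4 cut-offs stated above. Note that 0.05/2,918 ≈ 1.71×10^-5 and 0.05/29 ≈ 1.72×10^-3, which appears consistent with Bonferroni corrections over the 2,918 proteins and the 29 high-confidence colocalized associations, respectively (an inference, not stated in the abstract).

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate with a first-order standard error."""
    beta = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return beta, se

def ivw(betas, ses):
    """Fixed-effect inverse-variance weighted combination of per-instrument estimates."""
    weights = [1.0 / s ** 2 for s in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))

def two_sided_p(beta, se):
    """Two-sided P value under a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

def bonferroni(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests

def coloc_evidence(pp4):
    """Evidence levels mirroring the PP4 cut-offs used in the abstract."""
    if pp4 > 0.8:
        return "high"
    if pp4 > 0.5:
        return "medium"
    return "weak"
```

For example, a cis-pQTL with a protein effect of 0.5 and an outcome effect of 0.1 (standard error 0.02) gives `wald_ratio(0.5, 0.1, 0.02)`, an estimate of 0.2 with standard error 0.04.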
Spatio-Temporal Constrained Refined Nearest Neighbor Fingerprinting Localization
WANG Yifan, SUN Shunyuan, QIN Ningning
Available online  , doi: 10.11999/JEIT250777
Abstract:
  Objective  Indoor fingerprint-based localization faces three key challenges. First, Dimensionality Reduction (DR), used to reduce storage and computational costs, often disrupts the geometric correlation between signal features and physical space, which reduces mapping accuracy. Second, signal features exhibit temporal variability caused by human movement or environmental changes; during online mapping, this variability introduces bias and distorts the similarity between target and reference points in the low-dimensional space. Third, pseudo-neighbor interference persists because environmental noise or imperfect similarity metrics lead to inaccurate neighbor selection, which skews position estimates. To address these issues, this study proposes a Spatio-Temporal Constrained Refined Nearest Neighbor (STC-RNL) fingerprinting localization algorithm designed to provide robust, high-accuracy localization under complex interference conditions.  Methods  In the offline phase, a robust DR framework is constructed by integrating two constraints into a MultiDimensional Scaling (MDS) model. A spatial correlation constraint uses the physical distances between reference points, assigning stronger associations to nearby locations so that the low-dimensional features stay aligned with the real layout. A temporal consistency constraint clusters multiple temporal signal samples from the same location into a compact region to suppress feature drift. These constraints, combined with the MDS structure-preserving loss, form the optimization objective, from which the low-dimensional features and an explicit mapping matrix are obtained. In the online phase, a progressive refinement mechanism is applied. An initial candidate set is selected using a Euclidean distance threshold.
A hybrid similarity metric is then constructed by enhancing shared-neighbor similarity with a Sigmoid-based strategy, which truncates low similarities and smooths high ones, and fusing the result with Euclidean distance to better discriminate true neighbors. Next, an iterative Z-score-based filtering procedure removes reference points that deviate from the local group characteristics in the feature and coordinate domains. The final position is estimated as a similarity-weighted average over the refined neighbor set, which assigns higher weights to more reliable references.  Results and Discussions  The performance of STC-RNL is assessed on a private ITEC dataset and a public SYL dataset. The spatio-temporal constraints improve the robustness of the mapping matrix under noisy conditions (Table 2). Compared with baseline DR methods, the proposed module reduces the mean localization error by at least 6.30% in high-noise scenarios (Fig. 9). In the localization stage, the refined neighbor selection reduces pseudo-neighbor interference. On the ITEC dataset, STC-RNL achieves an average error of 0.959 m, a 9.61% to 33.68% improvement over SSA-XGBoost and SPSO (Table 1). End-to-end comparisons show that STC-RNL reduces the average error by at least 12.42% on ITEC and by at least 7.08% on SYL (Table 2), and its CDF curves show faster convergence and higher precision, especially within the 1.2 m range (Fig. 10). These results indicate that the algorithm maintains high stability and accuracy, with a lower maximum error, across datasets.  Conclusions  The STC-RNL algorithm addresses the structural distortion and mapping bias found in traditional DR-based localization. By jointly optimizing offline feature embedding with spatio-temporal constraints and online neighbor selection with progressive refinement, it strengthens the coupling between signal features and physical coordinates.
The main innovation lies in a synergistic framework that ensures only high-confidence neighbors contribute to the final estimate, improving accuracy and robustness in dynamic environments. Experiments show that the model reduces the average localization error by 12.42% to 32.80% on ITEC and by 7.08% to 13.67% on SYL relative to baseline algorithms, while achieving faster error convergence. Future research may incorporate nonlinear manifold modeling to further improve performance in heterogeneous access point environments.
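The online refinement steps can be sketched in NumPy as below. This is a simplified illustration, not the STC-RNL implementation: it assumes the target fingerprint has already been projected into the low-dimensional space with the offline mapping matrix, and the neighbour count `k`, truncation level `low`, sigmoid slope `a` and centre `b`, fusion weight `lam` and Z-score cut-off `z_max` are hypothetical values. Shared-neighbour similarity is taken as the overlap of k-nearest-neighbour sets, and the Z-score test is applied jointly to feature and coordinate columns.

```python
import numpy as np

def knn_ids(query, refs, k):
    """Indices of the k references closest to query (Euclidean)."""
    d = np.linalg.norm(refs - query, axis=1)
    return set(np.argsort(d)[:k].tolist())

def refined_nn_locate(target, ref_feats, ref_coords, dist_thr, k=4,
                      low=0.2, a=10.0, b=0.5, lam=0.5, z_max=2.0):
    """Estimate a position from low-dimensional reference fingerprints."""
    dists = np.linalg.norm(ref_feats - target, axis=1)
    cand = np.flatnonzero(dists < dist_thr)                # 1) Euclidean threshold
    if cand.size == 0:
        cand = np.array([int(np.argmin(dists))])
    t_nn = knn_ids(target, ref_feats, k)
    sims = []
    for c in cand:                                         # 2) hybrid similarity
        shared = len(t_nn & knn_ids(ref_feats[c], ref_feats, k)) / k
        snn = 0.0 if shared < low else 1.0 / (1.0 + np.exp(-a * (shared - b)))
        sims.append(lam * snn + (1.0 - lam) / (1.0 + dists[c]))
    sims = np.array(sims)
    keep = np.ones(cand.size, dtype=bool)
    while keep.sum() > 2:                                  # 3) iterative Z-score filter
        block = np.hstack([ref_feats[cand[keep]], ref_coords[cand[keep]]])
        std = block.std(axis=0)
        std[std == 0] = 1.0
        z = np.abs((block - block.mean(axis=0)) / std).max(axis=1)
        bad = z > z_max
        if not bad.any() or bad.sum() >= bad.size - 1:
            break
        keep[np.flatnonzero(keep)[bad]] = False
    w = sims[keep]                                         # 4) similarity-weighted mean
    return (w[:, None] * ref_coords[cand[keep]]).sum(axis=0) / w.sum()
```

A reference whose fingerprint resembles the target but whose coordinates lie far from the rest of the candidate group (a pseudo-neighbour) gets a large coordinate Z-score and is dropped before the weighted average; the loop stops once no point exceeds `z_max` or too few candidates would remain.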
Construction of Maximum Distance Separable Codes and Near Maximum Distance Separable Codes Based on Cyclic Subgroup of \begin{document}$ \mathbb{F}_{{q}^{2}}^{*} $\end{document}
DU Xiaoni, XUE Jing, QIAO Xingbin, ZHAO Ziwei
Available online  , doi: 10.11999/JEIT251204
Abstract:
  Objective  The demand for higher performance and efficiency in error-correcting codes has increased with the rapid development of modern communication technologies. These codes detect and correct transmission errors. Because of their algebraic structure, straightforward encoding and decoding, and ease of implementation, linear codes are widely used in communication systems. Their parameters follow classical bounds such as the Singleton bound: for a linear code with length \begin{document}$ n $\end{document} and dimension \begin{document}$ k $\end{document}, the minimum distance \begin{document}$ d $\end{document} satisfies \begin{document}$ d\leq n-k+1 $\end{document}. When \begin{document}$ d=n-k+1 $\end{document}, the code is a Maximum Distance Separable (MDS) code. MDS codes are applied in distributed storage systems and random error channels. If \begin{document}$ d=n-k $\end{document}, the code is Almost MDS (AMDS); when both a code and its dual are AMDS, the code is Near MDS (NMDS). NMDS codes have geometric properties that are useful in cryptography and combinatorics. Extensive research has focused on constructing structurally simple, high-performance MDS and NMDS codes. This paper constructs several families of MDS and NMDS codes of length \begin{document}$ q+3 $\end{document} over the finite field \begin{document}$ {\mathbb{F}}_{{{q}^{2}}} $\end{document} of even characteristic using the cyclic subgroup \begin{document}$ {U}_{q+1} $\end{document}. Several families of optimal Locally Repairable Codes (LRCs) are also obtained. LRCs support efficient failure recovery by accessing a small set of local nodes, which reduces repair overhead and improves system availability in distributed and cloud-storage settings.  Methods  In 2021, Wang et al. constructed NMDS codes of dimension 3 using elliptic curves over \begin{document}$ {\mathbb{F}}_{q} $\end{document}. In 2023, Heng et al. 
obtained several classes of dimension-4 NMDS codes by appending appropriate column vectors to a base generator matrix. In 2024, Ding et al. presented four classes of dimension-4 NMDS codes, determined the locality of their dual codes, and constructed four classes of distance-optimal and dimension-optimal LRCs. Building on these works, this paper uses the unit circle \begin{document}$ {U}_{q+1} $\end{document} in \begin{document}$ {\mathbb{F}}_{{{q}^{2}}} $\end{document} and elliptic curves to construct generator matrices. By augmenting these matrices with two additional column vectors, several classes of MDS and NMDS codes of length \begin{document}$ q+3 $\end{document} are obtained. The locality of the constructed NMDS codes is also determined, yielding several classes of optimal LRCs.  Results and Discussions  In 2023, Heng et al. constructed generator matrices with second-row entries in \begin{document}$ \mathbb{F}_{q}^{*} $\end{document} and with the remaining entries given by nonconsecutive powers of the second-row elements. In 2025, Yin et al. extended this approach by constructing generator matrices using elements of \begin{document}$ {U}_{q+1} $\end{document} and obtained infinite families of MDS and NMDS codes. Following this direction, the present study expands these matrices by appending two column vectors whose elements lie in \begin{document}$ {\mathbb{F}}_{{{q}^{2}}} $\end{document}. The resulting matrices generate several classes of MDS and NMDS codes of length \begin{document}$ q+3 $\end{document}. Several classes of NMDS codes with identical parameters but different weight distributions are also obtained. Computing the minimum locality of the constructed NMDS codes shows that some are optimal LRCs satisfying the Singleton-like, Cadambe–Mazumdar, Plotkin-like, and Griesmer-like bounds. All constructed MDS codes are Griesmer codes, and the NMDS codes are near Griesmer. 
These results show that the proposed constructions are more general and unified than earlier approaches.  Conclusions  This paper constructs several families of MDS and NMDS codes of length \begin{document}$ q+3 $\end{document} over \begin{document}$ {\mathbb{F}}_{{{q}^{2}}} $\end{document} using elements of the unit circle \begin{document}$ {U}_{q+1} $\end{document} and oval polynomials, and by appending two additional column vectors with entries in \begin{document}$ {\mathbb{F}}_{q} $\end{document}. The minimum locality of the constructed NMDS codes is analyzed, and some of these codes are shown to be optimal LRCs. The framework generalizes earlier constructions, and the resulting codes are optimal or near-optimal with respect to the Griesmer bound.
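A small brute-force check illustrates the objects involved for q = 4, that is, codes over F_16 of even characteristic. The field modulus x^4 + x + 1 is an assumed choice; `unit_circle` enumerates U_{q+1} = {x : x^{q+1} = 1}; and the two appended columns (1, 0, 0)^T and (0, 0, 1)^T are a textbook extension (evaluation at 0 and at the point at infinity) chosen for illustration, not the column vectors used in the paper. The resulting [q+3, 3] = [7, 3] code meets the Singleton bound.

```python
from itertools import product

POLY, M = 0b10011, 4          # GF(2^4) = GF(16) via x^4 + x + 1 (assumed modulus)

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^M), reducing by POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def unit_circle(q):
    """U_{q+1} = {x in F_{q^2}^* : x^(q+1) = 1}; requires q^2 == 2^M."""
    return [x for x in range(1, 1 << M) if gf_pow(x, q + 1) == 1]

def extended_generator(points):
    """Rows 1, x, x^2 over the points, plus columns (1,0,0)^T and (0,0,1)^T."""
    return [[1] * len(points) + [1, 0],
            list(points) + [0, 0],
            [gf_mul(u, u) for u in points] + [0, 1]]

def min_distance(G):
    """Brute-force minimum Hamming weight of the F_{2^M}-span of the rows of G."""
    k, n = len(G), len(G[0])
    best = n
    for msg in product(range(1 << M), repeat=k):
        if not any(msg):
            continue
        word = [0] * n
        for coef, row in zip(msg, G):
            for j, g in enumerate(row):
                word[j] ^= gf_mul(coef, g)          # addition in GF(2^M) is XOR
        best = min(best, sum(1 for s in word if s))
    return best

def classify(n, k, d, d_dual=None):
    """MDS / AMDS / NMDS from the Singleton defects of the code and its dual."""
    if n - k + 1 - d == 0:
        return "MDS"
    if n - k + 1 - d == 1:
        if d_dual is not None and k + 1 - d_dual == 1:
            return "NMDS"
        return "AMDS"
    return "other"
```

Because every three columns of this generator matrix are linearly independent, the brute-force search returns d = 5 = n - k + 1, so `classify(7, 3, 5)` reports an MDS code.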
FPGA Hybrid PLB Architecture for Highly Efficient Resource Utilization
WANG Yanlin, GAO Lijiang, YANG Haigang
Available online  , doi: 10.11999/JEIT260108
Abstract:
Six-input look-up tables (LUTs) are widely used to build programmable logic blocks in commercial Field-Programmable Gate Arrays (FPGAs), yet experiments show that their average utilization in mapped circuits is below 30%, which wastes a significant share of programmable resources. In this paper, 6-input LUTs are fractured according to fracturable factors and recombined at different granularities to construct several new Hybrid Basic Logic Elements (HBLEs). Based on these HBLEs, several novel Hybrid Programmable Logic Block (HPLB) architectures are proposed and used to replace the Programmable Logic Blocks (PLBs) of Xilinx FPGAs. A statistical evaluation algorithm for the mapped netlist is also proposed, and the HPLB architectures are then experimentally verified and evaluated. Experimental evaluations of the three enhanced architectures show that the HPLBs achieve an average area reduction of more than 30% compared with Xilinx's PLBs without adding input ports. Considering both HPLB utilization and area optimization, the hybrid HPLB architecture constructed with a fracturable factor N=3 produces the best results. On the MCNC and VTR benchmarks, resource utilization increases by an average of 8.27% and 27.64%, respectively, thereby improving FPGA logic efficiency.  Objective  Modern commercial FPGA architectures employ 6-LUTs as the fundamental building blocks of Basic Logic Elements (BLEs). Experimental results show, however, that only about 30% of the Logic Elements (LEs) in a circuit are ultimately mapped to 6-input functions when 6-LUT BLEs are targeted. When a 6-LUT implements a function with fewer than six inputs, more than half of its logic resources are wasted, so a significant share of programmable resources is inevitably wasted.
In 6-LUT mapping experiments, a circuit design that maps to 100 4-LUTs maps to 78 6-LUTs, with a {6,5,4,3,2}-input function distribution of {23,32,17,9,13}. Only around 25% of the 6-LUTs implement 6-input functions, and the remaining 6-LUTs are underutilized. This further illustrates how inefficient technology mapping is for LUTs with a large input count K.  Methods  The fracturable factor N, defined as the number of sub-LUTs that can be obtained from a single LUT, characterizes the fracturable and reconfigurable nature of LUT architectures in FPGAs. Motivated by this, a 6-LUT is decomposed into several granularities according to the fracturable factor to address the low resource utilization described above. The resulting sub-LUTs are connected and reconfigured with additional input ports and multiplexer modules to create three novel mixed-granularity Hybrid Basic Logic Element (HBLE) structures, and their effect on FPGA performance is then investigated. The HBLE2 structure consists of one undivided 6-LUT and one 6-LUT fractured into two 5-LUTs (fracturable factor N=2). The HBLE3 structure consists of one undivided 6-LUT and one 6-LUT fractured into one 5-LUT and two 4-LUTs (N=3). The HBLE4 structure consists of one undivided 6-LUT and one 6-LUT fractured into four 4-LUTs (N=4). All three HBLE structures support adder units and provide both registered and direct combinational outputs; they also allow a direct registered output that bypasses the combinational logic. Merging several HBLEs yields a new structure, the Hybrid Programmable Logic Block (HPLB).
The MCNC and VTR circuit sets, the two most widely used academic benchmark (BM) suites, are selected for experimental evaluation. Each circuit set is mapped onto a Xilinx Virtex-7 FPGA, and the types and numbers of LUTs used are then counted from the mapped netlist. After the data are organized, the corresponding greedy algorithms determine the minimum number of CLBs required. Because each Xilinx CLB contains eight 6-LUTs, the greedy approach computes the minimum number of CLBs needed after BM mapping as the total number of LUTs divided by 8, rounded up. To ensure comparable conditions, the same greedy procedure is applied to each structure after Xilinx's CLB is replaced with the proposed HPLB, which yields the minimum number of HPLBs required. Because routing constraints prevent every LUT in the mapped CLBs from being used during actual packing, the result obtained by greedy restructuring represents the lowest value achievable in a theoretical optimization scenario.  Results and Discussions  When CLBs are replaced with HPLBs to map the MCNC circuit set, the average number of blocks required drops by about 8% for both HPLB2 and HPLB3, whereas HPLB4 increases the required number by more than 30% on average. For the VTR circuit set, HPLBs require fewer blocks than CLBs: on average, the HPLB2 and HPLB4 counts drop by less than 10%, whereas the HPLB3 count drops by around 30%. This allows flexible use of the SRAM cells and full use of the input pins. By contrast, the uniform CLB structure wastes resources when implementing functions with a small input count K, which raises the number of CLBs required. In terms of post-mapping block count, HPLB4 performs worse than HPLB3.
Analysis of post-mapping area optimization shows average area reduction ratios above 30% for both the MCNC and VTR circuit sets. On the MCNC test set, all three HPLB structures achieved area reductions of about 31%. The VTR test set showed different optimization effects: HPLB2 produced an average area reduction of 30.63% and HPLB4 an average reduction of 51.21%, while HPLB3 achieved 45.22%, slightly less than HPLB4. A closer examination of the area results shows that a higher fracturable factor N yields more pronounced benefits when circuits contain many small LUTs, giving the enhanced architectures higher area reduction ratios.  Conclusions  To address the low resource utilization of 6-LUTs, this research proposes three HPLB architectures based on different fracturing granularities. An evaluation procedure and matching algorithms are established for the enhanced structures, and the HPLBs replace Xilinx's CLB structure so that their benefits in resource utilization can be examined. Based on the differing proportions of LUT types in the post-mapping netlists, evaluation experiments on the MCNC and VTR benchmark suites show that although HPLB4 achieves significant area optimization, it requires additional HPLBs, which increases interconnect area. Both HPLB2 and HPLB3 obtain average area reductions above 30%, but as the test circuits grow larger, HPLB3 achieves a markedly larger reduction in HPLB count and greater area optimization than HPLB2. Therefore, when replacing the CLB structure, HPLB3 offers a more balanced optimization in terms of both HPLB count and area, greatly improving the utilization of programmable resources.
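The greedy block-count estimate described in the Methods can be sketched for the HBLE3 case. This is a minimal sketch under stated assumptions: each HBLE3 is modeled as one intact 6-LUT slot, one 5-LUT slot, and two 4-LUT slots; shared-input and routing limits are ignored; and the functions `clb_lower_bound`, `fits_hble3`, and `min_hble3` are hypothetical, not the authors' algorithm. The input is the {6,5,4,3,2}-input function distribution quoted in the Objective.

```python
import math

def clb_lower_bound(lut_counts, luts_per_clb=8):
    """Minimum CLB count when every function occupies one 6-LUT."""
    return math.ceil(sum(lut_counts.values()) / luts_per_clb)

def fits_hble3(n, lut_counts):
    """Check greedily whether n HBLE3 units can host every function.

    Assumed slots per HBLE3: one intact 6-LUT, plus one fractured 6-LUT
    providing a 5-LUT and two 4-LUTs. Input sharing and routing are
    ignored, so the result is an optimistic (theoretical) bound.
    """
    free = {6: n, 5: n, 4: 2 * n}
    # Widest functions first; each uses the smallest slot that can hold it.
    for k in sorted(lut_counts, reverse=True):
        need = lut_counts[k]
        for slot in (4, 5, 6):
            if slot >= k:
                used = min(need, free[slot])
                free[slot] -= used
                need -= used
        if need > 0:
            return False
    return True

def min_hble3(lut_counts):
    """Smallest number of HBLE3 units that passes the greedy fit check."""
    if any(k > 6 for k in lut_counts):
        raise ValueError("functions wider than 6 inputs cannot be placed")
    n = 0
    while not fits_hble3(n, lut_counts):
        n += 1
    return n

# {number of inputs: number of functions}, as quoted in the Objective.
dist = {6: 23, 5: 32, 4: 17, 3: 9, 2: 13}
print(clb_lower_bound(dist))  # 12 CLBs of eight 6-LUTs
print(min_hble3(dist))        # 28 HBLE3 units
```

Placing the widest functions first is safe here because a narrower function can always use a wider slot, so the greedy choice never blocks a feasible packing. Under these assumptions, 28 HBLE3 units contain 56 physical 6-LUTs, compared with the 94 6-LUTs that a one-function-per-LUT CLB mapping occupies.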
Efficient and Verifiable Ciphertext Retrieval Scheme Based on Trusted Execution Environment
WU Axin, FENG Dengguo, ZHANG Min, CHI Jialin, YI Yuling
Available online  , doi: 10.11999/JEIT251358
Abstract:
The ciphertext retrieval mechanism enables search over encrypted data, and Symmetric Searchable Encryption (SSE) is a critical branch of ciphertext retrieval. However, cloud servers may return incorrect or incomplete results, for example to save computing power. Moreover, attackers can exploit information leaked through search and access patterns to reconstruct keyword information. It is therefore necessary and meaningful to protect the privacy of search and access patterns while making results verifiable. Nevertheless, existing verifiable SSE schemes that protect search and access pattern privacy typically rely on keyword traversal and inefficient verification mechanisms, which impose high computational and communication overheads on users. To address these performance bottlenecks, this paper introduces an efficient and verifiable ciphertext retrieval scheme based on a Trusted Execution Environment (TEE). To improve retrieval efficiency, the scheme combines hardware-level security isolation with oblivious data rearrangement so that the keyword trapdoor size is independent of the keyword dictionary size. Meanwhile, the correctness of the returned results is verified by embedding random numbers and blinding polynomial constant terms. These designs yield significant efficiency improvements. First, the size of a keyword trapdoor depends only on the number of query keywords rather than on the global dictionary size, minimizing communication and computational costs. Second, users need to store only two random numbers to enable verifiability, substantially reducing their local storage overhead.
Third, techniques such as letting data users retrieve results through a single server in a single round of interaction and leveraging symmetric homomorphic encryption further enhance efficiency. Additionally, the way confidential computation is performed within the TEE relaxes the security assumptions about, and the level of trust required in, the TEE. After the security of the scheme is formally proven using simulation-based methods, a comprehensive performance evaluation is conducted. The results confirm that the scheme is significantly more efficient than other schemes with the same functionality.
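The abstract states only that result correctness is checked by embedding random numbers and blinding polynomial constant terms, and that users store two random numbers. One generic way such a check can work is sketched below. The assumptions are that the result set is encoded as a polynomial whose roots are document IDs, that the user's two secrets are an evaluation point `r` and a blinding value `b`, and that the modulus and the names `set_poly_at`, `make_tag`, and `verify` are hypothetical. The sketch shows only the correctness check. It is not the paper's TEE protocol and makes no claim about its security guarantees.

```python
import secrets

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus (assumption)

def set_poly_at(ids, r, p=PRIME):
    """Evaluate P(x) = prod over ids of (x - id) at x = r, modulo p."""
    acc = 1
    for i in ids:
        acc = acc * ((r - i) % p) % p
    return acc

def make_tag(ids, r, b, p=PRIME):
    """Tag for a result set: P(r) with the constant term of P blinded by b."""
    return (set_poly_at(ids, r, p) + b) % p

def verify(returned_ids, tag, r, b, p=PRIME):
    """Accept only if the returned IDs reproduce the stored tag."""
    return make_tag(returned_ids, r, b, p) == tag

# The user keeps only two secrets: evaluation point r and blinding value b.
r, b = secrets.randbelow(PRIME), secrets.randbelow(PRIME)
tag = make_tag([3, 17, 42], r, b)      # e.g. stored alongside the index
print(verify([42, 3, 17], tag, r, b))  # True: order does not matter
print(verify([3, 17], tag, r, b))      # False except with negligible probability
```

Because `r` is secret and random, an incomplete or padded result set matches the tag only if `r` happens to be a root of the difference polynomial, which for a large prime field is negligibly likely.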
Physical Layer Security Game for Large Language Model-Based Inference in the Maritime Network
CHEN Haoyu, XIAO Liang, XU Xiaoyu, LI Jieling, WANG Zicheng, LIU Huanhuan, CHEN Hongyi
Available online  , doi: 10.11999/JEIT251269
Abstract:
  Objective  The physical-layer security game captures the interaction between user equipment (UE) and attackers and, through its equilibria, provides performance bounds for anti-jamming transmission and physical-layer authentication schemes. However, existing game models overlook smart attackers that send jamming or spoofing signals, fail to account for maritime wireless channels affected by evaporation ducts and sea-wave fluctuations, and can hardly evaluate the performance of large language model (LLM)-based inference, such as vessel traffic monitoring.  Methods  An anti-jamming maritime communication game for LLM inference is formulated, in which the jammer first selects the jamming power and channel to reduce the signal-to-interference-plus-noise ratio (SINR) at the server while limiting its jamming cost, and the UEs then choose the transmit power, channel, LLM sparsity ratio, and control center for sending sensing data (e.g., images, temperature, and humidity) to improve inference accuracy with low latency. A physical-layer authentication game for maritime wireless networks with LLM inference is further formulated. The spoofing attacker first selects the number of spoofing packets to degrade authentication accuracy at a low cost. The control center then selects either a fast authentication mode based on the channel state or a safe authentication mode based on the received signal strength and packet arrival intervals from multiple ambient transmitters, together with the test threshold, to increase accuracy at a low cost.  Results and Discussions  Based on the Stackelberg equilibrium (SE) for an LLM with 7 billion parameters, performance bounds of the reinforcement learning (RL)-based anti-jamming inference scheme are derived, revealing how the evaporation duct height, wave height, maximum LLM sparsity ratio, and quantization level affect inference accuracy and latency.
In addition, performance bounds of the RL-based maritime spoofing detection scheme are derived from the SE of the physical-layer authentication game to show how the maximum number of spoofing packets affects authentication accuracy. In the simulations, five UEs with 3 m antenna heights offload images, temperature, and humidity data at transmit powers of up to 200 mW at 5.8 GHz with a 20 MHz bandwidth to five control centers with 6 m antenna heights. The jammer uses a Deep Q-Network (DQN) to choose its jamming power, with a maximum transmit power of 200 mW on each 5.8 GHz channel, and the spoofing attacker uses a DQN to select up to 100 spoofing packets. The results show that the inference accuracy and latency of the RL-based anti-jamming maritime communication scheme for LLM inference converge to the performance bounds with gaps of less than 0.6% after 2500 time slots, and the RL-based authentication scheme converges after 1000 time slots with a gap of less than 1.6%.  Conclusions  This paper formulates a maritime physical-layer security game for LLM inference covering anti-jamming sensing data transmission and spoofing detection, and investigates how UEs determine the transmit power and channel and how the control center selects the authentication mode and test threshold to strengthen physical-layer security. The attacker chooses attack modes and parameters to degrade inference accuracy, increase latency, and even cause denial of service. Based on the SE and the associated conditions, the performance bounds of inference accuracy increase with the maximum transmit power and decrease linearly with the sparsity ratio. Furthermore, the impact of the maximum number of spoofing packets on inference accuracy is characterized.
Simulation results show that the RL-based maritime physical-layer security schemes converge to the performance bounds, thereby validating the accuracy and effectiveness of the game model.
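The leader-follower structure of these games (the jammer moves first, the UE responds) can be illustrated by backward induction over discrete actions. This is a minimal sketch with made-up gains, costs, and a Shannon-rate utility standing in for inference accuracy and latency. It omits the sparsity ratio, maritime channel effects, and RL, and all names and parameters are hypothetical.

```python
import itertools
import math

# Hypothetical toy parameters (not from the paper).
NOISE = 1.0
UE_GAIN = (1.0, 0.4)              # UE -> server gain on channels 0 and 1
JAM_GAIN = 0.5                    # jammer -> server gain
UE_POWERS = (50, 100, 200)        # mW
JAM_POWERS = (0, 50, 100, 200)    # mW
UE_COST, JAM_COST = 0.002, 0.002  # cost per mW
CHANNELS = (0, 1)

def rate(ue, jam):
    """Shannon rate log2(1 + SINR); jamming only hurts on the same channel."""
    (ch, p), (jch, jp) = ue, jam
    interference = JAM_GAIN * jp if ch == jch else 0.0
    return math.log2(1 + UE_GAIN[ch] * p / (NOISE + interference))

def ue_utility(ue, jam):
    return rate(ue, jam) - UE_COST * ue[1]

def jammer_utility(ue, jam):
    return -rate(ue, jam) - JAM_COST * jam[1]

def best_response(jam):
    """Follower (UE) best response to an observed jammer action."""
    return max(itertools.product(CHANNELS, UE_POWERS),
               key=lambda ue: ue_utility(ue, jam))

def stackelberg_equilibrium():
    """Leader (jammer) picks the action that is best given the UE's best response."""
    return max(
        ((jam, best_response(jam)) for jam in itertools.product(CHANNELS, JAM_POWERS)),
        key=lambda pair: jammer_utility(pair[1], pair[0]),
    )

jam, ue = stackelberg_equilibrium()
print("jammer (channel, mW):", jam, "UE (channel, mW):", ue)
```

Enumerating every leader action and evaluating the follower's best response to each is exact for small discrete games; the RL schemes in the abstract are needed precisely because the real action spaces and channel models make such enumeration impractical.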
A Method for Parallel Testing of Interlayer Vias in Monolithic 3D Integrated Circuits
CHEN Tian, CHEN Weikun, LIU Jun, LIANG Huaguo, LU Yingchun
Available online  , doi: 10.11999/JEIT251375
Abstract:
  Objective  As device dimensions in conventional two-dimensional integrated circuits approach fundamental physical limits, further improvements in performance and integration density face significant challenges. Monolithic three-dimensional integrated circuits (M3D ICs), which sequentially stack multiple active device layers on a single wafer, provide an effective solution to overcome these limitations. In M3D ICs, monolithic inter-tier vias (MIVs) are employed to realize vertical interconnections between device tiers. Compared with through-silicon vias (TSVs), MIVs feature much smaller dimensions, lower parasitic capacitance, and shorter interconnect delay. However, because MIV defects cause only small electrical variations and MIVs are present in massive numbers, defects manifest mainly as subtle delay shifts, which places stringent requirements on test accuracy, efficiency, and robustness against Process, Voltage, and Temperature (PVT) variations. Existing MIV testing approaches suffer from limited scalability, strong PVT sensitivity, and difficulty in simultaneously achieving small-delay defect detection and fault localization in large-scale arrays. To address these challenges, a parallel MIV testing method based on a time-to-digital converter (TDC) is presented to enable efficient and reliable testing of large MIV arrays with low area and time overhead.  Methods  Large numbers of MIVs are logically organized into a two-dimensional array structure. Each basic test cell consists of a device-under-test MIV, a tri-state buffer, and a D flip-flop, and multiple cells are cascaded to form row test chains and column test chains. By systematically exploiting the inherent input capacitance mismatch between the data and clock terminals of the D flip-flop, an embedded TDC structure incorporating the MIV under test is constructed.
Test stimuli are generated by a digitally controlled delay line (DCDL), which produces START and STOP pulse signals with multiplicatively adjustable phase differences and injects them into different propagation paths of the test chains, enabling time quantization through a signal chasing mechanism. Structural symmetry between the test chains is employed to mitigate the influence of PVT variations. As the START and STOP phase difference is progressively amplified, multiple TDC readings are collected to characterize defect-induced small delay variations and to distinguish them from measurement noise and PVT-induced fluctuations. After fault information is obtained for individual test chains, cross-analysis of row and column test results enables fault localization within the two-dimensional MIV array.  Results and Discussions  Simulation results based on the Nangate 45 nm standard cell library demonstrate that, under fault-free conditions, TDC readings obtained at different phase difference settings exhibit a stable linear proportional relationship (Fig. 7). Extensive Monte Carlo simulations are performed to determine a robust deviation tolerance threshold of 2, which effectively separates normal variations caused by PVT fluctuations from abnormal shifts induced by defects. Fault injection experiments verify that small delay defects occurring on both the START chain and the STOP chain can be effectively detected and distinguished (Fig. 8). In terms of quantitative detection capability, the minimum detectable resistive open defect is approximately 8.4 kΩ, while the maximum detectable leakage defect and resistive short defect are about 67 kΩ and 32 kΩ, respectively, outperforming existing methods (Fig. 9). Moreover, the row–column decomposition architecture effectively alleviates the growth of test time as the MIV array size increases, resulting in a substantial reduction in overall test overhead. 
Area evaluation indicates that the average area overhead of the embedded built-in self-test structure is only 5.594 µm² per MIV, making it suitable for high-density M3D integration.  Conclusions  A parallel TDC-based testing approach for large-scale MIV arrays is presented, which combines row–column decomposition, phase-difference multiplication, and proportional deviation-based decision mechanisms to achieve efficient detection and accurate localization of both hard faults and small delay defects. Structural symmetry within the test chains effectively enhances robustness against PVT variations. Simulation results confirm that the proposed method can reliably detect resistive open, leakage, and short defects while maintaining low area and time overhead. Compared with existing techniques, a favorable balance among test accuracy, PVT robustness, test efficiency, and hardware cost is achieved. Owing to its scalability and practical feasibility, the proposed approach provides an effective and reliable solution for MIV testing in advanced monolithic three-dimensional integrated circuits.
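The proportional-deviation decision and the row–column cross-analysis can be sketched as follows. Assumptions: each chain is characterized by TDC readings taken at several multiplied phase differences, the first reading serves as the proportional reference, and the tolerance of 2 quoted above is interpreted as TDC output codes. The abstract does not give the exact decision rule, so `chain_is_faulty` and `locate_faults` are hypothetical helpers.

```python
def chain_is_faulty(readings, multipliers, tolerance=2):
    """Proportional-deviation check for one test chain (hypothetical rule).

    readings[i] is the TDC code measured with a phase difference of
    multipliers[i] times the base step. A fault-free chain scales linearly,
    so the first reading predicts the others; any deviation larger than
    `tolerance` codes flags the chain.
    """
    per_step = readings[0] / multipliers[0]
    return any(abs(r - per_step * m) > tolerance
               for r, m in zip(readings, multipliers))

def locate_faults(row_faulty, col_faulty):
    """Cross row and column verdicts: candidate MIVs at failing intersections."""
    return [(r, c)
            for r, bad_r in enumerate(row_faulty) if bad_r
            for c, bad_c in enumerate(col_faulty) if bad_c]

mults = [1, 2, 3, 4]
print(chain_is_faulty([4, 8, 12, 16], mults))  # False: readings scale linearly
print(chain_is_faulty([4, 8, 15, 20], mults))  # True: 15 deviates from 12 by 3
print(locate_faults([False, True, False], [True, False, False]))  # [(1, 0)]
```

With a single faulty MIV, the failing row and column pin it down uniquely; when several rows and columns fail at once, the intersections include ambiguous candidates that this sketch does not resolve.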
Physical Layer Key Generation Method for Integrated Sensing and Communication Systems
LIU Kexin, HUANG Kaizhi, PEI Xinglong, JIN Liang, CHEN Yajun
Available online  , doi: 10.11999/JEIT251034
Abstract:
  Objective  Integrated Sensing And Communication (ISAC) has become a central technology in Sixth-Generation (6G) wireless networks, enabling simultaneous data transmission and environmental sensing. However, the characteristics of ISAC systems, including highly directional sensing signals and the risk of sensitive information leakage to malicious sensing targets, create specific security challenges. Physical layer security provides lightweight methods to enhance confidentiality. In secure transmission, approaches such as artificial noise injection and beamforming can partially improve secrecy, although they may reduce sensing accuracy or communication efficiency. Their effect also depends on the quality advantage of legitimate channels over eavesdropping channels. For Physical Layer Key Generation (PLKG), existing work has only demonstrated basic feasibility. Most current schemes adopt a radar-centric design, which limits compatibility with communication protocols and restricts key generation rates. This paper proposes a PLKG method tailored for ISAC systems. It aims to maximize the Sum Key Generation Rate (SKGR) under sensing accuracy constraints through a Twin Delayed Deep Deterministic policy gradient (TD3)-based joint communication and sensing beamforming algorithm, thereby improving the security performance of ISAC systems.  Methods  A MIMO ISAC system is considered, where a base station (Alice) equipped with multiple antennas communicates with single-antenna users (Bobs) and senses a malicious target (Eve). The system operates under a TDD protocol to leverage channel reciprocity. A PLKG protocol designed for ISAC systems is developed, including channel estimation, joint communication and sensing beamforming, and key generation. The SKGR is derived in closed form, and sensing accuracy is evaluated using the Cramér-Rao Bound (CRB). 
To maximize the SKGR under CRB constraints, a non-convex optimization problem for the joint design of communication and sensing beamforming matrices is formulated. Given its NP-hardness, an algorithm based on TD3 is proposed. TD3 employs dual critic networks to reduce overestimation, delayed policy updates to enhance stability, and target policy smoothing to improve robustness. The state includes channel state information, the actions correspond to beamforming matrices, and the reward function combines SKGR, CRB, and power constraints.  Results and Discussions  Simulation results confirm the effectiveness of the proposed design. The TD3-based algorithm achieves a stable SKGR of 18.5 bits/channel use after training (Fig. 4), outperforming benchmark schemes such as Deep Deterministic Policy Gradient (DDPG), greedy search, and random algorithms. The SKGR increases monotonically with transmit power because of reduced noise interference (Fig. 5). Increasing the number of antennas also improves SKGR, although the gain diminishes as power per antenna decreases. The scheme maintains stable SKGR across different distances to the eavesdropper (Fig. 6), demonstrating the robustness of PLKG against eavesdropping attacks. The proposed algorithm manages the complex optimization problem effectively and adapts to dynamic system conditions, offering a practical approach for secure ISAC systems.  Conclusions  This paper presents a PLKG method for ISAC systems. The proposed protocol generates consistent keys between the base station and communication users. The SKGR maximization problem with sensing constraints is solved using a TD3-based algorithm that jointly optimizes communication and sensing beamforming matrices. Simulation results show that the method outperforms benchmark schemes, with significant gains in SKGR and adaptability to system conditions. The study establishes a basis for integrating PLKG into ISAC to strengthen security without reducing sensing performance. 
Future work will examine real-time implementation and scalability in large networks.
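The three TD3 ingredients named in the Methods (dual critics, delayed policy updates, and target policy smoothing) follow the standard TD3 recipe. The sketch below shows the critic target and the update schedule for a scalar action. In the paper the action is a set of beamforming matrices and the actor and critics are neural networks, so the lambdas in the usage example are hypothetical placeholders.

```python
import random

def td3_target(reward, next_state, done, actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, a_low=-1.0, a_high=1.0,
               rng=random):
    """Critic target y = r + gamma * (1 - done) * min(Q1', Q2')(s', a~).

    Target policy smoothing: a~ = clip(mu'(s') + clip(eps, -c, c), a_low, a_high).
    Clipped double Q: the smaller of the two target critics curbs overestimation.
    """
    eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(a_low, min(a_high, actor_target(next_state) + eps))
    q_min = min(critic1_target(next_state, next_action),
                critic2_target(next_state, next_action))
    return reward + gamma * (1.0 - float(done)) * q_min

def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: actor and target nets update every policy_delay critic steps."""
    return step % policy_delay == 0

# Usage with placeholder networks; noise_std=0 makes the target deterministic.
y = td3_target(1.0, None, False,
               actor_target=lambda s: 0.3,
               critic1_target=lambda s, a: 2.0 + a,
               critic2_target=lambda s, a: 1.5 + a,
               gamma=0.9, noise_std=0.0)
print(y)  # approximately 2.62 = 1 + 0.9 * min(2.3, 1.8)
```

Taking the minimum of two critics biases the target downward slightly, which is the trade TD3 makes to avoid the overestimation that destabilizes DDPG, one of the baselines in the abstract.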
Resilient Average Consensus for Second-Order Multi-Agent Systems: Algorithms and Application
FANG Chongrong, HUAN Yuehui, ZHENG Wenzhe, BAO Xianchen, LI Zheng
Available online  , doi: 10.11999/JEIT251155
Abstract:
  Objective  Multi-Agent Systems (MASs) are central to collaborative tasks in dynamic environments, and consensus algorithms are essential for applications such as formation control. However, MASs are vulnerable to misbehaviors (e.g., malicious attacks or accidental faults) that disrupt consensus and degrade system performance. Existing resilient consensus methods for first-order systems are insufficient for second-order MASs, where both position and velocity states must be considered. This study develops a resilient average consensus framework for second-order MASs that maintains accurate collaboration under misbehaviors. The main challenges are distributed error detection and compensation for two-dimensional state errors (position and velocity) using one-dimensional acceleration inputs.  Methods  The study derives sufficient conditions for second-order average consensus under misbehaviors using graph theory and Lyapunov stability analysis. The system is modeled as an undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, and agents follow double-integrator dynamics. Two algorithms are proposed. Finite Input-Errors Detection–Compensation (FIDC): For finite control input errors, Detection Strategies 1 and 2 use two-hop communication to detect discrepancies in neighbors’ states or control inputs. Compensation Scheme 1 generates input sequences that satisfy the consensus conditions in Corollary 1. Infinite Attack Detection–Compensation (IADC): For infinite errors in control inputs, velocities, and positions, the detection strategies are extended to identify falsified data. Compensation Schemes 2 and 3 reduce the effect of these errors, and an exponentially decaying error bound isolates persistent attackers. The algorithms are fully distributed and require no global information.  Results and Discussions  Simulations on a 10-agent network demonstrate the effectiveness of the algorithms.
Under FIDC, agents reach exact average consensus despite finite input errors caused by malicious or faulty agents (Fig. 3). IADC ensures consensus among normal agents after isolating malicious agents that exceed the error bound (Fig. 4). Experiments on a multi-robot platform confirm resilience to real-world faults (e.g., actuator failures) and attacks (e.g., false data injection). In fault scenarios, FIDC reduces the deviation of the formation center from 180 mm to 34 mm (Fig. 6). Under attacks, IADC isolates malicious robots, allowing normal agents to converge correctly (Fig. 7). Analyses of relaxed Assumption 1 (non-adjacent misbehaving agents) show that Detection Strategy 3 and majority voting address certain connected malicious topologies (Fig. 2), although complex cases need further study.   Conclusions  This work presents a resilient average consensus framework for second-order MASs. Theoretically, the study provides sufficient conditions for consensus under misbehaviors. The FIDC and IADC algorithms enable distributed detection, compensation, and isolation of errors. Simulations and physical experiments verify that the methods achieve accurate average consensus under both finite and infinite errors. Future research will explore extensions to directed networks, time-varying topologies, and higher-dimensional systems.
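For reference, the nominal (misbehavior-free) behavior that FIDC and IADC aim to preserve can be sketched with a commonly used second-order consensus law for double integrators. The damping gain `gamma`, the Euler step, the ring topology, and the initial states are assumptions, and no detection, compensation, or isolation is implemented. On an undirected connected graph the control inputs sum to zero, so the average velocity is preserved and all positions converge to the average trajectory.

```python
def simulate_consensus(adj, x0, v0, gamma=1.5, dt=0.01, steps=5000):
    """Euler simulation of double integrators x' = v, v' = u under the law
    u_i = -sum_j a_ij * ((x_i - x_j) + gamma * (v_i - v_j)).
    """
    n = len(x0)
    x, v = list(x0), list(v0)
    for _ in range(steps):
        u = [-sum(adj[i][j] * ((x[i] - x[j]) + gamma * (v[i] - v[j])) for j in range(n))
             for i in range(n)]
        x = [x[i] + dt * v[i] for i in range(n)]
        v = [v[i] + dt * u[i] for i in range(n)]
    return x, v

# Undirected ring of four agents (symmetric adjacency matrix).
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
x, v = simulate_consensus(ring, x0=[0.0, 1.0, 2.0, 3.0], v0=[1.0, -1.0, 0.5, 0.5])
print([round(vi, 6) for vi in v])  # all 0.25, the average initial velocity
print([round(xi, 6) for xi in x])  # all 14.0 = 1.5 + 50 * 0.25
```

A misbehaving agent that injects a nonzero input error breaks the zero-sum property of the inputs and shifts the consensus point, which is the deviation the detection and compensation schemes above are designed to cancel.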
Aperiodic Total Squared Ambiguity Function: Theoretical Bounds for Binary Sequence Sets and Optimal Constructions
WEI Wenbo, SHEN Bingsheng, YANG Yang, ZHOU Zhengchun
Available online  , doi: 10.11999/JEIT251327
Abstract:
  Objective  In direct-sequence code division multiple access systems, the performance of spreading sequence sets is typically evaluated using the total squared correlation metric. However, total squared correlation applies only to synchronous systems, and aperiodic total squared correlation applies only to asynchronous systems with time shifts alone. In modern high-speed mobile and satellite communications, the Doppler effect becomes significant, causing both time and Doppler shifts in the received signal and consequently severe signal distortion. When only time shifts are considered, the one-dimensional correlation function is typically used to measure interference within the system; in high-speed mobile environments, however, both the time shift and the Doppler shift of the sequence must be considered simultaneously, so the two-dimensional ambiguity function must be used instead. To mitigate Doppler effects in various mobile channels, the research community has increasingly focused on designing Doppler-resilient sequences. Existing studies concentrate mainly on theoretical bounds for the maximum ambiguity magnitude and then construct sequence sets that achieve or asymptotically achieve these bounds. This research instead focuses on the overall ambiguity function performance of binary sequence sets in asynchronous communication, measured by the aperiodic total squared ambiguity function (ATSAF). The specific objectives are: (1) to derive a theoretical lower bound on the ATSAF of binary sequence sets; and (2) based on this bound, to design several classes of optimal binary sequence sets that achieve it.
Methods  The aperiodic time-phase cycling extension matrix $\boldsymbol{S}_{a}$ is defined for a binary sequence set $\boldsymbol{S}$ consisting of $K$ sequences of length $L$ in order to account for both time shifts and Doppler shifts. This definition transforms the problem of computing the ATSAF of the set $\boldsymbol{S}$ into that of calculating the total squared correlation of the matrix $\boldsymbol{S}_{a}$. Subsequently, theoretical lower bounds on the ATSAF of the binary sequence set $\boldsymbol{S}$ are derived for different combinations of the set size $K$, sequence length $L$, and Doppler shift $V$. To design binary sequence sets that achieve these lower bounds, it is first proven that binary aperiodic complementary sets are optimal binary sequence sets with respect to the ATSAF. Furthermore, two additional classes of optimal binary sequence sets are designed from Hadamard matrices and specific sequences and are shown to achieve the theoretical ATSAF lower bound.  Results and Discussions  Existing research focuses primarily on the maximum ambiguity magnitude of sequence sets, whereas this study emphasizes overall ambiguity function performance. The one-dimensional aperiodic total squared correlation analysis for asynchronous communication with delay only, investigated by Ganapathy et al., is extended here to the two-dimensional aperiodic total squared ambiguity function, which incorporates both time delay and Doppler shift.
This paper first defines the aperiodic time-phase cycling extension matrix $\boldsymbol{S}_{a}$ for a binary sequence set $\boldsymbol{S}$ (Definition 3). Theoretical lower bounds on the ATSAF of $\boldsymbol{S}$ are then derived for various parameters, including the set size $K$, sequence length $L$, and Doppler shift $V$ (Theorem 1). When the Doppler shift $V=1$, the ATSAF bound derived in this paper reduces to the aperiodic total squared correlation bound. Binary sequence sets that achieve these ATSAF lower bounds keep the overall cross-interference energy in the two-dimensional delay-Doppler domain at its theoretical minimum. To design such sets, it is first proven that binary aperiodic complementary sets are ATSAF-optimal binary sequence sets (Theorem 2). Furthermore, two additional classes of ATSAF-optimal binary sequence sets are designed from Hadamard matrices and specific sequences (Theorems 3 and 4). Finally, an example demonstrates that the sequence set constructed in Theorem 4 is ATSAF-optimal (Example 1).  Conclusions  In high-speed mobile communication scenarios, Doppler effects distort the received signal. By defining the aperiodic time-phase cycling extension matrix $\boldsymbol{S}_{a}$ for a binary sequence set $\boldsymbol{S}$, the theoretical lower bound on the ATSAF is derived, which specifies the minimum total energy of the binary sequence set $\boldsymbol{S}$ in the two-dimensional delay-Doppler domain.
When Doppler shifts are not considered, the derived ATSAF bound reduces to the aperiodic total squared correlation bound. Furthermore, three classes of ATSAF-optimal binary sequence sets that achieve this theoretical bound are constructed using binary aperiodic complementary sets, Hadamard matrices, and specific sequences. This study not only provides the theoretical ATSAF bound for binary sequence sets in the two-dimensional delay-Doppler domain but also designs several classes of optimal binary sequence sets that achieve this bound. These sets achieve the theoretical minimum for overall cross interference energy in the two-dimensional delay-Doppler domain.
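The general ATSAF depends on the extension matrix of Definition 3, which the abstract does not spell out. The sketch below therefore covers only the delay-only special case that the bound reduces to when V = 1. It assumes the common form of the aperiodic total squared correlation, which sums |C_{i,j}(τ)|² over all ordered sequence pairs and all shifts |τ| < L. It also checks the complementary-set property that Theorem 2 relies on: autocorrelation sidelobes cancel when summed over the set. The function names and the length-4 Golay pair used as input are illustrative choices, not taken from the paper.

```python
def aperiodic_corr(a, b, tau):
    """Aperiodic cross-correlation C_{a,b}(tau) = sum_t a[t] * b[t + tau]."""
    n = len(a)
    if tau >= 0:
        return sum(a[t] * b[t + tau] for t in range(n - tau))
    return sum(a[t - tau] * b[t] for t in range(n + tau))


def total_squared_correlation(seqs):
    """Delay-only (V = 1) total squared correlation over all ordered pairs and shifts."""
    n = len(seqs[0])
    return sum(
        aperiodic_corr(a, b, tau) ** 2
        for a in seqs
        for b in seqs
        for tau in range(-(n - 1), n)
    )


def is_complementary(seqs):
    """True if the summed aperiodic autocorrelations vanish at every nonzero shift."""
    n = len(seqs[0])
    return all(
        sum(aperiodic_corr(s, s, tau) for s in seqs) == 0 for tau in range(1, n)
    )


# Illustrative input: a length-4 binary Golay complementary pair (K = 2, L = 4).
golay_pair = [[1, 1, 1, -1], [1, 1, -1, 1]]
print(is_complementary(golay_pair))           # True
print(total_squared_correlation(golay_pair))  # 64
```

For this pair the total is 64 = (KL)². By Parseval's relation this holds for any complementary set, because the summed power spectrum of such a set is flat at KL.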
Modulation Recognition Method for High-Speed Mobile Communication Based on Attention Dynamic Fusion and Hybrid Pruning Transformer
ZHENG Qinghe, CHEN Bin, YU Lisu, HUANG Chongwen, JIANG Weiwei, SHU Feng, ZHAO Yizhe
Available online  , doi: 10.11999/JEIT251211
Abstract:
  Objective  Automatic modulation recognition is a critical preprocessing step in dynamic spectrum access and anti-jamming communication systems, directly affecting the robustness and spectrum efficiency of non-cooperative communication. In high-speed mobile communication scenarios such as satellite, high-speed rail, and drone swarm communications, signal modulation features suffer severe distortion due to Doppler shifts, time-varying channels, and non-stationary interference. These issues pose significant challenges to traditional modulation recognition methods based on static assumptions, leading to feature mismatch and increased misjudgment rates. To address the insufficient robustness and real-time performance of existing deep learning-based modulation recognition models in high-speed mobile environments, this paper proposes a lightweight dynamic fusion Transformer-based approach.  Methods  The proposed method consists of three main components: a signal representation fusion block, the Transformer model design, and model pruning for lightweight inference. First, a RollingQ mechanism is introduced to dynamically adjust the direction of the attention query matrix according to the quality of each signal representation, breaking the cycle of attention fixation and achieving balanced utilization of all types of signal representations. Then, the multi-head attention frequency enhancement Transformer (MAFE-Transformer) is designed, which integrates local and global spatiotemporal features through modules including lightweight convolutional enhancement, multi-attention feature extraction, and frequency learning and selection. Finally, an attention-based dynamic hybrid pruning strategy is applied to reduce structural redundancy and accelerate inference, enabling real-time modulation recognition.  Results and Discussions  Extensive experiments are conducted on two public datasets, RadioML 2016.10a and RML22, to validate the effectiveness of the proposed method. 
The MAFE-Transformer achieves average classification accuracies of 65.14% and 78.40% on the two datasets, respectively. Under low SNR conditions of –20 to 0 dB, the model demonstrates strong robustness, particularly on the RML22 dataset with the dynamic channel model ETU70 (Fig. 5). The confusion matrix shows that the errors of MAFE-Transformer are distributed relatively uniformly across modulation schemes, reflecting well-balanced classification performance (Fig. 6). Ablation studies confirm that the RollingQ-based dynamic fusion mechanism improves accuracy by 7.2% on RadioML 2016.10a and 9.5% on RML22 compared with a single signal representation (Fig. 7). The hybrid pruning strategy reduces inference latency to 2.2 ms per signal while maintaining high accuracy (Fig. 8). Comparative experiments show that the proposed model outperforms several state-of-the-art deep learning models (e.g., Ms-RaT, MobileViT, MobileRaT, and KA-CNN) by 4%–10% in recognition accuracy, demonstrating superior performance in high-speed mobile communication scenarios (Fig. 9).  Conclusions  This paper proposes a lightweight dynamic fusion Transformer-based automatic modulation recognition method to address the challenges of robustness and real-time performance in high-speed mobile communication environments. By introducing the RollingQ mechanism and the MAFE-Transformer structure combined with dynamic hybrid pruning, the proposed method achieves a better trade-off between recognition accuracy and inference efficiency. Experimental results on public datasets confirm its effectiveness and robustness under complex channel conditions with Doppler shifts and time-varying interference. However, the proposed method has not yet been systematically evaluated under more complex interference such as impulsive noise or frequency-selective fading. Future work will focus on improving adaptability to non-stationary noise, cross-device generalization, and optimization for edge deployment.
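The MAFE-Transformer's multi-attention modules build on attention, but the abstract gives neither their equations nor the exact form of RollingQ's query adjustment. For reference only, the sketch below is standard single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V, written in plain Python. The function names and the toy token vectors are hypothetical. Nothing here reproduces RollingQ or the frequency learning and selection modules.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scaled_dot_product_attention(q, k, v):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V.

    q: T_q x d_k queries, k: T_k x d_k keys, v: T_k x d_v values.
    """
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # K^T, shape d_k x T_k
    scores = matmul(q, k_t)               # shape T_q x T_k
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Hypothetical toy input: 3 tokens (e.g., features of I/Q segments), d_k = 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = scaled_dot_product_attention(tokens, tokens, tokens)
print([round(sum(row), 6) for row in w])  # each row of attention weights sums to 1
```

In a multi-head layer, Q, K, and V would be learned projections of the input tokens. This computation runs once per head, and the head outputs are concatenated and projected. A query-side mechanism such as RollingQ would presumably act on Q before this step.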
Design and Verification of Robust Modulation Recognition Framework Under Blind Adversarial Attacks
ZHENG Qinghe, ZHOU Fuhui, YU Lisu, HUANG Chongwen, JIANG Weiwei, SHU Feng, ZHAO Yizhe
Available online  , doi: 10.11999/JEIT260019
Abstract:
  Objective  Deep learning-based automatic modulation recognition (AMR) models have demonstrated superior performance in non-cooperative communication systems such as cognitive radio and spectrum monitoring. However, the inherent vulnerability of deep learning models to adversarial attacks, in which imperceptible perturbations can cause catastrophic misclassification, poses a severe security threat. Existing defense methods, including adversarial training, often rely on prior knowledge of specific attacks, incur significant computational overhead, and face a trade-off between robustness and accuracy on clean samples. To address these limitations, this paper aims to design and validate a robust modulation recognition framework that can operate effectively under blind adversarial attack scenarios, without prior knowledge of the attack type or strategy, thereby ensuring the reliable deployment of intelligent communication systems in adversarial environments.  Methods  The proposed framework integrates a novel feature-purifying autoencoder module with standard modulation classifiers (CNN and Transformer). The core innovation lies in the autoencoder’s bottleneck layer, which incorporates a dynamic purification mechanism. This mechanism first calculates an adaptive threshold based on the statistical properties of the encoded latent features to identify anomalies. A Top-K sparsification operation then preserves only the most significant feature activations, suppressing noise and adversarial perturbations while retaining essential signal characteristics. The autoencoder is trained via a three-stage curriculum learning strategy that sequentially optimizes reconstruction fidelity, feature sparsity, and semantic consistency between the purified and original clean signals, ensuring that the output aligns with the true modulation manifold. This model-agnostic module can be prepended to any trained classifier without retraining.  
Results and Discussions  Comprehensive experiments are conducted on a simulated dataset encompassing 12 digital modulation types under multipath fading channels. The framework yields substantial performance improvements. For the CNN and Transformer, the recognition accuracies reach 82.1% and 83.2% under challenging targeted white-box attacks and 87.7% and 89.4% under non-targeted black-box attacks, respectively (Table 1). The attack success rate (ASR) and attack effectiveness index (AEI) remain low, confirming strong defensive capability. Figure 4 shows that defense efficacy improves with higher SNR. Crucially, the ablation study in Figure 5 highlights the indispensable role of the autoencoder: removing it reduces accuracy by 4.02% and 2.36% for the CNN and Transformer, respectively, under strong attacks. Further analysis (Figure 6) indicates that the framework maintains robustness across a wide range of perturbation bounds (\begin{document}$ \epsilon \leq 0.1 $\end{document}). Moreover, parameter sensitivity studies (Figures 7 and 8) show stable performance for the threshold coefficient \begin{document}$ \xi $\end{document} in [1.5, 1.9] and a sparsity rate k of around 0.7, supporting its practical deployability.  Conclusions  This paper presents a blind defense framework for robust AMR based on a feature-purifying autoencoder. Its key advantages are threefold: 1) it provides effective defense against diverse white-box and black-box attacks without requiring any prior knowledge of the attack methods, achieving true blind defense; 2) as a preprocessing module, it eliminates the need for computationally expensive retraining of the primary classifier and is compatible with various backbone networks; 3) the multi-stage training strategy balances robustness against attacks with the preservation of high accuracy on clean samples. Finally, experimental results on the comprehensive dataset validate the framework’s superiority. 
Future work will focus on lightweight architectural designs to reduce inference latency and further investigate performance boundaries under extreme low-SNR conditions combined with complex nonlinear channel impairments.
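The abstract describes the bottleneck purification only at a high level: an adaptive threshold computed from latent statistics, followed by Top-K sparsification. The sketch below is one hypothetical reading. It assumes that activations farther than ξ standard deviations from the latent mean are anomalies, and that they are clipped back to that bound. It then keeps the K largest-magnitude activations, with K set as a fraction of the latent size. The defaults ξ = 1.7 and keep ratio 0.7 are taken from the stable ranges reported above. Whether "sparsity rate k" means the kept fraction is also an assumption. The function name is hypothetical.

```python
import statistics


def purify_latent(z, xi=1.7, keep_ratio=0.7):
    """Hypothetical bottleneck purification: adaptive clipping, then Top-K sparsification.

    xi: threshold coefficient; values farther than xi standard deviations from
        the mean are treated as anomalies (an assumption, not the paper's rule).
    keep_ratio: fraction of activations kept by Top-K (assumed meaning of k).
    """
    mu = statistics.fmean(z)
    sigma = statistics.pstdev(z)
    lo, hi = mu - xi * sigma, mu + xi * sigma
    # Step 1: clip anomalous activations back to the adaptive threshold.
    clipped = [min(max(v, lo), hi) for v in z]
    # Step 2: keep only the K largest-magnitude activations and zero the rest.
    k = max(1, round(keep_ratio * len(z)))
    ranked = sorted(range(len(z)), key=lambda i: abs(clipped[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(clipped)]


# Toy latent vector with one large spike standing in for an adversarial perturbation.
latent = [0.1] * 9 + [10.0]
print(purify_latent(latent))
```

Clipping bounds the influence of any single perturbed activation. Top-K then removes low-magnitude residue. Both steps depend only on the input's own statistics, which is what allows a module of this kind to run without knowing the attack.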
A Long-Short Term Fusion Spiking Neural Network for Detecting Tiny Moving Targets in Dynamic Vision
LI Miao, ZHANG Heng, CHEN Nuo, SHI Yangsi, HE Shiman, AN Wei
Available online  , doi: 10.11999/JEIT250785
Abstract:
  Objective  Long-distance electro-optical surveillance systems are widely used in fields such as space debris monitoring and unauthorized drone flight warning. Targets in these systems appear randomly and move rapidly. Because of the long detection distance, their images on the optical detector are very small and lack obvious morphological and texture features, so they belong to the class of tiny-motion targets. The traditional mechanism for sensing tiny-motion targets adopts the "image frame imaging + artificial neural network processing" approach, which entails large data volumes, high computing power, and high energy consumption, and has become a bottleneck for lightweight system design. In recent years, inspired by bionic perception and brain-like processing, "dynamic visual detection + brain-like processing" has become a frontier mechanism. Dynamic vision offers low redundancy and high temporal resolution, but its output is no longer regular image frames; instead, it is a sparse event stream. New processing methods are therefore needed. Spiking neural networks, known as third-generation neural networks, feature sparse connections and spike-based representation, and are naturally compatible with the asynchronous event triggering and bright-dark pulse output of dynamic vision. However, existing spiking neural network methods mainly target objects with distinctive shapes in fields such as autonomous driving and are difficult to adapt to tiny-motion targets in long-distance electro-optical surveillance systems. To address these problems, this paper designs a long-short term fusion spiking neural network, providing dedicated algorithmic support for applying dynamic vision to the detection of tiny-motion targets.  Methods  The proposed network architecture consists of four key components. 
First, a short-term feature extraction module, SST (Spiking Swin Transformer), is designed to capture the morphological expansion characteristics of tiny targets, focusing on spatiotemporal correlations between adjacent time steps and in the spatial domain. It integrates a spiking self-attention mechanism to adaptively enhance the learning of irregular pixel correlations and temporal dependencies. Second, a long-term feature extraction module, SCL (spiking ConvLSTM), is designed to learn the motion continuity embedded in long temporal sequences; the longer the temporal window, the richer the learnable features. The spiking ConvLSTM network mimics the ANN-style ConvLSTM and exploits the inherent advantages of spiking recurrent neural networks in temporal signal processing to strengthen the autonomous memorization of long-term temporal information. Third, the dual-path features from SST and SCL are combined through tensor alignment and additive integration in a module called SFPN (Spiking Feature Pyramid Network), which adopts spiking pyramid operations to fuse cross-scale spatiotemporal features across network depths. Finally, tiny targets are extracted by the detection head.  Results and Discussions  The proposed algorithm was validated on real dynamic vision data for drone detection. Test results demonstrate significant performance improvements across different metrics. Compared with methods based on short-term temporal features, the proposed method achieves an increase of about 1.3% in recall and about 0.9% in accuracy, enabling more precise detection of tiny moving targets. The F1-score analysis further shows that the proposed approach improves recall by 1.3% while simultaneously reducing false alarms. This confirms that the dual-path spiking memory network for long-term feature extraction enhances the model's capability to discern subtle target characteristics. 
Specifically, the incorporation of long-term temporal features contributes to overall performance gains, allowing better discrimination between noise events and genuine tiny targets.  Conclusions  This paper addresses the problem of detecting tiny moving targets under dynamic vision and proposes a method based on long-short term fusion spiking neural networks. Considering the morphological expansion characteristics and motion continuity of tiny targets, the paper designs the spiking Swin Transformer module and the spiking ConvLSTM module, respectively, and fuses multi-scale dual-path features through the spiking pyramid module. By learning high-dimensional features within different time windows, it achieves in-depth mining and automatic learning of the limited surface features. The performance advantages of the proposed method are verified on real dynamic vision data, with a recall rate of over 95%, outperforming the comparison algorithms. Ablation experiments demonstrate the importance of long-term temporal feature networks and more time-domain data for improving tiny target detection performance. This method naturally combines the sparse event streams of dynamic vision with spiking neural mechanisms, providing algorithmic support for applying the new "bionic detection + brain-like processing" perception mode in long-distance electro-optical surveillance systems.
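The SST, SCL, and SFPN modules are built from spiking neurons, but the abstract does not state which neuron model is used. A common choice in SNNs is the discrete-time leaky integrate-and-fire (LIF) neuron, sketched below as an assumption. The membrane potential leaks toward the reset value with time constant tau and integrates the input, for example the event count of one pixel in one time bin. When the potential crosses the threshold, the neuron emits a spike and hard-resets. Parameter values and the function name are illustrative.

```python
def lif_neuron(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron with hard reset.

    inputs: input current per time step (e.g., event counts for one pixel).
    Returns a 0/1 spike train of the same length.
    """
    v = v_reset
    spikes = []
    for x in inputs:
        # Leaky integration: move v toward (v_reset + x) by a factor 1/tau.
        v = v + (x - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


print(lif_neuron([1.5] * 6))   # [0, 1, 0, 1, 0, 1]
print(lif_neuron([0.9] * 20))  # never reaches threshold: all zeros
```

Because the potential leaks, a weak but persistent input below threshold never fires, while a single strong input can fire immediately. This is why the length of the integration window affects which weak, slowly accumulating evidence a spiking network can exploit.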
UWF-YOLO: A Lightweight Framework for Underwater Object Detection via Redundant Information Optimization
HOU Guojia, MA Jiaqi, WANG Yuechuan, HUANG Baoxiang, LI Kunqian
Available online  , doi: 10.11999/JEIT251129
Abstract:
  Objective  The rapid development of underwater imaging technology has significantly elevated the importance of underwater object detection for resource exploration and environmental monitoring applications. Complex underwater environments generally cause various image quality degradations such as color casts, haze-like effects, and non-uniform illumination. Under such conditions, existing vision-based object detection algorithms often perform poorly, especially on small objects, resulting in missed detections and false positives. Moreover, existing deep learning-based underwater detection models face substantial challenges in balancing accuracy and lightweight design when equipment resources are limited. It is therefore important to design efficient underwater object detection methods for water-related vision tasks, which play a crucial role in marine resource exploration, ecological monitoring, underwater robotics, and intelligent perception systems for autonomous underwater vehicles.  Methods  In this paper, we propose UWF-YOLO, a novel lightweight underwater object detection network based on redundant information optimization. First, the C2f module is reconstructed with the FasterNet Block to optimize both the backbone and neck networks, and a feature channel selection mechanism is incorporated to reduce redundant features. Second, because the traditional convolutions in the YOLO neck produce redundant features and adapt poorly to the underwater environment, Ghost Convolution is introduced to generate Ghost feature maps that enhance the multi-scale feature fusion capability of the neck network. 
Next, the original detection head is replaced with a redundant optimization group detection head (RRG-Head) based on group convolution, which achieves parameter sharing and reduces computational cost. Finally, structured channel pruning is applied: the inter-layer dependencies of the computational graph are identified and the dependent pruning units are bound into groups, channel importance is evaluated with LAMP weight-magnitude score normalization, and the low-contributing groups are pruned and then fine-tuned to compress the network. In addition, because the scenes in existing underwater detection datasets are typically monotonous and their objects are usually small and clustered, we construct an underwater object detection dataset with complex scenes, named CSUOD, by collecting real-world underwater images from different websites and platforms to ensure both diversity and authenticity. CSUOD is specifically designed for challenging underwater environments characterized by color casts, haze-like effects, and non-uniform illumination; it contains 1135 manually selected images covering 6 object categories, all of which are manually annotated and normalized in resolution.  Results and Discussions  Extensive experiments are conducted on three public underwater object detection datasets (DUO, RUOD, and TrashCan), comparing the proposed model against popular mainstream detectors, including YOLOv5s, YOLOv7-tiny, YOLOv8s, YOLOv9-tiny, and Deformable DETR. In the computational complexity assessment, the proposed method reduces FLOPs, model size, and parameter count by 60.4%, 77.3%, and 78.4%, respectively, compared with the baseline. 
In addition, our method outperforms YOLOv9-tiny, which has a comparable number of parameters, by 0.3%, 2.3%, and 3.4% in mAP on the three datasets. Comparative results on our CSUOD dataset also indicate that the proposed model achieves clear improvement and stability even in complex underwater environments. Qualitative visualization results further illustrate the model’s robustness and detection stability under various underwater degradations, such as haze-like effects and non-uniform illumination.  Conclusions  Quantitative and qualitative experiments on different datasets validate the effectiveness and robustness of the proposed method. In addition, our method achieves superior detection performance in complex underwater environments, effectively reducing missed detections and false positives caused by background interference. Extensive experimental results show that UWF-YOLO not only achieves a substantial reduction in model size but also maintains detection accuracy comparable to that of the baseline model. This balance between detection accuracy and low computational cost makes it particularly suitable for resource-limited underwater devices. Besides, the proposed method has great potential in practical scenarios such as marine ecological monitoring, underwater resource exploration, and autonomous underwater vehicle perception systems. It also provides a reliable and efficient technical foundation for real-time applications, with strong adaptability to different underwater conditions, efficient integration into embedded platforms, and support for real-time perception and decision-making. The CSUOD dataset constructed in this study helps address the limitations of existing underwater object detection datasets and promotes the development of underwater object detection. In the future, this work can be further extended to multi-modal perception systems and larger-scale datasets. 
These efforts will enable adaptive models for more dynamic underwater scenarios and support broader applications in intelligent ocean observation and autonomous navigation.
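The pruning step ranks channels with LAMP score normalization. As commonly defined, the LAMP score of a unit is its squared magnitude divided by the sum of the squared magnitudes of all units in the same layer that come at or after it in ascending order. The sketch below applies that score to output channels and uses each channel's squared L2 filter norm as its magnitude. It then prunes the lowest-scoring channels globally. Treating single channels, rather than the dependency-bound groups described above, as the pruning unit is a simplifying assumption. The layer names and norms are hypothetical.

```python
def lamp_scores(magnitudes_sq):
    """LAMP score of each unit in one layer.

    magnitudes_sq: squared magnitude per unit (here, the squared L2 norm of each
    output channel's filter; an assumption about the pruning unit).
    Score(u) = m_u / sum of m_v over units v at or after u in ascending order.
    """
    order = sorted(range(len(magnitudes_sq)), key=lambda i: magnitudes_sq[i])
    scores = [0.0] * len(magnitudes_sq)
    tail = 0.0
    for i in reversed(order):  # walk from the largest unit down
        tail += magnitudes_sq[i]
        scores[i] = magnitudes_sq[i] / tail if tail > 0 else 0.0
    return scores


def global_prune(layers, prune_ratio):
    """Return the (layer, channel) pairs with the lowest LAMP scores across all layers."""
    items = []
    for name, mags in layers.items():
        for ch, score in enumerate(lamp_scores(mags)):
            items.append((score, name, ch))
    items.sort()
    n_prune = int(prune_ratio * len(items))
    return {(name, ch) for _, name, ch in items[:n_prune]}


# Hypothetical squared channel norms for two layers.
layers = {"conv1": [1.0, 4.0, 9.0], "conv2": [2.0, 2.0]}
print(lamp_scores(layers["conv1"]))       # approx. [0.071, 0.308, 1.0]
print(sorted(global_prune(layers, 0.4)))  # [('conv1', 0), ('conv1', 1)]
```

The largest unit in every layer always scores 1. A single global ranking with a cutoff below 1 therefore never removes a layer entirely. This makes a layer-normalized score such as LAMP convenient for pruning a whole network at once.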
Research on Recognition Method in Mixture Scenarios of Ships and Floating Targets
DING Hao, LI Ao, CAO Zheng, LIU Ningbo, WANG Guoqing, SUN Dianxing
Available online  , doi: 10.11999/JEIT251119
Abstract:
  Objective  In radar maritime target detection scenarios, when two or more targets are located within the same range cell, they form mixture echoes, such as echoes from both ship and floating targets. Existing target recognition methods exhibit notable limitations in such scenarios, mainly because they focus on the Doppler channel with the strongest energy in the time–frequency domain. To address this issue, this paper proposes a target recognition method that jointly integrates mode reconstruction and time–frequency features. The objective is to distinguish each individual target without prior knowledge of whether the received echoes contain mixture targets, and without relying on high range resolution or multi-polarization information.  Methods  The core idea is to introduce Variational Mode Decomposition (VMD) to decompose radar echoes into multiple modal components, thereby achieving Doppler-channel separation. To address spurious modes and the fragmented representation of a single target across multiple modes after decomposition, an energy-constrained mode filtering method and a spectral consistency-based mode clustering method are proposed for effective mode selection and reconstruction. Based on the reconstructed signals, we then exploit the time–frequency differences between ships and floating targets in terms of micro-motion and complexity by extracting features from two perspectives, namely motion stability and the disorder degree of energy distribution (abbreviated as the VF and REDDC features, respectively), so as to enable accurate identification of each individual target.  Results and Discussions  The experiments are conducted using X-band radar measured data under sea states 2 to 4 (Table 1 and Table 2). The results show that the proposed method achieves an average recognition accuracy of 97.32% in mixture scenarios, significantly outperforming the existing four-feature recognition method (Table 3) as well as state-of-the-art methods (Fig. 9). 
An analysis of the frequency separation between different targets shows that when the time–frequency ridge spacing exceeds 70 Hz, the recognition accuracy reaches 97.93% (Fig. 11). This result also provides empirical support for selecting a reasonable clustering threshold in the mode reconstruction stage. When relative motion turns a mixture scenario into a single-target scenario, the proposed method achieves an average recognition accuracy of 93.34%, which is 4.62% higher than that of the existing four-feature method (88.72%) (Table 4). The analysis also indicates that, to ensure the expected recognition accuracy, the observation duration for feature extraction should be no less than 0.25 s (Fig. 12).  Conclusions  This paper investigates the recognition problem in maritime multi-target mixture scenarios. By introducing VMD, the constituent components of mixture echoes are separated. To address spurious modes and the fragmented representation of target information across multiple modes, an energy-constrained mode filtering method and a spectrum consistency-based mode clustering method are proposed. The VF and REDDC features are extracted from the structural and complexity perspectives, respectively, and an SVM classifier is then employed to complete target recognition. Performance analyses confirm that the proposed method can effectively identify each constituent target in mixture echoes while maintaining superior recognition performance in single-target scenarios. Future work will focus on improving computational efficiency and real-time capability by optimizing the stopping criteria of VMD iterations, and on exploring the application boundaries of the method using measured data under higher sea states.
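VMD itself is not reproduced here. The sketch assumes the modes are already available as complex baseband sequences and illustrates the two selection steps with deliberately simple stand-ins. An energy-fraction threshold stands in for the energy-constrained filtering, and center-frequency proximity stands in for the spectral-consistency clustering. The 1% energy threshold is hypothetical. The 70 Hz merge gap is borrowed from the ridge-spacing result above purely for illustration. All function names and the toy modes are hypothetical.

```python
import cmath
import math


def center_frequency(x, fs):
    """Power-weighted mean frequency (Hz) of a complex sequence, via a direct DFT."""
    n = len(x)
    num = den = 0.0
    for k in range(n):
        X = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        f = k * fs / n
        if f >= fs / 2:
            f -= fs  # map bins to [-fs/2, fs/2) so negative Doppler is kept
        p = abs(X) ** 2
        num += p * f
        den += p
    return num / den


def filter_and_cluster(modes, fs, energy_frac=0.01, gap_hz=70.0):
    """Hypothetical mode selection: drop low-energy modes, merge spectrally close ones.

    Returns (groups of mode indices, one reconstructed signal per group).
    """
    energies = [sum(abs(v) ** 2 for v in m) for m in modes]
    total = sum(energies)
    kept = [i for i, e in enumerate(energies) if e / total >= energy_frac]
    centers = {i: center_frequency(modes[i], fs) for i in kept}
    kept.sort(key=centers.get)
    groups = []
    for i in kept:
        # Single-linkage rule: join the previous group if its last center is close.
        if groups and centers[i] - centers[groups[-1][-1]] < gap_hz:
            groups[-1].append(i)
        else:
            groups.append([i])
    length = len(modes[0])
    recon = [[sum(modes[i][t] for i in g) for t in range(length)] for g in groups]
    return groups, recon


fs, n = 1000.0, 100


def tone(freq, amp):
    return [amp * cmath.exp(2j * math.pi * freq * t / fs) for t in range(n)]


# Toy modes: two fragments of one target (100 and 130 Hz), a second target (400 Hz),
# and a weak spurious mode (250 Hz).
modes = [tone(100, 1.0), tone(130, 1.0), tone(400, 1.0), tone(250, 0.05)]
groups, recon = filter_and_cluster(modes, fs)
print(groups)  # [[0, 1], [2]]
```

The weak 250 Hz mode falls below the energy threshold and is discarded. The 100 Hz and 130 Hz fragments are merged back into one reconstructed target, and the 400 Hz mode remains a separate target.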
Improved Related-tweak Attack on Full-round HALFLOOP-48
SUN Xiaomeng, ZHANG Wenying, YUAN Zhaozhong
Available online  , doi: 10.11999/JEIT251014
Abstract:
  Objective  HALFLOOP is a family of tweakable AES-like lightweight block ciphers used to encrypt automatic link establishment messages in fourth-generation high-frequency radio systems. Because the RotateRows and MixColumns operations diffuse differences rapidly, long differentials with high probability are difficult to construct, which limits attacks on the full cipher. This study examines full HALFLOOP-48 and evaluates its resistance to sandwich attacks, an important technique in lightweight-cipher cryptanalysis, in the related-tweak setting.  Methods  A new truncated sandwich distinguisher framework is proposed to attack full HALFLOOP-48. The cipher is decomposed into three sub-ciphers, \begin{document}$ {{E}}_{0} $\end{document}, \begin{document}$ {E}_{m} $\end{document}, and \begin{document}$ {{E}}_{1} $\end{document}. An automatic search model based on the Boolean Satisfiability Problem (SAT) is built for each part: byte-wise models for \begin{document}$ {{E}}_{0} $\end{document} and \begin{document}$ {{E}}_{1} $\end{document}, and a bit-wise model for \begin{document}$ {E}_{m} $\end{document}. For \begin{document}$ {E}_{m} $\end{document}, the Affine subspace Dimensional Reduction (ADR) method is proposed to model large S-boxes in SAT. ADR converts the modeling of a high-dimensional set into two sub-problems on a low-dimensional set. ADR ensures that the SAT-searched differentials exist and that their probabilities are accurate, while reducing the size of the Conjunctive Normal Form (CNF) clauses. It also enables the SAT method to search longer differentials efficiently when large S-boxes are present. To improve probability accuracy in \begin{document}$ {E}_{m} $\end{document}, dependencies between \begin{document}$ {{E}}_{0} $\end{document} and \begin{document}$ {{E}}_{1} $\end{document} are evaluated across three layers, and their probabilities are multiplied. 
Two key-recovery attacks, a sandwich attack and a rectangle-like sandwich attack, are mounted on the distinguisher in the related-tweak scenario.  Results and Discussions  The SAT-based model reveals a critical weakness in HALFLOOP-48. A practical sandwich distinguisher for the first 8 rounds with probability \begin{document}$ {2}^{-43.415} $\end{document} is identified. An optimal truncated sandwich distinguisher for 8-round HALFLOOP-48 with probability \begin{document}$ {2}^{-43.2} $\end{document} is then established by exploiting the clustering effect of the identified differentials. Compared with earlier results, this distinguisher is practical and covers two more rounds. Using the 8-round distinguisher, both a sandwich attack and a rectangle-like sandwich attack are mounted on full-round HALFLOOP-48 under related tweaks. The sandwich attack requires a data complexity of \begin{document}$ {2}^{32.8} $\end{document}, a time complexity of \begin{document}$ {2}^{92.2} $\end{document}, and a memory complexity of \begin{document}$ {2}^{42.8} $\end{document}. For the rectangle-like sandwich attack, the data complexity is \begin{document}$ {2}^{16.2} $\end{document}, with a time complexity of \begin{document}$ {2}^{99.2} $\end{document} and a memory complexity of \begin{document}$ {2}^{26.2} $\end{document}. Compared with previous results, these attacks reduce the time complexity by a factor of \begin{document}$ {2}^{25.4} $\end{document} and the memory complexity by a factor of \begin{document}$ {2}^{10} $\end{document}.  Conclusions  To handle the rapid diffusion of differences in HALFLOOP, a new perspective on sandwich attacks based on truncated differentials is developed by combining byte-wise and bit-wise models. The models for \begin{document}$ {{E}}_{0} $\end{document} and \begin{document}$ {{E}}_{1} $\end{document} are byte-wise and extend forward and backward into \begin{document}$ {E}_{m} $\end{document}, which is modeled bit-wise. 
To model the 8-bit S-box in the bit-wise layer \begin{document}$ {E}_{m} $\end{document} efficiently, an affine subspace dimensional reduction approach is proposed. This model ensures compatibility between the two truncated differential trails and covers as many rounds as possible with high probability. It supports a new 8-round truncated boomerang distinguisher that outperforms previous distinguishers for HALFLOOP-48. Based on this 8-round truncated boomerang distinguisher, a key-recovery attack is achieved with a success probability of 63%. The results show that (1) the ADR method offers an efficient way to model large S-boxes in lightweight ciphers, (2) the truncated boomerang distinguisher construction can be applied to other AES-like lightweight block ciphers, and (3) HALFLOOP-48 does not provide an adequate security margin for use in the U.S. military standard.
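The ADR construction is not described in enough detail here to reproduce. Any bit-wise SAT model of an 8-bit S-box, however, has to encode the S-box's differential distribution table (DDT): which output differences each input difference can reach, and how often. The sketch computes that table in plain Python. It uses the AES S-box as the example 8-bit S-box. That HALFLOOP reuses this exact S-box is an assumption here, since the abstract says only "8-bit S-box". Each nonzero input difference reaches 127 of the 255 possible output differences. This relation is large and irregular, which hints at why exact CNF encodings of it grow large.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse a^254 in GF(2^8); 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox_entry(x):
    """AES S-box: field inversion followed by the AES affine map with constant 0x63."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def ddt(sbox):
    """Differential distribution table: ddt[dx][dy] = #{x : S(x) ^ S(x ^ dx) = dy}."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for x in range(size):
        for dx in range(size):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table


SBOX = [aes_sbox_entry(x) for x in range(256)]
DDT = ddt(SBOX)
print(max(max(row) for row in DDT[1:]))  # 4, i.e. best transition probability 4/256 = 2^-6
print(sum(1 for c in DDT[1] if c))       # 127 reachable output differences for dx = 1
```

A transition dx → dy through one S-box occurs with probability DDT[dx][dy]/256. A bit-wise model of a layer such as E_m must encode both which pairs (dx, dy) are possible and these weights, for every active S-box.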
Privacy-Preserving Federated Weakly-Supervised Learning for Cancer Subtyping on Histopathology Images
WANG Yumeng, LIU Zhenbing, LIU Zaiyi
Available online  , doi: 10.11999/JEIT250842
Abstract:
  Objective  Data-driven deep learning methods are widely applied to cancer subtyping, yet their performance depends on large training datasets with fine-grained annotations. For gigapixel Whole Slide Images (WSI), such annotations are labor-intensive and costly. Clinical data are typically stored in isolated data silos, and sharing them raises privacy concerns. Federated Learning (FL) enables a global model to be trained on data distributed across multiple medical centers without transmitting local data. However, in conventional FL, substantial heterogeneity across centers reduces the performance and stability of the global model.  Methods  A privacy-preserving FL method is proposed for gigapixel WSI in computational pathology. Weakly supervised attention-based Multiple Instance Learning (MIL) is integrated with differential privacy to support training when only slide-level labels are available. Within each client, a multi-scale attention-based MIL method is used to conduct local training on histopathology WSIs, and the weakly supervised setting reduces the need for costly pixel-level annotation. During the federated update, local differential privacy is applied to limit the risk of sensitive information leakage: random noise drawn from a Gaussian or Laplace distribution is added to the model parameters after each client’s local training. Furthermore, a federated adaptive reweighting strategy is introduced to address the heterogeneity of pathological images across clients by dynamically balancing the influence of local data quantity and quality on each client’s aggregation weight.  Results and Discussions  The proposed FL framework is evaluated on two clinical diagnostic tasks: Non-small Cell Lung Cancer (NSCLC) histologic subtyping and Breast Invasive Carcinoma (BRCA) histologic subtyping.
As shown in Table 1, Table 2, and Fig. 4, the proposed FL method (Ours with DP and Ours w/o DP) achieves higher accuracy and stronger generalization than localized models and other FL approaches. Its classification performance remains competitive even when compared with the centralized model (Fig. 3). These results indicate that privacy-preserving FL is a feasible and effective strategy for multicenter histopathology images and may reduce the performance degradation typically caused by data heterogeneity across centers. When the magnitude of added noise is controlled within a limited range, stable classification can also be achieved (Table 3). The two main components, the multiscale representation attention network and the federated adaptive reweighting strategy, each contribute to consistent performance improvement (Table 4). In addition, the proposed FL method maintains stable classification performance across different hyperparameter settings (Table 5, Table 6), confirming its robustness.  Conclusions  The proposed FL method addresses two central challenges in multicenter computational pathology: the presence of data silos and concerns over privacy. It also alleviates the performance degradation caused by inter-center data heterogeneity. As balancing model accuracy with privacy protection remains a key challenge, future work will focus on developing methods that preserve privacy while sustaining stable classification performance.
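The abstract names the ingredients but not the formulas: Gaussian or Laplace noise added to parameters after local training, and aggregation weights that balance data quantity and quality. The sketch below is a minimal illustration under stated assumptions. Parameters are flat lists. Each update is L2-clipped before noise is added; clipping is standard practice for differential privacy but is not mentioned in the abstract. Noise scales are not calibrated to any particular (ε, δ) privacy budget. The weight rule alpha·(quantity share) + (1 − alpha)·(quality share) is a hypothetical stand-in for the paper's federated adaptive reweighting. All function names and toy values are hypothetical.

```python
import math
import random


def clip_l2(params, max_norm):
    """Scale a flat parameter vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(p * p for p in params))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [p * scale for p in params]


def add_dp_noise(params, scale, mechanism="gaussian", rng=random):
    """Perturb parameters with zero-mean Gaussian (std = scale) or Laplace (b = scale) noise."""
    if mechanism == "gaussian":
        return [p + rng.gauss(0.0, scale) for p in params]
    if mechanism == "laplace":
        # Laplace(0, b) equals the difference of two i.i.d. Exponential(rate 1/b) draws.
        return [p + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
                for p in params]
    raise ValueError(f"unknown mechanism: {mechanism}")


def aggregation_weights(n_samples, quality, alpha=0.5):
    """Hypothetical reweighting: blend normalized data quantity and a quality score."""
    n_total, q_total = sum(n_samples), sum(quality)
    raw = [alpha * n / n_total + (1 - alpha) * q / q_total
           for n, q in zip(n_samples, quality)]
    s = sum(raw)
    return [r / s for r in raw]


def aggregate(client_params, weights):
    """Weighted average of flat parameter vectors."""
    dim = len(client_params[0])
    return [sum(w * p[d] for w, p in zip(weights, client_params)) for d in range(dim)]


rng = random.Random(0)
clients = [[0.5, -1.0, 2.0], [1.5, 0.0, 1.0]]  # toy local models after training
noisy = [add_dp_noise(clip_l2(c, 2.0), 0.01, "laplace", rng) for c in clients]
weights = aggregation_weights(n_samples=[100, 300], quality=[0.8, 0.6])
print(weights, aggregate(noisy, weights))
```

With alpha = 1 the rule reduces to the familiar sample-count weighting of federated averaging. Lowering alpha shifts influence toward clients with a higher quality score, which is the kind of balance the reweighting strategy is described as providing.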
Routing and Resource Scheduling Algorithm Driven by Mixture of Experts in Large-scale Heterogeneous Local Power Communication Network
JING Chuanfang, ZHU Xiaorong
Available online  , doi: 10.11999/JEIT251176
Abstract:
  Objective  Emerging power services, such as distributed energy consumption, place stringent performance requirements on Large-Scale Heterogeneous Local Power Communication Networks (LHLPCNs). Limited communication resources and growing service demands make it challenging to provide on-demand services and increase network capacity while ensuring Quality of Service (QoS). Conventional routing and resource scheduling algorithms based on optimization or heuristics depend on precise mathematical models and parameters, and their computational cost increases as the network size and number of variables grow. These limitations reduce their adaptability to expanding power application scenarios. Advances in Mixture-of-Experts (MoE) frameworks offer a promising direction because an ensemble of specialized AI experts reduces the need to train a separate model for each task. Motivated by these challenges, this study proposes an MoE-based routing and resource scheduling algorithm (RASMoE) for LHLPCNs that integrate High-Power Line Carrier (HPLC) and Radio Frequency (RF) communication. RASMoE is designed to meet personalized QoS requirements and support more power services with limited resources.  Methods  An optimization problem that minimizes the difference between QoS supply and demand in LHLPCNs is formulated as a 0–1 integer linear programming model that accounts for multimodal links, channels, and modulation methods. To solve this NP-hard problem, a new MoE framework comprising expert networks and gating networks is designed. The framework supports personalized service requirements in terms of data rate, delay, and reliability, while improving convergence. The expert networks include shared and QoS-specific experts that generate optimal next hops and compute allocation strategies for links, channels, and modulation modes between node pairs. The gating networks dynamically combine and reuse these experts to support both known and unforeseen service types. 
Extensive comparative experiments show that RASMoE improves resource utilization, reduces delay, and increases reliability relative to multiple baselines.  Results and Discussions  The QoS supply–demand differences of five algorithms under varying numbers of services are compared (Fig. 3). RASMoE consistently achieves the smallest differences across scenarios because its gating network combines QoS-specific experts to align resource allocation with service requirements. Because control and compute-intensive services have strict delay requirements, their average End-to-End (E2E) latency under different numbers of services is evaluated (Fig. 4). The proposed algorithm achieves the lowest average E2E latency because its Graph Attention Network (GAT)-enhanced expert networks extract node load states and interact with the network environment in real time through a Multi-Armed Bandit (MAB) mechanism, which supports adaptive allocation strategies. The average reliability of E2E paths for different numbers of control, compute-intensive, and acquisition services is also illustrated (Fig. 5).  Conclusions  This study proposes an MoE-driven routing and resource scheduling algorithm for LHLPCNs. The framework integrates expert networks and a gating network. The expert networks include GAT-based shared experts for E2E path selection and MAB-based QoS-specific experts for adaptive allocation of links, channels, and modulation schemes according to QoS demands and link states. The gating network orchestrates and reuses these experts to support services with single or multiple QoS requirements, including previously unseen service types. Theoretical analysis shows that the method improves resource utilization in LHLPCNs, with notable advantages in multi-service scenarios characterized by diverse QoS demands. Future work will examine integrating the MoE framework with domain-specific models, including power load forecasting and predictive analytics, to enhance the use of renewable energy sources.
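The way a gating network combines experts can be illustrated with a softmax gate that mixes the scores several experts assign to candidate next hops. This is a sketch under stated assumptions: expert scores and gate logits are given as plain numbers (in RASMoE they would come from the GAT-based and MAB-based expert networks and a learned gating network), and picking the candidate with the highest mixed score is a simplification of the actual allocation procedure.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_select(expert_scores, gate_logits):
    """Mix per-expert candidate scores with gate weights.

    expert_scores[e][c] is expert e's score for candidate c.
    Returns (index of best candidate, list of mixed scores).
    """
    gates = softmax(gate_logits)
    n_candidates = len(expert_scores[0])
    mixed = [sum(g * scores[c] for g, scores in zip(gates, expert_scores))
             for c in range(n_candidates)]
    best = max(range(n_candidates), key=mixed.__getitem__)
    return best, mixed


# Hypothetical experts scoring three candidate next hops.
expert_scores = [
    [0.9, 0.2, 0.1],  # delay-oriented expert
    [0.1, 0.3, 0.8],  # reliability-oriented expert
]
# A delay-critical service: the gate strongly favors the first expert.
best_hop, _ = moe_select(expert_scores, gate_logits=[4.0, 0.0])
```

Changing only the gate logits (for example to `[0.0, 4.0]` for a reliability-critical service) reuses the same experts for a different service type, which is the reuse property the abstract attributes to the gating network.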
Crosstalk-Free Frequency-Spin Multiplexed Multifunctional Device Realized by Nested Meta-Atoms
ZHANG Ming, DONG Peng, TAO En, YANG Lin, HAN Qi, HE Yuhang, HOU Weimin, LI Kang
Available online  , doi: 10.11999/JEIT251202
Abstract:
  Objective  To address high fabrication costs and signal crosstalk in existing multidimensional multiplexed metasurfaces, a crosstalk-free, frequency-spin multiplexed single-layer metasurface based on nested bi-spectral meta-atoms is proposed. Two C-shaped split-ring resonators are physically superimposed to target the Ku band (12.5 GHz) and the K band (22 GHz). This configuration enables four fully independent information channels, defined by two frequencies and two spin states, without spatial division or multilayer stacking. The objective is to demonstrate independent, high-performance vortex beam generation and holographic imaging, providing a simplified and cost-effective solution for advanced 6G communication and sensing systems.  Methods  A reflective metal–dielectric–metal metasurface architecture is adopted, in which each unit cell integrates an Outer C-Shaped Split-Ring Resonator (OCSRR) and an Inner C-Shaped Split-Ring Resonator (ICSRR). Parameter sweeps in CST Microwave Studio are used to select structures that provide high cross-polarization conversion at the target frequencies while maintaining negligible responses in non-target bands. Independent spin multiplexing is achieved by combining transmission phase and geometric phase, the latter controlled by resonator rotation. Two prototypes are fabricated using printed circuit board technology. MS1 is designed for focused vortex beam generation with topological charges l = +1, +2, +3, and +4, whereas MS2 is designed for holographic imaging of the letters “H”, “B”, “K”, and “D”. Device performance is validated by near-field scanning measurements under oblique incidence using a vector network analyzer.  Results and Discussions  Simulation and experimental results confirm strong frequency selectivity and effective spin decoupling enabled by the nested meta-atom design. 
The OCSRR and ICSRR dominate the electromagnetic responses at 12.5 GHz and 22 GHz, respectively, and exhibit linear superposition behavior with minimal crosstalk. MS1 generates four focused vortex beams with clearly separated topological charges, achieving an average mode purity of 88.25%. MS2 reconstructs four independent and well-defined holographic images with high channel isolation. The close agreement between measured and simulated results demonstrates the robustness of the device and validates the effectiveness of the crosstalk-free design strategy under practical illumination conditions.  Conclusions  A reliable approach for realizing crosstalk-free frequency-spin multiplexed metasurfaces using nested meta-atoms is demonstrated. Simultaneous and independent manipulation of electromagnetic waves across four channels is achieved on a single metasurface layer, substantially reducing design complexity and fabrication cost. The successful demonstration of multi-channel vortex beam generation and holographic imaging indicates strong potential for integrated multifunctional applications in next-generation wireless communication and optical systems.
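How a propagation-type phase and a rotation-controlled geometric phase together give two spin channels independent phase profiles can be sketched with the standard spin-decoupling relations. The sketch assumes that rotating a meta-atom by θ adds +2θ to one circular-polarization channel and −2θ to the other (the sign assignment depends on the handedness convention), so a pair of target phases fixes both the propagation phase and the rotation angle. The focused-vortex target (helical term plus a hyperbolic focusing term, in one common sign convention) and the focal length, position, and charges in the usage lines are illustrative assumptions; the abstract does not give the MS1 design parameters.

```python
import math

TWO_PI = 2.0 * math.pi


def focused_vortex_phase(x, y, wavelength, focal_length, charge):
    """Target phase (rad) at (x, y): vortex term l*atan2(y, x) plus focusing term."""
    r = math.hypot(x, y)
    focusing = (TWO_PI / wavelength) * (math.hypot(r, focal_length) - focal_length)
    return (charge * math.atan2(y, x) + focusing) % TWO_PI


def decompose(phi_a, phi_b):
    """Split two spin-channel target phases into (propagation phase, rotation angle).

    Assumes phi_a = phi_p + 2*theta and phi_b = phi_p - 2*theta (mod 2*pi).
    """
    phi_p = ((phi_a + phi_b) / 2.0) % TWO_PI
    theta = ((phi_a - phi_b) / 4.0) % math.pi
    return phi_p, theta


# Hypothetical example: one meta-atom at (5 mm, 5 mm), focal length 100 mm,
# charge +1 for one spin and +2 for the other at 12.5 GHz.
wavelength = 3e8 / 12.5e9  # roughly 24 mm
phi_spin_a = focused_vortex_phase(0.005, 0.005, wavelength, 0.1, charge=1)
phi_spin_b = focused_vortex_phase(0.005, 0.005, wavelength, 0.1, charge=2)
phi_p, theta = decompose(phi_spin_a, phi_spin_b)
```

In a real design, `phi_p` would then be matched to a resonator geometry from the parameter sweep and `theta` applied as that resonator's rotation; with nested resonators, the outer and inner rings would each be assigned their own pair for 12.5 GHz and 22 GHz.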
Model-Free Adaptive Resilient Control of Vehicle Platoons Against Hybrid Cyberattacks
HAN Qiaoni, MA Jianguo, LI Peng, ZUO Zhiqiang
Available online  , doi: 10.11999/JEIT251135
Abstract:
  Objective  Connected and automated vehicle platoons represent a pivotal technology for enhancing traffic efficiency, driving safety, and fuel economy in intelligent transportation systems. Through inter-vehicle information interaction and cooperative control, vehicle platoons can achieve safe and efficient car-following operations. However, their heavy reliance on vehicular communication networks makes them vulnerable to cyberattacks, particularly hybrid threats combining Denial-of-Service (DoS) and False Data Injection (FDI) attacks. Such attacks can interrupt or tamper with information transmission, posing a severe threat to the safety and stability of vehicle platoon systems. Vehicle platoon control also faces challenges arising from environmental disturbances, parametric uncertainties, and nonlinear dynamic characteristics. Existing model-based control methods often struggle to maintain performance under such complex conditions, necessitating a resilient, data-driven control strategy that does not depend on precise mechanical models. This paper aims to develop a novel attack-compensated Model-Free Adaptive Control (MFAC) framework to ensure the secure and stable operation of heterogeneous nonlinear vehicle platoons under hybrid cyberattacks.  Methods  To address the resilient control problem of connected vehicle platoons under cyberattacks, this paper proposes an attack-compensated MFAC method for hybrid attacks involving both DoS and FDI attacks. First, a nonlinear longitudinal vehicle dynamics model for the platoon is established and then transformed into an equivalent compact-form dynamic linearized data model via the dynamic linearization technique. This transformation decouples the controller design from the specific mechanical model of the vehicle. 
In addition, an output tuning factor is introduced to dynamically balance position and velocity tracking so that both states are tracked simultaneously. Second, a hybrid attack model is formulated to capture persistent FDI attacks, which inject malicious data, and aperiodic DoS attacks, which interrupt communication. Subsequently, a pseudo-gradient estimator is designed to capture the system dynamics from real-time input-output data; the impact of hybrid attacks on this estimator is analyzed, and an adaptive update strategy for the estimator during DoS attacks is developed. Most importantly, an intelligent attack compensation mechanism is proposed that leverages historical control input information during DoS attack periods. This mechanism keeps the system operating stably even when real-time vehicle state information is unavailable, further improving the control performance of the connected vehicle platoon under DoS attacks.  Results and Discussions  Rigorous theoretical analysis proves that the tracking error of the closed-loop system remains bounded under specific conditions on the frequency and duration of cyberattacks (Theorem 1). Extensive simulations verify the practical effectiveness of the proposed method. During cyberattacks, the MFAC method with the attack compensation mechanism adaptively adjusts the attenuation rate of its control inputs, thereby maintaining the system's control performance (Fig. 3). Additionally, follower vehicles successfully track the leader's velocity variations while maintaining the desired inter-vehicle spacing (Fig. 4a, 4b), and the tracking error converges satisfactorily (Fig. 4d), verifying the stability of the closed-loop system. 
Comparative studies demonstrate the critical role of the proposed compensation mechanism: when it is disabled, platoon performance degrades significantly during cyberattacks (Fig. 5), whereas the proposed method maintains superior tracking accuracy and recovers from errors faster. Furthermore, an investigation of FDI attack intensity shows that increasing attack intensity enlarges the steady-state error bounds (Fig. 6), which quantitatively validates the theoretical robustness analysis and provides insights for designing security thresholds in practical engineering applications.  Conclusions  This paper advances the secure control of heterogeneous nonlinear connected vehicle platoons by proposing an attack-compensated MFAC framework that addresses the dual challenges of hybrid cyberattacks (i.e., DoS and FDI attacks) and system nonlinearities. Three key contributions are made: (1) a data-driven dynamic linearization framework, built on the established nonlinear longitudinal vehicle dynamics model and its equivalent data-based linearized model, is integrated with an output tuning factor to achieve simultaneous position and velocity tracking; (2) a hybrid attack model is established that incorporates aperiodic DoS attacks (causing communication interruptions) and bounded additive FDI attacks (injecting malicious data), capturing their intrinsic characteristics; and (3) an intelligent historical input-driven compensation mechanism, coupled with a pseudo-gradient estimator, is designed to optimize control performance during DoS-induced communication outages. 
Both theoretical analysis and simulation results confirm the effectiveness of the proposed method: the system tracking error is guaranteed to remain bounded when the attack parameters satisfy specific conditions, follower vehicles accurately track the leader's states, and the method outperforms the compensation-free baseline in velocity tracking accuracy and error convergence speed. Focusing on the hybrid scenario of aperiodic DoS and bounded additive FDI attacks, this work provides a practical model-free solution for enhancing the cybersecurity of connected vehicle platoons. In future research, we will extend the scope to stealthier hybrid attack modes (non-additive FDI, spoofing, and DoS attacks), explore their coupling mechanisms, and design targeted defense strategies. We will also investigate a communication-efficient MFAC strategy that integrates an event-triggered mechanism to reduce network load and improve scalability.
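The core of compact-form dynamic linearization MFAC (a pseudo-partial-derivative estimator plus an incremental control law) can be sketched on a single-input single-output toy plant. This is not the authors' platoon controller: the plant is hypothetical, there is no FDI attack or output tuning factor, and the DoS handling (freeze the estimator and reuse the last input increment attenuated by `gamma`) is an assumed stand-in for the paper's historical-input compensation mechanism. The estimator update, reset rule, and control law follow the standard CFDL-MFAC form.

```python
import math


def simulate_mfac(r=1.0, steps=60, dos_steps=frozenset(), rho=0.6, lam=1.0,
                  eta=0.5, mu=1.0, eps=1e-5, gamma=0.5, phi0=1.0):
    """CFDL-MFAC tracking of a constant reference on a toy SISO plant.

    Returns the output trajectory [y(0), ..., y(steps)].
    """
    def plant(u):
        # Hypothetical toy nonlinear plant: y(k+1) = 2*u(k) + 0.2*sin(u(k)).
        return 2.0 * u + 0.2 * math.sin(u)

    phi = phi0                # pseudo-partial-derivative (PPD) estimate
    u_km2, u_km1 = 0.0, 0.0   # u(k-2), u(k-1)
    y = 0.0                   # y(k), produced by the plant
    y_last = 0.0              # last output actually received by the controller
    link_was_up = False
    ys = [y]
    for k in range(steps):
        du = u_km1 - u_km2
        if k not in dos_steps:
            # Update the PPD only across two consecutively received samples.
            if link_was_up:
                phi += eta * du / (mu + du * du) * ((y - y_last) - phi * du)
            # Standard reset rule: keep the estimate nonzero and sign-consistent.
            if abs(phi) <= eps or abs(du) <= eps or phi * phi0 < 0:
                phi = phi0
            u = u_km1 + rho * phi / (lam + phi * phi) * (r - y)
            y_last, link_was_up = y, True
        else:
            # DoS: no fresh output. Hypothetical compensation: freeze the PPD
            # and reuse the last input increment, attenuated by gamma.
            u = u_km1 + gamma * du
            link_was_up = False
        u_km2, u_km1 = u_km1, u
        y = plant(u)
        ys.append(y)
    return ys


# Usage: a DoS interval covering steps 20 to 25.
trajectory = simulate_mfac(dos_steps=range(20, 26))
```

The controller never uses the plant equation; it only sees input-output increments, which is what makes the scheme model-free.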
Total Coloring on Planar Graphs of Nested n-Pointed Stars
SU Rongjin, FANG Gang, ZHU Enqiang, XU Jin
Available online  , doi: 10.11999/JEIT250861
Abstract:
  Objective  Many combinatorial optimization problems can be modeled as graph coloring problems. A classic topic in this field is total coloring, which combines vertex coloring and edge coloring. Research on total coloring has centered on the Total Coloring Conjecture (TCC), proposed in the 1960s, which states that every graph with maximum degree Δ has a total (Δ+2)-coloring. For graphs, including planar graphs, with maximum degree less than six, the TCC has been verified through case enumeration. For planar graphs with maximum degree greater than six, the discharging technique has been used to confirm the conjecture by identifying reducible configurations and establishing detailed discharging rules. This method becomes limited when applied to planar graphs with maximum degree exactly six, for which only certain restricted classes have been shown to satisfy the TCC, such as graphs without 4-cycles and graphs without adjacent triangles. More recent work demonstrates that the TCC holds for planar graphs without 4-fan subgraphs and for planar graphs with maximum average degree less than 23/5. It therefore remains unclear whether planar graphs with maximum degree six that contain a 4-fan subgraph or have maximum average degree at least 23/5 satisfy the conjecture. To address this question, this paper studies total coloring of a class of planar graphs known as nested n-pointed stars and aims to show that the TCC holds for these graphs.  Methods  The study relies on theoretical methods, including mathematical induction, constructive techniques, and case enumeration. An n-pointed star is obtained by attaching a triangle to each edge of an n-polygon (n ≥ 3) and then joining the triangle vertices not on the polygon to form a new n-polygon. Repeating this operation produces a nested n-pointed star with l layers, denoted by \begin{document}$ G_{n}^{l} $\end{document}. For l ≥ 2, these graphs have maximum degree exactly six. 
Their structural properties, including the presence of 4-fan subgraphs and maximum average degree greater than 23/5, are established. Since the maximum degree is six, the TCC asserts the existence of a total 8-coloring, so induction on the number of layers is used to show that \begin{document}$ G_{n}^{l} $\end{document} has a total 8-coloring: (1) show that \begin{document}$ G_{n}^{1} $\end{document} has a total 8-coloring; (2) assume that \begin{document}$ G_{n}^{l-1} $\end{document} has a total 8-coloring; (3) prove that \begin{document}$ G_{n}^{l} $\end{document} has a total 8-coloring. A graph \begin{document}$ G_{n}^{l} $\end{document} is defined as a type I graph if it has a total 7-coloring. When \begin{document}$ n=3k $\end{document}, constructive arguments show that \begin{document}$ G_{3k}^{l} $\end{document} is a type I graph. The value of \begin{document}$ k $\end{document} is considered in two cases, \begin{document}$ (k=2m-1) $\end{document} and \begin{document}$ (k=2m) $\end{document}. In both cases, a total 7-coloring of \begin{document}$ G_{3k}^{l} $\end{document} is obtained by directly assigning colors to all vertices and edges.  Results and Discussions  Induction on the number of layers of \begin{document}$ G_{n}^{l} $\end{document} shows that nested n-pointed stars satisfy the Total Coloring Conjecture (Fig. 5). Five colors are assigned to the vertices and edges of \begin{document}$ G_{3k}^{1} $\end{document} to obtain a total 5-coloring (Fig. 6(a) and Fig. 8(a)). Two additional colors are then applied alternately to the edges connecting the polygons in layers 1 and 2, producing a total 7-coloring of \begin{document}$ G_{3k}^{2} $\end{document} (Fig. 7(a) and Fig. 9(a)). After a permutation of the colors, a total 7-coloring of \begin{document}$ G_{3k}^{3} $\end{document} is obtained (Fig. 7(b) and Fig. 9(b)). 
The coloring pattern on the outermost layer is identical to that of \begin{document}$ G_{3k}^{1} $\end{document}, which allows the same extension to construct total 7-colorings for \begin{document}$ G_{3k}^{4},G_{3k}^{5},\cdots ,G_{3k}^{l} $\end{document} . Therefore, \begin{document}$ G_{3k}^{l} $\end{document} is a type I graph.  Conclusions  This study verifies that the Total Coloring Conjecture holds for nested n-pointed stars, which have maximum degree six and contain 4-fan subgraphs. It shows that \begin{document}$ G_{3k}^{l} $\end{document} is a type I graph. A further question arises regarding whether \begin{document}$ G_{n}^{l} $\end{document} is a type I graph when \begin{document}$ n\neq 3k $\end{document}. A total 7-coloring can be constructed when \begin{document}$ n=4 $\end{document} or \begin{document}$ n=5 $\end{document}, and therefore both \begin{document}$ G_{4}^{l} $\end{document} and \begin{document}$ G_{5}^{l} $\end{document} are type I graphs. For other values of \begin{document}$ n\neq 3k $\end{document}, whether \begin{document}$ G_{n}^{l} $\end{document} is a type I graph remains open.
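The objects in this abstract (the layered construction and the three constraints of a total coloring) can be made concrete with a small sketch. It builds the edge list of a nested n-pointed star as described in the Methods, checks whether an assignment of colors to vertices and edges is a proper total coloring, and, only to exercise the checker, produces a first-fit coloring of the total graph. The vertex labeling `(polygon, index)` and all function names are hypothetical; the greedy routine does not reproduce the paper's 7- or 8-colorings and may use up to 2Δ+1 colors.

```python
from itertools import combinations


def nested_star(n, layers):
    """Edge list of the nested n-pointed star with the given number of layers.

    Vertex (j, i) is vertex i of polygon j; polygon 0 is the innermost one.
    Layer j attaches apex (j, i) to (j-1, i) and (j-1, i+1), then joins the
    apexes into polygon j.
    """
    if n < 3 or layers < 1:
        raise ValueError("need n >= 3 and at least one layer")
    edges = set()

    def add(a, b):
        edges.add((min(a, b), max(a, b)))

    for i in range(n):
        add((0, i), (0, (i + 1) % n))
    for j in range(1, layers + 1):
        for i in range(n):
            add((j, i), (j - 1, i))            # triangle side
            add((j, i), (j - 1, (i + 1) % n))  # triangle side
            add((j, i), (j, (i + 1) % n))      # edge of the new polygon
    return sorted(edges)


def max_degree(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return max(deg.values())


def is_total_coloring(edges, color):
    """Adjacent vertices, adjacent edges, and incident vertex-edge pairs
    must all receive different colors."""
    incident = {}
    for e in edges:
        u, v = e
        if color[u] == color[v] or color[e] in (color[u], color[v]):
            return False
        incident.setdefault(u, []).append(color[e])
        incident.setdefault(v, []).append(color[e])
    return all(len(cs) == len(set(cs)) for cs in incident.values())


def greedy_total_coloring(edges):
    """First-fit coloring of the total graph (at most 2*Delta + 1 colors)."""
    vertices = sorted({v for e in edges for v in e})
    elements = vertices + list(edges)
    conflicts = {x: set() for x in elements}
    incident = {v: [] for v in vertices}
    for e in edges:
        u, v = e
        for a, b in ((u, v), (u, e), (v, e)):
            conflicts[a].add(b)
            conflicts[b].add(a)
        incident[u].append(e)
        incident[v].append(e)
    for es in incident.values():
        for e1, e2 in combinations(es, 2):
            conflicts[e1].add(e2)
            conflicts[e2].add(e1)
    color = {}
    for x in elements:
        used = {color[y] for y in conflicts[x] if y in color}
        c = 0
        while c in used:
            c += 1
        color[x] = c
    return color


edges = nested_star(6, 3)  # the nested 6-pointed star with 3 layers
coloring = greedy_total_coloring(edges)
print(max_degree(edges), is_total_coloring(edges, coloring))  # prints: 6 True
```

Counting edges gives n + 3nl (one inner polygon plus two triangle sides and one polygon edge per apex), and the degree check confirms that one layer gives maximum degree 4 while two or more layers give 6.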
Research on Generation and Optimization of Dual-channel High-current Relativistic Electron Beams Based on a Single Magnet
AN Chenxiang, HUO Shaofei, SHI Yanchao, ZHAI Yonggui, XIAO Renzhen, CHEN Changhua, CHEN Kun, HUANG Huijie, SHEN Liuyang, LUO Kaiwen, WANG HongGuang, LI YuQing
Available online  , doi: 10.11999/JEIT250487
Abstract:
  Objective  High-Power Microwave (HPM) technology is a strategic frontier in defense, military, and civilian systems. The output power of a single HPM source has reached a bottleneck because of physical limits, material constraints, and fabrication challenges. To address this issue, researchers have proposed HPM power synthesis, which increases peak power by combining multiple HPM sources.  Methods  This study addresses the time synchronization problem in multipath HPM synthesis by designing a dual-channel high-current relativistic electron-beam generator. The device uses one pulse-power driver to drive two diodes simultaneously and a single coil magnet to confine both electron beams. Three-dimensional particle-in-cell simulations reveal angular nonuniformity in the beam current, and a modified cathode stalk is proposed to improve beam quality; its effectiveness is then validated experimentally.  Results and Discussions  Three-dimensional UNIPIC particle-in-cell simulations of the device's physical processes show that side emission from the cathode stalk causes significant angular nonuniformity in the dual electron beams: the beam current density is relatively low near the center of the magnetic field and higher in regions farther from it. To address this issue, the cathode stalk structure is modified to suppress side emission. The angular current fluctuation of cathode emission decreases from 35.61% to 2.93% in Tube 1 and from 33.17% to 3.13% in Tube 2, improving beam quality. Simulations and experiments show that the device stably generates high-quality electron beams with a voltage of 800 kV and a current of 20 kA, reaching a total power of 16 GW. The current waveform remains stable within the 45 ns voltage half-width without impedance collapse.  
Conclusions  The study provides a reliable basis for generating multipath high-current relativistic electron beams and for synthesizing the power of multiple HPM sources, demonstrating strong application potential.
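The headline figures reduce to simple arithmetic: the beam power is the product of diode voltage and beam current, P = VI = 800 kV × 20 kA = 16 GW. The short sketch below reproduces that product and, as an illustration only, computes one common definition of angular current fluctuation, (max − min)/(max + min) over azimuthal current samples; the abstract does not state which fluctuation metric was used, so that definition is an assumption.

```python
def beam_power_gw(voltage_v, current_a):
    """Electrical beam power P = V * I, in gigawatts."""
    return voltage_v * current_a / 1e9


def angular_fluctuation(currents):
    """Angular nonuniformity as (max - min) / (max + min).

    Assumption: one common definition; the abstract does not state the metric.
    """
    hi, lo = max(currents), min(currents)
    return (hi - lo) / (hi + lo)


print(beam_power_gw(800e3, 20e3))  # 800 kV x 20 kA, prints 16.0
```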
A Miniaturized Steady-State Visual Evoked Potential Brain-Computer Interface System
CAI Yu, WANG Junyang, JIANG Chuanli, LUO Ruixin, LÜ Zhengchao, YU Haiqing, HUANG Yongzhi, ZHONG Ziping, XU Minpeng
Available online  , doi: 10.11999/JEIT251223
Abstract:
  Objective  The practical use of Brain-Computer Interface (BCI) systems in daily settings is limited by bulky acquisition hardware and the cables required for stable performance. Although portable systems exist, achieving compact hardware, full mobility, and high decoding performance at the same time remains difficult. This study aims to design, implement, and validate a wearable Steady-State Visual Evoked Potential (SSVEP) BCI system. The goal is to create an integrated system with ultra-miniaturized and concealable acquisition hardware and a stable cable-free architecture, and to show that this approach provides online performance comparable with laboratory systems.  Methods  A system-level solution was developed based on a distributed architecture to support wearability and hardware simplification. The core component is an ultra-miniaturized acquisition node. Each node functions as an independent EEG acquisition unit and integrates a Bluetooth Low Energy (BLE) system-on-chip (CC2640R2F), a high-precision analog-to-digital converter (ADS1291), a battery, and an electrode in one encapsulated module. Through an optimized 6-layer PCB design and stacked assembly, the module size was reduced to 15.12 mm × 14.08 mm × 14.31 mm (3.05 cm³) with a weight of 3.7 g. Each node uses one active electrode, and all nodes share a common reference electrode connected by a thin short wire. This structure reduces scalp connections and allows concealed placement in hair using a hair-clip form factor. Multiple nodes form a star network coordinated by a master device that manages communication with a stimulus computer. A cable-free synchronization strategy was implemented to handle timing uncertainties in distributed wireless operation. Hardware-event detection and software-based clock management were combined to align stimulus markers with multi-channel EEG data without dedicated synchronization cables. 
The master device coordinates this process and streams synchronized data to the computer for real-time processing. System evaluation was conducted in two phases. Foundational performance metrics included physical characteristics, electrical parameters (input-referred noise: 3.91 μVpp; common-mode rejection ratio: 132.99 dB), and synchronization accuracy under different network scales. Application-level performance was assessed using a 40-command online SSVEP spelling task with six subjects in an unshielded room with common RF interference. Four nodes were placed at Pz, PO3, PO4, and Oz. EEG epochs (0.14–3.14 s post-stimulus) were analyzed using Canonical Correlation Analysis (CCA) and ensemble Task-Related Component Analysis (e-TRCA) to compute recognition accuracy and Information Transfer Rate (ITR).  Results and Discussions  The system met its design objectives. Each acquisition node achieved an ultra-compact form factor (3.05 cm³, 3.7 g) suitable for concealed wear and provided more than 5 hours of battery life at a 1 000 Hz sampling rate. Electrical performance supported high-quality SSVEP acquisition. The cable-free synchronization strategy ensured stable operation: more than 95% of event markers aligned with the EEG stream with less than 1 ms error (Fig. 4), meeting SSVEP-BCI requirements. This stable synchronization preserved the quality of the recorded neural signals. Grand-averaged SSVEP responses showed clear and stable waveforms with precise phase alignment (Fig. 5). The signal-to-noise ratio at the fundamental stimulation frequency exceeded 10 dB for all 40 commands (Fig. 6). In the online spelling experiment, the system showed strong decoding performance. With the e-TRCA algorithm and a 3-s window, the average accuracy was (95.00 ± 2.04)%. The system reached a peak ITR of (147.24 ± 30.52) bits/min with a 0.4-s data length (Fig. 7). 
Comparison with existing SSVEP-BCI systems (Table 1) indicates that, despite constraints of miniaturization, cable-free use, and four channels, the system achieved accuracy comparable with several cable-dependent laboratory systems while offering improved wearability.  Conclusions  This work presents a wearable SSVEP-BCI system that integrates ultra-miniaturized hardware with a distributed cable-free architecture. The results show that coordinated hardware and system design can overcome tradeoffs between device size, user mobility, and decoding capability. The acquisition node (3.7 g, 3.05 cm³) supports concealable wearability, and the synchronization strategy provides reliable cable-free operation. In a realistic environment, the system produced online performance comparable with many cable-dependent setups, achieving 95.00% accuracy and a peak ITR of 147.24 bits/min in a 40-target task. Therefore, this study provides a practical system-level solution that supports progress toward wearable high-performance BCIs.
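ITR values like those reported above are conventionally computed with the Wolpaw formula, B = log₂N + P·log₂P + (1 − P)·log₂[(1 − P)/(N − 1)] bits per selection, scaled by 60/T, where N is the number of targets, P the accuracy, and T the time per selection in seconds (often including a gaze-shifting interval). The abstract does not state the exact formula or gaze-shift time used, so this sketch assumes Wolpaw's definition, and the 0.5 s gaze-shift in the usage line is a hypothetical value; it is not meant to reproduce the reported 147.24 bits/min.

```python
import math


def itr_bits_per_min(n_targets, accuracy, seconds_per_selection):
    """Wolpaw information transfer rate in bits/min."""
    n, p = n_targets, accuracy
    if not 0.0 <= p <= 1.0 or n < 2:
        raise ValueError("need 0 <= accuracy <= 1 and at least two targets")
    bits = math.log2(n)
    if p > 0.0:
        bits += p * math.log2(p)
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    return bits * 60.0 / seconds_per_selection


# 40 targets, 95% accuracy, 3 s of data plus an assumed 0.5 s gaze shift.
rate = itr_bits_per_min(40, 0.95, 3.0 + 0.5)
```

Because T sits in the denominator, shorter data windows can raise ITR even when accuracy drops, which is why the peak ITR occurs at a 0.4-s data length rather than at the 3-s window with the highest accuracy.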
Wavelet Transform and Attentional Dual-Path EEG Model for Virtual Reality Motion Sickness Detection
CHEN Yuechi, HUA Chengcheng, DAI Zhian, FU Jingqi, ZHU Min, WANG Qiuyu, YAN Ying, LIU Jia
Available online  , doi: 10.11999/JEIT251233
Abstract:
  Objective  Virtual Reality Motion Sickness (VRMS) presents a barrier to the wider adoption of immersive Virtual Reality (VR). It is primarily caused by sensory conflict between the vestibular and visual systems. Existing assessments rely on subjective reports that disrupt immersion and do not provide real-time measurements. An objective detection method is therefore needed. This study proposes a dual-path fusion model, the Wavelet Transform ATtentional Network (WTATNet), which integrates wavelet transform and attention mechanisms. WTATNet is designed to classify resting-state ElectroEncephaloGram (EEG) signals collected before and after VR motion stimulus exposure to support VRMS detection and research on its mechanisms and mitigation strategies.  Methods  WTATNet contains two parallel pathways for EEG feature extraction. The first applies a Two-Dimensional Discrete Wavelet Transform (2D-DWT) along both the time and electrode dimensions of the EEG, after reshaping the signal into a two-dimensional matrix that follows the spatial layout of the scalp electrodes in horizontal or vertical form. This decomposition captures multi-scale spatiotemporal features, which are then processed by Convolutional Neural Network (CNN) layers. The second pathway applies a one-dimensional CNN for initial filtering, followed by a dual-attention structure consisting of a channel attention module and an electrode attention module. These modules recalibrate the importance of features across channels and electrodes to emphasize task-relevant information. Features from both pathways are fused and passed through fully connected layers to classify EEG segments into pre-exposure (non-VRMS) and post-exposure (VRMS) states, with the labels validated by subjective questionnaires. EEG data were collected from 22 subjects exposed to VRMS using the game “Ultrawings2.” Ten-fold cross-validation was used for training and evaluation, with accuracy, precision, recall, and F1-score as metrics.  
Results and Discussions  WTATNet achieved high VRMS-related EEG classification performance, with an average accuracy of 98.39%, F1-score of 98.39%, precision of 98.38%, and recall of 98.40%. It outperformed classical and state-of-the-art EEG models, including ShallowConvNet, EEGNet, Conformer, and FBCNet (Table 2). Ablation experiments (Tables 3 and 4) showed that removing the wavelet transform path, the electrode attention module, or the channel attention module reduced accuracy by 1.78%, 1.36%, and 1.01%, respectively. The 2D-DWT performed better than the one-dimensional DWT, supporting the value of joint spatiotemporal analysis. Experiments with randomized electrode ordering (Table 4) produced lower accuracy than spatially coherent layouts, indicating that 2D-DWT leverages inherent spatial correlations among electrodes. Feature visualizations using t-SNE (Figures 5 and 6) showed that WTATNet produced more discriminative features than baseline and ablated variants.  Conclusions  The dual-path WTATNet model integrates wavelet transform and attention mechanisms to achieve accurate VRMS detection using resting-state EEG. Its design combines interpretable, multi-scale spatiotemporal features from 2D-DWT with adaptive channel-level and electrode-level weighting. The experimental results confirm state-of-the-art performance and show that WTATNet offers an objective, robust, and non-intrusive VRMS detection method. It provides a technical foundation for studies on VRMS neural mechanisms and countermeasure development. WTATNet also shows potential for generalization to other EEG decoding tasks in neuroscience and clinical research.
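The first pathway's 2D-DWT can be illustrated with a single-level orthonormal Haar transform applied to an electrodes × time matrix, which splits the signal into one approximation band and three detail bands (changes across electrodes, across time, and diagonally). This is a sketch under assumptions: the abstract does not state the wavelet family or decomposition depth, rows are assumed to be electrodes ordered by scalp layout, and further levels would be obtained by reapplying the function to the approximation band.

```python
def haar_dwt2(matrix):
    """One-level orthonormal 2D Haar DWT of an electrodes x time matrix.

    Returns (approx, detail_electrode, detail_time, detail_diagonal); each
    sub-band has half as many rows and columns as the input.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % 2 or cols % 2:
        raise ValueError("both dimensions must be even")
    bands = ([], [], [], [])
    for i in range(0, rows, 2):
        rows_out = ([], [], [], [])
        for j in range(0, cols, 2):
            a, b = matrix[i][j], matrix[i][j + 1]          # electrode i, times j and j+1
            c, d = matrix[i + 1][j], matrix[i + 1][j + 1]  # electrode i+1
            rows_out[0].append((a + b + c + d) / 2.0)  # smooth along both axes
            rows_out[1].append((a + b - c - d) / 2.0)  # change across electrodes
            rows_out[2].append((a - b + c - d) / 2.0)  # change across time
            rows_out[3].append((a - b - c + d) / 2.0)  # diagonal detail
        for band, row in zip(bands, rows_out):
            band.append(row)
    return bands


# Toy 4-electrode x 4-sample segment (values are arbitrary).
eeg = [[1.0, 2.0, 3.0, 4.0],
       [2.0, 2.0, 2.0, 2.0],
       [0.5, 0.5, 1.0, 1.0],
       [0.0, 1.0, 0.0, 1.0]]
approx, d_elec, d_time, d_diag = haar_dwt2(eeg)
```

Because the electrode-axis detail band measures differences between neighboring rows, it only carries spatial meaning when neighboring rows are neighboring electrodes, which is consistent with the reported drop in accuracy under randomized electrode ordering.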
Performance Analysis and Rapid Prediction of Long-range Underwater Acoustic Communications in Uncertain Deep-sea Environments
CHEN Xiangmei, TAI Yupeng, WANG Haibin, HU Chenghao, WANG Jun, WANG Diya
Available online  , doi: 10.11999/JEIT251244
Abstract:
  Objective  In complex and dynamically changing deep-sea environments, the performance of underwater acoustic communications shows substantial variability. Feedback-based channel estimation and parameter adaptation are impractical in long-range scenarios because platform constraints prevent reliable feedback channels and the slow propagation of sound introduces significant delay. In typical long-range systems, environmental dynamics are often ignored and communication parameters are selected heuristically, which frequently leads to mismatches with actual channel conditions and causes communication failures or reduced efficiency. Predictive methods able to assess performance in advance and support feed-forward parameter adjustment are therefore required. This study proposes a deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications under uncertain environmental conditions to enable efficient and reliable parameter–channel matching without feedback.  Methods  A feed-forward method for underwater acoustic communication performance analysis and rapid prediction is developed using deep-learning-based sound-field uncertainty estimation. A neural network is first used to estimate probability distributions of Transmission Loss (TL PDFs) at the receiver under dynamic environments. TL PDFs are then mapped to probability distributions of the Signal-to-Noise Ratio (SNR PDFs), enabling communication performance evaluation without real-time feedback. Statistical channel capacity and outage capacity are analyzed to characterize the theoretical upper limits of achievable rates in dynamic conditions. Finally, by integrating the SNR distribution with the bit-error-rate characteristics of a representative deep-sea single-carrier communication system under the corresponding channel, a rate–reliability prediction model is constructed. 
This model estimates the probability of reliable communication at different data rates and serves as a practical tool for forecasting link performance in highly dynamic and feedback-limited underwater acoustic environments.  Results and Discussions  The method is validated using simulation data and sea trial data. The TL PDFs predicted by the deep learning model show strong consistency with the traditional Monte Carlo (MC) method across multiple receiver locations (Fig. 6). Under identical computational settings, deep-learning-based TL PDF prediction reduces computation time by two to three orders of magnitude compared with the MC method. The chained mapping from TL PDFs to SNR PDFs and then to channel capacity metrics accurately represents the probabilistic features of communication performance under uncertain conditions (Fig. 7 and Fig. 8). The rate–reliability curves derived from the deep-learning-based TL PDFs are highly consistent with MC-based results. In the high sound-intensity region, prediction errors for reliable communication probabilities across data rates range from 0.1% to 3%, and in the low sound-intensity region errors are approximately 0.3% to 5% (Fig. 12). Sea trial results further indicate that predicted rate–reliability performance agrees well with measured data. In the convergence zone, deviations between predicted and measured reliability probabilities at each rate range from 0.9% to 4%, and in the shadow zone from 1% to 9% (Fig. 18). Under a 90% reliability requirement, the maximum achievable rates predicted by the method match the measurements in both the convergence and shadow zones, demonstrating accuracy and practical applicability in complex channel environments.  Conclusions  A deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications in uncertain deep-sea environments is developed and validated.
The framework builds a chained mapping from environmental parameters to TL PDFs, SNR PDFs, and communication performance metrics, enabling quantitative capacity assessment under dynamic ocean conditions. Predictive “rate–reliability” profiles are obtained by integrating probabilistic propagation characteristics with the performance of a representative deep-sea single-carrier system under the corresponding channel, providing guidance for parameter selection without feedback. Sea trial results confirm strong agreement between predicted and measured performance. The proposed approach offers a technical pathway for feed-forward performance analysis and dynamic adaptation in long-range deep-sea communication systems, and can be extended to other communication scenarios in dynamic ocean environments.
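The chained mapping from TL PDFs to SNR PDFs, capacity statistics, and rate–reliability probabilities can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a simplified sonar equation SNR = SL - TL - NL (all in dB), Shannon capacity per channel realization, Gaussian TL samples standing in for the network-predicted TL PDF, and a hypothetical table of the SNR each data rate needs to meet its BER target (standing in for the measured BER characteristics of the single-carrier system). All function names and numbers are illustrative.

```python
import numpy as np

def snr_samples_db(tl_db, source_level_db, noise_level_db):
    # Simplified sonar equation (assumption): SNR = SL - TL - NL, all in dB.
    return source_level_db - np.asarray(tl_db) - noise_level_db

def capacity_stats(snr_db, outage_eps=0.1):
    # Shannon capacity of each realization, in bit/s/Hz.
    cap = np.log2(1.0 + 10.0 ** (np.asarray(snr_db) / 10.0))
    ergodic = float(cap.mean())                    # statistical (average) capacity
    outage = float(np.quantile(cap, outage_eps))   # rate exceeded with probability 1 - eps
    return ergodic, outage

def rate_reliability(snr_db, required_snr_db):
    # required_snr_db: hypothetical map rate (bit/s) -> SNR needed to meet the BER target.
    snr_db = np.asarray(snr_db)
    return {rate: float(np.mean(snr_db >= thr)) for rate, thr in required_snr_db.items()}

def max_reliable_rate(reliability, target=0.9):
    # Highest rate whose probability of reliable communication meets the target.
    ok = [rate for rate, p in reliability.items() if p >= target]
    return max(ok) if ok else None

# Illustrative run: Gaussian TL samples stand in for a predicted TL PDF.
rng = np.random.default_rng(0)
tl = rng.normal(85.0, 3.0, 100_000)                               # dB, hypothetical
snr = snr_samples_db(tl, source_level_db=180.0, noise_level_db=80.0)
ergodic, outage = capacity_stats(snr, outage_eps=0.1)
table = {250: 3.0, 500: 6.0, 1000: 9.0, 2000: 12.0, 4000: 15.0}  # hypothetical
rel = rate_reliability(snr, table)
best = max_reliable_rate(rel, target=0.9)
```

With these made-up numbers (mean SNR 15 dB, spread 3 dB), roughly 98% of realizations clear the 9 dB threshold assumed for 1000 bit/s but only about 84% clear the 12 dB threshold assumed for 2000 bit/s, so 1000 bit/s is the highest rate meeting a 90% reliability requirement.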
Low-Complexity Joint Estimation Algorithm for Carrier Frequency Offset and Sampling Frequency Offset in 5G-NTN Low Earth Orbit Satellite Communications
GONG Xianfeng, LI Ying, LIU Mingyang, ZHAI Shenghua
Available online  , doi: 10.11999/JEIT251086
Abstract:
  Objective   The Doppler effect is a major impairment in Low Earth Orbit (LEO) satellite communications within 5G Non-Terrestrial Networks (5G-NTN). It introduces Carrier Frequency Offset (CFO), Sampling Frequency Offset (SFO), and Inter-Subcarrier Frequency Offset (ISFO) across subcarriers. Although existing estimation algorithms focus mainly on CFO and SFO, the effect of ISFO is insufficiently addressed. ISFO becomes highly detrimental to receiver performance when Orthogonal Frequency-Division Multiplexing (OFDM) systems use a large number of subcarriers and high-order modulation. Moreover, under joint CFO and SFO conditions, conventional Maximum Likelihood Estimation (MLE) methods often require one- or two-dimensional grid searches. This results in high computational cost. To reduce this cost, two joint estimation algorithms for CFO and SFO are proposed.  Methods   The influence of non-ideal factors at the transmitter, receiver, and channel, such as local oscillator offset, SFO in Digital-to-Analog Converters (DACs) and Analog-to-Digital Converters (ADCs), and the Doppler effect, is analyzed. A mathematical model for the received OFDM signal is developed, and the mechanism through which SFO and ISFO distort the phase of frequency-domain subcarriers is derived. Leveraging the pilot structure of 5G-NTN, two joint CFO and SFO estimation algorithms are proposed. (1) Algorithm 1 uses the sequence correlation between the received frequency-domain Demodulation Reference Signal (DMRS) vectors. After phase pre-compensation is applied, the normalized cross-correlation vector is computed. An objective function is constructed from this vector, and its unimodal behavior in the main lobe is used to estimate the parameters through a bisection search. (2) Algorithm 2 treats the estimation parameter as analogous to a CFO in single-carrier systems and adopts an L&R-based autocorrelation method to derive approximate closed-form expressions.  
Results and Discussions   A computational complexity analysis compares the proposed algorithms with one-dimensional (1D-ML) and two-dimensional (2D-ML) grid-search MLE methods. Numerical results show that Algorithm 1 reduces complexity substantially. The number of complex multiplications, which represent the main computational cost, is 4% of that of the 2D-ML method, 8% of that of Algorithm 2, and 44% of that of the 1D-ML method. Although Algorithm 2 is more computationally demanding, it yields a closed-form estimation expression. The performance of each algorithm is evaluated through the Mean Square Error (MSE) of the estimated parameters. Simulations with 3072 subcarriers show that the 1D-ML algorithm performs slightly better than the others at Signal-to-Noise Ratios (SNRs) below 5 dB. However, because the robust modulation schemes typically used at low SNRs, such as BPSK and QPSK, tolerate larger offsets, the medium-to-high SNR range is of greater practical relevance. In this range, all four algorithms demonstrate comparable estimation performance.  Conclusions  This study addresses the effect of Doppler in 5G-NTN LEO satellite communications by analyzing the mechanism and influence of ISFO and by proposing two joint estimation algorithms for CFO and SFO. First, a mathematical model of the received signal is established considering non-ideal factors such as CFO, SFO, and ISFO. The combined effect of SFO and ISFO on OFDM signals is shown to be equivalent to their linear superposition, which expands the range of the equivalent SFO. Second, the objective function is defined using the cross-correlation vector of two DMRS sequences, and its unimodal behavior within the main lobe allows a bisection search to converge quickly. Third, the parameter determined by SFO and ISFO is treated as analogous to the CFO in single-carrier systems, so that an approximate closed-form estimate can be obtained through the L&R method.
Finally, complexity analysis and performance simulations show that the proposed algorithms provide significant computational savings and strong estimation performance. These results can support the development of 5G-NTN LEO satellite payloads and terminal products.
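The two estimators can be sketched on synthetic data as below. The sketch rests on a simplified model that is an assumption, not the paper's derivation: between two DMRS symbols D samples apart in an N-point OFDM system, subcarrier k acquires the phase 2π(D/N)(ε + kη), where ε is the CFO normalized to the subcarrier spacing and η is the equivalent (SFO plus ISFO) sampling offset. Algorithm 1 is approximated by a dichotomous (bisection-type) search that maximizes the magnitude of the phase-compensated correlation sum; Algorithm 2 applies the Luise–Reggiannini (L&R) autocorrelation estimator along the subcarrier axis. Phase pre-compensation and the real DMRS pattern are omitted, and all function names and parameter values are hypothetical.

```python
import numpy as np

def correlate_dmrs(y1, y2):
    # Per-subcarrier correlation of two received DMRS symbols (pilot values already removed).
    return y2 * np.conj(y1)

def _compensated_sum(r, k, eta, ratio):
    # Remove the assumed k-dependent phase ramp 2*pi*ratio*k*eta and sum coherently.
    return np.sum(r * np.exp(-2j * np.pi * ratio * k * eta))

def _cfo_from_common_phase(r, k, eta, ratio):
    # What remains after ramp removal is the common phase 2*pi*ratio*eps.
    return np.angle(_compensated_sum(r, k, eta, ratio)) / (2 * np.pi * ratio)

def estimate_search(r, k, ratio, lo, hi, tol=1e-10):
    """Algorithm-1-style sketch: dichotomous search on an objective unimodal in [lo, hi]."""
    objective = lambda eta: np.abs(_compensated_sum(r, k, eta, ratio))
    delta = tol / 4
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if objective(mid - delta) < objective(mid + delta):
            lo = mid - delta   # peak lies to the right of mid - delta
        else:
            hi = mid + delta   # peak lies to the left of mid + delta
    eta = 0.5 * (lo + hi)
    return eta, _cfo_from_common_phase(r, k, eta, ratio)

def estimate_lr(r, k, ratio, M=None):
    """Algorithm-2-style sketch: L&R closed form applied across consecutive subcarriers."""
    K = len(r)
    M = M or K // 2
    # R(m) = (1/(K-m)) * sum_k r[k+m] * conj(r[k]); np.vdot conjugates its first argument.
    acc = sum(np.vdot(r[:-m], r[m:]) / (K - m) for m in range(1, M + 1))
    f = np.angle(acc) / (np.pi * (M + 1))   # phase slope in cycles per subcarrier
    eta = f / ratio
    return eta, _cfo_from_common_phase(r, k, eta, ratio)

# Synthetic check with hypothetical numerology (adjacent symbols, 288-sample CP).
rng = np.random.default_rng(1)
N, K, D = 4096, 3072, 4096 + 288
ratio = D / N
k = np.arange(-K // 2, K // 2)
eps_true, eta_true = 0.1, 2e-5
H = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
noise = lambda: 0.02 * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
y1 = H + noise()
y2 = H * np.exp(2j * np.pi * ratio * (eps_true + k * eta_true)) + noise()
r = correlate_dmrs(y1, y2)
eta1, eps1 = estimate_search(r, k, ratio, lo=-1e-4, hi=1e-4)
eta2, eps2 = estimate_lr(r, k, ratio)
```

In this toy model the search objective is unimodal only inside the main lobe, whose half-width is roughly N/(D·K) (about 3e-4 here), so the search interval must be narrower than that around the true value. The L&R form avoids the search but is valid only while the phase slope stays below 1/(M+1) cycles per subcarrier.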
Mamba-YOWO: An Efficient Spatio-Temporal Representation Framework for Action Detection
MA Li, XIN Jiangbo, WANG Lu, DAI Xinguan, SONG Shuang
Available online  , doi: 10.11999/JEIT251124
Abstract:
  Objective  Spatio-temporal action detection aims to localize and recognize action instances in untrimmed videos, which is crucial for applications such as intelligent surveillance and human-computer interaction. Existing methods, particularly those based on 3D CNNs or Transformers, often struggle to balance computational complexity against effective modeling of long-range temporal dependencies. The YOWO series, while efficient, relies on 3D convolutions with limited receptive fields. The recent Mamba architecture, known for its linear computational complexity and selective state space mechanism, shows great potential for long-sequence modeling. This paper explores the integration of Mamba into the YOWO framework to enhance temporal modeling efficiency and capability while reducing computational burden, addressing a significant gap in applying Mamba specifically to spatio-temporal action detection tasks.  Methods  The proposed Mamba-YOWO framework is built upon the lightweight YOWOv3 architecture. It features a dual-branch heterogeneous design for feature extraction. The 2D branch, based on YOLOv8’s CSPDarknet and PANet, processes keyframes to extract multi-scale spatial features. The core innovation lies in the 3D temporal modeling branch, which replaces traditional 3D convolutions with a hierarchical structure composed of a Stem layer and three Stages (Stage1-Stage3). Stage1 and Stage2 utilize Patch Merging for spatial downsampling and stack Decomposed Bidirectionally Fractal Mamba (DBFM) blocks. The DBFM block employs a bidirectional Mamba structure to capture temporal dependencies from both past-to-future and future-to-past contexts. Crucially, a Spatio-Temporal Interleaved Scan (STIS) strategy is introduced within DBFM, which combines bidirectional temporal scanning with spatial Hilbert quad-directional scanning, effectively serializing video data while preserving spatial locality and temporal coherence.
Stage3 incorporates a 3D average pooling layer to compress features temporally. An Efficient Multi-scale Spatio-Temporal Fusion (EMSTF) module is designed to integrate features from the 2D and 3D branches. It employs group convolution-guided hierarchical interaction for preliminary fusion and a parallel dual-branch structure for refined fusion, generating an adaptive spatio-temporal attention map. Finally, a lightweight detection head with decoupled classification and regression sub-networks produces the final action tubes.  Results and Discussions  Extensive experiments were conducted on the UCF101-24 and JHMDB datasets. Compared with the YOWOv3/L baseline on UCF101-24, Mamba-YOWO achieved a Frame-mAP of 90.24% and a Video-mAP@0.5 of 60.32%, representing improvements of 2.1% and 6.0%, respectively (Table 1). Notably, this performance gain was achieved while reducing parameters by 7.3% and computational load (GFLOPs) by 5.4%. On JHMDB, Mamba-YOWO attained a Frame-mAP of 83.2% and a Video-mAP@0.5 of 86.7% (Table 2). Ablation studies confirmed the effectiveness of key components: the optimal number of DBFM blocks in Stage2 was found to be 4, beyond which performance degraded, likely due to overfitting (Table 3). The proposed STIS scan strategy outperformed 1D-Scan, Selective 2D-Scan, and Continuous 2D-Scan, demonstrating the benefit of jointly modeling temporal coherence and spatial structure (Table 4). The EMSTF module also proved superior to other fusion methods such as CFAM, EAG, and EMA (Table 5), highlighting its enhanced capability for cross-modal feature integration. The performance gains are attributed to the efficient long-range temporal dependency modeling of the Mamba-based branch with linear complexity and to the effective multi-scale feature fusion provided by the EMSTF module.  Conclusions  This paper presents Mamba-YOWO, an efficient spatio-temporal action detection framework that integrates the Mamba architecture into YOWOv3.
By replacing traditional 3D convolutions with a DBFM-based temporal modeling branch featuring the STIS strategy, the model effectively captures long-range dependencies with linear complexity. The designed EMSTF module further enhances discriminative feature fusion through group convolution and dynamic gating. Experimental results on UCF101-24 and JHMDB datasets demonstrate that Mamba-YOWO achieves superior detection accuracy (e.g., 90.24% Frame-mAP on UCF101-24) while simultaneously reducing model parameters and computational costs. Future work will focus on theoretical exploration of Mamba’s temporal mechanisms, extending its capability for long-video sequencing, and enabling lightweight deployment on edge devices.
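A spatial Hilbert scan of the kind the STIS strategy builds on can be sketched as follows. The index-to-coordinate conversion is the standard iterative Hilbert-curve algorithm, whose consecutive cells are always grid neighbours, which is why it preserves spatial locality when a frame is flattened. The choice of the four directions (the curve, its reverse, its transpose, and the reversed transpose) and the forward or backward concatenation of frames are assumptions for illustration, since the abstract does not define them; the function names are hypothetical.

```python
import numpy as np

def _rot(s, x, y, rx, ry):
    # Rotate/flip a quadrant so the sub-curve has the right orientation.
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y

def hilbert_order(n):
    """(row, col) visiting order of an n x n grid along a Hilbert curve; n a power of two."""
    coords = []
    for d in range(n * n):
        x = y = 0
        t, s = d, 1
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            x, y = _rot(s, x, y, rx, ry)
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        coords.append((y, x))
    return coords

def quad_directional_orders(n):
    """Four flattened-index orders (assumed): curve, reversed, transposed, reversed transposed."""
    h = np.array([r * n + c for r, c in hilbert_order(n)])
    ht = np.array([c * n + r for r, c in hilbert_order(n)])
    return [h, h[::-1], ht, ht[::-1]]

def interleaved_scan(video, spatial_order, reverse_time=False):
    """(T, H, W, C) token grid -> (T*H*W, C) sequence, frames in forward or backward time."""
    T, H, W, C = video.shape
    frames = video.reshape(T, H * W, C)[:, spatial_order, :]
    if reverse_time:
        frames = frames[::-1]
    return frames.reshape(T * H * W, C)

orders = quad_directional_orders(4)
video = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)
sequences = [interleaved_scan(video, o, rev) for o in orders for rev in (False, True)]
```

Each of the eight resulting sequences could then be fed to a one-directional selective-scan block; how DBFM actually combines them is not specified in the abstract.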
UAV-assisted Mobile Edge Computing based on Hybrid Hierarchical DRL in the Internet of Vehicles
YANG Miaoyan, FANG Xuming
Available online  , doi: 10.11999/JEIT250743
Abstract:
  Objective  In the Internet of Vehicles (IoV), using unmanned aerial vehicles (UAVs) to absorb tidal surges in edge-computing demand has become a key 6G technology in recent years. However, when deep reinforcement learning (DRL) is used to minimize system latency, the dimension of the action space grows exponentially with the number of vehicles, which makes training difficult and slows convergence. Therefore, this paper proposes a two-layer hybrid DRL-based solution for UAV-assisted mobile edge computing (MEC), called hybrid hierarchical deep reinforcement learning (HHDRL).  Methods  The proposed HHDRL algorithm uses a two-layer architecture to solve the complex optimization problem hierarchically. The upper layer uses an agent based on proximal policy optimization (PPO) with a multi-head actor network to manage the user offloading policy and the UAV control policy. The N heads of this network make the offloading decisions for the N users (local processing, offloading to the associated CAPs, or offloading to the UAV). A UAV flight control head selects from a set of discrete acceleration actions to reflect actual control constraints. The lower layer uses a computationally efficient greedy algorithm that prioritizes resource allocation according to task characteristics. This hybrid hierarchical approach avoids the high computational cost of resource allocation schemes based solely on DRL.  Results and Discussions  The performance of the proposed HHDRL scheme was verified through numerical simulations. The simulation parameters include those of the Rician fading channel, those of the UAV flight energy consumption model, and system parameters (e.g., task data sizes of 9–18 Mbits and task complexities of 2000–3000 cycles/bit).
Figure 3 compares the training convergence of the HHDRL scheme and the original DRL algorithm: HHDRL consistently converges faster, although its final reward is slightly lower than that of the pure DRL approach. Figure 4 illustrates the impact of the HHDRL architecture on user delay fairness; the comparison shows that introducing the HHDRL framework does not compromise the user fairness inherent to the DRL method. The performance evaluation in Figure 5 shows that the proposed scheme reduces system latency by approximately 71%–91% compared with a random baseline and by 1%–12% compared with the original DRL algorithm. Figure 6 analyzes training time for different numbers of users: the HHDRL scheme consistently trains faster than the DRL scheme, and its training time grows more slowly as the number of users increases. This is attributed to the hybrid hierarchical network architecture, which simplifies the DRL output action space. When PPO in the upper layer is replaced with other DRL algorithms, the scheme still outperforms the random baseline and achieves performance comparable to the non-hybrid-hierarchical approach. This demonstrates the effectiveness and generality of the hybrid hierarchical architecture in accelerating training significantly while maintaining performance. The system parameter sensitivity analysis in Figure 8 shows that computational resources affect latency more than user transmission power and system bandwidth do, because computational latency typically accounts for a larger share of task processing time than communication latency. Figure 9 shows the results of the UAV trajectory optimization.
Figure 9(a) shows the change in the UAV's velocity over time, demonstrating that discrete acceleration control reflects actual control accuracy and response delay considerations rather than idealized instantaneous velocity changes. Figure 9(b) shows the X-coordinates of the UAV and user over time, illustrating that the UAV adaptively adjusts its position to match the changing user distribution while maintaining flight stability.  Conclusions  This paper proposes an HHDRL algorithm that integrates DRL with a greedy algorithm in a hierarchical framework to address the difficulty of training UAV-assisted MEC systems in IoV. Simulation results confirm that: (1) compared with the DRL method, the proposed method significantly accelerates training convergence and shortens training time; (2) the system latency of the proposed algorithm is comparable to that of the pure DRL method while significantly outperforming the heuristic and random baselines; (3) the HHDRL framework effectively handles user task offloading, computing-node resource allocation, and UAV trajectory optimization jointly under practical operational constraints. Future work will extend the framework to multi-UAV collaboration and more complex environments.
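The arithmetic behind the multi-head design and the latency comparison can be illustrated with the sketch below. Everything in it is an assumption for illustration rather than the paper's model: latency is taken as transmission time D/R (only when offloaded) plus computation time D·c/f, the CPU frequency (10 GHz), uplink rate (20 Mbit/s), and five acceleration actions are hypothetical, and the lower layer is modelled as a greedy allocator that hands out CPU units one at a time to the task whose latency drops most. Because each task's computation latency c/(a·u) is convex and decreasing in its allocation a, this marginal-gain greedy rule is optimal for the sum of computation latencies; the paper's own greedy rule may differ.

```python
import heapq

def task_latency(bits, cycles_per_bit, cpu_hz, rate_bps=None):
    """Transmission time (only if offloaded) plus computation time, in seconds."""
    transmit = bits / rate_bps if rate_bps else 0.0
    return transmit + bits * cycles_per_bit / cpu_hz

def action_space_sizes(num_users, num_accel, choices_per_user=3):
    """Joint discrete action count vs. output units of a multi-head actor."""
    joint = choices_per_user ** num_users * num_accel
    multi_head_outputs = choices_per_user * num_users + num_accel
    return joint, multi_head_outputs

def greedy_cpu_allocation(cycles, total_units, unit_hz):
    """Give each task one CPU unit, then repeatedly add a unit where it cuts latency most."""
    n = len(cycles)
    if total_units < n:
        raise ValueError("need at least one CPU unit per task")
    alloc = [1] * n

    def gain(i):
        # Latency drop from a to a + 1 units: c/(a*u) - c/((a+1)*u).
        a = alloc[i]
        return cycles[i] / (unit_hz * a * (a + 1))

    heap = [(-gain(i), i) for i in range(n)]
    heapq.heapify(heap)
    for _ in range(total_units - n):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-gain(i), i))   # only task i's gain changed
    return alloc

latency = task_latency(18e6, 3000, cpu_hz=10e9, rate_bps=20e6)  # 0.9 s transmit + 5.4 s compute
joint, heads = action_space_sizes(num_users=10, num_accel=5)     # 295245 joint actions vs 35 outputs
alloc = greedy_cpu_allocation([4.0, 1.0], total_units=6, unit_hz=1.0)  # [4, 2]
```

With the largest task in the stated range (18 Mbits at 3000 cycles/bit), computation dominates transmission by a factor of six under these assumed resources, which is consistent with the sensitivity result that computational resources matter most.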
Towards Privacy-Preserving and Lightweight Modulation Recognition for Short-Wave Signals under Channel Shifts
YAO Yizhou, DENG Wen, LI Baoguo
Available online  , doi: 10.11999/JEIT251017
Abstract:
  Objective  Existing short-wave signal modulation recognition methods based on the supervised learning paradigm typically assume that training data (source domain) and test data (target domain) follow identical distributions. However, short-wave channels are susceptible to ionospheric variations, leading to significant distribution discrepancies across domains, which consequently causes model performance degradation. Furthermore, deployment on the edge side of unmanned platforms is constrained by limited device resources, scarce labeled samples, and data privacy requirements. To address these challenges, a lightweight recognition method based on source-model transfer is proposed in this paper, enabling privacy-preserving model adaptation without the need to access source domain data.  Methods  A multi-modal source-model transfer framework (M-SMOT) is developed, which utilizes information maximization loss and self-supervised pseudo-labeling techniques to facilitate model adaptation without revisiting source domain data. This approach achieves effective cross-channel recognition of short-wave modulation signals while reducing computational resource consumption and preserving data privacy. Additionally, multi-modal information—comprising in-phase/quadrature (I/Q) components, amplitude-phase (AP) characteristics, and spectral features—is fused to leverage complementary feature representations, thereby enhancing the robustness of the recognition network against complex channel variations.  Results and Discussions  Experimental results demonstrate that the recognition performance of the proposed method consistently surpasses that of the Source-Only baseline across six cross-channel scenarios, with improvements ranging from 0.31% to 10.81% (Table 1). In terms of few-shot adaptation, average recognition accuracies are maintained at 98.3% and 96% relative to the full-sample baseline, even when target domain training samples are reduced to 10% and 1%, respectively (Fig. 
12). Ablation studies verify the necessity and effectiveness of the self-supervised pseudo-labeling module (Fig. 16) and the multi-modal fusion strategy (Fig. 17), confirming that both components contribute to the overall performance. Furthermore, the lightweight advantages are quantified: the method requires zero storage for source data, exhibits a peak memory consumption of only 6.00 MB, and achieves convergence within a single fine-tuning epoch (Table 2). These findings validate the capability of the proposed mechanism to mitigate domain discrepancies and protect privacy under resource-constrained conditions.  Conclusions  The M-SMOT method successfully integrates data privacy protection, source model adaptation, few-shot generalization, and low resource consumption. Consequently, it provides a practical solution for cross-channel modulation recognition in short-wave communications, demonstrating significant potential for deployment on resource-limited edge devices.
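The information-maximization objective and the self-supervised pseudo-labeling step can be sketched in NumPy as below, following the general SHOT-style recipe for source-free adaptation: a conditional-entropy term that makes each target prediction confident, minus a marginal-entropy term that keeps class usage diverse, and pseudo-labels from cosine similarity to prediction-weighted class centroids. This is not the M-SMOT code; the view builder for I/Q, amplitude-phase, and spectral inputs, the function names, and the two-pass centroid refinement are assumptions.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def multimodal_views(iq):
    """Hypothetical view builder: complex baseband samples -> I/Q, amplitude-phase, spectrum."""
    iq_view = np.stack([iq.real, iq.imag])
    ap_view = np.stack([np.abs(iq), np.angle(iq)])
    spectrum_view = np.log1p(np.abs(np.fft.fftshift(np.fft.fft(iq))))
    return iq_view, ap_view, spectrum_view

def information_maximization_loss(logits, eps=1e-8):
    p = softmax(logits)
    # Conditional entropy: low when each target sample gets a confident prediction.
    cond_entropy = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    # Marginal entropy: high when predictions are spread over all classes (no collapse).
    p_mean = p.mean(axis=0)
    marg_entropy = -np.sum(p_mean * np.log(p_mean + eps))
    return cond_entropy - marg_entropy

def centroid_pseudo_labels(features, logits, n_iter=2):
    """Pseudo-labels from cosine similarity to class centroids, first soft- then hard-weighted."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    weights = softmax(logits)
    num_classes = weights.shape[1]
    for _ in range(n_iter):
        centroids = weights.T @ f
        centroids /= np.linalg.norm(centroids, axis=1, keepdims=True) + 1e-8
        labels = np.argmax(f @ centroids.T, axis=1)
        weights = np.eye(num_classes)[labels]
    return labels
```

Note that only the source-trained model and unlabeled target samples enter these functions, which is what allows adaptation without revisiting source data. Confident, diverse predictions give a strongly negative loss, whereas uniform or collapsed predictions give a loss near zero.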
Indoor Visible Light Positioning Based on CNN–MLP Multi-Feature Fusion under Random Receiver Tilt Conditions
JIA Kejun, WANG Jian, MAO Lifei, YOU Wei, HUANG Ziyang, PENG Duo
Available online  , doi: 10.11999/JEIT251021
Abstract:
  Objective  Traditional visible light positioning (VLP) methods based on received signal strength (RSS) suffer from instability when the receiver experiences orientation perturbations, which disrupt the correspondence between optical power and spatial position, making reliable three-dimensional (3D) positioning difficult to achieve. Existing approaches typically rely on inertial measurement units (IMUs) to obtain orientation information; however, sensor fusion increases system complexity and hardware cost and introduces cumulative errors. To address these issues, this paper proposes a positioning method that fuses cosine-of-incidence-angle estimation based on a photodiode (PD) array with RSS information, enabling high-accuracy 3D indoor positioning under receiver orientation perturbations.  Methods  In the proposed fusion-based positioning method, a multi-PD array structure is first adopted, and a local coordinate system (LCS) is established at the array center. Constraint equations are then constructed based on the differences in received optical power among PDs in the array. A Gauss–Newton iterative algorithm is employed to estimate the incident light direction vector. By exploiting the orthogonal rotation invariance between the LCS and the global coordinate system (GCS), the cosine of the incident angle is estimated without the need for orientation sensors. Subsequently, a serial CNN–MLP fusion network is constructed, in which the estimated incident-angle cosine is introduced as an additional positioning feature on top of RSS-based localization. The network jointly models the RSS and incident-angle cosine information received by the PD array and maps them to 3D spatial coordinates. Finally, training samples are generated using Latin hypercube sampling (LHS) to uniformly sample spatial positions and orientation dimensions, thereby improving the representativeness of the training dataset.  
Results and Discussions  Simulation experiments are conducted in a 4 m × 4 m × 2.5 m indoor environment. First, the effects of different numbers of PDs and tilt angles on the accuracy of incident-angle cosine estimation and spatial coverage are evaluated (Fig. 6), and the cumulative distribution functions (CDFs) of positioning errors under different array configurations are compared (Fig. 7). The results show that a 3-PD array with a tilt angle of 40° achieves the best balance among cost, coverage, and positioning accuracy. Next, positioning performance under different receiver tilt angles is analyzed. When the tilt angle is small, more than 70% of positioning errors are below 5 cm; even when the receiver is tilted up to 55°, the average error remains within 11.7 cm (Fig. 8). Error component comparisons indicate that the error along the Z-axis is significantly smaller than those along the X and Y axes (Fig. 9). Further tests are conducted at a height of 0.0 m covered by the training data and at an unseen height of 0.6 m not included in the training set (Fig. 10). The results demonstrate that the proposed model does not exhibit strong dependence on a specific height plane and maintains stable 3D positioning performance at unseen heights. Finally, the proposed method is compared with related positioning schemes. It outperforms existing methods in terms of CDF convergence speed, RMSE, and standard deviation (Fig. 11), achieving an average error reduction of approximately 2.5 cm and an RMSE reduction of 31.58% compared with Ref. [12].  Conclusions  This paper estimates the cosine of the incident angle at the receiver by exploiting differences in the optical power received by different PDs in an array and introduces this cosine value as a joint positioning feature into conventional RSS-based localization, thereby alleviating the instability of position mapping caused by relying solely on RSS under random receiver perturbations. 
By further combining the spatial feature extraction capability of CNNs with the nonlinear modeling strength of MLPs, the proposed method effectively maps positioning features to 3D spatial coordinates. The approach reduces reliance on orientation sensors such as IMUs while overcoming the susceptibility of traditional geometric positioning methods to noise and high-dimensional nonlinear features. Under varying heights and receiver orientations, the proposed algorithm demonstrates significant advantages in both positioning accuracy and stability.
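The incident-direction estimation and the Latin hypercube sampling step can be sketched as follows. The model is an assumption for illustration: each PD responds in proportion to n_i·u (its unit normal dotted with the unit vector toward the LED) times a common unknown gain, so the pairwise constraints P_i(n_j·u) - P_j(n_i·u) = 0 cancel the gain, and a Gauss–Newton iteration on these constraints plus |u| = 1 recovers u in the array's LCS. The cosine of the incident angle relative to the array axis is then u_z, a dot product that is unchanged when both vectors are rotated into the GCS. The 3-PD, 40° layout follows the configuration reported above; the sampling bounds and all function names are hypothetical.

```python
import numpy as np

def pd_normals(num_pds=3, tilt_deg=40.0):
    """Unit normals (LCS) of PDs tilted from the array axis, evenly spaced in azimuth."""
    tilt = np.deg2rad(tilt_deg)
    az = 2 * np.pi * np.arange(num_pds) / num_pds
    return np.stack([np.sin(tilt) * np.cos(az),
                     np.sin(tilt) * np.sin(az),
                     np.full(num_pds, np.cos(tilt))], axis=1)

def estimate_incident_direction(powers, normals, iters=30):
    """Gauss-Newton on gain-free ratio constraints plus |u| = 1 (assumed Lambertian model)."""
    n = len(powers)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    L = np.array([powers[i] * normals[j] - powers[j] * normals[i] for i, j in pairs])
    u = np.array([0.0, 0.0, 1.0])                     # start on the array axis
    for _ in range(iters):
        residual = np.concatenate([L @ u, [u @ u - 1.0]])
        jacobian = np.vstack([L, 2.0 * u])
        step = np.linalg.lstsq(jacobian, -residual, rcond=None)[0]
        u = u + step
        if np.linalg.norm(step) < 1e-14:
            break
    return -u if np.mean(normals @ u) < 0 else u      # PDs must face the LED

def latin_hypercube(n_samples, bounds, rng):
    """One sample per equal-width stratum in every dimension, strata paired at random."""
    d = len(bounds)
    strata = np.column_stack([rng.permutation(n_samples) for _ in range(d)])
    unit = (strata + rng.random((n_samples, d))) / n_samples
    lo, hi = np.asarray(bounds, dtype=float).T
    return lo + unit * (hi - lo)

# Noise-free check: LED direction 0.5 rad from the array axis, azimuth 1.0 rad.
normals = pd_normals()
u_true = np.array([np.sin(0.5) * np.cos(1.0), np.sin(0.5) * np.sin(1.0), np.cos(0.5)])
powers = 2.0 * normals @ u_true                       # unknown common gain of 2.0
u_hat = estimate_incident_direction(powers, normals)
cos_incidence = u_hat[2]                              # cosine w.r.t. the array axis
# Hypothetical sampling space: x, y, z in metres; tilt and azimuth in degrees.
samples = latin_hypercube(200, [(0, 4), (0, 4), (0, 1.5), (0, 55), (0, 360)],
                          np.random.default_rng(2))
```

With three PDs the pairwise constraints pin u to a line through the origin and the unit-norm residual fixes its length, so the iteration behaves like Newton's method on |u| = 1 and converges in a handful of steps from the array axis.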
Small Object Detection Algorithm for UAV Aerial Images in Complex Environments
LIU Jie, LIU Shuhao, TIAN Ming, CUI Zhigang
Available online  , doi: 10.11999/JEIT251126
Abstract:
  Objective  Small object detection plays a critical role in practical applications such as UAV (Unmanned Aerial Vehicle) inspection and intelligent transportation systems, where precise perception of diminutive targets is essential for operational reliability and safety. It enables the automated identification and tracking of challenging targets. However, the limited pixel size of small objects, coupled with their tendency to be obscured or integrated with complex backgrounds, results in strong background noise, leading to poor performance and elevated false-negative rates in existing detection models. To address this issue and achieve high-performance, high-precision detection of small objects in complex environments, this study proposes HAR-DETR, an enhancement over the RT-DETR baseline model, aimed at improving the detection accuracy for small objects.  Methods  HAR-DETR is proposed for small object detection in aerial images, incorporating three key improvements: Aggregated Attention, RFF-FPN (Recalibrated Feature Fusion Network-FPN), and a high-resolution detection branch. In the backbone network, Aggregated Attention enhances the model's ability to focus on relevant features of small objects. By expanding the receptive field, the model captures more detailed edge and texture information, thereby enabling more effective extraction of multi-scale features of the targets. During the feature fusion phase, RFF-FPN selectively integrates high-level and low-level features, allowing the network to retain critical spatial information and context. This facilitates the refinement of the edges and contours of small objects, improving the accuracy of localization and recognition, especially when object details may be obscured by background clutter or varying lighting conditions. 
The high-resolution detection head places greater emphasis on the edge features of small objects, strengthening small object perception and further improving the model's robustness and precision.  Results and Discussions  HAR-DETR is compared with several widely used object detection models, including YOLOv5, YOLOv8, and YOLOv10, using precision, recall, and mAP to evaluate small object detection performance. Experimental results show that HAR-DETR outperforms the comparison models in precision, recall, and mAP on the VisDrone2019 dataset (Table 1). Compared with the baseline model, mAP50 and mAP50-95 improve by 3.8% and 3.2%, respectively (Table 2). This demonstrates that HAR-DETR offers superior performance in detecting small objects in aerial images under complex environments. Grad-CAM heatmaps are used to compare the proposed improvements, and each improvement yields better detection results than the baseline model (Fig. 6). In the generalization experiment, the VisDrone2019 validation set and the RSOD dataset are used under identical training conditions, and the results indicate that HAR-DETR generalizes well across heterogeneous tasks (Tables 3 and 4).  Conclusions  This paper uses the HAR-DETR model to address false positives and false negatives in small object detection in aerial images captured in complex environments. Aggregated Attention is introduced in the backbone feature extraction phase to expand the receptive field and enhance global feature extraction. In the feature fusion phase, the RFF-FPN structure is proposed to enrich the feature representations. Additionally, a high-resolution detection head is introduced to make the model more sensitive to the edge textures of small objects.
The model is evaluated on the VisDrone2019 and RSOD datasets, and the results demonstrate the following: (1) The proposed method improves the small object detection metrics mAP50 and mAP50-95 by 3.8% and 3.2%, respectively, over the baseline model, reaching 51.2% and 32.1% and mitigating false negatives and false positives; (2) Compared with other mainstream object detection models, HAR-DETR achieves the best small object detection performance, validating the effectiveness of the model; (3) HAR-DETR also achieves high accuracy in cross-dataset experiments, demonstrating good generalization. These results indicate that HAR-DETR possesses stronger semantic expression and spatial awareness capabilities, making it adaptable to various aerial perspectives and target distribution patterns, thus providing a more versatile solution for UAV visual perception systems in complex environments.
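To see why small objects are hard to score under mAP50-95, which averages AP over the IoU thresholds 0.50 to 0.95 in steps of 0.05 (the COCO convention), the sketch below computes IoU for the same 1-pixel localization error on an 8×8 box and on a 100×100 box. The box sizes and the counting helper are illustrative assumptions and are not part of HAR-DETR.

```python
def iou(a, b):
    """Intersection over union of axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]

def thresholds_passed(iou_value):
    """Number of the ten thresholds at which a single detection counts as a true positive."""
    return sum(iou_value >= t for t in THRESHOLDS)

small = iou((0, 0, 8, 8), (1, 0, 9, 8))          # 56 / 72, about 0.778
large = iou((0, 0, 100, 100), (1, 0, 101, 100))  # 9900 / 10100, about 0.980
print(thresholds_passed(small), thresholds_passed(large))  # 6 10
```

A one-pixel shift costs the small box four of the ten thresholds while the large box keeps all of them, so mAP50-95 rewards exactly the edge-level localization accuracy that the high-resolution head targets.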
SCUNet-Based Decoding Algorithm for Rayleigh Fading Channels Integrating Feature Extraction and Recovery Mechanisms
WANG Leijun, WANG Kuan, XIE Jinfa, PENG Xidong, LI Jiawen, CHEN Rongjun
Available online  , doi: 10.11999/JEIT251138
Abstract:
  Objective  This study addresses the limitations of conventional deep neural network (DNN) decoding algorithms in Rayleigh fading channels, such as constrained performance, insufficient generalization capability, and weak resistance to fading. To tackle these issues, a feature extraction and recovery decoding algorithm based on the SCUNet architecture, termed SCUNetDec, is proposed. In the 6G communication era, wireless channels are characterized by high dynamics and complexity, making it difficult for traditional decoding methods to meet the strict requirements for high reliability, low latency, and strong robustness. Therefore, exploring intelligent decoding mechanisms with adaptive feature learning capabilities holds significant theoretical and practical importance. By integrating a multi-dimensional feature extraction and recovery mechanism and incorporating a noise-level map to enhance the network’s perception of channel states, SCUNetDec effectively learns channel characteristics, mitigates fading effects, and significantly improves decoding performance. This research not only provides a new approach to the design of intelligent decoding in complex channel environments but also lays a key technical foundation for building efficient and intelligent 6G communication systems.  Methods  The proposed SCUNetDec network integrates three core mechanisms (data preprocessing, feature extraction and recovery, and noise-level mapping) to achieve efficient and robust signal representation learning and improved decoding performance in Rayleigh fading channels. First, in the data preprocessing stage, dimensionality expansion maps the original one-dimensional received signal into a two-dimensional feature map, making the signal structure more discernible and providing a spatial correlation foundation for subsequent deep feature extraction.
Second, a feature extraction and recovery module is constructed: the extraction module combines multiple convolutional layers with attention mechanisms to capture essential channel features in the signal, while the recovery module uses deconvolutional layers and residual connections to suppress irrelevant interference introduced during the dimensionality transformation, thereby improving signal reconstruction quality and decoding accuracy. Furthermore, a noise-level map mechanism is incorporated into the network. By embedding SNR-aware information that aligns with the feature maps, the model can dynamically adapt to changes in channel conditions and adjust its decoding strategy and feature extraction intensity accordingly. The interaction of these three mechanisms enhances the noise robustness, generalization capability, and decoding stability of SCUNetDec in Rayleigh fading channels, providing a systematic solution for intelligent decoding in complex 6G wireless environments.  Results and Discussions  The SCUNetDec decoding algorithm, built upon the SCUNet architecture, significantly enhances signal learning and decoding capabilities in Rayleigh fading channels by integrating a feature extraction-recovery module and a noise-level map. Its performance was evaluated through simulations under various coding schemes. For the (7,4) Hamming code, SCUNetDec outperformed conventional DNN decoding and closely approached Maximum Likelihood (ML) performance. Specifically, at a Bit Error Rate (BER) of \begin{document}$ {10}^{-4} $\end{document}, the performance gap to ML decoding was about 1.5 dB, and at a Frame Error Rate (FER) of \begin{document}$ {10}^{-3} $\end{document}, the gap was approximately 2.0 dB (Fig. 4). This suggests that SCUNetDec captures complex relationships within signals and thereby learns the latent associations between information and parity-check nodes.
For the (2,1,3) convolutional code, SCUNetDec came within about 2.0 dB of the Viterbi algorithm at BER=\begin{document}$ {10}^{-3} $\end{document}. In contrast, the performance of DNN decoding degraded significantly at high SNRs, demonstrating SCUNetDec's superior decoding capability and robustness (Fig. 5). For Polar codes with a rate of 0.5, SCUNetDec exhibited strong learning and generalization capabilities. It achieved a gain of approximately 4.0 dB over Successive Cancellation (SC) decoding at BER=\begin{document}$ {10}^{-4} $\end{document} and maintained an advantage of about 1.0 dB at FER=\begin{document}$ {10}^{-3} $\end{document}, whereas SC decoding showed only a slight advantage in the low-SNR region (Fig. 6). A comparison of decoding times shows that SCUNetDec requires less decoding time than the traditional decoding algorithms (Table 2). Ablation experiments show that combining the designed feature extraction and recovery modules with SCUNet improves decoding performance (Fig. 7). In summary, SCUNetDec delivers robust decoding performance across multiple coding schemes and a range of signal-to-noise ratios.  Conclusions  To address the limited decoding performance of DNNs in Rayleigh fading channels, this paper proposes a decoding method named SCUNetDec based on the SCUNet network. The method enhances SCUNet by designing signal feature extraction and signal recovery modules. Simulations and ablation studies on Hamming codes, convolutional codes, and Polar codes demonstrate that the proposed modules exhibit strong generalization capability and effectiveness, making them suitable for various coding schemes. Compared with traditional DNN models, SCUNetDec shows superior decoding performance in Rayleigh fading channels, approaching that of conventional optimal decoding algorithms while significantly reducing decoding time.
These results indicate that the SCUNetDec decoding algorithm offers performance advantages and practical application potential in complex channel environments. Future work will focus on algorithm fusion and engineering implementation. On one hand, we aim to deepen the co-design of neural networks and traditional algorithms to achieve an optimal trade-off between performance and complexity via dynamic parameter optimization, while further exploring intelligent decoding schemes for long codes. On the other hand, research will be conducted on joint modulation-decoding modeling and end-to-end architectures to enhance the model's adaptability and practical value under high-order modulation and complex channel environments.
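The ML benchmark used for the (7,4) Hamming code can be illustrated by exhaustive search over its 16 codewords. This is a sketch under stated assumptions, not the paper's simulation code: the systematic generator matrix below is one common choice (the abstract does not give one), signaling is BPSK, and the fading gains are real and perfectly known at the receiver.

```python
import itertools

# Systematic generator matrix [I4 | P] of a (7,4) Hamming code (one common choice, assumed).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(msg):
    """Codeword bits c = m G over GF(2)."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]


def bpsk(bits):
    """Map bit 0 to +1 and bit 1 to -1."""
    return [1.0 - 2.0 * b for b in bits]


# Enumerate all 16 (message, modulated codeword) pairs once.
CODEBOOK = [(list(m), bpsk(encode(m))) for m in itertools.product([0, 1], repeat=4)]


def ml_decode(y, h=None):
    """Exhaustive ML decoding: choose the codeword minimizing sum (y_i - h_i s_i)^2.

    With Gaussian noise and perfectly known real fading gains h_i this is the
    maximum-likelihood rule; h defaults to all ones (plain AWGN).
    """
    h = h or [1.0] * len(y)
    best_msg, _ = min(
        CODEBOOK,
        key=lambda entry: sum((yi - hi * si) ** 2 for yi, hi, si in zip(y, h, entry[1])),
    )
    return best_msg
```

Exhaustive search is only practical for short codes such as this one, which is one reason learned decoders are compared against ML on small codes and against SC or Viterbi decoding on the others.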
Detection and Parameter Estimation of Quadratic Frequency Modulated Signal Based on Non-uniform Quadrilinear Autocorrelation Function
YANG Yuchao, FANG Gang
Available online  , doi: 10.11999/JEIT250723
Abstract:
  Objective  Polynomial Phase Signal (PPS) analysis has attracted broad attention because many radar, sonar, and seismic signals are modeled as PPS of different orders. A first-order PPS can be focused into a frequency bin through the Fourier transform to estimate the center frequency. For higher order PPS, such as a Quadratic Frequency Modulated (QFM) signal, non-coherent characteristics limit the effectiveness of the Fourier transform for energy integration. Existing time–frequency distribution methods, such as the short-time Fourier transform and the Wigner-Ville distribution, do not resolve the conflicts between auto-terms and cross-terms or between time- and frequency-domain resolution. In addition, current algorithms face difficulties in balancing computational complexity and detection performance, which results in reduced parameter estimation accuracy. This study proposes a QFM detection method based on a non-uniform quadrilinear autocorrelation function to provide balanced performance for QFM parameter estimation with controlled computational cost.  Methods  A time–frequency distribution method for QFM detection and parameter estimation is presented. The method applies non-uniform sampling and maps the one-dimensional signal into a two-dimensional time domain through a fourth-order autocorrelation function. A non-uniform fast Fourier transform is used to resolve the time variable and concentrate the energy into a vertical line in the two-dimensional plane. Then, a Fast Fourier Transform (FFT) is performed along this line to focus the signal into a peak, from which the chirp rate and quadratic chirp rate are estimated. Finally, dechirp processing compensates the high-order phase terms of the original signal, and a further FFT yields the center frequency estimate.  Results and Discussions  Theoretical analysis and simulation results show that the method balances computational complexity and detection performance.
Under low signal-to-noise ratio conditions, it distinguishes targets effectively and produces accurate parameter estimates (Fig. 1). For multicomponent signals with large amplitude differences, it enables stepwise detection and estimation (Fig. 2). Comparative experiments with state-of-the-art algorithms show that the method is quasi-optimal in estimation accuracy and integration gain (Fig. 3–Fig. 6). Compared with the Maximum Likelihood (ML) estimator, it offers markedly higher computational efficiency.  Conclusions  A QFM detection and parameter estimation method based on non-uniform quadrilinear autocorrelation functions is proposed. The method maps the QFM signal into a two-dimensional time domain through a new autocorrelation kernel and achieves coherent integration through scaling and FFT. Mathematical analysis and simulation results show that, relative to the ML method, it sacrifices part of the detection performance but substantially reduces computational complexity. When computational efficiency is similar, it outperforms other classical methods in detection and parameter estimation accuracy. The method provides a balanced solution for QFM signal detection and parameter estimation.
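The final dechirp-and-FFT step can be sketched as follows, assuming the chirp rate and quadratic chirp rate have already been estimated (the non-uniform quadrilinear autocorrelation that produces them is not reproduced here). The phase model φ(t) = 2π(f0·t + k·t²/2 + q·t³/6) and the function names are assumptions, and a plain DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math


def qfm_signal(f0, k, q, fs, n):
    """Unit-amplitude QFM samples with phase 2*pi*(f0*t + k*t^2/2 + q*t^3/6) (assumed model)."""
    samples = []
    for i in range(n):
        t = i / fs
        samples.append(cmath.exp(2j * math.pi * (f0 * t + k * t * t / 2 + q * t ** 3 / 6)))
    return samples


def estimate_center_frequency(x, k_hat, q_hat, fs):
    """Dechirp with the estimated chirp rate k_hat and quadratic chirp rate q_hat,
    then return the frequency of the largest DFT bin."""
    n = len(x)
    dechirped = []
    for i, sample in enumerate(x):
        t = i / fs
        # Remove the quadratic and cubic phase terms, leaving a single tone at f0.
        dechirped.append(sample * cmath.exp(-2j * math.pi * (k_hat * t * t / 2 + q_hat * t ** 3 / 6)))
    # Plain O(n^2) DFT; a practical implementation would use an FFT.
    spectrum = [
        abs(sum(d * cmath.exp(-2j * math.pi * m * i / n) for i, d in enumerate(dechirped)))
        for m in range(n)
    ]
    peak = max(range(n), key=lambda m: spectrum[m])
    if peak > n // 2:  # map upper bins to negative frequencies
        peak -= n
    return peak * fs / n
```

With exact rate estimates the residual is a pure tone, so its energy collapses into one bin; estimation errors in k_hat or q_hat would leave residual chirp and spread the peak, which is why the accuracy of the earlier stages matters.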
Component Placement Algorithm Considering Reagent Type Differences in Cell Reuse for FPVA Biochips
XU Yanbo, ZHU Yuhan, HUANG Xing, LIU Genggeng
Available online  , doi: 10.11999/JEIT250731
Abstract:
  Objective  Fully Programmable Valve Array (FPVA) biochips, a recent type of flow-based microfluidic biochip, offer high flexibility and programmability, which enable them to meet diverse and complex experimental needs. Component placement is a critical stage in FPVA architectural synthesis because it affects several performance metrics, including assay completion time, total fluid-transport length, and cross-contamination. Cell reuse, an essential feature of FPVA programmability, requires special consideration during placement. However, existing studies have largely ignored the effect of reagent type differences in cell reuse on these metrics.  Methods  This study presents a component placement algorithm for FPVA biochips that accounts for reagent type differences during cell reuse. The algorithm first introduces a cell reuse complexity metric that quantifies reuse complexity by considering the effects of reagent-type differences and component overlap on cross-contamination. It then integrates constraints, including placement-area limits and non-overlapping conditions for concurrent components, to ensure valid placement. The reward function of the learning agent is designed to minimize reuse complexity and to reduce the distance between components that use the same reagent type. The goal is to lower cross-contamination, total fluid-transport length, and assay completion time.  Results and Discussions  The algorithm is evaluated on benchmark FPVA instances with different chip sizes and functional requirements and compared with related methods. It reduces cell reuse complexity by 34.2%, assay completion time by 2.8%, and total fluid-transport length by 9.2% on average (Table 2). It also reduces the reagent-aware distance metric by 29.9% on average (Fig. 6). The learning agent’s decision trajectories show clear spatial structure, which reflects global placement awareness.
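The placement constraints named above (placement-area limits and non-overlap of concurrent components) and a reagent-aware distance can be sketched as below. Everything here is an assumption for illustration: components are modeled as axis-aligned rectangles of valve-array cells, non-concurrent components are allowed to overlap (cell reuse), and the distance is a hypothetical sum of Manhattan distances between same-reagent components. The paper's cell reuse complexity metric is not reproduced.

```python
from itertools import combinations


def inside_chip(comp, chip_w, chip_h):
    """comp = (x, y, w, h) in cell units; True if it lies fully on the valve array."""
    x, y, w, h = comp
    return x >= 0 and y >= 0 and x + w <= chip_w and y + h <= chip_h


def overlaps(a, b):
    """Axis-aligned rectangle overlap test (shared edges do not count)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def placement_valid(placement, concurrent_pairs, chip_w, chip_h):
    """placement: name -> rect. Concurrent components must not share cells;
    non-concurrent ones may overlap, which models cell reuse."""
    if not all(inside_chip(r, chip_w, chip_h) for r in placement.values()):
        return False
    return not any(overlaps(placement[a], placement[b]) for a, b in concurrent_pairs)


def reagent_distance(placement, reagent_of):
    """Hypothetical reagent-aware metric: sum of Manhattan distances between the
    lower-left corners of components that handle the same reagent type."""
    total = 0
    for a, b in combinations(sorted(placement), 2):
        if reagent_of[a] == reagent_of[b]:
            (ax, ay, _, _), (bx, by, _, _) = placement[a], placement[b]
            total += abs(ax - bx) + abs(ay - by)
    return total
```

In a reinforcement learning formulation, a check like `placement_valid` could mask invalid actions, while a metric like `reagent_distance` could enter the reward with a negative sign.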
Conclusions  This study is the first to investigate FPVA component placement with attention to reagent type differences in cell reuse. The main contributions are as follows: (1) a cell reuse complexity metric is proposed to assess reuse intensity in placement, (2) the FPVA placement problem is modeled as a Markov decision process to enable the use of double deep Q-networks for safe and efficient placement policy learning, and (3) compared with existing work, the model improves FPVA biochemical assay performance and reliability.
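The double deep Q-network mentioned above differs from a standard DQN mainly in how the learning target is computed: the online network selects the next action and the target network evaluates it. The sketch below shows only that standard target computation; the paper's state encoding, action space (for example, candidate placement positions), and network architecture are not given in the abstract, so Q-values are passed in as plain lists.

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN targets: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    Decoupling action selection (online net) from evaluation (target net)
    reduces the overestimation bias of standard DQN. Terminal transitions use y = r.
    """
    targets = []
    for r, q_on, q_tgt, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            targets.append(r)
            continue
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        targets.append(r + gamma * q_tgt[best_action])
    return targets
```

In the example used for testing, the online network prefers action 1 while the target network would have rated action 2 highest; Double DQN still uses the target network's value for action 1.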
Lightweight Dual Convolutional Finger Vein Recognition Network Based on Attention Mechanism
ZHAO Bingyan, LIANG Yihuai, ZHANG Zhongxia, ZHANG Wenzheng
Available online  , doi: 10.11999/JEIT250380
Abstract:
  Objective  Finger vein recognition is an emerging biometric authentication technology valued for its physiological uniqueness and advantages in in vivo detection. However, mainstream deep learning recognition frameworks still face two challenges. High-precision recognition often depends on complex network structures, which increase parameter counts and hinder deployment in memory-limited embedded devices and edge scenarios with constrained computing resources. Model compression can reduce computational cost but often weakens feature representation, creating a conflict between recognition accuracy and efficiency. To address these issues, a lightweight dual convolutional model integrated with an attention mechanism is proposed. A parallel heterogeneous convolution module and an attention guidance mechanism are designed to extract diverse image features and improve recognition accuracy while preserving a lightweight network structure.  Methods  The proposed architecture adopts a three-level collaborative mechanism comprising feature extraction, dynamic calibration, and decision fusion. A dual convolutional feature extraction module is constructed using normalized ROI images. This module combines heterogeneous convolution kernels. Rectangular convolution branches with different shapes capture venous topological structures and diameter orientations, whereas square convolution branches employ stacked square kernels to extract local texture details and background intensity distributions. These branches operate in parallel with reduced channel numbers and generate complementary responses through kernel shape diversity. This design reduces parameter scale while improving feature discrimination. A parallel dual attention mechanism is then applied to achieve two-dimensional calibration through joint optimization of channel attention and spatial attention. 
Channel attention adaptively assigns weights to enhance discriminative venous texture features, whereas spatial attention constructs pixel-level dependency models that focus on effective discriminative regions. A parallel concatenation fusion strategy preserves structural information without introducing additional parameters and improves sensitivity to critical features. Finally, a three-level progressive feature optimization structure is implemented. A convolutional compression module with stride 2 nests multi-scale receptive fields and progressively refines primary features during dimensionality reduction. Two fully connected layers then perform feature space transformation. The first layer applies ReLU activation to form sparse representations, and the final layer applies Softmax for probability calibration. This structure balances shallow underfitting and deep overfitting while maintaining efficient forward inference.  Results and Discussions  The effectiveness and robustness of the proposed network are evaluated on three public datasets, namely USM, HKPU, and SDUMLA. Recognition performance is assessed using accuracy (Acc). Experimental results (Table 1) show strong recognition performance. Feature visualization heatmaps (Fig. 4, Fig. 6) confirm that the network extracts complete and discriminative venous features. Training visualizations (Fig. 7, Fig. 8) show stable loss and accuracy trends, achieving 100% classification performance and demonstrating training reliability and robustness. Quantitative comparisons (Tables 2 and 3) indicate that the proposed method effectively addresses the trade-off between model complexity and classification performance and achieves superior results across all three datasets. Ablation studies (Table 4) further verify the effectiveness of the proposed modules and show significant improvements in finger vein recognition performance.
Conclusions  A lightweight dual convolutional neural network with an attention mechanism is proposed. The network consists of three core modules: a dual convolutional feature extraction module, a parallel dual-attention module, and a feature optimization classification module. During feature extraction, long-range venous features and background information are jointly encoded through a low-channel parallel design, which substantially reduces parameter counts while improving inter-individual discrimination. The attention module efficiently captures critical venous features without the parameter expansion commonly observed in conventional attention mechanisms. The feature optimization classification module applies progressive feature recalibration, which reduces underfitting and overfitting during stacked dimensionality reduction. Experimental results show recognition accuracies of 99.70%, 98.33%, and 98.27% on the USM, HKPU, and SDUMLA datasets, corresponding to an average improvement of 2.05% over existing state-of-the-art methods. Compared with representative lightweight finger vein recognition approaches, the proposed method reduces parameter scale by 11.35%~60.19%, achieving a balance between model lightening and performance improvement.
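The parameter savings of a low-channel parallel design can be checked with simple arithmetic, since a convolution layer has kh·kw·C_in·C_out weights plus one bias per output channel, and channel concatenation adds no parameters. The layer shapes and channel counts below are hypothetical and chosen only so that both designs have two layers and 64 output channels; they are not the paper's configuration.

```python
def conv_params(kh, kw, c_in, c_out, bias=True):
    """Weights kh*kw*c_in*c_out plus one bias per output channel."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)


# Hypothetical baseline: two stacked 3x3 convolutions, 32 -> 64 -> 64 channels.
baseline = conv_params(3, 3, 32, 64) + conv_params(3, 3, 64, 64)

# Hypothetical dual-branch block: each branch keeps only 32 channels.
rect_branch = conv_params(1, 5, 32, 32) + conv_params(5, 1, 32, 32)    # rectangular kernels
square_branch = conv_params(3, 3, 32, 32) + conv_params(3, 3, 32, 32)  # stacked 3x3 kernels
dual_branch = rect_branch + square_branch  # concatenating 32 + 32 channels adds no parameters

print(baseline, dual_branch, f"{1 - dual_branch / baseline:.1%} fewer")
# prints: 55424 28800 48.0% fewer
```

Most of the saving comes from halving the channel width inside each branch, because parameter count grows with the product of input and output channels.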
Research on a Miniaturized Wide Stopband Folded Substrate Integrated Waveguide Filter
KE Rongjie, WANG Hongbin, CHENG Yujian
Available online  , doi: 10.11999/JEIT250869
Abstract:
To meet the requirements of 5G/6G communication systems for miniaturization, high integration, and a wide stopband, this paper proposes a fourth-order bandpass filter based on an eighth-mode Folded Substrate Integrated Waveguide (FSIW) using High-Temperature Co-Fired Ceramic (HTCC) technology. The design combines the miniaturization characteristics of FSIW with the three-dimensional integration capability of HTCC. Size reduction is achieved through an eighth-mode FSIW cavity structure with dimensions of 0.29λg × 0.29λg, where λg denotes the waveguide wavelength at the center operating frequency (f0). Metal vias suppress high-order mode coupling, a bent microstrip line introduces transmission zeros, and an L-shaped stub improves the high-frequency response. Three controllable transmission zeros are generated in the upper stopband, achieving 20 dB@3.73f0. Measurements show a center frequency of 6.4 GHz. Although slight frequency deviation and insertion loss are observed, the design provides clear advantages in miniaturization, stopband width, and the number of transmission zeros compared with reported work, indicating potential for high-density integrated communication systems.  Objective  The rapid development of 5G/6G communication systems increases the demand for Radio Frequency (RF) microwave devices that provide miniaturization, high integration, and wide stopband performance. As core components of RF transceiver front-ends, bandpass filters transmit useful signals and suppress interference. Conventional Substrate Integrated Waveguide (SIW) filters often show large size, limited stopband extension, and insufficient control of transmission zeros, which restrict their use in high-density integrated systems. To address these challenges, this paper presents a miniaturized wide stopband fourth-order bandpass filter based on an eighth-mode FSIW structure and HTCC technology to achieve compact size and broad stopband performance.
Methods  The filter integrates the miniaturization capability of FSIW with the three-dimensional integration characteristics of HTCC. First, an eighth-mode FSIW cavity is developed by modifying a quarter-mode FSIW cavity. A square patch is replaced with a triangular patch (eighth-mode cavity I), followed by slot etching in the triangular patch (eighth-mode cavity II). Second, a fourth-order bandpass filter is constructed by symmetrically designing two triangular metal patches for each cavity type and stacking them vertically. A common metal layer (fifth layer) containing coupling windows enables coupling between the upper and lower cavities. Three techniques are used to optimize performance: metal vias to suppress high-order mode coupling, bent microstrip lines to generate transmission zeros, and an L-shaped stub to enhance high-frequency response. Parameter scanning of key dimensions (d2, s4, s6) verifies the controllability of transmission zeros. The filter is fabricated using HTCC on an Al₂O₃ substrate with relative permittivity 9.8 and loss tangent 0.0002.  Results and Discussions  Measurements show a center frequency of 6.4 GHz. Although fabrication and assembly deviations cause slight frequency shift and additional insertion loss, the filter demonstrates strong performance compared with reported designs (Table 2). The size of 0.29λg×0.29λg is smaller than that of most SIW filters. The upper stopband extends to 20 dB@3.73f0, outperforming filters of comparable size. Three controllable transmission zeros appear in the upper stopband, and parameter scanning confirms their tunability (Fig. 13).  Conclusions  A miniaturized wide stopband fourth-order bandpass filter based on an eighth-mode FSIW structure is presented. The eighth-mode cavity combined with HTCC technology achieves a compact footprint of 0.29λg×0.29λg, meeting the integration requirements of 5G/6G systems.
The use of metal vias, bent microstrip lines, and L-shaped stubs generates a wide stopband of 20 dB@3.73f0 and three tunable transmission zeros, strengthening interference suppression. Adjustable parameters enable flexible tuning of transmission zero frequencies without affecting the passband, improving the adaptability of the design to different interference conditions. These advances address key challenges in miniaturization, stopband extension, and design flexibility of SIW filters, offering a practical solution for RF front-ends in next-generation high-density integrated communication systems.
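To put the normalized figures into physical units, the sketch below computes the wavelength in the Al₂O₃ dielectric at the measured 6.4 GHz center frequency, together with the textbook TE10 guided wavelength of a dielectric-filled rectangular waveguide. The FSIW guided wavelength used in the paper depends on its folded geometry and is not given in the abstract, so these are approximations: because the TE10 guided wavelength always exceeds the dielectric wavelength, 0.29·λd is a lower bound on the side length of the 0.29λg footprint under this model.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def dielectric_wavelength(f_hz, eps_r):
    """Wavelength of a plane wave in a dielectric with relative permittivity eps_r."""
    return C0 / (f_hz * math.sqrt(eps_r))


def te10_guided_wavelength(f_hz, eps_r, width_m):
    """Guided wavelength of the TE10 mode in a dielectric-filled rectangular waveguide
    of (effective) width width_m; raises ValueError below cutoff."""
    lam_d = dielectric_wavelength(f_hz, eps_r)
    ratio = lam_d / (2.0 * width_m)
    if ratio >= 1.0:
        raise ValueError("TE10 mode is below cutoff at this frequency")
    return lam_d / math.sqrt(1.0 - ratio ** 2)


f0 = 6.4e9   # measured center frequency from the abstract
eps_r = 9.8  # substrate relative permittivity from the abstract
lam = dielectric_wavelength(f0, eps_r)
print(f"lambda_d = {lam * 1e3:.2f} mm, 0.29*lambda_d = {0.29 * lam * 1e3:.2f} mm")
print(f"3.73*f0 = {3.73 * f0 / 1e9:.2f} GHz")
# prints:
# lambda_d = 14.96 mm, 0.29*lambda_d = 4.34 mm
# 3.73*f0 = 23.87 GHz
```

Taking f0 = 6.4 GHz, the 20 dB stopband therefore extends to roughly 23.9 GHz.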
Radio Map Enabled Path Planning for Multiple Cellular-Connected Unmanned Aerial Vehicles
ZHOU Decheng, WANG Wei, SHAO Xiang, CHEN Mei, XIAO Jianghao
Available online  , doi: 10.11999/JEIT250821
Abstract:
  Objective  In collaborative operation scenarios of cellular-connected Unmanned Aerial Vehicles (UAVs), conflict avoidance strategies often cause unbalanced service quality. Traditional schemes focus on reducing total task completion time but do not ensure service fairness. To address this issue, a radio map-assisted cooperative path planning scheme is proposed. The objective is to minimize the maximum weighted sum of task completion time and communication disconnection time across all UAVs to improve service fairness in multi-UAV scenarios.  Methods  A Signal-to-Interference-plus-Noise Ratio (SINR) map is constructed to assess communication quality. The two-dimensional airspace is discretized into grids, and link gain maps are generated through ray tracing and Axis-Aligned Bounding Box detection to determine Line-of-Sight (LoS) or Non-Line-of-Sight (NLoS) conditions. The SINR map is produced by selecting, for each grid cell, the base station with the highest expected SINR. To solve the optimization problem, an Improved Conflict-Based Search (ICBS) algorithm with a hierarchical structure is developed. At the high-level stage, proximity conflicts are managed to maintain safety distances, and the cost function is reformulated to emphasize fairness by minimizing the maximum weighted time. The low-level stage applies a bidirectional A* algorithm for single-UAV path planning, using parallel search to improve efficiency while meeting the constraints set by the high-level stage.  Results and Discussions  The proposed scheme is evaluated through simulations across different scenarios. Fig. 2 shows building heights and positions, with base stations marked by red stars and building height indicated by a light-to-dark color gradient. The SINR map at an altitude of 60 m (Fig. 3) illustrates the wireless propagation characteristics between UAVs and ground base stations and shows significant SINR degradation in areas affected by building blockage and co-channel interference, resulting in communication blind zones.
Trajectory planning results for four UAVs at an altitude of 60 m with an SINR threshold of 2 dB show that all UAVs avoid signal blind zones and complete tasks without collision risks under the proposed scheme (Fig. 4). The trade-off between task completion time and disconnection time is controlled by the weight coefficient (Fig. 5). The maximum weighted time increases monotonically as the weight coefficient increases, whereas the maximum disconnection time decreases. The bidirectional A* algorithm achieves higher computational efficiency than Dijkstra’s and traditional A* algorithms while maintaining optimal solution quality (Table 1). All three algorithms yield identical weighted times, confirming the optimality of the bidirectional A* approach, and its runtime is reduced significantly due to parallel search. Compared with three benchmark schemes, the proposed scheme achieves the lowest maximum weighted time for different SINR thresholds (Fig. 6). Performance analysis at different UAV altitudes shows that the proposed scheme maintains stable maximum weighted time below 75 m, while sharp increases appear above 75 m due to intensified interference from non-serving base stations (Fig. 7). The scalability analysis further shows clear improvements over benchmark schemes, especially when conflicts occur more frequently (Fig. 8).  Conclusions  To address fairness in cellular-connected multi-UAV systems, a radio map-assisted path planning scheme is proposed to minimize the maximum weighted time. Based on a discretized SINR map, an ICBS algorithm is developed. At the high-level stage, proximity conflicts and a reformulated cost function ensure safety and fairness, and at the low-level stage, a bidirectional A* algorithm increases search efficiency.
Simulation results show that the proposed scheme lowers the maximum weighted time compared with benchmark schemes and improves fairness and overall multi-UAV collaboration performance.
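The SINR map and the min-max objective described above can be sketched on a toy grid. This is an illustration under assumptions, not the authors' code: received powers per base station and cell are given directly (the ray-tracing link gains are not reproduced), every other base station is treated as a co-channel interferer, and the per-UAV cost is taken as a hypothetical "flight time plus weight times disconnection time". Under the all-interferer assumption, SINR_b = P_b / (ΣP − P_b + N) increases with P_b, so the SINR-maximizing station in a cell is simply the strongest one.

```python
import math


def sinr_map(rx_mw, noise_mw):
    """rx_mw[b][c]: linear received power (mW) from base station b at grid cell c.

    Returns (serving_bs, sinr_db) lists indexed by cell, serving each cell
    from the station with the highest SINR (here, the strongest received power).
    """
    n_bs, n_cells = len(rx_mw), len(rx_mw[0])
    serving, sinr_db = [], []
    for c in range(n_cells):
        powers = [rx_mw[b][c] for b in range(n_bs)]
        best = max(range(n_bs), key=lambda b: powers[b])
        interference = sum(powers) - powers[best]
        serving.append(best)
        sinr_db.append(10 * math.log10(powers[best] / (interference + noise_mw)))
    return serving, sinr_db


def weighted_time(path, sinr_db, threshold_db, dt, weight):
    """Hypothetical per-UAV cost: flight time plus weight * time spent below threshold."""
    t_task = len(path) * dt
    t_disc = sum(dt for c in path if sinr_db[c] < threshold_db)
    return t_task + weight * t_disc


def max_weighted_time(paths, sinr_db, threshold_db, dt, weight):
    """Min-max fairness objective: the worst UAV's weighted time."""
    return max(weighted_time(p, sinr_db, threshold_db, dt, weight) for p in paths)
```

A path planner that minimizes `max_weighted_time` rather than the sum over UAVs penalizes any single UAV that is routed through blind zones, which is the fairness property the abstract emphasizes.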
Inverse Design of a Silicon-Based Compact Polarization Splitter-Rotator
HUI Zhanqiang, ZHANG Xinglong, HAN Dongdong, LI Tiantian, GONG Jiamin
Available online  , doi: 10.11999/JEIT250858
Abstract:
  Objective  The Polarization Splitter-Rotator (PSR) is a key device used to control the polarization state of light in Photonic Integrated Circuits (PICs). Device size has become a major constraint on integration density in PICs. Traditional design methods are time-consuming and tend to yield larger device footprints. Inverse design, by contrast, determines structural parameters through optimization algorithms according to target performance and enables compact devices to be obtained while maintaining functionality. This strategy is now applied to wavelength and mode division multiplexers, all-optical logic gates, power splitters, and other integrated photonic components. The objective of this work is to use inverse design to address size limitations in silicon-based PSRs by combining the Momentum Optimization algorithm with the Adjoint Method. This combined approach improves the integration level of PICs and provides a feasible pathway for the miniaturization of other photonic devices.  Methods  The design region is defined on a 220 nm Silicon-on-Insulator (SOI) wafer and is discretized into 25×50 cylindrical elements. Each element has a 50 nm radius, a 150 nm height, and an initial relative permittivity of 6.55. The adjoint method is used to obtain gradient information across the design region, and this gradient is processed with the Momentum Optimization algorithm. The relative permittivity of each element is then updated according to the processed gradient. During optimization, the momentum factor is dynamically adjusted with the iteration number to accelerate convergence, and a linear bias is applied to guide the permittivity toward the values of silicon and air as the iterations progress. After optimization, the elements are binarized based on their final permittivity: values below 6.55 are assigned to air, whereas values above 6.55 are assigned to silicon. This results in a structure containing irregularly distributed air holes. 
To compensate for performance loss introduced during binarization, the etching depth of air holes with pre-binarization permittivity between 3 and 6.55 is optimized. Adjacent air holes are merged to reduce fabrication errors. The final device consists of air holes with five radii, among which three larger-radius types are selected for further refinement. Their etching radii and depths are optimized to recover remaining performance loss. Device performance is evaluated through numerical analysis. Calculated parameters include Insertion Loss (IL), Crosstalk (CT), Polarization Extinction Ratio (PER), and bandwidth. Tolerance analysis is also conducted to assess robustness under fabrication variations.  Results and Discussions   A compact PSR is designed on a 220 nm SOI wafer with dimensions of 5 μm in length and 2.5 μm in width. During optimization, the momentum factor in the Momentum Optimization algorithm is dynamically adjusted. A larger momentum factor is applied in the early stage to accelerate escape from local maxima or plateau regions, whereas a smaller momentum factor is used in later iterations to increase the weight of the current gradient. Compared with other optimization strategies, this algorithm requires only 20%~33% of the iteration count needed by alternative methods to reach a Figure of Merit (FOM) of 1.7, which improves optimization efficiency. Numerical analysis shows that the device achieves stable performance across the 1520~1575 nm wavelength range. The IL remains low (TM0 < 1 dB, TE0 < 0.68 dB), and the CT is effectively suppressed (TM0 < –23 dB, TE0 < –25.2 dB). The PER is high (TM0 > 17 dB, TE0 > 28.5 dB). Tolerance analysis indicates strong robustness to fabrication variations. Within the 1520~1540 nm range, performance remains stable under etching depth offsets of ±9 nm and etching radius offsets of ±5 nm, demonstrating reliable manufacturability.
Conclusions   Numerical analysis demonstrates that combining the adjoint method with the Momentum Optimization algorithm is a feasible strategy for designing an integrated PSR. The design principle relies on controlling light propagation through adjustments to the relative permittivity, which determine the distribution and placement of air holes to achieve polarization splitting and rotation. Compared with traditional design approaches, inverse design uses the design region more efficiently and enables a more compact device structure. The proposed PSR is markedly smaller and shows enhanced fabrication tolerance. It is suitable for future large-scale PICs and provides useful guidance for the miniaturization of other photonic devices.
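The permittivity update loop can be sketched as momentum-accelerated gradient ascent with a decreasing momentum factor, a growing linear bias toward the nearer material, and final binarization at 6.55, which lies roughly midway between air (1.0) and silicon (about 12.1 near 1550 nm). This is a minimal sketch under assumptions: the adjoint-method gradient is replaced by a caller-supplied function, and the linear momentum schedule, learning rate, and bias strength are hypothetical values rather than the paper's settings.

```python
EPS_AIR = 1.0
EPS_SI = 12.1   # approximate relative permittivity of silicon near 1550 nm
EPS_MID = 6.55  # initial value and binarization threshold from the abstract


def momentum_factor(it, n_iter, beta_start=0.9, beta_end=0.5):
    """Hypothetical schedule: large momentum early, smaller momentum late."""
    frac = it / max(n_iter - 1, 1)
    return beta_start + (beta_end - beta_start) * frac


def optimize_permittivity(eps, grad_fn, n_iter=100, lr=0.5, bias=0.02):
    """Gradient ascent on the FOM with momentum and a growing linear bias.

    grad_fn(eps) stands in for the adjoint-method gradient dFOM/d(eps).
    """
    eps = list(eps)
    velocity = [0.0] * len(eps)
    for it in range(n_iter):
        beta = momentum_factor(it, n_iter)
        grad = grad_fn(eps)
        for i, g in enumerate(grad):
            velocity[i] = beta * velocity[i] + g
            # Linear bias pushing each element toward the nearer material as iterations progress.
            push = bias * (it / n_iter) * (1.0 if eps[i] >= EPS_MID else -1.0)
            # Keep permittivity within the physical range [air, silicon].
            eps[i] = min(EPS_SI, max(EPS_AIR, eps[i] + lr * velocity[i] + push))
    return eps


def binarize(eps):
    """Values above 6.55 become silicon; the rest become air."""
    return [EPS_SI if e > EPS_MID else EPS_AIR for e in eps]
```

As a usage example, a toy quadratic figure of merit can stand in for the electromagnetic one: `grad_fn = lambda eps: [-2 * (e - t) for e, t in zip(eps, [12.1, 1.0, 12.1])]` drives elements initialized at 6.55 toward silicon, air, and silicon respectively.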
Research on UAV Swarm Radiation Source Localization Method Based on Dynamic Formation Optimization
WU Sujie, WU Binbin, YANG Ning, WANG Heng, GUO Daoxing, GU Chuan
Available online  , doi: 10.11999/JEIT251023
Abstract:
In dense and structurally complex urban environments, Unmanned Aerial Vehicle (UAV) swarm radiation source localization is affected by signal attenuation, multipath propagation, and building obstructions. To address these limitations, a dynamic formation-optimization method for UAV swarms is proposed. By improving the geometric configuration of the swarm, the method reduces path loss and interference, which strengthens localization accuracy. Received signal strength is used to evaluate signal quality in real time and supports adaptive formation adjustments that improve propagation conditions. Geometric dilution of precision and root mean square error metrics are integrated to refine swarm geometry and improve distance-estimation reliability. Simulation results show that the proposed method converges faster and improves localization accuracy in complex urban environments, reducing errors by more than 80 percent. The method adapts to environmental variation and demonstrates strong robustness and practical value.  Objective  UAV swarm localization and formation control in urban environments are affected by obstacles, signal attenuation, and rapid variation in the surroundings that reduce the reliability of conventional methods. This study proposes a radiation source localization approach that integrates the Received Signal Strength Indicator (RSSI) with dynamic formation adjustment to improve localization accuracy and strengthen system robustness in complex urban scenarios.  Methods  The method uses RSSI measurements to estimate the distance to the radiation source and adjusts UAV swarm formation in real time to reduce localization errors. These adjustments are based on feedback that reflects relative positions, signal strength, and environmental variation. Localization accuracy is strengthened through a multi-sensor fusion strategy that integrates GPS, IMU, and depth-camera data.
A data-quality assessment mechanism evaluates signal reliability and triggers formation adaptation when the signal drops below a predefined threshold. This optimization process reduces positioning errors and improves system robustness.  Results and Discussions  Simulation experiments in a ROS-based environment were conducted to evaluate the UAV swarm localization method under urban obstacles and multipath conditions. The swarm began in a hexagonal formation and adjusted its geometry according to environmental variation and localization confidence (Fig. 3, Fig. 4). As shown in Fig. 5, localization errors fluctuated during initialization but converged to below 1 m after 150 s. Formation comparisons (Fig. 6) showed that symmetric structures such as hexagonal and triangular formations maintained errors below 0.5 m, whereas asymmetric formations (T- and Y-shaped) produced deviations of up to 4.9 m. Further comparisons (Fig. 7) showed that the error of traditional RSSI localization saturated near 15 m, the error of direction-of-arrival localization fluctuated between 5 and 14 m, and time-difference-of-arrival localization failed because of synchronization problems. The proposed method achieved sub-meter accuracy within 60 s and remained robust throughout the mission. These findings indicate that combining RSSI-based distance estimation with dynamic formation adjustment improves localization accuracy, convergence speed, and adaptability under complex environmental conditions.  Conclusions  This study addresses UAV swarm localization in complex urban environments by integrating RSSI-based distance estimation, dynamic formation adjustment, and multi-sensor fusion.
ROS-based simulations show that: (1) localization errors converge rapidly to sub-meter levels, reaching below 1 m within 150 s under non-line-of-sight conditions; (2) symmetric formations such as hexagonal and triangular configurations outperform asymmetric ones and reduce errors by up to 67 percent compared with fixed Y-shaped formations; and (3) relative to traditional RSSI, direction of arrival, and time difference of arrival approaches, the proposed method shows faster convergence, higher stability, and stronger robustness.
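The UAV abstract combines RSSI-based ranging with a geometric dilution of precision (GDOP) criterion for choosing formations. A minimal sketch of both ideas follows, assuming the standard log-distance path-loss model and 2D range-only positioning; the abstract does not give the authors' exact propagation model, and the reference power (-40 dBm at 1 m), path-loss exponent (2.7), and helper names are hypothetical.

```python
import numpy as np

def rssi_to_distance(rssi_dbm, rssi_ref_dbm=-40.0, path_loss_exp=2.7, d_ref=1.0):
    """Invert the log-distance model RSSI(d) = RSSI_ref - 10 * n * log10(d / d_ref).
    Parameter defaults are illustrative assumptions, not values from the paper."""
    rssi = np.asarray(rssi_dbm, dtype=float)
    return d_ref * 10.0 ** ((rssi_ref_dbm - rssi) / (10.0 * path_loss_exp))

def range_gdop(uav_xy, source_xy):
    """GDOP of 2D range-only positioning: sqrt(trace((H^T H)^-1)),
    where row i of H is the unit vector from the source to UAV i."""
    diffs = np.asarray(uav_xy, dtype=float) - np.asarray(source_xy, dtype=float)
    H = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    return float(np.sqrt(np.trace(np.linalg.inv(H.T @ H))))

def hexagon(radius, center=(0.0, 0.0)):
    """Six UAVs evenly spaced on a circle around center (a regular hexagon)."""
    angles = np.arange(6) * np.pi / 3
    return np.column_stack([center[0] + radius * np.cos(angles),
                            center[1] + radius * np.sin(angles)])

# A symmetric hexagon around the source versus a lopsided three-UAV cluster.
print(rssi_to_distance(-67.0))                        # about 10 m under these assumptions
print(range_gdop(hexagon(20.0), (0.0, 0.0)))          # 2 / sqrt(6), about 0.816
print(range_gdop([[10, 0], [10, 3], [9, -3]], (0, 0)))  # noticeably larger
```

With N UAVs evenly spaced around the source, H^T H = (N/2) I, so GDOP = 2/sqrt(N); clustering the UAVs on one side makes H^T H nearly singular and inflates GDOP. This is consistent with the abstract's observation that symmetric formations outperform asymmetric ones.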
Adversarial Attacks on 3D Target Recognition Driven by Gradient Adaptive Adjustment
LIU Weiquan, SHEN Xiaoying, LIU Dunqiang, SUN Yanwen, CAI Guorong, ZANG Yu, SHEN Siqi, WANG Cheng
Available online  , doi: 10.11999/JEIT251264
Abstract:
  Objective   Robust environmental perception is essential for intelligent driving systems. Light Detection and Ranging (LiDAR) provides high-resolution 3D point cloud data and serves as a core information source for object detection and recognition. However, deep learning models for 3D point cloud recognition show notable vulnerability to adversarial attacks. Small, imperceptible perturbations can cause severe classification errors and threaten system safety. Existing attack methods have improved the Attack Success Rate (ASR), but the perturbations they generate often lack concealment, create outliers, and show poor imperceptibility because they do not adequately preserve the geometric structure of point clouds. This reduces their suitability for realistic security evaluation of optoelectronic perception systems. Developing an attack method that maintains a high success rate while preserving geometric consistency and imperceptibility is therefore critical. This study addresses this need by proposing a framework that incorporates point cloud geometry into perturbation generation.  Methods   A Gradient Adaptive Adjustment (GAA) adversarial attack method for 3D point cloud recognition is proposed. The framework (Fig. 2) includes three coordinated modules. The 3D Point Cloud Salient Region Extraction module evaluates decision-level vulnerability using Shapley value analysis to identify and rank point subsets with the strongest influence on classifier output. Perturbations are then concentrated in these sensitive regions. A Curvature-Weighted Gradient Mechanism integrates local geometric priors. For each point in the salient region, a local covariance matrix is computed from its k-nearest neighbors. Principal component analysis generates eigenvalues and eigenvectors, which are used to compute a curvature measure. A Gaussian kernel function produces curvature-dependent weights that are applied to backpropagated gradients. 
This suppresses perturbations in high-curvature areas and encourages them in low-curvature regions to preserve local shape morphology. A Principal Curvature Direction Constrained Optimization module further refines the perturbation direction. The weighted gradient is projected onto the principal curvature directions, and the projection components are fused using coefficients derived from the corresponding eigenvalues. This aligns the perturbation with natural geometric trends and avoids unnatural deformation. An Adaptive Optimization Algorithm then minimizes a multi-objective loss balancing attack success, geometric similarity (via Chamfer Distance and Hausdorff Distance), and perturbation sparsity. The adversarial point cloud is iteratively updated based on the saliency map, curvature-weighted gradients, and principal direction constraints.  Results and Discussions   Experiments on ModelNet40, ShapeNetPart, and KITTI were conducted using PointNet, DGCNN, and PointConv. The GAA method showed strong performance. On ModelNet40 with PointNet, it achieved a 97.69% ASR with an average of 28 perturbed points, outperforming ten baselines such as AL-Adv (92.92% ASR, 40 points) and Kim et al. (89.38% ASR, 36 points) (Table 1). It also produced lower geometric distortion, as indicated by smaller Chamfer Distance and Hausdorff Distance values. Visual results (Fig. 4) show that GAA produces fewer outliers and more natural adversarial point clouds compared with methods such as AL-Adv. The method generalized well across architectures, reaching 99.78% ASR on DGCNN and 96.91% on PointConv (Table 2), with similar performance on ShapeNetPart (Table 3). Ablation experiments on the number of salient regions (K) showed consistent improvements in ASR and reduced geometric distortion as K increased from 1 to 6 (Table 4, Fig. 5), confirming the advantage of targeting multiple critical regions. Tests on the KITTI dataset demonstrated strong performance in real-world, noisy environments. 
The method maintained high ASRs, such as 99.33% on PointNet, with limited perturbations (Table 5). An ablation study on K indicated that K=4 offers an effective balance between success rate and perturbation cost for PointNet (Table 6).  Conclusions   This study presents a GAA method for adversarial attacks on 3D point cloud recognition. By combining a Shapley value-based saliency analyzer, a curvature-weighted gradient mechanism, and a principal curvature direction constraint, the method generates adversarial examples that achieve high attack success while preserving geometric consistency. Experiments show that GAA minimizes perceptual distortion and perturbs fewer points across datasets and models. The method provides a practical tool for vulnerability analysis and supports the development of more robust and secure optoelectronic perception systems for intelligent driving. Future work will examine robustness under adverse conditions and assess physical-world implications.
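The curvature-weighted gradient step in the GAA abstract (local covariance, PCA eigenvalues, a curvature measure, then a Gaussian kernel weight on the gradient) can be sketched as below, together with the Chamfer and Hausdorff distances used to measure distortion. This is an illustrative NumPy sketch, not the authors' implementation: the curvature measure (surface variation, smallest eigenvalue over the eigenvalue sum), `k`, `sigma`, and all function names are assumptions.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force k nearest neighbours of every point (the point itself included)."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.argsort(d2, axis=1)[:, :k]

def surface_variation(points, k=10):
    """Assumed curvature proxy: lambda_min / (lambda_0 + lambda_1 + lambda_2)
    of each point's k-neighbourhood covariance matrix."""
    curv = np.empty(len(points))
    for i, nb in enumerate(knn_indices(points, k)):
        cov = np.cov(points[nb].T)             # 3 x 3 local covariance
        eigvals = np.linalg.eigvalsh(cov)      # ascending eigenvalues
        curv[i] = eigvals[0] / max(eigvals.sum(), 1e-12)
    return curv

def curvature_weighted_gradient(grad, points, k=10, sigma=0.05):
    """Scale each point's gradient by exp(-c^2 / (2 sigma^2)): flat regions keep
    weights near 1, high-curvature regions are suppressed."""
    weights = np.exp(-surface_variation(points, k) ** 2 / (2.0 * sigma ** 2))
    return grad * weights[:, None], weights

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both directions."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance."""
    d = np.sqrt(np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1))
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

In a full attack loop, the weighted gradient would also be projected onto principal directions and restricted to the Shapley-selected salient points. Those steps are omitted here because the abstract does not specify them precisely enough to reproduce.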
Conditional Generative Adversarial Networks-based Channel Estimation for ISAC-RIS System
LIU Yu, ZHENG Zelin, LIU Gang
Available online  , doi: 10.11999/JEIT251168
Abstract:
  Objective  In Reconfigurable Intelligent Surface (RIS)-assisted Integrated Sensing and Communication (ISAC) systems, accurate channel estimation is crucial to ensure reliable operation. Although traditional deep learning methods can partially address the channel estimation problem, their generalization ability and estimation accuracy remain limited in complex multi-user channel environments. To tackle these challenges, this paper proposes a two-stage channel estimation method based on a Conditional Generative Adversarial Network (CGAN) for RIS-assisted multi-user ISAC systems, aiming to enhance both the accuracy and stability of channel estimation.  Methods  This paper proposes a two-stage channel estimation method based on CGAN for estimating the SAC channels in RIS-assisted multi-user ISAC systems. By adjusting the switching states of the RIS, the overall estimation problem is decomposed into subproblems, enabling sequential estimation of the direct and reflected channels. Within the proposed CGAN framework, the adversarial training between the generator and discriminator allows the model not only to learn the mapping relationship between the observed signals and the true channels but also to optimize the output according to the discriminator’s feedback, thereby effectively improving both training efficiency and estimation accuracy.  Results and Discussions  Extensive simulation experiments were conducted to verify the effectiveness of the proposed method. First, the estimation performance of the SAC channel under different SNR conditions was compared. The results demonstrate that the proposed CGAN-based method achieves a significantly lower Normalized Mean Square Error (NMSE) than the Least Squares (LS) benchmark and traditional models such as FNN and ELM (Fig. 4). Then, the impact of increasing the number of antennas and RIS elements on SAC channel estimation performance was investigated. Compared with the LS benchmark, the proposed CGAN method consistently maintains superior performance under various SNR conditions (Figs. 5 and 6).  
Conclusions  This paper investigates the channel estimation problem in RIS-assisted multi-user ISAC systems and proposes a two-stage channel estimation method based on CGAN. By adjusting the switching states of the RIS and employing adversarial training between the generator and discriminator networks, the proposed method achieves accurate estimation of the SAC channel. Simulation results demonstrate that, under various SNR conditions and channel dimensions, the CGAN-based estimation method exhibits strong generalization capability and significantly outperforms the benchmark schemes in estimation accuracy. Therefore, it shows great potential as an effective solution for enhancing system stability and efficiency.
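The RIS on/off decomposition described above can be illustrated with the least-squares (LS) benchmark it is compared against. The following is a minimal sketch, assuming a single-antenna user, unit pilots, an M x N cascaded channel `V` (BS antennas x RIS elements), and DFT phase configurations in stage 2. It is not the CGAN estimator, which would refine such observations with a trained generator. All names and the signal model are illustrative.

```python
import numpy as np

def crandn(rng, *shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def nmse(est, true):
    """Normalized mean square error ||est - true||^2 / ||true||^2."""
    return float(np.linalg.norm(est - true) ** 2 / np.linalg.norm(true) ** 2)

def two_stage_ls(h_d, V, snr_db, n_direct=8, rng=None):
    """Stage 1: RIS off, so unit pilots observe only the direct channel h_d.
    Stage 2: RIS on with N DFT phase vectors (columns of Theta); subtract the
    stage-1 estimate and solve for the cascaded channel V by least squares."""
    if rng is None:
        rng = np.random.default_rng()
    M, N = V.shape
    noise_std = 10 ** (-snr_db / 20)
    y1 = h_d[:, None] + noise_std * crandn(rng, M, n_direct)
    h_d_hat = y1.mean(axis=1)                          # LS with repeated unit pilots
    Theta = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
    Y2 = h_d[:, None] + V @ Theta + noise_std * crandn(rng, M, N)
    V_hat = (Y2 - h_d_hat[:, None]) @ Theta.conj().T / N  # Theta Theta^H = N I
    return h_d_hat, V_hat

rng = np.random.default_rng(1)
h_d, V = crandn(rng, 4), crandn(rng, 4, 16)
for snr in (0, 10, 20):
    _, V_hat = two_stage_ls(h_d, V, snr, rng=rng)
    print(snr, nmse(V_hat, V))
```

The DFT phase matrix is chosen only because its rows are orthogonal, which makes the stage-2 LS solution a single matrix product.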
AutoPenGPT: Drift-Resistant Penetration Testing Driven by Search-Space Convergence and Dependency Modeling
HUANG Weigang, FU Lirong, LIU Peiyu, DU Linkang, YE Tong, XIA Yifan, WANG Wenhai
Available online  , doi: 10.11999/JEIT250873
Abstract:
  Objective  Industrial Control Systems (ICS) are widely deployed in critical sectors and often contain long-standing vulnerabilities due to strict availability requirements and limited patching opportunities. The increasing exposure of external management and access infrastructure has expanded the attack surface and allows adversaries to pivot from boundary components into fragile production networks. Continuous penetration testing of these components is essential but remains costly and difficult to scale when carried out manually. Recent work examines Large Language Models (LLMs) for automated penetration testing; however, existing systems often experience strategy drift and intention drift, which produce incoherent testing behaviors and ineffective exploitation chains.  Methods  This study proposes AutoPenGPT, a multi-agent framework for automated Web security testing. AutoPenGPT uses an adaptive exploration-space convergence mechanism that predicts likely vulnerability types from target semantics and constrains LLM-driven testing through a dynamically updated payload knowledge base. To reduce intention drift in multi-step exploitation, a dependency-driven strategy module rewrites historical feedback, models step dependencies, and generates coherent, executable strategies in a closed-loop workflow. A semi-structured prompt embedding scheme is also developed to support heterogeneous penetration testing tasks while preserving semantic integrity.  Results and Discussions  AutoPenGPT is evaluated on Capture-the-Flag (CTF) benchmarks and real-world ICS and Web platforms. On CTF datasets, it achieves 97.62% vulnerability-type detection accuracy and an 80.95% requirement completion rate, exceeding state-of-the-art tools by a wide margin. In real-world deployments, it reaches approximately 70% requirement completion and identifies six previously undisclosed vulnerabilities, demonstrating practical effectiveness.  Conclusions   The contributions are threefold. 
(1) Strategy drift and intention drift in LLM-driven penetration testing are examined and addressed through adaptive exploration and dependency-aware strategy mechanisms that stabilize long-horizon testing behaviors. (2) AutoPenGPT is designed and implemented as a multi-agent penetration testing system that integrates semantic vulnerability prediction, closed-loop strategy generation, and semi-structured prompt embedding. (3) Extensive evaluation on CTF and real-world ICS and Web platforms confirms the effectiveness and practicality of the system, including the discovery of previously unknown vulnerabilities.
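One way to picture the dependency modeling that AutoPenGPT uses against intention drift is to order multi-step actions by their prerequisites and to skip any step whose prerequisite failed, rather than attempting it out of context. The sketch below is hypothetical and is not AutoPenGPT's implementation. The step names are invented, and it relies only on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

def plan_steps(dependencies):
    """Order steps so each runs after its prerequisites.
    dependencies maps step -> set of prerequisite steps."""
    return list(TopologicalSorter(dependencies).static_order())

def run_chain(dependencies, execute):
    """Execute steps in dependency order. A step whose prerequisite did not
    succeed is marked 'skipped' instead of being run without its context."""
    status = {}
    for step in plan_steps(dependencies):
        prereqs = dependencies.get(step, set())
        if all(status.get(p) == "ok" for p in prereqs):
            status[step] = "ok" if execute(step) else "failed"
        else:
            status[step] = "skipped"
    return status

# Hypothetical four-step chain in which the second step fails.
deps = {"recon": set(), "login": {"recon"}, "upload": {"login"}, "verify": {"upload"}}
print(run_chain(deps, lambda step: step != "login"))
```

In a closed-loop system the failure status would be fed back to the planner to rewrite the chain. Here it only prevents dependent steps from running.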
Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning in Wireless Networks
LIU Jingyuan, MA Ke, XU Runchen, CHANG Zheng
Available online  , doi: 10.11999/JEIT251221
Abstract:
  Objective  Multimodal Federated Learning (MFL) uses complementary information from multiple modalities, yet in wireless edge networks it is restricted by limited energy and frequent missing modalities because many clients store only images or only reports. This study presents Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning (CREEMFL), which applies selective completion and joint communication–computation optimization to reduce training energy under latency and wireless constraints.  Methods  CREEMFL completes part of the incomplete samples by querying a public multimodal subset, and processes the remaining samples through zero padding. Each selected user downloads the global model, performs image-to-text or text-to-image retrieval, conducts local multimodal training, and uploads model updates for aggregation. An energy–delay model couples local computation and wireless communication and treats the required number of global rounds as a function of retrieval ratios. Based on this model, an energy minimization problem is formulated and solved using a two-layer algorithm with an outer search over retrieval ratios and an inner optimization of transmission time, Central Processing Unit (CPU) frequency, and transmit power.  Results and Discussions  Simulations on a single-cell wireless MFL system show that increasing the ratio of completing text from images improves test accuracy and reduces total energy. In contrast, a large ratio of completing images from text provides limited accuracy gain but increases energy consumption (Fig. 3, Fig. 4). Compared with four representative baselines, CREEMFL achieves shorter completion time and lower total energy across a wide range of maximum average transmit powers (Fig. 5, Fig. 6). For CREEMFL, increased system bandwidth further reduces completion time and energy consumption (Fig. 7, Fig. 8). 
Under different user modality compositions, CREEMFL also attains higher test accuracy than local training, zero padding, and cross-modal retrieval without energy optimization (Fig. 9).  Conclusions  CREEMFL integrates selective cross-modal retrieval and joint communication–computation optimization for energy-efficient MFL. By treating retrieval ratios as variables and modeling their effect on global convergence rounds, it captures the coupling between per-round costs and global training progress. Simulations verify that CREEMFL reduces training completion time and total energy while preserving classification accuracy in resource-constrained wireless edge networks.
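The energy-delay coupling and the two-layer search in CREEMFL can be sketched for a single user under common federated-learning modeling assumptions. The sketch assumes Shannon-rate uploads at a fixed transmit power, computation energy kappa * C * D * f^2, and a CPU frequency set to the lowest value that meets a per-round deadline. The paper's inner layer also optimizes transmission time and power; here power is fixed for brevity. `KAPPA`, all parameter values, and the `rounds_fn` model of global rounds versus retrieval ratio are hypothetical.

```python
import numpy as np

KAPPA = 1e-28  # effective switched capacitance of the CPU (assumed value)

def per_round_energy(D, cycles_per_sample, bits, p, gain, B, N0, T_round, f_max):
    """Energy (J) one user spends in one global round, or inf if the deadline
    cannot be met. The CPU runs at the lowest frequency that still meets
    T_round, because computation energy grows with f^2."""
    rate = B * np.log2(1.0 + p * gain / (N0 * B))   # uplink rate in bit/s
    t_com = bits / rate
    t_cmp = T_round - t_com
    if t_cmp <= 0:
        return np.inf
    f = cycles_per_sample * D / t_cmp               # required CPU cycles per second
    if f > f_max:
        return np.inf
    return KAPPA * cycles_per_sample * D * f ** 2 + p * t_com

def search_ratio(ratios, rounds_fn, base_samples, **kw):
    """Outer layer: grid search over the retrieval ratio rho. Completing a
    fraction rho of samples adds rho * base_samples of local data, while the
    hypothetical rounds_fn(rho) gives the global rounds needed to converge."""
    energies = [rounds_fn(r) * per_round_energy(base_samples * (1.0 + r), **kw)
                for r in ratios]
    best = int(np.argmin(energies))
    return ratios[best], energies[best]

kw = dict(cycles_per_sample=1e6, bits=1e6, p=0.1, gain=1e-10, B=1e6, N0=1e-20,
          T_round=1.0, f_max=2e9)
print(search_ratio([0.0, 0.25, 0.5, 1.0], lambda r: 120 - 60 * r, 500, **kw))
```

Whether a nonzero ratio wins depends entirely on how fast `rounds_fn` decreases relative to the extra per-round computation. This is the trade-off the abstract says CREEMFL captures by treating retrieval ratios as optimization variables.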
Battery Pack Multi-fault Diagnosis Algorithm Based on Dual-Perspective Spectral Attention Fusion
LIU Mingjun, GU Shenyu, YIN Jingde, ZHANG Yifan, DONG Zhekang, JI Xiaoyue
Available online  , doi: 10.11999/JEIT251156
Abstract:
  Objective  With the rapid growth of electric vehicles and their widespread deployment, battery pack faults have become more frequent, creating an urgent need for efficient fault diagnosis methods. Although deep learning-based approaches have achieved notable progress, existing studies remain limited in addressing multiple fault types, such as Internal Short Circuit (ISC), sensor noise, sensor drift, and State-Of-Charge (SOC) inconsistency, and in modeling the coupling relationships among these faults. To address these limitations, a multi-fault diagnosis algorithm for battery packs based on dual-perspective spectral attention is proposed. A dual-perspective tokenization module is designed to extract spatiotemporal features from battery data, while a spectral attention mechanism addresses non-stationary time-series characteristics and captures long-term dependencies, thereby improving diagnostic performance.   Methods  To improve spatiotemporal feature extraction and fault diagnosis performance, a dual-perspective spectral attention fusion algorithm for battery pack multi-fault diagnosis is proposed. The overall architecture consists of four core modules (Fig. 3): a dual-perspective tokenization module, a spectral attention module, a feature fusion module, and an output module. The dual-perspective tokenization module applies positional encoding to jointly model temporal and spatial dimensions, enabling comprehensive spatiotemporal feature representation. When combined with the spectral attention mechanism, the capability of the model to handle non-stationary characteristics is strengthened, leading to improved diagnostic performance. In addition, to address the lack of comprehensive publicly available datasets for battery pack fault diagnosis, a new dataset is constructed, covering ISC, sensor noise, sensor drift, and SOC inconsistency faults. 
The dataset includes three operating conditions, FUDS, UDDS, and US06, which alleviates data scarcity in this research field.  Results and Discussions  Experimental results indicate that the proposed method improves average precision, recall, F1 score, and accuracy by 10.98%, 12.64%, 13.84%, and 13.45%, respectively, compared with existing optimal fault diagnosis methods. Comparison experiments under different operating conditions (Table 6) support this conclusion. Conventional convolutional neural network methods perform well in local feature extraction; however, fixed-size convolution kernels are not well suited to time features with varying frequencies, which limits long-term temporal dependency modeling and global feature capture. Recurrent neural network-based methods show reduced computational efficiency when large-scale datasets are processed. Transformer-based models face constraints in spatial feature extraction and in representing temporal variations. By contrast, the proposed algorithm addresses these limitations through an integrated architectural design. Ablation experiments demonstrate the contribution of each module to overall performance (Table 7), and the complete framework improves average F1 score and accuracy by 9.30% and 9.26%, respectively, compared with ablation variants. Robustness analysis under simulated noise conditions (Table 8) shows that the proposed method achieves accuracy improvements ranging from 49.95% to 124.34% over baseline methods at noise levels from –2 dB to –8 dB, indicating strong noise resistance.  Conclusions  A multi-fault diagnosis algorithm for battery packs is presented that integrates dual-perspective tokenization and spectral attention to combine spatiotemporal and spectral information. The dual-perspective tokenization module performs tokenization and positional encoding along temporal and spatial axes, which improves spatiotemporal representation. 
The spectral attention mechanism strengthens modeling of non-stationary signals and long-term dependencies. Experiments under FUDS, UDDS, and US06 driving cycles show that the proposed method outperforms existing multi-fault diagnosis approaches, with average gains of 13.84% in F1 score and 13.45% in accuracy. Ablation studies confirm that both modules contribute substantially and that their combination enables effective handling of complex time-series features. Under high-noise conditions (–2 dB, –4 dB, –6 dB, and –8 dB), the method also shows improved robustness, with accuracy gains of 49.95%, 90.39%, 112.01%, and 124.34%, respectively, compared with baseline methods. Several limitations remain. First, the data are mainly derived from laboratory simulations, and further validation under real-world operating conditions is required. Second, the effect of fault severity on battery management system hierarchical decision making has not been fully addressed, and future work will focus on establishing a fault severity grading strategy. Third, physical interpretability requires further improvement, and subsequent studies will explore the integration of equivalent circuit models or electrochemical mechanism models to balance diagnostic accuracy and interpretability.
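The dual-perspective tokenization described in the battery abstract, which tokenizes and positionally encodes along both the temporal and spatial axes, can be sketched in NumPy. The sketch assumes a window of shape (T time steps, N cells), random linear projections standing in for learned embeddings, the standard Transformer sine/cosine positional encoding, and an even `d_model`. The spectral attention module is not reproduced because the abstract does not define it in enough detail.

```python
import numpy as np

def sinusoidal_encoding(n_positions, d_model):
    """Standard Transformer sine/cosine positional encoding, shape (n_positions, d_model)."""
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def dual_perspective_tokens(x, d_model, rng):
    """Tokenize a (T, N) window two ways.
    Temporal view: one token per time step (all cells at that instant).
    Spatial view: one token per cell (that cell's whole time series).
    Random projections stand in for learned embedding layers (assumption)."""
    T, N = x.shape
    W_time = rng.standard_normal((N, d_model)) / np.sqrt(N)
    W_space = rng.standard_normal((T, d_model)) / np.sqrt(T)
    temporal = x @ W_time + sinusoidal_encoding(T, d_model)
    spatial = x.T @ W_space + sinusoidal_encoding(N, d_model)
    return temporal, spatial

rng = np.random.default_rng(0)
window = rng.standard_normal((96, 12))   # e.g. 96 samples of 12 cell voltages (hypothetical)
t_tok, s_tok = dual_perspective_tokens(window, 32, rng)
print(t_tok.shape, s_tok.shape)          # (96, 32) (12, 32)
```

The two token sets would then feed separate attention branches before feature fusion, matching the four-module layout the abstract describes.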
Two-Channel Joint Coding Detection for Cyber-Physical Systems Against Integrity Attacks
MO Xiaolei, ZENG Weixin, FU Jiawei, DOU Keqin, WANG Yanwei, SUN Ximing, LIN Sida, SUI Tianju
Available online  , doi: 10.11999/JEIT250729
Abstract:
  Objective  Cyber-Physical Systems (CPS) are widely applied across infrastructure, aviation, energy, healthcare, manufacturing, and transportation, as computing, control, and sensing technologies advance. Due to the real-time interaction between information and physical processes, such systems are exposed to security risks during data exchange. Attacks on CPS can be grouped into availability, integrity, and reliability attacks based on information security properties. Integrity attacks manipulate data streams to disrupt the consistency between system inputs and outputs. Compared with the other two types, integrity attacks are more difficult to detect because of their covert and dynamic nature. Existing detection strategies generally modify control signals, sensing signals, or system models. Although these approaches can detect specific categories of attacks, they may reduce control performance and increase model complexity and response delay.  Methods  A joint additive and multiplicative coding detection scheme for the two-channel structure of control and output is proposed. Three representative integrity attacks are tested, including a control-channel bias attack, an output-channel replay attack, and a two-channel covert attack. These attacks remain stealthy by partially or fully obtaining system information and manipulating data so the residual-based χ² detector output stays below the detection threshold. The proposed method introduces paired additive watermarking signals with positive and negative patterns, together with paired multiplicative coding and decoding matrices on both channels. These additional unknown signals and parameters introduce information uncertainty to the attacker and cause the residual statistics to deviate from the expected values constructed using known system information. The watermarking pairs and matrix pairs operate through different mechanisms. One uses opposite-sign injection, while the other uses a mutually inverse transformation. 
Therefore, normal control performance is maintained when no attack is present. The time-varying structure also prevents attackers from reconstructing or bypassing the detection mechanism.  Results and Discussions  Simulation experiments on an aerial vehicle trajectory model are conducted to assess both the influence of integrity attacks on flight paths and the effectiveness of the proposed detection scheme. The trajectory is modeled using Newton’s equations of motion, and attitude dynamics and rotational motion are omitted to focus on positional behavior. Detection performance with and without the proposed method is compared under the three attack scenarios (Fig. 2, Fig. 3, Fig. 4). The results show that the proposed scheme enables effective identification of all attack types and maintains stable system behavior, demonstrating its practical applicability and improvement over existing approaches.  Conclusions  This study addresses the detection of integrity attacks in CPS. Three representative attack types (bias, replay, and covert attacks) are modeled, and the conditions required for their successful execution are analyzed. A detection approach combining additive watermarking and multiplicative encoding matrices is proposed and shown to detect all three attack types. The design uses paired positive–negative additive watermarks and paired encoding and decoding matrices to ensure accurate detection while maintaining normal control performance. A time-varying configuration is adopted to prevent attackers from reconstructing or bypassing the detection elements. Using an aerial vehicle trajectory simulation, the proposed approach is demonstrated to be effective and applicable to cyber-physical system security enhancement.
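The mechanism described above for the output channel (a watermark added with one sign and removed with the opposite sign, coding and decoding matrices that are mutual inverses, and both refreshed every step) can be illustrated with a simplified replay-attack simulation against a χ² detector. The sketch assumes a static plant output `y_ref` plus Gaussian noise. It omits plant dynamics and the control channel, and all distributions and parameter values are hypothetical.

```python
import numpy as np

def encode(y, gamma, v):
    """Sensor side: add the watermark +v, then apply the coding matrix gamma."""
    return gamma @ (y + v)

def decode(z, gamma, v):
    """Controller side: apply gamma^-1, then remove the watermark with -v."""
    return np.linalg.solve(gamma, z) - v

def chi2_stat(r, sigma):
    """Chi-square statistic r^T Sigma^-1 r with Sigma = sigma^2 I."""
    return float(r @ r) / sigma ** 2

def simulate(coded=True, steps=400, replay_from=200, lag=100,
             sigma=0.1, sigma_w=0.1, seed=0):
    """Chi-square statistic per step. From replay_from on, the attacker replaces
    each packet with the genuine packet sent lag steps earlier."""
    rng = np.random.default_rng(seed)
    y_ref = np.array([1.0, -0.5])            # steady-state output (assumed)
    sent, g = [], np.empty(steps)
    for k in range(steps):
        # Time-varying coding matrix and watermark, refreshed at every step.
        gamma = np.diag(rng.uniform(0.5, 2.0, 2)) if coded else np.eye(2)
        v = sigma_w * rng.standard_normal(2) if coded else np.zeros(2)
        y = y_ref + sigma * rng.standard_normal(2)
        z = encode(y, gamma, v)
        sent.append(z)
        if k >= replay_from:
            z = sent[k - lag]                # replayed, previously valid packet
        r = decode(z, gamma, v) - y_ref      # residual against the model prediction
        g[k] = chi2_stat(r, sigma)
    return g

print(simulate(coded=False)[200:].mean())   # stays near 2: replay is invisible
print(simulate(coded=True)[200:].mean())    # far above 2: replay is exposed
```

Without an attack, decoding undoes the encoding exactly, so the residual is pure noise in both cases. This mirrors the abstract's point that normal performance is preserved when no attack is present.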
One-pass Architectural Synthesis for Continuous-Flow Microfluidic Biochips Based on Deep Reinforcement Learning
LIU Genggeng, JIAO Xinyue, PAN Youlin, HUANG Xing
Available online  , doi: 10.11999/JEIT251058
Abstract:
Continuous-Flow Microfluidic Biochips (CFMBs) are widely applied in biomedical research because of miniaturization, high reliability, and low sample consumption. As integration density increases, design complexity significantly rises. Conventional stepwise design methods treat binding, scheduling, layout, and routing as separate stages, with limited information exchange across stages, which leads to reduced solution quality and extended design cycles. To address this limitation, a one-pass architectural synthesis method for CFMBs is proposed based on Deep Reinforcement Learning (DRL). Graph Convolutional Networks (GCNs) are used to extract state features, capturing structural characteristics of operations and their relationships. Proximal Policy Optimization (PPO), combined with the A* algorithm and list scheduling, ensures rational layout and routing while providing accurate information for operation scheduling. A multiobjective reward function is constructed by normalizing and weighting biochemical reaction time, total channel length, and valve count, enabling efficient exploration of the decision space through policy gradient updates. Experimental results show that the proposed method achieves a 2.1% reduction in biochemical reaction time, a 21.3% reduction in total channel length, and a 65.0% reduction in valve count on benchmark test cases, while maintaining feasibility for larger-scale chips.  Objective  CFMBs have gained sustained attention in biomedical applications because of miniaturization, high reliability, and low sample consumption. With increasing integration density, design complexity escalates substantially. Traditional stepwise design methods often yield suboptimal solutions, extended design cycles, and feasibility limitations for large-scale chips. To address these challenges, a one-pass architectural synthesis framework is proposed that integrates DRL to achieve coordinated optimization of binding, scheduling, layout, and routing.  
Methods  All CFMB design tasks are integrated into a unified optimization framework formulated as a Markov decision process. The state space includes device binding information, device locations, operation priorities, and related parameters, whereas the action space adjusts device placement, operation-to-device binding, and operation priority. High-dimensional state features are extracted using GCNs. PPO is applied to iteratively update policies. The reward function accounts for biochemical reaction time, total flow-channel length, and the number of additional valves. These metrics are evaluated using the A* algorithm and list scheduling, normalized, and weighted to balance trade-offs among objectives.  Results and Discussions  Based on the current state and candidate actions, architectural solutions are generated iteratively through PPO-guided policy updates combined with the A* algorithm and list scheduling. The defined reward function enables the generation of CFMB architectures with improved overall quality. Experimental results show an average reduction of 2.1% in biochemical reaction time, an average reduction of 21.3% in total flow-channel length, with a maximum reduction of 57.1% in the ProteinSplit benchmark, and an average reduction of 65.0% in additional valve count compared with existing methods. These improvements reduce manufacturing cost and operational risk.  Conclusions  A one-pass architectural synthesis method for CFMBs based on DRL is proposed to address flow-layer design challenges. By applying GCN-based state feature extraction and PPO-based policy optimization, the multiobjective design problem is transformed into a sequential decision-making process that enables joint optimization of binding, scheduling, layout, and routing. 
Experimental results obtained from multiple benchmark test cases confirm improved performance in biochemical reaction completion time, total channel length, and valve count, while preserving scalability for larger chip designs.
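The CFMB abstract evaluates candidate architectures with A* routing and a normalized, weighted reward over reaction time, channel length, and valve count. A minimal sketch of both pieces follows, assuming a 4-connected routing grid (0 = free cell, 1 = blocked), normalization by reference values such as a baseline design, and hypothetical weights (0.4, 0.4, 0.2). The paper's actual grid model and weights are not given in the abstract.

```python
import heapq

def astar_length(grid, start, goal):
    """Shortest 4-connected path length on a grid (0 = free, 1 = blocked), or None."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry superseded by a cheaper path
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

def reward(exec_time, channel_length, valves, ref, weights=(0.4, 0.4, 0.2)):
    """Negative weighted sum of metrics normalized by reference values,
    so a cheaper design earns a higher reward (weights are assumptions)."""
    t_ref, l_ref, v_ref = ref
    w_t, w_l, w_v = weights
    return -(w_t * exec_time / t_ref + w_l * channel_length / l_ref + w_v * valves / v_ref)

grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
length = astar_length(grid, (0, 0), (2, 0))   # must detour around the wall
print(length, reward(100.0, length, 4, ref=(100.0, 10.0, 8)))
```

Normalizing each metric before weighting keeps objectives with very different units (seconds, grid cells, valve counts) on a comparable scale for the policy gradient.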
Multi-Scale Region of Interest Feature Fusion for Palmprint Recognition
MA Yuxuan, ZHANG Feifei, LI Guanghui, TANG Xin, DONG Zhengyang
Available online  , doi: 10.11999/JEIT250940
Abstract:
  Objective  Accurate localization of the Region Of Interest (ROI) is a prerequisite for high-precision palmprint recognition. In contactless and uncontrolled application scenarios, complex background illumination and diverse hand postures frequently cause ROI localization offsets. Most existing deep learning-based recognition methods rely on a single fixed-size ROI as input. Although some approaches adopt multi-scale convolution kernels, fusion at the ROI level is not performed, which makes these methods highly sensitive to localization errors. Therefore, small deviations in ROI extraction often result in severe performance degradation, which restricts practical deployment. To overcome this limitation, a Multi-scale ROI Feature Fusion Mechanism is proposed, and a corresponding model, termed ROI3Net, is designed. The objective is to construct a recognition system that is inherently robust to localization errors by integrating complementary information from multiple ROI scales. This strategy reinforces shared intrinsic texture features while suppressing scale-specific noise introduced by positioning inaccuracies.  Methods  The proposed ROI3Net adopts a dual-branch architecture consisting of a Feature Extraction Network and a lightweight Weight Prediction Network (Fig. 4). The Feature Extraction Network employs a sequence of Multi-Scale Residual Blocks (MSRBs) to process ROIs at three progressive scales (1.00×, 1.25×, and 1.50×) in parallel. Within each MSRB, dense connections are applied to promote feature reuse and reduce information loss (Eq. 3). Convolutional Block Attention Modules (CBAMs) are incorporated to adaptively refine features in both the channel and spatial dimensions. The Weight Prediction Network is implemented as an end-to-end lightweight module. 
It takes raw ROI images as input and processes them using a serialized convolutional structure (Conv2d-BN-GELU-MaxPool), followed by a Multi-Layer Perceptron (MLP) head, to predict a dynamic weight vector for each scale. This subnetwork is optimized for efficiency, containing 2.38 million parameters, which accounts for approximately 6.2% of the total model parameters, and requiring 103.2 MFLOPs, which corresponds to approximately 2.1% of the total computational cost. The final feature representation is obtained through a weighted summation of multi-scale features (Eq. 1 and Eq. 2), which mathematically maximizes the information entropy of the fused feature vector.  Results and Discussions  Experiments are conducted on six public palmprint datasets: IITD, MPD, NTU-CP, REST, CASIA, and BMPD. Under ideal conditions with accurate ROI localization, ROI3Net demonstrates superior performance compared with state-of-the-art single-scale models. For instance, a Rank-1 accuracy of 99.90% is achieved on the NTU-CP dataset, and a Rank-1 accuracy of 90.17% is achieved on the challenging REST dataset (Table 1). Model robustness is further evaluated by introducing a random 10% localization offset. Under this condition, conventional models exhibit substantial performance degradation. For example, the Equal Error Rate (EER) of the CO3Net model on NTU-CP increases from 2.54% to 15.66%. In contrast, ROI3Net maintains stable performance, with the EER increasing only from 1.96% to 5.01% (Fig. 7, Table 2). The effect of affine transformations, including rotation (±30°) and scaling (0.85×–1.15×), is also analyzed. Rotation causes feature distortion because standard convolution operations lack rotation invariance, whereas the proposed multi-scale mechanism effectively compensates for translation errors by expanding the receptive field (Table 3). 
Generalization experiments further confirm that embedding this mechanism into existing models, including CCNet, CO3Net, and RLANN, significantly improves robustness (Table 6). In terms of efficiency, although the theoretical computational load increases by approximately 150%, the actual GPU inference time increases by only about 20% (6.48 ms) because the multi-scale branches are processed independently and in parallel (Table 7).  Conclusions  A Multi-scale ROI Feature Fusion Mechanism is presented to reduce the sensitivity of palmprint recognition systems to localization errors. By employing a lightweight Weight Prediction Network to adaptively fuse features extracted from different ROI scales, the proposed ROI3Net effectively combines fine-grained texture details with global semantic information. Experimental results confirm that this approach significantly improves robustness to translation errors by recovering truncated texture information, whereas the efficient design of the Weight Prediction Network limits computational overhead. The proposed mechanism also exhibits strong generalization ability when integrated into different backbone networks. This study provides a practical and resilient solution for palmprint recognition in unconstrained environments. Future work will explore non-linear fusion strategies, such as graph neural networks, to further exploit cross-scale feature interactions.
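The multi-scale ROI fusion in ROI3Net (crops at 1.00×, 1.25×, and 1.50× fused by a weighted sum with predicted per-scale weights) can be sketched as follows. The sketch assumes square crops around a known ROI center, nearest-neighbour resizing, and softmax-normalized weights. The feature extractor and weight predictor are hypothetical callables standing in for the paper's MSRB network and its Conv2d-BN-GELU-MaxPool + MLP weight network.

```python
import numpy as np

def crop_and_resize(img, center, base_size, scale, out_size):
    """Square crop of side scale * base_size around center, resampled with
    nearest-neighbour indexing to out_size x out_size (edges clamped)."""
    half = scale * base_size / 2.0
    offsets = np.linspace(-half, half, out_size)
    rows = np.clip(np.round(center[0] + offsets).astype(int), 0, img.shape[0] - 1)
    cols = np.clip(np.round(center[1] + offsets).astype(int), 0, img.shape[1] - 1)
    return img[np.ix_(rows, cols)]

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def fuse_multiscale(img, center, base_size, extract, predict_logits,
                    scales=(1.00, 1.25, 1.50), out_size=64):
    """Weighted sum of per-scale features; weights come from a (hypothetical)
    predictor on the raw ROIs and are softmax-normalized to sum to 1."""
    rois = [crop_and_resize(img, center, base_size, s, out_size) for s in scales]
    feats = np.stack([extract(r) for r in rois])          # (S, D)
    weights = softmax(predict_logits(np.stack(rois)))     # (S,)
    return weights @ feats, weights

# Toy stand-ins: a two-number "feature" and a predictor favouring larger scales.
rng = np.random.default_rng(0)
palm = rng.uniform(size=(256, 256))
extract = lambda roi: np.array([roi.mean(), roi.std()])
fused, w = fuse_multiscale(palm, (128, 128), 128, extract,
                           lambda rois: np.array([0.0, 0.5, 1.0]))
print(w, fused)
```

Because the larger crops still contain the texture that a shifted 1.00× crop cuts off, the fused vector degrades more gracefully under localization offsets. This is the intuition behind the robustness results reported above.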
Joint Mask and Multi-Frequency Dual Attention GAN Network for CT-to-DWI Image Synthesis in Acute Ischemic Stroke
ZHANG Zehua, ZHAO Ning, WANG Shuai, WANG Xuan, ZHENG Qiang
Available online  , doi: 10.11999/JEIT250643
Abstract:
  Objective  In the clinical management of Acute Ischemic Stroke (AIS), Computed Tomography (CT) and Diffusion-Weighted Imaging (DWI) serve complementary roles at different stages. CT is widely applied for initial evaluation due to its rapid acquisition and accessibility, but it has limited sensitivity in detecting early ischemic changes, which can result in diagnostic uncertainty. In contrast, DWI demonstrates high sensitivity to early ischemic lesions, enabling visualization of diffusion-restricted regions soon after symptom onset. However, DWI acquisition takes longer, is susceptible to motion artifacts, and depends on scanner availability and patient cooperation, thereby reducing its clinical accessibility. The limited availability of multimodal imaging data remains a major challenge for timely and accurate AIS diagnosis. Therefore, a method capable of rapidly and accurately generating DWI images from CT scans is clinically important for improving diagnostic precision and guiding treatment planning. Existing medical image translation approaches primarily rely on statistical image features and overlook anatomical structures, which leads to blurred lesion regions and reduced structural fidelity.  Methods  This study proposes a Joint Mask and Multi-Frequency Dual Attention Generative Adversarial Network (JMMDA-GAN) for CT-to-DWI image synthesis to assist in the diagnosis and treatment of ischemic stroke. The approach incorporates anatomical priors from brain masks and adaptive multi-frequency feature fusion to improve image translation accuracy. JMMDA-GAN comprises three principal modules: a mask-guided feature fusion module, a multi-frequency attention encoder, and an adaptive fusion weighting module. The mask-guided feature fusion module integrates CT images with anatomical masks through convolution, embedding spatial priors to enhance feature representation and texture detail within brain regions and ischemic lesions. 
The multi-frequency attention encoder applies the Discrete Wavelet Transform (DWT) to decompose images into low-frequency global components and high-frequency edge components. A dual-path attention mechanism facilitates cross-scale feature fusion, reducing high-frequency information loss and improving structural detail reconstruction. The adaptive fusion weighting module combines convolutional neural networks and attention mechanisms to dynamically learn the relative importance of input features. By assigning adaptive weights to multi-scale features, the module selectively enhances informative regions and suppresses redundant or noisy information. This process enables effective integration of low- and high-frequency features, thereby improving both global contextual consistency and local structural precision.  Results and Discussions  Extensive experiments were performed on two independent clinical datasets collected from different hospitals to assess the effectiveness of the proposed method. JMMDA-GAN achieved Mean Squared Error (MSE) values of 0.0097 and 0.0059 on Clinical Dataset 1 and Clinical Dataset 2, respectively, outperforming state-of-the-art models and reducing MSE by 35.8% and 35.2% relative to ARGAN. The proposed network reached Peak Signal-to-Noise Ratio (PSNR) values of 26.75 dB and 28.12 dB, improvements of 30.7% and 7.9% over the best existing methods. For the Structural Similarity Index (SSIM), JMMDA-GAN achieved 0.753 and 0.844, indicating superior structural preservation and perceptual quality. Visual analysis further demonstrates that JMMDA-GAN restores lesion morphology and fine texture features with higher fidelity, producing sharper lesion boundaries and improved structural consistency compared with other methods. Cross-center generalization and multi-center mixed experiments confirm that the model maintains stable performance across institutions, highlighting its robustness and adaptability in clinical settings. 
Parameter sensitivity analysis shows that the combination of Haar wavelet and four attention heads achieves an optimal balance between global structural retention and local detail reconstruction. Moreover, superpixel-based gray-level correlation experiments demonstrate that JMMDA-GAN exceeds existing models in both local consistency and global image quality, confirming its capacity to generate realistic and diagnostically reliable DWI images from CT inputs.  Conclusions  This study proposes a novel JMMDA-GAN designed to enhance lesion and texture detail generation by incorporating anatomical structural information. The method achieves this through three principal modules. (1) The mask-guided feature fusion module effectively integrates anatomical structure information, with particular optimization of the lesion region. The mask-guided network focuses on critical lesion features, ensuring accurate restoration of lesion morphology and boundaries. By combining mask and image data, the method preserves the overall anatomical structure while enhancing lesion areas, preventing boundary blurring and texture loss commonly observed in traditional approaches, thereby improving diagnostic reliability. (2) The multi-frequency feature fusion module jointly optimizes low- and high-frequency features to enhance image detail. This integration preserves global structural integrity while refining local features, producing visually realistic and high-fidelity images. (3) The adaptive fusion weighting module dynamically adjusts the learning strategy for frequency-domain features according to image content, enabling the network to manage texture variations and complex anatomical structures effectively, thereby improving overall image quality. Through the coordinated function of these modules, the proposed method enhances image realism and diagnostic precision. 
Experimental results demonstrate that JMMDA-GAN exceeds existing advanced models across multiple clinical datasets, highlighting its potential to support clinicians in the diagnosis and management of AIS.
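The abstract reports that the Haar wavelet gives the best balance in the encoder. As a minimal illustration of the decomposition step only (not the dual-path attention encoder), the pure-Python sketch below performs a single-level orthonormal 2-D Haar DWT, splitting an image into one low-frequency subband and three high-frequency detail subbands. The subband naming and the toy image are assumptions of this sketch; libraries use differing conventions.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One-level orthonormal 2-D Haar transform of an image with even height and width.

    Returns (LL, LH, HL, HH): LL is the low-frequency approximation, LH holds
    differences between adjacent columns (vertical edges), HL differences between
    adjacent rows (horizontal edges), and HH diagonal detail.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)
            row_lh.append((a - b + c - d) / 2)
            row_hl.append((a + b - c - d) / 2)
            row_hh.append((a - b - c + d) / 2)
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


# A 4x4 image with a vertical intensity edge inside the right-hand 2x2 blocks.
img = [[0.0, 0.0, 0.0, 1.0] for _ in range(4)]
ll, lh, hl, hh = haar_dwt2(img)
print(ll, lh, hl, hh)
```

The edge shows up only in the column-difference subband, and because the transform is orthonormal the total energy of the four subbands equals that of the input.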
Modeling, Detection, and Defense Theories and Methods for Cyber-Physical Fusion Attacks in Smart Grid
WANG Wenting, TIAN Boyan, WU Fazong, HE Yunpeng, WANG Xin, YANG Ming, FENG Dongqin
Available online  , doi: 10.11999/JEIT250659
Abstract:
  Significance   Smart Grid (SG), the core of modern power systems, enables efficient energy management and dynamic regulation through cyber–physical integration. However, its high interconnectivity makes it a prime target for cyberattacks, including False Data Injection Attacks (FDIAs) and Denial-of-Service (DoS) attacks. These threats jeopardize the stability of power grids and may trigger severe consequences such as large-scale blackouts. Therefore, advancing research on the modeling, detection, and defense of cyber–physical attacks is essential to ensure the safe and reliable operation of SGs.  Progress   Significant progress has been achieved in cyber–physical security research for SGs. In attack modeling, discrete linear time-invariant system models effectively capture diverse attack patterns. Detection technologies are advancing rapidly, with physics-based methods (e.g., physical watermarking and moving target defense) complementing intelligent algorithms (e.g., deep learning and reinforcement learning). Defense systems are also being strengthened: lightweight encryption and blockchain technologies are applied to prevention, security-optimized Phasor Measurement Unit (PMU) deployment enhances equipment protection, and response mechanisms are being continuously refined.  Conclusions  Current research still requires improvement in attack modeling accuracy and real-time detection algorithms. Future work should focus on developing collaborative protection mechanisms between the cyber and physical layers, designing solutions that balance security with cost-effectiveness, and validating defense effectiveness through high-fidelity simulation platforms. This study establishes a systematic theoretical framework and technical roadmap for SG security, providing essential insights for safeguarding critical infrastructure.  
Prospects   Future research should advance in several directions: (1) deepening synergistic defense mechanisms between the information and physical layers; (2) prioritizing the development of cost-effective security solutions; (3) constructing high-fidelity information–physical simulation platforms to support research; and (4) exploring the application of emerging technologies such as digital twins and interpretable Artificial Intelligence (AI).
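Why FDIAs are hard to detect is easiest to see against least-squares state estimation with a residual-based bad data detector. The sketch below illustrates the classic stealthy construction: an attacker who knows the measurement matrix H injects a = Hc, so the state estimate shifts by c while the residual does not change, whereas arbitrary bad data raises the residual. The two-bus DC model, the noise values, and all names are hypothetical and are not taken from the surveyed work.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(h: Matrix, x: Vector) -> Vector:
    """Multiply a measurement matrix by a state vector."""
    return [sum(hij * xj for hij, xj in zip(row, x)) for row in h]


def estimate_state(h: Matrix, z: Vector) -> Vector:
    """Least-squares estimate for a 2-state model via the normal equations."""
    a = sum(r[0] * r[0] for r in h)
    b = sum(r[0] * r[1] for r in h)
    d = sum(r[1] * r[1] for r in h)
    g0 = sum(r[0] * zi for r, zi in zip(h, z))
    g1 = sum(r[1] * zi for r, zi in zip(h, z))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("H^T H is singular; the state is unobservable")
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]


def residual_norm(h: Matrix, z: Vector) -> float:
    """Norm of z - H x_hat, the statistic checked by residual-based detection."""
    x_hat = estimate_state(h, z)
    return math.sqrt(sum((zi - hx) ** 2 for zi, hx in zip(z, matvec(h, x_hat))))


# Toy DC model: two bus angles measured directly plus a unit-susceptance line flow (x1 - x2).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
x_true = [0.20, 0.10]
noise = [0.01, -0.02, 0.005]
z = [m + e for m, e in zip(matvec(H, x_true), noise)]

c = [0.05, -0.03]                                        # state error the attacker wants
stealthy = [zi + ai for zi, ai in zip(z, matvec(H, c))]  # attack vector a = H c
naive = [z[0] + 0.05, z[1], z[2]]                        # arbitrary bad data on one meter

print(residual_norm(H, z), residual_norm(H, stealthy), residual_norm(H, naive))
```

This is why the detection methods listed above (for example physical watermarking and moving target defense) aim to deny the attacker exact knowledge of H.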
An EEG Emotion Recognition Model Integrating Memory and Self-attention Mechanisms
LIU Shanrui, BI Yingzhou, HUO Leigang, GAN Qiujing, ZHOU Shuheng
Available online  , doi: 10.11999/JEIT250737
Abstract:
  Objective  ElectroEncephaloGraphy (EEG) is a noninvasive technique for recording neural signals and provides rich emotional and cognitive information for brain science research and affective computing. Although Transformer-based models demonstrate strong global modeling capability in EEG emotion recognition, their multi-head self-attention mechanisms do not reflect the characteristics of brain-generated signals that exhibit a forgetting effect. In human cognition, emotional or cognitive states from distant time points gradually decay, whereas existing Transformer-based approaches emphasize temporal relevance only and neglect this forgetting behavior. This limitation reduces recognition performance. Therefore, a model is designed to account for both temporal relevance and the intrinsic forgetting effect of brain activity.  Methods  A novel EEG emotion recognition model, termed Memory Self-Attention (MSA), is proposed by embedding a memory-based forgetting mechanism into the standard self-attention framework. The MSA mechanism integrates global semantic modeling with a biologically inspired memory decay component. For each attention head, a memory forgetting score is learned through two independent linear decay curves to represent natural attenuation over time. These scores are combined with conventional attention weights so that temporal relationships are adjusted by distance-aware forgetting behavior. This design improves performance with a negligible increase in model parameters and computational cost. An Aggregated Convolutional Neural Network (ACNN) is first applied to extract spatiotemporal features across EEG channels. The MSA module then captures global dependencies and memory-aware interactions. The refined representations are finally passed to a classification head to generate predictions.  Results and Discussions  The proposed model is evaluated on several benchmark EEG emotion recognition datasets. 
On the DEAP binary classification task, classification accuracies of 98.87% for valence and 98.30% for arousal are achieved. On the SEED three-class task, an accuracy of 97.64% is obtained, and on the SEED-IV four-class task, the accuracy reaches 95.90%. These results (Figs. 3–5, Tables 3–5) exceed those of most mainstream methods, indicating the effectiveness and robustness of the proposed approach across different datasets and emotion classification settings.  Conclusions  An effective and biologically informed method for EEG-based emotion recognition is presented by incorporating a memory forgetting mechanism into a Transformer architecture. The proposed MSA model captures both temporal correlations and forgetting characteristics of brain signals, providing a lightweight and accurate solution for multi-class emotion recognition. Experimental results confirm its strong performance and generalizability.
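The abstract says that each head learns a memory forgetting score from two independent linear decay curves and combines it with the ordinary attention weights. One hypothetical reading, sketched below, uses one slope for keys earlier than the query and another for later keys, and subtracts the resulting distance penalty from the scaled dot-product scores before the softmax. The exact combination in MSA may differ; the slopes here are fixed scalars rather than learned per-head parameters, and all names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def memory_attention_weights(q: Matrix, k: Matrix, slope_past: float,
                             slope_future: float) -> Matrix:
    """Scaled dot-product attention weights with a linear forgetting penalty.

    The score of key j seen from query i is reduced by slope_past * (i - j) when j
    is earlier than i, and by slope_future * (j - i) when j is later.
    """
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            dot = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            gap = i - j
            forget = slope_past * gap if gap > 0 else slope_future * (-gap)
            scores.append(dot - forget)
        weights.append(softmax(scores))
    return weights


def memory_self_attention(q: Matrix, k: Matrix, v: Matrix, slope_past: float,
                          slope_future: float) -> Matrix:
    """Attention output: forgetting-adjusted weights applied to the values."""
    w = memory_attention_weights(q, k, slope_past, slope_future)
    return [[sum(w_ij * v[j][c] for j, w_ij in enumerate(row)) for c in range(len(v[0]))]
            for row in w]


# Four time steps with identical queries and keys, so only the forgetting term differs.
q = k = [[1.0, 0.0]] * 4
v = [[float(t)] for t in range(4)]
w = memory_attention_weights(q, k, slope_past=0.5, slope_future=0.5)
print([round(x, 3) for x in w[3]])  # the last step attends most to recent steps
```

The penalty adds only two scalars per head, which is consistent with the abstract's claim of a negligible parameter increase.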
High-Efficiency Side-Channel Analysis: From Collaborative Denoising to Adaptive B-Spline Dimension Reduction
LUO Yuling, XU Haiyang, OUYANG Xue, FU Qiang, QIN Sheng, LIU Junxiu
Available online  , doi: 10.11999/JEIT251047
Abstract:
  Objective  The performance of side-channel attacks is often constrained by the low signal-to-noise ratio of raw power traces, the masking of local leakage by redundant high-dimensional data, and the reliance on empirically chosen preprocessing parameters. Existing studies typically optimize individual stages, such as denoising or dimensionality reduction, in isolation, lack a unified framework, and fail to balance signal-to-noise ratio enhancement with the preservation of local leakage features. A unified analysis framework is therefore proposed to integrate denoising, adaptive parameter selection, and dimensionality reduction while preserving local leakage characteristics. Through coordinated optimization of these components, both the efficiency and robustness of side-channel attacks are improved.  Methods  Based on the similarity of power traces corresponding to identical plaintexts and the local approximation properties of B-splines, a side-channel analysis method combining collaborative denoising and Adaptive B-Spline Dimension Reduction (ABDR) is presented. First, a Collaborative Denoising Framework (CDF) is constructed, in which high-quality traces are selected using a plaintext-mean template, and targeted denoising is performed via singular value decomposition guided by a singular-value template. Second, a Neighbourhood Asymmetry Clustering (NAC) method is applied to adaptively determine key thresholds within the CDF. Finally, an ABDR algorithm is proposed, which allocates knots non-uniformly according to the variance distribution of power traces, thereby enabling efficient data compression while preserving critical local leakage features.  Results and Discussions  Experiments conducted on two datasets based on 8-bit AVR (OSR2560) and 32-bit ARM Cortex-M4 (OSR407) architectures demonstrate that the CDF significantly enhances the signal-to-noise ratio, with improvements of 60% on OSR2560 (Fig. 2) and 150% on OSR407 (Fig. 4). 
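The CDF denoises groups of traces that share a plaintext by singular value decomposition. The sketch below shows only the generic idea of truncated-SVD denoising with NumPy: it keeps the smallest number of singular components that reach a chosen share of the total energy. The fixed `energy` threshold is a hypothetical stand-in for the singular-value template and NAC-selected threshold described in the abstract, and the synthetic traces are made up.

```python
import numpy as np


def svd_denoise(traces: np.ndarray, energy: float = 0.6) -> np.ndarray:
    """Rank-r reconstruction of an (n_traces, n_samples) matrix of aligned power traces.

    r is the smallest number of singular components whose squared singular values
    reach `energy` of the total squared singular values.
    """
    u, s, vt = np.linalg.svd(traces, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    return (u[:, :r] * s[:r]) @ vt[:r, :]


# Synthetic group of traces sharing one plaintext: same leakage waveform plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
clean = np.tile(np.sin(2 * np.pi * 5 * t), (50, 1))
noisy = clean + 0.5 * rng.standard_normal(clean.shape)
denoised = svd_denoise(noisy)

print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

Because identical plaintexts produce nearly identical leakage, the shared signal concentrates in the leading components while noise spreads across all of them, which is what makes low-rank truncation effective here.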
The number of power traces required for successful key recovery is reduced from 3 000/2 400 to 1 200/1 500 for the two datasets, respectively (Figs. 3 and 5). Through adaptive threshold selection in the CDF, NAC achieves faster and more stable guessing-entropy convergence than fixed-threshold and K-means-based strategies, which enhances overall robustness (Fig. 6). The ABDR algorithm places knots densely in high-variance leakage regions and sparsely in low-variance regions. While maintaining a high attack success rate, it reduces the data dimensionality from 5 000 and 5 500 to 1 000 and 500, respectively, corresponding to compression rates of 80% and approximately 91%. At the optimal dimensionality (Fig. 7), the correlation coefficients of the correct key reach 0.186 0 on OSR2560 and 0.360 5 on OSR407, both exceeding those obtained using other dimensionality reduction methods. These results indicate superior local information retention and attack efficiency (Tables 3 and 4).  Conclusions  The results confirm that the proposed CDF substantially improves the signal-to-noise ratio of power traces, while NAC enables adaptive parameter selection and enhances robustness. Through accurate local modeling, ABDR effectively alleviates the trade-off between high-dimensional data reduction and the preservation of critical leakage information. Comprehensive experimental validation shows that the integrated framework addresses key challenges in side-channel analysis, including low signal-to-noise ratio, redundancy-induced information masking, and dependence on empirical parameters, and provides a practical and scalable solution for real-world attack scenarios.
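ABDR allocates B-spline knots non-uniformly according to the variance of the traces. A simple rule with that behavior, assumed here because the abstract does not give the allocation formula, is to place knots at equally spaced quantiles of the cumulative per-sample variance; the knot vector would then drive a least-squares B-spline fit whose coefficients serve as the reduced features (the fit itself is not shown). Function names and the toy variance profile are hypothetical.

```python
import bisect
from typing import List


def sample_variance(traces: List[List[float]]) -> List[float]:
    """Per-sample-point variance across traces."""
    n = len(traces)
    means = [sum(col) / n for col in zip(*traces)]
    return [sum((x - m) ** 2 for x in col) / n for col, m in zip(zip(*traces), means)]


def allocate_knots(variance: List[float], n_knots: int) -> List[int]:
    """Place knots at equally spaced quantiles of the cumulative variance.

    High-variance regions cover more of the cumulative mass, so they receive
    more knots; quiet regions receive few.
    """
    total = sum(variance)
    cumulative, acc = [], 0.0
    for v in variance:
        acc += v
        cumulative.append(acc / total)
    knots = []
    for k in range(n_knots):
        target = (k + 0.5) / n_knots
        idx = bisect.bisect_left(cumulative, target)
        knots.append(min(idx, len(variance) - 1))
    return knots


# Toy variance profile: a leakage window (samples 40-59) between quiet regions.
profile = [0.1] * 40 + [5.0] * 20 + [0.1] * 40
print(allocate_knots(profile, 20))
```

In this toy profile the leakage window covers 20% of the samples but receives 18 of the 20 knots.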
Research on Proximal Policy Optimization for Autonomous Long-Distance Rapid Rendezvous of Spacecraft
LIN Zheng, HU Haiying, DI Peng, ZHU Yongsheng, ZHOU Meijiang
Available online  , doi: 10.11999/JEIT250844
Abstract:
  Objective   With increasing demands from deep-space exploration, on-orbit servicing, and space debris removal missions, autonomous long-distance rapid rendezvous capabilities are required for future space operations. Traditional trajectory planning approaches based on analytical methods or heuristic optimization show limitations when complex dynamics, strong disturbances, and uncertainties are present, which makes it difficult to balance efficiency and robustness. Deep Reinforcement Learning (DRL) combines the approximation capability of deep neural networks with reinforcement learning-based decision-making, which supports adaptive learning and real-time decisions in high-dimensional continuous state and action spaces. In particular, Proximal Policy Optimization (PPO) is a representative policy gradient method, valued for its training stability, sample efficiency, and ease of implementation. Applying PPO-based DRL to spacecraft long-distance rapid rendezvous is therefore expected to overcome the limits of conventional methods and provide an intelligent, efficient, and robust solution for autonomous guidance in complex orbital environments.   Methods   A spacecraft orbital dynamics model is established by incorporating the J2 perturbation, together with uncertainties arising from position and velocity measurement errors and actuator deviations during on-orbit operations. The long-distance rapid rendezvous problem is formulated as a Markov Decision Process, in which the state space includes position, velocity, and relative distance, and the action space is defined by impulse duration and direction. Fuel consumption and terminal position and velocity constraints are integrated into the model. On this basis, a DRL framework based on PPO is constructed. The policy network outputs maneuver command distributions, whereas the value network estimates state values to improve training stability. 
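The abstract incorporates the J2 perturbation into the dynamics without stating the equations. For reference, the standard J2 perturbing acceleration in Earth-centered inertial coordinates is a_J2 = -(3/2) J2 μ R_E² / r⁵ · [x(1 - 5z²/r²), y(1 - 5z²/r²), z(3 - 5z²/r²)]. The sketch evaluates it with commonly used Earth constants (an assumption, since the paper's values are not given) and adds two-body gravity; orbit propagation, measurement errors, and actuator deviations are not modeled.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

MU_EARTH = 398600.4418   # km^3/s^2, Earth's gravitational parameter
R_EARTH = 6378.137       # km, equatorial radius
J2 = 1.08262668e-3       # dimensionless second zonal harmonic


def j2_acceleration(r: Vec3) -> Vec3:
    """J2 perturbing acceleration (km/s^2) at ECI position r (km)."""
    x, y, z = r
    r2 = x * x + y * y + z * z
    rn = math.sqrt(r2)
    factor = -1.5 * J2 * MU_EARTH * R_EARTH ** 2 / rn ** 5
    k = 5.0 * z * z / r2
    return (factor * x * (1.0 - k), factor * y * (1.0 - k), factor * z * (3.0 - k))


def total_acceleration(r: Vec3) -> Vec3:
    """Two-body gravity plus the J2 term."""
    x, y, z = r
    rn = math.sqrt(x * x + y * y + z * z)
    central = -MU_EARTH / rn ** 3
    ax, ay, az = j2_acceleration(r)
    return (central * x + ax, central * y + ay, central * z + az)


print(j2_acceleration((7000.0, 0.0, 0.0)))  # equatorial point
print(j2_acceleration((0.0, 0.0, 7000.0)))  # polar point
print(total_acceleration((7000.0, 0.0, 0.0)))
```

At the equator the J2 term strengthens gravity and at the poles it weakens it; its magnitude is roughly 10⁻³ of the central term in low Earth orbit, small but enough to perturb long-distance rendezvous trajectories.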
To address convergence difficulties caused by sparse rewards, an enhanced dense reward function is designed by combining a position potential function with a velocity guidance function. This design guides the agent toward the target while enabling gradual deceleration and improved fuel efficiency. The optimal maneuver strategy is obtained through simulation-based training, and robustness is evaluated under different uncertainty conditions.   Results and Discussions   Based on the proposed DRL framework, comprehensive simulations are conducted to assess effectiveness and robustness. In Case 1, three reward structures are examined: sparse reward, traditional dense reward, and an improved dense reward that integrates a relative position potential function with a velocity guidance term. The results show that reward design strongly affects convergence behavior and policy stability. Under sparse rewards, insufficient process feedback limits exploration of feasible actions. Traditional dense rewards provide continuous feedback and enable gradual convergence, but terminal velocity deviations are not fully corrected at later stages, which leads to suboptimal convergence and incomplete satisfaction of terminal constraints. In contrast, the improved dense reward guides the agent toward favorable behaviors from early training stages while penalizing undesirable actions at each step, which accelerates convergence and improves robustness. The velocity guidance term allows anticipatory adjustments during mid-to-late approach phases rather than delaying corrections to the terminal stage, resulting in improved fuel efficiency. Simulation results show that the maneuvering spacecraft performs 10 impulsive maneuvers, achieving a terminal relative distance of 21.326 km, a relative velocity of 0.005 0 km/s, and a total fuel consumption of 111.2123 kg. To evaluate robustness under realistic uncertainties, 1,000 Monte Carlo simulations are performed. 
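The improved dense reward combines a position potential function with a velocity guidance term. The functional forms are not given in the abstract, so the sketch below assumes potential-based shaping on the relative distance (γΦ(s') - Φ(s) with Φ = -k·distance), a velocity guidance penalty that asks the closing speed to track a reference proportional to the remaining range (which encourages gradual deceleration), and a per-step fuel penalty. Every weight and name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    gamma: float = 0.99    # discount factor used in the shaping term
    k_pos: float = 1e-3    # weight of the distance potential (hypothetical)
    k_vel: float = 10.0    # weight of the velocity-guidance error (hypothetical)
    k_fuel: float = 1e-2   # weight of fuel used in the step (hypothetical)
    v_gain: float = 1e-4   # desired closing speed per km of range, 1/s (hypothetical)
    v_max: float = 0.5     # cap on the desired closing speed, km/s (hypothetical)


def potential(dist_km: float, cfg: RewardConfig) -> float:
    """Position potential: higher (less negative) when closer to the target."""
    return -cfg.k_pos * dist_km


def dense_reward(dist_km: float, next_dist_km: float, closing_speed: float,
                 fuel_kg: float, cfg: RewardConfig = RewardConfig()) -> float:
    """Shaped per-step reward; closing_speed > 0 means the range is shrinking (km/s)."""
    shaping = cfg.gamma * potential(next_dist_km, cfg) - potential(dist_km, cfg)
    v_ref = min(cfg.v_max, cfg.v_gain * next_dist_km)   # slower as the target nears
    velocity_term = -cfg.k_vel * abs(closing_speed - v_ref)
    fuel_term = -cfg.k_fuel * fuel_kg
    return shaping + velocity_term + fuel_term


cfg = RewardConfig()
print(dense_reward(1000.0, 900.0, 0.1, 0.5, cfg))   # approaching step
print(dense_reward(1000.0, 1100.0, 0.1, 0.5, cfg))  # receding step
```

Unlike a sparse terminal reward, every step yields feedback: moving closer raises the potential, and approaching too fast near the target is penalized before the terminal stage.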
As summarized in Table 6, the mission success rate reaches 63.40%, and fuel consumption in all trials remains within acceptable bounds. In Case 2, PPO performance is compared with that of Deep Deterministic Policy Gradient (DDPG) for a multi-impulse fast-approach rendezvous mission. PPO results show five impulsive maneuvers, a terminal separation of 2.281 8 km, a relative velocity of 0.003 8 km/s, and a total fuel consumption of 4.148 6 kg. DDPG results show a fuel consumption of 4.322 5 kg, a final separation of 4.273 1 km, and a relative velocity of 0.002 0 km/s. Both methods satisfy mission requirements with comparable fuel use. However, DDPG requires a training time of 9 h 23 min, whereas PPO converges within 6 h 4 min, indicating lower computational cost. Overall, the improved PPO framework provides better learning efficiency, policy stability, and robustness.  Conclusions   The problem of autonomous long-distance rapid rendezvous under J2 perturbation and uncertainties is investigated, and a PPO-based trajectory optimization method is proposed. The results demonstrate that feasible maneuver trajectories satisfying terminal constraints can be generated under limited fuel and transfer time, with improved convergence speed, fuel efficiency, and robustness. The main contributions include: (1) development of an orbital dynamics framework that incorporates J2 perturbation and uncertainty modeling, with formulation of the rendezvous problem as a Markov Decision Process; (2) design of an enhanced dense reward function that combines position potential and velocity guidance, which improves training stability and convergence efficiency; and (3) simulation-based validation of PPO robustness in complex orbital environments. Future work will address sensor noise, environmental disturbances, and multi-spacecraft cooperative rendezvous in more complex mission scenarios to further improve practical applicability and generalization.
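For readers less familiar with PPO, the policy update it relies on maximizes the clipped surrogate objective L = E[min(ρA, clip(ρ, 1 - ε, 1 + ε)A)], where ρ = π_new(a|s)/π_old(a|s) and A is the advantage estimate. The sketch below evaluates that objective for a batch of log-probabilities. The value ε = 0.2 is a common default, not a setting reported by the authors, and the networks, advantage estimation, and optimizer are omitted.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean clipped surrogate objective (to be maximized) over a batch."""
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)       # pessimistic of the two terms
    return total / len(advantages)


# A ratio of 2 on a positive advantage is capped at 1 + eps; on a negative one it is not.
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
print(ppo_clip_objective([math.log(2.0)], [0.0], [-1.0]))
```

The clipping removes the incentive to move the policy far from the previous one in a single update, which is the source of the training stability cited in the abstract.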
A Review of Research on Voiceprint Fault Diagnosis of Transformers
GONG Wenjie, LIN Guosong, WEI Xiaoguang
Available online  , doi: 10.11999/JEIT251076
Abstract:
  Significance   Voiceprint fault diagnosis of transformers has become an active research area for ensuring the safe and reliable operation of power systems. Traditional monitoring methods, such as dissolved gas analysis, infrared temperature measurement, and online partial discharge monitoring, exhibit limited real-time capability and rely heavily on expert experience. These limitations hinder effective detection of early-stage faults. Voiceprint fault diagnosis captures operational voiceprint signals from transformers and enables non-contact monitoring for early anomaly warning. This approach offers advantages in real-time performance, sensitivity, and fault coverage. This review systematically traces the technological evolution from traditional signal analysis to deep learning and compares the advantages, limitations, and application scenarios of different models across multiple dimensions. Key challenges are identified, including limited robustness to noise and imbalanced datasets. Potential research directions are proposed, including integration of physical mechanisms with data-driven methods and improvement of diagnostic transparency and interpretability. These analyses provide theoretical support and practical guidance for promoting the transition of voiceprint fault diagnosis from laboratory research to engineering applications.  Progress   Research on voiceprint fault diagnosis of transformers has progressed from traditional signal analysis to an intelligent recognition paradigm based on deep learning, reflecting a clear technological evolution. A bibliometric analysis of 188 papers from the CNKI and Web of Science databases shows that annual publications remained at 1–10 papers between 1997 and 2020, corresponding to an exploratory stage. Studies during this period focused mainly on fundamental voiceprint signal processing methods, including acoustic wave detection, wavelet transform, and Empirical Mode Decomposition (EMD). 
After 2020, Variational Mode Decomposition (VMD), Mel spectrum, and Mel Frequency Cepstral Coefficient (MFCC) were gradually applied to voiceprint feature extraction. Since 2021, publication output has increased rapidly and reached a historical peak in 2023. This growth was driven by advances in image and speech processing technologies. Early studies emphasized time-domain and frequency-domain analysis of voiceprint signals. Recent research increasingly converts voiceprint signals into two-dimensional time–frequency spectrogram representations. Model architectures have evolved from single-channel feature inputs with single-model outputs to complex frameworks with multi-channel feature extraction and multi-model fusion. Classical machine learning models, including Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Random Forest (RF), and Back Propagation Neural Network (BPNN), form the foundation of voiceprint fault diagnosis but are limited in handling high-dimensional features. Deep learning models, such as Convolutional Neural Network (CNN), Residual Neural Network (ResNet), Recurrent Neural Network (RNN), and Transformer, demonstrate advantages in automatic feature extraction and complex pattern recognition, although they require substantial computational resources.  Conclusions  This review summarizes the technological development of transformer voiceprint fault diagnosis from machine learning to deep learning. Although deep learning methods achieve high recognition accuracy for complex voiceprint signals, five major challenges remain. These challenges include limited robustness to noise in non-stationary environments, severe data imbalance caused by scarce fault samples, the black-box nature of deep learning models, fragmented evaluation systems resulting from inconsistent data acquisition standards, and insufficient cross-modal fusion of multi-source data. 
Sensitivity to environmental noise limits diagnostic performance under varying operating conditions. Data imbalance reduces recognition accuracy for rare fault types. Limited interpretability restricts fault mechanism analysis and diagnostic credibility. Inconsistent sensor placement and sampling parameters lead to poor comparability across datasets. Single-modal voiceprint analysis restricts effective utilization of complementary information from other data sources. Addressing these challenges is essential for advancing voiceprint fault diagnosis from laboratory validation to field deployment.  Prospects   Future research should focus on five directions. First, noise-robust voiceprint feature extraction methods based on physical mechanisms should be developed to address non-stationary interference in complex operating environments. Second, the lack of real-world fault data should be alleviated by constructing electromagnetic field–structural mechanics–acoustic coupling models of transformers to generate high-fidelity voiceprint fault samples, while unsupervised clustering methods should be applied to improve annotation efficiency and quality. Third, explainable deep learning architectures for voiceprint fault diagnosis that incorporate physical mechanisms should be designed. Attention mechanisms combined with SHapley Additive exPlanations, Grad-CAM, and physical equations can support process-level and post hoc interpretation of diagnostic results. Fourth, industry-wide collaboration is required to establish standardized voiceprint data acquisition protocols, benchmark datasets, and unified evaluation systems. Fifth, cross-modal fusion models based on multi-channel and multi-feature analysis should be developed to enable integrated transformer fault diagnosis through comprehensive utilization of multi-source information.
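Mel spectra and MFCCs, mentioned above as common voiceprint features, rest on the Mel scale. The sketch below uses the widely used HTK-style conversion m = 2595·log10(1 + f/700) to place triangular filter centers uniformly on the Mel scale. A complete MFCC pipeline would add framing, windowing, an FFT, the triangular filterbank, a logarithm, and a DCT, none of which are shown; the 4 kHz band and 10 filters are arbitrary choices for illustration.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of triangular filters spaced uniformly on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


centers = mel_filter_centers(10, 0.0, 4000.0)
print([round(c, 1) for c in centers])
```

The filters crowd together at low frequencies and spread out at high frequencies, giving low-frequency components finer resolution than a linear spectrogram.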
Multimodal Pedestrian Trajectory Prediction with Multi-Scale Spatio-Temporal Group Modeling and Diffusion
KONG Xiangyan, GAO YuLong, WANG Gang
Available online  , doi: 10.11999/JEIT250900
Abstract:
  Objective  With the rapid advancement of autonomous driving and social robotics, accurate pedestrian trajectory prediction has become pivotal for ensuring system safety and enhancing interaction efficiency. Existing group-based modeling approaches predominantly focus on local spatial interaction, often overlooking latent grouping characteristics across the temporal dimension. To address these challenges, this research proposes a multi-scale spatiotemporal feature construction method that decouples trajectory shape from absolute spatiotemporal coordinates, enabling the model to accurately capture latent group associations over different time intervals. Simultaneously, an encoding mechanism based on a three-element spatiotemporal interaction format is introduced to deeply extract the dynamic relationships between individuals and groups. By integrating the reverse process length mechanism of diffusion models, the proposed approach incrementally mitigates prediction uncertainty. This research not only offers an intelligent solution for multi-modal trajectory prediction in complex, crowded environments but also provides robust theoretical support for improving the accuracy and robustness of long-range trajectory forecasting.  Methods  The proposed algorithm performs deep modeling of pedestrian trajectories through multi-scale spatiotemporal group modeling. The system is designed across three key dimensions: group construction, interaction modeling, and trajectory generation. First, to address the limitations of traditional methods that focus on local spatiotemporal relationships while overlooking cross-dimensional latent characteristics, a multi-scale trajectory grouping model is designed. Its core innovation lies in extracting trajectory offsets to represent trajectory shapes, successfully decoupling motion features from absolute positions. 
This enables the model to accurately capture latent group associations among agents following similar paths over different periods. Second, an encoding method based on a three-element spatiotemporal interaction format is proposed. By defining neural interaction strength, interaction categories, and category functions, this method deeply analyzes the complex associations between agents and groups. This not only captures fine-grained individual interactions but also effectively reveals the global dynamic evolution of collective behavior. Finally, a Diffusion Model is introduced for multimodal prediction. Through the reverse process length mechanism of the diffusion model, the model converges progressively, reducing uncertainty step by step and transforming a fuzzy prediction space into clear and plausible future trajectories.  Results and Discussions  In this study, the proposed model was evaluated against 11 state-of-the-art baseline algorithms using the NBA dataset (Table 1). Experimental results indicate that this model achieves a significant advantage in minADE20. Notably, it demonstrates a substantial performance leap over GroupNet+CVAE in long-term prediction tasks, with minADE20 and minFDE20 improvements of 0.18 and 0.36, respectively, at the 4-second prediction horizon. Although the model slightly underperforms compared to MID in long-term trends, likely due to the frequent and intense shifts in group dynamics within NBA scenarios, it exhibits exceptional precision in instantaneous prediction. This provides strong empirical evidence for the effectiveness of the multi-scale grouping strategy based on historical trajectories in capturing complex dynamic interactions. On the ETH/UCY datasets (Table 2), the MSGD method achieved consistent performance gains across all five sub-scenarios. 
Particularly in the pedestrian-dense and interaction-heavy UNIV scene, the proposed method surpassed all baseline models by leveraging the advantages of multi-scale modeling. While MSGD is slightly behind PPT in terms of long-distance endpoint constraints, it maintains a lead in minADE20. Furthermore, it outperforms Trajectron++ in velocity smoothness and directional coherence (std dev: 0.7012) (Table 3). These results suggest that while fitting the geometric shape of trajectories, the method generates naturally smooth paths that align more closely with the physical laws of human motion. Ablation studies systematically verified the independent contributions of the diffusion model, spatiotemporal feature extraction, and multi-scale grouping modules to the overall accuracy (Table 4). Grouping sensitivity analysis on the NBA dataset revealed that a full-court grouping strategy (group size of 11) significantly enhances long-term stability, resulting in a further reduction of minFDE20 by 0.026–0.03 at the 4-second horizon (Table 5). Simultaneously, configurations with group sizes of 5 or 2 validate the significance of team formations and “one-on-one” local offensive/defensive dynamics in trajectory prediction (Table 6). Additionally, sensitivity analysis of diffusion steps and training epochs revealed a “complementary” relationship: moderately increasing the number of steps (e.g., 30–40) refines the denoising process and significantly improves accuracy, whereas excessive iterations may lead to overfitting (Table 7). Finally, qualitative visualization intuitively demonstrates that the multimodal trajectories generated by MSGD have a high degree of overlap with ground-truth data (Fig. 2).  
Conclusions  This study proposes a novel trajectory prediction algorithm that improves performance in two main respects: (1) it effectively captures pedestrian interactions by extracting spatiotemporal features; (2) it strengthens the modeling of collective behavior by grouping pedestrians at multiple scales. Experimental results demonstrate that the algorithm achieves state-of-the-art (SOTA) performance on both the NBA and ETH/UCY datasets, and ablation studies verify the effectiveness of each constituent module. Despite its strong performance and adaptability, the algorithm has two main limitations: first, the current model does not account for explicit environmental information (such as maps or obstacles); second, the diffusion model incurs high computational overhead during inference. Future work will address these two limitations.
Neighboring Mutual-Coupling Channel Model and Tunable-Impedance Optimization Method for Reconfigurable-Intelligent-Surface Aided Communications
WU Wei, WANG Wennai
Available online  , doi: 10.11999/JEIT251109
Abstract:
  Objective  Reconfigurable Intelligent Surfaces (RIS) attract increasing attention due to their ability to controllably manipulate electromagnetic wave propagation. A typical RIS consists of a dense array of Reflecting Elements (REs) with inter-element spacing no greater than half a wavelength, under which electromagnetic mutual coupling inevitably occurs between adjacent REs. This effect becomes more pronounced when the element spacing is smaller than half a wavelength and can significantly affect the performance and efficiency of RIS-assisted systems. Accurate modeling of mutual coupling is therefore essential for RIS optimization. However, existing mutual-coupling-aware channel models usually suffer from high computational complexity because of the large dimensionality of the mutual-impedance matrix, which restricts their practical use. To address this limitation, a simplified mutual-coupling-aware channel model based on a sparse neighboring mutual-coupling matrix is proposed, together with an efficient optimization method for configuring RIS tunable impedances.  Methods  First, a simplified mutual-coupling-aware channel model is established through two main steps. (1) A neighboring mutual-coupling matrix is constructed by exploiting the exponential decay of mutual impedance with inter-element distance. (2) A closed-form approximation of the mutual impedance between the transmitter or receiver and the REs is derived under far-field conditions. Because mutual impedance attenuates rapidly as spacing increases, only eight mutual-coupling parameters (when elements separated by up to one intermediate element are coupled) or three (when only immediately adjacent elements are coupled), together with one self-impedance parameter, are retained. These parameters are arranged into a neighboring mutual-coupling matrix using predefined support matrices.
To further reduce computational burden, the distance term in the mutual-impedance expression is approximated by a central value under far-field assumptions, which allows the original integral formulation to be simplified into a compact analytical expression. Based on the resulting channel model, an efficient optimization method for RIS tunable impedances is developed. Through impedance decomposition, a closed-form expression for the optimal tunable-impedance matrix is derived, enabling low-complexity RIS configuration with computational cost independent of the number of REs.  Results and Discussions  The accuracy and computational efficiency of the proposed simplified models, as well as the effectiveness of the proposed impedance optimization method, are validated through numerical simulations. First, the two simplified models are evaluated against a reference model. The first simplified model accounts for mutual coupling among elements separated by at most one intermediate unit, whereas the second model considers only immediately adjacent elements. Results indicate that channel gain increases as element spacing decreases, with faster growth observed at smaller spacings (Fig. 4). The modeling error between the simplified models and the reference model remains below 0.1 when the spacing does not exceed λ/4, but increases noticeably at larger spacings. Error curves further show that the modeling errors of both simplified models become negligible when the spacing is below λ/4, indicating that the second model can be adopted to further reduce complexity (Fig. 6). Second, the computational complexity of the proposed models is compared with that of the reference model. When the number of REs exceeds four, the complexity of computing the mutual-coupling matrix in the reference model exceeds that of the proposed neighboring mutual-coupling model. 
As the number of REs increases, the complexity of the reference model grows rapidly, whereas that of the proposed model remains constant (Fig. 5). Finally, the proposed impedance optimization method is compared with two benchmark methods (Fig. 7, Fig. 8). When the element spacing is no greater than λ/4, the channel gain achieved by the proposed method approaches that of the benchmark method. As the spacing increases beyond this range, a clear performance gap emerges. In all cases, the proposed method yields higher channel gain than the coherent phase-shift optimization method.  Conclusions  The integration of a large number of densely arranged REs in an RIS introduces notable mutual coupling effects, which can substantially influence system performance and therefore must be considered in channel modeling and impedance optimization. A simplified mutual-coupling-aware channel model based on a neighboring mutual-coupling matrix has been proposed, together with an efficient tunable-impedance optimization method. By combining the neighboring mutual-coupling matrix with a simplified mutual-impedance expression derived under far-field assumptions, a low-complexity channel model is obtained. Based on this model, a closed-form solution for the optimal RIS tunable impedances is derived using impedance decomposition. Simulation results confirm that the proposed channel model and optimization method maintain satisfactory accuracy and effectiveness when the element spacing does not exceed λ/4. The proposed framework provides practical theoretical support and useful design guidance for analyzing and optimizing RIS-assisted systems under mutual coupling effects.
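The neighboring mutual-coupling matrix can be sketched as a self-impedance term plus a weighted sum of fixed support (indicator) matrices. The sketch below assumes a rectangular RE grid and assumes that coupling depends only on the absolute column and row offsets (|dx|, |dy|). Under that assumption, coupling up to one intermediate element (reach 2) gives 8 distinct offsets and adjacent-only coupling (reach 1) gives 3, which matches the parameter counts in the abstract. The offset-to-parameter mapping, function names, and impedance values are hypothetical. Dense arrays are used for readability; a sparse format would suit large surfaces.

```python
import itertools
import numpy as np

def support_matrices(nx: int, ny: int, reach: int) -> dict[tuple[int, int], np.ndarray]:
    """Indicator matrices S[(a, b)] with S[m, n] = 1 when REs m and n are offset by
    |dx| = a and |dy| = b, for every offset with max(a, b) <= reach except (0, 0)."""
    coords = np.array([(ix, iy) for ix in range(nx) for iy in range(ny)])
    dx = np.abs(coords[:, None, 0] - coords[None, :, 0])
    dy = np.abs(coords[:, None, 1] - coords[None, :, 1])
    return {
        (a, b): ((dx == a) & (dy == b)).astype(float)
        for a, b in itertools.product(range(reach + 1), repeat=2)
        if (a, b) != (0, 0)
    }

def neighboring_coupling_matrix(nx, ny, z_self, z_mutual, reach):
    """Z = z_self * I + sum_k z_k * S_k; couplings beyond `reach` are treated as zero.
    The number of parameters is fixed by `reach`, not by the number of REs."""
    Z = z_self * np.eye(nx * ny, dtype=complex)
    for offset, S_k in support_matrices(nx, ny, reach).items():
        Z += z_mutual[offset] * S_k
    return Z

# Adjacent-only model on a 3 x 3 surface with hypothetical impedances (ohms).
Z = neighboring_coupling_matrix(
    3, 3, 50 + 20j, {(0, 1): 10 - 5j, (1, 0): 8 - 4j, (1, 1): 2 - 1j}, reach=1
)
print(np.count_nonzero(Z[4]))  # centre RE couples to itself and its 8 neighbours
```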
Security Protection for Vessel Positioning in Smart Waterway Systems Based on Extended Kalman Filter–Based Dynamic Encoding
TANG Fengjian, YAN Xia, SUN Zeyi, ZHU Zhaowei, YANG Wen
Available online  , doi: 10.11999/JEIT250846
Abstract:
  Objective  With the rapid development of intelligent shipping systems, vessel positioning data face severe privacy leakage risks during wireless transmission. Traditional privacy-preserving methods, such as differential privacy and homomorphic encryption, suffer from data distortion, high computational overhead, or reliance on costly communication links, making it difficult to achieve both data integrity and efficient protection. This study addresses the characteristics of vessel stabilization systems and proposes a dynamic encoding scheme enhanced by time-varying perturbations. By integrating the Extended Kalman Filter (EKF) and introducing unstable temporal perturbations during encoding, the scheme uses receiver-side acknowledgments (ACK feedback) to achieve reference-time synchronization and independently generates synchronized perturbations through a shared random seed. Theoretical analysis and simulations show that the proposed method achieves nearly zero precision loss in state estimation for legitimate receivers, whereas decoding errors of eavesdroppers grow exponentially after a single packet loss, effectively countering both single- and multi-channel eavesdropping attacks. The shared-seed synchronization mechanism avoids complex key management and reduces communication and computational costs, making the scheme suitable for resource-constrained maritime wireless sensor networks.  Methods  The proposed dynamic encoding scheme introduces a time-varying perturbation term into the encoding process. The perturbation is governed by an unstable matrix to induce exponential error growth for eavesdroppers. The encoded signal is constructed from the difference between the current state estimate and a time-scaled reference state, combined with the perturbation term. A shared random seed between legitimate parties enables deterministic and synchronized generation of the perturbation sequence without online key exchange. 
At the legitimate receiver, the perturbation is canceled during decoding, enabling accurate state recovery. Local state estimation at each sensor node is performed using EKF, and the overall communication process is reinforced by acknowledgment-based synchronization to maintain consistency between the sender and receiver.  Results and Discussions  Simulations are conducted in a wireless sensor network with four sensors tracking vessel states, including position, velocity, and heading. The results indicate that legitimate receivers achieve nearly zero estimation error (Fig. 3), whereas eavesdroppers exhibit exponentially increasing errors after a single packet loss (Fig. 4). The error growth rate depends on the instability of the perturbation matrix, confirming the theoretical divergence. In multi-channel scenarios, independent perturbation sequences for each channel prevent cross-channel correlation attacks (Fig. 5). The scheme maintains low communication and computational overhead, making it practical for maritime environments. Furthermore, the method shows strong robustness to packet loss and channel variations, satisfying SOLAS requirements for data integrity and reliability.  Conclusions  A dynamic encoding scheme with time-varying perturbations is proposed for privacy-preserving vessel state estimation. By integrating EKF with an unstable perturbation mechanism, the method ensures high estimation precision for legitimate users and exponential error growth for eavesdroppers. The main contributions are as follows: (1) an encoding framework that achieves zero precision loss for legitimate receivers; (2) a lightweight synchronization mechanism based on shared random seeds, which removes complex key management; and (3) theoretical guarantees of exponential error divergence for eavesdroppers under single- or multi-channel attacks. 
The scheme is robust to packet loss and channel asynchrony, complies with SOLAS data integrity requirements, and is suitable for resource-limited maritime networks. Future work will extend the method to nonlinear vessel dynamics, adaptive perturbation optimization, and validation in real maritime communication environments.
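The core idea of masking with a seed-synchronized, unstable perturbation can be illustrated with a deliberately simplified sketch. The assumptions are as follows. The encoded value is z_k = x̂_k − x_ref + p_k, where the reference is the previous acknowledged estimate with no time scaling. Every ACK is received. The perturbation follows p_k = Γ p_{k−1} + r_k with a hypothetical unstable Γ, and the eavesdropper lacks the seed. The matrix values, the random-walk stand-in for EKF outputs, and all function names are hypothetical and are not the paper's exact encoding rule.

```python
import numpy as np

# Hypothetical unstable perturbation matrix (spectral radius 1.15 > 1).
GAMMA = np.array([[1.15, 0.0], [0.0, 1.10]])

def perturbation_sequence(seed: int, n_steps: int, dim: int = 2) -> np.ndarray:
    """p_0 = 0 and p_k = GAMMA @ p_{k-1} + r_k, with r_k drawn from a seeded RNG.
    Sender and legitimate receiver use the same shared seed and obtain identical
    sequences without any online key exchange."""
    rng = np.random.default_rng(seed)
    p = np.zeros((n_steps, dim))
    for k in range(1, n_steps):
        p[k] = GAMMA @ p[k - 1] + rng.standard_normal(dim)
    return p

def encode(x_hat, x_ref, p_k):
    # Difference from the acknowledged reference, masked by the perturbation.
    return x_hat - x_ref + p_k

def decode(z_k, x_ref, p_k):
    # The legitimate receiver regenerates p_k locally and cancels it.
    return z_k - p_k + x_ref

def simulate(seed: int = 2024, n_steps: int = 80):
    rng = np.random.default_rng(7)
    # Stand-in for local EKF estimates: a 2-D random walk (hypothetical data).
    x_hat = np.cumsum(rng.normal(0.0, 0.5, size=(n_steps, 2)), axis=0)
    p_tx = perturbation_sequence(seed, n_steps)  # sender's copy
    p_rx = perturbation_sequence(seed, n_steps)  # receiver's copy, same seed
    ref_tx = ref_rx = ref_eve = np.zeros(2)
    legit_err, eve_err = [], []
    for k in range(n_steps):
        z = encode(x_hat[k], ref_tx, p_tx[k])
        x_rx = decode(z, ref_rx, p_rx[k])
        x_eve = z + ref_eve  # eavesdropper cannot remove p_k without the seed
        legit_err.append(np.linalg.norm(x_rx - x_hat[k]))
        eve_err.append(np.linalg.norm(x_eve - x_hat[k]))
        # ACK received: each party advances its reference to its latest estimate.
        ref_tx, ref_rx, ref_eve = x_hat[k], x_rx, x_eve
    return np.array(legit_err), np.array(eve_err)

legit, eve = simulate()
print(legit.max(), eve[-1])
```

In this simplified form, the legitimate error stays at floating-point round-off. The eavesdropper's error accumulates the perturbations and grows roughly like the powers of Γ.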
Design of Dynamic Resource Awareness and Task Offloading Schemes in Multi-Access Edge Computing Networks
ZHANG Bingxue, LI Xisheng, YOU Jia
Available online  , doi: 10.11999/JEIT250640
Abstract:
  Objective  With the development of the industrial Internet of Things and the widespread use of multi-mode terminal equipment, multi-access edge computing has become a key technology for supporting low-delay, energy-efficient industrial applications. Task offloading is the core means by which edge computing meets the large-volume, complex task-processing requirements of multi-mode terminals. In a multi-access edge computing system, the network selection of end users strongly affects the offloading mechanism and resource allocation. However, existing network selection mechanisms focus on the user's selection decision and ignore how the user's task execution and the transmission and processing of offloaded task data affect network performance. Existing research on task offloading mechanisms focuses on offloading delay, energy consumption optimization, and resource allocation. It ignores both the resource cost of collaborative computing across multi-access heterogeneous networks and the dynamic resource balance between heterogeneous networks. To meet these challenges, this paper considers how users' diverse needs and the differentiated capabilities of heterogeneous resource providers affect offloading decisions in a complex computing environment. It jointly optimizes users' task-execution cost and the allocation of dynamic resources in multi-access heterogeneous networks, so as to reduce system operating cost, improve quality of service, and use heterogeneous resources efficiently and cooperatively.  Methods  Based on the multi-access edge computing network model, this paper establishes a cost model for end-user task offloading that covers task execution time, energy consumption, and communication resource consumption over different networks.
Based on auction theory, it establishes a cost-effectiveness model for task evaluation and bidding between users and edge servers, and formulates the optimization objective using combinatorial two-way auction theory. A dynamic resource sensing and task offloading algorithm based on this auction mechanism is then proposed. Information about the tasks to be accessed and their required resources is broadcast in both directions to support network selection and dynamic resource allocation. A server can submit a valid bid only when its available resources satisfy the user's resource constraints. Edge servers with valid bids then compete for the right to execute the user's task until the user obtains the optimal bid and the corresponding server, completing the auction matching for that task.  Results and Discussions  The proposed algorithm considers heterogeneous network status and resource usage, and it selects the task offloading location according to the resource allocation. A simulated edge computing model with cooperating heterogeneous wireless networks is constructed, and the effect of network size on task offloading cost and offloaded data volume is analyzed. Simulation results show that the proposed algorithm reduces system cost by at least 5% compared with the benchmark algorithms (Fig. 3), and the reduction is more pronounced when there are more end users. Changes in the number of servers in the heterogeneous networks affect users' choice of network for task offloading (Fig. 4, 5, 6). The proposed algorithm also increases the volume of offloaded task data by 10% compared with the benchmark algorithms (Fig. 7, 8).
Finally, the effect of the communication resource cost parameter on users' choice of the 5G public network for task offloading is studied. The larger the communication cost parameter, the less data end users offload through the 5G public network (Fig. 9).  Conclusions  To address the complex data-processing requirements of multi-mode terminals, this paper constructs a cooperative multi-access edge computing network architecture for multi-mode terminals. Flexible, intelligent selection of wireless networks by multi-mode terminals provides more resources for end-user task offloading. A server bidding and user target bidding model is established based on auction theory. A dynamic resource perception and task offloading algorithm based on the auction mechanism is proposed to handle task offloading, network selection, and resource allocation for multi-mode terminals. The algorithm first dynamically selects the offloading network and allocates computing and communication resources according to the access tasks. It then selects the offloading location with the minimum execution cost through bidding competition among the edge servers. The results show that, compared with the benchmark algorithms, the proposed algorithm effectively reduces system cost and increases the volume of task data offloaded from end users to multiple edge servers. It also makes full use of edge computing resources and improves system energy efficiency and operational efficiency.
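The bidding step described above can be sketched as follows. A server bids only if its free resources satisfy the task's constraints, and the task goes to the lowest-cost valid bid. This is a simplified sequential, single-sided auction, not the paper's combinatorial two-way auction. The cost weights, field names, and the delay and energy formulas are hypothetical placeholders for the paper's cost model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Server:
    name: str
    cpu_free: float        # available CPU rate (cycles/s)
    bw_free: float         # available bandwidth (bit/s)
    comm_price: float      # hypothetical communication cost per bit on this network
    energy_per_bit: float  # hypothetical terminal transmit energy per bit (J/bit)

@dataclass
class Task:
    user: str
    data_bits: float   # input data to offload (bit)
    cycles: float      # CPU cycles required
    cpu_demand: float  # CPU rate the task requests (cycles/s)
    bw_demand: float   # bandwidth the task requests (bit/s)

W_TIME, W_ENERGY = 1.0, 0.5  # hypothetical weights of the cost model

def bid(server: Server, task: Task) -> Optional[float]:
    """Cost-based bid, or None when the server's free resources cannot meet the task."""
    if server.cpu_free < task.cpu_demand or server.bw_free < task.bw_demand:
        return None
    delay = task.data_bits / task.bw_demand + task.cycles / task.cpu_demand
    energy = server.energy_per_bit * task.data_bits
    return W_TIME * delay + W_ENERGY * energy + server.comm_price * task.data_bits

def auction(tasks: list[Task], servers: list[Server]) -> dict:
    assignment = {}
    for task in tasks:
        bids = []
        for server in servers:
            b = bid(server, task)
            if b is not None:  # only servers with sufficient resources may bid
                bids.append((b, server))
        if not bids:
            assignment[task.user] = None  # no valid bid: the task stays local
            continue
        cost, winner = min(bids, key=lambda pair: pair[0])
        winner.cpu_free -= task.cpu_demand  # dynamic resource update after matching
        winner.bw_free -= task.bw_demand
        assignment[task.user] = (winner.name, cost)
    return assignment

s1 = Server("s1", cpu_free=2e9, bw_free=1e8, comm_price=1e-7, energy_per_bit=1e-7)
s2 = Server("s2", cpu_free=1.5e9, bw_free=1e8, comm_price=0.0, energy_per_bit=2e-7)
tasks = [Task("u1", 1e6, 1e9, 1e9, 1e7), Task("u2", 1e6, 1e9, 1e9, 1e7)]
print(auction(tasks, [s1, s2]))
```

In this example, u1 goes to the cheaper server s2. Once s2 lacks spare CPU, it can no longer bid, and u2 falls back to s1. This mirrors the rule that only resource-feasible servers compete.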
A Neural Network-Based Robust Direction Finding Algorithm for Mixed Circular and Non-Circular Signals Under Array Imperfections
YU Qi, YIN Jiexin, LIU Zhengwu, WANG Ding
Available online  , doi: 10.11999/JEIT250884
Abstract:
  Objective   Direction Of Arrival (DOA) estimation is affected by low Signal-to-Noise Ratios (SNR), the coexistence of Circular Signals (CSs) and Non-Circular Signals (NCSs), and multiple forms of array imperfections. Conventional subspace-based estimators exhibit model mismatch in such environments and show reduced accuracy. Although neural-network methods provide data-driven alternatives, the effective use of the distinctive statistical properties of NCSs and the maintenance of robustness against diverse array errors remain insufficiently addressed. The objective is to design a DOA estimation algorithm that operates reliably for mixed CSs and NCSs in the presence of array imperfections and provides improved estimation accuracy in challenging operating conditions.  Methods   A robust DOA estimation algorithm is proposed based on an improved Vision Transformer (ViT) model. A six-channel image-like input is first constructed by fusing features derived from the covariance matrix and pseudo-covariance matrix of the received signal. These channels include the real component, imaginary component, magnitude, phase, magnitude ratio reflecting the NCS characteristic, and the phase of the pseudo-covariance matrix. A gradient-masking mechanism is introduced to adaptively fuse core and auxiliary features. The ViT architecture is then modified: the standard patch-embedding module is replaced with a convolutional layer to extract local information, and a dual-class-token attention mechanism, placed at the sequence head and tail, is designed to enhance feature representation. A standard Transformer encoder is used for deep feature learning, and DOA estimation is performed through a multi-label classification head.  Results and Discussions   Extensive simulations are carried out to assess the proposed algorithm (6C-ViT) against MUSIC, NC-MUSIC, a Convolutional Neural Network (6C-CNN), a Residual Network (6C-ResNet), and a MultiLayer Perceptron (6C-MLP). 
Performance is evaluated using Root Mean Square Error (RMSE) and angular estimation error under different operating conditions. Under single-source scenarios with low SNR and no array errors, 6C-ViT achieves near-zero RMSE across most angles and shows minor edge deviations (Fig. 2). It maintains the lowest RMSE across the SNR range from –20 dB to 15 dB (Fig. 3), indicating good generalization to unseen SNR levels. In dual-source scenarios containing mixed CS and NCSs under array errors, 6C-ViT shows clear advantages. Its estimation errors fluctuate slightly around zero, whereas competing techniques present larger errors and pronounced instabilities, especially near array edges (Fig. 4). Its RMSE decreases steadily as SNR increases and reaches below 0.1° at high SNR, while traditional approaches saturate around 0.4° (Fig. 5). Robust behavior is further observed across different numbers of signal sources (K = 1, 2, 3) and snapshot counts (100 to 2 000). 6C-ViT preserves high accuracy and stability under these variations, whereas other methods show marked degradation or instability, most evident at low snapshot counts or with multiple sources (Fig. 6). When evaluated using unknown modulation types, including UQPSK with a non-circularity rate of 0.6 and 64QAM, under array errors, 6C-ViT continues to produce the lowest RMSE across most angles (Fig. 7), demonstrating strong generalization capability. Ablation studies (Fig. 8) confirm the contributions of the six-channel input, the gradient masking module, the convolutional embedding, and the dual class token mechanism. The complete configuration yields the highest accuracy and the most stable performance.  Conclusions   Strong robustness is demonstrated in complex scenarios that contain mixed CS and NCSs, multiple array imperfections, low SNR, and closely spaced sources. 
By fusing multi-dimensional features of the received signal and using an enhanced Transformer architecture, the algorithm attains higher estimation accuracy and improved generalization across different signal types, error conditions, snapshot counts, and noise levels compared with subspace- and neural-network-based baselines. The method provides a reliable DOA estimation solution for demanding practical environments.
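The six-channel input can be sketched from the sample covariance R = XXᴴ/T and pseudo-covariance C = XXᵀ/T of M-sensor, T-snapshot data X. The channel definitions below are our reading of the abstract and should be treated as assumptions, especially the magnitude ratio |C|/|R| used for the non-circularity channel. For a circular source such as QPSK, C is close to zero, so the ratio is small. For a strictly non-circular source such as BPSK, the ratio is close to 1.

```python
import numpy as np

def six_channel_input(X: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Build a (6, M, M) image-like tensor from snapshots X of shape (M, T).

    Assumed channels: Re(R), Im(R), |R|, angle(R), |C| / |R|, angle(C).
    """
    T = X.shape[1]
    R = X @ X.conj().T / T  # covariance E[x x^H]
    C = X @ X.T / T         # pseudo-covariance E[x x^T], near zero for circular signals
    ratio = np.abs(C) / (np.abs(R) + eps)
    return np.stack([R.real, R.imag, np.abs(R), np.angle(R), ratio, np.angle(C)])

# Noise-free half-wavelength ULA, one source at 20 degrees (illustrative only).
M, T = 8, 2000
a = np.exp(-1j * np.pi * np.arange(M) * np.sin(np.deg2rad(20)))
rng = np.random.default_rng(0)
bpsk = rng.choice([-1.0, 1.0], T)
qpsk = (rng.choice([-1, 1], T) + 1j * rng.choice([-1, 1], T)) / np.sqrt(2)
F_nc = six_channel_input(np.outer(a, bpsk))
F_c = six_channel_input(np.outer(a, qpsk))
print(F_nc[4].mean(), F_c[4].mean())  # ratio channel: about 1 vs. about 0
```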
Dynamic State Estimation of Distribution Network by Integrating High-degree Cubature Kalman Filter and Long Short-Term Memory Under False Data Injection Attack
XU Daxing, SU Lei, HAN Heqiao, WANG Hailun, ZHANG Heng, CHEN Bo
Available online  , doi: 10.11999/JEIT250805
Abstract:
  Objective  Dynamic state estimation of distribution networks is presented as a core technique for maintaining secure and stable operation in cyber-physical power systems. Its practical performance is limited by strong system nonlinearity, high-dimensional state characteristics, and the threat posed by False Data Injection Attack (FDIA). A method that integrates High-degree Cubature Kalman Filter (HCKF) with Long Short-Term Memory network (LSTM) is proposed. HCKF is applied to enhance estimation precision in nonlinear high-dimensional scenarios. The estimation outputs from HCKF and Weighted Least Squares (WLS) are combined for rapid FDIA identification using residual-based analysis. The LSTM model is then employed to reconstruct measurement data of compromised nodes and refine state estimation results. The approach is validated on the IEEE 33-bus distribution system, demonstrating reliable accuracy enhancement and effective attack resilience.  Methods   The strong nonlinearity of distribution networks limits the estimation accuracy of dynamic methods based on the Cubature Kalman Filter (CKF). A hybrid measurement state estimation model that combines data from Phasor Measurement Unit (PMU) and Supervisory Control And Data Acquisition (SCADA) is established. HCKF is applied to enhance estimation performance in nonlinear, high-dimensional scenarios by generating higher-order cubature points. Under FDIA, the estimation outputs from WLS and HCKF are jointly assessed, allowing rapid intrusion detection through residual evaluation and state consistency checking. Once an attack is identified, an LSTM model performs time-series prediction to reconstruct the measurement data of compromised nodes. The reconstructed data replace abnormal values, enabling correction of the final state estimation.  
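The Methods mention that HCKF generates higher-order cubature points. A widely used high-degree choice is the fifth-degree spherical-radial rule with 2n² + 1 points. We assume, without confirmation from the abstract, that this is the rule used here. The sketch builds the points and weights for N(0, I) and uses them to propagate a mean and covariance through a nonlinear function, which is the core step of each HCKF prediction and update. For n > 4, the axis weights become negative, so practical filters may need safeguards to keep covariances positive definite.

```python
import numpy as np
from itertools import combinations

def fifth_degree_cubature(n: int):
    """Points (2n^2 + 1, n) and weights of the fifth-degree spherical-radial rule for N(0, I_n)."""
    r = np.sqrt(n + 2.0)
    I = np.eye(n)
    pts, wts = [np.zeros(n)], [2.0 / (n + 2)]          # centre point
    for i in range(n):                                  # 2n axis points
        for sgn in (1.0, -1.0):
            pts.append(sgn * r * I[i])
            wts.append((4.0 - n) / (2.0 * (n + 2) ** 2))
    for i, j in combinations(range(n), 2):              # 2n(n - 1) off-axis points
        for u in (I[i] + I[j], I[i] - I[j]):
            for sgn in (1.0, -1.0):
                pts.append(sgn * r * u / np.sqrt(2.0))
                wts.append(1.0 / (n + 2) ** 2)
    return np.array(pts), np.array(wts)

def propagate(f, mean, cov):
    """Approximate the mean and covariance of f(x) for x ~ N(mean, cov)."""
    xi, w = fifth_degree_cubature(len(mean))
    S = np.linalg.cholesky(cov)
    X = mean + xi @ S.T                    # map standard points to N(mean, cov)
    Y = np.array([f(x) for x in X])
    y_mean = w @ Y
    dY = Y - y_mean
    return y_mean, (w[:, None] * dY).T @ dY

# For x ~ N(0, 1), x^2 has mean 1 and variance 2; the rule reproduces both exactly.
m, P = propagate(lambda x: np.array([x[0] ** 2]), np.array([0.0]), np.array([[1.0]]))
print(m, P)
```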
Results and Discussions  Experiments on the IEEE 33-bus distribution system show that without FDIA, HCKF achieves higher estimation accuracy for voltage magnitude and phase angle than CKF. The Average voltage Relative Error (ARE) of voltage magnitude decreases by 57.9%, and the corresponding phase-angle error decreases by 28.9%, confirming the superiority of the method for strongly nonlinear and high-dimensional state estimation. Under FDIA, residual-based detection effectively identifies cyber attacks and avoids false alarms and missed detections. The prediction error of LSTM for the measurement data of compromised nodes and their associated branches remains on the order of 10⁻⁶, indicating high reconstruction fidelity. The combined HCKF and LSTM method maintains stable state tracking after intrusion, and its performance exceeds that of WLS and the adaptive Unscented Kalman Filter.  Conclusions  The dynamic state estimation method that integrates HCKF and LSTM enhances adaptability to strong nonlinearity and high-dimensional characteristics of distribution networks. Rapid and accurate FDIA identification is achieved through residual evaluation, and LSTM reconstructs the measurement data of compromised nodes with high reliability. The method maintains high estimation accuracy under normal operation and preserves stability and precision under cyber intrusion. It offers technical support for secure and stable operation of distribution networks in the presence of malicious attacks.
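The residual-based detection step can be sketched with a linearized measurement model z = Hx + e. The WLS estimate yields a weighted residual statistic, and the gap between the WLS estimate and the filter (HCKF) estimate serves as a state-consistency check. The linear model, the thresholds, and the function names are hypothetical. The value 16.27 is approximately the 99.9% chi-square quantile for 3 degrees of freedom (6 measurements, 3 states). The consistency check matters because an attack of the form a = Hc leaves the WLS residual unchanged and can only be exposed by comparing against an independent estimate.

```python
import numpy as np

def wls_estimate(z, H, r_var):
    """Weighted least squares x = (H^T W H)^{-1} H^T W z with W = diag(1 / r_var)."""
    W = np.diag(1.0 / r_var)
    return np.linalg.solve(H.T @ W @ H, H.T @ W @ z)

def detect_fdia(z, H, r_var, x_filter, tau_res, tau_state):
    """Flag an attack if the weighted residual or the WLS/filter state gap is too large."""
    x_wls = wls_estimate(z, H, r_var)
    r = z - H @ x_wls
    J = float(r @ (r / r_var))                     # weighted residual sum of squares
    gap = float(np.linalg.norm(x_wls - x_filter))  # state consistency check
    return (J > tau_res or gap > tau_state), J, gap

rng = np.random.default_rng(1)
H = rng.normal(size=(6, 3))                      # hypothetical linearized measurement matrix
x_true = np.array([1.0, 0.98, 0.95])
r_var = np.full(6, 1e-4)
z = H @ x_true + rng.normal(0.0, 1e-3, 6)
x_filter = x_true + 1e-3                         # stand-in for the HCKF estimate
print(detect_fdia(z, H, r_var, x_filter, 16.27, 0.1)[0])      # clean data
z_attacked = z.copy()
z_attacked[2] += 0.5                             # injected false data on one meter
print(detect_fdia(z_attacked, H, r_var, x_filter, 16.27, 0.1)[0])
```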
LLM-based Data Compliance Checking for Internet of Things Scenarios
LI Chaohao, WANG Haoran, ZHOU Shaopeng, YAN Haonan, ZHANG Feng, LU Tianyang, XI Ning, WANG Bin
Available online  , doi: 10.11999/JEIT250704
Abstract:
  Objective  The implementation of regulations such as the Data Security Law of the People’s Republic of China, the Personal Information Protection Law of the People’s Republic of China, and the European Union General Data Protection Regulation (GDPR) has established data compliance checking as a central mechanism for regulating data processing activities, ensuring data security, and protecting the legitimate rights and interests of individuals and organizations. However, the characteristics of the Internet of Things (IoT), defined by large numbers of heterogeneous devices and the dynamic, extensive, and variable nature of transmitted data, increase the difficulty of compliance checking. Logs and traffic data generated by IoT devices are long, unstructured, and often ambiguous, which results in a high false-positive rate when traditional rule-matching methods are applied. In addition, the dynamic business environments and user-defined compliance requirements further increase the complexity of rule design, maintenance, and decision-making.  Methods  A large language model-driven data compliance checking method for IoT scenarios is proposed to address the identified challenges. In the first stage, a fast regular expression matching algorithm is employed to efficiently screen potential non-compliant data based on a comprehensive rule database. This process produces structured preliminary checking results that include the original non-compliant content and the corresponding violation type. The rule database incorporates current legislation and regulations, standard requirements, enterprise norms, and customized business requirements, and it maintains flexibility and expandability. By relying on the efficiency of regular expression matching and generating structured preliminary results, this stage addresses the difficulty of reviewing large volumes of long IoT text data and enhances the accuracy of the subsequent large language model review. 
In the second stage, a Large Language Model (LLM) reviews the preliminary detection results to improve their precision. Different prompts are adaptively selected for different violation categories, enabling differentiated classification.  Results and Discussions  Log and traffic data are collected from 52 IoT devices operating in a real environment (Table 2). A compliance-checking rule library for IoT devices is established in accordance with the Cybersecurity Law, the Data Security Law, other relevant regulations, and internal enterprise information-security requirements. Based on this library, the collected data undergo first-stage rule matching, which identifies 55 080 potential non-compliant data points with a false-positive rate of 64.3%. Three aspects are examined: benchmark models, prompt schemes, and role prompts. In the benchmark model comparison, eight mainstream large language models, including Qwen2.5-32B-Instruct, DeepSeek-R1-70B, and DeepSeek-R1-0528 with different parameter configurations, are used to evaluate detection performance (Table 5). After review by the large language model, the false-positive rate falls from 64.3% to 6.9%, a substantial improvement in the quality of compliance checking, while the model’s own error rate remains below 0.01%. The prompt-engineering assessment shows that prompt design strongly affects review accuracy (Table 6). With general prompts, the final false-positive rate remains high at 59%. With only chain-of-thought prompts or only concise few-shot sample prompts, the false-positive rate falls to approximately 12% and 6%, respectively, and the model’s own error rate decreases to about 30% and 13%. Combining these strategies further reduces the error rate of the few-shot prompt approach to 0.01%. The effect of system-role prompts on review accuracy is also evaluated (Table 7).
Simple role prompts yield higher accuracy and F1 scores than the absence of role prompts, whereas detailed role prompts provide a clearer overall advantage than simple role prompts. Ablation experiments (Table 8) further examine the contribution of rule classification and prompt engineering to compliance checking. Knowledge supplementation is applied to reduce interference and misjudgment among rules, lower prompt redundancy, and decrease the false-alarm rate during large language model review.  Conclusions  A large language model-driven data compliance checking method for IoT scenarios is presented. The method is designed to address the challenge of assessing compliance in large-scale unstructured device data. Its feasibility is verified through rationality analysis experiments, and the results indicate that false-positive rates are effectively reduced during compliance checking. The initial rule-based method yields a false-positive rate of 64.3%, which is reduced to 6.9% after review by the large language model. Additionally, the error introduced by the model itself is maintained below 0.01%.
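The two-stage pipeline can be sketched as follows. Stage 1 runs a regex rule database over log and traffic text and emits structured findings that record the original content and the violation type. Stage 2 picks a category-specific review prompt for each finding. The rules, prompt wording, and function names are hypothetical illustrations rather than the paper's rule library, and the LLM call is deliberately omitted so that no particular model API is assumed.

```python
import re
from dataclasses import dataclass

# Hypothetical rule database: violation type -> compiled pattern.
RULES = {
    "phone_number": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),  # 11-digit mainland mobile number
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),       # 18-character resident ID number
    "plaintext_password": re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[=:]\s*\S+"),
}

# Hypothetical category-specific review prompts for the LLM stage.
DEFAULT_PROMPT = (
    "Decide whether the following content violates data-compliance rules. "
    "Content: {snippet}\nContext: {context}\nAnswer COMPLIANT or NON-COMPLIANT."
)
PROMPTS = {
    "phone_number": (
        "The digits '{snippet}' were flagged as a personal phone number. Using the context, "
        "decide whether they really are a phone number (not a device ID, timestamp, or counter)."
        "\nContext: {context}\nAnswer COMPLIANT or NON-COMPLIANT."
    ),
    "plaintext_password": (
        "The fragment '{snippet}' was flagged as a plaintext credential. Decide whether a real "
        "secret is exposed (not a placeholder or masked value).\nContext: {context}\n"
        "Answer COMPLIANT or NON-COMPLIANT."
    ),
}

@dataclass
class Finding:
    source: str     # identifier of the log or traffic record
    violation: str  # rule category that fired
    snippet: str    # original matched content

def screen(records: dict[str, str]) -> list[Finding]:
    """Stage 1: fast regex screening that yields structured preliminary results."""
    findings = []
    for source, text in records.items():
        for violation, pattern in RULES.items():
            for match in pattern.finditer(text):
                findings.append(Finding(source, violation, match.group(0)))
    return findings

def review_prompt(finding: Finding, context: str) -> str:
    """Stage 2 input: choose a prompt by violation category (the LLM call is omitted)."""
    template = PROMPTS.get(finding.violation, DEFAULT_PROMPT)
    return template.format(snippet=finding.snippet, context=context)

records = {
    "dev1.log": "user=alice phone=13812345678 pwd=abc123",
    "dev2.log": "temp=21.5 id=110105199003071234",
}
for f in screen(records):
    print(f.source, f.violation, f.snippet)
```

The lookarounds `(?<!\d)` and `(?!\d)` keep the phone-number rule from firing inside longer digit runs such as ID numbers. This is one small example of the rule-interference problem that the LLM review stage is meant to catch more generally.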
Finite-time Adaptive Sliding Mode Control of Servo Motors Considering Frictional Nonlinearity and Unknown Loads
ZHANG Tianyu, GUO Qinxia, YANG Tingkai, GUO Xiangji, MING Ming
Available online  , doi: 10.11999/JEIT250521
Abstract:
  Objective  Ultra-fast laser processing with an infinite field of view requires servo motor systems with superior tracking accuracy and robustness. However, such systems are highly nonlinear and affected by coupled unknown load disturbances and complex friction, which constrain the performance of conventional controllers. Although Sliding Mode Control (SMC) exhibits inherent robustness, traditional SMC and observer designs cannot achieve accurate finite-time disturbance compensation under strong nonlinearities, thus limiting high-speed and high-precision trajectory tracking. To address this limitation, a novel finite-time adaptive SMC approach is proposed to ensure rapid and precise angular position tracking within a finite time, satisfying the stringent synchronization requirements of advanced laser processing systems.  Methods  A novel control strategy is developed by integrating an adaptive disturbance observer fused with a Radial Basis Function Neural Network (RBFNN) and finite-time Sliding Mode Control (SMC). First, the unknown load disturbance and complex frictional nonlinear dynamics are combined into a unified "lumped disturbance" term, improving model generality and the ability to represent real operating conditions. Second, a finite-time adaptive disturbance observer is constructed to estimate this lumped disturbance. The observer utilizes the universal approximation capability of the RBFNN to learn and approximate the dynamic characteristics of unknown disturbances online. Simultaneously, a finite-time adaptive law based on the error norm is introduced to update the neural network weights in real time, ensuring rapid and accurate finite-time estimation of the lumped disturbance while reducing dependence on precise model parameters. Based on this design, a finite-time SMC is developed. 
The controller uses the observer’s disturbance estimation as a feedforward compensation term, incorporates a carefully formulated finite-time sliding surface and equivalent control law, and introduces a saturation function to suppress control input chattering. A suitable Lyapunov function is then constructed, and the finite-time stability theory is rigorously applied to prove the practical finite-time convergence of both the adaptive observer and the closed-loop control system, guaranteeing that the system tracking error converges to a bounded neighborhood near the origin within finite time.  Results and Discussions  To verify the effectiveness and superiority of the proposed control strategy, a typical Permanent Magnet Synchronous Motor (PMSM) servo system model is constructed in the MATLAB environment, and a simulation scenario with desired trajectories of varying frequencies is established. The proposed method is comprehensively compared with the widely used Proportional–Integral (PI) control and the advanced method reported in reference [7]. Simulation results demonstrate the following: 1. Tracking performance: Under various reference trajectories, the proposed controller enables the system to accurately follow the target trajectory with a tracking error substantially smaller than that of the PI controller. Compared with the method in reference [7], it achieves smoother responses and smaller residual errors, effectively eliminating the chattering observed in some operating conditions of the latter. 2. Disturbance rejection and robustness: The adaptive disturbance observer based on the RBFNN rapidly and effectively learns and compensates for the lumped disturbance composed of unknown load variations and frictional nonlinearities. Even in the presence of these disturbances, the proposed controller maintains high-precision trajectory tracking, demonstrating strong disturbance rejection and robustness to system parameter variations. 3.
Control input characteristics: Compared with the reference methods, the control signal of the proposed approach quickly stabilizes after the initial transient phase, effectively suppressing chattering caused by high-frequency switching. The amplitude range of the control input remains reasonable, facilitating practical actuator implementation. 4. Comprehensive evaluation: Based on multiple error performance indices, including Integral Squared Error (ISE), Integral Absolute Error (IAE), Time-weighted Integral Absolute Error (ITAE), and Time-weighted Integral Squared Error (ITSE), the proposed controller consistently outperforms both PI control and the method in reference [7]. It demonstrates comprehensive advantages in suppressing transient errors rapidly and reducing overall error accumulation. The method also improves steady-state accuracy and achieves a balanced response speed with effective noise attenuation. 5. Observer performance: The RBFNN weight norm estimation converges rapidly and stabilizes at a low level after initial adaptation, confirming the effectiveness of the proposed adaptive law and the learning efficiency of the observer.  Conclusions  A finite-time sliding mode control strategy with an adaptive disturbance observer is proposed for servo systems used in ultra-fast laser processing. The method models unknown load disturbances and frictional nonlinearities as a lumped disturbance term. An adaptive observer, integrating an RBF neural network with a finite-time mechanism, accurately estimates this disturbance for real-time compensation. Based on the observer, a finite-time SMC law is formulated, and the practical finite-time stability of the closed-loop system is theoretically proven. Simulations conducted on a permanent magnet synchronous motor platform confirm that the proposed approach achieves superior tracking accuracy, robustness, and control smoothness compared with conventional PI and existing advanced methods. 
This work offers an effective solution for achieving high-precision control in nonlinear systems subject to strong disturbances.
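To make the control structure above concrete (sliding surface, equivalent control, disturbance feedforward, saturation-smoothed reaching law, and the ISE/IAE/ITAE/ITSE indices used for evaluation), here is a minimal simulation sketch. It rests on several assumptions that are not from the article: the servo is reduced to a hypothetical double integrator x1' = x2, x2' = b*u + d, the lumped disturbance d is an invented load term plus Coulomb-like friction, all gains are illustrative, and a conventional first-order nonlinear disturbance observer stands in for the authors' RBFNN-based finite-time adaptive observer. The sliding surface is a simple linear one with a fractional-power reaching term, not the authors' finite-time surface.

```python
import math


def sat(v):
    """Saturation function: linear inside [-1, 1], clipped to +/-1 outside."""
    return max(-1.0, min(1.0, v))


def simulate(T=2.0, dt=1e-4, b=1.0, c=20.0, k1=20.0, k2=10.0,
             alpha=0.5, phi=0.01, L=200.0):
    """Euler simulation of a hypothetical servo x1' = x2, x2' = b*u + d.

    Returns the final absolute tracking error and the ISE/IAE/ITAE/ITSE indices.
    """
    x1, x2 = 0.2, 0.0            # initial angle offset to create a visible transient
    z = -L * x2                  # observer state chosen so that d_hat(0) = 0
    w = 2.0 * math.pi            # 1 Hz sinusoidal reference (illustrative)
    ise = iae = itae = itse = 0.0
    t = 0.0
    for _ in range(int(round(T / dt))):
        # Reference trajectory and its first two derivatives
        r, rd, rdd = math.sin(w * t), w * math.cos(w * t), -w * w * math.sin(w * t)
        # Hypothetical lumped disturbance: load term plus Coulomb-like friction
        d = 0.5 * math.sin(3.0 * t) + 0.2 * ((x2 > 0) - (x2 < 0))
        e, ed = x1 - r, x2 - rd
        s = ed + c * e                     # linear sliding variable
        d_hat = z + L * x2                 # observer output, d_hat' = L*(d - d_hat)
        # Equivalent control + disturbance feedforward + reaching law, with sat()
        # replacing sign() inside the boundary layer |s| < phi to limit chattering
        u = (rdd - c * ed - d_hat
             - k1 * abs(s) ** alpha * sat(s / phi) - k2 * s) / b
        # Error performance indices accumulated with the rectangle rule
        ise += e * e * dt
        iae += abs(e) * dt
        itae += t * abs(e) * dt
        itse += t * e * e * dt
        # Explicit Euler update of observer and plant
        z += (-L * d_hat - L * b * u) * dt
        x1, x2 = x1 + x2 * dt, x2 + (b * u + d) * dt
        t += dt
    final_error = abs(x1 - math.sin(w * t))
    return final_error, {"ISE": ise, "IAE": iae, "ITAE": itae, "ITSE": itse}


if __name__ == "__main__":
    err, indices = simulate()
    print(f"final |e| = {err:.2e}", indices)
```

With the feedforward term, the sliding dynamics reduce to s' = -k1|s|^α sat(s/φ) - k2·s + (d - d̂), so the residual tracking error is governed by the observer error rather than by the full disturbance. Re-running with `L` reduced is a quick way to see how estimation quality drives the ISE/IAE-type indices.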
A Learning-Based Security Control Method for Cyber-Physical Systems Based on False Data Detection
MIAO Jinzhao, LIU Jinliang, SUN Le, ZHA Lijuan, TIAN Engang
Available online  , doi: 10.11999/JEIT250537
Abstract:
  Objective  Cyber-Physical Systems (CPS) constitute the backbone of critical infrastructures and industrial applications, but the tight coupling of cyber and physical components renders them highly susceptible to cyberattacks. False data injection attacks are particularly dangerous because they compromise sensor integrity, mislead controllers, and can trigger severe system failures. Existing control strategies often assume reliable sensor data and lack resilience under adversarial conditions. Furthermore, most conventional approaches decouple attack detection from control adaptation, leading to delayed or ineffective responses to dynamic threats. To overcome these limitations, this study develops a unified secure learning control framework that integrates real-time attack detection with adaptive control policy learning. By enabling the dynamic identification and mitigation of false data injection attacks, the proposed method enhances both stability and performance of CPS under uncertain and adversarial environments.  Methods  To address false data injection attacks in CPS, this study proposes an integrated secure control framework that combines attack detection, state estimation, and adaptive control strategy learning. A sensor grouping-based security assessment index is first developed to detect anomalous sensor data in real time without requiring prior knowledge of attacks. Next, a multi-source sensor fusion estimation method is introduced to reconstruct the system’s true state, thereby improving accuracy and robustness under adversarial disturbances. Finally, an adaptive learning control algorithm is designed, in which dynamic weight updating via gradient descent approximates the optimal control policy online. This unified framework enhances both steady-state performance and resilience of CPS against sophisticated attack scenarios. 
Its effectiveness and security performance are validated through simulation studies under diverse false data injection attack settings.  Results and Discussions  Simulation results confirm the effectiveness of the proposed secure adaptive learning control framework under multiple false data injection attacks in CPS. As shown in Fig. 1, system states rapidly converge to steady values and maintain stability despite sensor attacks. Fig. 2 demonstrates that the fused state estimator tracks the true system state with greater accuracy than individual local estimators. In Fig. 3, the compensated observation outputs align closely with the original, uncorrupted measurements, indicating precise attack estimation. Fig. 4 shows that detection indicators for sensor groups 2–5 increase sharply during attack intervals, while unaffected sensors remain near zero, verifying timely and accurate detection. Fig. 5 further confirms that the estimated attack signals closely match the true injected values. Finally, Fig. 6 compares different control strategies, showing that the proposed method achieves faster stabilization and smaller state deviations. Together, these results demonstrate robust control, accurate state estimation, and real-time detection under unknown attack conditions.  Conclusions  This study addresses secure perception and control in CPS under false data injection attacks by developing an integrated adaptive learning control framework that unifies detection, estimation, and control. A sensor-level anomaly detection mechanism is introduced to identify and localize malicious data, substantially enhancing attack detection capability. The fusion-based state estimation method further improves reconstruction accuracy of true system states, even when observations are compromised. At the control level, an adaptive learning controller with online weight adjustment enables real-time approximation of the optimal control policy without requiring prior knowledge of the attack model. 
Future research will extend the proposed framework to broader application scenarios and evaluate its resilience under diverse attack environments.
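The detect, estimate, and fuse pipeline described above can be sketched for a scalar state as follows. This is an illustrative stand-in, not the authors' method: it assumes each sensor group measures the state directly with known noise standard deviation, that a model-predicted output `y_pred` is available, and it uses a normalized squared residual with a hypothetical threshold of 9 (a 3-sigma rule) in place of the paper's sensor-grouping security index. Plain inverse-variance weighting replaces the paper's multi-source fusion estimator, and the learning controller is not included.

```python
def secure_fusion(readings, sigmas, y_pred, threshold=9.0):
    """Flag attacked sensor groups, estimate their injected bias, and fuse the rest.

    readings  -- one reading per sensor group (hypothetical scalar measurements)
    sigmas    -- noise standard deviation of each group
    y_pred    -- model-predicted measurement used as the reference
    threshold -- alarm level for the normalized squared residual (assumed)
    """
    flags, attacks = [], []
    num = den = 0.0
    for y, s in zip(readings, sigmas):
        residual = y - y_pred
        index = (residual / s) ** 2            # hypothetical per-group security index
        flagged = index > threshold
        flags.append(flagged)
        attacks.append(residual if flagged else 0.0)  # crude attack-signal estimate
        if not flagged:
            weight = 1.0 / (s * s)             # inverse-variance fusion weight
            num += weight * y
            den += weight
    fused = num / den if den > 0 else y_pred   # fall back to the model prediction
    return flags, fused, attacks


if __name__ == "__main__":
    # Group 3 carries an injected bias of about +0.6
    print(secure_fusion([1.02, 0.97, 1.6, 1.01], [0.05, 0.05, 0.05, 0.1], 1.0))
```

Excluding flagged groups before fusion keeps a single corrupted sensor from dragging the state estimate, which is the same intuition behind coupling detection with estimation in the framework above.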
A Two-Stage Framework for CAN Bus Attack Detection by Fusing Temporal and Deep Features
TAN Mingming, ZHANG Heng, WANG Xin, LI Ming, ZHANG Jian, YANG Ming
Available online  , doi: 10.11999/JEIT250651
Abstract:
  Objective  The Controller Area Network (CAN), the de facto standard for in-vehicle communication, is inherently vulnerable to cyberattacks. Existing Intrusion Detection Systems (IDSs) face a fundamental trade-off: achieving fine-grained classification of diverse attack types often requires computationally intensive models that exceed the resource limitations of on-board Electronic Control Units (ECUs). To address this problem, this study proposes a two-stage attack detection framework for the CAN bus that fuses temporal and deep features. The framework is designed to achieve both high classification accuracy and computational efficiency, thereby reconciling the tension between detection performance and practical deployability.  Methods  The proposed framework adopts a “detect-then-classify” strategy and incorporates two key innovations. (1) Stage 1: Temporal Feature-Aware Anomaly Detection. Two custom features are designed to quantify anomalies: Payload Data Entropy (PDE), which measures content randomness, and ID Frequency Mean Deviation (IFMD), which captures behavioral deviations. These features are processed by a Bidirectional Long Short-Term Memory (BiLSTM) network that exploits contextual temporal information to achieve high-recall anomaly detection. (2) Stage 2: Deep Feature-Based Fine-Grained Classification. Triggered only for samples flagged as anomalous, this stage employs a lightweight one-dimensional ParC1D-Net. The core ParC1D Block (Fig. 4) integrates depthwise separable one-dimensional convolution, Squeeze-and-Excitation (SE) attention, and a Feed-Forward Network (FFN), enabling efficient feature extraction with minimal parameters. Stage 1 is optimized using BCEWithLogitsLoss, whereas Stage 2 is trained with Cross-Entropy Loss.  Results and Discussions  The efficacy of the proposed framework is evaluated on public datasets. (1) State-of-the-art performance. 
On the Car-Hacking dataset (Table 5), an accuracy and F1-score of 99.99% are achieved, exceeding advanced baselines. On the more challenging Challenge dataset (Table 6), superior accuracy (99.90%) and a competitive F1-score (99.70%) are also obtained. (2) Feature contribution analysis. Ablation studies (Tables 7 and 8) confirm the critical role of the proposed features. Removal of the IFMD feature results in the largest performance reduction, highlighting the importance of behavioral modeling. A synergistic effect is observed when PDE and IFMD are applied together. (3) Spatiotemporal efficiency. The complete model remains lightweight at only 0.39 MB. Latency tests (Table 9) demonstrate real-time capability, with average detection times of 0.62 ms on a GPU and 0.93 ms on a simulated CPU (batch size = 1). A system-level analysis (Section 3.5.4) further shows that the two-stage framework is approximately 1.65 times more efficient than a single-stage model in a realistic sparse-attack scenario.  Conclusions  This study establishes the two-stage framework as an effective and practical solution for CAN bus intrusion detection. By decoupling detection from classification, the framework resolves the trade-off between accuracy and on-board deployability. Its strong performance, combined with a minimal computational footprint, indicates its potential for securing real-world vehicular systems. Future research could extend the framework and explore hardware-specific optimizations.
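The two hand-crafted features and the detect-then-classify gating can be illustrated with the sketch below. The exact PDE and IFMD formulas are not given in the abstract, so the definitions here are assumptions: PDE is taken as the Shannon entropy (bits per byte) of a payload, and IFMD as the relative deviation of each ID's count from the mean per-ID count in a window. The BiLSTM detector and ParC1D-Net classifier are replaced by placeholder callables.

```python
import math
from collections import Counter


def payload_entropy(payload: bytes) -> float:
    """Assumed PDE: Shannon entropy (bits/byte) of a CAN payload of up to 8 bytes."""
    if not payload:
        return 0.0
    n = len(payload)
    return sum(c / n * math.log2(n / c) for c in Counter(payload).values())


def id_frequency_mean_deviation(window_ids):
    """Assumed IFMD: relative deviation of each ID's count from the mean per-ID count."""
    counts = Counter(window_ids)
    mean = sum(counts.values()) / len(counts)
    return {can_id: (c - mean) / mean for can_id, c in counts.items()}


def two_stage(samples, detector, classifier):
    """Detect-then-classify: the costly classifier runs only on flagged samples."""
    return [classifier(x) if detector(x) else "normal" for x in samples]


if __name__ == "__main__":
    print(payload_entropy(bytes(8)), payload_entropy(bytes(range(8))))
    # A flood of the high-priority ID 0x000 stands out as a positive deviation
    window = [0x100, 0x200, 0x100, 0x200, 0x000, 0x000, 0x000, 0x000]
    print(id_frequency_mean_deviation(window))
```

An all-zero payload has entropy 0 and eight distinct bytes give the maximum of 3 bits/byte, while a flooding ID shows a large positive frequency deviation; because the second stage only runs on the flagged fraction of traffic, the expected per-message cost falls when attacks are sparse.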
Data-Driven Secure Control for Cyber-Physical Systems under Denial-of-Service Attacks: An Online Mode-Dependent Switching-Q-Learning Strategy
ZHANG Ruifeng, YANG Rongni
Available online  , doi: 10.11999/JEIT250746
Abstract:
  Objective   The open network architecture of cyber-physical systems (CPSs) enables remarkable flexibility and scalability, but it also renders CPSs highly vulnerable to cyber-attacks. In particular, denial-of-service (DoS) attacks have emerged as one of the predominant threats: by directly jamming channels, they cause packet loss and degrade system performance. On the other hand, CPSs under dormant and active DoS attacks can be regarded as dual-mode switched systems with stable and unstable subsystems, respectively. It is therefore worth exploring how switched system theory can be used to design a secure control approach with high degrees of freedom and low conservatism. However, because of complex environmental factors such as attacks and noise, practical CPSs are difficult to model exactly. Although Q-learning-based control methods show potential for handling unknown CPSs, a significant research gap remains for switched systems with unstable modes, particularly in establishing an evaluable stability criterion. How to apply switched system theory to design a learning-based control algorithm and an evaluable security criterion for unknown CPSs under DoS attacks therefore remains to be investigated.   Methods   An online mode-dependent switching-Q-learning strategy is presented to derive a data-driven evaluable criterion and secure control for unknown CPSs under DoS attacks. First, the CPSs under dormant and active DoS attacks are transformed into switched systems with stable and unstable subsystems, respectively. Subsequently, the optimal control problem of the value function is addressed for the model-based switched systems by designing a new generalized switching algebraic Riccati equation (GSARE) and obtaining the corresponding mode-dependent optimal security controller. Furthermore, the existence and uniqueness of the GSARE’s solution are proved. 
Next, building on the model-based results, a data-driven optimal security control law is proposed by developing a novel online mode-dependent switching-Q-learning control algorithm. Finally, using the control gain and parameter matrices learned by this algorithm, a data-driven evaluable security criterion in terms of attack frequency and duration is established based on the switching constraints and subsystem constraints.   Results and Discussions   To verify the efficiency and advantages of the proposed methods, comparative experiments on a wheeled robot are presented. First, the model-based result (Theorem 1) is compared with the data-driven result (Algorithm 1). The iterative curves of the control gain and parameter matrices (Fig. 2 and Fig. 3) show that the optimal control gain and parameter matrices can be obtained within the error thresholds from both the model-based GSARE and the data-driven algorithm. Meanwhile, with the data-driven controller, the tracking errors of the CPSs converge to 0 (Fig. 5), which ensures the exponential stability of the CPSs and verifies the efficiency of the proposed switching-Q-learning algorithm. Second, the learning curves (Fig. 4) show that although the initial control gain is not stabilizing, Algorithm 1 still learns an optimal control gain that stabilizes the system. This result substantially reduces conservatism compared with existing Q-learning approaches, which require a stabilizing initial control gain as a premise for learning. 
Third, the data-driven evaluable security criterion in Theorem 2 is compared with existing criteria. Although the switching parameters learned from Algorithm 1 do not satisfy the commonly used switching constraint for obtaining the model dwell-time, the proposed evaluable security criterion still yields the attack frequency and duration based on the new switching constraints and subsystem constraints. Furthermore, the comparison of evaluable security criteria (Table 1) shows that the proposed criterion is less conservative than existing evaluable criteria. Finally, the learned optimal controller and the obtained DoS attack constraints are applied to a tracking control experiment of a wheeled robot under DoS attacks, and the results are compared with those of existing Q-learning controllers. The tracking trajectory comparisons (Figs. 6 and 7) show that the proposed switching-Q-learning controller enables significantly faster and more accurate trajectory tracking. These results verify the efficiency and advantages of the proposed algorithm and criterion.   Conclusions   Based on the learning strategy and switched system theory, this study presents an online mode-dependent switching-Q-learning control algorithm and the corresponding evaluable security criterion for unknown CPSs under DoS attacks. The main results are as follows: (1) By representing the unknown CPSs under dormant and active DoS attacks as unknown switched systems with stable and unstable subsystems, respectively, the security problem of CPSs under DoS attacks is transformed into a stabilization problem of switched systems, which offers high design freedom and low conservatism. (2) A novel online mode-dependent switching-Q-learning control algorithm is developed for unknown switched systems with unstable modes. 
Through the comparative experiments, the proposed switching-Q-learning algorithm effectively increases the design freedom of controllers and decreases conservatism over existing Q-learning algorithms. (3) A new data-driven evaluable security criterion with the attack frequency and duration is established based on the switching constraints and subsystem constraints. It is evident from the comparative criteria that the proposed criterion demonstrates significantly reduced conservatism over existing evaluable criteria via single subsystem constraints and traditional model dwell-time constraints.
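The key property claimed for Algorithm 1, learning an optimal gain without a stabilizing initial gain, can be illustrated on a much simpler problem. The sketch below is standard value-iteration Q-learning for a single scalar linear mode x' = a·x + b·u with quadratic cost; it is not the authors' switching algorithm and ignores the mode coupling captured by the GSARE. Each mode's gain is learned separately here, the (a, b) pairs are hypothetical, and the model is used only to generate one-step transition data, never inside the update. Starting from Q = 0 means no stabilizing initial gain is needed, even for the open-loop unstable mode.

```python
import random


def solve3(A, y):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [v] for row, v in zip(A, y)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x


def q_learning_gain(a, b, q=1.0, r=1.0, iters=200, samples=20, seed=0):
    """Value-iteration Q-learning for one scalar mode x' = a*x + b*u.

    Q(x, u) = hxx*x^2 + 2*hxu*x*u + huu*u^2, greedy policy u = -k*x with k = hxu/huu.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(samples):                    # one-step transition data (x, u, x')
        x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((x, u, a * x + b * u))
    hxx = hxu = huu = 0.0                        # Q_0 = 0: no stabilizing gain required
    for _ in range(iters):
        k = hxu / huu if huu > 0 else 0.0
        # Least-squares fit of Q_{j+1}(x, u) = q*x^2 + r*u^2 + Q_j(x', -k*x')
        AtA = [[0.0] * 3 for _ in range(3)]
        Aty = [0.0] * 3
        for x, u, xn in data:
            phi = [x * x, 2 * x * u, u * u]
            target = q * x * x + r * u * u + (hxx - 2 * hxu * k + huu * k * k) * xn * xn
            for i in range(3):
                Aty[i] += phi[i] * target
                for j in range(3):
                    AtA[i][j] += phi[i] * phi[j]
        hxx, hxu, huu = solve3(AtA, Aty)
    return hxu / huu


if __name__ == "__main__":
    # Hypothetical stable and unstable modes
    for a, b in [(0.9, 1.0), (1.2, 0.5)]:
        print(a, b, q_learning_gain(a, b))
```

Because the transition data are noise-free, each least-squares fit recovers the Riccati value-iteration update exactly, so the learned gain should match the gain from the scalar discrete-time algebraic Riccati equation.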
Entropy Quantum Collaborative Planning Method for Emergency Path of Unmanned Aerial Vehicles Driven by Survival Probability
WANG Enliang, ZHANG Zhen, SUN Zhixin
Available online  , doi: 10.11999/JEIT250694
Abstract:
  Objective  Natural disaster emergency rescue places stringent requirements on the timeliness and safety of Unmanned Aerial Vehicle (UAV) path planning. Conventional optimization objectives, such as minimizing total distance, often fail to reflect the critical time-sensitive priority of maximizing the survival probability of trapped victims. Moreover, existing algorithms struggle with the complex constraints of disaster environments, including no-fly zones, caution zones, and dynamic obstacles. To address these challenges, this paper proposes an Entropy-Enhanced Quantum Ripple Synergy Algorithm (E2QRSA). The primary goals are to establish a survival probability maximization model that incorporates time decay characteristics and to design a robust optimization algorithm capable of efficiently handling complex spatiotemporal constraints in dynamic disaster scenarios.  Methods  E2QRSA enhances the Quantum Ripple Optimization framework through four key innovations: (1) information entropy–based quantum state initialization, which guides population generation toward high-entropy regions; (2) multi-ripple collaborative interference, which promotes beneficial feature propagation through constructive superposition; (3) entropy-driven parameter control, which dynamically adjusts ripple propagation according to search entropy rates; and (4) quantum entanglement, which enables information sharing among elite individuals. The model employs a survival probability objective function that accounts for time-sensitive decay, base conditions, and mission success probability, subject to constraints including no-fly zones, warning zones, and dynamic obstacles.  Results and Discussions  Simulation experiments are conducted in medium- and large-scale typhoon disaster scenarios. The proposed E2QRSA achieves the highest survival probabilities of 0.847 and 0.762, respectively (Table 1), exceeding comparison algorithms such as SEWOA and PSO by 4.2–16.0%. 
Although the paths generated by E2QRSA are not the shortest, they are the most effective in maximizing survival chances. The ablation study (Table 3) confirms the contribution of each component, with the removal of multi-ripple interference causing the largest performance decrease (9.97%). The dynamic coupling between search entropy and ripple parameters (Fig. 2) is validated, demonstrating the effectiveness of the adaptive control mechanism. The entanglement effect (Fig. 4) is shown to maintain population diversity. In terms of constraint satisfaction, E2QRSA-planned paths consume only 85.2% of the total available energy (Table 5), ensuring a safe return, and all static and dynamic obstacles are successfully avoided, as visually verified in the 3D path plots (Figs. 6 and 7).  Conclusions  E2QRSA effectively addresses the challenge of UAV path planning for disaster relief by integrating adaptive entropy control with quantum-inspired mechanisms. The survival probability objective captures the essential requirements of disaster scenarios more accurately than conventional distance minimization. Experimental validation demonstrates that E2QRSA achieves superior solution quality and faster convergence, providing a robust technical basis for strengthening emergency response capabilities.
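The observation that survival-optimal paths need not be the shortest can be reproduced on a toy instance. The sketch assumes an exponential decay model in which victim i survives with probability p0_i·exp(-λ_i·t_i) at arrival time t_i, straight-line flight at constant speed, and no zones or obstacles; it enumerates visiting orders exhaustively, which is only feasible for a handful of victims and is not E2QRSA.

```python
import math
from itertools import permutations


def route_survival(order, victims, base=(0.0, 0.0), speed=1.0):
    """Mean survival probability when victims are reached in `order`.

    victims[i] = (position, p0, decay_rate); survival = p0 * exp(-decay_rate * arrival_time).
    """
    t, pos, total = 0.0, base, 0.0
    for i in order:
        target, p0, lam = victims[i]
        t += math.dist(pos, target) / speed
        total += p0 * math.exp(-lam * t)
        pos = target
    return total / len(order)


def route_length(order, victims, base=(0.0, 0.0)):
    """Total straight-line path length for a visiting order."""
    pos, length = base, 0.0
    for i in order:
        length += math.dist(pos, victims[i][0])
        pos = victims[i][0]
    return length


def best_orders(victims):
    """Exhaustive search: (shortest order, survival-maximizing order)."""
    orders = list(permutations(range(len(victims))))
    shortest = min(orders, key=lambda o: route_length(o, victims))
    safest = max(orders, key=lambda o: route_survival(o, victims))
    return shortest, safest


if __name__ == "__main__":
    # Victim 0 is near but decays slowly; victim 1 is farther but decays fast
    victims = [((1.0, 0.0), 1.0, 0.01), ((-2.0, 0.0), 1.0, 1.0)]
    print(best_orders(victims))
```

In this instance the shortest route visits the nearby, slowly decaying victim first, whereas the survival objective sends the UAV to the time-critical victim first despite a longer total path, which mirrors the trade-off reported above.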
Secrecy Rate Maximization Algorithm for IRS Assisted UAV-RSMA Systems
WANG Zhengqiang, KONG Weidong, WAN Xiaoyu, FAN Zifu, DUO Bin
Available online  , doi: 10.11999/JEIT250452
Abstract:
  Objective  Under the stringent requirements of Sixth-Generation (6G) mobile communication networks for spectral efficiency, energy efficiency, low latency, and wide coverage, Unmanned Aerial Vehicle (UAV) communication has emerged as a key solution for 6G and beyond, leveraging its Line-of-Sight propagation advantages and flexible deployment capabilities. Functioning as aerial base stations, UAVs significantly enhance network performance by improving spectral efficiency and connection reliability, demonstrating irreplaceable value in critical scenarios such as emergency communications, remote area coverage, and maritime operations. However, UAV communication systems face dual challenges in high-mobility environments: severe multi-user interference in dense access scenarios that substantially degrades system performance, alongside critical physical-layer security threats resulting from the broadcast nature and spatial openness of wireless channels that enable malicious interception of transmitted signals. Rate-Splitting Multiple Access (RSMA) mitigates these challenges by decomposing user messages into common and private streams, thereby providing a flexible interference management mechanism that balances decoding complexity with spectral efficiency. This makes RSMA especially suitable for high-density user access scenarios. In parallel, Intelligent Reflecting Surfaces (IRS) have emerged as a promising technology to dynamically reconfigure wireless propagation through programmable electromagnetic unit arrays. IRS improves the quality of legitimate links while reducing the capacity of eavesdropping links, thereby enhancing physical-layer security in UAV communications. It is noteworthy that while existing research has predominantly centered on conventional multiple access schemes, the application potential of RSMA technology in IRS-assisted UAV communication systems remains relatively unexplored. 
Against this background, this paper investigates secure transmission strategies in IRS-assisted UAV-RSMA systems.  Methods  This paper investigates the effect of eavesdroppers on the security performance of UAV communication systems and proposes an IRS-assisted RSMA-based UAV communication model. The system comprises a multi-antenna UAV base station, an IRS mounted on a building, multiple single-antenna legitimate users, and multiple single-antenna eavesdroppers. The optimization problem is formulated to maximize the system secrecy rate by jointly optimizing precoding vectors, common secrecy rate allocation, IRS phase shifts, and UAV positioning. The problem is highly non-convex due to the strong coupling among these variables, rendering direct solutions intractable. To overcome this challenge, a two-layer optimization framework is developed. In the inner layer, with the UAV position fixed, an alternating optimization strategy divides the problem into two subproblems: (1) joint optimization of precoding vectors and common secrecy rate allocation, and (2) optimization of IRS phase shifts. Non-convex constraints are transformed into convex forms using techniques such as Successive Convex Approximation (SCA), relaxation variables, first-order Taylor expansion, and Semidefinite Relaxation (SDR). In the outer layer, the Particle Swarm Optimization (PSO) algorithm determines the UAV deployment position based on the optimized inner-layer variables.  Results and Discussions  Simulation results show that the proposed algorithm outperforms RSMA without IRS, NOMA with IRS, and NOMA without IRS in terms of secrecy rate. Fig. 2 illustrates that the secrecy rate increases with the number of iterations and converges under different UAV maximum transmit power levels and antenna configurations. Fig. 3 demonstrates that increasing UAV transmit power significantly enhances the secrecy rate for both the proposed and benchmark schemes. 
This improvement arises because higher transmit power strengthens the signal received by legitimate users, increasing their achievable rates and enhancing system secrecy performance. Fig. 4 indicates that the secrecy rate grows with the number of UAV antennas, owing to expanded signal coverage and greater spatial degrees of freedom, which amplify the effective signal strength in legitimate user channels. Fig. 5 shows that both the proposed scheme and NOMA with IRS achieve a higher secrecy rate as the number of IRS reflecting elements increases. The additional elements provide greater spatial degrees of freedom, improving channel gains for legitimate users and strengthening resistance to eavesdropping. In contrast, benchmark schemes operating without IRS assistance show no improvement and maintain a constant secrecy rate. This result highlights the critical role of the IRS in enabling secure communications. Finally, Fig. 6 shows the optimal UAV position when $P_{\max} = 30$ dBm. Deploying the UAV near the center of the legitimate users and adjacent to the IRS minimizes the average distance to users, thereby reducing path loss and fully exploiting IRS passive beamforming. This placement strengthens legitimate signals while suppressing the eavesdropping link, leading to enhanced secrecy performance.  Conclusions  This study addresses secure communication scenarios with multiple eavesdroppers by proposing an IRS-assisted secure resource allocation algorithm for UAV-enabled RSMA systems. An optimization problem is formulated to maximize the system secrecy rate under multiple constraints, including UAV transmit power, by jointly optimizing precoding vectors, common rate allocation, IRS configurations, and UAV positioning. Due to the non-convex nature of the problem, a hierarchical optimization framework is developed to decompose it into two subproblems. 
These are effectively solved using techniques such as SCA, SDR, Gaussian randomization, and PSO. Simulation results confirm that the proposed algorithm achieves substantial secrecy rate gains over three benchmark schemes, thereby validating its effectiveness.
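Two quantities underlie the results above: the IRS-shaped effective channel and the secrecy rate. The sketch below reduces the system to a single-antenna transmitter, one legitimate user, and one eavesdropper whose channel does not pass through the IRS; these simplifications, the unit noise power, and the random channel values are assumptions for illustration. Co-phasing every reflected path with the direct path maximizes the user's channel gain in this single-user case. The paper's joint RSMA precoding, SDR-based phase design that also suppresses the eavesdropper, and PSO placement are not reproduced.

```python
import cmath
import math
import random


def effective_channel(h_direct, cascaded, thetas):
    """h = h_d + sum_n c_n * exp(j*theta_n), c_n = cascaded transmitter-IRS-user coefficient."""
    return h_direct + sum(c * cmath.exp(1j * t) for c, t in zip(cascaded, thetas))


def aligned_phases(h_direct, cascaded):
    """Co-phase each reflected path with the direct path (single-user, single-antenna case)."""
    return [cmath.phase(h_direct) - cmath.phase(c) for c in cascaded]


def secrecy_rate(p, h_user, h_eve, noise=1.0):
    """[log2(1 + SNR_user) - log2(1 + SNR_eve)]^+ in bit/s/Hz."""
    rate_user = math.log2(1 + p * abs(h_user) ** 2 / noise)
    rate_eve = math.log2(1 + p * abs(h_eve) ** 2 / noise)
    return max(0.0, rate_user - rate_eve)


if __name__ == "__main__":
    rng = random.Random(1)
    hd = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    cascaded = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(16)]
    h_opt = effective_channel(hd, cascaded, aligned_phases(hd, cascaded))
    h_zero = effective_channel(hd, cascaded, [0.0] * 16)
    print(abs(h_opt), abs(h_zero), secrecy_rate(1.0, h_opt, 0.5))
```

With aligned phases, |h| equals |h_d| plus the sum of the |c_n|, the triangle-inequality upper bound, which is why adding reflecting elements keeps raising the legitimate channel gain in Fig. 5 while schemes without an IRS stay flat.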
Breakthrough in Solving NP-Complete Problems Using Electronic Probe Computers
XU Jin, YU Le, YANG Huihui, JI Siyuan, ZHANG Yu, YANG Anqi, LI Quanyou, LI Haisheng, ZHU Enqiang, SHI Xiaolong, WU Pu, SHAO Zehui, LENG Huang, LIU Xiaoqing
Available online  , doi: 10.11999/JEIT250352
Abstract:
This study presents a breakthrough in addressing NP-complete problems using a newly developed Electronic Probe Computer (EPC60). The system employs a hybrid serial–parallel computational model and performs large-scale parallel operations through seven probe operators. In benchmark tests on 3-coloring problems in graphs with 2,000 vertices, EPC60 achieves 100% accuracy, outperforming the mainstream solver Gurobi, which succeeds in only 6% of cases. Computation time is reduced from 15 days to 54 seconds. The system demonstrates high scalability and offers a general-purpose solution for complex optimization problems in areas such as supply chain management, finance, and telecommunications.  Objective   NP-complete problems pose a fundamental challenge in computer science. As problem size increases, the required computational effort grows exponentially, making it infeasible for traditional electronic computers to provide timely solutions. Alternative computational models have been proposed, with biological approaches, particularly DNA computing, demonstrating notable theoretical advances. However, DNA computing systems continue to face major limitations in practical implementation.  Methods  Computational Model: EPC is based on a non-Turing computational model in which data are multidimensional and processed in parallel. Its database comprises four types of graphs, and the probe library includes seven operators, each designed for specific graph operations. By executing parallel probe operations, EPC efficiently addresses NP-complete problems. Structural Features: EPC consists of four subsystems: a conversion system, input system, computation system, and output system. 
The conversion system transforms the target problem into a graph coloring problem; the input system allocates tasks to the computation system; the computation system performs parallel operations via probe computation cards; and the output system maps the solution back to the original problem format. EPC60 features a three-tier hierarchical hardware architecture comprising a control layer, optical routing layer, and probe computation layer. The control layer manages data conversion, format transformation, and task scheduling. The optical routing layer supports high-throughput data transmission, while the probe computation layer conducts large-scale parallel operations using probe computation cards.  Results and Discussions  EPC60 successfully solved 100 instances of the 3-coloring problem for graphs with 2,000 vertices, achieving a 100% success rate. In comparison, the mainstream solver Gurobi succeeded in only 6% of cases. Additionally, EPC60 rapidly solved two 3-coloring problems for graphs with 1,500 and 2,000 vertices, which Gurobi failed to resolve after 15 days of continuous computation on a high-performance workstation. Using an open-source dataset, we identified 1,000 3-colorable graphs with 1,000 vertices and 100 3-colorable graphs with 2,000 vertices. These correspond to theoretical complexities of $O(1.3289^{n})$ for both cases. The test results are summarized in Table 1. Currently, EPC60 can directly solve 3-coloring problems for graphs with up to $n$ vertices, with theoretical complexity of at least $O(1.3289^{n})$. On April 15, 2023, a scientific and technological achievement appraisal meeting organized by the Chinese Institute of Electronics was held at Beijing Technology and Business University. A panel of ten senior experts conducted a comprehensive technical evaluation and Q&A session. The committee reached the following unanimous conclusions: 1. The probe computer represents an original breakthrough in computational models. 2. The system architecture design demonstrates significant innovation. 3. The technical complexity reaches internationally leading levels. 4. It provides a novel approach to solving NP-complete problems.
Experts at the appraisal meeting stated, “This is a major breakthrough in computational science achieved by our country, with not only theoretical value but also broad application prospects.” In cybersecurity, EPC60 has also demonstrated remarkable potential. Supported by the National Key R&D Program of China (2019YFA0706400), Professor Xu Jin’s team developed an automated binary vulnerability mining system based on a function call graph model. Evaluation of the system using the Modbus Slave software showed over 95% vulnerability coverage, far exceeding the 75 vulnerabilities detected by conventional depth-first search algorithms. The system also discovered a previously unknown flaw, the “Unauthorized Access Vulnerability in Changyuan Shenrui PRS-7910 Data Gateway” (CNVD-2020-31406), highlighting EPC60’s efficacy in cybersecurity applications. The high efficiency of EPC60 derives from its unique computational model and hardware architecture. Given that all NP-complete problems can be polynomially reduced to one another, EPC60 provides a general-purpose solution framework. It is therefore expected to be applicable in a wide range of domains, including supply chain management, financial services, telecommunications, energy, and manufacturing.  Conclusions   The successful development of EPC offers a novel approach to solving NP-complete problems. As technological capabilities continue to evolve, EPC is expected to demonstrate strong computational performance across a broader range of application domains. Its distinctive computational model and hardware architecture also provide important insights for the design of next-generation computing systems.
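For readers unfamiliar with the benchmark problem, the sketch below states graph 3-coloring concretely with a conventional backtracking solver of the kind that runs on an ordinary CPU. It illustrates the problem that EPC60 targets, not EPC60's probe operators or its parallel model; the highest-degree-first vertex ordering is a common heuristic chosen here as an assumption. Its worst-case running time grows exponentially with the number of vertices, which is the scaling behavior discussed above.

```python
def three_color(n, edges):
    """Backtracking search for a proper 3-coloring of an n-vertex graph.

    Returns a list of colors in {0, 1, 2}, or None if no proper 3-coloring exists.
    Worst-case running time is exponential in n.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Color high-degree vertices first so that conflicts are detected early
    order = sorted(range(n), key=lambda v: -len(adj[v]))
    color = [-1] * n

    def assign(k):
        if k == n:
            return True
        v = order[k]
        for c in range(3):
            if all(color[w] != c for w in adj[v]):
                color[v] = c
                if assign(k + 1):
                    return True
        color[v] = -1                  # undo before backtracking
        return False

    return color[:] if assign(0) else None


def is_proper(edges, color):
    """Check that no edge joins two vertices of the same color."""
    return all(color[u] != color[v] for u, v in edges)


if __name__ == "__main__":
    cycle5 = [(i, (i + 1) % 5) for i in range(5)]                  # 3-colorable
    k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]       # needs 4 colors
    print(three_color(5, cycle5), three_color(4, k4))
```

Verifying a proposed coloring with `is_proper` takes time linear in the number of edges, whereas finding one may require exploring exponentially many partial assignments; this asymmetry is what makes 3-coloring NP-complete and a natural stress test for alternative computing models.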
Personalized Federated Learning Method Based on Collation Game and Knowledge Distillation
SUN Yanhua, SHI Yahui, LI Meng, YANG Ruizhe, SI Pengbo
Available online  , doi: 10.11999/JEIT221203
Abstract:
To overcome the limitations of Federated Learning (FL) when both the data and the models of the clients are heterogeneous, and to improve accuracy, a personalized Federated learning algorithm with Coalition game and Knowledge distillation (pFedCK) is proposed. First, each client uploads its soft predictions on a public dataset and downloads the k most correlated soft predictions. Then, the method applies the Shapley value from coalition game theory to measure the multi-wise influences among clients and to quantify each client's marginal contribution to the personalized learning performance of the others. Finally, each client identifies its optimal coalition, distills the coalition's knowledge into its local model, and trains on its private dataset. The results show that, compared with the state-of-the-art algorithm, this approach achieves superior personalized accuracy, with an improvement of about 10%.
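The Shapley value mentioned above averages a client's marginal contribution over every order in which a coalition could be assembled. Below is a minimal sketch of the exact computation. It assumes a hypothetical characteristic function `accuracy_gain` that maps a coalition of peer clients to the personalized-accuracy gain of one target client. The gain numbers are invented for illustration and are not results from the paper, and pFedCK's actual valuation and coalition-selection rule may differ. Exact evaluation enumerates all 2^(n−1) coalitions per client, so it is practical only for small n.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player in a cooperative (coalition) game.

    `value` maps a frozenset of players to a real payoff. Cost grows as 2^n.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Probability that exactly this set precedes i in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical per-peer accuracy gains for one target client (illustrative only).
GAIN = {"B": 0.04, "C": 0.02, "D": -0.01}


def accuracy_gain(coalition):
    """Hypothetical characteristic function: additive gains plus a B-C synergy."""
    gain = sum(GAIN[p] for p in coalition)
    if {"B", "C"} <= coalition:
        gain += 0.01
    return gain


if __name__ == "__main__":
    phi = shapley_values(list(GAIN), accuracy_gain)
    coalition = {p for p, v in phi.items() if v > 0}
    print(phi, coalition)
```

Here B and C receive positive values (0.045 and 0.025, with the synergy split evenly) and D a negative one (−0.01). A simple rule that keeps only positively valued peers would therefore form the coalition {B, C}. The values also sum to the gain of the full coalition, which is the efficiency property of the Shapley value.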
The Range-angle Estimation of Target Based on Time-invariant and Spot Beam Optimization
Wei CHU, Yunqing LIU, Wenyug LIU, Xiaolong LI
Available online  , doi: 10.11999/JEIT210265
Abstract:
Using Frequency Diverse Array and Multiple Input Multiple Output (FDA-MIMO) radar for range-angle estimation of targets has attracted increasing attention. An FDA provides transmit beam pattern degrees of freedom in both angle and range simultaneously. However, its performance is degraded by the periodicity and time variation of the beam pattern. Therefore, an improved Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) algorithm is proposed to estimate target parameters, based on a new waveform synthesis model for Time Modulation and Range Compensation FDA-MIMO (TMRC-FDA-MIMO) radar. Finally, the proposed method is compared with an FDA-MIMO radar system with identical frequency increments, an FDA-MIMO radar system with logarithmically increasing frequency offsets, and the MUltiple SIgnal Classification (MUSIC) algorithm in terms of the Cramér-Rao lower bound and the root mean square error of range and angle estimation. The comparison verifies the superior performance of the proposed method.
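For readers unfamiliar with ESPRIT, the sketch below shows the textbook least-squares version for one-dimensional angle estimation with a uniform linear array. It is not the paper's improved range-angle algorithm for TMRC-FDA-MIMO radar. The array size, half-wavelength spacing, snapshot count, noise level and the helper name `simulate_ula` are all assumptions chosen for illustration. The key idea is that two overlapping subarrays see the same signal subspace up to a diagonal phase rotation, and the eigenvalues of that rotation encode the angles.

```python
import numpy as np


def simulate_ula(angles_deg, num_elements=8, num_snapshots=200, spacing=0.5,
                 noise_std=0.05, seed=0):
    """Snapshots X (M x N) of a uniform linear array; spacing in wavelengths."""
    rng = np.random.default_rng(seed)
    theta = np.radians(np.asarray(angles_deg, dtype=float))
    m = np.arange(num_elements)[:, None]
    # Steering matrix: element m sees phase 2*pi*d*m*sin(theta).
    A = np.exp(1j * 2 * np.pi * spacing * m * np.sin(theta)[None, :])
    K = theta.size
    S = (rng.standard_normal((K, num_snapshots))
         + 1j * rng.standard_normal((K, num_snapshots))) / np.sqrt(2)
    noise = noise_std * (rng.standard_normal((num_elements, num_snapshots))
                         + 1j * rng.standard_normal((num_elements, num_snapshots))) / np.sqrt(2)
    return A @ S + noise


def esprit_doa(X, num_sources, spacing=0.5):
    """Least-squares ESPRIT angle estimates (degrees) for a ULA."""
    R = X @ X.conj().T / X.shape[1]               # sample covariance
    _, eigvecs = np.linalg.eigh(R)                # eigenvalues in ascending order
    Es = eigvecs[:, -num_sources:]                # signal subspace
    E1, E2 = Es[:-1, :], Es[1:, :]                # two overlapping subarrays
    Psi = np.linalg.lstsq(E1, E2, rcond=None)[0]  # solve E1 @ Psi ~= E2
    phases = np.angle(np.linalg.eigvals(Psi))     # = 2*pi*d*sin(theta_k)
    sin_theta = np.clip(phases / (2 * np.pi * spacing), -1.0, 1.0)
    return np.sort(np.degrees(np.arcsin(sin_theta)))


if __name__ == "__main__":
    X = simulate_ula([-20.0, 15.0])
    print(esprit_doa(X, num_sources=2))
```

With the assumed 8-element array, 200 snapshots and low noise, the estimates land close to the true angles of −20° and 15°. Unlike MUSIC, no spectral search over a grid is needed, which is one reason ESPRIT variants are popular for joint parameter estimation.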
Special Topic on Smart Healthcare and Engineering Innovation
Clinical Disease Risk Assessment System Based on Multi-source Genetic Information
NING Kaida, YU Zhengyang, ZHAO Xin, LI Ziyan, DAI Ju, XIA Li
Available online  , doi: 10.11999/JEIT251025
Abstract:
  Objective  Complex diseases are driven by polygenic inheritance and gene–environment interactions, resulting in highly heterogeneous pathogenic mechanisms and posing major challenges for both research and public health. Conventional single-trait polygenic risk scores (PRS) aggregate genetic variants associated with individual diseases but are limited by their neglect of cross-trait genetic correlations and nonlinear genetic interactions. Although multi-trait PRS approaches have been proposed to improve prediction accuracy, existing statistical-learning frameworks predominantly rely on linear integration of PRS features, failing to capture nonlinear interactions among single-nucleotide polymorphisms (SNPs) and to fully exploit shared genetic information across diseases. To address these limitations, we propose a nonlinear multi-source disease prediction framework, the SNP–PRS Fusion model, termed the mtSNPPRS_XGB (mtSNP-PRS XGBoost Integration Model).  Methods  The mtSNPPRS_XGB framework integrates raw SNP data of target traits with multi-trait PRS information to enhance genetic risk prediction for complex diseases through nonlinear modeling. SNPs significantly associated with target diseases were extracted from the GWAS Catalog (p < 5 × 10⁻⁸) and encoded as allele dosages (0/1/2), while PRS weights covering 80 traits were obtained from the PGS Catalog and used to compute individual PRS. After standardized preprocessing, SNP and PRS features were jointly fused and modeled using XGBoost to capture complex SNP–SNP and SNP–PRS interactions. This framework introduces two key innovations: (i) collaborative modeling of multi-trait genetic information by jointly leveraging disease-specific SNPs and cross-disease PRS, and (ii) systematic learning of nonlinear genetic interactions to overcome the linear constraints of conventional PRS-based models.  Results and Discussions   The mtSNPPRS_XGB model was evaluated using UK Biobank data across 18 complex diseases. 
It achieved an average AUC of 66.70%, representing improvements of 1.04% over the elastic-net-based model and 4.39% over the conventional UniPRS model. The inclusion of SNP features substantially improved predictive performance in diseases such as coronary heart disease, psoriasis, and celiac disease, while the integration of multi-trait PRS further enhanced specificity, particularly in cardiovascular, autoimmune, and cancer-related conditions. SHAP-based interpretability analyses demonstrated that mtSNPPRS_XGB simultaneously captures global cross-disease genetic liability encoded by PRS and disease-specific localized SNP effects, as illustrated in Alzheimer’s disease, colorectal cancer, gout, and ischemic stroke. These findings support both the biological plausibility and interpretability of the proposed framework.  Conclusions  We present a novel statistical learning–based multi-trait genetic risk prediction model, mtSNPPRS_XGB, which introduces an SNP–PRS fusion architecture and employs XGBoost to capture nonlinear interactions among multi-source genetic features. By integrating raw SNP data with multi-trait PRS, the proposed framework significantly improves risk prediction performance for complex diseases. Validation across 18 diseases in the UK Biobank demonstrates consistent performance gains over traditional PRS-based methods. This study overcomes the linear modeling limitations of conventional PRS approaches and provides a new paradigm for nonlinear integration of SNPs and multi-trait PRS, offering a robust and interpretable tool for personalized genetic risk prediction in precision medicine.
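The feature construction described in the Methods involves allele dosages for target-trait SNPs, PRS computed as weighted sums of dosages, then standardization and concatenation. It can be sketched in NumPy under a few assumptions. The genotype and weight matrices are taken to be dense arrays already aligned to the same effect alleles, the function names are hypothetical, and the values are toy numbers rather than UK Biobank or PGS Catalog data.

```python
import numpy as np


def polygenic_risk_scores(genotypes, weights):
    """PRS[i, t] = sum_j weights[j, t] * genotypes[i, j].

    genotypes: (n_individuals, n_variants) effect-allele dosages in {0, 1, 2}.
    weights:   (n_variants, n_traits) per-allele effect sizes, aligned to the same alleles.
    """
    return genotypes @ weights


def standardize(X):
    """Z-score each column; constant columns are only centred."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd


def fuse_snp_prs(target_snps, prs_genotypes, prs_weights):
    """Concatenate standardized target-trait SNP dosages with standardized multi-trait PRS."""
    prs = polygenic_risk_scores(prs_genotypes, prs_weights)
    return np.hstack([standardize(target_snps), standardize(prs)])


if __name__ == "__main__":
    snps = np.array([[0, 2], [1, 2], [2, 1]])                  # toy target-trait SNPs
    prs_g = np.array([[0, 1, 2], [2, 0, 1], [1, 1, 1]])        # toy PRS variants
    W = np.array([[0.1, 0.0], [0.2, 0.5], [0.3, -0.1]])        # toy weights, 2 traits
    print(fuse_snp_prs(snps, prs_g, W).shape)
```

The fused matrix would then be passed to a gradient-boosted tree classifier, for example `xgboost.XGBClassifier().fit(X, y)`. Because each tree can split on SNP and PRS columns in sequence, such an ensemble can pick up SNP–SNP and SNP–PRS interactions without hand-crafted interaction terms.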
Evaluation of Domestic Large Language Models as Educational Tools for Cancer Patients
ZHANG Junli, XU Weiran, WANG Zhao
Available online  , doi: 10.11999/JEIT251056
Abstract:
  Objective  With the rapid increase in cancer incidence and mortality worldwide, patient education has become a critical strategy for reducing the disease burden and improving patient outcomes. However, traditional education methods, such as paper-based materials or face-to-face consultations, are limited by time, space, and personalization constraints. The emergence of large language models (LLMs) has opened new opportunities for delivering intelligent, scalable, and personalized health education. Although domestic LLMs such as Doubao, Kimi, and DeepSeek have been widely applied in general scenarios, their utility in oncology education remains underexplored. This study aimed to systematically evaluate the performance of three domestic LLMs in cancer patient education across multiple dimensions, providing empirical evidence for their potential clinical application and optimization.  Methods  Frequently asked patient education questions were collected through group discussions with oncology nurses from a tertiary hospital. Nineteen oncology nurses with ≥1 year of clinical experience participated in item selection, and the ten most common questions were chosen, covering domains such as diet, nutrition, treatment, adverse drug reactions, and prognosis. Each question was independently input into Doubao (Pro, ByteDance, May 2024), Kimi (V1.1, Moonshot AI, Nov 2023), and DeepSeek (R1, DeepSeek AI, Jan 2025) under “new chat” conditions to avoid contextual interference. Responses were standardized to remove model identifiers and randomly coded. Quality evaluation followed a blinded design. Thirteen inpatients with cancer assessed responses for readability and effectiveness, while six senior oncologists rated responses for accuracy, comprehensiveness, and professionalism. A self-designed five-point Likert scale was used for each dimension. Statistical analyses were conducted using GraphPad Prism 9.5.1. 
One-way ANOVA with Bonferroni correction was applied for dimensional comparisons, while Welch’s ANOVA and Games-Howell post hoc tests were used for overall score analysis. Results were visualized with tables and radar plots.  Results and Discussions  Overall, the three models achieved mean total scores of 4.05±0.687 (Doubao), 4.17±0.791 (Kimi), and 4.19±0.640 (DeepSeek). Welch’s ANOVA showed significant overall differences (F=5.537, P=0.004). Games-Howell analysis revealed that Doubao performed significantly worse than Kimi and DeepSeek (P=0.005 and 0.042, respectively), while Kimi and DeepSeek did not differ significantly (P=0.975). From the patient perspective, Kimi outperformed its peers, achieving the highest scores in readability (4.615±0.534) and effectiveness (4.476±0.560), with statistically significant differences (P<0.05). Patients rated Kimi’s responses to lifestyle-related queries, such as managing nausea or loss of appetite during chemotherapy, as particularly clear and actionable. From the expert perspective, DeepSeek demonstrated superiority in accuracy (4.117±0.846), comprehensiveness (4.100±0.681), and professionalism (3.917±0.645), with significant advantages over Kimi (P<0.01) and moderate superiority over Doubao (P<0.05). DeepSeek was favored for handling technical and evidence-based questions, such as drug metabolism or integrative therapy evaluation. The divergence between patient and expert assessments highlighted a mismatch: the “most understandable” responses (Kimi) were not always the “most professional” (DeepSeek). This complementarity suggests that future research should explore layered output formats or dual verification mechanisms. Such approaches would balance readability with professional rigor, minimizing the risks of misinformation while improving accessibility. Despite promising findings, limitations exist. This single-center study involved a relatively small sample size, and only patients with lung and breast cancer were included. 
The evaluation simulated static Q&A interactions rather than dynamic multi-turn dialogues, which are more representative of real-world consultations. Additionally, technical enhancements such as retrieval-augmented generation (RAG), fine-tuning with oncology-specific corpora, and multi-agent collaboration were not implemented. Future studies should expand to multi-center designs, diverse cancer populations, and advanced LLM optimization methods.  Conclusions  Domestic LLMs demonstrated significant potential as tools for cancer patient education. Kimi excelled in communication and patient-centered knowledge translation, while DeepSeek showed strength in professional accuracy and comprehensiveness. Doubao, although moderate across all dimensions, lagged behind in overall performance. The results indicate that LLMs can complement traditional health education by bridging the gap between patient comprehension and clinical expertise.
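Welch’s ANOVA, used above for the overall score comparison, replaces the pooled-variance F test with a precision-weighted statistic that tolerates unequal group variances. A minimal standard-library sketch of that statistic follows. The function name is hypothetical, and the study itself used GraphPad Prism, so this is not its implementation. The function returns F and its degrees of freedom. The P value is the upper tail of the F(df1, df2) distribution, which can be obtained with `scipy.stats.f.sf(F, df1, df2)` if SciPy is available.

```python
from statistics import mean, variance


def welch_anova(*groups):
    """Welch's one-way ANOVA for k groups with possibly unequal variances.

    Returns (F, df1, df2). Under H0, F approximately follows F(df1, df2).
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    v = [variance(g) for g in groups]            # sample variances (n - 1 denominator)
    if any(vi == 0 for vi in v):
        raise ValueError("every group needs a nonzero sample variance")
    w = [ni / vi for ni, vi in zip(n, v)]        # precision weights n_i / s_i^2
    total_w = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / total_w   # weighted grand mean
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / total_w) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    F = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    return F, k - 1, (k ** 2 - 1) / (3 * lam)
```

For two groups the statistic reduces to the square of Welch’s t statistic, and df2 reduces to the Welch-Satterthwaite degrees of freedom, which makes a convenient sanity check. Groups with zero sample variance, such as a model that received identical Likert ratings from every rater, make the weights undefined and need separate handling.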
Satellite Navigation
Research on GRI Combination Design of eLORAN System
LIU Shiyao, ZHANG Shougang, HUA Yu
Available online  , doi: 10.11999/JEIT201066
Abstract:
To address the problem of Group Repetition Interval (GRI) selection for the supplementary transmitting stations in the construction of the enhanced LORAN (eLORAN) system, a screening algorithm based on the cross interference rate is proposed, mainly from a mathematical point of view. First, taking the requirement of second information into account, the method performs a first screening by comparing the mutual Cross Rate Interference (CRI) with adjacent Loran-C stations in neighboring countries. Second, a second screening is conducted through permutation and pairwise comparison. Finally, the optimal GRI combination scheme is obtained by considering the data rate and system specification requirements. In addition, in view of the high-precision timing requirements of the new eLORAN system, an optimized selection is made among the multiple optimal combinations. The analysis shows that the average interference rate of the optimal combination scheme obtained by this algorithm is comparable to that between the current navigation chains while also meeting the timing requirements. The results provide a reference and theoretical basis for the construction of high-precision ground-based timing systems.
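A rough picture of cross-rate interference between two chains can be obtained by counting how often their pulse groups collide over one full cross-rate period. The sketch below uses a simplified, hypothetical metric and is not the paper's screening criterion. It treats each pulse group as a fixed-length window (assumed here to be 10,000 µs) and ignores phase coding, signal strength and blanking. It takes the repeat period as the least common multiple of the two group repetition intervals. GRIs follow the usual Loran-C designator convention of tens of microseconds, so GRI 7430 means 74,300 µs.

```python
from math import gcd


def cross_rate_overlap(gri_a, gri_b, group_len_us=10_000, offset_us=0):
    """Fraction of chain-A pulse groups that overlap a chain-B pulse group.

    GRIs use the Loran-C designator convention (tens of microseconds).
    Each pulse group is modelled as a window of `group_len_us` microseconds
    (an assumption; it must be shorter than chain B's repetition interval).
    Returns (overlap_fraction, cross_rate_period_us).
    """
    ta, tb = gri_a * 10, gri_b * 10               # repetition intervals in us
    period = ta * tb // gcd(ta, tb)               # collision pattern repeats after the lcm
    n_groups = period // ta                       # chain-A groups per cross-rate period
    hits = 0
    for k in range(n_groups):
        r = (k * ta - offset_us) % tb             # time since the latest B group started
        if r < group_len_us or r > tb - group_len_us:
            hits += 1                             # A window intersects a B window
    return hits / n_groups, period


if __name__ == "__main__":
    print(cross_rate_overlap(7430, 8390))
```

For GRIs 7430 and 8390 the intervals are 74,300 µs and 83,900 µs, whose least common multiple is 62,337,700 µs, so the collision pattern repeats about every 62.3 s. With zero offset, 199 of the 839 chain-A groups in that period overlap a chain-B group. A screening procedure could evaluate such a score for every candidate GRI against each neighboring chain and discard candidates above a threshold.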