Journal of Electronics & Information Technology (电子与信息学报)

Monthly Journal Founded in 1979

Source journal of EI Compendex and of the ESCI Database

Supervised by: CAS

Hosted by: AIRCAS (IECAS) and the Department of Information Science of NNSFC

Editor-in-Chief: FU Kun

ISSN 1009-5896 CN 11-4494/TN

Online First

Articles in press have been peer-reviewed and accepted. They are not yet assigned to volumes or issues but are citable by Digital Object Identifier (DOI).


A Class of Double-twisted Generalized Reed-Solomon Codes and Their Extended Codes

CHENG Hongli, ZHU Shixin

doi: 10.11999/JEIT251045


Abstract:
Objective Twisted Generalized Reed-Solomon (TGRS) codes have attracted considerable attention in coding theory due to their flexible structural properties. However, studies on their extended codes remain limited. Only a small number of works examine extended TGRS codes, leaving gaps in the understanding of their error-correcting capability, duality properties, and applications. In addition, previously proposed parity-check matrix forms for TGRS codes lack clarity and do not cover all parameter ranges. In particular, the case $h = 0$ is not addressed, which limits applicability in scenarios requiring diverse parameter settings. Constructing non-Generalized Reed-Solomon (non-GRS) codes is of interest because such codes resist Sidelnikov-Shestakov and Wieschebrink attacks, whereas GRS codes are vulnerable. Maximum Distance Separable (MDS) codes, self-orthogonal codes, and almost self-dual codes are valued for their error-correcting efficiency and structural properties. MDS codes achieve the Singleton bound and are essential for distributed storage systems that require data reliability under node failures. Self-orthogonal and almost self-dual codes, due to their duality structures, are applied in quantum coding, secret sharing schemes, and secure multi-party computation. Accordingly, this paper aims to: (1) characterize the MDS and Almost MDS (AMDS) properties of double-twisted GRS codes $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v})$ and their extended codes $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v},\boldsymbol{\infty})$; (2) derive explicit and unified parity-check matrices for all valid parameter ranges, including $h = 0$; (3) establish non-GRS properties under specific parameter conditions; (4) provide necessary and sufficient conditions for self-orthogonality of the extended codes and almost self-duality of the original codes; and (5) construct a class of almost self-dual double-twisted GRS codes with flexible parameters for secure and reliable communication systems. Methods The study is based on algebraic coding theory and finite field methods. Explicit parity-check matrices are derived using properties of polynomial rings over $F_q$, Vandermonde matrix structures, and polynomial interpolation. The Schur product method is applied to determine non-GRS properties by comparing the dimensions of the Schur squares of the codes and their duals with those of GRS codes. Linear algebra and combinatorial techniques are used to characterize MDS and AMDS properties. Conditions are obtained by analyzing the nonsingularity of generator-matrix submatrices and solving systems involving symmetric sums of finite field elements. These conditions are expressed using the sets $S_k(\boldsymbol{\alpha},\boldsymbol{\eta})$, $L_k(\boldsymbol{\alpha},\boldsymbol{\eta})$, and $D_k(\boldsymbol{\alpha},\boldsymbol{\eta})$. Duality theory is used to study orthogonality. A code $C$ is self-orthogonal if $C \subseteq C^{\bot}$ and its generator matrix satisfies $\boldsymbol{G}\boldsymbol{G}^{\rm T} = \boldsymbol{O}$. For almost self-dual codes of odd length $n$ and dimension $(n-1)/2$, this condition is combined with the structure of the dual code and symmetric sum relations of $\alpha_i$ to obtain necessary and sufficient conditions. Results and Discussions For MDS and AMDS properties, the following results are obtained. The extended double-twisted GRS code $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v},\boldsymbol{\infty})$ is MDS if and only if $1 \notin S_k(\boldsymbol{\alpha},\boldsymbol{\eta})$ and $1 \notin L_k(\boldsymbol{\alpha},\boldsymbol{\eta})$. The double-twisted GRS code $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v})$ is AMDS if and only if $1 \in S_k(\boldsymbol{\alpha},\boldsymbol{\eta})$ and $(0,1) \notin D_k(\boldsymbol{\alpha},\boldsymbol{\eta})$. The code $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v})$ $(0,1) \in D_k(\boldsymbol{\alpha},\boldsymbol{\eta})$. Unified parity-check matrices of $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v})$ and $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v},\boldsymbol{\infty})$ are derived for all $0 \leq h \leq k-1$, removing previous restrictions that exclude $h = 0$. For non-GRS properties, when $k \geq 4$ and $n-k \geq 4$, both $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v})$ and its extended code $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v},\boldsymbol{\infty})$ are non-GRS for both $2k \geq n$ and $2k < n$. This conclusion follows from the fact that the dimensions of their Schur squares exceed those of the corresponding GRS codes, which ensures resistance to Sidelnikov-Shestakov and Wieschebrink attacks. Regarding orthogonality, the extended code $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v},\boldsymbol{\infty})$ with $h = k-1$ is self-orthogonal under specific algebraic conditions. The code $C_{k,\boldsymbol{h},\boldsymbol{\eta}}(\boldsymbol{\alpha},\boldsymbol{v})$ with $h = k-1$ and $n = 2k+1$ is almost self-dual if and only if there exists $\lambda \in F_q^{*}$ such that $\lambda u_j = v_j^{2}\ (j = 1,\cdots,2k+1)$ together with a symmetric sum condition on $\alpha_i$ involving $\eta_1$ and $\eta_2$. For an odd prime power $q$, an almost self-dual code with parameters $[q-t-1, (q-t-2)/2, \geq (q-t-2)/2]$ is constructed using the roots of $m(x) = (x^{q}-x)/f(x)$ where $f(x) = x^{t+1}-x$. An example over $F_{11}$ yields a $[5, 2, \geq 2]$ code. Conclusions The study advances the theory of double-twisted GRS codes and their extensions through five contributions: (1) a complete characterization of MDS and AMDS properties using the sets $S_k$, $L_k$, and $D_k$; (2) unified parity-check matrices for all $0 \leq h \leq k-1$; (3) non-GRS properties for $k \geq 4$, ensuring resistance to known structural attacks; (4) necessary and sufficient conditions for self-orthogonal extended codes and almost self-dual original codes; and (5) a flexible construction of almost self-dual double-twisted GRS codes. These results extend the theoretical understanding of TGRS-type codes and support the design of secure and reliable coding systems.
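
As a small numerical illustration of the evaluation points mentioned in the abstract, the sketch below lists the roots of $m(x) = (x^{q}-x)/f(x)$ with $f(x) = x^{t+1}-x$. It assumes that $q$ is prime (so field arithmetic is plain modular arithmetic) and that $t$ divides $q-1$ (so $f$ splits over $F_q$); the values $q = 11$, $t = 5$ are inferred here from the $[5, 2, \geq 2]$ example and are not stated in the abstract. The function name `roots_of_m` is hypothetical, and the sketch does not construct the code itself.

```python
def roots_of_m(q, t):
    """Roots in F_q of m(x) = (x^q - x) / (x^(t+1) - x), for a prime q.

    Every element of F_q is a simple root of x^q - x. When t divides q - 1,
    f(x) = x^(t+1) - x splits into distinct linear factors over F_q, so the
    roots of m are exactly the field elements that are not roots of f.
    """
    if (q - 1) % t != 0:
        raise ValueError("this sketch assumes t divides q - 1")
    return [a for a in range(q) if (pow(a, t + 1, q) - a) % q != 0]


# Assumed example values: q = 11, t = 5 (inferred from the [5, 2] parameters).
points = roots_of_m(11, 5)
print(points, len(points))
```

With these assumed values the list has $q-t-1 = 5$ elements, which matches the code length in the example.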

Construction of Maximum Distance Separable Codes and Near Maximum Distance Separable Codes Based on Cyclic Subgroup of $ \mathbb{F}_{{q}^{2}}^{*} $

DU Xiaoni, XUE Jing, QIAO Xingbin, ZHAO Ziwei

doi: 10.11999/JEIT251204


Abstract:
Objective The demand for higher performance and efficiency in error-correcting codes has increased with the rapid development of modern communication technologies. These codes detect and correct transmission errors. Because of their algebraic structure, straightforward encoding and decoding, and ease of implementation, linear codes are widely used in communication systems. Their parameters follow classical bounds such as the Singleton bound: for a linear code with length $n$ and dimension $k$, the minimum distance $d$ satisfies $d \leq n-k+1$. When $d = n-k+1$, the code is a Maximum Distance Separable (MDS) code. MDS codes are applied in distributed storage systems and random error channels. If $d = n-k$, the code is Almost MDS (AMDS); when both a code and its dual are AMDS, the code is Near MDS (NMDS). NMDS codes have geometric properties that are useful in cryptography and combinatorics. Extensive research has focused on constructing structurally simple, high-performance MDS and NMDS codes. This paper constructs several families of MDS and NMDS codes of length $q+3$ over the finite field $\mathbb{F}_{q^{2}}$ of even characteristic using the cyclic subgroup $U_{q+1}$. Several families of optimal Locally Repairable Codes (LRCs) are also obtained. LRCs support efficient failure recovery by accessing a small set of local nodes, which reduces repair overhead and improves system availability in distributed and cloud-storage settings. Methods In 2021, Wang et al. constructed NMDS codes of dimension 3 using elliptic curves over $\mathbb{F}_{q}$. In 2023, Heng et al. obtained several classes of dimension-4 NMDS codes by appending appropriate column vectors to a base generator matrix. In 2024, Ding et al. presented four classes of dimension-4 NMDS codes, determined the locality of their dual codes, and constructed four classes of distance-optimal and dimension-optimal LRCs. Building on these works, this paper uses the unit circle $U_{q+1}$ in $\mathbb{F}_{q^{2}}$ and elliptic curves to construct generator matrices. By augmenting these matrices with two additional column vectors, several classes of MDS and NMDS codes of length $q+3$ are obtained. The locality of the constructed NMDS codes is also determined, yielding several classes of optimal LRCs. Results and Discussions In 2023, Heng et al. constructed generator matrices with second-row entries in $\mathbb{F}_{q}^{*}$ and with the remaining entries given by nonconsecutive powers of the second-row elements. In 2025, Yin et al. extended this approach by constructing generator matrices using elements of $U_{q+1}$ and obtained infinite families of MDS and NMDS codes. Following this direction, the present study expands these matrices by appending two column vectors whose elements lie in $\mathbb{F}_{q^{2}}$. The resulting matrices generate several classes of MDS and NMDS codes of length $q+3$. Several classes of NMDS codes with identical parameters but different weight distributions are also obtained. Computing the minimum locality of the constructed NMDS codes shows that some are optimal LRCs satisfying the Singleton-like, Cadambe–Mazumdar, Plotkin-like, and Griesmer-like bounds. All constructed MDS codes are Griesmer codes, and the NMDS codes are near Griesmer. These results show that the proposed constructions are more general and unified than earlier approaches. Conclusions This paper constructs several families of MDS and NMDS codes of length $q+3$ over $\mathbb{F}_{q^{2}}$ using elements of the unit circle $U_{q+1}$ and oval polynomials, and by appending two additional column vectors with entries in $\mathbb{F}_{q}$. The minimum locality of the constructed NMDS codes is analyzed, and some of these codes are shown to be optimal LRCs. The framework generalizes earlier constructions, and the resulting codes are optimal or near-optimal with respect to the Griesmer bound.
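
To make the unit circle $U_{q+1}$ concrete, the sketch below enumerates it for the smallest interesting even-characteristic case, $q = 4$, where $\mathbb{F}_{q^{2}} = \mathbb{F}_{16}$. The field representation (4-bit integers reduced modulo $x^{4}+x+1$) and all function names are assumptions made for illustration; the abstract does not specify a representation, and the sketch does not reproduce the paper's generator matrices.

```python
Q = 4                # assumed example: q = 4, so F_{q^2} = F_16
MODULUS = 0b10011    # x^4 + x + 1, irreducible over F_2 (assumed representation)


def gf16_mul(a, b):
    """Multiply two elements of F_16 stored as 4-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0b10000:
            a ^= MODULUS         # reduce modulo x^4 + x + 1
    return result


def gf16_pow(a, exponent):
    """Raise a to a non-negative integer power by repeated multiplication."""
    result = 1
    for _ in range(exponent):
        result = gf16_mul(result, a)
    return result


# U_{q+1} = { x in F_{q^2}^* : x^(q+1) = 1 }, a cyclic subgroup of order q + 1.
unit_circle = [x for x in range(1, 16) if gf16_pow(x, Q + 1) == 1]
print(unit_circle)
```

Since $\mathbb{F}_{16}^{*}$ is cyclic of order 15, the list has exactly $q+1 = 5$ elements and is closed under multiplication.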

Research on Load Modulation Enhancement of Quasi-Ideal Doherty Power Amplifier with Equivalent Transconductance Compensation

HUA Jun, XU Gaoming, CHEN Jinghao, LU Siyang, YOU Leiyuan, LÜ Yan, LI Gang, SHI Weimin, LIU Taijun

doi: 10.11999/JEIT250789


Abstract:
Objective Modern wireless communication systems require efficient dynamic-range performance in RF power amplifiers. The Doherty Power Amplifier (DPA), which uses dynamic load modulation between the main and auxiliary paths, achieves high efficiency at power backoff. It is widely applied in multi-carrier 4G and 5G macro base stations. Research on DPAs generally focuses on improving backoff efficiency, backoff range, and bandwidth. However, the architecture has a structural limitation because the auxiliary amplifier, biased in Class C, exhibits weak current output compared with the main amplifier biased in Class AB. The low conduction level and short turn-on period of the auxiliary path create nonlinear imbalance and reduce overall performance. Methods The study addresses insufficient load modulation caused by the weak current output capability of the auxiliary amplifier. An equivalent transconductance compensation theory is proposed. It compensates the current of the auxiliary amplifier under Class C bias by injecting a compensatory current into the branch. A load-modulation-enhanced quasi-ideal high-performance DPA is developed to resolve the inherent current deficiency in the auxiliary path of traditional configurations. Results and Discussions A load-modulation-enhanced DPA was designed and fabricated using the GaN HEMT device CG2H40010F for the 1.3 to 1.8 GHz band. Measurements show that the saturated output power ranges from 43.7 to 44.5 dBm and that the Drain Efficiency (DE) exceeds 69.1%. At a 6 dB backoff, the DE remains between 62.9% and 69.4% and the gain ranges from 9.7 to 10.5 dB. At a 9 dB backoff, the DE ranges from 49.5% to 57% and the gain ranges from 10.3 to 11.5 dB. The equivalent transconductance compensation theory resolves the load modulation bottleneck of traditional DPA structures through the current-injection mechanism. It provides meaningful guidance for broadband RF power-amplifier design with high backoff efficiency. Conclusions The study proposes an equivalent transconductance compensation method by adding a third compensation branch to the traditional DPA structure. This mechanism corrects the weak auxiliary-amplifier current caused by Class C bias and its short turn-on period, thereby achieving a quasi-ideal load-modulation-enhanced DPA. A device operating from 1.3 to 1.8 GHz was designed to validate the method. The measured saturated DE exceeds 69.1%. The DE ranges from 62.9% to 69.4% at a 6 dB backoff and from 49.5% to 57% at a 9 dB backoff. The linearized Adjacent Channel Leakage Ratio (ACLR) is lower than –49 dBc. These results verify the feasibility of the method and show strong application potential.
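
For readers who prefer absolute power, the reported dBm figures can be converted to watts with the standard relation $P_{\mathrm{W}} = 10^{(P_{\mathrm{dBm}}-30)/10}$. The helper names below are hypothetical and the sketch is only unit arithmetic on the numbers quoted in the abstract; it does not model the amplifier.

```python
def dbm_to_watts(p_dbm):
    """Convert power in dBm to watts: P[W] = 10 ** ((P[dBm] - 30) / 10)."""
    return 10 ** ((p_dbm - 30.0) / 10.0)


def backoff_power_dbm(p_sat_dbm, backoff_db):
    """Output power in dBm at a given backoff (in dB) from saturation."""
    return p_sat_dbm - backoff_db


# Saturated output power range quoted in the abstract: 43.7 to 44.5 dBm.
for p_sat in (43.7, 44.5):
    p_6db = backoff_power_dbm(p_sat, 6.0)
    print(f"{p_sat} dBm = {dbm_to_watts(p_sat):.1f} W saturated, "
          f"{dbm_to_watts(p_6db):.1f} W at 6 dB backoff")
```

The saturated range corresponds to roughly 23.4 W to 28.2 W.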

Adversarial Attacks on 3D Target Recognition Driven by Gradient Adaptive Adjustment

LIU Weiquan, SHEN Xiaoying, LIU Dunqiang, SUN Yanwen, CAI Guorong, ZANG Yu, SHEN Siqi, WANG Cheng

doi: 10.11999/JEIT251264


Abstract:
Objective Robust environmental perception is essential for intelligent driving systems. Light Detection and Ranging (LiDAR) provides high-resolution 3D point cloud data and serves as a core information source for object detection and recognition. However, deep learning models for 3D point cloud recognition show notable vulnerability to adversarial attacks. Small, imperceptible perturbations can cause severe classification errors and threaten system safety. Existing attack methods have improved the Attack Success Rate (ASR), but the perturbations they generate often lack concealment, create outliers, and show poor imperceptibility because they do not adequately preserve the geometric structure of point clouds. This reduces their suitability for realistic security evaluation of optoelectronic perception systems. Developing an attack method that maintains a high success rate while preserving geometric consistency and imperceptibility is therefore critical. This study addresses this need by proposing a framework that incorporates point cloud geometry into perturbation generation. Methods A Gradient Adaptive Adjustment (GAA) adversarial attack method for 3D point cloud recognition is proposed. The framework (Fig. 2) includes three coordinated modules. The 3D Point Cloud Salient Region Extraction module evaluates decision-level vulnerability using Shapley value analysis to identify and rank point subsets with the strongest influence on classifier output. Perturbations are then concentrated in these sensitive regions. A Curvature-Weighted Gradient Mechanism integrates local geometric priors. For each point in the salient region, a local covariance matrix is computed from its k-nearest neighbors. Principal component analysis generates eigenvalues and eigenvectors, which are used to compute a curvature measure. A Gaussian kernel function produces curvature-dependent weights that are applied to backpropagated gradients. 
This suppresses perturbations in high-curvature areas and encourages them in low-curvature regions to preserve local shape morphology. A Principal Curvature Direction Constrained Optimization module further refines the perturbation direction. The weighted gradient is projected onto the principal curvature directions, and the projection components are fused using coefficients derived from the corresponding eigenvalues. This aligns the perturbation with natural geometric trends and avoids unnatural deformation. An Adaptive Optimization Algorithm then minimizes a multi-objective loss balancing attack success, geometric similarity (via Chamfer Distance and Hausdorff Distance), and perturbation sparsity. The adversarial point cloud is iteratively updated based on the saliency map, curvature-weighted gradients, and principal direction constraints. Results and Discussions Experiments on ModelNet40, ShapeNetPart, and KITTI were conducted using PointNet, DGCNN, and PointConv. The GAA method showed strong performance. On ModelNet40 with PointNet, it achieved a 97.69% ASR with an average of 28 perturbed points, outperforming ten baselines such as AL-Adv (92.92% ASR, 40 points) and Kim et al. (89.38% ASR, 36 points) (Table 1). It also produced lower geometric distortion, as indicated by smaller Chamfer Distance and Hausdorff Distance values. Visual results (Fig. 4) show that GAA produces fewer outliers and more natural adversarial point clouds compared with methods such as AL-Adv. The method generalized well across architectures, reaching 99.78% ASR on DGCNN and 96.91% on PointConv (Table 2), with similar performance on ShapeNetPart (Table 3). Ablation experiments on the number of salient regions (K) showed consistent improvements in ASR and reduced geometric distortion as K increased from 1 to 6 (Table 4, Fig. 5), confirming the advantage of targeting multiple critical regions. Tests on the KITTI dataset demonstrated strong performance in real-world, noisy environments. 
The method maintained high ASRs, such as 99.33% on PointNet, with limited perturbations (Table 5). An ablation study on K indicated that K=4 offers an effective balance between success rate and perturbation cost for PointNet (Table 6). Conclusions This study presents a GAA method for adversarial attacks on 3D point cloud recognition. By combining a Shapley value-based saliency analyzer, a curvature-weighted gradient mechanism, and a principal curvature direction constraint, the method generates adversarial examples that achieve high attack success while preserving geometric consistency. Experiments show that GAA minimizes perceptual distortion and perturbs fewer points across datasets and models. The method provides a practical tool for vulnerability analysis and supports the development of more robust and secure optoelectronic perception systems for intelligent driving. Future work will examine robustness under adverse conditions and assess physical-world implications.
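
The two geometric similarity measures named in the abstract can be sketched in a few lines. The version below uses one common convention (symmetric Chamfer distance as the sum of mean squared nearest-neighbour distances, and symmetric Hausdorff distance as the largest nearest-neighbour distance); the paper may normalize differently, and the function names are hypothetical. It is a brute-force illustration, not the GAA loss.

```python
import math


def _nearest(point, cloud):
    """Euclidean distance from point to its nearest neighbour in cloud."""
    return min(math.dist(point, other) for other in cloud)


def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both ways."""
    a_to_b = sum(_nearest(p, cloud_b) ** 2 for p in cloud_a) / len(cloud_a)
    b_to_a = sum(_nearest(q, cloud_a) ** 2 for q in cloud_b) / len(cloud_b)
    return a_to_b + b_to_a


def hausdorff_distance(cloud_a, cloud_b):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance."""
    a_to_b = max(_nearest(p, cloud_b) for p in cloud_a)
    b_to_a = max(_nearest(q, cloud_a) for q in cloud_b)
    return max(a_to_b, b_to_a)


# Toy example: one point of a two-point cloud is shifted by 0.5 along z.
clean = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
perturbed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
print(chamfer_distance(clean, perturbed), hausdorff_distance(clean, perturbed))
```

Chamfer distance averages over all points, so a few perturbed points move it only slightly, whereas Hausdorff distance reacts to the single worst outlier.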

Wavelet Transform and Attentional Dual-Path EEG Model for Virtual Reality Motion Sickness Detection

CHEN Yuechi, HUA Chengcheng, DAI Zhian, FU Jingqi, ZHU Min, WANG Qiuyu, YAN Ying, LIU Jia

doi: 10.11999/JEIT251233


Abstract:
Objective Virtual Reality Motion Sickness (VRMS) presents a barrier to the wider adoption of immersive Virtual Reality (VR). It is primarily caused by sensory conflict between the vestibular and visual systems. Existing assessments rely on subjective reports that disrupt immersion and do not provide real-time measurements. An objective detection method is therefore needed. This study proposes a dual-path fusion model, the Wavelet Transform ATtentional Network (WTATNet), which integrates wavelet transform and attention mechanisms. WTATNet is designed to classify resting-state ElectroEncephaloGram (EEG) signals collected before and after VR motion stimulus exposure to support VRMS detection and research on the mechanisms and mitigation strategies. Methods WTATNet contains two parallel pathways for EEG feature extraction. The first applies a Two-Dimensional Discrete Wavelet Transform (2D-DWT) to both the time and electrode dimensions of the EEG, reshaping the signal into a two-dimensional matrix based on the spatial layout of the scalp electrodes in horizontal or vertical form. This decomposition captures multi-scale spatiotemporal features, which are then processed using Convolutional Neural Network (CNN) layers. The second pathway applies a one-dimensional CNN for initial filtering followed by a dual-attention structure consisting of a channel attention module and an electrode attention module. These modules recalibrate the importance of features across channels and electrodes to emphasize task-relevant information. Features from both pathways are fused and passed through fully connected layers to classify EEGs into pre-exposure (non-VRMS) and post-exposure (VRMS) states based on subjective questionnaire validation. EEG data were collected from 22 subjects exposed to VRMS using the game “Ultrawings2.” Ten-fold cross-validation was used for training and evaluation with accuracy, precision, recall, and F1-score as metrics. 
Results and Discussions WTATNet achieved high VRMS-related EEG classification performance, with an average accuracy of 98.39%, F1-score of 98.39%, precision of 98.38%, and recall of 98.40%. It outperformed classical and state-of-the-art EEG models, including ShallowConvNet, EEGNet, Conformer, and FBCNet (Table 2). Ablation experiments (Tables 3 and 4) showed that removing the wavelet transform path, the electrode attention module, or the channel attention module reduced accuracy by 1.78%, 1.36%, and 1.01%, respectively. The 2D-DWT performed better than the one-dimensional DWT, supporting the value of joint spatiotemporal analysis. Experiments with randomized electrode ordering (Table 5) produced lower accuracy than spatially coherent layouts, indicating that 2D-DWT leverages inherent spatial correlations among electrodes. Feature visualizations using t-SNE (Figures 5 and 6) showed that WTATNet produced more discriminative features than baseline and ablated variants. Conclusions The dual-path WTATNet model integrates wavelet transform and attention mechanisms to achieve accurate VRMS detection using resting-state EEG. Its design combines interpretable, multi-scale spatiotemporal features from 2D-DWT with adaptive channel-level and electrode-level weighting. The experimental results confirm state-of-the-art performance and show that WTATNet offers an objective, robust, and non-intrusive VRMS detection method. It provides a technical foundation for studies on VRMS neural mechanisms and countermeasure development. WTATNet also shows potential for generalization to other EEG decoding tasks in neuroscience and clinical research.
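
The 2D-DWT step in the first pathway can be illustrated with a one-level transform on a small electrode-by-time matrix. The sketch below assumes the orthonormal Haar wavelet, even matrix dimensions, and rows indexing electrodes; the abstract does not name the wavelet or the exact layout, so these choices and the function names are illustrative assumptions, not WTATNet's configuration.

```python
import math


def haar_1d(values):
    """One-level orthonormal Haar transform of an even-length sequence."""
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


def _along_columns(block):
    """Apply haar_1d down each column and return (low, high) blocks."""
    low_cols, high_cols = [], []
    for column in zip(*block):
        approx, detail = haar_1d(list(column))
        low_cols.append(approx)
        high_cols.append(detail)
    # Transpose back so rows again index the (halved) electrode axis.
    return [list(r) for r in zip(*low_cols)], [list(r) for r in zip(*high_cols)]


def haar_2d(matrix):
    """One-level 2D Haar DWT; rows = electrodes, columns = time samples (assumed).

    Subband keys: first letter is the filter along time, second letter is the
    filter across electrodes (L = low-pass, H = high-pass).
    """
    time_low, time_high = [], []
    for row in matrix:
        approx, detail = haar_1d(row)
        time_low.append(approx)
        time_high.append(detail)
    ll, lh = _along_columns(time_low)
    hl, hh = _along_columns(time_high)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


bands = haar_2d([[1.0, 2.0], [3.0, 4.0]])
print(bands)
```

Because the transform is orthonormal, the four subbands together carry the same energy as the input, which is one reason the subbands can be fed to CNN layers without rescaling.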

Performance Analysis and Rapid Prediction of Long-range Underwater Acoustic Communications in Uncertain Deep-sea Environments

CHEN Xiangmei, TAI Yupeng, WANG Haibin, HU Chenghao, WANG Jun, WANG Diya

doi: 10.11999/JEIT251244


Abstract:
Objective In complex and dynamically changing deep-sea environments, the performance of underwater acoustic communications shows substantial variability. Feedback-based channel estimation and parameter adaptation are impractical in long-range scenarios because platform constraints prevent reliable feedback channels and the slow propagation of sound introduces significant delay. In typical long-range systems, environmental dynamics are often ignored and communication parameters are selected heuristically, which frequently leads to mismatches with actual channel conditions and causes communication failures or reduced efficiency. Predictive methods able to assess performance in advance and support feed-forward parameter adjustment are therefore required. This study proposes a deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications under uncertain environmental conditions to enable efficient and reliable parameter–channel matching without feedback. Methods A feed-forward method for underwater acoustic communication performance analysis and rapid prediction is developed using deep-learning-based sound-field uncertainty estimation. A neural network is first used to estimate probability distributions of Transmission Loss (TL PDFs) at the receiver under dynamic environments. TL PDFs are then mapped to probability distributions of the Signal-to-Noise Ratio (SNR PDFs), enabling communication performance evaluation without real-time feedback. Statistical channel capacity and outage capacity are analyzed to characterize the theoretical upper limits of achievable rates in dynamic conditions. Finally, by integrating the SNR distribution with the bit-error-rate characteristics of a representative deep-sea single-carrier communication system under the corresponding channel, a rate–reliability prediction model is constructed. 
This model estimates the probability of reliable communication at different data rates and serves as a practical tool for forecasting link performance in highly dynamic and feedback-limited underwater acoustic environments. Results and Discussions The method is validated using simulation data and sea trial data. The TL PDFs predicted by the deep learning model show strong consistency with the traditional Monte Carlo (MC) method across multiple receiver locations (Fig. 6). Under identical computational settings, deep-learning-based TL PDF prediction reduces computation time by 2–3 orders of magnitude compared with the MC method. The chained mapping from TL PDFs to SNR PDFs and then to channel capacity metrics accurately represents the probabilistic features of communication performance under uncertain conditions (Fig. 7 and Fig. 8). The rate–reliability curves derived from the deep-learning-based TL PDFs are highly consistent with MC-based results. In the high sound-intensity region, prediction errors for reliable communication probabilities across data rates range from 0.1% to 3%, and in the low sound-intensity region errors are approximately 0.3% to 5% (Fig. 12). Sea trial results further indicate that predicted rate–reliability performance agrees well with measured data. In the convergence zone, deviations between predicted and measured reliability probabilities at each rate range from 0.9% to 4%, and in the shadow zone from 1% to 9% (Fig. 18). Under a 90% reliability requirement, the maximum achievable rates predicted by the method match the measurements in both the convergence and shadow zones, demonstrating accuracy and practical applicability in complex channel environments. Conclusions A deep-learning-based framework for performance analysis and rapid prediction of long-range underwater acoustic communications in uncertain deep-sea environments is developed and validated. 
The framework builds a chained mapping from environmental parameters to TL PDFs, SNR PDFs, and communication performance metrics, enabling quantitative capacity assessment under dynamic ocean conditions. Predictive “rate–reliability” profiles are obtained by integrating probabilistic propagation characteristics with the performance of a representative deep-sea single-carrier system under the corresponding channel, providing guidance for parameter selection without feedback. Sea trial results confirm strong agreement between predicted and measured performance. The proposed approach offers a technical pathway for feed-forward performance analysis and dynamic adaptation in long-range deep-sea communication systems, and can be extended to other communication scenarios in dynamic ocean environments.
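
The chained mapping from TL to SNR to a reliability estimate can be sketched with sampled values. The example below assumes a simplified sonar equation, SNR = SL - TL - NL in dB, with a fixed source level SL and in-band noise level NL, and treats a link as reliable when the SNR meets a required threshold. These simplifications, the numbers, and the function names are illustrative assumptions; the paper's model uses the measured bit-error-rate behaviour of a specific system instead of a single threshold.

```python
import math


def snr_samples_db(tl_samples_db, source_level_db, noise_level_db):
    """Map transmission-loss samples to SNR samples: SNR = SL - TL - NL (dB)."""
    return [source_level_db - tl - noise_level_db for tl in tl_samples_db]


def reliability_probability(snr_db, required_snr_db):
    """Empirical probability that the SNR meets the required threshold."""
    return sum(1 for s in snr_db if s >= required_snr_db) / len(snr_db)


def outage_capacity(snr_db, bandwidth_hz, outage_prob):
    """Empirical outage capacity in bit/s from Shannon capacity samples.

    Returns a rate R such that the fraction of samples with capacity
    strictly below R does not exceed outage_prob.
    """
    capacities = sorted(bandwidth_hz * math.log2(1.0 + 10 ** (s / 10.0)) for s in snr_db)
    index = min(int(outage_prob * len(capacities)), len(capacities) - 1)
    return capacities[index]


# Hypothetical TL samples (dB) standing in for draws from a predicted TL PDF.
tl = [80.0, 90.0, 100.0, 110.0]
snr = snr_samples_db(tl, source_level_db=180.0, noise_level_db=70.0)
print(snr, reliability_probability(snr, required_snr_db=10.0))
```

Repeating the reliability estimate for the SNR threshold required at each candidate data rate gives one point per rate on a rate versus reliability curve.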

Physical Layer Key Generation Method for Integrated Sensing and Communication Systems

LIU Kexin, HUANG Kaizhi, PEI Xinglong, JIN Liang, CHEN Yajun

doi: 10.11999/JEIT251034


Abstract:
Objective Integrated Sensing And Communication (ISAC) has become a central technology in Sixth-Generation (6G) wireless networks, enabling simultaneous data transmission and environmental sensing. However, the characteristics of ISAC systems, including highly directional sensing signals and the risk of sensitive information leakage to malicious sensing targets, create specific security challenges. Physical layer security provides lightweight methods to enhance confidentiality. In secure transmission, approaches such as artificial noise injection and beamforming can partially improve secrecy, although they may reduce sensing accuracy or communication efficiency. Their effect also depends on the quality advantage of legitimate channels over eavesdropping channels. For Physical Layer Key Generation (PLKG), existing work has only demonstrated basic feasibility. Most current schemes adopt a radar-centric design, which limits compatibility with communication protocols and restricts key generation rates. This paper proposes a PLKG method tailored for ISAC systems. It aims to maximize the Sum Key Generation Rate (SKGR) under sensing accuracy constraints through a Twin Delayed Deep Deterministic policy gradient (TD3)-based joint communication and sensing beamforming algorithm, thereby improving the security performance of ISAC systems. Methods A MIMO ISAC system is considered, where a base station (Alice) equipped with multiple antennas communicates with single-antenna users (Bobs) and senses a malicious target (Eve). The system operates under a TDD protocol to leverage channel reciprocity. A PLKG protocol designed for ISAC systems is developed, including channel estimation, joint communication and sensing beamforming, and key generation. The SKGR is derived in closed form, and sensing accuracy is evaluated using the Cramér-Rao Bound (CRB). 
To maximize the SKGR under CRB constraints, a non-convex optimization problem for the joint design of communication and sensing beamforming matrices is formulated. Given its NP-hardness, an algorithm based on TD3 is proposed. TD3 employs dual critic networks to reduce overestimation, delayed policy updates to enhance stability, and target policy smoothing to improve robustness. The state includes channel state information, the actions correspond to beamforming matrices, and the reward function combines SKGR, CRB, and power constraints. Results and Discussions Simulation results confirm the effectiveness of the proposed design. The TD3-based algorithm achieves a stable SKGR of 18.5 bits/channel use after training (Fig. 4), outperforming benchmark schemes such as Deep Deterministic Policy Gradient (DDPG), greedy search, and random algorithms. The SKGR increases monotonically with transmit power because of reduced noise interference (Fig. 5). Increasing the number of antennas also improves SKGR, although the gain diminishes as power per antenna decreases. The scheme maintains stable SKGR across different distances to the eavesdropper (Fig. 6), demonstrating the robustness of PLKG against eavesdropping attacks. The proposed algorithm manages the complex optimization problem effectively and adapts to dynamic system conditions, offering a practical approach for secure ISAC systems. Conclusions This paper presents a PLKG method for ISAC systems. The proposed protocol generates consistent keys between the base station and communication users. The SKGR maximization problem with sensing constraints is solved using a TD3-based algorithm that jointly optimizes communication and sensing beamforming matrices. Simulation results show that the method outperforms benchmark schemes, with significant gains in SKGR and adaptability to system conditions. The study establishes a basis for integrating PLKG into ISAC to strengthen security without reducing sensing performance. 
Future work will examine real-time implementation and scalability in large networks.
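The three TD3 mechanisms named in the abstract (dual critics, delayed policy updates, target policy smoothing) can be sketched in a few lines. This is a minimal illustration of the generic TD3 update rules, not the paper's implementation; the noise scale, clipping range, action bounds, discount factor, and delay period are hypothetical defaults.

```python
import random


def smoothed_target_action(mu_next, noise_std=0.2, noise_clip=0.5,
                           act_low=-1.0, act_high=1.0, rng=random):
    """Target policy smoothing: perturb the target action with clipped Gaussian noise."""
    smoothed = []
    for a in mu_next:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        # Keep the perturbed action inside the (assumed) valid action range.
        smoothed.append(max(act_low, min(act_high, a + eps)))
    return smoothed


def td3_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Clipped double-Q target: bootstrap from the smaller of the two critic estimates."""
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: refresh the actor once every `policy_delay` critic steps."""
    return step % policy_delay == 0
```

Taking the minimum of the two critic values is what counters overestimation; in an ISAC setting the reward passed to `td3_target` would be the combined SKGR, CRB, and power term described above.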

Depression Screening Method Driven by Global-Local Feature Fusion

ZHANG Siyong, QIU Jiefan, ZHAO Xiangyun, XIAO Kejiang, CHEN Xiaofu, MAO Keji

doi: 10.11999/JEIT250035

[Abstract](397) [FullText HTML](259) [PDF 3275KB](27)

Abstract:
Objective Depression is a globally prevalent mental disorder that poses a serious threat to the physical and mental health of millions of individuals. Early screening and diagnosis are essential to reducing severe consequences such as self-harm and suicide. However, conventional questionnaire-based screening methods are limited by their dependence on the reliability of respondents’ answers, their difficulty in balancing efficiency with accuracy, and the uneven distribution of medical resources. New auxiliary screening approaches are therefore needed. Existing Artificial Intelligence (AI) methods for depression detection based on facial features primarily emphasize global expressions and often overlook subtle local cues such as eye features. Their performance also declines in scenarios where partial facial information is obscured, for instance by masks, and they raise privacy concerns. This study proposes a Global-Local Fusion Axial Network (GLFAN) for depression screening. By jointly extracting global facial and local eye features, this approach enhances screening accuracy and robustness under complex conditions. A corresponding dataset is constructed, and experimental evaluations are conducted to validate the method’s effectiveness. The model is deployed on edge devices to improve privacy protection while maintaining screening efficiency, offering a more objective, accurate, efficient, and secure depression screening solution that contributes to mitigating global mental health challenges. Methods To address the challenges of accuracy and efficiency in depression screening, this study proposes GLFAN. For long-duration consultation videos with partial occlusions such as masks, data preprocessing is performed using OpenFace 2.0 and facial keypoint algorithms, combined with peak detection, clustering, and centroid search strategies to segment the videos into short sequences capturing dynamic facial changes, thereby enhancing data validity. 
At the model level, GLFAN adopts a dual-branch parallel architecture to extract global facial and local eye features simultaneously. The global branch uses MTCNN for facial keypoint detection and enhances feature extraction under occlusion using an inverted bottleneck structure. The local branch detects eye regions via YOLO v7 and extracts eye movement features using a ResNet-18 network integrated with a convolutional attention module. Following dual-branch feature fusion, an integrated convolutional module optimizes the representation, and classification is performed using an axial attention network. Results and Discussions The performance of GLFAN is evaluated through comprehensive, multi-dimensional experiments. 
On the self-constructed depression dataset, high accuracy is achieved in binary classification tasks, and non-depression and severe depression categories are accurately distinguished in four-class classification. Under mask-occluded conditions, a precision of 0.72 and a recall of 0.690 are obtained for depression detection. Although these values are lower than the precision of 0.87 and recall of 0.840 observed under non-occluded conditions, reliable screening performance is maintained. Compared with other advanced methods, GLFAN achieves higher recall and F1 scores. On the public AVEC2013 and AVEC2014 datasets, the model achieves lower Mean Absolute Error (MAE) values and shows advantages in both short- and long-sequence video processing. Heatmap visualizations indicate that GLFAN dynamically adjusts its attention according to the degree of facial occlusion, demonstrating stronger adaptability than ResNet-50. Edge device tests further confirm that the average processing delay remains below 17.56 milliseconds per frame, and stable performance is maintained under low-bandwidth conditions. Conclusions This study proposes a depression screening approach based on edge vision technology. A lightweight, end-to-end GLFAN is developed to address the limitations of existing screening methods. The model integrates global facial features extracted via MTCNN with local eye-region features captured by YOLO v7, followed by effective feature fusion and classification using an Axial Transformer module. By emphasizing local eye-region information, GLFAN enhances performance in occluded scenarios such as mask-wearing. Experimental validation using both self-constructed and public datasets demonstrates that GLFAN reduces missed detections and improves adaptability to short-duration video inputs compared with existing models. 
Grad-CAM visualizations further reveal that GLFAN prioritizes eye-region features under occluded conditions and shifts focus to global facial features when full facial information is available, confirming its context-specific adaptability. The model has been successfully deployed on edge devices, offering a lightweight, efficient, and privacy-conscious solution for real-time depression screening.
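The abstract quotes precision and recall for the occluded and non-occluded settings but not the corresponding F1 values. As a worked check, the sketch below derives F1 from those two pairs; the results are computed here, not reported by the authors, and they assume each precision/recall pair refers to the same class and operating point.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract.
f1_occluded = f1_score(0.72, 0.690)
f1_clear = f1_score(0.87, 0.840)

print(f"mask-occluded F1 = {f1_occluded:.3f}")  # 0.705
print(f"non-occluded F1  = {f1_clear:.3f}")     # 0.855
```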

Communication, Computation, and Caching Resource Collaboration for Heterogeneous AIGC Service Provisioning

WU Mengru, GAO Yu, ZHAO Bo, XU Bo, SUN Hao, GUO Lei

doi: 10.11999/JEIT251300

[Abstract](17) [FullText HTML](7) [PDF 4126KB](3)

Abstract:
Objective In the artificial intelligence of things (AIoT), edge servers (ESs) can provide intelligent content generation services to AIoT devices by utilizing their cached AI-generated content (AIGC) models. However, the limited computing resources and caching capacity of ESs make it difficult to support the large-scale caching demands of heterogeneous AIGC services. To address this issue, this paper proposes a communication, computation, and caching resource collaboration scheme that leverages a combined cloud-edge and edge-edge collaborative framework. This scheme focuses on three representative AIGC services, including lightweight AIGC services, computation-intensive AIGC services, and preprocessing-based AIGC services. Furthermore, the proposed approach aims to minimize the total AIGC service latency by jointly optimizing transmit power, computing resource allocation, model caching strategies, and offloading decisions. Methods This paper investigates communication, computation, and caching resource collaboration for supporting heterogeneous AIGC services. First, an AIGC service-oriented AIoT system model is proposed to incorporate both cloud-edge and edge-edge collaboration. Subsequently, an optimization problem is formulated with the objective of minimizing the total latency of AIGC services by jointly optimizing transmit power, computing resource allocation, model caching strategies, and offloading decisions. Since the formulated problem is non-convex, an alternating optimization (AO) algorithm is proposed, which decomposes the problem into three subproblems that are solved using the successive convex approximation (SCA) method, Karush-Kuhn-Tucker (KKT) conditions, and an improved Harris Hawks Optimization (HHO) algorithm, respectively. Results and Discussions In the simulations, the proposed joint optimization scheme is compared to three baselines, including particle swarm optimization (PSO), fixed resource allocation, and random offloading and caching. 
First, the convergence of the proposed AO algorithm is verified (Fig. 2). The results demonstrate that the algorithm achieves rapid convergence within a limited number of iterations across different sub-problems. Second, increasing the transmission bandwidth leads to a significant reduction in the total AIGC service latency (Fig. 3). This is because each device can occupy more bandwidth resources to send tasks, and the ES can likewise allocate more bandwidth to send generated content in the downlink. Furthermore, the total AIGC service latency decreases as the ES’s storage capacity increases for all the schemes (Fig. 4). This is because a larger storage capacity allows the ES to store more AIGC models, thus reducing the transmission delay between the ES and the cloud server. Additionally, as the required floating point operations per bit increase, the total AIGC service latency exhibits a significant upward trend across all schemes (Fig. 5). However, the proposed scheme mitigates this increase more effectively than the baselines, demonstrating its robustness in handling computationally demanding AIGC tasks. Finally, the total AIGC service latency for all schemes decreases as the maximum transmit power of the base station (BS) increases (Fig. 6). This trend arises because a higher maximum transmit power strengthens the downlink signal-to-noise ratio, which improves the downlink transmission rate and thereby reduces the total AIGC service latency. In conclusion, these simulation results confirm that, compared to the baselines, the proposed scheme significantly reduces the total AIGC service latency. Conclusions This paper investigates communication, computation, and caching resource collaboration for supporting heterogeneous AIGC services. 
Our objective is to minimize the total latency of AIGC services by jointly optimizing the transmit power of AIoT devices and base stations, computing resource allocation, AIGC model deployment, and service offloading decisions, subject to computation and caching resource constraints. Since the formulated problem is a mixed-integer non-linear programming problem, an efficient AO algorithm is designed. This algorithm decomposes the original optimization problem into three sub-problems, which are solved via the SCA algorithm, KKT conditions, and the HHO algorithm, respectively. Simulation results demonstrate that the proposed algorithm can reduce the total AIGC service latency compared to baselines.
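The reasoning behind the bandwidth and transmit-power trends can be illustrated with the Shannon rate formula. The sketch below assumes a single additive white Gaussian noise link; whether the paper uses exactly this rate model is an assumption, and the payload size, channel gain, and noise density are hypothetical values.

```python
import math


def link_rate(bandwidth_hz, tx_power_w, channel_gain, noise_psd_w_per_hz):
    """Shannon rate in bit/s for an AWGN link (assumed model)."""
    snr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def transmission_latency(payload_bits, bandwidth_hz, tx_power_w,
                         channel_gain, noise_psd_w_per_hz):
    """Time in seconds needed to deliver `payload_bits` over the link."""
    return payload_bits / link_rate(bandwidth_hz, tx_power_w,
                                    channel_gain, noise_psd_w_per_hz)


# Hypothetical parameters: 8 Mbit of generated content, gain 1e-10,
# noise density 4e-21 W/Hz (about -174 dBm/Hz).
PAYLOAD, GAIN, N0 = 8e6, 1e-10, 4e-21

for power_w in (0.5, 1.0, 2.0):
    t = transmission_latency(PAYLOAD, 10e6, power_w, GAIN, N0)
    print(f"P = {power_w:.1f} W, B = 10 MHz -> {t * 1e3:.1f} ms")

for bandwidth in (5e6, 10e6, 20e6):
    t = transmission_latency(PAYLOAD, bandwidth, 1.0, GAIN, N0)
    print(f"P = 1.0 W, B = {bandwidth / 1e6:.0f} MHz -> {t * 1e3:.1f} ms")
```

Under this model, latency falls monotonically in both power and bandwidth, with diminishing returns in power because the rate grows only logarithmically with the signal-to-noise ratio.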

CaRS-Align: Channel Relation Spectra Alignment for Cross-Modal Vehicle Re-identification

SA Baihui, ZHUANG Jingyi, ZHENG Jinjie, ZHU Jianqing

doi: 10.11999/JEIT250917

[Abstract](65) [FullText HTML](37) [PDF 7391KB](8)

Abstract:
Objective Visible and infrared images are two commonly used modalities in intelligent transportation scenarios and play a key role in vehicle re-identification. However, differences in imaging mechanisms and spectral responses lead to inconsistent visual characteristics between these modalities, which limits cross-modal vehicle re-identification. To address this problem, this paper proposes a Channel Relation Spectra Alignment (CaRS-Align) method that uses channel relation spectra, rather than channel-wise features, as the alignment target. This strategy reduces interference caused by imaging style differences at the relational-structure level. Within each modality, a channel relation spectrum is constructed to capture stable and semantically coordinated channel-to-channel relationships through correlation modeling. At the cross-modal level, the correlation between the corresponding channel relation spectra of the two modalities is maximized to achieve consistent alignment of relational structures. Experiments on the public MSVR310 and RGBN300 datasets show that CaRS-Align outperforms existing state-of-the-art methods. For example, on MSVR310, under infrared-to-visible retrieval, CaRS-Align achieves a Rank-1 accuracy of 64.35%, which is 2.58% higher than advanced existing methods. 
Methods CaRS-Align adopts a hierarchical optimization paradigm: (1) for each modality, a channel–channel relation spectrum is constructed by mining inter-channel dependencies, yielding a semantically coordinated relation matrix that preserves the organizational structure of semantic cues; (2) cross-modal consistency is achieved by maximizing the correlation between the relation spectra of the two modalities, enabling progressive optimization from intra-modal construction to cross-modal alignment; and (3) relation spectrum alignment is integrated with standard classification and retrieval objectives commonly used in re-identification to supervise backbone training for the vehicle re-identification model. Results and Discussions Compared with several state-of-the-art cross-modal re-identification methods on the RGBN300 and MSVR310 datasets, CaRS-Align demonstrates strong performance and achieves best or second-best results across both retrieval modes. As shown in Table 1, on RGBN300 it attains 75.09% Rank-1 accuracy and 55.45% mean Average Precision (mAP) in the infrared-to-visible mode, and 76.60% Rank-1 accuracy and 56.12% mAP in the visible-to-infrared mode. As shown in Table 2, similar advantages are observed on MSVR310, with 64.54% Rank-1 accuracy and 41.25% mAP in the visible-to-infrared mode, and 64.35% Rank-1 accuracy and 40.99% mAP in the infrared-to-visible mode. Fig. 4 presents Top-10 retrieval results, where CaRS-Align reduces identity mismatches in both directions. Fig. 5 illustrates feature distance distributions, showing substantial overlap between intra-class and inter-class distances without CaRS-Align (Fig. 5(a)), whereas clearer separation is observed with CaRS-Align (Fig. 5(b)), confirming improved feature discrimination. These results indicate that modeling channel-level relational structures improves both retrieval modes, increases adaptability to modality shifts, and effectively reduces mismatches caused by cross-modal differences. 
Conclusions This paper proposes a visible–infrared cross-modal vehicle re-identification method based on CaRS-Align. Within each modality, a channel relation spectrum is constructed to preserve semantic co-occurrence structures. A CaRS-Align function is then designed to maximize the correlation between modalities, thereby achieving consistent alignment and improving cross-modal performance. Experiments on the MSVR310 and RGBN300 datasets demonstrate that CaRS-Align outperforms existing state-of-the-art methods in key metrics, including Rank-1 accuracy and mAP.
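The idea of aligning relation spectra rather than raw channel features can be sketched as follows. This is one plausible reading, assuming Pearson correlation both for the intra-modal channel relations and for the cross-modal agreement; the abstract does not give the exact formulation, and the function names and toy feature values are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den if den > 0 else 0.0


def relation_spectrum(features):
    """features: C channels, each a list of N activations. Returns a C x C correlation matrix."""
    c = len(features)
    return [[pearson(features[i], features[j]) for j in range(c)] for i in range(c)]


def alignment_loss(feat_a, feat_b):
    """1 - correlation between the flattened relation spectra of two modalities."""
    flat_a = [x for row in relation_spectrum(feat_a) for x in row]
    flat_b = [x for row in relation_spectrum(feat_b) for x in row]
    return 1.0 - pearson(flat_a, flat_b)


# Toy example: the "infrared" features are a per-channel gain/offset change of the
# "visible" ones, standing in for an imaging-style difference.
visible = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [4.0, 3.0, 2.0, 1.0]]
gains_offsets = [(2.0, 1.0), (0.5, -3.0), (3.0, 10.0)]
infrared = [[g * x + o for x in ch] for ch, (g, o) in zip(visible, gains_offsets)]

print(alignment_loss(visible, infrared))
```

Because correlation is invariant to positive per-channel scaling and shifting, the loss in this toy case is essentially zero even though the channel values themselves differ, which is the kind of style insensitivity the relational target is meant to provide.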

Routing and Resource Scheduling Algorithm Driven by Mixture of Experts in Large-scale Heterogeneous Local Power Communication Network

JING Chuanfang, ZHU Xiaorong

doi: 10.11999/JEIT251176

[Abstract](36) [FullText HTML](14) [PDF 2178KB](3)

Abstract:
Objective Emerging power services, such as distributed energy consumption, impose more stringent requirements on the performance of large-scale heterogeneous local power communication networks (LHLPCNs). Given the limited communication resources and rising service demands, providing on-demand services and enhancing network capacity while guaranteeing Quality of Service (QoS) presents a major challenge for LHLPCNs. Conventional routing and resource scheduling algorithms based on optimization or heuristics depend on precise mathematical models and parameters. As network scales and optimization variables increase, these algorithms become computationally expensive, hindering their effective adaptation to the growing variety of power application scenarios. Recent advances in mixture of experts (MoE) frameworks offer a promising solution: by employing an ensemble of AI models as specialized experts, they greatly reduce the need to train individual task-specific models. Motivated by these challenges and the potential of MoE, this paper proposes an MoE-based routing and resource scheduling algorithm (RASMoE) tailored for LHLPCNs integrating High Power Line Carrier (HPLC) and Radio Frequency (RF). RASMoE can efficiently meet the personalized QoS requirements of diverse services and accommodate more power services within limited resources. Methods First, considering the multi-modal links, channels and data modulation methods, the optimization problem of minimizing the difference between QoS supply and demand in LHLPCNs is established, which conforms to a 0-1 integer linear programming model. Then, to solve this NP-hard problem, a novel MoE framework comprising expert networks and gating networks is designed. This framework is capable of meeting the personalized demands of diverse services in terms of data transmission rate, delay and reliability, while achieving faster convergence. 
The expert networks, which include both shared and QoS-specific experts, are responsible for generating the optimal next hop and computing the efficient allocation strategies of links, channels and data modulation modes between node pairs. Meanwhile, the gating networks dynamically combine and reuse these experts to efficiently accommodate both known and unforeseen service types. Finally, extensive comparative experiments validate the effectiveness of the proposed algorithm. Compared with the baselines, RASMoE shows better performance in terms of resource utilization, delay and reliability. Results and Discussions The difference between the performance supply and demand of five algorithms under varying service numbers is compared (Fig. 3). Simulation results show that RASMoE consistently exhibits the smallest performance supply-demand differences across all scenarios. This advantage stems from its gating network, which dynamically combines QoS-specific experts to precisely match resource allocation with service requirements. Given that control and computing-intensive services have strict delay requirements, the average end-to-end (E2E) latency of these two service types under different service numbers is compared (Fig. 4). It can be observed that the proposed algorithm achieves the lowest average E2E latency. This is because its expert networks, enhanced by Graph Attention Networks (GATs), efficiently extract node load states and interact with the network environment in real time via a Multi-Armed Bandit (MAB) mechanism. This enables RASMoE to learn adaptive resource allocation strategies. Moreover, the average reliability of the E2E paths selected by the five algorithms for different numbers of control, computing-intensive, and acquisition services is illustrated (Fig. 5). Conclusions This paper proposes an MoE-driven routing and resource scheduling algorithm for LHLPCNs. The proposed framework comprises two core components: expert networks and a gating network. 
The expert networks include shared experts based on GATs and service QoS-specific experts based on MAB. The former are responsible for E2E path selection by analyzing node characteristics, while the latter focus on adaptively allocating and scheduling links, channels, and modulation schemes according to distinct QoS requirements and link conditions. The gating networks dynamically orchestrate and reuse these expert models to efficiently serve services with single or multiple QoS demands, including previously unseen service types. Theoretical analysis validates that the proposed method enhances resource utilization of LHLPCNs, with its advantages being particularly pronounced in multi-service scenarios characterized by diverse QoS requirements. Future work will explore the integration of the MoE framework with domain-specific models (e.g., for power load forecasting) and predictive analytics, aiming to optimize the integration and utilization of renewable energy sources, such as wind and solar power.
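How a gating network combines and reuses experts can be sketched with a generic top-k softmax gate. This is a minimal illustration of the common MoE pattern, not the RASMoE architecture itself; the expert roles, gate scores, and per-hop preference values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_top_k(gate_scores, k):
    """Select the k highest-scoring experts and renormalise their weights."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    return dict(zip(top, weights))


def mix_experts(expert_outputs, gate_weights):
    """Weighted sum of the selected experts' scores for each candidate action."""
    n_actions = len(expert_outputs[0])
    return [sum(w * expert_outputs[i][a] for i, w in gate_weights.items())
            for a in range(n_actions)]


# Hypothetical experts: 0 = shared, 1 = delay-specific, 2 = reliability-specific.
# Each expert scores two candidate next hops.
expert_outputs = [[0.1, 0.9], [0.8, 0.2], [0.6, 0.4]]
gate_scores = [2.0, 0.5, 1.0]  # assumed gate output for one service request

weights = gate_top_k(gate_scores, k=2)
mixed = mix_experts(expert_outputs, weights)
next_hop = max(range(len(mixed)), key=lambda a: mixed[a])
print(weights, mixed, next_hop)
```

A service with a new QoS mix only needs a different set of gate scores, so existing experts are reused rather than retrained.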

Vision-Guided and Force-Controlled Method for Robotic Screw Assembly

ZHANG Chunyun, MENG Xintong, TAO Tao, ZHOU Huaidong

doi: 10.11999/JEIT251193

[Abstract](30) [FullText HTML](16) [PDF 6092KB](2)

Abstract:
Objective With the rapid development of intelligent manufacturing and industrial automation, robots have been increasingly applied to high-precision assembly tasks, especially in screw assembly. However, existing assembly systems still face multiple challenges. First, the pose of assembly objects is often uncertain, making initial localization difficult. Second, small features such as threaded holes are blurred and hard to identify accurately. Third, traditional vision-based open-loop control may lead to assembly deviation or jamming. To address these issues, this study proposes a vision–force cooperative method for robotic screw assembly. The method builds a closed-loop assembly system that covers both coarse positioning and fine alignment. A semantic-enhanced 6D pose estimation algorithm and a lightweight hole detection model are used to improve perception accuracy. Force feedback control is then applied to adjust the end-effector posture dynamically. The proposed approach improves the accuracy and stability of screw assembly. Methods The proposed screw assembly method is built on a vision–force cooperative strategy, forming a closed-loop process. In the visual perception stage, a semantic-enhanced 6D pose estimation algorithm is applied to handle disturbances and pose uncertainty in complex industrial environments. During initial pose estimation, Grounding DINO and SAM2 jointly generate pixel-level masks to provide semantic priors for the FoundationPose module. In the continuous tracking stage, semantic constraints from Grounding DINO are used for translational correction. For detecting small threaded holes, an improved lightweight hole detection algorithm based on NanoDet is designed. It uses MobileNetV3 as the backbone and adds a CircleRefine module in the detection head to regress the hole center precisely. In the assembly positioning stage, a hierarchical vision-guided strategy is used. 
The global camera conducts coarse positioning to provide overall guidance. The hand-eye camera then performs local correction based on hole detection results. In the closed-loop assembly stage, force feedback is applied for posture adjustment, achieving precise alignment between the screw and the threaded hole. Results and Discussions The proposed method is experimentally validated in robotic screw assembly scenarios. The improved 6D pose estimation algorithm reduces the average position error by 18% and the orientation error by 11.7% compared with the baseline (Tbl.1). The tracking success rate in dynamic sequences increases from 72% to 85% (Tbl.2). For threaded hole detection, the lightweight algorithm based on the improved NanoDet is evaluated using a dataset collected from assembly scenarios. The algorithm achieves 98.3% precision, 99.2% recall and 98.7% mAP on the test set (Tbl.3). The model size is only 11.7 MB and the computation cost is 2.9 GFLOPS. Both figures are lower than those of most benchmark models, while high accuracy is maintained. A circular branch is introduced to fit hole edges (Fig.8), providing an accurate hole center for visual guidance. Under various inclination angles (Fig.10), the assembly success rate stays above 91.6% (Tbl.4). For screws of different sizes (M4, M6 and M8), the success rate remains higher than 90% (Tbl.5). Under small external disturbances (Fig.12), the success rates reach 93.3%, 90% and 83.3% for translational, rotational and mixed disturbances, respectively (Tbl.6). Force-feedback comparison experiments show that the assembly success rate is 66.7% under visual guidance alone. When force feedback is added, the success rate increases to 96.7% (Tbl.7). The system demonstrates stable performance throughout a screw assembly cycle, with an average total cycle time of 9.53 seconds (Tbl.8), thereby meeting industrial assembly requirements. 
Conclusions This study proposes a vision and force control cooperative method to address several challenges in robotic screw assembly. The approach improves target localization accuracy through a semantics-enhanced 6D pose estimation algorithm and a lightweight threaded-hole detection network. By integrating a hierarchical vision-guided strategy with force-feedback control, precise alignment between the screw and the threaded hole is achieved. Experimental results demonstrate that the proposed method ensures reliable assembly under various conditions, providing a feasible solution for intelligent robotic assembly. Future work will focus on adaptive force control, multimodal perception fusion and intelligent task planning to further enhance the system’s generalization and self-optimization capabilities in complex industrial environments.
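Recovering a hole center from edge points, which the circular branch is described as doing, has a classical non-learned counterpart: an algebraic least-squares circle fit. The sketch below uses that classical fit as a stand-in to show the geometry; it is not the CircleRefine module, and the sample pixel coordinates are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate edge points (for example, collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_circle(points):
    """Algebraic least-squares fit of x^2 + y^2 = A x + B y + C; returns (cx, cy, r)."""
    n = float(len(points))
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    z = [x * x + y * y for x, y in points]
    sxz = sum(x * zi for (x, _), zi in zip(points, z))
    syz = sum(y * zi for (_, y), zi in zip(points, z))
    sz = sum(z)
    a_coef, b_coef, c_coef = solve_linear(
        [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]], [sxz, syz, sz])
    cx, cy = a_coef / 2.0, b_coef / 2.0
    return cx, cy, math.sqrt(c_coef + cx * cx + cy * cy)


# Hypothetical edge pixels sampled on a circle centred at (120.5, 80.25), radius 15.
edge = [(120.5 + 15.0 * math.cos(2 * math.pi * k / 12),
         80.25 + 15.0 * math.sin(2 * math.pi * k / 12)) for k in range(12)]
print(fit_circle(edge))
```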

Lightweight Dual Convolutional Finger Vein Recognition Network Based on Attention Mechanism

ZHAO Bingyan, LIANG Yihuai, ZHANG Zhongxia, ZHANG Wenzheng

doi: 10.11999/JEIT250380

[Abstract](114) [FullText HTML](50) [PDF 2450KB](17)

Abstract:
Objective Finger vein recognition is an emerging biometric authentication technology valued for its physiological uniqueness and advantages in in vivo detection. However, mainstream deep learning recognition frameworks still face two challenges. High-precision recognition often depends on complex network structures, which increase parameter counts and hinder deployment in memory-limited embedded devices and edge scenarios with constrained computing resources. Model compression can reduce computational cost but often weakens feature representation, creating a conflict between recognition accuracy and efficiency. To address these issues, a lightweight dual convolutional model integrated with an attention mechanism is proposed. A parallel heterogeneous convolution module and an attention guidance mechanism are designed to extract diverse image features and improve recognition accuracy while preserving a lightweight network structure. Methods The proposed architecture adopts a three-level collaborative mechanism comprising feature extraction, dynamic calibration, and decision fusion. A dual convolutional feature extraction module is constructed using normalized ROI images. This module combines heterogeneous convolution kernels. Rectangular convolution branches with different shapes capture venous topological structures and diameter orientations, whereas square convolution branches employ stacked square kernels to extract local texture details and background intensity distributions. These branches operate in parallel with reduced channel numbers and generate complementary responses through kernel shape diversity. This design reduces parameter scale while improving feature discrimination. A parallel dual attention mechanism is then applied to achieve two-dimensional calibration through joint optimization of channel attention and spatial attention. 
Channel attention adaptively assigns weights to enhance discriminative venous texture features, whereas spatial attention constructs pixel-level dependency models that focus on effective discriminative regions. A parallel concatenation fusion strategy preserves structural information without introducing additional parameters and improves sensitivity to critical features. Finally, a three-level progressive feature optimization structure is implemented. A convolutional compression module with stride 2 nests multi-scale receptive fields and progressively refines primary features during dimensionality reduction. Two fully connected layers then perform feature space transformation. The first layer applies ReLU activation to form sparse representations, and the final layer applies Softmax for probability calibration. This structure balances shallow underfitting and deep overfitting while maintaining efficient forward inference. Results and Discussions The effectiveness and robustness of the proposed network are evaluated on three public datasets, namely USM, HKPU, and SDUMLA. Recognition accuracy is assessed using the Acc metric. Experimental results (Table 1) show strong recognition performance. Feature visualization heatmaps (Fig. 4, Fig. 6) confirm that the network extracts complete and discriminative venous features. Training visualizations (Fig. 7, Fig. 8) show stable loss and accuracy trends, achieving 100% classification performance and demonstrating training reliability and robustness. Quantitative comparisons (Tables 2 and 3) indicate that the proposed method effectively addresses the trade-off between model complexity and classification performance and achieves superior results across all three datasets. Ablation studies (Table 4) further verify the effectiveness of the proposed modules and show significant improvements in finger vein recognition performance. Conclusions A lightweight dual convolutional neural network with an attention mechanism is proposed. 
The network consists of three core modules: a dual convolutional feature extraction module, a parallel dual-attention module, and a feature optimization classification module. During feature extraction, long-range venous features and background information are jointly encoded through a low-channel parallel design, which substantially reduces parameter counts while improving inter-individual discrimination. The attention module efficiently captures critical venous features without the parameter expansion commonly observed in conventional attention mechanisms. The feature optimization classification module applies progressive feature recalibration, which reduces underfitting and overfitting during stacked dimensionality reduction. Experimental results show recognition accuracies of 99.70%, 98.33%, and 98.27% on the USM, HKPU, and SDUMLA datasets, corresponding to an average improvement of 2.05% over existing state-of-the-art methods. Compared with representative lightweight finger vein recognition approaches, the proposed method reduces parameter scale by 11.35% to 60.19%, achieving a balance between a lightweight model and improved performance.
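Why rectangular kernels and reduced channel widths shrink a model can be checked with the standard parameter count of a 2D convolution layer. The kernel sizes and channel widths below are hypothetical, since the abstract does not state the ones actually used; only the counting formula itself is standard.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Learnable parameters in a standard 2D convolution layer."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


# One 7x7 layer versus a 1x7 plus 7x1 rectangular pair at the same width.
square_7x7 = conv_params(7, 7, 32, 32)
rect_pair = conv_params(1, 7, 32, 32) + conv_params(7, 1, 32, 32)

# Halving the channel width of a 3x3 layer.
wide_3x3 = conv_params(3, 3, 32, 32)
narrow_3x3 = conv_params(3, 3, 16, 16)

print(square_7x7, rect_pair)   # 50208 14400
print(wide_3x3, narrow_3x3)    # 9248 2320
```

In this example the rectangular pair needs under a third of the parameters of the square layer, and halving the channel width cuts the 3x3 layer to about a quarter, because the count scales with the product of input and output channels.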

AutoPenGPT: Drift-Resistant Penetration Testing Driven by Search-Space Convergence and Dependency Modeling

HUANG Weigang, FU Lirong, LIU Peiyu, DU Linkang, YE Tong, XIA Yifan, WANG Wenhai

doi: 10.11999/JEIT250873

[Abstract](65) [FullText HTML](36) [PDF 3576KB](6)

Abstract:
Objective Industrial Control Systems (ICS) are widely deployed in critical sectors and often contain long-standing vulnerabilities due to strict availability requirements and limited patching opportunities. The increasing exposure of external management and access infrastructure has expanded the attack surface and allows adversaries to pivot from boundary components into fragile production networks. Continuous penetration testing of these components is essential but remains costly and difficult to scale when carried out manually. Recent work examines Large Language Models (LLMs) for automated penetration testing; however, existing systems often experience strategy drift and intention drift, which produce incoherent testing behaviors and ineffective exploitation chains. Methods This study proposes AutoPenGPT, a multi-agent framework for automated Web security testing. AutoPenGPT uses an adaptive exploration-space convergence mechanism that predicts likely vulnerability types from target semantics and constrains LLM-driven testing through a dynamically updated payload knowledge base. To reduce intention drift in multi-step exploitation, a dependency-driven strategy module rewrites historical feedback, models step dependencies, and generates coherent, executable strategies in a closed-loop workflow. A semi-structured prompt embedding scheme is also developed to support heterogeneous penetration testing tasks while preserving semantic integrity. Results and Discussions AutoPenGPT is evaluated on Capture-the-Flag (CTF) benchmarks and real-world ICS and Web platforms. On CTF datasets, it achieves 97.62% vulnerability-type detection accuracy and an 80.95% requirement completion rate, exceeding state-of-the-art tools by a wide margin. In real-world deployments, it reaches approximately 70% requirement completion and identifies six previously undisclosed vulnerabilities, demonstrating practical effectiveness. Conclusions The contributions are threefold. 
(1) Strategy drift and intention drift in LLM-driven penetration testing are examined and addressed through adaptive exploration and dependency-aware strategy mechanisms that stabilize long-horizon testing behaviors. (2) AutoPenGPT is designed and implemented as a multi-agent penetration testing system that integrates semantic vulnerability prediction, closed-loop strategy generation, and semi-structured prompt embedding. (3) Extensive evaluation on CTF and real-world ICS and Web platforms confirms the effectiveness and practicality of the system, including the discovery of previously unknown vulnerabilities.

Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning in Wireless Networks

LIU Jingyuan, MA Ke, XU Runchen, CHANG Zheng

doi: 10.11999/JEIT251221

[Abstract](95) [FullText HTML](42) [PDF 3428KB](14)

Abstract:
Objective Multimodal Federated Learning (MFL) uses complementary information from multiple modalities, yet in wireless edge networks it is restricted by limited energy and frequent missing modalities because many clients store only images or only reports. This study presents Cross-modal Retrieval Enhanced Energy-efficient Multimodal Federated Learning (CREEMFL), which applies selective completion and joint communication–computation optimization to reduce training energy under latency and wireless constraints. Methods CREEMFL completes part of the incomplete samples by querying a public multimodal subset, and processes the remaining samples through zero padding. Each selected user downloads the global model, performs image-to-text or text-to-image retrieval, conducts local multimodal training, and uploads model updates for aggregation. An energy–delay model couples local computation and wireless communication and treats the required number of global rounds as a function of retrieval ratios. Based on this model, an energy minimization problem is formulated and solved using a two-layer algorithm with an outer search over retrieval ratios and an inner optimization of transmission time, Central Processing Unit (CPU) frequency, and transmit power. Results and Discussions Simulations on a single-cell wireless MFL system show that increasing the ratio of completing text from images improves test accuracy and reduces total energy. In contrast, a large ratio of completing images from text provides limited accuracy gain but increases energy consumption (Fig. 3, Fig. 4). Compared with four representative baselines, CREEMFL achieves shorter completion time and lower total energy across a wide range of maximum average transmit powers (Fig. 5, Fig. 6). For CREEMFL, increased system bandwidth further reduces completion time and energy consumption (Fig. 7, Fig. 8). 
Under different user modality compositions, CREEMFL also attains higher test accuracy than local training, zero padding, and cross-modal retrieval without energy optimization (Fig. 9). Conclusions CREEMFL integrates selective cross-modal retrieval and joint communication–computation optimization for energy-efficient MFL. By treating retrieval ratios as variables and modeling their effect on global convergence rounds, it captures the coupling between per-round costs and global training progress. Simulations verify that CREEMFL reduces training completion time and total energy while preserving classification accuracy in resource-constrained wireless edge networks.

Cover

2026, 48(1).

[Abstract](60) [PDF 1720KB](8)

Abstract:

2025, 47(12): 1-6.

[Abstract](46) [FullText HTML](27) [PDF 312KB](8)

Abstract:

The 3rd Intelligent Aerospace Forum - Special Topic on Intelligent Processing and Application Technology of Satellite Information

A One-Shot Object Detection Method Fusing Dual-Branch Optimized SAM and Global-Local Collaborative Matching

FAN Shenghua, YIN Hang, LIU Jian, QU Tao

2025, 47(12): 4665-4676. doi: 10.11999/JEIT250982

[Abstract](155) [FullText HTML](60) [PDF 8844KB](19)

Abstract:
Objective DOS-GLNet targets high-precision object recognition and localization through hierarchical collaboration of model components, using only a single query image with novel category prototypes and a target image. The method follows a two-layer architecture consisting of feature extraction and matching interaction. In the feature extraction layer, the Segment Anything Model (SAM) is adopted as the base extractor and is fine-tuned using a dual-branch strategy. This strategy preserves SAM’s general visual and category-agnostic perception while enhancing local spatial detail representation. A multi-scale module is further incorporated to construct a feature pyramid and address the single-scale limitation of SAM. In the matching interaction layer, a global-local collaborative two-stage matching mechanism is designed. The Global Matching Module (GMM) performs coarse-grained semantic alignment by suppressing background responses and guiding the Region Proposal Network (RPN) to generate high-quality candidate regions. The Bidirectional Local Matching Module (BLMM) then establishes fine-grained spatial correspondence between candidate regions and the query image to capture part-level associations. Methods A detection network based on Dual-Branch Optimized SAM and Global-Local Collaborative Matching, termed DOS-GLNet, is proposed. The main contributions are as follows. (1) In the feature matching stage, a two-stage global-local matching mechanism is constructed. A GMM is embedded before the RPN to achieve robust matching of overall target features using a large receptive field. (2) A BLMM is embedded before the detection head to capture pixel-level, bidirectional fine-grained semantic correlation through a four-dimensional correlation tensor. This progressive matching strategy establishes cross-sample correlations in the feature space, optimizes feature representation, and improves object localization accuracy. 
Results and Discussions On the Pascal VOC dataset, the proposed method is compared with SiamRPN, which was originally developed for one-shot tracking and is adapted for detection due to task similarity, as well as OSCD, CoAE, AIT, BHRL, and BSPG. The results show that the proposed method outperforms all baseline approaches and achieves stronger overall one-shot detection performance. On the MS COCO dataset, comparative methods include SiamMask, CoAE, AIT, BHRL, and BSPG. Although base and novel class performance varies across different data splits, consistent trends are observed. DOS-GLNet matches state-of-the-art performance on base classes while maintaining strong accuracy on fully trained categories. It further achieves state-of-the-art results on novel classes, with an average improvement of approximately 2%. These results indicate more effective feature alignment and relationship modeling based on one-shot samples, as well as improved representation of novel class features under limited prior information. Conclusions To improve feature optimization in one-shot object detection, enhancements are introduced at both the backbone network and the matching mechanism levels. A DOS-GLNet framework based on dual-branch optimized SAM and global-local collaborative matching is proposed. For the backbone, an SAM-based dual-branch fine-tuning feature extraction network is constructed. Lightweight adapters are integrated into the SAM encoder to enable parameter-efficient fine-tuning, preserving generalization capability while improving task adaptability. In parallel, a convolutional local branch is designed to strengthen local feature perception, and cross-layer fusion is applied to enhance local detail representation. A multi-scale module further increases the scale diversity of the feature pyramid. For feature matching, a two-stage global-local collaborative strategy is adopted. 
Global matching focuses on target-level semantic alignment, whereas local matching refines instance-level detail discrimination. Together, these designs effectively improve one-shot object detection performance.
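The four-dimensional correlation tensor mentioned for the BLMM can be illustrated with a minimal sketch. It assumes cosine similarity as the correlation measure and plain nested lists as feature maps; the function names `cosine`, `correlation_4d`, and `best_target_for_query` are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def correlation_4d(query, target):
    """Return corr[i][j][k][l], the similarity of query cell (i, j) and target cell (k, l).

    Both inputs are H x W grids whose cells are C-dimensional feature vectors.
    Cosine similarity is an assumption made for this illustration.
    """
    return [[[[cosine(q, t) for t in t_row] for t_row in target]
             for q in q_row] for q_row in query]

def best_target_for_query(corr, i, j):
    """Target cell (k, l) with the highest similarity to query cell (i, j)."""
    cells = [(corr[i][j][k][l], (k, l))
             for k in range(len(corr[i][j]))
             for l in range(len(corr[i][j][k]))]
    return max(cells)[1]

# Toy example: a 1 x 2 query grid and a 2 x 1 target grid with 2-D features.
query = [[[1.0, 0.0], [0.0, 1.0]]]
target = [[[0.0, 2.0]], [[3.0, 0.0]]]
corr = correlation_4d(query, target)
print(best_target_for_query(corr, 0, 0))
```

Reading the same tensor along its last two indices gives the target-to-query direction, which is what makes one tensor usable for matching in both directions.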

A Distributed Multi-Satellite Collaborative Framework for Remote Sensing Scene Classification

JIN Jing, WANG Feng

2025, 47(12): 4677-4688. doi: 10.11999/JEIT250866

[Abstract](196) [FullText HTML](102) [PDF 2969KB](19)

Abstract:
Objective With the rapid development of space technologies, satellites generate large volumes of Remote Sensing (RS) data. Scene classification, a fundamental task in RS interpretation, is essential for earth observation applications. Although Deep Learning (DL) improves classification accuracy, most existing methods rely on centralized architectures. This design allows unified management but faces limited bandwidth, high latency, and privacy risks, which restrict scalability in multi-satellite settings. With increasing demand for distributed computation, Federated Learning (FL) has received growing attention in RS. Research on FL for RS scene classification, however, remains at an early stage. This study proposes a distributed collaborative framework for multi-satellite scene classification that applies efficient parameter aggregation to reduce communication overhead while preserving accuracy. Methods An FL-based framework is proposed for multi-satellite RS scene classification. Each satellite conducts local training while raw data remain stored locally to preserve privacy. Only updated model parameters are transmitted to a central server for global aggregation. The optimized global model is then broadcast to satellites to enable joint modeling and inference. To reduce the high communication cost of space-to-ground links, an inter-satellite communication mechanism is added. This design lowers communication overhead and strengthens scalability. The effect of parameter consensus on global convergence is theoretically analyzed, and an upper bound of convergence error is derived to provide a rigorous convergence guarantee and support practical applicability. Results and Discussions Comparative experiments are conducted on the UC-Merced and NWPU-RESISC45 datasets (Table 2, Table 3) to evaluate the proposed framework. The method consistently shows higher accuracy than centralized training, FedAvg, and FedProx under different client numbers and training ratios. 
On UC-Merced, Overall Accuracy (OA) reaches 96.68% at a 50% training ratio with 2 clients and rises to 97.49% at 80% with 10 clients. On NWPU-RESISC45, OA reaches 83.64% at 10% with 5 clients and 88.41% at 20% with 10 clients, both exceeding baseline methods. Confusion matrices (Fig. 4, Fig. 5) show clear diagonal dominance and only minor confusions, and t-SNE visualizations (Fig. 6) show compact intra-class clusters and well-separated inter-class distributions, indicating strong generalization even under lower training ratios. Communication energy analysis (Table 4) shows high efficiency. On UC-Merced with a 50% training ratio, the communication cost is 1.30 kJ, more than 60% lower than FedAvg and FedProx. On NWPU-RESISC45, substantial savings are also observed across all ratios. Conclusions This study proposes an FL-based framework for multi-satellite RS scene classification and addresses limitations of centralized training, including restricted bandwidth, high latency, and privacy concerns. By allowing satellites to conduct local training and applying central aggregation with inter-satellite consensus, the framework achieves collaborative modeling with high communication efficiency. Evaluations on UC-Merced and NWPU-RESISC45 verify the effectiveness of the method. On UC-Merced with an 80% training ratio and 10 clients, OA reaches 97.49%, higher than centralized training, FedAvg, and FedProx by 1.85%, 0.60%, and 0.81%, respectively. On NWPU-RESISC45 with a 20% training ratio, the communication energy cost is 5.88 kJ, showing reductions of 57.45% and 58.18% compared with FedAvg and FedProx. These results indicate strong generalization and efficiency across different data scales and training ratios. The framework is suited for bandwidth-limited and dynamic space environments and offers a promising direction for distributed RS applications. 
Future work will examine cross-task transfer learning to improve adaptability and generalization under multi-task and heterogeneous data conditions.
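The two aggregation ideas in this abstract, central aggregation of model parameters and parameter consensus over inter-satellite links, can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the central step uses sample-count weighting in the style of FedAvg, the consensus step is plain neighbor averaging, and the names `weighted_average` and `consensus_step` are hypothetical.

```python
def weighted_average(param_sets, weights):
    """Average parameter vectors, weighting each by its local sample count."""
    total = float(sum(weights))
    dim = len(param_sets[0])
    return [sum(w * p[d] for p, w in zip(param_sets, weights)) / total
            for d in range(dim)]

def consensus_step(params, neighbors):
    """One round of neighbor averaging over inter-satellite links.

    neighbors[i] lists the satellites that satellite i can reach directly.
    Each satellite replaces its parameters with the mean of its own and its
    neighbors' parameters (an assumed, textbook consensus rule).
    """
    updated = []
    for i, own in enumerate(params):
        group = [own] + [params[j] for j in neighbors[i]]
        updated.append([sum(g[d] for g in group) / len(group)
                        for d in range(len(own))])
    return updated

# Three satellites, each holding a two-parameter model.
local = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(weighted_average(local, [10, 30, 60]))
print(consensus_step(local, [[1], [0, 2], [1]]))
```

Repeating the consensus step pulls the satellites' parameters toward a common value without using the space-to-ground link, which is the intuition behind the reduced communication cost reported above.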

Few-Shot Remote Sensing Image Classification Based on Parameter-Efficient Vision Transformer and Multimodal Guidance

WEN Hongli, HU Qinghao, HUANG Liwei, WANG Peisong, CHENG Jian

2025, 47(12): 4689-4703. doi: 10.11999/JEIT250996

[Abstract](75) [FullText HTML](26) [PDF 5751KB](14)

Abstract:
Objective Remote sensing image classification is a core task in Earth observation. Its development is limited by the scarcity of high-quality labeled data. Few-shot learning provides a feasible solution. However, existing methods often suffer from limited feature representation, weak generalization to unseen classes, and high computational cost when adapting large models. These issues restrict their application in time-sensitive and resource-constrained scenarios. To address these challenges, this study proposes an Efficient Few-Shot Vision Transformer with Multimodal Guidance (EFS-ViT-MM). The objective is to construct an efficient and accurate classification framework by combining the strong representation capability of a pre-trained Vision Transformer with parameter-efficient fine-tuning. Discriminative capability is further enhanced by incorporating semantic information from textual descriptions to guide prediction. Methods The proposed EFS-ViT-MM framework is formulated as a metric-based learning system composed of three coordinated components. First, an Efficient Low-Rank Vision Transformer (ELR-ViT) is adopted as the visual backbone. A pre-trained Vision Transformer is used for feature extraction, whereas a low-rank adaptation strategy is applied for fine-tuning. The pre-trained parameters are frozen, and only a small number of injected low-rank matrices are optimized. This design reduces the number of trainable parameters and mitigates overfitting while preserving generalization capability. Second, a multimodal guidance mechanism is introduced to enrich visual features with semantic context. A Multimodal Large Language Model generates descriptive text for each support image. The text is embedded into a semantic vector and injected into the visual features through Feature-wise Linear Modulation, which adaptively recalibrates visual representations. Third, a cross-attention metric module is designed to replace fixed distance functions. 
The module learns similarity between query images and multimodally enhanced support samples by adaptively weighting feature correlations, leading to more precise matching in complex remote sensing scenes. Results and Discussions The proposed method is evaluated on multiple public remote sensing datasets, including NWPU-RESISC45, WHU-RS19, UC-Merced, and AID. The results demonstrate consistent performance gains over baseline methods. Under the 5-way 1-shot and 5-way 5-shot settings, classification accuracy increases by 4.7% and 7.0%, respectively. These improvements are achieved with a substantially reduced number of trainable parameters, indicating high computational efficiency. The results confirm that combining large pre-trained models with parameter-efficient fine-tuning is effective for few-shot classification. Performance gains are primarily attributed to multimodal guidance and the cross-attention-based metric, which improve feature discrimination and similarity measurement. Conclusions The EFS-ViT-MM framework effectively addresses limited feature representation, poor generalization, and high computational cost in few-shot remote sensing image classification. The integration of a pre-trained Vision Transformer with parameter-efficient fine-tuning enables effective utilization of large models with reduced computational burden. Multimodal guidance introduces semantic context that enhances visual understanding, whereas the cross-attention metric provides adaptive and accurate similarity estimation. Extensive experiments demonstrate state-of-the-art performance across multiple datasets. The proposed framework offers an efficient and generalizable solution for data-scarce remote sensing applications and provides a foundation for future research on multimodal and efficient deep learning methods for Earth observation.
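Two building blocks named in this abstract, low-rank adaptation of a frozen layer and Feature-wise Linear Modulation, have compact standard forms. The sketch below shows them on plain lists; the matrix shapes, the `scale` argument, and the projection matrices `w_gamma` and `w_beta` are assumptions chosen for illustration rather than details from the paper.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def lora_forward(w_frozen, a, b, x, scale=1.0):
    """Frozen linear layer plus a trainable low-rank update: W x + scale * B (A x).

    w_frozen is d_out x d_in and stays fixed; a is r x d_in and b is d_out x r,
    with r much smaller than d_in, so only a and b would be trained.
    """
    base = matvec(w_frozen, x)
    update = matvec(b, matvec(a, x))
    return [y + scale * u for y, u in zip(base, update)]

def film(features, text_embedding, w_gamma, w_beta):
    """Feature-wise Linear Modulation: gamma * feature + beta, per channel.

    gamma and beta are predicted from the text embedding by two linear maps
    (w_gamma and w_beta are hypothetical names for those maps).
    """
    gamma = matvec(w_gamma, text_embedding)
    beta = matvec(w_beta, text_embedding)
    return [g * f + b for f, g, b in zip(features, gamma, beta)]

# Rank-1 update on a 2 x 2 identity layer.
print(lora_forward([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]], [[0.5], [0.0]], [2.0, 3.0]))
# One-dimensional text embedding modulating two visual channels.
print(film([1.0, 2.0], [1.0], [[2.0], [0.5]], [[0.0], [1.0]]))
```

With rank r, the update adds r * (d_in + d_out) trainable values instead of d_in * d_out, which is where the reduction in trainable parameters comes from.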

UMM-Det: A Unified Object Detection Framework for Heterogeneous Multimodal Remote Sensing Imagery

ZOU Minrui, LI Yuxuan, DAI Yimian, LI Xiang, CHENG Mingming

2025, 47(12): 4704-4713. doi: 10.11999/JEIT250933

[Abstract](102) [FullText HTML](53) [PDF 1983KB](13)

Abstract:
Objective With the increasing demand for space-based situational awareness, object detection across multiple modalities has become a fundamental yet challenging task. Existing large-scale multimodal detection models for space-based remote sensing mainly operate on single-frame images from visible light, Synthetic Aperture Radar (SAR), and infrared modalities. Although these models achieve acceptable performance in conventional detection tasks, they largely neglect the critical role of infrared video sequences in improving weak and small target detection. Temporal information in sequential infrared data provides discriminative cues for separating dynamic targets from complex clutter, which cannot be captured by single-frame detectors. To address this limitation, this study proposes UMM-Det, a unified detection model designed for infrared sequences. The proposed model extends existing space-based multimodal frameworks to sequential data and demonstrates that exploiting temporal dynamics is essential for next-generation high-precision space-based sensing systems. Methods UMM-Det is developed based on the unified multimodal detection framework SM3Det and introduces three key innovations. First, the ConvNeXt backbone is replaced with InternImage, a state-of-the-art architecture with dynamic sampling and large receptive field modeling. This replacement improves feature extraction robustness under multi-scale variations and low-contrast conditions that are typical of weak and small targets. Second, a spatiotemporal visual prompting module is designed for the infrared branch. This module generates high-contrast motion features using a refined frame-difference enhancement strategy. The resulting temporal priors guide the backbone to focus on dynamic target regions, thereby reducing interference from static background clutter. 
Third, to address the imbalance between positive and negative samples during training, Probabilistic Anchor Assignment (PAA) is incorporated into the infrared detection head. This strategy improves anchor selection reliability and enhances small target detection under highly skewed data distributions. The overall pipeline is shown in Fig. 1, and the structure of the spatiotemporal visual prompting module is illustrated in Fig. 2. Results and Discussions Extensive experiments are conducted on three public benchmarks: SatVideoIRSTD for infrared sequence detection, SARDet-50K for SAR target detection, and DOTA for visible light remote sensing detection. The results in Table 2 show that UMM-Det consistently outperforms the baseline SM3Det across all modalities while significantly improving efficiency. For infrared sequence small target detection, UMM-Det improves detection accuracy by 2.54% compared with SM3Det, confirming the effectiveness of temporal priors. In SAR target detection, the model achieves a 2.40% improvement in mAP@0.5:0.95. In visible light detection, an improvement of 1.77% is observed. These results demonstrate the strong generalization capability of the proposed framework across heterogeneous modalities. In addition, UMM-Det reduces the number of parameters by more than 50% relative to SM3Det, which supports efficient and lightweight deployment in space-based systems. Qualitative results in Fig. 3 show that UMM-Det detects low-contrast and dynamic weak targets that are missed by the baseline model. The analysis highlights three main findings. First, the spatiotemporal visual prompting strategy effectively converts frame-to-frame variations into salient motion-aware cues, which are critical for distinguishing small dynamic targets from clutter in complex infrared scenes. Second, the use of InternImage substantially strengthens multi-scale representation capability, improving robustness to variations in target size and contrast. 
Third, PAA alleviates training imbalance, leading to more stable optimization and higher detection reliability. Together, these components produce a synergistic effect, resulting in superior performance on both sequential infrared data and static SAR and visible light imagery. Conclusions This study proposes UMM-Det, a space-based multimodal detection model that explicitly integrates infrared sequence information into a unified detection framework. By adopting InternImage for feature extraction, a spatiotemporal visual prompting module for motion-aware enhancement, and PAA for balanced training, UMM-Det achieves notable improvements in detection accuracy while reducing the number of parameters by more than 50%. Experimental results on SatVideoIRSTD, SARDet-50K, and DOTA demonstrate state-of-the-art performance across infrared, SAR, and visible light modalities, with accuracy gains of 2.54%, 2.40%, and 1.77%, respectively. The proposed framework provides a practical solution for future high-performance space-based situational awareness systems, where accuracy, efficiency, and lightweight design are all required. Future work may extend this framework to multi-satellite cooperative sensing and real-time onboard deployment.
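The idea of turning frame-to-frame variation into a motion prior can be shown with the plain textbook frame difference. The abstract describes a refined frame-difference enhancement strategy whose details are not given here, so the sketch below is only the basic version: an absolute difference between two consecutive frames, scaled to the range 0 to 1. The function name `frame_difference_prior` is hypothetical.

```python
def frame_difference_prior(prev_frame, cur_frame):
    """Absolute difference of two consecutive frames, scaled to [0, 1].

    Frames are equal-sized 2-D lists of pixel intensities. Static background
    yields values near 0 and moving targets yield values near 1.
    """
    diff = [[abs(c - p) for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(prev_frame, cur_frame)]
    peak = max(max(row) for row in diff)
    if peak == 0:
        # No change between frames: return an all-zero prior.
        return [[0.0 for _ in row] for row in diff]
    return [[d / peak for d in row] for row in diff]

# A single pixel brightens between frames, so only that pixel is highlighted.
previous = [[10, 10], [10, 10]]
current = [[10, 10], [10, 50]]
print(frame_difference_prior(previous, current))
```

A map like this is what allows a detector to attend to regions that change over time, which a single-frame model cannot see.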

PATC: Prototype Alignment and Topology-Consistent Pseudo-Supervision for Multimodal Semi-Supervised Semantic Segmentation of Remote Sensing Images

HAN Wenqi, JIANG Wen, GENG Jie, BAO Yanchen

2025, 47(12): 4714-4727. doi: 10.11999/JEIT251115

[Abstract](116) [FullText HTML](55) [PDF 7086KB](13)

Abstract:
Objective The high annotation cost of remote sensing data and the heterogeneity between optical and SAR modalities limit the performance and scalability of semantic segmentation systems. This study examines a practical semi-supervised setting where only a small set of paired optical–SAR samples is labeled, whereas numerous single-modality SAR images remain unlabeled. The objective is to design a semi-supervised multimodal framework capable of learning discriminative and topology-consistent fused representations under sparse labels by aligning cross-modal semantics and preserving structural coherence through pseudo-supervision. The proposed Prototype Alignment and Topology Consistent (PATC) method aims to achieve robust land-cover segmentation on challenging multimodal datasets, improving region-level accuracy and connectivity-aware structure quality. Methods PATC adopts a teacher–student framework that exploits limited labeled optical–SAR pairs and abundant unlabeled SAR data. A shared semantic prototype space is first constructed to reduce modality gaps, where class prototypes are updated with a momentum mechanism for stability. A prototype-level contrastive alignment strategy enhances intra-class compactness and inter-class separability, guiding optical and SAR features of the same category to cluster around unified prototypes and improving cross-modal semantic consistency. To preserve structural integrity, a topology-consistent pseudo-supervision mechanism is incorporated. Inspired by persistent homology, a topology-aware loss constrains the teacher-generated pseudo-labels by penalizing errors such as incorrect formation or removal of connected components and holes. This structural constraint complements pixel-wise losses by maintaining boundary continuity and fine structures (e.g., roads and rivers), ensuring that pseudo-supervised learning remains geometrically and topologically coherent. 
Results and Discussions Experiments show that PATC reduces cross-modal semantic misalignment and topology degradation. By regularizing pseudo-labels with a topology-consistent loss derived from persistent homology, the method preserves connectivity and boundary integrity, especially for thin or fragmented structures. Evaluations on the WHU-OPT-SAR and Suzhou datasets demonstrate consistent improvements over state-of-the-art fully supervised and semi-supervised baselines under 1/16, 1/8, and 1/4 label regimes (Fig. 4, Fig. 5, Fig. 6; Table 3, Table 4). Ablation studies confirm the complementary roles of prototype alignment and topology regularization (Table 5). The findings indicate that unlabeled SAR data provides structural priors that, when used through topology-aware consistency and prototype-level alignment, substantially enhance multimodal fusion under sparse annotation. Conclusions This study proposes PATC, a multimodal semi-supervised semantic segmentation method that addresses limited annotations, modality misalignment, and weak generalization. PATC constructs multimodal semantic prototypes in a shared feature subspace and applies prototype-level contrastive learning to improve cross-modal consistency and feature discriminability. A topology-consistent loss based on persistent homology further regularizes the student network, improving the connectivity and structural stability of segmentation results. By incorporating structural priors from unlabeled SAR data within a teacher-student framework with Exponential Moving Average (EMA) updates, PATC achieves robust multimodal feature fusion and accurate segmentation under scarce labels. Future work will extend topology-based pseudo-supervision to broader multimodal configurations and integrate active learning to refine pseudo-label quality.
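The momentum update of class prototypes described in the Methods can be sketched in a few lines. The sketch assumes that a prototype is the running mean of all features of one class, pooled across optical and SAR samples, and that the update is a standard exponential moving average; `class_mean`, `update_prototype`, and the default momentum value are hypothetical choices for illustration.

```python
def class_mean(features, labels, cls):
    """Mean feature vector of all samples labeled cls, or None if there are none."""
    members = [f for f, y in zip(features, labels) if y == cls]
    if not members:
        return None
    return [sum(column) / len(members) for column in zip(*members)]

def update_prototype(prototype, batch_mean, momentum=0.9):
    """Momentum update: keep most of the old prototype, blend in the batch mean."""
    return [momentum * p + (1.0 - momentum) * m
            for p, m in zip(prototype, batch_mean)]

# Features from both modalities share one prototype per class.
features = [[1.0, 1.0], [3.0, 3.0], [10.0, 0.0]]
labels = [0, 0, 1]
mean_class0 = class_mean(features, labels, 0)
print(update_prototype([0.0, 0.0], mean_class0, momentum=0.5))
```

A momentum close to 1 makes the prototype change slowly between batches, which is the stability property the abstract refers to.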

Enhanced Super-Resolution-based Dual-Path Short-Term Dense Concatenate Metric Change Detection Network for Heterogeneous Remote Sensing Images

LI Xi, ZENG Huaien, WEI Pengcheng

2025, 47(12): 4728-4741. doi: 10.11999/JEIT250328

[Abstract](91) [FullText HTML](54) [PDF 8660KB](12)

Abstract:
Objective In sudden-onset natural disasters such as landslides and floods, homologous pre-event and post-event remote sensing images are often unavailable in a timely manner, which restricts accurate assessment of disaster-induced changes and subsequent disaster relief planning. Optical heterogeneous remote sensing images differ in sensor type, imaging angle, imaging altitude, and acquisition time. These differences lead to challenges in cross time-space-spectrum change detection, particularly due to spatial resolution inconsistency, spectral discrepancies, and the complexity and diversity of change types for identical ground objects. To address these issues, an Enhanced Super-Resolution-Based Dual-Path Short-Term Dense Concatenate Metric Change Detection Network (ESR-DSMNet) is proposed to achieve accurate and efficient change detection in optical heterogeneous remote sensing images. Methods The ESR-DSMNet consists of an Enhanced Super-Resolution-Based Heterogeneous Remote Sensing Image Quality Optimization Network (ESRNet) and a Dual-Path Short-Term Dense Concatenate Metric Change Detection Network (DSMNet). ESRNet first establishes mapping relationships between remote sensing images with different spatial resolutions using an enhanced super-resolution network. Based on this mapping, low-resolution images are reconstructed to enhance high-frequency edge information and fine texture details, thereby unifying the spatial resolution of heterogeneous remote sensing images at the image level. DSMNet comprises a semantic branch, a spatial-detail branch, a dual-branch feature fusion module, and a metric module based on a batch-balanced contrast loss function. This architecture addresses spectral discrepancies at the feature level and enables accurate and efficient change detection in heterogeneous remote sensing images. 
Three loss functions are used to optimize the proposed network, which is evaluated and compared with twelve deep learning-based change detection benchmark methods on four datasets, including homologous and heterogeneous remote sensing image datasets. Results and Discussions Comparative analysis on the SYSU dataset (Table 2) shows that DSMNet outperforms the other twelve change detection methods, achieving the highest recall and F1 values of 82.98% and 79.69%, respectively. The method exhibits strong internal consistency for large-area objects and the best visual performance (Fig. 5). On the CLCD dataset (Table 2), DSMNet ranks first in accuracy among the thirteen methods, with recall and F1 values of 73.98% and 71.01%, respectively, and demonstrates superior performance in detecting small-object changes (Fig. 5). On the heterogeneous remote sensing image dataset WXCD (Table 3), ESR-DSMNet achieves the highest F1 value of 95.87% compared with the other methods, with more consistent internal regions and finer building edges (Fig. 6). On the heterogeneous remote sensing image dataset SACD (Table 3), ESR-DSMNet attains the highest recall and F1 values of 92.63% and 90.55%, respectively, and produces refined edges in both dense and sparse building change detection scenarios (Fig. 6). Compared with low-resolution images, the reconstructed images present sharper edges without distortion, which improves change detection accuracy (Fig. 6). Comparisons of reconstructed image quality using different super-resolution methods (Table 4 and Fig. 7), ablation experiments on the DSMNet core modules (Table 5 and Fig. 8), and model efficiency evaluations (Table 6 and Fig. 9) further verify the effectiveness and generalization performance of the proposed method. Conclusions The ESR-DSMNet is proposed to address spatial resolution inconsistency, spectral discrepancies, and the complexity and diversity of change types in heterogeneous remote sensing image change detection. 
The ESRNet unifies spatial resolution at the image level, whereas the DSMNet mitigates spectral differences at the feature level and improves detection accuracy and efficiency. The proposed network is optimized using three loss functions and validated on two homologous and two heterogeneous remote sensing image datasets. Experimental results demonstrate that ESR-DSMNet achieves superior generalization performance and higher accuracy and efficiency than twelve advanced deep learning-based remote sensing image change detection methods. Additional experiments on reconstructed image quality, DSMNet module ablation, and model efficiency comparisons further confirm the effectiveness of the proposed approach.

Remote Sensing Data Intelligent Interpretation Task Scheduling Algorithm Based on Heterogeneous Platform

HAO Lijiang, TIAN Luyun, SUN Peng, CHEN Jian, LIU Pengying, HE Guangjun, LOU Shuqin

2025, 47(12): 4742-4753. doi: 10.11999/JEIT251072

[Abstract](148) [FullText HTML](63) [PDF 5639KB](19)

Abstract:
Objective Remote sensing data intelligent interpretation tasks executed on heterogeneous platforms exhibit diverse task types, heterogeneous resources, sensitivity to real-time environmental disturbances, and inter-task resource contention. These characteristics often lead to load imbalance and reduced resource utilization across platforms. Therefore, adaptive and efficient scheduling of complex multi-task workloads in resource-heterogeneous environments remains a central challenge in heterogeneous platform task scheduling. Methods A Heterogeneous Remote Sensing-Intelligent Task Scheduling (HRS-ITS) algorithm is proposed. The CP-SAT optimizer is enhanced by incorporating four score factors (data affinity, load balancing, makespan prediction, and cross-device transmission efficiency) as optimization objectives to generate an initial task-resource mapping. An adaptive resource-scaling-based Dueling Double Deep Q-Network (D3QN) model is then constructed to optimize task execution sequences for makespan reduction. Resource allocation is dynamically adjusted to eliminate idle time during task queuing, enabling dynamic resource perception and configuration optimization. Results and Discussions By integrating static optimization with dynamic adaptation, the HRS-ITS algorithm improves scheduling efficiency and resource utilization on heterogeneous platforms, providing an effective solution for complex remote sensing data intelligent interpretation tasks. Conclusions The proposed framework combines global optimization with dynamic adaptation to achieve computationally efficient real-time remote sensing processing. It provides a basis for extension to more complex task dependencies and larger-scale clusters.

A Landmark Matching Method Considering Gray-Gradient Dual-Channel Features and Deformation Parameter Optimization

XU Changding, LIU Shijie, XIAO Changjiang

2025, 47(12): 4754-4762. doi: 10.11999/JEIT250953

[Abstract](98) [FullText HTML](51) [PDF 5716KB](5)

Abstract:
Objective High-precision optical autonomous navigation is a key technology for deep-space exploration and planetary landing missions. During the descent phase of a lunar lander, communication delays and accumulated errors in the Inertial Navigation System (INS) lead to significant positioning deviations, which pose serious risks to safe landing. Optical images acquired by the lander are matched with pre-stored lunar landmark databases to establish correspondences between image coordinates and three-dimensional coordinates of lunar surface features, thereby enabling precise position estimation. This process is challenged by dynamic illumination variation on the lunar surface, noise in prior pose information, and limited onboard computational resources. Traditional template matching methods exhibit high computational cost and sensitivity to rotation and scale variation. Keypoint-based methods, such as Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), suffer from uneven keypoint distribution and sensitivity to illumination variation, which results in reduced robustness. Deep learning-based approaches, including SuperPoint, SuperGlue, and LF-Net, improve feature detection accuracy but require substantial computational resources, which restricts real-time onboard deployment. To address these limitations, a landmark matching algorithm is proposed that integrates gray-gradient dual-channel features with deformation parameter optimization, enabling high-precision and real-time matching for lunar optical autonomous navigation. Methods Dual-channel image features are constructed by combining gray-level intensity and gradient magnitude representations. Gradient features are computed using Sobel operators in the horizontal and vertical directions, and the gradient magnitude is calculated as the Euclidean norm of the two components. 
To reduce the effect of local brightness variation and ensure inter-region comparability, zero-mean normalization is applied independently to each feature channel. An adaptive weighting strategy is employed, in which weights are assigned according to local gradient saliency, and a bias term is introduced to retain weak texture information, thereby improving robustness under noisy conditions. Landmark matching is formulated as a nonlinear least-squares optimization problem. A deformation parameter vector is defined, which includes incremental rotation, scale, and translation relative to the prior pose. The objective function minimizes the weighted sum of squared differences between dual-channel landmark features and image features, with Tikhonov regularization applied to constrain parameter magnitude and improve numerical stability. The Levenberg-Marquardt (LM) algorithm is adopted to iteratively estimate the optimal deformation parameters. Its adaptive damping strategy enables switching between gradient descent and Gauss-Newton updates, ensuring stable convergence under large prior pose errors. Iteration terminates when the error norm falls below a predefined threshold or when the maximum iteration number is reached, yielding the optimal landmark transformation parameters. Results and Discussions Experiments are conducted using simulated lunar landing images generated from 60 m-resolution SLDEM (Digital Elevation Model Coregistered with SELENE Data) data, with high-fidelity illumination rendering applied to ensure realistic lighting conditions (Fig. 2). To evaluate matching performance under different scenarios, 143 landmarks are synthesized with systematically controlled perturbations in rotation, scale, and translation. Four representative methods are selected for comparison, including convolution-accelerated Normalized Cross-Correlation (NCC), SURF-based feature matching with image enhancement, globally and locally optimized NCC, and the proposed algorithm (Fig. 
4). The results indicate clear performance differences among the methods. Convolution-accelerated NCC achieves sub-second runtime and demonstrates high computational efficiency, although its accuracy degrades under gray-level variation and geometric deformation, with mean absolute errors of 2.41 px along the x-axis and 3.37 px along the y-axis, and a success rate of 89.51% (Table 1). SURF-based matching achieves sub-pixel accuracy, with mean absolute errors of 0.56 px along the x-axis and 0.54 px along the y-axis, although its success rate is limited to 48.95% and its runtime exceeds one second, which restricts onboard applicability. The globally and locally optimized NCC method exhibits the lowest accuracy, with errors of 4.54 px along the x-axis and 4.92 px along the y-axis, and the longest runtime of 4.41 s, despite achieving a 100% success rate. In contrast, the proposed algorithm consistently achieves sub-pixel accuracy comparable to SURF, maintains a 100% success rate, and sustains a stable runtime of approximately 0.5 s across all test cases. Its robustness to landmark deformation and illumination variation demonstrates suitability for complex operational conditions. Overall, the results show that the proposed algorithm achieves a favorable balance among accuracy, robustness, and computational efficiency. Conclusions A landmark matching algorithm is presented that integrates gray-gradient dual-channel features with deformation parameter optimization. Gray-level intensity and gradient magnitude information from both landmark templates and lander images are jointly exploited to construct a dual-channel matching model that minimizes feature differences. Deformation parameters, including rotation, scale, and translation, are iteratively optimized using the LM algorithm, enabling rapid estimation of the optimal landmark position in the lander image. 
Experimental results show stable convergence within sub-second runtime, with an average matching error of 1.03 pixels under disturbances in attitude, scale, and position. The proposed method outperforms single-channel gray-level cross-correlation and SURF-based matching approaches in accuracy, robustness, and efficiency. These results provide practical support for the design and implementation of future autonomous lunar optical navigation systems.
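
The dual-channel feature construction described in the Methods above can be illustrated with a minimal pure-Python sketch. It assumes a small grayscale image stored as a list of rows, replicate-border handling at the image edges, and mean subtraction without variance scaling; the function names are hypothetical and are not taken from the paper. The adaptive weighting and the LM optimization are not reproduced here.

```python
import math

# 3x3 Sobel kernels for the horizontal (x) and vertical (y) directions.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient_magnitude(img):
    """Return the Euclidean norm of the Sobel x/y responses at each pixel.

    Border pixels are handled by clamping indices (replicate border),
    which is an assumption made for this sketch.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    v = img[rr][cc]
                    gx += SOBEL_X[dr + 1][dc + 1] * v
                    gy += SOBEL_Y[dr + 1][dc + 1] * v
            out[r][c] = math.hypot(gx, gy)
    return out


def zero_mean(channel):
    """Subtract the channel mean so that the channel sums to zero."""
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    return [[v - mean for v in row] for row in channel]


def dual_channel_features(img):
    """Return (gray channel, gradient-magnitude channel), each zero-mean."""
    return zero_mean(img), zero_mean(gradient_magnitude(img))
```

Normalizing each channel independently keeps a global brightness offset in the gray channel from leaking into the comparison, which is the motivation stated in the abstract.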

DetDiffRS: A Detail-Enhanced Diffusion Model for Remote Sensing Image Super-Resolution

SONG Miao, CHEN Zhiqiang, WANG Peisong, XING Xiangwei, HUANG Liwei, CHENG Jian

2025, 47(12): 4763-4778. doi: 10.11999/JEIT250995

[Abstract](148) [FullText HTML](88) [PDF 7968KB](31)

Abstract:
Objective This study aims to enhance the reconstruction of fine structural details in High-Resolution (HR) Remote Sensing image Super-Resolution (RSSR) by leveraging Diffusion Models (DM). Although diffusion-based approaches achieve strong performance in natural image restoration, their direct application to remote sensing imagery remains suboptimal because of the pronounced imbalance between extensive low-frequency homogeneous regions, such as water bodies and farmland, and localized high-frequency regions with complex structures, such as buildings, ports, and aircraft. This imbalance leads to insufficient learning of critical high-frequency details, resulting in reconstructions that appear globally smooth but lack sharpness and realism. To address this limitation, DetDiffRS, a detail-enhanced diffusion-based framework, is proposed to explicitly increase sensitivity to high-frequency information during data sampling and optimization, thereby improving perceptual quality and structural fidelity. Methods The DetDiffRS framework introduces improvements at both the data input and loss-function levels to mitigate the high-low frequency imbalance in remote sensing imagery. First, a Multi-Scale Patch Sampling (MSPS) strategy is proposed to increase the probability of selecting patches containing high-frequency structures during training. This is achieved by constructing a multi-scale patch pool and applying weighted sampling to prioritize structurally complex regions. Second, a composite perceptual loss is designed to provide supervision beyond conventional denoising objectives. This loss integrates a High-Dimensional Perceptual Loss (HDPL) to enforce structural consistency in deep feature space and a High-Frequency-Aware Loss (HFAL) to constrain high-frequency components in the frequency domain. 
The combination of MSPS and the composite perceptual loss enables the DM to capture and reconstruct fine details more effectively, improving both objective quality metrics and visual realism. Results and Discussions Extensive experiments are conducted on three publicly available remote sensing datasets, AID, DOTA, and DIOR, and comparisons are performed against representative state-of-the-art super-resolution methods, including CNN-based approaches (EDSR and RCAN), Transformer-based approaches (HAT-L and TTST), GAN-based approaches (MSRGAN, ESRGAN, and SPSR), and diffusion-based approaches (SR3 and IRSDE). Quantitative evaluation using Fréchet Inception Distance (FID) on the AID dataset shows that DetDiffRS achieves the best performance in 21 of 30 scene categories, with an average FID of 48.37, outperforming the second-best method by 1.14. The improvements are most evident in texture-rich and structurally complex categories such as Dense Residential, Meadow, and River, where FID reductions exceed 3.0 relative to competing diffusion-based methods (Table 1). Although PSNR-oriented methods such as RCAN achieve the highest PSNR and SSIM values in some cases, they generate overly smooth reconstructions with limited fine detail. In contrast, DetDiffRS, supported by HDPL and HFAL, achieves a balanced improvement in objective metrics and perceptual quality, improving PSNR by 1.1789 dB over SR3 on AID and SSIM by 0.0644 on DOTA (Table 2). Visual comparisons further indicate that DetDiffRS consistently produces sharper edges, clearer structures, and more realistic textures, reducing over-smoothing in PSNR-focused methods and artifacts commonly observed in GAN-based approaches (Fig. 5 and Fig. 6). Conclusions This study presents DetDiffRS, a detail-enhanced diffusion-based super-resolution framework tailored to the frequency distribution characteristics of remote sensing imagery. 
Through integration of the MSPS strategy and a composite perceptual loss that combines HDPL and HFAL, the proposed method addresses the underrepresentation of high-frequency regions during training and achieves substantial improvements in detail preservation and perceptual fidelity. Experimental results across multiple datasets and scene types demonstrate that DetDiffRS outperforms existing CNN-, Transformer-, GAN-, and diffusion-based methods in FID while maintaining a competitive balance between PSNR, SSIM, and visual realism. These results indicate that DetDiffRS provides a robust and generalizable solution for high-quality RSSR in applications requiring structural accuracy and fine detail reconstruction.
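
The idea behind the MSPS strategy, a multi-scale patch pool with weighted sampling that favors structurally complex regions, can be sketched as follows. This is only an illustration under stated assumptions: the high-frequency score (mean absolute difference between adjacent pixels), the non-overlapping tiling, the `floor` term that keeps smooth patches selectable, and all function names are hypothetical choices, not the paper's definitions.

```python
import random


def high_freq_score(patch):
    """Mean absolute difference between adjacent pixels (a crude detail proxy)."""
    h, w = len(patch), len(patch[0])
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                total += abs(patch[r][c + 1] - patch[r][c])
                count += 1
            if r + 1 < h:
                total += abs(patch[r + 1][c] - patch[r][c])
                count += 1
    return total / count if count else 0.0


def extract_patches(img, size):
    """Cut the image into non-overlapping size x size tiles."""
    h, w = len(img), len(img[0])
    return [
        [row[c:c + size] for row in img[r:r + size]]
        for r in range(0, h - size + 1, size)
        for c in range(0, w - size + 1, size)
    ]


def build_patch_pool(img, sizes):
    """Collect tiles at several scales into one pool."""
    pool = []
    for size in sizes:
        pool.extend(extract_patches(img, size))
    return pool


def sample_patches(pool, k, floor=0.05, seed=None):
    """Draw k patches with probability proportional to detail score + floor."""
    weights = [high_freq_score(p) + floor for p in pool]
    rng = random.Random(seed)
    return rng.choices(pool, weights=weights, k=k)
```

With such a weighting, a patch covering a building edge is drawn more often than a patch of open water, while the floor term keeps homogeneous regions represented in training.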

Differentiable Sparse Mask Guided Infrared Small Target Fast Detection Network

SHENG Weidong, WU Shuanglin, XIAO Chao, LONG Yunli, LI Xiaobin, ZHANG Yiming

2025, 47(12): 4779-4789. doi: 10.11999/JEIT250989

[Abstract](220) [FullText HTML](111) [PDF 3520KB](32)

Abstract:
Objective Infrared small target detection has significant and irreplaceable application value in infrared guidance, environmental monitoring, and security surveillance. Its relevance is reflected in early warning, precision targeting, and pollution tracking, where timely and accurate detection is required. Core challenges arise from the inherent properties of infrared small targets: extremely small size (typically less than 9 × 9 pixels), limited spatial features due to long imaging distance, and a high likelihood of being submerged in complex and cluttered backgrounds such as clouds, sea glint, or urban thermal noise. These factors hinder reliable separation of true targets from background clutter using conventional approaches. Existing methods are generally divided into traditional model-based techniques and modern deep learning techniques. Traditional methods rely on manually designed background suppression operators, such as morphological filters (e.g., Top-Hat) or low-rank matrix recovery (e.g., IPI). Although interpretable in simple scenes, they adapt poorly to dynamic and complex environments, leading to high false alarm rates and limited robustness. Deep learning methods, particularly dense Convolutional Neural Networks (CNNs), achieve improved performance through data-driven feature learning. However, they do not sufficiently address the extreme imbalance between target and background pixels, with targets usually accounting for less than 1% of an image. Therefore, substantial computational redundancy occurs because large background regions contribute little to detection, which limits efficiency and real-time capability. Exploiting the sparsity of infrared small targets therefore provides a practical direction. By introducing a sparse mask generation module that uses target sparsity, potential target regions can be coarsely extracted while most redundant background areas are suppressed, followed by refinement in later stages. 
This study presents a solution that balances detection accuracy and computational efficiency for real-time applications. Methods An end-to-end infrared small target detection network guided by a differentiable sparse mask is proposed. First, an input infrared image is processed by convolution to obtain raw features. A differentiable sparse mask generation module then adopts two convolution branches to generate a probability map and a threshold map, and produces a binary mask through a differentiable binarization function to extract candidate target regions and suppress background redundancy. Next, a target region sampling module converts dense raw features into sparse features according to the binary mask. A sparse feature extraction module with a U-shaped structure, composed of encoders, decoders, and skip connections, applies Minkowski Engine sparse convolution to perform refined processing only on non-zero target regions, thereby reducing computation. Finally, a pyramid pooling module fuses multi-scale sparse features, which are fed into a target-background binary classifier to generate detection results. Results and Discussions Comprehensive experiments are conducted on two mainstream infrared small target datasets: NUAA-SIRST, which includes 427 real infrared images extracted from real videos, and NUDT-SIRST, a large-scale synthetic dataset containing 1327 diverse images. Comparisons are made with three representative traditional algorithms (e.g., Top-Hat, IPI) and six state-of-the-art deep learning methods (e.g., DNA-Net, ACM). The proposed method achieves competitive detection performance. On NUAA-SIRST, it attains 74.38% IoU, 100% P_d, and 7.98 × 10^–6 F_a. On NUDT-SIRST, it reaches 83.03% IoU, 97.67% P_d, and 9.81 × 10^–6 F_a, which is comparable to leading deep learning approaches. High efficiency is observed, with only 0.35 M parameters, 11.10 GFLOPs, and 215.06 fps. 
The frame rate is 4.8 times that of DNA-Net, indicating a substantial reduction in computational redundancy. Ablation experiments (Fig. 6) confirm that the differentiable sparse mask module effectively suppresses most background regions while retaining target areas. Visual results (Fig. 5) show fewer false alarms than traditional methods such as PSTNN, as the coarse-to-fine strategy reduces background interference and supports a balance between accuracy and efficiency. Conclusions A fast infrared small target detection network guided by a differentiable sparse mask is proposed to address the severe computational redundancy of dense computation methods, which originates from the extreme imbalance between target and background pixels (target proportion is usually smaller than 1% of the whole image). Candidate target regions are adaptively extracted and background redundancy is filtered through a differentiable sparse mask generation module. A sparse feature extraction module based on Minkowski Engine sparse convolution further reduces computation, forming an end-to-end coarse-to-fine detection framework. Experiments on the NUAA-SIRST and NUDT-SIRST datasets show that the proposed method achieves detection performance comparable to existing deep learning methods while significantly optimizing computational efficiency. The method supports real-time requirements in scenarios such as remote sensing detection, infrared guidance, and environmental monitoring, and provides a practical reference for lightweight development in infrared small target detection.
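
The mask generation and target region sampling steps can be sketched in a few lines. The abstract does not give the binarization formula, so this sketch assumes the commonly used sigmoid form B = 1 / (1 + exp(-k (P - T))) with a hypothetical steepness k = 50; the function names and the 0.5 cutoff are likewise assumptions, and plain Python lists stand in for the sparse tensors that the paper handles with Minkowski Engine.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Soft binary mask from a probability map and a threshold map.

    Each output is a steep sigmoid of (P - T), so it approaches 0 or 1
    while remaining differentiable with respect to both maps.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]


def sample_sparse(features, soft_mask, cutoff=0.5):
    """Keep only features at positions whose mask value exceeds the cutoff.

    Returns the kept coordinates and the matching feature entries, which
    is the dense-to-sparse conversion in its simplest form.
    """
    coords = [
        (r, c)
        for r, row in enumerate(soft_mask)
        for c, m in enumerate(row)
        if m > cutoff
    ]
    return coords, [features[r][c] for r, c in coords]
```

Because targets typically occupy less than 1% of the image, the coordinate list returned by `sample_sparse` is short, and later stages only need to process those positions.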

A Focused Attention and Feature Compact Fusion Transformer for Semantic Segmentation of Urban Remote Sensing Images

ZHOU Guoyu, ZHANG Jing, YAN Yi, ZHUO Li

2025, 47(12): 4790-4800. doi: 10.11999/JEIT250812

[Abstract](181) [FullText HTML](65) [PDF 4770KB](26)

Abstract:
Objective Driven by the growing integration of remote sensing data acquisition and intelligent interpretation technologies within aerospace information intelligent processing, semantic segmentation of Urban Remote Sensing Image (URSI) has emerged as a key research area connecting aerospace information and urban computing. However, compared to general Remote Sensing Image (RSI), URSI exhibits a high diversity and complexity of geo-objects, characterized by fine-grained intra-class variations, inter-class similarities that cause confusion, as well as blurred and irregular object boundaries. These factors present difficulties for fine-grained segmentation. Although Transformer-based methods have succeeded in RSI semantic segmentation, applying them to URSI requires balancing the capture of detailed features and boundaries against the computational cost of self-attention. To address these issues, this paper introduces a focused attention mechanism in the encoder to efficiently capture discriminative intra- and inter-class features, while performing compact edge feature fusion in the decoder. Methods This paper proposes a Focused attention and Feature compact Fusion Transformer (F³Former). The encoder incorporates a dedicated Feature-Focused Encoding Block (FFEB). By leveraging the focused attention mechanism, it adjusts the directions of Query and Key features such that the features of the same class are pulled closer while those of different classes are repelled, thereby enhancing intra-class consistency and inter-class separability during feature representation. This process yields a compact and highly discriminative attention distribution, which amplifies semantically critical features while curbing computational overhead. To complement this design, the decoder employs a Compact Feature Fusion Module (CFFM), where Depth-Wise Convolution (DW Conv) is utilized to minimize redundant cross-channel computations. 
This design strengthens the discriminative power of edge representations, improves inference efficiency and deployment adaptability, and maintains segmentation accuracy. Results and Discussions F³Former demonstrates favorable performance on several benchmark datasets, alongside a lower computational complexity. On the Potsdam and Vaihingen benchmarks, it attained mIoU scores of 88.33% and 81.32%, respectively, ranking second only to TEFormer with marginal differences in accuracy (Table 1). Compared to other lightweight models including CMTFNet, ESST, and FSegNet, F³Former consistently delivered superior results in mIoU, mF1, and PA, demonstrating the efficacy of the proposed FFEB and CFFM modules in capturing complex URSI features. On the LoveDA dataset, it reached 53.16% mIoU and outperformed D²SFormer in several critical categories (Fig. 4). Moreover, F³Former strikes a favorable balance between accuracy and efficiency, reducing parameter count and FLOPs by over 30% compared to TEFormer, with only negligible degradation in accuracy (Table 2). Qualitative results further indicate clearer boundary delineation and improved recognition of small or occluded objects relative to other lightweight approaches (Fig. 5 and Fig. 6). Ablation studies validate the critical role of both the Focused Attention (FA) mechanism and the Compact Feature Fusion Head (CFFHead) in achieving accuracy and efficiency gains (Tables 3 and 4). Conclusions This work tackles key challenges in URSI semantic segmentation, including intra-class variability, inter-class ambiguity, and complex boundaries, by proposing F³Former. In the encoder, the FFEB improves intra-class aggregation and inter-class discrimination through directional feature modeling. In the decoder, the CFFM employs DW Conv to minimize redundancy and enhance boundary representations. With linear complexity, F³Former attains higher accuracy and stronger representational capacity while remaining efficient and deployment-friendly. 
Extensive experiments across multiple URSI benchmarks confirm its superior performance, highlighting its practicality for large-scale URSI applications. However, compared to existing State-Of-The-Art (SOTA) lightweight methods, the computational efficiency of the FFEB still has room for improvement. Future work is directed towards replacing Softmax with a more efficient operator to accelerate attention computation, maintaining accuracy while advancing efficient URSI semantic segmentation. Additionally, as the decoder’s channel interaction mechanism remains relatively limited, we plan to incorporate lightweight attention or pointwise convolution designs to further strengthen feature fusion.

Knowledge-guided Few-shot Earth Surface Anomalies Detection

JI Hong, GAO Zhi, CHEN Boan, AO Wei, CAO Min, WANG Qiao

2025, 47(12): 4801-4812. doi: 10.11999/JEIT251000

[Abstract](132) [FullText HTML](60) [PDF 5747KB](30)

Abstract:
Objective Earth Surface Anomalies (ESAs), defined as sudden natural or human-generated disruptions on the Earth’s surface, present severe risks and widespread effects. Timely and accurate ESA detection is therefore essential for public security and sustainable development. Remote sensing offers an effective approach for this task. However, current deep learning models remain limited due to the scarcity of labeled data, the complexity of anomalous backgrounds, and distribution shifts across multi-source remote sensing imagery. To address these issues, this paper proposes a knowledge-guided few-shot learning method. Large language models generate abstract textual descriptions of normal and anomalous geospatial features. These descriptions are encoded and fused with visual prototypes to construct a cross-modal joint representation. The integrated representation improves prototype discriminability in few-shot settings and demonstrates that linguistic knowledge strengthens ESA detection. The findings suggest a feasible direction for reliable disaster monitoring when annotated data are limited. Methods The knowledge-guided few-shot learning method is constructed on a metric-based paradigm in which each episode contains support and query sets, and classification is achieved by comparing query features with class prototypes through distance-based similarity and cross-entropy optimization (Fig. 1). To supplement limited visual prototypes, class-level textual descriptions are generated with ChatGPT through carefully designed prompts, producing semantic sentences that characterize the appearance, attributes, and contextual relations of normal and anomalous categories (Figs. 2 and 3). These descriptions encode domain-specific properties such as anomaly extent, morphology, and environmental effect, which are otherwise difficult to capture when only a few visual samples are available. 
The sentences are encoded with a Contrastive Language-Image Pre-training (CLIP) text encoder, and task-adaptive soft prompts are introduced by generating tokens from support features and concatenating them with static embeddings to form adaptive word embeddings. Encoded sentence vectors are processed with a lightweight self-attention module to model dependencies across multiple descriptions and to obtain a coherent paragraph-level semantic representation (Fig. 4). The resulting semantic prototypes are fused with the visual prototypes through weighted addition to produce cross-modal prototypes that integrate visual grounding and linguistic abstraction. During training, query samples are compared with the cross-modal prototypes, and optimization is guided by two objectives: a classification loss that enforces accurate query–prototype alignment, and a prototype regularization loss that ensures semantic prototypes are discriminative and well separated. The entire method is implemented in an episodic training framework (Algorithm 1). Results and Discussions The proposed method is evaluated under both cross-domain and in-domain few-shot settings. In the cross-domain case, models are trained on NWPU45 or AID and tested on ESAD to assess ESA recognition. As shown in the comparisons (Table 2), traditional meta-learning methods such as MAML and Meta-SGD reach accuracies below 50%, whereas metric-based baselines such as ProtoNet and RelationNet demonstrate greater stability but remain limited. The proposed method reaches 61.99% on the NWPU45→ESAD and 59.79% on the AID→ESAD settings, outperforming ProtoNet by 4.72% and 2.67%, respectively. In the in-domain setting, where training and testing are conducted on the same dataset, the method reaches 76.94% on NWPU45 and 72.98% on AID, and consistently exceeds state-of-the-art baselines such as S2M2 and IDLN (Table 3). Ablation experiments further support the contribution of each component. 
Using only visual prototypes produces accuracies of 57.74% and 72.16%, and progressively incorporating simple class names, task-oriented templates, and ChatGPT-generated descriptions improves performance. The best accuracy is achieved by combining ChatGPT descriptions, learnable tokens, and an attention-based mechanism, reaching 61.99% and 76.94% (Table 4). Parameter sensitivity analysis shows that an appropriate weight for language features (α = 0.2) and the use of two learnable tokens yield optimal performance (Fig. 5). Conclusions This paper addresses ESA detection in remote sensing imagery through a knowledge-guided few-shot learning method. The approach uses large language models to generate abstract textual descriptions for anomaly categories and conventional remote sensing scenes, thereby constructing multimodal training and testing resources. These descriptions are encoded into semantic feature vectors with a pretrained text encoder. To extract task-specific knowledge, a dynamic token learning strategy is developed in which a small number of learnable parameters are guided by visual samples within few-shot tasks to generate adaptive semantic vectors. An attention-based semantic knowledge module models dependencies among language features and produces cross-modal semantic vectors for each class. By fusing these vectors with visual prototypes, the method forms joint multimodal representations used for query-prototype matching and network optimization. Experimental evaluations show that the method effectively leverages prior knowledge contained in pretrained models, compensates for limited visual data, and improves feature discriminability for anomaly recognition. Both cross-domain and in-domain results confirm consistent gains over competitive baselines, highlighting the potential of the approach for reliable application in real-world remote sensing anomaly detection scenarios.
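
The prototype fusion and query matching described above can be illustrated with a minimal sketch. The abstract states only that semantic and visual prototypes are combined by weighted addition with a language weight of α = 0.2; the convex form (1 - α)·visual + α·semantic, the squared Euclidean distance, and the function names are assumptions made here for illustration.

```python
def fuse_prototypes(visual, semantic, alpha=0.2):
    """Combine per-class visual and semantic prototypes by weighted addition.

    Both inputs map a class name to a feature vector of equal length.
    alpha is the weight given to the language (semantic) features.
    """
    return {
        name: [
            (1.0 - alpha) * v + alpha * s
            for v, s in zip(visual[name], semantic[name])
        ]
        for name in visual
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest."""
    return min(prototypes, key=lambda name: squared_distance(query, prototypes[name]))
```

In an episodic setting, `visual` would hold the mean support feature of each class and `semantic` the encoded text description; a small α keeps the prototype anchored to the visual evidence while letting the text shift it.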

A Frequency-Aware and Spatially Constrained Network for Ship Instance Segmentation in SAR Images

ZHANG Boya, WANG Yong

2025, 47(12): 4813-4823. doi: 10.11999/JEIT250938

[Abstract](63) [FullText HTML](43) [PDF 6264KB](11)

Abstract:
Objective With the development of Synthetic Aperture Radar (SAR) imaging technology, ship instance segmentation in SAR images has become an important research direction in radar signal processing. Unlike traditional optical image segmentation tasks, SAR images reflect target backscatter intensity and usually contain objects with diverse scales and irregular spatial distributions, which poses significant challenges for ship instance segmentation. Although recent studies have achieved notable progress, existing networks do not fully exploit frequency features and spatial information of targets, resulting in classification and localization errors. To address this limitation, a frequency-aware and spatially constrained network is proposed to extract frequency features and spatial information from multiscale representations, thereby improving feature representation and instance segmentation accuracy in SAR images. Methods For input SAR images, a frequency-aware backbone network is first applied to extract frequency features at different scales. Features from the first four stages of the backbone network are then processed by a selective feature pyramid network to guide the model to focus on the most informative regions and to fuse multiscale features effectively. After enhanced multiscale features are obtained, a region proposal network is employed to generate candidate target proposals. These features and proposals are subsequently fed into a segmentation head with spatial information constraints to produce final instance segmentation results. The frequency-aware backbone network encodes multiscale features in the frequency domain, which strengthens feature extraction for ship targets. Based on image semantic information, the selective feature pyramid network enables effective attention to informative regions and integration of features across scales. 
In addition, a spatially constrained mask loss function is designed to update model parameters under constraints of centroid distance and directional deviation between predicted masks and ground-truth targets. Results and Discussions The effectiveness and robustness of the proposed network are validated on two public datasets, SSDD and HRSID. For the SSDD dataset, P, R, F₁, AP0.5, AP0.75, and AP0.5:0.95 metrics are used for evaluation. Quantitative and qualitative comparisons (Figures 6 and 7, Table 1) indicate that the proposed network improves feature extraction and feature integration for SAR images, which enables more accurate segmentation of ships with different scales in complex backgrounds. For the HRSID dataset, AP0.5, AP0.75, and AP0.5:0.95 are reported for quantitative comparison. The results (Table 3) demonstrate strong adaptability and generalization capability across different datasets and application scenarios in ship instance segmentation tasks. Additionally, ablation experiments (Table 2) confirm the contribution of each module of the proposed network to segmentation performance improvement in SAR images. Conclusions A frequency-aware and spatially constrained network for ship instance segmentation in SAR images is proposed. The frequency-aware backbone network enhances feature perception for SAR imagery, whereas the selective feature pyramid network guides attention toward informative regions and improves segmentation of ship targets at different scales. The segmentation head incorporates spatial information constraints into the mask loss function, which yields more accurate instance segmentation results. Experimental results on the SSDD and HRSID datasets show that the proposed method outperforms existing approaches and achieves improved effectiveness and generalization capability for ship instance segmentation in SAR images.
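The abstract above does not give the form of the spatially constrained mask loss. As a minimal sketch only, assuming the centroid distance is normalized by the image diagonal and the directional deviation is measured between principal axes obtained from second-order image moments, one possible penalty term is shown below. The function names, the weights `w_dist` and `w_dir`, and the normalization are hypothetical and are not taken from the paper.

```python
import numpy as np

def centroid_and_angle(mask):
    """Return (centroid_row, centroid_col, principal-axis angle in radians)."""
    rows, cols = np.nonzero(mask)
    r0, c0 = rows.mean(), cols.mean()
    # Central second-order moments determine the principal-axis direction.
    mu_rr = np.mean((rows - r0) ** 2)
    mu_cc = np.mean((cols - c0) ** 2)
    mu_rc = np.mean((rows - r0) * (cols - c0))
    angle = 0.5 * np.arctan2(2.0 * mu_rc, mu_cc - mu_rr)
    return r0, c0, angle

def spatial_penalty(pred_mask, gt_mask, w_dist=1.0, w_dir=1.0):
    """Hypothetical penalty: normalized centroid distance plus axis misalignment."""
    pr, pc, pa = centroid_and_angle(pred_mask)
    gr, gc, ga = centroid_and_angle(gt_mask)
    diagonal = np.hypot(*gt_mask.shape)
    distance = np.hypot(pr - gr, pc - gc) / diagonal
    # An axis has no head or tail, so sin^2 makes the term periodic in pi.
    direction = np.sin(pa - ga) ** 2
    return float(w_dist * distance + w_dir * direction)

# Example: a horizontal bar, the same bar shifted by 3 rows, and a vertical bar.
gt = np.zeros((20, 20), dtype=bool)
gt[8:12, 2:18] = True
shifted = np.zeros((20, 20), dtype=bool)
shifted[11:15, 2:18] = True
vertical = gt.T.copy()
```

In such a sketch the term would be added to an ordinary per-pixel mask loss; a trainable version would need differentiable soft masks rather than the binary masks used here.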

Research on Model-Driven Integrated Simulation Technology for Space-Based Support

REN Yuli, YOU Lingfei, CHANG Chuangye, GUO Zhiqi

2025, 47(12): 4824-4837. doi: 10.11999/JEIT251004

[Abstract](97) [FullText HTML](53) [PDF 5435KB](8)

Abstract:
Objective Space-based information support is a core component of modern operational systems. It acquires, transmits, and processes information through space-based platforms to provide full-process, round-the-clock information support for remote precision strikes. Therefore, digital simulation and verification of space-based information support systems have become key means for combat concept design, scheme demonstration, and rapid capability iteration. This study examines the application of Model-Based Systems Engineering (MBSE) to integrated simulation of space-based support operations. The objective is to address key challenges in information representation, system interoperability, and integrated simulation in complex combat systems. To overcome the limitations of traditional simulation approaches in cross-platform collaboration, dynamic extensibility, and efficient integration of functional logic with spatiotemporal information, a multi-perspective modeling and simulation method based on the Discrete EVent System specification (DEVS) is proposed. A hybrid integrated simulation framework is constructed. Methods The proposed framework enables abstract interconnection of weapon and equipment models, plug-and-play integration of simulation resources, precise global time synchronization, and high-performance real-time communication. These capabilities are achieved through four core modules: data integration management, heterogeneous software adapters, time management control, and publish-subscribe mechanisms. The framework supports interoperability and reusability of heterogeneous simulation software. On this basis, a joint simulation system is designed by integrating system architecture development and verification software with spatiotemporal simulation software for visualization and reasoning.
Message middleware supports bidirectional synchronous interaction between state machines and spatiotemporal models, enabling closed-loop verification from combat concepts to digital inference. The core contribution of this research is the removal of long-standing separation between discrete event logic describing combat functions, information flow, and state machines, typically modeled using Systems Modeling Language (SysML), and continuous physical scenes, such as spatiotemporal motion and sensor coverage, constructed on visualization and deduction platforms. Through real-time bidirectional data exchange enabled by message middleware, discrete command decisions drive continuous platform behavior, whereas dynamic changes in battlefield conditions trigger corresponding combat responses. Results and Discussions A complete closed-loop simulation of a “maritime island and reef reconnaissance support denial operation” scenario is conducted using the joint simulation system, producing effective verification results. The simulation reproduces the full process from space-based target detection to coordinated regional denial by multi-domain forces. First, a capability-activity-equipment analysis model for the combat mission is developed using the Unified Architecture Framework (UAF), generating equipment interaction relationships and corresponding state machines. In parallel, continuous construction of the combat scenario is implemented on the visualization and deduction platform. The entire deduction process is precisely synchronized with physical motion through the state machine model deployed in the system architecture development and verification platform. Each state transition, such as “target detection,” “strike initiation,” and “effect evaluation,” triggers corresponding spatiotemporal simulation activities. Platform states and environmental data fed back from the visualization and deduction platform then drive subsequent state machine evolution. 
Through joint simulation, the rationality and feasibility of the operational concept, in which multi-domain unmanned forces conduct reconnaissance, deterrence, and denial under space-based information support, are verified. The results provide an intuitive and high-confidence basis for decision-making in system scheme optimization. Conclusions This study investigates the application of model-driven technology to the design and validation of space-based support joint operation systems. A joint simulation framework enabling deep integration of functional logic and physical scenarios is constructed and validated. Unlike conventional simulation approaches that focus on static structures or isolated functions, the proposed framework couples SysML-based discrete event logic models with continuous spatiotemporal dynamic models through a distributed architecture consisting of one core platform and multiple component adapters. This approach resolves the long-standing separation between functional modeling and scene simulation. Discrete behaviors, such as command decision-making and state transitions, directly drive platform movement and interaction within realistic spatiotemporal environments. Conversely, dynamic battlefield changes provide real-time feedback that affects higher-level logical decisions, forming a bidirectional closed loop. The framework integrates the precision of functional logic with the intuitiveness of scene simulation and enables realistic reproduction of multi-domain collaborative operations in digital space. It provides effective support for system design, operational deduction, and high-confidence verification of space-based support joint operation systems.
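The bidirectional coupling described in the abstract above, in which state transitions trigger scene activities and scene feedback drives the next transition, can be illustrated with an in-process publish-subscribe loop. This is a sketch under stated assumptions: the class names, topic strings, and payload fields are hypothetical, the three state names are the examples quoted in the abstract, and time management and network middleware are omitted entirely.

```python
from collections import defaultdict

class MessageBus:
    """Minimal in-process stand-in for message middleware (hypothetical)."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._subscribers[topic]:
            handler(payload)

class MissionStateMachine:
    """Discrete-event side: advances when the scene reports completion."""
    ORDER = ["target detection", "strike initiation", "effect evaluation"]

    def __init__(self, bus):
        self.bus = bus
        self.index = 0
        self.log = []
        bus.subscribe("scene/feedback", self.on_feedback)

    def enter_current_state(self):
        state = self.ORDER[self.index]
        self.log.append(state)
        self.bus.publish("logic/state", state)

    def on_feedback(self, feedback):
        if feedback["done"] and self.index < len(self.ORDER) - 1:
            self.index += 1
            self.enter_current_state()

class SceneModel:
    """Continuous side: runs an activity per state and reports back."""
    def __init__(self, bus):
        self.bus = bus
        self.activities = []
        bus.subscribe("logic/state", self.on_state)

    def on_state(self, state):
        self.activities.append("simulate: " + state)
        self.bus.publish("scene/feedback", {"state": state, "done": True})

bus = MessageBus()
machine = MissionStateMachine(bus)
scene = SceneModel(bus)
machine.enter_current_state()
```

After the final call, `machine.log` holds the three states in order and `scene.activities` holds one activity per state, which is the closed loop in miniature.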

Residual Subspace Prototype Constraint for SAR Target Class-Incremental Recognition

XU Yanjie, SUN Hao, LIN Qinjie, JI Kefeng, KUANG Gangyao

2025, 47(12): 4838-4850. doi: 10.11999/JEIT251007

[Abstract](160) [FullText HTML](79) [PDF 4091KB](9)

Abstract:
Synthetic Aperture Radar (SAR) target recognition systems deployed in open environments frequently encounter continuously emerging categories. This paper proposes a SAR target class-incremental recognition method named Residual Subspace Prototype Constraint (RSPC). RSPC constructs lightweight, task-specific adapters to expand the feature subspace, enabling effective learning of new classes and alleviating catastrophic forgetting. First, self-supervised learning is used to pretrain the backbone network to extract generic feature representations from SAR data. During incremental learning, the backbone network is frozen, and residual adapters are trained to focus on changes in discriminative features. To address old-class prototype invalidation caused by feature space expansion, a structured constraint-based prototype completion mechanism is proposed to synthesize prototypes of old classes in the new subspace without replaying historical data. During inference, predictions are made based on the similarity between the input target and the integrated prototypes from all subspaces. Experiments on the MSTAR, SAMPLE, and SAR-ACD datasets validate the effectiveness of RSPC. Objective SAR target recognition in dynamic environments must learn new classes while preserving previously acquired knowledge. Rehearsal-based methods are often impractical because of data privacy and storage constraints in real-world applications. Moreover, conventional pretraining suffers from high interclass scattering similarity and ambiguous decision boundaries, which represents a challenge different from typical catastrophic forgetting. A rehearsal-free framework is proposed to model discriminative feature evolution and reconstruct old-class prototypes in expanded subspaces. This framework enables robust, efficient, and scalable SAR target recognition without rehearsal. 
Methods An RSPC framework is proposed for SAR target class-incremental recognition and is built on a pretrained Vision Transformer backbone. During the incremental phase, the backbone is frozen, and a lightweight residual adapter is trained for each new task to learn the residual feature difference between the current task and the historical average, thereby forming a task-specific discriminative subspace. To address prototype decay in expanded subspaces, a structured prototype completion mechanism is introduced. This mechanism synthesizes the prototype of a historical class in the current subspace by aggregating its observed prototypes from all prior subspaces in which it is learned, weighted by a confidence score derived from three geometric consistency metrics: norm ratio, angular similarity, and Euclidean distance between the historical class and all current new classes within each prior subspace. Optimization of the residual adapter is guided by a dual-constraint loss, including a prototype contrastive loss that enforces intraclass compactness and interclass separation, and a subspace orthogonality loss that maximizes the angular distance between the residual features of a sample across consecutive subspaces, thereby preventing feature reuse and promoting task-specific learning. Results and Discussions RSPC achieves the highest Average Incremental Accuracy (AIA) and the lowest Precision Drop (PD) among all rehearsal-free methods across all three datasets (Tables 4～6). On MSTAR, RSPC achieves an AIA of 95.23% (N=1) and 94.83% (N=2), outperforming the best baseline EASE by 0.58% and 0.38%, respectively, while reducing PD by 1.90% and 1.21%. On SAMPLE, RSPC achieves an AIA of 93.30% (N=1) and 93.23% (N=2), exceeding EASE by 1.15% and 2.31 percentage points with substantially lower PD. 
On the more challenging SAR-ACD dataset, RSPC achieves an AIA of 58.69% (N=1) and 60.35% (N=2), demonstrating superior performance over EASE and SimpleCIL and approaching the performance of rehearsal-based methods ILFL and HLFCC. The t-SNE visualizations (Figs. 2～4) show that RSPC produces more compact and well-separated class clusters than EASE and MEMO and provides improved interclass boundary discrimination compared with DualPrompt and APER_SSF. The ablation study (Tables 7～9) confirms that both the prototype contrastive loss and the subspace orthogonality loss are essential. Their joint use usually yields the highest AIA and the lowest PD across all datasets, demonstrating complementary effects on discriminability and feature disentanglement. Under low-data conditions (Fig. 5), RSPC maintains superior performance and achieves higher accuracy than EASE when only 20% of new-class training samples are available, indicating strong data efficiency. Conclusions A rehearsal-free incremental learning framework, RSPC, is presented for SAR target recognition to mitigate catastrophic forgetting caused by high interclass scattering similarity. RSPC employs a residual subspace mechanism to capture discriminative feature increments and a structured prototype completion strategy to reconstruct stable prototypes without historical data. Experiments on three benchmarks show that RSPC substantially outperforms existing rehearsal-free methods and rivals rehearsal-based approaches, establishing a state-of-the-art solution for scalable and privacy-preserving recognition. Robust performance in low-data regimes further supports its suitability for deployment in resource-constrained and privacy-sensitive scenarios.
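Two steps in the RSPC abstract above lend themselves to a short illustration: inference by similarity to prototypes integrated over all subspaces, and a loss that discourages residual features from consecutive subspaces from aligning. The sketch below assumes cosine similarity over concatenated subspace features and a squared-cosine orthogonality penalty; both choices, and all function names, are assumptions rather than the formulation used in the paper.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def predict(features_per_subspace, prototypes_per_subspace):
    """features: list of (D,) arrays; prototypes: list of (C, D) arrays."""
    feature = l2_normalize(np.concatenate(features_per_subspace))
    prototypes = l2_normalize(np.concatenate(prototypes_per_subspace, axis=1))
    # Cosine similarity between the sample and each integrated class prototype.
    return int(np.argmax(prototypes @ feature))

def orthogonality_loss(residual_prev, residual_curr):
    """Mean squared cosine between residuals of the same samples, shape (B, D)."""
    a = l2_normalize(residual_prev)
    b = l2_normalize(residual_curr)
    return float(np.mean(np.sum(a * b, axis=-1) ** 2))

# Two subspaces, two classes, two-dimensional features per subspace.
prototypes = [np.eye(2), np.eye(2)]
features = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
predicted_class = predict(features, prototypes)
```

The squared cosine reaches its minimum when the two residuals are orthogonal, which matches the name of the loss; the confidence-weighted prototype completion step is not reproduced here because its weighting formula is not given in the abstract.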

A Morphology-guided Decoupled Framework for Oriented SAR Ship Detection

WANG Zeyu, WANG Qingsong

2025, 47(12): 4851-4861. doi: 10.11999/JEIT250979

[Abstract](98) [FullText HTML](47) [PDF 4314KB](9)

Abstract:
Objective Synthetic Aperture Radar (SAR) is an indispensable remote sensing technology; however, accurate ship detection in SAR imagery remains challenging. Most deep learning-based detection approaches rely on Horizontal Bounding Box (HBB) annotations, which do not provide sufficient geometric information to estimate ship orientation and scale. Although Oriented Bounding Box (OBB) annotation contains such information, reliable OBB labeling for SAR imagery is costly and frequently inaccurate because of speckle noise and geometric distortions intrinsic to SAR imaging. Weakly supervised object detection provides a potential alternative, yet approaches designed for optical imagery exhibit limited generalization capability in the SAR domain. To address these limitations, a simulation-driven decoupled framework is proposed. The objective is to enable standard HBB-based detectors to produce accurate OBB predictions without structural modification by training a dedicated orientation estimation module using a fully supervised synthetic dataset that captures essential SAR ship morphology. Methods The proposed framework decomposes oriented ship detection into two sequential sub-tasks: coarse localization and fine-grained orientation estimation (Fig. 1). First, an axis-aligned localization module based on a standard HBB detector, such as YOLOX, is trained using available HBB annotations to identify candidate regions of interest. This stage exploits the high-recall capability of mature detection networks and outputs image patches that potentially contain ship targets. Second, to learn orientation information without real OBB annotations, a large-scale morphological simulation dataset composed of binary images is constructed. The dataset generation begins with simple binary rectangles of randomized aspect ratios and known ground-truth orientations. 
To approximate the appearance of binarized SAR ship targets, morphological operations, including edge-level and region-level erosion and dilation, are applied to introduce boundary ambiguity. Structured strong scattering cross noise is further injected to simulate SAR-specific artifacts. This process yields a synthetic dataset with precise orientation labels. Third, an orientation estimation module based on a lightweight ResNet-18 architecture is trained exclusively on the synthetic dataset. This module predicts object orientation and refines aspect ratio using only shape and contour information. During inference, candidate patches produced by the localization module are binarized and processed by the orientation estimation module. Final OBBs are generated by fusing the spatial coordinates derived from the initial HBBs with the predicted orientation and refined dimensions. Results and Discussions The proposed method is evaluated on two public SAR ship detection benchmarks, HRSID and SSDD. Training is conducted using only HBB annotations, whereas performance is assessed against ground-truth OBBs using Average Precision at 0.5 intersection over union (AP50) and Recall (R). The method demonstrates superior performance relative to existing weakly supervised approaches and remains competitive with fully supervised methods (Table 1 and Table 2). On the HRSID dataset, an AP50 of 84.3% and a recall of 91.9% are achieved. These results exceed those of weakly supervised methods such as H2Rbox-v2 (56.2% AP50) and the approach reported by Yue et al.^[14] (81.5% AP50), and also outperform several fully supervised detectors, such as R-RetinaNet (72.7% AP50) and S2ANet (80.8% AP50). A similar advantage is observed on the SSDD dataset, where an AP50 of 89.4% is obtained, representing a significant improvement over the best reported weakly supervised result of 87.3%. Qualitative inspection of detection outputs supports these quantitative results (Fig. 3). 
The proposed method shows a lower missed-detection rate, particularly for small and densely clustered ships, relative to other weakly supervised approaches. This robustness is attributed to the high-recall property of the first-stage localization network combined with reliable orientation cues learned from the morphological dataset. To examine key methodological aspects, additional experiments are conducted. Analysis of the domain gap between synthetic and real data using UMAP-based visualization of high-dimensional features (Fig. 5) reveals substantial overlap and similar manifold structures across domains, indicating strong morphological consistency. An ablation study of the morphological components (Fig. 4) further shows that each simulation element contributes incrementally to performance improvement, supporting the design of the high-fidelity simulation process. Conclusions A morphology-guided decoupled framework for oriented ship detection in SAR imagery is presented. By separating localization and orientation estimation, standard HBB-based detectors are enabled to perform accurate oriented detection without retraining. The central contribution is a fully supervised morphological simulation dataset that allows a dedicated module to learn robust orientation features from structural contours, thereby mitigating the annotation challenges associated with real SAR data. Experimental results demonstrate that the proposed approach substantially outperforms existing HBB-supervised methods and remains competitive with fully supervised alternatives. The plug-and-play design highlights its practical applicability.
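The synthetic dataset construction described in the abstract above (binary rectangles with known orientation, morphological perturbation, and injected cross noise) can be sketched with NumPy alone. The parameter ranges, the 3×3 structuring element, the way the cross is placed, and every function name below are illustrative assumptions; the paper's actual generation settings are not stated in the abstract.

```python
import numpy as np

def rotated_rectangle(size, length, width, angle_deg):
    """Binary rectangle centred in a size x size image, rotated by angle_deg."""
    half = (size - 1) / 2.0
    rows, cols = np.mgrid[0:size, 0:size]
    x, y = cols - half, half - rows
    t = np.deg2rad(angle_deg)
    u = x * np.cos(t) + y * np.sin(t)    # coordinate along the long axis
    v = -x * np.sin(t) + y * np.cos(t)   # coordinate along the short axis
    return (np.abs(u) <= length / 2.0) & (np.abs(v) <= width / 2.0)

def dilate(mask):
    """3x3 binary dilation implemented with shifted views."""
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + mask.shape[0], dc:dc + mask.shape[1]]
    return out

def erode(mask):
    """3x3 binary erosion by duality (pixels outside the image count as set)."""
    return ~dilate(~mask)

def make_sample(rng, size=64):
    """Return (binary image, ground-truth angle in degrees)."""
    length = rng.uniform(20.0, 50.0)
    width = length / rng.uniform(2.0, 6.0)
    angle = rng.uniform(0.0, 180.0)
    mask = rotated_rectangle(size, length, width, angle)
    mask = dilate(mask) if rng.random() < 0.5 else erode(mask)
    rows, cols = np.nonzero(mask)
    if rows.size:
        # Cross-shaped artifact centred on a random target pixel.
        k = rng.integers(rows.size)
        arm = int(rng.integers(3, 10))
        r, c = rows[k], cols[k]
        mask[r, max(0, c - arm):c + arm + 1] = True
        mask[max(0, r - arm):r + arm + 1, c] = True
    return mask, angle

def fuse_to_obb(hbb, angle_deg, length, width):
    """Combine the HBB centre with predicted orientation and dimensions."""
    x0, y0, x1, y1 = hbb
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0, length, width, angle_deg)

base = rotated_rectangle(33, 21, 5, 0.0)
sample, angle = make_sample(np.random.default_rng(0))
```

A regression network such as the ResNet-18 module mentioned above would then be trained on `(sample, angle)` pairs; that training loop is outside the scope of this sketch.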

Autonomous Radar Scan-Mode Recognition Method Based on High-Dimensional Features and Random Forest

WU Kanghui, GUO Zixun, FAN Yifei, XIE Jian, TAO Mingliang

2025, 47(12): 4862-4873. doi: 10.11999/JEIT250985

[Abstract](79) [FullText HTML](41) [PDF 4078KB](10)

Abstract:
Objective Rapid and robust recognition of radar scanning modes under noncooperative electronic reconnaissance conditions is a prerequisite for threat assessment, resource scheduling, and countermeasure design. Mechanical Scanning (MST) and phased-array Electronic Scanning (EST) leave different physical imprints in the Time-Of-Arrival-Pulse Amplitude (TOA-PA) stream. However, their separability degrades under low Signal-to-Noise Ratio (SNR), nonstationary dwell scheduling, and jittered timing typical of dense electromagnetic environments. In this study, a physics-grounded, multi-domain feature framework coupled with a Random Forest (RF) classifier is developed to discriminate MST from EST using only intercepted TOA-PA sequences, without synchronization or prior emitter knowledge. Methods The reconnaissance reception chain is modeled, and Pulse Amplitude (PA) formation is formalized to clarify the association between antenna-pattern traversal and amplitude texture. From this physical perspective, seven complementary features are extracted across time, frequency, and graph structure: Coefficient of Variation (CV), Total Variation (TV), Gaussian Fitting Degree (GFD), Relative Width (RW), Spectral Flatness Measure (SFM), Global Clustering Coefficient (GCC) on a Horizontal Visibility Graph (HVG), and Normalized Degree Entropy (NDE). HVG construction preserves temporal order and reveals global structure induced by sequence shape. Features are computed per frame and concatenated into a seven-dimensional vector. The RF classifier is trained using bootstrap sampling and random-subspace splits, and inference is performed by majority voting over leaf-level posteriors. The full pipeline is summarized in Fig. 10. Computational complexity remains near linear: CV, TV, and RW scale as O(N); SFM is dominated by a single fast Fourier transform with O(Nlog₂N); and HVG-based features scale as O(Nlog₂N), satisfying low-latency constraints. 
Results and Discussions The dataset is constructed using paired MST and EST frames with time-of-arrival jitter of approximately 0.2% of the pulse repetition interval, additive white Gaussian noise across SNR levels, and realistic beam patterns that include sidelobes for both scanning schemes. Training spans 0～30 dB, and testing covers –5.5～29.5 dB in 5 dB steps. Using the proposed seven-feature vector, the RF classifier achieves an average accuracy of 97.59% across all SNRs and exceeds a support vector machine baseline with identical inputs at 96.01%. The largest margins are observed at low to mid SNR, as shown in Fig. 11. Single-feature analysis shows clear heterogeneity and complementarity. SFM provides the best single-feature performance at 0.916 1, followed by TV and NDE at 0.822 0 and 0.806 5, respectively. CV and GFD show intermediate performance at approximately 0.66, whereas RW and graph-based similarity measures are lower at approximately 0.56～0.57. Joint multi-feature inputs increase accuracy to 0.975 9, yielding an absolute gain of 5.98 percentage points over the best single feature and reducing the error rate from 8.39% to 2.41%, corresponding to a relative reduction of approximately 71%. These improvements are summarized in Table 1 and Fig. 12. Runtime evaluation indicates that the dominant computational cost arises from the fast Fourier transform and HVG construction. A per-frame computation time of approximately 0.515 ms keeps the method suitable for on-orbit and embedded processing. The performance gains arise from the joint capture of four factors: smooth versus stepwise amplitude evolution represented by CV and TV; main-lobe morphology and time scale represented by GFD and RW, as illustrated in Fig. 4; spectral concentration versus dispersion represented by SFM, as illustrated in Fig. 5; and topology induced by alternating highs and lows under dwell switching represented by HVG clustering and entropy, as detailed in Fig. 6. Together, these factors stabilize the decision boundary against noise and dwell nonstationarity. Conclusions A physics-grounded, multi-domain feature framework combined with an RF discriminator is presented for radar scanning mode recognition under noncooperative conditions. The method is derived from intrinsic contrasts between MST, characterized by continuous, smooth, and quasi-periodic behavior, and phased-array EST, characterized by dwell-based, jumping, and nonstationary behavior. A TOA-PA signal model consistent with engineering practice is constructed, and complementary features are designed across time (CV, TV), main-lobe morphology (GFD, RW), frequency (SFM), and graph structure (GCC, NDE). The RF classifier applies bootstrap sampling and random subspaces to reduce variance and mitigate overfitting, enabling robust decisions. Across detection scenarios from –5 dB to 30 dB, an average accuracy of 97.59% is obtained.
Compared with schemes based on single-domain features or a limited feature set, the proposed framework provides higher recognition stability under low-SNR and other challenging disturbance conditions.
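Several of the seven features named in the abstract above have standard textbook definitions and can be sketched directly on a pulse-amplitude frame. The code below covers CV, TV, SFM, and the HVG degree sequence with a degree-entropy measure. It is a sketch under assumptions: the entropy is normalized by the logarithm of the number of nodes, which may differ from the paper's NDE definition, and GFD, RW, and GCC are omitted. The HVG loop is a simple early-exit scan, not the O(N log₂N) construction cited in the abstract.

```python
import numpy as np

def coefficient_of_variation(pa):
    return float(np.std(pa) / np.mean(pa))

def total_variation(pa):
    return float(np.sum(np.abs(np.diff(pa))))

def spectral_flatness(pa, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum."""
    power = np.abs(np.fft.rfft(pa)) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def hvg_degrees(pa):
    """Degree of each sample in the horizontal visibility graph."""
    n = len(pa)
    degree = np.zeros(n, dtype=int)
    for i in range(n - 1):
        blocker = -np.inf  # largest value strictly between i and j
        for j in range(i + 1, n):
            if blocker < min(pa[i], pa[j]):
                degree[i] += 1
                degree[j] += 1
            blocker = max(blocker, pa[j])
            if blocker >= pa[i]:
                break  # nothing beyond j is visible from i
    return degree

def normalized_degree_entropy(degree):
    """Shannon entropy of the degree distribution, divided by log(N)."""
    counts = np.bincount(degree)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)) / np.log(len(degree)))

frame = np.array([1.0, 2.0, 1.0, 2.0])
degrees = hvg_degrees(frame)
impulse = np.zeros(8)
impulse[0] = 1.0
```

For the toy frame, `degrees` is `[1, 3, 2, 2]`, and the impulse has a flat spectrum, so its flatness is 1. A feature vector built this way could be passed to any random forest implementation; the classifier itself is not shown.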

Multiscale Fractional Information Potential Field and Dynamic Gradient-Guided Energy Modeling for SAR and Multispectral Image Fusion

SONG Jiawen, WANG Qingsong

2025, 47(12): 4874-4886. doi: 10.11999/JEIT250976

[Abstract](109) [FullText HTML](68) [PDF 24023KB](5)

Abstract:
Objective In remote sensing, fusion of Synthetic Aperture Radar (SAR) and MultiSpectral (MS) images is essential for comprehensive Earth observation. SAR sensors provide all-weather imaging capability and capture dielectric and geometric surface characteristics, although they are inherently affected by multiplicative speckle noise. In contrast, MS sensors provide rich spectral information that supports visual interpretation, although their performance is constrained by atmospheric conditions. The objective of SAR-MS image fusion is to integrate the structural details and scattering characteristics of SAR imagery with the spectral content of MS imagery, thereby improving performance in applications such as land-cover classification and target detection. However, existing fusion approaches, ranging from component substitution and multiscale transformation to Deep Learning (DL), face persistent limitations. Many methods fail to achieve an effective balance between noise suppression and texture preservation, which leads to spectral distortion or residual speckle, particularly in highly heterogeneous regions. DL-based methods, although effective in specific scenarios, exhibit strong dependence on training data and limited generalization across sensors. To address these issues, a robust unsupervised fusion framework is developed that explicitly models modality-specific noise characteristics and structural differences. Fractional calculus and dynamic energy modeling are combined to improve structural preservation and spectral fidelity without relying on large-scale training datasets. Methods The proposed framework adopts a multistage fusion strategy based on Relative Total Variation filtering for image decomposition and consists of four core components. First, a MultiScale Fractional Information Potential Field (MS-FIPF) method (Fig. 2) is proposed to extract robust detail layers. 
A fractional-order kernel is constructed in the Fourier domain to achieve nonlinear frequency weighting, and a local entropy-driven adaptive scale mechanism is applied to enhance edge information while suppressing noise. Second, to address the different noise distributions observed in SAR and MS detail layers, a Bayesian adaptive fusion model based on the minimum mean square error criterion is constructed. A dynamic regularization term is incorporated to adaptively balance structural preservation and noise suppression. Third, for base layers containing low-frequency geometric information, a Dynamic Gradient-Guided Multiresolution Local Energy (DGMLE) method (Fig. 3) is proposed. This method constructs a global entropy-driven multiresolution pyramid and applies a gradient-variance-controlled enhancement factor combined with adaptive Gaussian smoothing to emphasize significant geometric structures. Finally, a Scattering Intensity Adaptive Modulation (SIAM) strategy is applied through a nonlinear mapping regulated by joint entropy and root mean square error, enabling adaptive adjustment of SAR scattering contributions to maintain visual and spectral consistency. Results and Discussions The proposed framework is evaluated on the WHU, YYX, and HQ datasets, which represent different spatial resolutions and scene complexities, and is compared with seven state-of-the-art fusion methods. Qualitative comparisons (Figs. 5～7) show that several existing approaches, including hybrid multiscale decomposition and image fusion convolutional neural networks, exhibit limited noise modeling capability. This limitation results in spectral distortion and detail blurring caused by SAR speckle interference. Methods based on infrared feature extraction and visual information preservation also show image whitening and contrast degradation due to excessive scattering feature injection. In contrast, the proposed method effectively filters redundant SAR noise through multiscale fractional denoising and adaptive scattering modulation, while preserving MS spectral consistency and salient SAR geometric structures. Improved visual clarity and detail preservation are observed, exceeding the performance of competitive approaches such as visual saliency feature fusion, which still presents residual noise. Quantitative results (Tables 1～3) demonstrate consistent superiority across six evaluation metrics. On the WHU dataset, optimal ERGAS (3.737 0) and PSNR (24.798 3 dB) values are achieved. Performance improvements are more evident on the high-resolution YYX dataset and the structurally complex HQ dataset, where the proposed method ranks first for all indices. The mutual information on the YYX dataset reaches 3.353 5, which is nearly twice that of the second-ranked method, confirming strong multimodal information preservation. On average, the proposed framework achieves a performance improvement of 29.11% compared with the second-best baseline. Mechanism validation and efficiency analysis (Tables 4, 5) further support these results. Ablation experiments demonstrate that SIAM plays a critical role in maintaining the balance between spectral information and scattering characteristics, whereas DGMLE contributes substantially to structural fidelity. With an average runtime of 1.303 3 s, the proposed method achieves an effective trade-off between computational efficiency and fusion quality and remains significantly faster than complex transform-domain approaches such as multiscale non-subsampled shearlet transform combined with parallel convolutional neural networks. Conclusions A robust and unsupervised framework for SAR and MS image fusion is proposed. By integrating MS-FIPF-based fractional-order saliency extraction with DGMLE-based gradient-guided energy modeling, the proposed method addresses the long-standing trade-off between noise suppression and detail preservation. Bayesian adaptive fusion and scattering intensity modulation further improve robustness to modality differences. Experimental results confirm that the proposed framework outperforms seven representative fusion algorithms, achieving an average improvement of 29.11% across comprehensive evaluation metrics.
Significant gains are observed in noise suppression, structural fidelity, and spectral preservation, demonstrating strong potential for multisource remote sensing data processing.
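The abstract above states that a fractional-order kernel is built in the Fourier domain for nonlinear frequency weighting, without giving its form. As an illustration only, the sketch below assumes the simplest such kernel, the radial frequency magnitude raised to a fractional power `alpha`; the function name and this kernel choice are assumptions, and the entropy-driven scale adaptation of MS-FIPF is not modeled.

```python
import numpy as np

def fractional_frequency_weighting(image, alpha):
    """Weight each spatial frequency by |f|**alpha and return the real result."""
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # NumPy evaluates 0.0 ** 0.0 as 1.0, so alpha = 0 leaves the image unchanged.
    kernel = radius ** alpha
    return np.real(np.fft.ifft2(np.fft.fft2(image) * kernel))

rng = np.random.default_rng(0)
image = rng.random((16, 16))
identity = fractional_frequency_weighting(image, 0.0)
detail = fractional_frequency_weighting(image, 0.5)
flat = fractional_frequency_weighting(np.full((16, 16), 3.0), 0.5)
```

With `alpha` between 0 and 1 the weighting grows more slowly with frequency than an integer-order derivative, which is the usual motivation for fractional operators when edges must be emphasized without amplifying noise as strongly.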

Convolutional Mixed Multi-Attention Encoder-Decoder Network for Radar Signal Sorting

CHANG Huaizhao, GU Yingyan, HAN Yunzhi, JIN Benzhou

2025, 47(12): 4887-4895. doi: 10.11999/JEIT251031

[Abstract](142) [FullText HTML](56) [PDF 1679KB](21)

Abstract:
Objective Radar signal sorting is a fundamental technology for electromagnetic environment awareness and electronic warfare systems. The objective of this study is to develop an effective radar signal sorting method that accurately separates intercepted pulse sequences and assigns them to different radiation sources in complex electromagnetic environments. With the increasing complexity of modern radar systems, intercepted pulse sequences are severely affected by pulse overlap, pulse loss, false pulses, and pulse arrival time measurement errors, which substantially reduce the performance of conventional sorting approaches. Therefore, a robust signal sorting framework that maintains high accuracy under non-ideal conditions is required. Methods Radar signal sorting in complex electromagnetic environments is formulated as a pulse-level time-series semantic segmentation problem, where each pulse is treated as the minimum processing unit and classified in an end-to-end manner. Under this formulation, sorting is achieved through unified sequence modeling and label prediction without explicit pulse subsequence extraction or iterative stripping procedures, which reduces error accumulation. To address this task, a convolutional mixed multi-attention encoder-decoder network is proposed (Fig. 1). The network consists of an encoder-decoder backbone, a local attention module, and a feature selection module. The encoder-decoder backbone adopts a symmetric structure with progressive downsampling and upsampling to aggregate contextual information while restoring pulse-level temporal resolution. Its core component is a dual-branch dilated bottleneck module (Fig. 2), in which a 1×1 temporal convolution is applied for channel projection. Two parallel dilated convolution branches with different dilation rates are then employed to construct multi-scale receptive fields, which enable simultaneous modeling of short-term local variations and long-term modulation patterns across multiple pulses and ensure robust temporal representation under pulse time shifts and missing pulses. To enhance long-range dependency modeling beyond convolutional operations, a local Transformer module is inserted between the encoder and the decoder. By applying local self-attention to temporally downsampled feature maps, temporal dependencies among pulses are captured with reduced computational complexity, whereas the influence of false and missing pulses is suppressed during feature aggregation. In addition, a feature selection module is integrated into skip connections to reduce feature redundancy and interference (Fig. 3). Through hybrid attention across temporal and channel dimensions, multi-level features are adaptively filtered and fused to emphasize discriminative information for radiation source identification. During training, focal loss is applied to alleviate class imbalance and improve the discrimination of difficult and boundary pulses. Results and Discussions Experimental results demonstrate that the proposed network achieves pulse-level fine-grained classification for radar signal sorting and outperforms mainstream baseline methods across various complex scenarios. Compared with existing approaches, an average sorting accuracy improvement of more than 6% is obtained under moderate interference conditions. In MultiFunctional Radar (MFR) overlapping scenarios, recall rates of 88.30%, 85.48%, 86.89%, and 86.48% are achieved for four different MFRs, respectively, with an overall average accuracy of 86.82%.
For different pulse repetition interval modulation types, recall rates exceed 90% for fixed patterns and remain above 85% for jittered, staggered, and group-varying modes. In staggered and group-varying cases, performance improvements exceeding 3.5% relative to baseline methods are observed. Generalization experiments indicate that high accuracy is maintained under parameter distribution shifts of 5% and 15%, which demonstrates strong robustness to distribution perturbations (Fig. 8). Ablation studies confirm the effectiveness of each proposed module in improving overall performance (Table 7). Conclusions A convolutional mixed multi-attention encoder-decoder network is proposed for radar signal sorting in complex electromagnetic environments. By modeling radar signal sorting as a pulse-level time-series semantic segmentation task and integrating multi-scale dilated convolutions, local attention modeling, and adaptive feature selection, high sorting accuracy, robustness, and generalization capability are achieved under severe interference conditions. The experimental results indicate that the proposed approach provides an effective and practical solution for radar signal sorting in complex electromagnetic environments.
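
The focal loss mentioned in the Methods can be illustrated with a minimal NumPy sketch. It uses the standard multi-class form, -(1 - p)^gamma * log(p), applied per pulse. The function name, the value gamma = 2, and the toy probabilities are assumptions made for illustration; the abstract does not state the settings used in the paper.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Mean multi-class focal loss over a pulse sequence.

    probs:  (num_pulses, num_classes) predicted class probabilities per pulse.
    labels: (num_pulses,) integer emitter index of each pulse.
    gamma:  focusing parameter (assumed value, not taken from the paper).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Probability the model assigns to the true emitter of each pulse.
    p_true = np.clip(probs[np.arange(labels.size), labels], eps, 1.0)
    # (1 - p)^gamma down-weights pulses that are already classified confidently,
    # so hard and boundary pulses dominate the average.
    return float(np.mean(-((1.0 - p_true) ** gamma) * np.log(p_true)))

# Toy example: one easy pulse (p = 0.9) and one hard pulse (p = 0.4).
toy_probs = [[0.9, 0.1], [0.4, 0.6]]
toy_labels = [0, 0]
loss_focal = focal_loss(toy_probs, toy_labels, gamma=2.0)
loss_ce = focal_loss(toy_probs, toy_labels, gamma=0.0)  # gamma = 0 is plain cross-entropy
```

With gamma = 0 the expression reduces to ordinary cross-entropy, which makes the effect of the focusing term easy to compare.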

ISAR Sequence Motion Modeling and Fuzzy Attitude Classification Method for Small Sample Space Target

YE Juhang, DUAN Jia, ZHANG Lei

2025, 47(12): 4896-4904. doi: 10.11999/JEIT250689

Abstract:
Objective Space activities continue to expand, and Space Situational Awareness (SSA) is required to support collision avoidance and national security. A core task is attitude classification of space targets to interpret states and predict possible behavior. Current classification strategies mainly depend on Ground-Based Inverse Synthetic Aperture Radar (GBISAR). Model-driven methods require accurate prior modeling and have high computational cost, whereas data-driven methods such as deep learning require large annotated datasets, which are difficult to obtain for space targets and therefore perform poorly in small-sample conditions. To address this limitation, a Fuzzy Attitude Classification (FAC) method is proposed that integrates temporal motion modeling with fuzzy set theory. The method is designed as a training-free, real-time classifier for rapid deployment under data-constrained scenarios. Methods The method establishes a mapping between Three-dimensional (3D) attitude dynamics and Two-dimensional (2D) ISAR features through a framework combining the Horizon Coordinate System (HCS), the UNW orbital system, and the Body-Fixed Reference Frame (BFRF). Attitude evolution is represented as Euler rotations of the BFRF relative to the UNW system. The periodic 3D rotation is projected onto the 2D Range-Doppler plane as circular keypoint trajectories. Fourier series analysis is then applied to convert the motion into One-Dimensional (1D) cosine features, where phase represents angular velocity and amplitude reflects motion magnitude. A 10-point annotation model is employed to describe targets, and dimensionless roll, pitch, and yaw feature vectors are constructed. For classification, magnitude- and angle-based decision rules are defined and processed using a softmax membership function, which incorporates feature variance to compute fuzzy membership degrees. 
The algorithm operates directly on keypoint sequences, requires no training, and maintains linear computational complexity O(n), enabling real-time execution. Results and Discussions The FAC method is evaluated using a Ku-band GBISAR simulated dataset of a spinning target. The dataset contains 36 sequences, each composed of 36 frames with a resolution of 512×512 pixels, and is partitioned into a reference set and a testing set. Although raw keypoint trajectories appear disordered (Fig. 4(a)), the engineered features form clear clusters (Fig. 4(b)), and the variance of the defined criteria reflects motion significance (Fig. 4(c)). Robustness is confirmed: across nine imaging angles, classification consistency remains 100% within a 0.0015 tolerance (Fig. 5(a)). Under noise conditions, consistency is maintained from 10 dB to 1 dB signal-to-noise ratio (Fig. 5(b)). When frames are removed, 90% consistency is retained at a 0.03 threshold, and six frames are identified as the minimum number required for effective classification (Fig. 5(c)). Benchmark comparisons indicate that FAC outperforms Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN), preserving accuracy under noise (Fig. 6(a)), sustaining stability under frame loss where HMM degrades to random behavior (Fig. 6(b)), and achieving significantly lower processing time than both benchmarks (Fig. 6(c)). Conclusions An FAC method that integrates motion modeling with fuzzy reasoning is presented for small-sample space target recognition. By mapping multi-coordinate kinematics into interpretable cosine features, the method reduces dependence on prior models and large datasets while achieving training-free, linear-time processing. Simulation tests confirm robustness across observation angles, Signal-to-Noise Ratios (SNR), and frame availability. Benchmark comparisons demonstrate higher accuracy, stability, and computational efficiency relative to HMM and CNN. 
The FAC method provides a feasible solution for real-time attitude classification in data-constrained scenarios. Future work will extend the approach to multi-axis tumbling and validation using measured data, with potential integration of multimodal observations to improve adaptability.
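
The step that turns a periodic keypoint trajectory into a 1D cosine feature can be sketched as follows. This is only an illustration of extracting the dominant Fourier component of a sequence with NumPy; the function name and the synthetic 36-frame track are hypothetical, and the paper's actual feature definitions may differ.

```python
import numpy as np

def dominant_cosine(x):
    """Return (amplitude, cycles_per_sequence, phase) of the strongest
    non-DC Fourier component of a real 1D sequence."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x - x.mean())
    k = int(np.argmax(np.abs(spectrum[1:]))) + 1  # skip the DC bin
    # Interior bins hold half the amplitude of a real cosine, hence the factor 2;
    # the Nyquist bin (even n only) is not doubled.
    scale = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
    amplitude = scale * np.abs(spectrum[k]) / n
    phase = float(np.angle(spectrum[k]))
    return float(amplitude), k, phase

# Hypothetical keypoint coordinate over 36 frames: 3 rotations, amplitude 5 pixels.
frames = np.arange(36)
track = 5.0 * np.cos(2 * np.pi * 3 * frames / 36 + 0.4) + 12.0
amp, cycles, phase = dominant_cosine(track)
```

The cost is dominated by one FFT per keypoint coordinate, so a sketch like this stays cheap for short sequences.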

Joint Suppression of Range-Ambiguous Clutter and Mainlobe Deceptive Jammer with Subarray FDA-MIMO Radar

ZHANG Mengdi, LU Jiahao, XU Jingwei, LI Shiyin, WANG Ning, LIU Zhixin

2025, 47(12): 4905-4916. doi: 10.11999/JEIT251116

Abstract:
Objective In the downward-looking mode, airborne radar systems face the dual challenge of mitigating strong clutter and mainlobe deceptive jammers in increasingly complex electromagnetic environments. Clutter exhibiting both range ambiguity and range dependence constrains Moving Target Detection (MTD) in high Pulse Repetition Frequency (PRF) radars with non-side-looking configurations. Mainlobe deceptive jammers further increase the difficulty of detecting the true target. By exploiting controllable range Degrees Of Freedom (DOFs), Waveform Diverse Array (WDA) radars, such as Frequency Diverse Array Multiple-Input Multiple-Output (FDA-MIMO) radar and Element Pulse Coding Multiple-Input Multiple-Output (EPC-MIMO) radar, show clear advantages in suppressing mainlobe deceptive jammers. However, existing WDA-based techniques are limited to suppressing false targets whose delays exceed one Pulse Repetition Interval (PRI) relative to the true target, referred to as cross-pulse repeater jammers. With advances in Digital Radio Frequency Memory (DRFM) technology, the delay of false targets is reduced, enabling the generation of false targets that share the same number of delayed pulses as the true target, referred to as intra-PRI rapid repeater jammers. Furthermore, most anti-jamming methods are developed under Gaussian white noise assumptions and do not consider practical clutter environments. Therefore, a joint suppression framework is required to simultaneously handle range-ambiguous clutter and multiple types of mainlobe deceptive jammers. Methods A joint suppression framework based on a subarray FDA-MIMO radar is proposed for scenarios with coexisting range-ambiguous clutter, cross-pulse repeater jammers, and intra-PRI rapid repeater jammers. 
Compared with conventional FDA-MIMO radar, the subarray FDA-MIMO configuration employs small frequency increments within transmit subarrays and large frequency increments across subarrays, which provides two-level range DOFs at the intra-subarray and inter-subarray scales. First, a Range-Dependent Compensation (RDC) technique is applied to separate the true target from echoes contaminated by clutter and jammers in the joint intra-subarray and inter-subarray transmit spatial frequency domain. Next, a pre-Space-Time Adaptive Processing (STAP) filter is designed by exploiting range DOFs in the intra-subarray transmit dimension to suppress range-ambiguous clutter and cross-pulse repeater jammers. Finally, subspace projection-based Three-Dimensional (3-D) STAP is applied to suppress local clutter and intra-PRI rapid repeater jammers. Results and Discussions After RDC, the true target is effectively separated from ambiguous clutter and jammers in the joint intra-subarray and inter-subarray transmit spatial frequency domain (Fig. 3). By exploiting range DOFs in the intra-subarray transmit dimension, the pre-STAP filter achieves effective suppression of range-ambiguous clutter and cross-pulse repeater jammers (Fig. 4). Local clutter in the inter-subarray transmit spatial frequency domain is suppressed by using clutter distribution characteristics in the receive-Doppler domain combined with subspace projection (Fig. 5). This enables accurate estimation of the Jammer Covariance Matrix (JCM) for intra-PRI-inner-bin rapid repeater jammers. Subsequently, 3-D STAP suppresses local clutter and intra-PRI-inner-bin rapid repeater jammers (Fig. 6, Fig. 7). Comparative simulation results show that the proposed framework achieves significantly improved suppression performance under the considered complex scenario (Fig. 8). 
Conclusions The problem of MTD in scenarios with simultaneous range-ambiguous clutter, cross-pulse repeater jammers, and intra-PRI-inner-bin rapid repeater jammers is addressed. A joint suppression framework based on subarray FDA-MIMO radar is proposed, in which small frequency increments are used within transmit subarrays and large increments across subarrays to enable flexible utilization of range DOFs. RDC achieves effective separation of the target from ambiguous clutter and jammers in the joint transmit spatial frequency domain. By exploiting intra-subarray range DOFs, a pre-STAP filter suppresses range-ambiguous clutter and cross-pulse repeater jammers. To mitigate the Inner-Bin Range Dependence (IRD) effect of clutter, a subspace projection method is developed to recover the JCM for intra-PRI-inner-bin rapid repeater jammers from clutter-contaminated data. Finally, 3-D STAP in the inter-subarray transmit-receive-Doppler domain suppresses local clutter and intra-PRI-inner-bin rapid repeater jammers. Numerical simulations verify the effectiveness of the proposed joint suppression framework.
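
The distinction between cross-pulse and intra-PRI repeater jammers comes down to simple timing arithmetic, sketched below. The sketch assumes that the "number of delayed pulses" is the count of whole PRIs contained in the two-way delay; that reading, the function names, and the numeric values are illustrative assumptions rather than definitions taken from the paper.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def unambiguous_range(prf_hz):
    """Maximum unambiguous range c / (2 * PRF) in metres."""
    return C / (2.0 * prf_hz)

def delayed_pulses(two_way_delay_s, prf_hz):
    """Number of whole PRIs contained in a two-way delay (assumed definition)."""
    return math.floor(two_way_delay_s * prf_hz)

def jammer_type(target_delay_s, jammer_delay_s, prf_hz):
    """Label a false target by comparing its pulse index with the true target's."""
    same_pulse = delayed_pulses(target_delay_s, prf_hz) == delayed_pulses(jammer_delay_s, prf_hz)
    return "intra-PRI rapid repeater" if same_pulse else "cross-pulse repeater"

# Hypothetical numbers: PRF = 10 kHz (PRI = 100 us), true target delay = 130 us.
prf = 10e3
r_u = unambiguous_range(prf)
fast_jammer = jammer_type(130e-6, 160e-6, prf)   # same pulse index as the target
slow_jammer = jammer_type(130e-6, 250e-6, prf)   # one PRI later than the target
```

A false target in the first category shares the target's range-ambiguity index, which is why range DOFs tied to the pulse index alone cannot separate it.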

Single-Channel High-Precision Sparse DOA Estimation of GNSS Signals for Deception Suppression

KANG Weiquan, LU Zukun, LI Baiyu, SONG Jie, XIAO Wei

2025, 47(12): 4917-4925. doi: 10.11999/JEIT250725

Abstract:
Objective Spoofing attacks present a major threat to the reliability and security of Global Navigation Satellite System (GNSS) receivers used in civilian and military navigation. Conventional anti-spoofing approaches based on multi-antenna arrays require substantial hardware resources and show reduced estimation accuracy under low Signal-to-Noise Ratio (SNR) conditions, which limits their suitability for constrained or adverse environments. This study proposes a single-channel high-precision sparse Direction-of-Arrival (DOA) estimation method designed to suppress spoofing signals in GNSS receivers. The aim is to reduce hardware complexity and achieve accurate DOA estimation in very low SNR conditions. By using the spatial sparsity of GNSS signals and integrating advanced signal-processing techniques, the method provides a cost-efficient and robust approach for strengthening GNSS resilience against deceptive interference. Methods The proposed method uses a single-channel processing framework to estimate the DOA of GNSS signals with high precision through a multi-step strategy designed for spoofing suppression. The process begins by reconstructing the digital intermediate-frequency signal using tracking-loop parameters such as code phase and carrier Doppler obtained from a reference array element. This reconstruction uses the orthogonality of pseudo-random noise codes in GNSS signals and enables correlation between the reconstructed signal and the original array data to improve the SNR before despreading. This step isolates a clean steering vector and reduces noise and interference. The method then uses the spatial sparsity of GNSS signals, which results from the limited number of authentic satellites and potential spoofing sources in the angular domain. An overcomplete dictionary is formed from steering vectors corresponding to a grid of candidate azimuth and elevation angles. 
The DOA estimation is expressed as a sparse reconstruction problem in which the steering vector is represented as a sparse combination of dictionary elements. To solve this efficiently, the Alternating Direction Method of Multipliers (ADMM) is used to iteratively optimize a regularized objective that balances data fidelity and sparsity. A two-stage grid-refinement process, beginning with a coarse search followed by a finer resolution, reduces computational cost while preserving accuracy. After DOA estimates are obtained, spoofing signals are identified based on their angular proximity to authentic signals, and a Linearly Constrained Minimum Variance (LCMV) beamformer is applied to suppress these interferers while retaining legitimate signals. Results and Discussions Simulations were performed to evaluate the proposed method under a range of low SNR conditions using a 4×4 uniform planar array and Beidou B3I signals as the test case. The results show that the single-channel sparse DOA estimation method provides markedly higher accuracy and resolution than Unitary ESPRIT and Cyclic MUSIC. When the SNR is –35 dB, the method achieves Root Mean Square Errors (RMSE) for azimuth and elevation estimates below 1 degree (Fig. 2), whereas the benchmark methods yield errors greater than 30 degrees. The method also resolves signals with angular separations as small as 1 degree (Fig. 4(a), Fig. 4(b)), demonstrating its strong resolution capability. Using the accurate DOA estimates derived from the proposed method, LCMV beamforming then suppresses spoofing signals effectively. As shown in Fig. 5(b), the high-fidelity DOA estimates enable the beamformer to place deep nulls at the spoofing directions (for example, (10°, 250°) and (20°, 250°)) and to attenuate spoofers while retaining authentic signals. In contrast, the reduced DOA accuracy of Cyclic MUSIC (Fig. 5(a)) leads to misaligned nulls and weaker suppression. 
These results confirm the practical value of accurate DOA estimation for robust spoofing mitigation. Conclusions This study presents a single-channel high-precision sparse DOA estimation method for GNSS spoofing suppression, addressing the limitations of conventional multi-antenna techniques related to hardware complexity and reduced performance under low-SNR conditions. By combining signal reconstruction, sparse modeling, and ADMM-based optimization, the method provides accurate and high-resolution DOA estimation in challenging environments, with simulations showing RMSE below 1 degree at –35 dB SNR. When used with LCMV beamforming, it suppresses spoofing signals effectively and improves GNSS reliability while requiring minimal hardware resources. This cost-efficient approach is well suited to applications with limited system capacity, as it reduces reliance on complex array configurations and maintains strong security performance. Future work may examine its performance in dynamic settings such as moving spoofers or multipath conditions, as well as its integration with other anti-spoofing strategies. This research offers a practical and high-performance framework for strengthening GNSS systems and has clear value for navigation safety and operational stability.
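
The sparse reconstruction step can be sketched with a generic ADMM solver for the complex LASSO problem, min 0.5*||Ax - y||^2 + lam*||x||_1. This is a simplified illustration, not the paper's algorithm: it uses a uniform linear array with a single angle instead of the 4×4 planar array with azimuth and elevation, a noise-free steering vector, and assumed values for lam, rho, and the grid. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Complex soft-thresholding: shrink each magnitude by t, keep the phase."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - t / np.maximum(mag, 1e-12), 0.0)

def admm_lasso(A, y, lam=0.1, rho=1.0, n_iter=200):
    """Approximately solve min_x 0.5*||A x - y||^2 + lam*||x||_1 by ADMM."""
    n = A.shape[1]
    Ah = A.conj().T
    # The x-update solves the same linear system every iteration, so invert once.
    inv = np.linalg.inv(Ah @ A + rho * np.eye(n))
    Ahy = Ah @ y
    x = np.zeros(n, dtype=complex)
    z = np.zeros(n, dtype=complex)
    u = np.zeros(n, dtype=complex)
    for _ in range(n_iter):
        x = inv @ (Ahy + rho * (z - u))        # data-fidelity step
        z = soft_threshold(x + u, lam / rho)   # sparsity step
        u = u + x - z                          # scaled dual update
    return z

def ula_dictionary(n_elements, grid_deg, spacing=0.5):
    """Steering vectors of a uniform linear array on an angle grid, one per column."""
    k = np.arange(n_elements)[:, None]
    s = np.sin(np.deg2rad(np.asarray(grid_deg, dtype=float)))[None, :]
    return np.exp(2j * np.pi * spacing * k * s)

# Overcomplete dictionary: 8 elements, 25 candidate angles from -60 to 60 degrees.
grid = np.arange(-60.0, 61.0, 5.0)
A = ula_dictionary(8, grid)
y = A[:, 16]                                   # steering vector of the 20-degree grid point
z = admm_lasso(A, y, lam=0.1, rho=1.0, n_iter=500)
estimate_deg = grid[np.argmax(np.abs(z))]      # angle of the largest recovered coefficient
```

A coarse-to-fine search in the spirit of the abstract would rerun the same solver on a finer grid centred on `estimate_deg`.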

Robust Adaptive Beamforming Algorithm Based on Dominant Eigenvector Extraction and Orthogonal Projection

LIU Yiyuan, ZHANG Xiaokai, XU Yuhua, ZHENG Xueqiang, YANG Weiwei

2025, 47(12): 4926-4936. doi: 10.11999/JEIT251282

Abstract:
Objective In practical applications, the spatial anti-jamming performance of adaptive beamformers is often degraded by mismatches in the Directions Of Arrival (DOAs) of signals. Some robust adaptive beamforming algorithms reduce the error between the estimated signal steering vector and the actual steering vector by solving a Quadratically Constrained Quadratic Programming (QCQP) problem. This strategy significantly increases hardware cost. In addition, traditional adaptive beamforming algorithms often exhibit beampattern distortion under non-ideal conditions, such as DOA mismatch. The objective of this paper is to design a robust adaptive beamformer that effectively suppresses jamming signals under different mismatch scenarios. Methods A robust adaptive beamforming algorithm for spatial anti-jamming is proposed. First, the actual output Signal-to-Jamming-plus-Noise Ratio (SJNR) in the presence of DOA mismatch is analyzed. An ideal beamformer based on orthogonal projection is then proposed to achieve accurate beampattern control and maximize the practical output SJNR. To improve anti-jamming robustness in mismatch environments, the signal steering vector is estimated through covariance matrix construction and dominant eigenvector extraction. The beamforming weight vector is obtained by constructing an orthogonal projection matrix. Results and Discussions The proposed adaptive beamforming algorithm effectively suppresses jamming signals in mismatch environments. Numerical results show that the algorithm achieves good spatial anti-jamming performance in an ideal scenario without mismatch (Fig. 3) and in a scenario with steering vector mismatch (Fig. 4). In DOA mismatch scenarios, the proposed algorithm demonstrates superior beampattern performance (Fig. 5, Fig. 6) and output SJNR performance (Fig. 7, Fig. 8, Fig. 9). The results also indicate stronger robustness to DOA mismatch (Fig. 10, Fig. 11). 
Effective jamming suppression is maintained even when the incoming directions of the jamming signals are closely spaced (Fig. 12). Conclusions This paper proposes a robust adaptive beamforming algorithm for suppressing power suppressive jamming signals. An ideal beamformer is first developed to achieve precise beampattern control and maximize the actual output SJNR. A robust adaptive beamforming algorithm is then constructed through covariance matrix construction, dominant eigenvector extraction, and orthogonal projection. Numerical results show that the proposed algorithm provides strong spatial anti-jamming performance in ideal scenarios without mismatch and in scenarios with DOA mismatch or steering vector mismatch.
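
The two building blocks named in the Methods, dominant eigenvector extraction and orthogonal projection, can be sketched generically as follows. The sketch assumes a uniform linear array, a synthetic signal covariance matrix, and known jammer steering vectors; how the paper actually constructs its covariance matrix and projection is not stated in the abstract, so every name and number here is a hypothetical stand-in.

```python
import numpy as np

def steering(n, theta_deg, spacing=0.5):
    """Steering vector of an n-element uniform linear array (spacing in wavelengths)."""
    k = np.arange(n)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def dominant_eigenvector(C):
    """Unit-norm eigenvector of a Hermitian matrix with the largest eigenvalue."""
    _, vecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    return vecs[:, -1]

def null_projection_weights(a_signal, J):
    """Project a_signal onto the orthogonal complement of the columns of J."""
    P = np.eye(J.shape[0]) - J @ np.linalg.pinv(J)
    return P @ a_signal

n = 10
a_true = steering(n, 5.0)
# Synthetic covariance dominated by the desired signal (assumed for illustration).
C = 10.0 * np.outer(a_true, a_true.conj()) + 0.1 * np.eye(n)
a_hat = np.sqrt(n) * dominant_eigenvector(C)   # steering estimate, up to a phase factor
# Hypothetical jammer directions at -30 and 40 degrees.
J = np.column_stack([steering(n, -30.0), steering(n, 40.0)])
w = null_projection_weights(a_hat, J)
```

By construction the weight vector is orthogonal to every column of `J`, so the beampattern has exact nulls at the assumed jammer directions while keeping most of the gain toward the signal.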

LightMamba: A Lightweight Mamba Network for the Joint Classification of HSI and LiDAR Data

LIAO Diling, LAI Tao, HUANG Haifeng, WANG Qingsong

2025, 47(12): 4937-4947. doi: 10.11999/JEIT250981

Abstract:
Objective The joint classification of HyperSpectral Imagery (HSI) and Light Detection And Ranging (LiDAR) data is a critical task in remote sensing, where complementary spectral and spatial information is exploited to improve land cover recognition accuracy. However, mainstream deep learning approaches, particularly those based on Convolutional Neural Networks (CNNs) and Transformers, are constrained by high computational cost and limited efficiency in modeling long-range dependencies. CNN-based methods are effective for local feature extraction but suffer from limited receptive fields and increased parameter counts when scaled. Transformer architectures provide global context modeling but incur quadratic computational complexity due to self-attention mechanisms, which leads to prohibitive costs when processing high-dimensional remote sensing data. To address these limitations, a lightweight network architecture named LightMamba is proposed. The model leverages an advanced State Space Model (SSM) to achieve efficient and accurate joint classification of HSI and LiDAR data. The objective is to maintain linear computational complexity while effectively fusing multi-source features and capturing global contextual relationships, thereby supporting resource-constrained applications without accuracy degradation. Methods The proposed LightMamba framework consists of three core components. First, a MultiSource Alignment Module (MSAM) is designed to address heterogeneity between HSI and LiDAR data. A dual-branch network with shared weights projects both modalities into a unified feature space, which ensures consistent spatial-spectral representation. This shared-weight strategy reduces the parameter count and strengthens inter-modal correlation through the learning of common foundational features. Second, the Multi-Source Lightweight Mamba Module (MSLMM) forms the core of the framework. 
Aligned HSI and LiDAR feature sequences are processed using a parameter-efficient Mamba architecture. A hybrid parameter-sharing strategy is adopted by combining shared matrices with modality-specific parameters, which preserves discriminative capability while reducing redundancy. LiDAR elevation information is used as a positional guide to enhance spatial awareness during feature fusion. The selective scanning mechanism of the SSM enables efficient modeling of long-range dependencies with linear complexity, thereby avoiding the quadratic cost associated with Transformers. Spectral bands are processed sequentially to preserve joint spectral-spatial characteristics. Finally, a MultiLayer Perceptron (MLP)-based classifier maps fused high-level features to class probabilities with low computational overhead. The model is trained end to end using cross-entropy loss. Evaluations are conducted on two public benchmarks, namely the Houston and Augsburg datasets. Comparisons are performed against representative methods, including CoupledCNN, GAMF, HCT, MFT, Cross-HL, and S2CrossMamba, using Overall Accuracy (OA), Average Accuracy (AA), and the Kappa coefficient. Ablation experiments analyze the contribution of each module, and parameter count and FLoating-Point Operations (FLOPs) are reported. Results and Discussions Experimental results demonstrate that LightMamba achieves superior performance and efficiency. On the Houston dataset, an OA of 94.30%, an AA of 95.25%, and a Kappa coefficient of 93.83% are obtained, which exceed those of all comparison methods. Perfect classification accuracy is achieved for several classes, including Soil and Water. Classification maps exhibit improved spatial continuity and internal consistency, with reduced speckle noise, particularly in heterogeneous regions such as commercial areas. 
On the Augsburg dataset, LightMamba achieves the highest OA of 87.41% and a Kappa coefficient of 82.30%, which confirms strong generalization across different scenes. Although the AA is slightly lower than that of S2CrossMamba, the higher OA and Kappa values indicate better overall performance. Complexity analysis shows that LightMamba attains high accuracy with a lightweight structure containing only 69.93 k parameters, which is substantially fewer than GAMF and comparable to S2CrossMamba, while maintaining moderate FLOPs. Experiments on input patch size indicate adaptability to scene characteristics, with optimal performance observed at 17×17 for the Houston dataset and 9×9 for the Augsburg dataset. Conclusions A lightweight network architecture, LightMamba, is presented for joint HSI and LiDAR classification. By combining a shared-weight MSAM with a lightweight Mamba module that adopts hybrid parameterization and elevation-guided fusion, modal heterogeneity is effectively addressed and long-range contextual dependencies are captured with linear computational complexity. Experimental results on public benchmarks demonstrate state-of-the-art classification accuracy with a reduced parameter count and computational cost compared with existing methods. These findings confirm the potential of Mamba-based architectures for efficient multi-source remote sensing data fusion. Future research will explore optimized two-dimensional scanning mechanisms and adaptive scanning strategies to further improve feature capture efficiency and classification performance. The LightMamba code is available at https://www.scidb.cn/detail?dataSetId=064dc4ac5350418e87a8b82dd324737b&version=V1&code=j00173.
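
The three reported metrics follow from a confusion matrix by their standard definitions, as the sketch below shows. The function name and the 2-class confusion matrix are made up for illustration and are unrelated to the Houston or Augsburg results.

```python
import numpy as np

def classification_scores(conf):
    """Overall Accuracy, Average Accuracy, and Cohen's kappa from a confusion
    matrix whose rows are true classes and columns are predicted classes."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    oa = np.trace(conf) / total                       # fraction of all samples correct
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))    # mean of per-class recalls
    # Agreement expected by chance from the row and column marginals.
    pe = (conf.sum(axis=0) @ conf.sum(axis=1)) / total**2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

# Hypothetical imbalanced two-class example.
toy_conf = [[90, 10],
            [5, 15]]
oa, aa, kappa = classification_scores(toy_conf)
```

In this toy case OA (0.875) exceeds AA (0.825) because the larger class is classified better, which is the same kind of divergence between OA and AA noted for the Augsburg comparison.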

NAS4CIM: Tailored Neural Network Architecture Search for RRAM-Based Compute-in-Memory Chips

LI Yuankun, WANG Ze, ZHANG Qingtian, GAO Bin, WU Huaqiang

2025, 47(12): 4948-4958. doi: 10.11999/JEIT250978

Abstract:
Objective With the growing demand for on-orbit information processing in satellite missions, efficient deployment of neural networks under strict power and latency constraints remains a major challenge. Resistive Random Access Memory (RRAM)-based Compute-in-Memory (CIM) architectures provide a promising solution for low power consumption and high throughput at the edge. To bridge the gap between conventional neural architectures and CIM hardware, this paper proposes NAS4CIM, a Neural Architecture Search (NAS) framework tailored for RRAM-based CIM chips. The framework proposes a decoupled distillation-enhanced training strategy and a Top-k-based operator selection method, enabling balanced optimization of task accuracy and hardware efficiency. This study presents a practical approach for algorithm-architecture co-optimization in CIM systems with potential application in satellite edge intelligence. Methods NAS4CIM is designed as a multi-stage architecture search framework that explicitly considers task performance and CIM hardware characteristics. The search process consists of three stages: task-driven operator evaluation, hardware-driven operator evaluation, and final architecture selection with retraining. In the task-driven stage, NAS4CIM employs the Decoupled Distillation-Enhanced Gradient-based Significance Coefficient Supernet Training (DDE-GSCST) method. Rather than jointly training all candidate operators in a fully coupled supernet, DDE-GSCST applies a semi-decoupled training strategy across different network stages. A high-accuracy teacher network is used to guide training. For each stage, the teacher network provides stable feature representations, whereas the remaining stages remain fixed, which reduces interference among candidate operators. Knowledge distillation is critical under CIM constraints. 
RRAM-based CIM systems typically rely on low-bit quantization and are affected by device-level noise, under which conventional weight-sharing NAS methods show unstable convergence. Feature distillation from a strong teacher network ensures clear optimization signals for candidate operators and supports reliable convergence. After training, each operator is assigned a task significance coefficient that quantitatively reflects its contribution to task accuracy. Following the task-driven stage, a hardware-driven search stage is performed. Candidate network structures are constructed by combining operators according to task significance rankings and are evaluated using an RRAM-based CIM hardware simulator. System-level hardware metrics, including inference latency and energy consumption, are measured. Complete network structures are evaluated directly, capturing realistic effects such as array partitioning, inter-array communication, and Analog-to-Digital Converter (ADC) overhead. From hardware-efficient networks with superior performance, the selection frequency of each operator is analyzed. Operators that appear more frequently in low-latency and low-energy designs are assigned higher hardware significance coefficients. This data-driven evaluation avoids inaccurate operator-level hardware modeling and reflects system-level behavior. In the final stage, task significance and hardware significance matrices are integrated. By adjusting weighting factors, the framework prioritizes accuracy, efficiency, or a balanced trade-off. Based on the combined evaluation, an optimal operator set is selected to construct the final network architecture, which is then retrained from scratch to refine weights and further improve accuracy while maintaining high hardware efficiency on CIM platforms. Results and Discussions NAS4CIM is evaluated on FashionMNIST, CIFAR-10, and ImageNet to demonstrate effectiveness across tasks of different scales. 
On FashionMNIST, the framework achieves 90.1% Top-1 accuracy in the accuracy-oriented search and an Energy-Delay Product (EDP) of 0.16 in the efficiency-oriented search (Table 4). Real-chip experiments on fabricated RRAM macros show close agreement between measured accuracy and simulation results, confirming practical feasibility. On CIFAR-10, NAS4CIM reaches 90.5% Top-1 accuracy in the accuracy-oriented mode and an EDP of 0.16 in the efficiency-oriented mode, exceeding state-of-the-art methods under the same hardware configuration. Under a balanced accuracy-efficiency setting, the framework produces a network with 89.3% accuracy and an EDP of 0.97 (Table 3). On ImageNet, which represents a large-scale and more complex classification task, NAS4CIM achieves 70.0% Top-1 accuracy in the accuracy-oriented mode, whereas the efficiency-oriented search yields an EDP of 504.74 (Table 5). These results indicate effective scalability from simple to complex datasets while maintaining a favorable balance between accuracy and energy efficiency across optimization settings. Conclusions This study proposes NAS4CIM, a NAS framework for RRAM-based CIM chips. Through a decoupled distillation-enhanced training method and a Top-k-based operator selection strategy, the framework addresses instability in random sampling approaches and inaccuracies in operator-level performance modeling. NAS4CIM provides a unified strategy to balance task accuracy and hardware efficiency and demonstrates generality across tasks of different complexity. Simulation and real-chip experiments confirm stable performance and consistency between algorithmic and hardware evaluations. NAS4CIM presents a practical pathway for algorithm–hardware co-optimization in CIM systems and supports energy-efficient, real-time information processing for satellite edge intelligence.
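
The final selection stage, where task and hardware significance are combined under adjustable weights, can be sketched as below. The linear weighting with a single factor alpha, the per-layer argmax, and the toy significance matrices are assumptions made for illustration; the abstract does not give the actual combination rule or the Top-k procedure.

```python
import numpy as np

def select_operators(task_sig, hw_sig, alpha):
    """Pick one operator per layer from a weighted sum of the two significance
    matrices (rows = layers, columns = candidate operators).
    alpha = 1 favours accuracy, alpha = 0 favours hardware efficiency."""
    task_sig = np.asarray(task_sig, dtype=float)
    hw_sig = np.asarray(hw_sig, dtype=float)
    combined = alpha * task_sig + (1.0 - alpha) * hw_sig
    return combined.argmax(axis=1)

def top_k_operators(sig, k):
    """Indices of the k highest-scoring operators per layer, best first."""
    return np.argsort(-np.asarray(sig, dtype=float), axis=1)[:, :k]

# Hypothetical significance coefficients for 2 layers and 3 candidate operators.
task = [[0.6, 0.3, 0.1],
        [0.2, 0.5, 0.3]]
hw = [[0.1, 0.5, 0.4],
      [0.3, 0.2, 0.5]]
accuracy_first = select_operators(task, hw, alpha=1.0)
balanced = select_operators(task, hw, alpha=0.5)
efficiency_first = select_operators(task, hw, alpha=0.0)
shortlist = top_k_operators(task, k=2)
```

Sweeping alpha between 0 and 1 reproduces the three search modes described in the abstract: accuracy-oriented, balanced, and efficiency-oriented.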

Automating Algorithm Design for Agile Satellite Task Assignment with Large Language Models and Reinforcement Learning

CHEN Yingguo, WANG Feiran, HU Yunpeng, YANG Bin, YAN Bing

2025, 47(12): 4959-4972. doi: 10.11999/JEIT250991

[Abstract](105) [FullText HTML](58) [PDF 3882KB](20)

Abstract:
Objective The Multi-Agile Earth Observation Satellite Mission Scheduling Problem (MAEOSMSP) is an NP-hard problem. Algorithm design for this problem has long been constrained by reliance on expert experience and limited adaptability across diverse scenarios. To address this limitation, an Adaptive Algorithm Design (AAD) framework is proposed. The framework integrates a Large Language Model (LLM) and Reinforcement Learning (RL) to enable automated generation and intelligent application of scheduling algorithms. It is built on a novel offline evolution-online decision-making architecture. The objective is to discover heuristic algorithms that outperform human-designed methods and to provide an efficient and adaptive solution methodology for the MAEOSMSP. Methods The AAD framework adopts a two-stage mechanism. In the offline evolution stage, LLM-driven evolutionary computation is used to automatically generate a diverse and high-quality library of task assignment algorithms, thereby alleviating the limitations of manual design. In the online decision-making stage, an RL agent is trained to dynamically select the most suitable algorithm from the library based on the real-time solving state (e.g., solution improvement and stagnation). This process is formulated as a Markov decision process, which allows the agent to learn a policy that adapts to problem-solving dynamics. Results and Discussions The effectiveness of the AAD framework is evaluated through comprehensive experiments on 15 standard test scenarios. The framework is compared with several state-of-the-art methods, including expert-designed heuristics, an advanced deep learning approach, and ablation variants of the proposed framework. The results show that the dynamic strategies generated by AAD consistently outperform the baselines, with performance improvements of up to 9.8% in complex scenarios. 
Statistical analysis further indicates that AAD achieves superior solution quality and demonstrates strong robustness across different problem instances. Conclusions A novel AAD framework is presented to automate algorithm design for the MAEOSMSP by decoupling algorithm generation from algorithm application. The combination of LLM-based generation and RL-based decision making is validated empirically. Compared with traditional hyper-heuristics and existing LLM-based methods, the proposed architecture enables both the creation of new algorithms and their dynamic application. The framework provides a new paradigm for solving complex combinatorial optimization problems and shows potential for extension to other domains.
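The online decision-making stage of AAD, in which an agent picks an algorithm from the library according to the solving state, can be sketched as follows. The abstract does not specify the RL method, so tabular Q-learning with epsilon-greedy selection is swapped in here for illustration only. The state labels ("improving", "stagnating"), the reward values, and the class name are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


class AlgorithmSelector:
    """Tabular Q-learning over (solving state, algorithm index) pairs (an assumption)."""

    def __init__(self, n_algorithms: int, lr: float = 0.5, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.n = n_algorithms
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)
        # Unseen states start with a zero value for every algorithm.
        self.q: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * n_algorithms)

    def choose(self, state: Hashable) -> int:
        """Epsilon-greedy choice of an algorithm index for the given state."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)
        values = self.q[state]
        return max(range(self.n), key=values.__getitem__)

    def update(self, state: Hashable, action: int, reward: float,
               next_state: Hashable) -> None:
        """Standard one-step Q-learning update."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.lr * (target - self.q[state][action])


# Hypothetical interaction: algorithm 2 helped while the search was stagnating.
selector = AlgorithmSelector(n_algorithms=3, epsilon=0.0)
selector.update("stagnating", 2, 1.0, "improving")
chosen = selector.choose("stagnating")
selector.update("improving", 0, 2.0, "stagnating")
```

The point of the sketch is the Markov decision process framing: the state summarizes solving progress, the action is a library index, and the reward reflects solution improvement.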

Satellite Test Tasks Autonomous Orchestration Based on Task-Coupling Constraints and Time-Bounded Windows

LI Zhen, YU Zhigang, ZHANG Yang, ZHU Xuetian, XIE Ningyu, YANG Fan

2025, 47(12): 4973-4985. doi: 10.11999/JEIT250878

[Abstract](132) [FullText HTML](55) [PDF 2590KB](18)

Abstract:
Objective In recent years, the scale of on-orbit space assets has continued to expand, satellite constellation deployment has accelerated significantly, and the number of satellite launches has increased rapidly. As a result, the demand for on-orbit testing has grown sharply. However, limited ground station availability and scarce visibility arcs severely constrain testing opportunities, giving rise to an increasingly prominent conflict between the large number of satellites and the small number of ground stations under highly limited visibility resources. Traditional satellite mission planning approaches, which rely primarily on manual pre-scheduling, suffer from long decision cycles, low planning efficiency, and high susceptibility to scheduling errors. These limitations make them inadequate for large-scale, multi-task, and highly coupled testing scenarios. Consequently, there is an urgent need to develop efficient automated on-orbit test mission planning technologies to improve the utilization efficiency of satellite-ground visibility arcs. Methods To address these limitations, this paper proposes an automated satellite task orchestration framework to ensure effectiveness and reliability in integrated space-ground systems throughout their lifecycle of construction and operation. A task slider model and a time window model are established, and both general and task-specific orchestration constraints are designed to form a unified constraint paradigm for satellite tasks. A non-convex constraint transformation scheme is further proposed. Using satellite-to-ground link testing as a representative application scenario, an automated task orchestration model is constructed to maximize the number of schedulable tasks under stringent visibility arc constraints while improving the efficiency of visibility arc utilization. 
Results and Discussions Using satellite-to-ground link testing as a representative on-orbit testing scenario, the proposed autonomous orchestration framework is evaluated through simulations with multiple low Earth orbit satellites and limited visibility arcs. The results show that the proposed method schedules testing tasks effectively while strictly satisfying all operational constraints. Compared with traditional heuristic-based algorithms, including genetic algorithms, tabu search, and particle swarm optimization, the proposed approach achieves a significant performance improvement, increasing the total number of scheduled satellite-ground link testing tasks by approximately 1.9 to 2.3 times. The results also indicate that, under highly constrained time windows, the proposed model fully exploits available visibility arcs and avoids resource conflicts, which substantially improves the utilization efficiency of satellite-ground links. Conclusions This paper proposes an autonomous orchestration framework for satellite on-orbit testing tasks under complex coupling constraints and time-bounded visibility windows. By modeling testing subtasks and visibility arcs using task slider and time window abstractions, and by integrating general and task-specific constraints into a unified mixed-integer programming formulation, the proposed method provides an effective solution for large-scale testing task scheduling. Simulation results confirm that the framework outperforms traditional heuristic-based methods in terms of the number of executable testing tasks and visibility arc utilization. The proposed approach provides a practical and extensible scheduling paradigm for future large-scale satellite constellation testing scenarios. Future work will consider additional resource-layer constraints and uncertainty factors to further improve robustness in real-world testing environments.

An Expert Chain Construction and Optimization Method for Satellite Mission Planning

XIA Wei, WEI Hongtu, CHENG Ying, WANG Junting, HU Xiaoxuan

2025, 47(12): 4986-4994. doi: 10.11999/JEIT251018

[Abstract](119) [FullText HTML](67) [PDF 1833KB](15)

Abstract:
Objective Satellite mission planning is a core optimization problem in space resource scheduling. Existing workflows exhibit a semantic gap between business-level natural language requirements and the mathematical models used for planning. In dynamic operational scenarios, model updates, such as constraint modification, parameter recalculation, or task attribute adjustment, rely heavily on human experts. This dependence leads to slow responses, limited adaptability, and high operational costs. To address these limitations, this paper proposes a Large Language Model (LLM)-driven inference framework based on a Chain of Experts (CoE) and a Dynamic Knowledge Enhancement (DKE) mechanism. The framework enables accurate, efficient, and robust modification of satellite mission planning models from natural language instructions. Methods The proposed framework decomposes natural language-driven model modification into a collaborative workflow comprising requirement parsing, task routing, and code generation experts. The requirement parsing expert converts natural language requests into structured modification instructions. The task routing expert assesses task difficulty and dispatches instructions accordingly. The code generation expert produces executable modification scripts for complex, large-scale, or batch operations. To improve accuracy and reduce reliance on manual expert intervention, a DKE mechanism is incorporated. This mechanism adopts a tiered LLM strategy, using a lightweight general model for rapid processing and a stronger reasoning model for complex cases, and constructs a dynamic knowledge base of validated modification cases. Through retrieval-augmented few-shot prompting, historical successful cases are integrated into the reasoning process, enabling continuous self-improvement without model fine-tuning. 
A sandbox environment performs mathematical consistency checks, including constraint completeness, parameter validity, and solution feasibility, before final acceptance of model updates. Results and Discussions Experiments are conducted on a simulated satellite mission planning dataset comprising 100 heterogeneous satellites and 1,000 point targets with different payload types, resolution requirements, and operational constraints. A test set of 100 natural language modification requests with varying complexity is constructed to represent dynamic real-world adjustment scenarios (Table 1). The proposed CoE with DKE framework is evaluated against three baselines: standard prompting with DeepSeek R1, Chain-of-Thought prompting with DeepSeek R1, and standard prompting with GPT-4o. The proposed method achieves an accuracy of 82% with an average response time of 81.28 s, outperforming all baselines in both correctness and efficiency (Table 2). Accuracy increases by 35 percentage points relative to the best-performing baseline, whereas response time decreases by 53.3% (Table 2). Scalability experiments show that the CoE with DKE framework maintains stable response times across small, medium, and large problem instances, whereas baseline methods exhibit significant delays as problem size increases (Table 3). Ablation studies indicate that DKE substantially reduces reliance on high-cost reasoning models, improves the general model’s ability to resolve complex modifications independently, and increases accuracy without sacrificing efficiency (Table 5). Conclusions This paper presents an LLM-powered reasoning framework that integrates a Chain of Experts workflow with a DKE mechanism to bridge the semantic gap between natural language requirements and formal optimization models in satellite mission planning. 
Through layered model collaboration, retrieval-augmented prompting, and sandbox-based mathematical verification, the proposed method achieves high accuracy, fast processing, and strong adaptability to dynamic and complex planning scenarios. Experimental results demonstrate its effectiveness in supporting precise model modification and improving operational intelligence. Future work will extend the framework to multimodal inputs and real-world mission environments to further improve robustness and engineering applicability.
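The retrieval-augmented few-shot prompting step of the DKE mechanism can be sketched as follows. The abstract does not state how historical cases are retrieved, so a simple token-overlap (Jaccard) similarity is assumed here in place of whatever retriever the authors use. The knowledge-base entries, the instruction strings, and all function names are hypothetical.

```python
from typing import List, Tuple

Case = Tuple[str, str]  # (natural-language request, validated modification instruction)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two requests (an assumed retriever)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve(request: str, knowledge_base: List[Case], k: int = 2) -> List[Case]:
    """Return the k validated cases whose requests are most similar to the new one."""
    ranked = sorted(knowledge_base, key=lambda case: jaccard(request, case[0]),
                    reverse=True)
    return ranked[:k]


def build_prompt(request: str, examples: List[Case]) -> str:
    """Place retrieved cases before the new request as few-shot examples."""
    parts = [f"Request: {past}\nInstruction: {instruction}"
             for past, instruction in examples]
    parts.append(f"Request: {request}\nInstruction:")
    return "\n\n".join(parts)


# Hypothetical knowledge base of previously validated modifications.
KNOWLEDGE_BASE: List[Case] = [
    ("increase the minimum elevation angle of satellite 3 to 20 degrees",
     "set_param(sat=3, name='min_elevation', value=20)"),
    ("remove target 17 from the task list", "delete_task(target=17)"),
    ("increase the priority of target 5 to 9",
     "set_attr(target=5, name='priority', value=9)"),
]

new_request = "increase the priority of target 8 to 7"
examples = retrieve(new_request, KNOWLEDGE_BASE, k=2)
prompt = build_prompt(new_request, examples)
```

Because the knowledge base only grows with validated cases, the prompt improves over time without any model fine-tuning, which is the property the abstract emphasizes.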

A Spatio-Temporal Feature Fusion LSTM Relaxation Measurement Method for LEO Satellites

YANG Mengxin, ZHANG Qingting, ZENG Lingxin, GU Yixiao, ZENG Dan, XIA Bin

2025, 47(12): 4995-5004. doi: 10.11999/JEIT251146

[Abstract](109) [FullText HTML](64) [PDF 2540KB](10)

Abstract:
Objective The high dynamics of Low Earth Orbit (LEO) satellite communication systems require frequent link measurements. Existing schemes mainly adopt threshold-based or standard spatio-temporal prediction-based relaxation measurement strategies to mitigate this issue. However, these approaches do not effectively capture the dynamic evolution of the importance of historical data and multiple measurement metrics induced by satellite mobility. Therefore, adaptation to highly time-varying satellite-ground link environments remains limited. To address this problem, a spatio-temporal feature fusion relaxation measurement method based on a Long Short-Term Memory (LSTM) network is proposed for LEO satellite communication. An LSTM recurrent neural network integrated with a dual-attention mechanism is constructed. The LSTM extracts correlations among historical measurement data, whereas temporal attention and variable attention focus on key time instants and significant features, respectively. On this basis, the measurement frequency point set and the number of relaxation periods are jointly predicted. Intelligent link measurement is then performed using the selected frequency point set and relaxation period, enabling adaptive and energy-efficient link monitoring in LEO satellite systems. Methods The proposed spatio-temporal feature fusion LSTM-based relaxation measurement method employs a Dual-Attention LSTM (DA-LSTM) model to reduce measurement overhead while maintaining reliable link monitoring. Historical link quality indicators, including Reference Signal Received Power (RSRP), Reference Signal Received Quality (RSRQ), and Doppler shift, together with satellite ephemeris information, are used as model inputs. These features capture temporal and spatial variations and support the joint prediction of a subset of measurement frequency points and their corresponding relaxation periods. 
Based on the predicted results, the terminal performs adaptive frequency point selection and dynamic relaxation period adjustment or executes full-band measurements with a fixed measurement period. This process enables adaptive and energy-efficient link monitoring while preserving communication performance in LEO satellite systems. Results and Discussions The proposed relaxation measurement method applies the DA-LSTM model to predict measurement frequency points and the number of relaxation periods using historical link quality information. Simulation results show higher convergence efficiency, higher training accuracy, and lower loss for both frequency point selection and relaxation period selection compared with baseline methods (Fig. 4 and Fig. 5). The proposed measurement algorithm achieves an average measurement frequency below 30% with minimal performance degradation (Table 3). This result is attributed to the adaptive selection of high-quality frequency points and dynamic adjustment of the measurement period. The trade-off between measurement frequency and communication performance is further examined (Fig. 6 and Fig. 7), indicating that the proposed method achieves a better balance than baseline methods under different terminal speeds. Additional simulations under different terminal speeds (Fig. 8) and different maximum relaxation periods (Fig. 9) further confirm that high energy efficiency and communication performance are maintained under diverse operational conditions. Conclusions This work addresses the challenge of dynamic spatio-temporal importance evolution caused by satellite mobility, which limits the effectiveness of existing relaxation measurement strategies. A DA-LSTM-based relaxation measurement algorithm is proposed to predict both the measurement frequency point set and the number of relaxation periods by extracting spatio-temporal correlations from historical link quality data. 
Simulation results under various scenarios show that: (1) the proposed algorithm achieves higher convergence efficiency and training accuracy than baseline methods; (2) adaptive selection of high-quality frequency points and dynamic adjustment of relaxation periods maintain a favorable balance between measurement frequency and communication reliability; and (3) the method remains effective across different terminal speeds and maximum relaxation periods, indicating good scalability and robustness in dynamic operational environments. The current study is limited to simulations and does not consider hardware constraints, atmospheric effects, or real-time processing requirements. These factors should be investigated in future work.
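The role of temporal attention in the DA-LSTM abstract, namely weighting key time instants in the measurement history, can be illustrated with a small sketch. The abstract does not give the attention formulation, so plain dot-product scoring followed by a softmax is assumed here. The hidden-state values and the query vector are hypothetical stand-ins for LSTM outputs.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden: List[List[float]],
                       query: List[float]) -> Tuple[List[float], List[float]]:
    """Weight each time step by its dot-product score and return (weights, context)."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, context


# Hypothetical hidden states for three past measurement instants.
HIDDEN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
QUERY = [1.0, 1.0]
weights, context = temporal_attention(HIDDEN, QUERY)
```

Variable attention can be sketched the same way by applying the softmax across input features at one time step instead of across time steps.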

A Deep Reinforcement Learning Enhanced Adaptive Large Neighborhood Search for Imaging Satellite Scheduling

WEI Puyuan, HE Lei

2025, 47(12): 5005-5015. doi: 10.11999/JEIT251009

[Abstract](69) [FullText HTML](31) [PDF 2693KB](8)

Abstract:
Objective The Satellite Scheduling Problem (SSP) is a typical NP-hard combinatorial optimization problem. The objective is to maximize observation benefits or the number of completed tasks under complex physical and operational constraints. Adaptive Large Neighborhood Search (ALNS) is an effective metaheuristic for this class of problems; however, its performance strongly depends on the selection of destroy and repair operators. Traditional ALNS methods usually employ heuristic scoring mechanisms based on historical performance to adjust operator selection probabilities. These mechanisms are sensitive to parameter settings and cannot adapt dynamically to complex state changes during the search process. This study aims to address this limitation and proposes an improved algorithm to enhance ALNS performance for SSP. Methods To achieve this objective, a Deep Reinforcement Learning based Adaptive Large Neighborhood Search algorithm (DR-ALNS) is proposed. The operator selection process is formulated as a Markov Decision Process (MDP). A Deep Reinforcement Learning (DRL) agent is employed to select destroy and repair operators dynamically according to the current solution state at each iteration. Through end-to-end learning, the DRL agent acquires an implicit and efficient operator selection strategy. This strategy guides the search process and improves both global exploration and local exploitation. Experiments are conducted on a standard satellite scheduling test suite, and the results indicate that DR-ALNS outperforms conventional ALNS and other comparison algorithms in solution quality and convergence speed. Results and Discussions To verify the effectiveness of DR-ALNS, experiments are conducted on 100 scenarios selected from the Tianzhi-Cup dataset. These scenarios are classified into small, medium, and large categories based on the number of task strips. 
The experimental results are summarized in Table 4, and detailed comparisons of average scores across scenario types are reported in Table 5. In small scenarios, the average score of DR-ALNS is 0.6% higher than that of the comparison algorithms. In medium scenarios, the average score exceeds that of the second-ranked algorithm by 2.3%. In large scenarios, DR-ALNS outperforms the second-ranked algorithm by 3.8%. Conclusions A DR-ALNS model for the SSP is proposed. By integrating DRL, destroy and repair operator selection and destruction coefficient settings in ALNS are dynamically guided through iterative learning of solution states. This strategy accelerates convergence toward high-quality solutions. Experiments on the Tianzhi-Cup dataset confirm the effectiveness of the proposed method, with clear advantages over A-ALNS and GRILS, particularly in large-scale satellite cluster scheduling. Future studies will evaluate the method in ultra-large-scale scenarios to assess stability and will explore adaptation to dynamic constraints to enhance practical applicability.
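For context, the traditional score-based operator selection that DR-ALNS replaces can be sketched as a roulette wheel with exponentially smoothed weights. This illustrates the conventional ALNS mechanism mentioned in the Objective, not the DRL agent proposed in the paper. The reaction factor, the score values, and the class name are hypothetical.

```python
import random
from typing import List


class OperatorRoulette:
    """Conventional ALNS operator selection driven by historical scores."""

    def __init__(self, n_operators: int, reaction: float = 0.2, seed: int = 0) -> None:
        self.weights = [1.0] * n_operators
        self.reaction = reaction  # how quickly new scores override past performance
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        """Selection probability of each operator, proportional to its weight."""
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def select(self) -> int:
        """Draw one operator index by roulette-wheel sampling."""
        indices = range(len(self.weights))
        return self.rng.choices(indices, weights=self.weights, k=1)[0]

    def update(self, operator: int, score: float) -> None:
        """Exponentially smooth the weight of the operator that was just applied."""
        old = self.weights[operator]
        self.weights[operator] = (1.0 - self.reaction) * old + self.reaction * score


# Hypothetical run: operator 1 found a new best solution and earns a high score.
roulette = OperatorRoulette(n_operators=3, reaction=0.5)
roulette.update(1, 5.0)
probs = roulette.probabilities()
picked = roulette.select()
```

The sensitivity noted in the abstract is visible here: the outcome depends on the hand-tuned reaction factor and score values, and the weights ignore the current solution state.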

A Review of Ground-to-Aerial Cross-View Localization Research

HU Di, YUAN Xia, XU Xiaoqiang, ZHAO Chunxia

2025, 47(12): 5016-5032. doi: 10.11999/JEIT250167

[Abstract](225) [FullText HTML](106) [PDF 4141KB](32)

Abstract:
Significance This paper presents a comprehensive review of ground-to-aerial cross-view localization and systematically organizes representative methods, benchmark datasets, and evaluation metrics. Notably, it is the first review to systematically organize ground-to-aerial cross-view localization algorithms that integrate range sensors, such as Light Detection and Ranging (LiDAR) and millimeter-wave radar, thereby providing new perspectives for subsequent research. Ground-to-aerial cross-view localization has emerged as a key topic in computer vision, aiming to determine the precise pose of ground-based sensors by referencing aerial imagery. This technology is increasingly applied in autonomous driving, unmanned aerial vehicle navigation, intelligent transportation systems, and urban management. Despite substantial progress, ground-to-aerial cross-view localization continues to face major challenges arising from temporal and spatial variations, including seasonal changes, day-night transitions, weather conditions, viewpoint differences, and scene layout changes. These factors require more robust and accurate algorithms to reduce localization errors. This review summarizes the state of the art and provides a forward-looking discussion of challenges and research directions. Progress Ground-to-aerial cross-view localization has advanced rapidly, particularly through the integration of range sensors such as LiDAR and millimeter-wave radar, which has opened new research directions and application scenarios. The development of this field can be divided into several stages. Early studies rely on manually designed features, marking a transition from same-view localization to cross-view geographic localization. With the emergence of deep learning, metric learning, image transformation, and image generation methods are adopted to learn correspondences between images captured from different viewpoints. 
However, many deep learning models exhibit limited robustness to temporal and spatial variations, especially in long-term cross-season scenarios in which visual appearances at the same location differ markedly across seasons. Additionally, the large-scale nature of urban environments presents difficulties for efficient image retrieval and matching. Range sensors provide accurate distance measurements and three-dimensional structural information, which support reliable localization in scenes where visual cues are weak or absent. Nevertheless, effective fusion of range-sensor data and visual data remains challenging because of discrepancies in spatial resolution, sampling frequency, and sensing coverage. Conclusions This paper reviews the evolution of ground-to-aerial cross-view localization technologies and analyzes major technical advances and their driving factors at different stages. From an algorithmic perspective, the main categories of ground-to-aerial cross-view localization methods are systematically discussed to provide a clear theoretical framework and technical overview. The role of benchmark datasets in promoting progress in this field is highlighted by comparing the performance of representative models across datasets, thereby clarifying differences and relative advantages among methods. Although notable progress has been achieved, several challenges persist, including cross-region localization accuracy, precise localization over large-scale aerial imagery, and sensitivity to temporal changes in geographic features. Further research is required to improve the robustness, accuracy, and efficiency of localization systems. Prospects Future research on ground-to-aerial cross-view localization is expected to concentrate on several directions. Greater attention should be paid to transforming range-sensor data into feature representations that align effectively with image features, enabling efficient cross-modal information fusion. 
Multi-branch network architectures, in which different modalities are processed separately and then fused, may support richer feature extraction. Graph-based models may also be explored to capture shared semantics between ground and aerial views and to support information propagation across modalities. In addition, algorithms that adapt to seasonal variation, day-night cycles, and changing weather conditions are required to enhance robustness and localization accuracy. The integration of multi-scale and multi-temporal data may further improve adaptability to spatio-temporal variation, for example through the combination of images with different spatial resolutions or acquisition times. For large-scale urban environments, efficient search and matching strategies remain essential. Parallel computing frameworks may be applied to manage large datasets and accelerate retrieval, whereas algorithmic strategies such as pruning can reduce computational redundancy and improve matching efficiency. Overall, although ground-to-aerial cross-view localization continues to face challenges, it shows substantial potential for further methodological development and practical deployment.

Survey on Intelligent Methods for Large-scale Remote Sensing Satellite Scheduling

DU Yonghao, ZHANG Benkui, WU Jian, CHEN Yingguo, YAN Donglei, YU Haiyan, XING Lining, BAI Baocun

2025, 47(12): 5033-5047. doi: 10.11999/JEIT251038

[Abstract](94) [FullText HTML](50) [PDF 2731KB](24)

Abstract:
Significance Satellite task scheduling is an operational optimization technique. It constructs combinatorial optimization models for space-ground resources and applies operations research and computational intelligence methods to generate task plans, resolve task conflicts and constraints, and maximize satellite utilization efficiency. With the development of large-scale constellations, satellite task scheduling faces several new challenges. (1) The rapid increase in the number of satellites and tasks leads to a combinatorial explosion of the solution space. (2) Satellite applications are shifting from planned operations to on-demand services, which require response times to be reduced from hours to minutes or even seconds. (3) Advances in satellite payload capabilities enable onboard autonomous decision making and in-orbit collaboration, which support interactive and swarm-intelligence-based management of large-scale remote sensing constellations. Progress To address large-scale complexity, constellation collaboration, and on-demand service requirements in satellite task scheduling, recent research developments are reviewed from the perspectives of task scheduling frameworks, task scheduling models, and task scheduling algorithms, following a top-down approach. First, centralized scheduling frameworks, distributed scheduling frameworks, and hybrid centralized-distributed scheduling frameworks are described, and their control paradigms and application scenarios are clarified. Second, task scheduling models are examined according to their theoretical foundations and applicable solution methods, including classical operations research models, constraint satisfaction optimization models, and artificial neural network-based decision-making models. Their modeling approaches and application scopes are discussed in detail. 
Subsequently, three major classes of task scheduling algorithms are summarized, including exact algorithms, metaheuristic algorithms, and machine learning-based algorithms. Their decision-making mechanisms, advantages, and limitations are analyzed. Finally, future research directions are identified, including the reconstruction of large-scale and order-oriented task scheduling frameworks, the development of novel task scheduling models, and the innovative integration of different task scheduling algorithms. Conclusions and prospects At the framework level, task scheduling frameworks for constellations consisting of more than one thousand satellites have not yet been reported. Existing task scheduling frameworks mainly address problems with fewer than 100 satellites, which remains insufficient for large-scale remote sensing constellations with thousands or even tens of thousands of satellites. The hybrid centralized-distributed task scheduling framework combines the advantages of centralized scheduling frameworks and distributed scheduling frameworks and is consistent with the hierarchical construction and management characteristics of satellite constellations. It can further adapt to satellite scale expansion and order-based process mechanisms. At the model level, constraint satisfaction optimization models focus on detailed representations of optimization attributes and elements and are suitable for small-scale and medium-scale satellite task scheduling problems. In contrast, artificial neural network-based decision-making models emphasize classification and decision-making characteristics and support online and on-demand scheduling, which makes them suitable for large-scale satellite task scheduling scenarios. These two types of task scheduling models can therefore be coordinated to characterize different stages of large-scale constellation task scheduling. 
At the algorithm level, the integration of metaheuristic algorithms and machine learning-based algorithms has become an important technical approach for solving satellite task scheduling problems. This integrated approach supports hybrid centralized-distributed task scheduling frameworks and complements both constraint satisfaction optimization models and artificial neural network-based decision-making models.

Vegetation Height Prediction Dataset Oriented to Mountainous Forest Areas

YU Cuilin, ZHONG Zixuan, PANG Hongyi, DING Yusheng, LAI Tao, HUANG Haifeng, WANG Qingsong

2025, 47(12): 5048-5060. doi: 10.11999/JEIT250941

[Abstract](188) [FullText HTML](91) [PDF 6680KB](30)

Abstract:
Objective Vegetation height is a key ecological parameter that reflects forest vertical structure, biomass, ecosystem functions, and biodiversity. Existing open-source vegetation height datasets are often sparse, unstable, and poorly suited to mountainous forest regions, which limits their utility for large-scale modeling. This study constructs the Vegetation Height Prediction Dataset (VHP-Dataset) to provide a standardized large-scale training resource that integrates multi-source remote sensing features and supports supervised learning tasks for vegetation height estimation. Methods The VHP-Dataset is constructed by integrating Landsat 8 multispectral imagery, the digital elevation model AW3D30 (ALOS World 3D, 30 m), land cover data CGLS-LC100 (Copernicus Global Land Service, Land Cover 100 m), and tree canopy cover data GFCC30TC (Global Forest Canopy Cover 30 m Tree Canopy). Canopy height from GEDI L2A (Global Ecosystem Dynamics Investigation, Level 2A) footprints is used as the target variable. A total of 18 input features is extracted, covering spatial location, spectral reflectance, topographic structure, vegetation indices, and vegetation cover information (Table 3, Fig. 4). For model validation, five representative approaches are applied: Extremely Randomized Trees (ExtraTree), Random Forest (RF), Artificial Neural Network (ANN), Broad Learning System (BLS), and Transformer. Model performance is assessed using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Standard Deviation (SD), and Coefficient of Determination (R²). Results and Discussions The experimental results show that the VHP-Dataset supports stable vegetation height prediction across regions and terrain conditions, which reflects its scientific validity and practical applicability. 
Model comparisons indicate that ExtraTree achieves the best performance in most regions, and Transformer performs well in specific areas, which confirms that the dataset is compatible with different approaches (Table 5). Stratified analyses show that prediction errors increase under high canopy cover and steep slope conditions, and predictions remain more stable at higher elevations (Figs. 6–9). These findings indicate that the dataset captures the effects of complex topography and canopy structure on model accuracy. Feature importance analysis shows that spatial location, topographic factors, and canopy cover indices are the primary drivers of prediction accuracy, while spectral and land cover information provide complementary contributions (Fig. 10). Conclusions The results show that the VHP-Dataset supports vegetation height prediction across regions and terrain types, which reflects its scientific validity and applicability. The dataset enables robust predictions with traditional machine learning methods such as tree-based models, and it also provides a foundation for deep learning approaches such as Transformers, which reflects broad methodological compatibility. Stratified analyses based on vegetation cover and terrain show the effects of complex canopy structures and topographic factors on prediction accuracy, and feature importance analysis identifies spatial location, topographic attributes, and canopy cover indices as the primary drivers. Overall, the VHP-Dataset fills the gap in large-scale high-quality datasets for vegetation height prediction in mountainous forests and provides a standardized benchmark for cross-regional model evaluation and comparison. This offers value for research on vegetation height prediction and forest ecosystem monitoring.
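Three of the evaluation metrics named in the VHP-Dataset abstract (RMSE, MAE, and R²) have standard definitions, sketched below for reference. SD is omitted because the abstract does not state which quantity it is computed over. The sample heights are hypothetical and are not taken from the dataset.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical canopy heights in metres: reference footprints vs. model output.
HEIGHT_TRUE = [10.0, 20.0, 30.0, 40.0]
HEIGHT_PRED = [12.0, 18.0, 33.0, 37.0]

sample_rmse = rmse(HEIGHT_TRUE, HEIGHT_PRED)
sample_mae = mae(HEIGHT_TRUE, HEIGHT_PRED)
sample_r2 = r2(HEIGHT_TRUE, HEIGHT_PRED)
```

RMSE penalizes large errors more heavily than MAE, so reporting both helps separate occasional large misses from uniformly moderate error.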

Wireless Communication and Internet of Things

Research on Non-cooperative Interference Suppression Technology for Dual Antennas without Channel Prior Information

YAN Cheng, LI Tong, PAN Wensheng, DUAN Baiyu, SHAO Shihai

2025, 47(12): 5061-5070. doi: 10.11999/JEIT250378

[Abstract](153) [FullText HTML](69) [PDF 3612KB](28)

Abstract:
Objective In electronic countermeasures, friendly communication links are vulnerable to interference from adversaries. The auxiliary antenna scheme is employed to extract reference signals for interference cancellation, which improves communication quality. Although the auxiliary antenna is designed to capture interference signals, it often receives communication signals at the same time, and this reduces the suppression capability. Typical approaches for non-cooperative interference suppression include interference rejection combining and spatial domain adaptive filtering. These approaches rely on the uncorrelated nature of the interference and desired signals to achieve suppression. They also require channel information and interference noise information, which restricts their applicability in some scenarios. Methods This paper proposes the Fast ICA-based Simulated Annealing Algorithm for SINR Maximization (FSA) to address non-cooperative interference suppression in communication systems. Designed for scenarios without prior channel information, FSA applies a weighted reconstruction cancellation method implemented through a Finite Impulse Response (FIR) filter. The method operates in a dual-antenna system in which one antenna supports communication and the other provides an auxiliary reference for interference. Its central innovation is the optimization of weighted reconstruction coefficients using the Simulated Annealing algorithm, together with Fast Independent Component Analysis (Fast ICA) for SINR estimation. The FIR filter reconstructs interference from the auxiliary antenna signal using optimized coefficients and then subtracts this reconstructed interference from the main received signal to improve communication quality. Accurate SINR estimation in non-cooperative settings is difficult because the received signals contain mixed components. 
FSA addresses this through blind source separation based on Fast ICA, which extracts sample signals of both communication and interference components. SINR is then calculated from cross-correlation results between these separated signals and the signals after interference suppression. The Simulated Annealing algorithm functions as a probabilistic optimization process that adjusts reconstruction coefficients to maximize the output SINR. Starting from initial coefficients, the algorithm perturbs them and evaluates the resulting SINR. Using the Monte Carlo acceptance rule, it allows occasional acceptance of perturbations that do not yield immediate improvement, which supports escape from local optima and promotes convergence toward global solutions. This iterative process identifies optimal filter coefficients within the search range. The combined use of Fast ICA and Simulated Annealing enables interference suppression without prior channel information. By pairing blind estimation with robust optimization, the method provides reliable performance in dynamic interference environments. The FIR-based structure offers a practical basis for real-time interference cancellation. FSA is therefore suitable for electronic countermeasure applications where channel conditions are unknown and change rapidly. This approach advances beyond conventional techniques that require channel state information and offers improved adaptability in non-cooperative scenarios while maintaining computational efficiency through the combined use of blind source separation and intelligent optimization. Results and Discussions The performance of the proposed FSA is assessed through simulations and experiments. The output SINR is improved under varied conditions. In simulations, a maximum SINR improvement of 27.2 dB is achieved when the communication and auxiliary antennas have a large SINR difference and are placed farther apart (Fig. 5). 
The performance is reduced when the channel correlation between the antennas increases. Experimental results confirm these observations, and an SINR improvement of 19.6 dB is measured at a 2 m antenna separation (Fig. 7). The method is shown to be effective for non-cooperative interference suppression without prior channel information, although its performance is affected by antenna configuration and channel correlation. Conclusions The proposed FSA method provides an effective solution for non-cooperative interference suppression in communication systems. The method applies weighted reconstruction cancellation optimized by the Simulated Annealing algorithm and uses Fast ICA-based SINR estimation to improve communication quality without prior channel information. The results from simulations and experiments show that the method performs well across varied conditions and has potential for practical electronic warfare applications. The study finds that the performance of the FSA method depends on the SINR difference and the channel correlation between the communication and auxiliary antennas. Future research focuses on refining the algorithm for more complex scenarios and examining the effect of system parameters on its performance. These findings support the development of communication systems that operate reliably in challenging interference environments.
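
The coefficient search described above can be sketched as follows. This is only an illustration under strong assumptions: the acceptance rule is taken to be the standard Metropolis criterion, the auxiliary antenna is assumed to receive pure interference, and the SINR objective uses a known desired signal as a stand-in for the blind Fast ICA-based estimate in the paper. All function names, signal models, and parameter values are hypothetical.

```python
import math
import random

def fir_filter(taps, x):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    return [sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]

def output_sinr_db(taps, main, aux, desired):
    """Illustrative SINR in dB after subtracting the reconstructed interference.

    Assumption: the desired signal is known here; the paper estimates SINR blindly.
    """
    recon = fir_filter(taps, aux)
    out = [m - r for m, r in zip(main, recon)]
    signal_power = sum(d * d for d in desired)
    residual_power = sum((o - d) ** 2 for o, d in zip(out, desired))
    return 10.0 * math.log10(signal_power / (residual_power + 1e-12))

def anneal(objective, init, step=0.2, t0=1.0, cooling=0.995, iters=3000, seed=0):
    """Maximize `objective` over real coefficients by simulated annealing."""
    rng = random.Random(seed)
    cur, cur_val = list(init), objective(init)
    best, best_val = list(cur), cur_val
    temp = t0
    for _ in range(iters):
        cand = [c + rng.uniform(-step, step) for c in cur]
        val = objective(cand)
        # Metropolis rule: keep improvements, sometimes keep worse candidates.
        if val >= cur_val or rng.random() < math.exp((val - cur_val) / temp):
            cur, cur_val = cand, val
            if cur_val > best_val:
                best, best_val = list(cur), cur_val
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_val

# Synthetic two-tap interference channel (hypothetical values).
rng = random.Random(1)
desired = [math.sin(0.3 * n) for n in range(200)]
interference = [rng.gauss(0.0, 1.0) for _ in range(200)]
aux = interference
main = [d + 0.8 * interference[n] + (0.3 * interference[n - 1] if n > 0 else 0.0)
        for n, d in enumerate(desired)]
objective = lambda taps: output_sinr_db(taps, main, aux, desired)
initial_sinr = objective([0.0, 0.0])
best_taps, best_sinr = anneal(objective, [0.0, 0.0])
```

In this toy setting the search should drift toward the taps that generated the interference (0.8 and 0.3), since those cancel it exactly; in the non-cooperative case the objective would instead come from the separated components.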

Design and Optimization for Orbital Angular Momentum-based Wireless-powered NOMA Communication System

CHEN Ruirui, CHEN Yu, RAN Jiale, SUN Yanjing, LI Song

2025, 47(12): 5071-5081. doi: 10.11999/JEIT250634

[Abstract](150) [FullText HTML](70) [PDF 1550KB](25)

Abstract:
Objective The Internet of Things (IoT) requires not only interconnection among devices but also seamless connectivity among users, information, and things. Ensuring stable operation and extending the lifespan of IoT Devices (IDs) through continuous power supply have become urgent challenges in IoT-driven Sixth-Generation (6G) communications. Radio Frequency (RF) signals can simultaneously transmit information and energy, forming the basis for Simultaneous Wireless Information and Power Transfer (SWIPT). Non-Orthogonal Multiple Access (NOMA), a key technology in Fifth-Generation (5G) communications, enables multiple users to share the same time and frequency resources. Efficient wireless-powered NOMA communication requires a Line-of-Sight (LoS) channel. However, the strong correlation in LoS channels severely limits the degree of freedom, making it difficult for conventional spatial multiplexing to achieve capacity gains. To address this limitation, this study designs an Orbital Angular Momentum (OAM)-based wireless-powered NOMA communication system. By exploiting OAM mode multiplexing, multiple data streams can be transmitted independently through orthogonal OAM modes, thereby significantly enhancing communication capacity in LoS channels. Methods The OAM-based wireless-powered NOMA communication system is designed to enable simultaneous energy transfer and multi-channel information transmission for IDs under LoS conditions. Under the constraints of the communication capacity threshold and the harvested energy threshold, this study formulates a sum-capacity maximization problem by converting harvested energy into the achievable uplink information capacity. The optimization problem is decomposed into two subproblems. A closed-form expression for the optimal Power-Splitting (PS) factor is derived, and the optimal power allocation is obtained using the subgradient method. 
The transmitting Uniform Circular Array (UCA) employs the Movable Antenna (MA) technique to adjust both position and array angle. To maintain system performance under typical parallel misalignment conditions, a beam-steering method is investigated. Results and Discussions Simulation results demonstrate that the proposed OAM-based wireless-powered NOMA communication system effectively enhances capacity performance compared with conventional wireless communication systems. As the OAM mode increases, the sum capacity of the ID decreases. This occurs because higher OAM modes exhibit stronger hollow divergence characteristics, resulting in greater energy attenuation of the received OAM signals (Fig. 3). The sum capacity of the ID increases with the PS factor (Fig. 4). However, as the harvested energy threshold increases, the system’s sum capacity decreases (Fig. 5). When the communication capacity threshold increases, the sum capacity first rises and then gradually declines (Fig. 6). In power allocation optimization, allocating more power to the ID with the best channel condition further improves the total system capacity. Conclusions To enhance communication capacity under LoS conditions, this study designs an OAM-based wireless-powered NOMA communication system that employs mode multiplexing to achieve independent multi-channel information transmission. On this basis, a sum-capacity maximization problem is formulated under communication capacity and harvested energy threshold constraints by transforming harvested energy into achievable uplink information capacity. The optimization problem is decomposed into two subproblems. A closed-form expression for the optimal PS factor is derived, and the optimal power allocation is obtained using the subgradient method. 
In future work, the MA technique will be integrated into the proposed OAM-based wireless-powered NOMA system to further optimize sum-capacity performance based on the three-dimensional spatial configuration and adjustable array angle.

Multi-modal Joint Automatic Modulation Recognition Method Towards Low SNR Sequences

WANG Zhen, LIU Wei, LU Wanjie, NIU Chaoyang, LI Runsheng

2025, 47(12): 5082-5093. doi: 10.11999/JEIT250594

[Abstract](446) [FullText HTML](156) [PDF 7826KB](79)

Abstract:
Objective The rapid evolution of data-driven intelligent algorithms and the rise of multi-modal data indicate that the future of Automatic Modulation Recognition (AMR) lies in joint approaches that integrate multiple domains, use multiple frameworks, and connect multiple scales. However, the embedding spaces of different modalities are heterogeneous, and existing models lack cross-modal adaptive representation, limiting their ability to achieve collaborative interpretation. To address this challenge, this study proposes a performance-interpretable two-stage deep learning-based AMR (DL-AMR) method that jointly models the signal in the time and transform domains. The approach explicitly and implicitly represents signals from multiple perspectives, including temporal, spatial, frequency, and intensity dimensions. This design provides theoretical support for multi-modal AMR and offers an intelligent solution for modeling low Signal-to-Noise Ratio (SNR) time sequences in open environments. Methods The proposed AMR network begins with a preprocessing stage, where the input signal is represented as an in-phase and quadrature (I-Q) sequence. After wavelet thresholding denoising, the signal is converted into a dual-channel representation, with one channel undergoing Short-Time Fourier transform (STFT). This preprocessing yields a dual-stream representation comprising both time-domain and transform-domain signals. The signal is then tokenized through time-domain and transform-domain encoders. In the first stage, explicit modal alignment is performed. The token sequences from the time and transform domains are input in parallel into a contrastive learning module, which explicitly captures and strengthens correlations between the two modalities in dimensions such as temporal structure and amplitude. The learned features are then passed into the feature fusion module. 
Bidirectional Long Short-Term Memory (BiLSTM) and local representation layers are employed to capture temporally sparse features, enabling subsequent feature decomposition and reconstruction. To refine feature extraction, a subspace attention mechanism is applied to the high-dimensional sparse feature space, allowing efficient capture of discriminative information contained in both high-frequency and low-frequency components. Finally, Convolutional Neural Network—Kolmogorov-Arnold Network (CNN-KAN) layers replace traditional multilayer perceptrons as classifiers, thereby enhancing classification performance under low SNR conditions. Results and Discussions The proposed method is experimentally validated on three datasets: RadioML2016.10a, RadioML2016.10b, and HisarMod2019.1. Under high SNR conditions (SNR > 0 dB), classification accuracies of 93.36%, 93.13%, and 93.37% are achieved on the three datasets, respectively. Under low SNR conditions, where signals are severely corrupted or blurred by noise, recognition performance decreases but remains robust. When the SNR ranges from –6 dB to 0 dB, overall accuracies of 78.36%, 80.72%, and 85.43% are maintained, respectively. Even at SNR levels below –6 dB, accuracies of 17.10%, 21.30%, and 29.85% are obtained. At particularly challenging low-SNR levels, the model still achieves 43.45%, 44.54%, and 60.02%. Compared with traditional approaches, and while maintaining a low parameter count (0.33～0.41 M), the proposed method improves average recognition accuracy by 2.12%～7.89%, 0.45%～4.64%, and 6.18%～9.53% on the three datasets. The improvements under low SNR conditions are especially significant, reaching 4.89%～12.70% (RadioML2016.10a), 2.62%～8.72% (RadioML2016.10b), and 4.96%～11.63% (HisarMod2019.1). 
The results indicate that explicit modeling of time–transform domain correlations through contrastive learning, combined with the hybrid architecture consisting of LSTM for temporal sequence modeling, CNN for local feature extraction, and KAN for nonlinear approximation, substantially enhances the noise robustness of the model. Conclusions This study proposes a two-stage AMR method based on time-transform domain multimodal fusion. Explicit multimodal alignment is achieved through contrastive learning, while temporal and local features are extracted using a combination of LSTM and CNN. The KAN is used to enhance nonlinear modeling, enabling implicit feature-level multimodal fusion. Experiments conducted on three benchmark datasets demonstrate that, compared with classical methods, the proposed approach improves recognition accuracy by 2.62%～11.63% within the SNR range of –20 to 0 dB, while maintaining a similar number of parameters. The performance gains are particularly significant under low-SNR conditions, confirming the effectiveness of multimodal joint modeling for robust AMR.

Entangled Optical Quantum Transmission Distance Matrix Construction For Dynamic Target Localization

ZHOU Mu, WANG Min, CAO Jingyang, HE Wei

2025, 47(12): 5094-5105. doi: 10.11999/JEIT250020

[Abstract](133) [FullText HTML](72) [PDF 7908KB](18)

Abstract:
Objective Quantum information research has grown rapidly with the integration of quantum mechanics, information science, and computer science. Grounded in principles such as quantum superposition and quantum entanglement, quantum information technology can overcome the limitations of traditional approaches and address problems that classical information technologies and conventional computers cannot resolve. As a core technology, space-based quantum information technology has advanced quickly, offering new possibilities to overcome the performance bottlenecks of conventional positioning systems. However, existing quantum positioning methods mainly focus on stationary targets and have difficulty addressing the dynamic variations in the transmission channels of entangled photon pairs caused by particles, scatterers, and noise photons in the environment. These factors hinder the detection of moving targets and increase positioning errors because of reduced data acquisition at fixed points during target motion. Traditional wireless signal-based localization methods also face challenges in dynamic target tracking, including signal attenuation, multipath effects, and noise interference in complex environments. To address these limitations, a dynamic target localization method based on constructing an optical quantum transmission distance matrix is proposed. This method achieves high-precision and robust dynamic localization, meeting the requirements for moving target localization in practical scenarios. It provides centimeter-level positioning accuracy and significantly enhances the adaptability and stability of the system for moving targets, supporting the future practical application of quantum-based dynamic localization technology. Methods To improve the accuracy of the dynamic target localization system, a dynamic threshold optical quantum detection model based on background noise estimation is proposed, utilizing the characteristics of optical quantum echo signals. 
A dynamic target localization optical path is established in which two entangled optical signals are generated through the Spontaneous Parametric Down-Conversion (SPDC) process. One signal is retained as a reference in a local Single-Photon Detector (SPD), and the other is transmitted toward the moving target as the signal light. The optical quantum echo signals are analyzed, and the background noise is estimated using a coincidence counting algorithm. The detection threshold is then dynamically adjusted and compared with the signals from the detection unit, enabling rapid detection of dynamic targets. To accommodate variations in quantum echo signals caused by target motion, an adaptive optical quantum grouping method based on velocity measurement is introduced. The time pulse sequence is initially coarsely grouped to calculate the rough velocity of the target. The grouping size is subsequently adjusted according to the target’s speed, updating the time grouping sequence and further optimizing the distance measurement accuracy to generate an updated velocity matrix. The photon transmission distance matrix is refined using the relative velocity error matrix. By constructing a system of equations involving the coordinates of the light source, the optical quantum transmission distance matrix, and the dynamic target coordinate sequence, the target position is estimated through the least squares method. This approach improves localization accuracy and effectively reduces errors arising from target motion. Results and Discussions The effectiveness of the proposed method is verified through both simulations and experimental validation on a practical measurement platform. The experimental results demonstrate that the dynamic threshold detection approach based on background noise estimation achieves high-sensitivity detection performance (Fig. 7). When a moving target enters the detection range, rapid identification is realized, enabling subsequent dynamic localization. 
The adaptive grouping method based on velocity measurement significantly improves the performance of the quantum dynamic target localization system. Through grouped coincidence counting, the problem of blurred coincidence counting peaks caused by target movement is effectively mitigated (Fig. 8), achieving high-precision velocity measurement (Table 1) and reducing localization errors associated with motion. Centimeter-level positioning accuracy is attained (Fig. 9). Furthermore, an entangled optical quantum experimental platform is established, with analyses focusing on measurement results under different velocities and localization performances across various methods. The findings confirm the reliability and adaptability of the proposed approach in improving distance measurement accuracy (Fig. 12). Conclusions A novel method for dynamic target localization in entangled optical quantum dynamics is proposed based on constructing an optical quantum transmission distance matrix. The method enhances distance measurement accuracy and optimizes the overall positioning accuracy of the localization system through a background noise estimation-based dynamic threshold detection model and a velocity measurement-based adaptive grouping approach. By integrating the optical quantum transmission distance matrix with the least squares optimization method, the proposed framework offers a promising direction for achieving more precise quantum localization systems and demonstrates strong potential for real-time dynamic target tracking. This approach not only improves the accuracy of dynamic quantum localization systems but also broadens the applicability of quantum localization technology in complex environments. It is expected to provide solid support for real-time quantum dynamic target localization and find applications in intelligent health monitoring, the Internet of Things, and autonomous driving.
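
The final least squares step can be illustrated with a minimal two-dimensional sketch for a single time instant. It assumes known light source coordinates and one measured transmission distance per source, and it linearizes the range equations by subtracting the first one; the abstract does not state the exact formulation, so the function name `locate_least_squares` and the geometry below are hypothetical.

```python
def locate_least_squares(sources, distances):
    """Estimate (x, y) from source coordinates and measured distances.

    Subtracting the first range equation from the others gives linear
    equations a1*x + a2*y = b, solved here through the normal equations.
    """
    (x0, y0), d0 = sources[0], distances[0]
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (xi, yi), di in zip(sources[1:], distances[1:]):
        a1, a2 = 2.0 * (xi - x0), 2.0 * (yi - y0)
        b = d0 ** 2 - di ** 2 + xi ** 2 + yi ** 2 - x0 ** 2 - y0 ** 2
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * b
        t2 += a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("sources are collinear; position is not identifiable")
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

# Hypothetical layout: four sources at the corners of a 4 m square, target at (1, 2).
sources = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
target = (1.0, 2.0)
distances = [((target[0] - x) ** 2 + (target[1] - y) ** 2) ** 0.5 for x, y in sources]
estimate = locate_least_squares(sources, distances)
```

With noise-free distances the estimate coincides with the target; repeating the solve for each row of the distance matrix would yield the coordinate sequence of a moving target.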

Radar, Sonar, Navigation and Array Signal Processing

Multi-Channel Switching Array DOA Estimation Algorithm Based on FRIDA

CHEN Tao, XI Haolin, ZHAN Lei, YU Yuwei

2025, 47(12): 5106-5115. doi: 10.11999/JEIT250350

[Abstract](105) [FullText HTML](56) [PDF 3695KB](8)

Abstract:
Objective With the increasing complexity of electromagnetic environments, the demand for higher estimation accuracy in practical direction-finding systems is rising. Enlarging the antenna array is an effective approach to improve estimation accuracy; however, it also significantly increases system complexity. This study aims to reduce the number of channels required while preserving the Direction-Of-Arrival (DOA) estimation performance achievable with full-channel data. By combining the channel compression algorithm, which reduces channel usage, with the time-modulated array structure that incorporates RF front-end switches, this paper proposes a multi-channel switching array DOA estimation algorithm based on FRIDA. Methods The algorithm introduces a selection matrix composed of switches between the antenna array and the channels. This matrix directs the signal received by a selected antenna into the corresponding channel, thereby enabling a specific subarray to capture the data. By switching across different subarrays, multiple reduced-channel received data covariance matrices are collected. To ensure phase consistency within these covariance matrices, common array elements are specified for each subarray. After weighted summation, these covariance matrices are combined to restore the dimensionality of the covariance matrix, producing the total covariance matrix. Next, the elements of the total covariance matrix that correspond to identical array-element spacings are weighted and summed, yielding the full-channel received data vector. Using this vector, an FRI reconstruction model is established. Finally, the incident angle is estimated through the combination of the proximal gradient descent algorithm and the parameter recovery algorithm. 
Results and Discussions Simulation results of DOA estimation for SA-FRI under multiple source incidence demonstrate that the full-channel received data vectors reconstructed from multiple covariance matrices of reduced-channel data can successfully discriminate multi-source incident signals, achieving performance comparable to that of full-channel data (Fig. 2). Further simulations evaluating estimation accuracy with varying numbers of snapshots and Signal-to-Noise Ratios (SNRs) show that the accuracy of the proposed algorithm improves with increasing snapshots and SNR. Under identical conditions, the use of more channels yields higher DOA estimation accuracy (Figs. 3 and 4). Comparisons of five different algorithms under varying SNRs and snapshot numbers indicate that estimation accuracy increases with both parameters. The proposed algorithm consistently outperforms the other algorithms under the same conditions (Figs. 5 and 6). Finally, verification with measured data produces results consistent with the simulations (Fig. 9), further confirming the effectiveness of the proposed algorithm. Conclusions To address the challenge of reducing the number of channels in practical DOA estimation systems, this study proposes an array-switching DOA estimation method based on proximal gradient descent. The algorithm first reduces channel usage through a switching matrix, then generates multiple covariance matrices by sequentially switching different subarray access channels. These covariance matrices are combined to reconstruct the full-channel received data covariance matrix. Finally, the DOA parameters of incident signals are estimated using the proximal gradient descent algorithm. Simulation results confirm that the proposed algorithm achieves reduced channel usage while maintaining reliable estimation accuracy. 
Moreover, validation with measured data collected from an actual DOA estimation system demonstrates results consistent with the simulations, further verifying the algorithm’s effectiveness.
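
The step that merges covariance entries with identical array-element spacing can be sketched as below. The sketch assumes a uniform linear array and uniform weights, whereas the paper's weights are not given in the abstract; the function name `lag_average` and the test scenario are hypothetical.

```python
import cmath
import math

def lag_average(cov):
    """Average covariance entries cov[i][j] that share the same spacing i - j.

    Returns a vector of length 2n - 1 ordered from lag -(n - 1) to lag n - 1.
    """
    n = len(cov)
    out = []
    for lag in range(-(n - 1), n):
        vals = [cov[i][i - lag] for i in range(n) if 0 <= i - lag < n]
        out.append(sum(vals) / len(vals))
    return out

# Noise-free covariance of one source on a half-wavelength ULA (hypothetical).
n_elements = 4
sin_theta = math.sin(math.radians(20.0))
steering = [cmath.exp(1j * math.pi * i * sin_theta) for i in range(n_elements)]
cov = [[a * b.conjugate() for b in steering] for a in steering]
lag_vector = lag_average(cov)
```

For a single noise-free source each averaged entry equals exp(jπ·lag·sin θ), which is the sum-of-exponentials structure that a finite-rate-of-innovation reconstruction exploits.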

3D Localization Method with Uniform Circular Array Driven by Complex Subspace Neural Network

JIANG Wei, ZHI Boxin, YANG Junjie, WANG Hui, DING Pengfei, ZHANG Zheng

2025, 47(12): 5116-5125. doi: 10.11999/JEIT250395

[Abstract](127) [FullText HTML](70) [PDF 4168KB](13)

Abstract:
Objective High-precision indoor localization is increasingly required in intelligent service scenarios, yet existing techniques continue to face difficulties in complex environments where signal frequency offset, multipath propagation, and noise interfere with accuracy. To address these limitations, a 3D localization method using a Uniform Circular Array (UCA) driven by a Complex Subspace Neural Network (CSNN) is proposed to improve accuracy and robustness under challenging conditions. Methods The proposed method establishes a complete localization pipeline based on a hierarchical signal processing framework that includes frequency offset compensation, three-dimensional angle estimation, and spatial mapping (Fig. 2). A dual-estimation frequency compensation algorithm is first designed. The frequency offsets during the Constant Tone Extension (CTE) reference period and sample period are estimated separately, and the estimate obtained from the reference period is used to resolve ambiguity in the antenna sample period, which enables high-precision frequency compensation. The CSNN is then constructed to estimate the two-dimensional angle (Fig. 3). Within this framework, a Complex-Valued Convolutional Neural Network (CVCNN) (Fig. 4) is introduced to calibrate the covariance matrix of the received signals, which suppresses correlated noise and multipath interference. Based on the theory of mode-space transformation, the calibrated covariance matrix is projected onto a virtual Uniform Linear Array (ULA). The azimuth and elevation angles are jointly estimated by the ESPRIT algorithm. The estimated angles from three Access Points (APs) are subsequently fused to obtain the final position estimate. Results and Discussions Experiments are conducted to evaluate the performance of the proposed method. 
For frequency offset suppression, the dual-estimation frequency compensation algorithm markedly reduces the effect on angle estimation, improving estimation accuracy by 91.7% compared with uncorrected data and showing clear improvement over commonly used approaches (Fig. 6). For angle estimation, the CSNN achieves reductions of more than 40% in azimuth error and 25% in elevation error compared with the MUSIC algorithm under simulation conditions (Fig. 7), and verifies the capability of the CVCNN module to suppress various interferences. In practical experiments, the CSNN achieves an average azimuth error of 1.07° and an average elevation error of 1.28° in the training scenario (Table 1, Fig. 10). Generalization experiments conducted in three indoor environments (warehouse, corridor, and office) show that the average angular errors remain low at 2.78° for azimuth and 3.39° for elevation (Table 2, Fig. 11). The proposed method further maintains average positioning accuracies of 28.9 cm in 2D and 36.5 cm in 3D after cross-scene migration (Table 4, Fig. 13). Conclusions The proposed high-precision indoor localization method integrates dual-estimation frequency compensation, the CSNN angle estimation algorithm, and three-AP cooperative localization. It demonstrates strong performance in both simulation and real-environment experiments. The method also maintains stable cross-scene adaptability and accuracy that meet the requirements of high-precision indoor localization.
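
The fusion of angle estimates from three APs can be illustrated as a least squares intersection of three bearing lines. This is a sketch under assumptions rather than the paper's fusion rule: elevation is taken as measured from the horizontal plane, all APs are weighted equally, and the names `direction`, `solve3`, and `fuse_bearings` as well as the AP layout are hypothetical.

```python
import math

def direction(azimuth_deg, elevation_deg):
    """Unit vector for an azimuth/elevation pair (elevation from the horizontal)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def solve3(m, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fuse_bearings(ap_positions, angles_deg):
    """Point minimizing the summed squared distance to all bearing lines.

    Each line passes through an AP along its estimated direction u, and the
    normal equations are sum(I - u u^T) x = sum((I - u u^T) p).
    """
    m = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, (az, el) in zip(ap_positions, angles_deg):
        u = direction(az, el)
        for i in range(3):
            for j in range(3):
                proj = (1.0 if i == j else 0.0) - u[i] * u[j]
                m[i][j] += proj
                b[i] += proj * p[j]
    return solve3(m, b)

# Hypothetical ceiling-mounted APs (metres) and a tag at (2, 3, 1).
aps = [(0.0, 0.0, 3.0), (6.0, 0.0, 3.0), (3.0, 6.0, 3.0)]
tag = (2.0, 3.0, 1.0)
angles = []
for px, py, pz in aps:
    dx, dy, dz = tag[0] - px, tag[1] - py, tag[2] - pz
    angles.append((math.degrees(math.atan2(dy, dx)),
                   math.degrees(math.atan2(dz, math.hypot(dx, dy)))))
position = fuse_bearings(aps, angles)
```

With exact angles the three lines meet at the tag, so the solution reproduces its coordinates; with angle errors of a few degrees the same solve returns the point closest to all three lines.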

Performance Analysis of Spatial-Reference-Signal-Based Digital Interference Cancellation Systems

XIN Yedi, HE Fangmin, GE Songhu, XING Jinling, GUO Yu, CUI Zhongpu

2025, 47(12): 5126-5136. doi: 10.11999/JEIT250679

[Abstract](111) [FullText HTML](86) [PDF 2336KB](23)

Abstract:
Objective With the rapid development of wireless communications, an increasing number of transceivers are deployed on platforms with limited spatial and spectral resources. Restrictions in frequency and spatial isolation cause high-power local transmitters to couple signals into nearby high-sensitivity receivers, causing co-site interference. Interference cancellation serves as an effective mitigation technique, whose performance depends on precise acquisition of a reference signal representing the interference waveform. Compared with digital sampling, Radio Frequency (RF) sampling enables simpler implementation. However, existing RF-based approaches are generally restricted to low-power communication systems. In high-power RF systems, RF sampling faces critical challenges, including excessive sampling power loss and high integration complexity. Therefore, developing new sampling methods and cancellation architectures suitable for high-power RF systems is of substantial theoretical and practical value. Methods To overcome the limitations of conventional high-power RF interference sampling methods based on couplers, a spatial-reference-based digital cancellation architecture is proposed. A directional sampling antenna and its associated link are positioned near the transmitter to acquire the reference signal. This configuration, however, introduces spatial noise, link noise, and possible multipath effects, which can degrade cancellation performance. A system model is developed, and closed-form expressions for the cancellation ratio under multipath conditions are derived. The validity of these expressions is verified through Monte Carlo simulations using three representative modulated signals. Furthermore, a systematic analysis is conducted to evaluate the effects of key system parameters on cancellation performance. 
Results and Discussions Based on the proposed spatial-reference-based digital cancellation architecture, analytical expressions for the cancellation ratio are derived and validated through extensive simulations. These expressions enable systematic evaluation of the key performance factors. For three representative modulation schemes, the cancellation ratio shows excellent consistency between theoretical predictions and simulation results under various conditions, including receiver and sampling channel Interference-to-Noise Ratios (INRs), time-delay mismatch errors, and filter tap numbers (Figs. 2～4). The established theoretical framework is further applied to analyze the effects of system parameters. Simulations quantitatively assess (1) the influence of filter tap number, multipath delay spread, and the number of multipaths on cancellation performance in multipath environments (Figs. 5～7), and (2) the upper performance bounds and contour characteristics under different INR combinations in the receiver and sampling channels (Figs. 8～9). Conclusions To reduce the high deployment complexity and substantial insertion loss associated with coupler-based RF interference sampling in high-power systems, a digital interference cancellation architecture based on spatial reference signals is proposed. Closed-form expressions and performance bounds for the cancellation ratio of rectangular band-limited interference under multipath conditions are derived. Simulation results demonstrate that the proposed expressions provide high accuracy in representative scenarios. Based on the analytical findings, the effects of key parameters are examined, including INRs in receiver and sampling channels, filter tap length, multipath delay spread, number of paths, and time-delay mismatch. The results provide practical insights that support the design and optimization of spatial reference-based digital interference cancellation systems.

Detection of Underwater Acoustic Transient Signals under Alpha Stable Distribution Noise

CHEN Wen, ZOU Nan, ZHANG Guangpu, LI Yanhe

2025, 47(12): 5137-5145. doi: 10.11999/JEIT250500

[Abstract](181) [FullText HTML](101) [PDF 2748KB](46)

Abstract:
Objective Transient signals are generated during changes in the state of underwater acoustic targets and are difficult to suppress or remove. Therefore, they serve as an important basis for covert detection of underwater targets. Practical marine noise exhibits non-Gaussian behavior with impulsive components, which degrade or disable conventional Gaussian-based detectors, including energy detection commonly used in engineering systems. Existing approaches apply nonlinear processing or fractional lower-order statistics to mitigate non-Gaussian noise, yet they face drawbacks such as signal distortion and increased computational cost. To address these issues, an Alpha-stable noise model is adopted. A Data-Preprocessing denoising Short-Time Cross-Correntropy Detection (DP-STCCD) method is proposed to enable passive detection and Time-of-Arrival (ToA) estimation for unknown deterministic transient signals in non-Gaussian underwater environments. Methods The method consists of two stages: data-preprocessing denoising and short-time cross-correntropy detection. In the preprocessing stage, an outlier detection approach based on the InterQuartile Range (IQR) is used. Upper and lower thresholds are calculated to remove impulsive spikes while retaining local signal structure. Multi-stage filtering is then applied to further reduce noise. Median filtering reconstructs the signal with limited detail loss, and modified mean filtering suppresses remaining spikes by discarding extreme values within local windows. In the detection stage, the denoised signal is divided into short frames. Short-time cross-correntropy with a Gaussian kernel is calculated between adjacent frames to form the detection statistic. A first-order recursive filter estimates background noise and determines adaptive thresholds. Detection outputs are generated using joint amplitude-width decision rules. ToA estimation is performed by locating peaks in the short-time cross-correntropy. 
The method does not require prior noise information and improves robustness in non-Gaussian environments through data cleaning and information-theoretic feature extraction. Results and Discussions Simulations under symmetric Alpha-stable noise verify the effectiveness of the method. The preprocessing stage removes impulsive spikes while preserving key temporal features (Fig. 3). After denoising, the performance of energy detection shows partial recovery, and the peak-to-average ratio of short-time cross-correntropy features increases by 10 dB (Fig. 4, Fig. 5). Experimental results show that DP-STCCD provides higher detection probability and improved ToA estimation accuracy compared with Data-Preprocessing denoising Energy Detection (DP-ED). Under conditions with characteristic index α=1.5 and a Generalized Signal-to-Noise Ratio (GSNR) of –11 dB, DP-STCCD yields a 30.2% improvement in detection probability and an 18.4% increase in ToA estimation precision relative to DP-ED (Fig. 6, Fig. 9(a)). These findings confirm the robustness and detection capability of the proposed approach in non-Gaussian underwater noise environments. Conclusions A joint detection method, DP-STCCD, combining data-preprocessing denoising and short-time cross-correntropy features is proposed for transient signal detection under Alpha-stable noise. Preprocessing approaches based on IQR outlier detection and multi-stage filtering suppress impulsive interference while preserving key time-domain characteristics. Short-time cross-correntropy improves detection sensitivity and ToA estimation accuracy. The results show that the proposed method provides better performance than traditional energy detection under low GSNR and maintains stable behavior across different characteristic indices. The method offers a feasible solution for covert underwater target detection in non-Gaussian environments. 
Future work will optimize the algorithm for real marine noise and improve its engineering applicability.
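Two building blocks named in the abstract above, IQR-based thresholding and cross-correntropy with a Gaussian kernel between adjacent frames, can be sketched as follows. This is an illustrative sketch rather than the DP-STCCD implementation. The 1.5×IQR fence, clipping (rather than deleting) outliers, the unnormalised kernel, non-overlapping frames, and all function names are assumptions. The median and modified mean filtering, adaptive thresholding, and decision rules are omitted.

```python
import math
import statistics


def iqr_clip(samples, k=1.5):
    """Clip samples to [Q1 - k*IQR, Q3 + k*IQR] to suppress impulsive spikes."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(x, lower), upper) for x in samples]


def cross_correntropy(frame_a, frame_b, sigma=1.0):
    """Sample cross-correntropy with an unnormalised Gaussian kernel."""
    total = sum(
        math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))
        for a, b in zip(frame_a, frame_b)
    )
    return total / len(frame_a)


def short_time_cross_correntropy(samples, frame_len, sigma=1.0):
    """Cross-correntropy between each pair of adjacent non-overlapping frames."""
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [
        cross_correntropy(frames[i], frames[i + 1], sigma)
        for i in range(len(frames) - 1)
    ]
```

Because the Gaussian kernel saturates for large differences, a single impulsive sample changes the statistic by at most 1/frame_len. This bounded influence is one reason correntropy-type features are attractive under heavy-tailed noise.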

Application of WAM Data Set and Classification Method of Electromagnetic Wave Absorbing Materials

YUAN Yuyang, ZHANG Junhan, LI Dandan, SHA Jianjun

2025, 47(12): 5146-5155. doi: 10.11999/JEIT250166

[Abstract](113) [FullText HTML](80) [PDF 3117KB](10)

Abstract:
The performance of electromagnetic radiation shielding and absorbing materials depends primarily on thickness, maximum reflection loss, and effective absorption bandwidth. Current research focuses on Metal Organic Frameworks (MOFs), carbon-based, and ceramic absorbing materials, analyzed using weak artificial intelligence techniques applied to the Wave-Absorbing Materials (WAM) dataset. After dividing the dataset into training and testing subsets, data augmentation, correlation analysis, and principal component analysis are performed. A decision tree algorithm is then applied to establish classification indicators, revealing that the reflection loss of MOF materials exceeds that of carbon-based materials. MOFs are more likely to achieve a maximum reflection loss below –45 dB. The random forest algorithm demonstrates stronger generalization ability than the decision tree algorithm, with a higher ROC–AUC value. Neural network classification shows that the self-organizing map neural network yields superior classification performance, whereas the probabilistic neural network performs poorly. When the binary classification problem is extended to a three-class problem, nonlinear classification, clustering, and Boosting algorithms indicate that maximum reflection loss serves as a key discriminative feature. Further analysis confirms that the WAM dataset is nonlinearly separable and that fuzzy clustering achieves better results. Artificial intelligence facilitates the identification of relationships between material properties and absorption performance, accelerates the development of new Wave-Absorbing Materials (WAM), and supports the construction of a knowledge graph and database for absorbing materials. Objective Computational materials science, high-throughput experimentation, and the Materials Genome Initiative (MGI) have emerged as key frontiers in modern materials research. 
The MGI provides a strategic framework and developmental roadmap for advancing materials discovery through artificial intelligence. Analogous to gene sequencing in bioinformatics, its central objective is to accelerate the identification of novel material compositions and structures. Extracting valuable information from large-scale datasets substantially reduces costs, enhances efficiency, fosters interdisciplinary integration, and promotes transformative progress in materials development. Big data analytics, high-performance computing, and advanced algorithms form the core pillars of this initiative, supplying essential support for new materials research and development. Nevertheless, the discovery of new compositions and structures depends on the effective screening of candidate materials to identify those exhibiting superior properties suitable for engineering applications. Achieving this goal requires the establishment of comprehensive datasets, the development of reliable classification algorithms, the improvement of model generalization performance, and the advancement of application-oriented software tools. Methods Pattern recognition techniques are employed in this study. A self-developed WAM dataset is first constructed, comprising a test set and a validation set. Data preprocessing is performed initially, including data augmentation, data integration, and principal component analysis. Decision tree and random forest algorithms are applied to establish classification indicators and define classification criteria. Self-Organizing Map (SOM) and Probabilistic Neural Network (PNN) models are subsequently utilized for material classification. Finally, the accuracy of various clustering algorithms is evaluated, and the fuzzy clustering algorithm is found to achieve relatively superior performance and satisfactory classification results. Results and Discussions It is found that the reflection loss of MOF materials is superior to that of carbon-based materials. 
Semantic segmentation algorithms are identified as unsuitable for classifying the WAM dataset. Among the neural network approaches, the SOM achieves higher classification accuracy than the PNN. The WAM dataset is determined to be nonlinearly separable, indicating that classification performance depends strongly on the intrinsic data distribution characteristics. The maximum reflection loss is identified as the key indicator for effective classification. Conclusions A self-developed WAM dataset is constructed to address the lack of publicly available datasets for applying pattern recognition methods to electromagnetic WAM. The performance of multiple algorithms is evaluated, and the optimal algorithm is identified according to the dataset characteristics. The conventional binary classification problem is extended to a three-class framework, providing the foundation for further research on multi-class classification. The application of artificial intelligence algorithms is found to enhance the credibility and reliability of the research, reduce time and labor costs, and facilitate the exploration of relationships between material properties and absorption performance. This approach shortens the research and development cycle, supports the screening of new materials, and contributes to the establishment of a knowledge base for absorbing materials. However, the knowledge extracted from the WAM dataset remains limited by data sparsity, which constrains the effectiveness of artificial intelligence methods.
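The abstract does not say which fuzzy clustering variant is used. As a rough illustration only, one iteration of the textbook fuzzy c-means algorithm on a single feature (for example, maximum reflection loss in dB) is sketched below. The function names and the fuzzifier value m = 2 are assumptions, and nothing here reproduces the WAM dataset or the authors' code.

```python
def fcm_memberships(points, centers, m=2.0):
    """Fuzzy c-means membership update for one-dimensional points."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A point sitting exactly on a center belongs fully to that center
            # (split evenly if several centers coincide).
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [
                1.0 / sum((di / dk) ** exponent for dk in dists)
                for di in dists
            ]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Fuzzy c-means center update: membership-weighted means of the points."""
    n_clusters = len(memberships[0])
    centers = []
    for j in range(n_clusters):
        weights = [row[j] ** m for row in memberships]
        centers.append(
            sum(w * x for w, x in zip(weights, points)) / sum(weights)
        )
    return centers
```

Alternating the two updates until the centers stop moving gives the usual algorithm. Unlike a decision tree, each sample keeps a graded membership in every class.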

Image and Intelligent Information Processing

A Test-Time Adaptive Method for Nighttime Image-Aided Beam Prediction

SUN Kunyang, YAO Rui, ZHU Hancheng, ZHAO Jiaqi, LI Xixi, HU Dianlin, HUANG Wei

2025, 47(12): 5156-5165. doi: 10.11999/JEIT250530

[Abstract](121) [FullText HTML](85) [PDF 2623KB](10)

Abstract:
The latency of traditional beam management in dynamic scenarios and the severe degradation of vision-aided beam prediction under adverse environmental conditions in millimeter-wave (mmWave) systems are addressed by a nighttime image-assisted beam prediction method based on Test-Time Adaptation (TTA). mmWave communications rely on massive Multiple-Input Multiple-Output (MIMO) technology to achieve high-gain narrow beam alignment. However, conventional beam scanning suffers from exponential complexity and latency, limiting applicability in high-mobility settings such as vehicular networks. Vision-assisted schemes that employ deep learning to map image features to beam parameters experience sharp performance loss in low-light, rainy, or foggy environments because of distribution shifts between training data and real-time inputs. In the proposed framework, a TTA mechanism is introduced to overcome the limitations of static inference by performing a single gradient back propagation across model parameters during inference on degraded images. This adaptation dynamically aligns cross-domain feature distributions without the need for adverse-condition data collection or annotation. An entropy minimization-based consistency strategy is further designed to enforce agreement between original and augmented views, guiding parameter updates toward higher confidence and lower uncertainty. Experiments on real nighttime scenarios demonstrate that the framework achieves a top-3 beam prediction accuracy of 93.01%, improving performance by nearly 20% over static inference and outperforming conventional low-light enhancement. By leveraging the semantic consistency of fixed-base-station deployments, this lightweight online adaptation improves robustness, providing a promising solution for efficient beam management in mmWave systems operating in complex open environments. 
Objective mmWave communication, a cornerstone of 5G and beyond, relies on massive MIMO architectures to counter severe path loss through high-gain narrow beam alignment. Traditional beam management schemes, based on exhaustive beam scanning and channel measurement, incur exponential complexity and latency on the order of hundreds of milliseconds, making them unsuitable for high-mobility scenarios such as vehicular networks. Vision-aided beam prediction has recently emerged as a promising alternative, using deep learning to map visual features (e.g., user location and motion) to optimal beam parameters. Although this approach achieves high accuracy under daytime conditions (>90%), it experiences sharp performance degradation in low-light, rainy, or foggy environments because of domain shifts between training data (typically daylight images) and real-time degraded inputs. Existing countermeasures depend on offline data augmentation, which is costly and provides limited generalization to unseen adverse environments. To overcome these limitations, this work proposes a lightweight online adaptation framework that dynamically aligns cross-domain features during inference, eliminating the need for pre-collected adverse-condition data. The objective is to enable robust mmWave communications in unpredictable environments, a necessary step toward practical deployment in autonomous driving and industrial IoT. Methods The proposed TTA method operates in three stages. First, a pre-trained beam prediction model with a ResNet-18 backbone is initialized using daylight images and labeled beam indices. During inference, real-time low-quality nighttime images are processed through two parallel pipelines: (1) the original view and (2) a data-augmented view incorporating Gaussian noise. A consistency loss is applied to minimize the prediction distance between the two views, enforcing robustness against local feature perturbations. 
In parallel, an entropy minimization loss sharpens the output probability distribution by penalizing high prediction uncertainty. These combined losses drive a single-step gradient backpropagation that updates all model parameters. Through this mechanism, feature distributions between the training (daylight) and testing (nighttime) domains are aligned without altering global semantic representations, as illustrated in Fig. 2. The system architecture consists of a roadside base station equipped with an RGB camera and an N-element antenna array, which captures environmental data and executes real-time beam prediction. Results and Discussions Experiments on a real-world dataset demonstrate the effectiveness of the proposed method. Under nighttime conditions, the TTA framework achieves a Top-3 beam prediction accuracy of 93.01%, exceeding static inference (71.25%) and traditional low-light enhancement methods (85.27%) (Table 3). Ablation studies further validate the contributions of each component: the online feature alignment mechanism, optimized for small-batch data, significantly improves accuracy (Table 4), and the entropy minimization strategy with multi-view consistency learning provides additional gains (Table 5). As shown in Fig. 4, the framework exhibits rapid convergence during online testing, enabling base stations to promptly recover performance when faced with new environmental disturbances. Conclusions This study addresses the limited robustness of existing vision-aided beam prediction methods in dynamically changing environments by introducing a TTA framework for nighttime image-assisted beam prediction. A small-batch adaptive feature alignment strategy is developed to mitigate feature mismatches in unseen domains while satisfying real-time communication constraints. 
In addition, a joint optimization framework integrates classical low-light image enhancement with multi-view consistency learning, thereby improving feature discrimination under complex lighting conditions. Experiments conducted on real-world data confirm the effectiveness of the proposed algorithm, which raises Top-3 beam prediction accuracy by more than 20 percentage points compared with direct testing. These results demonstrate the framework’s robustness in dynamic environments and its potential to optimize vision-aided communication systems under non-ideal conditions. Future work will extend this approach to beam prediction under rain and fog, as well as to multi-modal perception-assisted communication systems.
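The adaptation objective described in the Methods, an entropy term plus a consistency term between the original and augmented views, can be sketched numerically. The abstract does not specify the distance used for consistency or how the two terms are weighted. The squared Euclidean distance between softmax outputs, the `weight` parameter, and the function names below are assumptions. The gradient step itself is omitted because it needs an automatic-differentiation framework.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of beam logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def tta_loss(logits_original, logits_augmented, weight=1.0):
    """Entropy of the original view plus a weighted consistency penalty."""
    p = softmax(logits_original)
    q = softmax(logits_augmented)
    consistency = sum((a - b) ** 2 for a, b in zip(p, q))
    return entropy(p) + weight * consistency
```

Minimising a loss of this shape pushes the model toward confident predictions that do not change under small input perturbations, without requiring beam labels for the nighttime images.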

Kepler’s Laws Inspired Single Image Detail Enhancement Algorithm

JIANG He, SUN Mang, ZHENG Zhou, WU Peilin, CHENG Deqiang, ZHOU Chen

2025, 47(12): 5166-5177. doi: 10.11999/JEIT250455

[Abstract](143) [FullText HTML](82) [PDF 9709KB](26)

Abstract:
Objective Single-image detail enhancement based on residual learning has received extensive attention in recent years. In these methods, the residual layer is updated by using the similarity between the residual layer and the detail layer, and it is then combined linearly with the original image to enhance image detail. This update process is a greedy algorithm, which tends to trap the system in local optima and limits overall performance. Inspired by Kepler’s laws, the residual update is treated as the dynamic adjustment of planetary positions. By applying Kepler’s laws and computing the global optimal position of the planets, precise updates of the residual layer are achieved. Methods The input image is partitioned into multiple blocks. For each block, its candidate blocks are treated as “planets”, and the best matching block is treated as a “star”. The positions of the “planets” and the “star” are updated by computing the differences between each “planet” and the original image block until the positions converge, which determines the location of the global optimal matching block. Results and Discussions In this study, 16 algorithms are tested on three datasets at two magnification levels (Table 1). The test results show that the proposed algorithm achieves strong performance in both PSNR and SSIM evaluations. During detail enhancement, compared with other algorithms, the proposed algorithm shows stronger edge preservation capability (Fig. 7). However, it is not robust to noise (Fig. 8–Fig. 10), and the performance of the enhanced images continues to decline as noise intensity increases (Fig. 11). Both the initial gravitational constant and the gravitational attenuation rate constant present a fluctuating trend, meaning they increase first and then decrease (Fig. 12). When the gradient loss and texture loss weights are set to 0.001, the KLDE system achieves its best performance (Fig. 13). 
Conclusions This study proposes a single-image detail enhancement algorithm inspired by Kepler’s laws. By treating the residual update process as the dynamic adjustment of planetary positions, the algorithm applies Kepler’s laws to optimize residual layer updates, reduces the tendency of greedy search to reach local optima, and achieves more precise image detail enhancement. Experimental results show that the algorithm performs better than existing methods in visual effects and quantitative metrics and produces natural enhancement results. The running time remains relatively long because the iterative update of candidate blocks and the calculation of parameters such as gravity form the main computational bottleneck. Future work will focus on optimizing the algorithm structure to reduce unnecessary searches and improve system efficiency. The algorithm does not require training and achieves strong performance, which indicates potential value in high-precision offline image enhancement scenarios.

Stealthy Path Planning Algorithm for UAV Swarm Based on Improved APF-RRT* Under Dynamic Threat

ZHANG Xinrui, SHI Chenguang, WU Zhifeng, WEN Wen, ZHOU Jianjiang

2025, 47(12): 5178-5191. doi: 10.11999/JEIT250554

[Abstract](137) [FullText HTML](87) [PDF 4287KB](11)

Abstract:
Objective The efficient penetration and survivability of Unmanned Aerial Vehicle (UAV) swarms in complex battlefield environments depend on robust trajectory planning. With the increasing deployment of advanced air defense systems, including radar networks, anti-aircraft artillery, and dynamic no-fly zones, conventional planning methods struggle to meet simultaneous requirements for stealth, feasibility, and safety. Although prior studies provide useful progress in UAV swarm path planning, several limitations remain. (1) Most research concentrates on detection models for single radars and does not account for the relation between UAV Radar Cross Section (RCS) and stealth trajectory optimization. (2) UAV kinematic constraints are often handled separately from stealth characteristics. (3) Environmental threats are commonly modeled as static and singular, which limits real-time adaptation to dynamic threats. (4) Stealth planning is also examined mainly for individual UAVs, with limited consideration of swarm-level coordination. This study addresses these gaps by proposing a cooperative stealth trajectory planning framework that integrates real-time threat perception with swarm dynamics optimization and strengthens survivability in contested airspace. Methods This study proposes a stealth path planning algorithm for UAV swarms based on an improved Artificial Potential Field (APF) and a Rapidly-exploring Random Trees star (RRT*) framework under dynamic threat conditions. A multi-threat environment model is first constructed to represent radars, anti-aircraft artillery, and fixed obstacles. A comprehensive stealth cost function is then developed by integrating UAV RCS characteristics and considering flight distance, radar detection probability, and artillery threat probability. A stealth trajectory optimization model is formulated to minimize the overall cost function under constraints on UAV kinematics, swarm coordination, and path feasibility. 
To solve this model efficiently, an enhanced APF-RRT* algorithm is designed. A rolling-window strategy is applied to enable continuous local replanning in response to dynamic threats. This approach supports real-time trajectory updates and improves responsiveness to sudden changes in the threat field. A target-biased sampling method is also used to reduce sampling redundancy and increase convergence speed. By combining the global search ability of RRT* with the local adaptability of APF, the method enables UAV swarms to generate stealth-optimal paths in real time while maintaining safety and coordination in adversarial environments. Results and Discussions Simulation experiments confirm the effectiveness of the proposed algorithm. During global path planning, several UAVs enter regions threatened by dynamic no-fly zones, radars, and artillery systems, although others reach their destinations through clear paths. In the local replanning phase, affected UAVs adjust their trajectories to reduce radar detection probability and overall stealth cost. When encountering mobile threats, UAVs execute lateral evasive maneuvers to prevent collisions and ensure mission completion. Under the comparison algorithms, the detection probabilities of the UAVs requiring replanning all exceed the specified threshold for networked radar detection, which shows that these methods do not generate UAV swarm trajectories that satisfy platform safety requirements and therefore fail in practical settings. Comparative simulations show that the proposed method yields lower stealth costs and improves trajectory feasibility and swarm coordination. The algorithm achieves swarm-level stealth optimization and ensures safe and efficient penetration in dynamic environments. Conclusions This study addresses stealth trajectory planning for UAV swarms in dynamic threat environments by proposing an improved APF-RRT* algorithm. 
The following key findings are obtained from extensive simulations conducted in different contested scenarios (Section 5): (1) The proposed algorithm reduces the voyage distance by 11.1 km in Scene 1 and 66.9 km in Scene 2 compared with the baseline RRT* method (Tab. 3, Tab. 5). This reduction is primarily due to RCS-minimizing attitude adjustments produced through heading angle change (Fig. 3, Fig. 6). (2) The networked radar detection probability remains below the 30 percent threshold for all UAVs (Fig. 4(a), Fig. 7(a)), whereas the comparison algorithms exceed the safety limit for up to 98 percent of the group members (Fig. 4(b), Fig. 7(b), Fig. 9(a), Fig. 9(b)). (3) The rolling-window replanning mechanism supports real-time avoidance of mobile threats such as dynamic no-fly zones and anti-aircraft artillery (Fig. 5, Fig. 8), while reducing the comprehensive trajectory cost by 9.0 percent in Scene 1 and 15.6 percent in Scene 2 compared with the baseline RRT* method (Tab. 3, Tab. 5). (4) Cooperative constraints embedded in the planning algorithm maintain safe inter-UAV separation and optimize swarm-level stealth performance (Fig. 2, Fig. 5, Fig. 8). These findings show the superiority of the proposed method in balancing stealth optimization, dynamic threat adaptation, and swarm kinematic feasibility. Future work will extend this framework to three-dimensional complex terrain and integrate deep reinforcement learning to strengthen predictive threat response and battlefield adaptability.
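Two ingredients mentioned in the Methods, target-biased sampling and the artificial potential field, can be sketched in their textbook two-dimensional forms. This is not the improved APF-RRT* of the paper. The quadratic attractive potential, the classical repulsive potential with a finite influence radius, the point-shaped threats, and every name and default value below are assumptions. The stealth cost, RCS model, and rolling-window replanning are not represented.

```python
import math
import random


def biased_sample(goal, bounds, goal_bias, rng):
    """With probability goal_bias return the goal, else a uniform random point."""
    if rng.random() < goal_bias:
        return goal
    (x_min, x_max), (y_min, y_max) = bounds
    return (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))


def apf_force(pos, goal, threats, k_att=1.0, k_rep=1.0, influence=5.0):
    """Attractive pull toward the goal plus repulsion from nearby threats."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for tx, ty in threats:
        dx, dy = pos[0] - tx, pos[1] - ty
        dist = math.hypot(dx, dy)
        if 0.0 < dist < influence:
            # Gradient of 0.5*k_rep*(1/dist - 1/influence)^2, pointing away
            # from the threat; zero outside the influence radius.
            magnitude = k_rep * (1.0 / dist - 1.0 / influence) / dist ** 2
            fx += magnitude * dx / dist
            fy += magnitude * dy / dist
    return fx, fy
```

In an RRT*-style planner, a sample drawn by `biased_sample` would select the direction of tree growth, and a force such as `apf_force` could nudge the new node away from threats before it is added.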

Multimodal Hypergraph Learning Guidance with Global Noise Enhancement for Sentiment Analysis under Missing Modality Information

HUANG Chen, LIU Huijie, ZHANG Yan, YANG Chao, SONG Jianhua

2025, 47(12): 5192-5202. doi: 10.11999/JEIT250649

[Abstract](215) [FullText HTML](97) [PDF 2292KB](33)

Abstract:
Objective Multimodal Sentiment Analysis (MSA) has shown considerable promise in interdisciplinary domains such as Natural Language Processing (NLP) and Affective Computing, particularly by integrating information from ElectroEncephaloGraphy (EEG) signals, visual images, and text to classify sentiment polarity and provide a comprehensive understanding of human emotional states. However, in complex real-world scenarios, challenges including missing modalities, limited high-level semantic correlation learning across modalities, and the lack of mechanisms to guide cross-modal information transfer substantially restrict the generalization ability and accuracy of sentiment recognition models. To address these limitations, this study proposes a Multimodal Hypergraph Learning Guidance method with Global Noise Enhancement (MHLGNE), designed to improve the robustness and performance of MSA under conditions of missing modality information in complex environments. Methods The overall architecture of the MHLGNE model is illustrated in Fig. 2 and consists of the Adaptive Global Noise Sampling Module, the Multimodal Hypergraph Learning Guiding Module, and the Sentiment Prediction Target Module. A pretrained language model is first applied to encode the multimodal input data. To simulate missing modality conditions, the input data are constructed with incomplete modal information, where a modality
$m \in \{\mathrm{e}, \mathrm{v}, \mathrm{t}\}$
is randomly absent. The adaptive global noise sampling strategy is then employed to supplement missing modalities from a global perspective, thereby improving adaptability and enhancing both robustness and generalization in complex environments. This design allows the model to handle noisy data and missing modalities more effectively. The Multimodal Hypergraph Learning Guiding Module is further applied to capture high-level semantic correlations across different modalities, overcoming the limitations of conventional methods that rely only on feature alignment and fusion. By guiding cross-modal information transfer, this module enables the model to focus on essential inter-modal semantic dependencies, thereby improving sentiment prediction accuracy. Finally, the performance of MHLGNE is compared with that of State-Of-The-Art (SOTA) MSA models under two conditions: complete modality data and randomly missing modality information. Results and Discussions Three publicly available MSA datasets (SEED-IV, SEED-V, and DREAMER) are employed, with features extracted from EEG signals, visual images, and text. To ensure robustness, standard cross-validation is applied, and the training process is conducted with iterative adjustments to the noise sampling strategy, modality fusion method, and hypergraph learning structure to optimize sentiment prediction. Under the complete modality condition, MHLGNE is observed to outperform the second-best M2S model across most evaluation metrics, with accuracy improvements of 3.26%, 2.10%, and 0.58% on SEED-IV, SEED-V, and DREAMER, respectively. Additional metrics also indicate advantages over other SOTA methods. Under the random missing modality condition, MHLGNE maintains superiority over existing MSA approaches, with improvements of 1.03% in accuracy, 0.24% in precision, and 0.08 in Kappa score. The adaptive noise sampling module is further shown to effectively compensate for missing modalities. 
Unlike conventional models that suffer performance degradation under such conditions, MHLGNE maintains robustness by generating complementary information. In addition, the multimodal hypergraph structure enables the capture of high-level semantic dependencies across modalities, thereby strengthening cross-modal information transfer and offering clear advantages when modalities are absent. Ablation experiments confirm the independent contributions of each module. The removal of either the adaptive noise sampling or the multimodal hypergraph learning guiding module results in notable performance declines, particularly under high-noise or severely missing modality conditions. The exclusion of the cross-modal information transfer mechanism causes a substantial decline in accuracy and robustness, highlighting its essential role in MSA. Conclusions The MHLGNE model, equipped with the Adaptive Global Noise Sampling Module and the Multimodal Hypergraph Learning Guiding Module, markedly improves the performance of MSA under conditions of missing modalities and in tasks requiring effective cross-modal information transfer. Experiments on SEED-IV, SEED-V, and DREAMER confirm that MHLGNE exceeds SOTA MSA models across multiple evaluation metrics, including accuracy, precision, Kappa score, and F1 score, thereby demonstrating its robustness and effectiveness. Future work may focus on refining noise sampling strategies and developing more sophisticated hypergraph structures to further strengthen performance under extreme modality-missing scenarios. In addition, this framework has the potential to be extended to broader sentiment analysis tasks across diverse application domains.

Quality Map-guided Fidelity Compression Method for High-energy Regions of Spectral Data

LIU Xiangli, LI Zan, CHEN Yifeng, CHEN Le

2025, 47(12): 5203-5213. doi: 10.11999/JEIT250650

[Abstract](138) [FullText HTML](70) [PDF 5903KB](15)

Abstract:
Objective In the context of intelligent evolution in communication and radar technologies, inefficiency in Radio Frequency (RF) data compression represents a critical bottleneck that restricts transmission bandwidth expansion and system energy efficiency improvement. Conventional compression methods fail to balance compression ratio and reconstruction accuracy in complex scenarios characterized by non-uniform energy distribution. This study aims to address fidelity compression of spectral data with non-uniform energy distribution by developing a quality map-guided method that preserves high-energy regions and improves the adaptability of RF signal processing in complex environments. Methods A quality map-guided fidelity compression method is proposed. A three-dimensional energy mask is constructed to dynamically guide the encoder and enhance features in high-energy regions. Multi-level complex convolution and inverted residual connections are adopted for efficient feature extraction and reconstruction. The quality map is derived from local energy and amplitude variations of RF signals by fusing energy proportion and variation as structured prior information. A rate-distortion joint optimization loss function is designed by integrating weighted mean squared error, complex correlation loss, and phase difference loss, with learnable parameters used to balance competing objectives (Fig. 1). The compression network follows an encoder-decoder framework that incorporates quality map extractors, deep encoders and decoders, and entropy coding. Complex convolution, residual spatial feature transformation for multi-scale and high-frequency feature preservation, and gated normalization for low-energy noise suppression are employed (Figs. 2～6). Results and Discussions Experiments conducted on the public dataset RML2018.01a demonstrate the superiority of the proposed method. 
Reconstruction accuracy: Visual comparisons of real and imaginary components and amplitude spectra show strong overlap between reconstructed and original signals (Figs. 7, 8), with reconstruction errors mainly concentrated in low-energy regions. The Peak Signal-to-Noise Ratio (PSNR) remains ≥35 dB across the tested –4 to 20 dB Signal-to-Noise Ratio (SNR) range, confirming robust performance even at low SNR (Fig. 9). Ablation experiments: Removal of the quality map guidance mechanism results in significant reconstruction errors in high-energy regions, reflected by lower PSNR, higher Mean Relative Error (MRE), and reduced correlation coefficients compared with the complete method (Fig. 9). These results confirm the critical role of the quality map in preserving high-energy features. Comparative analysis: Relative to conventional methods, including LFZip and CORAD, the proposed method achieves superior performance at –4 dB SNR, with higher PSNR (35.75 dB vs. ≤29.45 dB), lower MRE (6.91% vs. ≥8.45%), and stronger correlation coefficients (0.898 vs. ≤0.832), at the expense of a slightly lower compression ratio (Table 1). Self-built dataset validation: To evaluate adaptability to practical complex scenarios, supplementary experiments are performed using a MATLAB-simulated dataset (Table 3) comprising five modulation schemes (BPSK, QPSK, 8PSK, 16QAM, and 64QAM), an additive white Gaussian noise plus Rayleigh fading channel, SNRs from –4 to 20 dB with a 6 dB step, 25 000 samples, and an 8:1:1 data split. Under fading channels, the proposed method continues to outperform baseline methods at –4 dB SNR, achieving a PSNR of 34.61 dB (vs. 28.46/27.88 dB), MRE of 7.53% (vs. 9.00%/9.38%), and a correlation coefficient of 0.885 (vs. 0.821/0.808; Table 3), with optimal rate-distortion performance observed across all compression ratios (Fig. 11).
The slight performance degradation relative to RML2018.01a is attributed to Rayleigh fading-induced energy dispersion. Consistent superiority across datasets confirms strong robustness to non-uniform energy distribution and complex channel characteristics in practical applications. Conclusions A quality map-guided fidelity compression method for frequency-domain RF data is presented to address challenges caused by non-uniform energy distribution. High-energy region features are effectively preserved through dynamic feature enhancement and multi-dimensional loss optimization. Experimental results demonstrate advantages in reconstruction accuracy and noise resistance, providing a viable framework for high-fidelity compression of complex RF signals in communication and radar systems. Future work will extend the method to real-time processing scenarios and incorporate physical-layer constraints to further enhance practical applicability.
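
As a rough illustration of the distortion terms named in the abstract above (weighted mean squared error, complex correlation loss, and phase difference loss), the following Python sketch combines them for a short complex-valued sequence. It is not the authors' implementation: the function names, the form of `quality_map`, the fixed `weights` (the paper uses learnable parameters), and the omission of the rate term are all assumptions made for illustration.

```python
import cmath
import math


def quality_map(x, alpha=0.5):
    """Illustrative quality map (assumed form): blend each sample's relative
    energy with its amplitude change from the previous sample, both in [0, 1]."""
    energy = [abs(v) ** 2 for v in x]
    amp = [abs(v) for v in x]
    change = [0.0] + [abs(amp[i] - amp[i - 1]) for i in range(1, len(amp))]
    e_max = max(energy) or 1.0  # guard against an all-zero input
    c_max = max(change) or 1.0
    return [alpha * e / e_max + (1 - alpha) * c / c_max
            for e, c in zip(energy, change)]


def weighted_mse(x, y, w):
    """Mean squared error with a per-sample weight from the quality map."""
    return sum(wi * abs(a - b) ** 2 for a, b, wi in zip(x, y, w)) / len(x)


def correlation_loss(x, y):
    """One minus the magnitude of the normalized complex inner product."""
    num = abs(sum(a * b.conjugate() for a, b in zip(x, y)))
    den = math.sqrt(sum(abs(a) ** 2 for a in x) * sum(abs(b) ** 2 for b in y))
    return 1.0 - num / den if den else 1.0


def phase_loss(x, y):
    """Mean of 1 - cos(phase difference); zero when all phases agree."""
    return sum(1.0 - math.cos(cmath.phase(a) - cmath.phase(b))
               for a, b in zip(x, y)) / len(x)


def distortion(x, y, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are fixed here, whereas
    the abstract describes learnable balancing parameters."""
    w = quality_map(x)
    l_mse, l_corr, l_phase = weights
    return (l_mse * weighted_mse(x, y, w)
            + l_corr * correlation_loss(x, y)
            + l_phase * phase_loss(x, y))
```

Calling `distortion(x, y)` with an original sequence `x` and a reconstruction `y` returns a single scalar; a perfect reconstruction gives a value near zero, up to floating-point rounding.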

A Vehicle-Infrastructure Cooperative 3D Object Detection Scheme Based on Adaptive Feature Selection

LIANG Yan, YANG Huilin, SHAO Kai

2025, 47(12): 5214-5225. doi: 10.11999/JEIT250601

[Abstract](145) [FullText HTML](77) [PDF 10180KB](17)

Abstract:
Objective Vehicle-infrastructure cooperative Three-Dimensional (3D) object detection is viewed as a core technology for intelligent transportation systems. As autonomous driving advances, the fusion of roadside and vehicle-mounted LiDAR data provides beyond-line-of-sight perception for vehicles, offering clear potential for improving traffic safety and efficiency. Conventional cooperative perception, however, is constrained by limited communication bandwidth and insufficient aggregation of heterogeneous data, which restricts the balance between detection performance and bandwidth usage. These constraints hinder the practical deployment of cooperative perception in complex traffic environments. This study proposes an Adaptive Feature Selection-based Vehicle-Infrastructure Cooperative 3D Object Detection Scheme (AFS-VIC3D) to address these challenges. Spatial filtering theory is used to identify and transmit the critical features required for detection, improving 3D perception performance while reducing bandwidth consumption. Methods AFS-VIC3D uses a coordinated design for roadside and vehicle-mounted terminals. Incoming point clouds are encoded into Bird’s-Eye View (BEV) features through PointPillar encoders, and metadata synchronization ensures spatiotemporal alignment. At the roadside terminal, key features are selected using two parallel branches: a Graph Structure Feature Enhancement Module (GSFEM) and an Adaptive Communication Mask Generation Module (ACMGM). Multi-scale features are then extracted hierarchically with a ResNet backbone. The outputs of both branches are fused through elementwise multiplication to generate optimized features for transmission. At the vehicle-mounted terminal, BEV features are processed using homogeneous backbones and fused through a Multi-Scale Feature Aggregation (MSFA) module across scale, spatial, and channel dimensions, reducing sensor heterogeneity and improving detection robustness. 
Results and Discussions The effectiveness and robustness of AFS-VIC3D are validated on both the DAIR-V2X real-world dataset and the V2XSet simulation dataset. Comparative experiments (Table 1, Fig. 5) show that the model attains higher detection accuracy with lower communication overhead and exhibits slower degradation under low-bandwidth conditions. Ablation studies (Table 2) demonstrate that each module (GSFEM, ACMGM, and MSFA) contributes to performance. GSFEM improves the discriminability of target features, and ACMGM used with GSFEM further reduces communication cost. A comparison of feature transmission methods (Table 3) shows that adaptive sampling based on scene complexity and target density (C-DASFAN) yields higher accuracy and lower bandwidth usage, confirming the advantage of ACMGM. BEV visualizations (Fig. 6) indicate that predicted bounding boxes align closely with ground truth with minimal redundancy. Analysis of complex scenarios (Fig. 7) shows fewer missed detections and false positives, demonstrating robustness in high-density and complex road environments. Feature-level visualization (Fig. 8) further verifies that GSFEM and ACMGM enhance target features and suppress background noise, improving overall detection performance. Conclusions This study presents AFS-VIC3D, which addresses the key challenges of limited communication bandwidth and heterogeneous data aggregation through a coordinated design combining roadside dual-branch feature optimization and vehicle-mounted MSFA. The GSFEM module uses graph neural networks to enhance the discriminability of target features, the ACMGM module optimizes communication resources through communication mask generation, and the MSFA module improves heterogeneous data aggregation between vehicle and infrastructure terminals through joint spatial and channel aggregation.
Experiments on the DAIR-V2X and V2XSet datasets show that AFS-VIC3D improves 3D detection accuracy while lowering communication overhead, with clear advantages in complex traffic scenarios. The framework offers a practical and effective solution for vehicle-infrastructure cooperative 3D perception and demonstrates strong potential for deployment in bandwidth-constrained intelligent transportation systems.
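
The roadside fusion step described above, in which a communication mask is multiplied elementwise with the enhanced features before transmission, can be sketched in a few lines of Python. This is only a toy stand-in: the ACMGM mask in the paper is generated adaptively by a learned module, whereas the hypothetical `topk_mask` below simply keeps a fixed fraction of the highest-confidence BEV cells, and all names are assumptions.

```python
def topk_mask(confidence, keep_ratio):
    """Binary mask keeping roughly the top `keep_ratio` share of cells.
    Ties at the threshold are all kept, so slightly more cells may pass."""
    flat = sorted((v for row in confidence for v in row), reverse=True)
    k = max(1, int(round(keep_ratio * len(flat))))
    threshold = flat[k - 1]
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in confidence]


def select_features(features, mask):
    """Elementwise product of a 2D feature map and a mask of the same shape."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]


def transmitted_cells(mask):
    """Number of cells that would be sent, a crude proxy for bandwidth."""
    return int(sum(sum(row) for row in mask))
```

For a 2×2 confidence map and `keep_ratio=0.5`, two of the four cells survive the mask and the others are zeroed, which is the bandwidth-saving effect the abstract attributes to selective transmission.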

Circuit and System Design

Hybrid Vibration Isolation Design Based on Piezoelectric Actuator and Quasi-zero Stiffness System

YANG Liu, ZHAO Haiyang, ZHAO Kun, CHENG Jiajia, LI Dongjie

2025, 47(12): 5226-5235. doi: 10.11999/JEIT250310

[Abstract](400) [FullText HTML](247) [PDF 7170KB](20)

Abstract:
Objective Precision instruments now operate under increasingly demanding vibration conditions, and conventional passive isolation methods are insufficient for maintaining stable laboratory environments. Vibrations generated by personnel movement, machinery operation, and vehicle transit can travel long distances and penetrate structural materials, reaching instrument platforms and reducing measurement accuracy, stability, and reliability. Passive isolation units such as rubber elements and springs show limited performance when dealing with low-frequency and small-amplitude excitation. Quasi-Zero Stiffness (QZS) systems improve low-frequency isolation but their performance depends on amplitude and requires strict installation accuracy. Active vibration isolation uses controlled actuators between the vibration source and the support structure to reduce disturbances. Piezoelectric ceramics offer high precision and rapid response, and are widely applied in such systems. Purely active isolation, however, may perform poorly at high frequencies due to sensor sampling limitations and actuator response bandwidth. High-frequency or large-amplitude excitation also results in high actuator energy demand, while the hysteresis characteristics of piezoelectric ceramics reduce control precision. Combining active and passive approaches is therefore an effective strategy for ensuring vibration stability in precision laboratory applications. Methods A hybrid vibration isolation strategy is developed by integrating a piezoelectric actuator with a QZS mechanism. A stacked piezoelectric ceramic actuator is designed to generate the required output force and displacement, and elastic spacers are used to apply a preload that improves operational stability and linearity. The QZS system is formed by combining positive and negative stiffness components to achieve high static stiffness with low dynamic stiffness. 
To address hysteresis in the piezoelectric actuator, an improved Bouc-Wen (B-W) model is adopted and an inverse model is constructed to enable hysteresis compensation. The actuator is then coupled with the QZS structure, and the vibration isolation performance of the hybrid system is assessed through numerical simulation. Results and Discussions An active-passive vibration isolation device is developed, comprising a QZS system formed by linear springs and an active piezoelectric stack actuator (Fig. 9a). Because the traditional B-W model does not accurately describe the dynamic relationship between acceleration and voltage, a voltage-derivative term (Equation 13) is introduced to improve it. This modification refines the force-voltage representation, enhances model adaptability, and enables accurate description of the acceleration-voltage response over a broader operating range. Forward model parameters are identified using the differential evolution algorithm (Table 1), and an inverse model is constructed through direct inversion with parameters obtained using the same optimization method (Table 2). The forward and inverse modules are then cascaded to compensate for hysteresis (Fig. 8). Dynamic equations for the QZS system and the linearized piezoelectric actuator are derived (Equation 16). An adaptive sliding-mode controller incorporating a Luenberger sliding-mode observer is subsequently designed to regulate vibration signals, and active isolation performance is verified. Conclusions The proposed hybrid vibration isolation design integrates the passive low-frequency isolation capability of the QZS system with the active control potential of the piezoelectric actuator, offering a feasible approach for vibration suppression in precision instruments. The hysteresis behavior of piezoelectric ceramics is characterized and fitted effectively, and an inverse model is established to compensate for the nonlinear voltage-acceleration response.
A dynamic model of the combined passive-active configuration is derived, and vibration signals are regulated using adaptive sliding-mode control with a Luenberger sliding-mode observer. The resulting system demonstrates stable vibration reduction, indicating strong applicability and research value.
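
To make the hysteresis discussion above more concrete, the sketch below simulates one commonly used classical form of the Bouc-Wen model for a piezoelectric actuator, y = d·u − h with dh/dt = α·du/dt − β·|du/dt|·h − γ·du/dt·|h|. It does not include the voltage-derivative term that the paper adds (Equation 13), and the parameter values and the function name `bouc_wen_response` are illustrative assumptions, not the identified values from Tables 1 and 2.

```python
import math


def bouc_wen_response(voltages, d=1.0, alpha=0.3, beta=0.05, gamma=0.01):
    """Classical Bouc-Wen hysteresis, integrated with explicit Euler steps.

    voltages : sequence of input voltage samples u[k]
    returns  : output samples y[k] = d * u[k] - h[k]
    The hysteresis state h is updated from voltage increments, so the
    result does not depend on the sampling period (rate-independent form).
    """
    h = 0.0
    prev = voltages[0]
    outputs = []
    for u in voltages:
        du = u - prev
        h += alpha * du - beta * abs(du) * h - gamma * du * abs(h)
        outputs.append(d * u - h)
        prev = u
    return outputs


# Two periods of a 50 V sinusoid, 1000 samples per period.
N = 1000
volts = [50.0 * math.sin(2.0 * math.pi * k / N) for k in range(2 * N + 1)]
y = bouc_wen_response(volts)

# Output at the two zero-voltage crossings of the second period:
# the falling-branch and rising-branch values differ, which is the loop.
loop_width_at_zero = y[N + N // 2] - y[2 * N]
```

A nonzero `loop_width_at_zero` shows that the same input voltage maps to different outputs on the rising and falling branches. An inverse model of the kind the abstract describes would aim to cancel this gap.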

Dataset Papers

Visible Figure Part of Multi-source Maritime Ship Dataset

CUI Yaqi, ZHOU Tian, XIONG Wei, XU Saifei, LIN Chuanqi, XIA Shutao, SUN Weiwei, TANG Tiantian, ZHANG Jie, GUO Hengguang, SONG Penghan, HUAN Yingchun, ZHANG Zhenjie

2025, 47(12): 5236-5250. doi: 10.11999/JEIT250138

[Abstract](627) [FullText HTML](190) [PDF 6796KB](136)

Abstract:
Objective The increasing intensity of marine resource development and maritime operations has heightened the need for accurate vessel detection under complex marine conditions, which is essential for protecting maritime rights and interests. In recent years, object detection algorithms based on deep learning—such as YOLO and Faster R-CNN—have emerged as key methods for maritime target perception due to their strong feature extraction capabilities. However, their performance relies heavily on large-scale, high-quality training data. Existing general-purpose datasets, such as COCO and PASCAL VOC, offer limited vessel classes and predominantly feature static, urban, or terrestrial scenes, making them unsuitable for marine environments. Similarly, specialized datasets like SeaShips and the Singapore Marine Dataset (SMD) suffer from constraints such as limited data sources, simple scenes, small sample sizes, and incomplete coverage of marine target categories. These limitations significantly hinder further performance improvement of detection algorithms. Therefore, the development of large-scale, multimodal, and comprehensive marine-specific datasets represents a critical step toward resolving current application challenges. This effort is urgently needed to strengthen marine monitoring capabilities and ensure operational safety at sea. Methods To overcome the aforementioned challenges, a multi-sensor marine target acquisition system integrating radar, visible-light, infrared, laser, Automatic Identification System (AIS), and Global Positioning System (GPS) technologies is developed. A two-month shipborne observation campaign is conducted, yielding 200 hours of maritime monitoring and over 90 TB of multimodal raw data. To efficiently process this large volume of low-value-density data, a rapid annotation pipeline is designed, combining automated labeling with manual verification. 
Iterative training of intelligent annotation models, supplemented by extensive manual correction, enables the construction of the Visible Figure Part of the Multi-Source Maritime Ship Dataset (MSMS-VF). This dataset comprises 265 233 visible-light images with 1 097 268 bounding boxes across nine target categories: passenger ship, cargo vessel, speedboat, sailboat, fishing boat, buoy, floater, offshore platform, and others. Notably, 55.88% of targets are small, with pixel areas below 1 024. The dataset incorporates diverse environmental conditions including backlighting, haze, rain, and occlusion, and spans representative maritime settings such as harbor basins, open seas, and navigation channels. MSMS-VF offers a comprehensive data foundation for advancing maritime target detection, recognition, and tracking research. Results and Discussions The MSMS-VF dataset exhibits substantially greater diversity than existing datasets (Table 1, Table 2). Small targets, including buoys and floaters, occur frequently (Table 5), posing significant challenges for detection. Five object detection models (the YOLO series, Real-Time Detection Transformer (RT-DETR), Faster R-CNN, Single Shot MultiBox Detector (SSD), and RetinaNet) are assessed, together with five multi-object tracking algorithms: Simple Online and Realtime Tracking (SORT), Observation-Centric SORT (OC-SORT), DeepSORT, ByteTrack, and MotionTrack. YOLO models exhibit the most favorable trade-off between speed and accuracy. YOLOv11 achieves a mAP50 of 0.838 on the test set and a processing speed of 34.43 fps (Table 6). However, substantial performance gaps remain for small targets; for instance, YOLOv11 yields a mAP50 of 0.549 for speedboats, markedly lower than the 0.946 obtained for large targets such as cargo vessels (Table 7). RT-DETR shows moderate performance on small objects, achieving a mAP50 of 0.532 for floaters, whereas conventional models like Faster R-CNN perform poorly, with mAP50 values below 0.1.
For tracking, MotionTrack performs best under low-frame-rate conditions, achieving a MOTA of 0.606, IDF1 of 0.750, and S of 0.681 using a Gaussian distance cascade-matching strategy (Table 8, Fig. 13). Conclusions This study presents the MSMS-VF dataset, which offers essential data support for maritime perception research through its integration of multi-source inputs, diverse environmental scenarios, and a high proportion of small targets. Experimental validation confirms the dataset’s utility in training and evaluating state-of-the-art algorithms, while also revealing persistent challenges in detecting and tracking small objects under dynamic maritime conditions. Nevertheless, the dataset has limitations. The current data are predominantly sourced from waters near Yantai, leading to imbalanced ship-type representation and the absence of certain vessel categories. Future efforts will focus on expanding data acquisition to additional maritime regions, broadening the scope of multi-source data collection, and incrementally releasing extended components of the dataset to support ongoing research.
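
Two quantities quoted in the abstract above can be computed with a short Python sketch: the share of small targets (pixel area below 1 024) and the tracking scores MOTA and IDF1. The formulas are the standard definitions of those metrics; the function names and the `(width, height)` box representation are assumptions for illustration and are not taken from the dataset's tooling.

```python
def small_target_share(boxes, area_threshold=1024):
    """Fraction of boxes whose pixel area is strictly below the threshold.
    `boxes` is a list of (width, height) pairs in pixels."""
    small = sum(1 for w, h in boxes if w * h < area_threshold)
    return small / len(boxes)


def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    return 2.0 * idtp / (2.0 * idtp + idfp + idfn)
```

For example, a 32×32 box has an area of exactly 1 024 pixels and is therefore not counted as small under the strict inequality used here, while a 31×33 box (1 023 pixels) is.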