Multi-scale Frequency Adapter and Dual-path Attention for Time Series Forecasting
-
Abstract: Existing mainstream time series forecasting methods struggle to jointly handle the complex periodic patterns and local dynamic variations in data when performing multi-scale modeling and frequency-domain feature extraction, and therefore fail to fully capture key temporal characteristics. To address this problem, a time series forecasting method based on a Multi-scale Frequency Adapter and Dual-path Attention (MFADA) is proposed. The method employs a Multi-scale Frequency Adapter (MFA) to adaptively extract the key frequency components of the series and obtain a global periodicity prior. In addition, a Multi-scale Dual-path Attention (MDA) mechanism embeds this frequency-domain prior into both a temporal path and a feature path, enabling dynamic cross-granularity collaborative modeling and strengthening the characterization of complex temporal evolution. Experimental results show that MFADA significantly outperforms existing mainstream forecasting methods on 8 public time series datasets in both prediction accuracy and computational efficiency, validating the effectiveness and superiority of the proposed "frequency-domain guidance, time-domain collaboration" framework and offering a new approach to complex time series tasks.
Objective: With the rapid development of big data technology, time series data is increasingly used in areas such as meteorology, power systems, and finance. Nonetheless, mainstream time series forecasting methods face notable challenges in multi-scale modeling and frequency-domain feature extraction, which prevents them from comprehensively capturing the crucial dynamic properties and periodic patterns of complex datasets. Traditional statistical approaches such as ARIMA rely on assumptions of linear relationships and therefore perform poorly on nonlinear or high-dimensional time series. Although deep learning methods, notably those based on convolutional neural networks and Transformers, have improved forecasting accuracy through advanced feature extraction and long-range dependency modeling, they remain limited in their ability to efficiently extract and fuse multi-scale features in both the temporal and frequency domains. These deficiencies lead to instability and suboptimal accuracy, particularly in dynamic, highly variable applications. This paper addresses these challenges by proposing an intelligent forecasting framework that effectively models multi-scale information and improves prediction accuracy across diverse scenarios.
Methods: The proposed method introduces a Multi-scale Frequency Adapter and Dual-path Attention (MFADA) framework for time series forecasting. The framework integrates two key modules: the Multi-scale Frequency Adapter (MFA) and the Multi-scale Dual-path Attention (MDA). The MFA module efficiently captures multi-scale frequency features using adaptive pooling and deep convolutions, which enhances sensitivity to different frequency components and supports the modeling of both short-term and long-term dependencies.
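The two ingredients of the MFA described above, selecting dominant frequency components and pooling the series at several temporal scales, can be illustrated with a minimal plain-Python sketch. This is an assumption-laden toy, not the paper's implementation: MFA learns its frequency weighting and fuses branches with convolutions and channel gating, whereas this sketch uses a naive DFT and fixed average pooling.

```python
import cmath
import math

def top_k_frequencies(x, k=2):
    """Naive O(n^2) DFT over bins 1..n/2; return the k bins with the
    largest magnitude, as a crude global-periodicity prior.
    (Illustrative only: the MFA module learns this selection.)"""
    n = len(x)
    mags = []
    for f in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        mags.append((abs(coeff), f))
    mags.sort(reverse=True)
    return [f for _, f in mags[:k]]

def pyramid_features(x, scales=(1, 2, 4)):
    """Multi-scale average pooling: for each scale s, average
    non-overlapping windows of length s, then repeat each average s
    times so every branch returns to the input length."""
    assert all(len(x) % s == 0 for s in scales)
    out = []
    for s in scales:
        pooled = [sum(x[i:i + s]) / s for i in range(0, len(x), s)]
        out.append([v for v in pooled for _ in range(s)])
    return out

# A series with strong periods n/4 and n/8 -> dominant DFT bins 4 and 8.
n = 64
series = [math.sin(2 * math.pi * 4 * t / n) + 0.5 * math.sin(2 * math.pi * 8 * t / n)
          for t in range(n)]
print(top_k_frequencies(series, k=2))   # -> [4, 8]
# Scale 2 smooths adjacent pairs: [1, 3, 2, 4] -> [2.0, 2.0, 3.0, 3.0]
print(pyramid_features([1.0, 3.0, 2.0, 4.0], scales=(1, 2))[1])
```

In the actual module, the pooled branches would feed learned convolutions and a gating step that reweights frequency components rather than hard-selecting the top k.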
The MDA module applies a multi-scale attention mechanism to strengthen fine-grained modeling along both the temporal and feature dimensions, enabling effective extraction and fusion of comprehensive time- and frequency-domain information. The entire framework is designed with computational efficiency in mind to ensure scalability. Experimental validation on 8 public datasets demonstrates superior performance and robustness compared with existing mainstream time series forecasting approaches.
Results and Discussions: Extensive experiments were conducted on 8 publicly available multivariate datasets: ECL, Weather, ETT (ETTm1, ETTm2, ETTh1, ETTh2), Solar-Energy, and Traffic. The evaluation metrics were mean absolute error (MAE) and mean squared error (MSE), with parameter count, FLOPs, and training time additionally considered for computational efficiency. Comparisons with state-of-the-art models, including Fredformer, Peri-midFormer, iTransformer, TFformer, PatchTST, MSGNet, TimesNet, and TCM, show that the proposed MFADA consistently achieves superior forecasting performance across most datasets and forecasting horizons (Table 1), with the best average MSE and MAE of 0.163 and 0.261 on ECL, and reductions of 13.2% and 17.3% relative to TimesNet at forecasting length 96. On the periodic ETTm1 dataset, the average MSE reaches 0.377, outperforming MSGNet by 5.3%. Ablation studies (Table 2) demonstrate the importance of both the MFA and MDA modules: removing MFA, or reverting MDA to standard self-attention, increases error on ECL, Weather, ETTh1, and ETTh2, indicating their synergistic contribution to modeling complex dynamics. Complexity analysis (Fig. 2) reveals that MFADA achieves the best balance among forecasting accuracy, parameter efficiency, and training time, outperforming Fredformer, MSGNet, and TimesNet. Visualization results for ECL and ETTh2 (Fig. 3, Fig. 4) confirm MFADA's ability to track ground-truth trends and forecast turning points, outperforming baselines in both global and local prediction fidelity. Notably, MFADA's performance lags on the Traffic dataset because of its strong spatial correlations, which highlights the integration of spatial structure as a future direction.
Conclusions: This paper proposes MFADA, a novel time series forecasting method integrating multi-scale frequency adaptation and dual-path attention mechanisms. MFADA stands out with four key strengths: (1) the MFA module effectively extracts and merges multi-scale frequency-domain features, emphasizing diverse temporal scales through pyramid pooling and channel gating; (2) the MDA module captures multi-scale dependencies along both the temporal and feature dimensions, enabling fine-grained dynamic modeling; (3) the architecture maintains computational efficiency through lightweight convolution and pooling operations; (4) superior results across 8 datasets and various forecasting lengths demonstrate robust generalization, especially in multivariate and long-term forecasting scenarios. The extensive experiments confirm that MFADA advances the state of the art in accurate and efficient time series forecasting, offering promising perspectives for both academic research and practical deployment. Future work will explore the integration of spatial correlations to further broaden the model's applicability. -
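The dual-path idea, attending over time steps in one path and over variables in the other, can be sketched in plain Python. This is an illustrative toy under stated assumptions (Q = K = V with no learned projections, a fixed fusion weight `alpha`, no multi-scale branches), not the paper's actual MDA module.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.
    x is a list of n vectors of dimension d."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(wj * x[j][i] for j, wj in enumerate(w)) for i in range(d)])
    return out

def transpose(x):
    return [list(col) for col in zip(*x)]

def dual_path_attention(x, alpha=0.5):
    """Temporal path: attention over the T rows of the T x C matrix.
    Feature path: attention over the C rows of the transposed matrix.
    Here the paths are blended with a fixed weight; the actual MDA
    learns the fusion and adds multi-scale branches."""
    temporal = self_attention(x)                       # T x C
    feature = transpose(self_attention(transpose(x)))  # attend over C, back to T x C
    return [[alpha * t + (1 - alpha) * f for t, f in zip(tr, fr)]
            for tr, fr in zip(temporal, feature)]

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # T = 3 time steps, C = 2 variables
y = dual_path_attention(x)
print(len(y), len(y[0]))  # 3 2
```

Because each path outputs convex combinations of the inputs, the fused result stays within the input's value range; the learned projections in the real module remove that restriction.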
Key words:
- time series forecasting
- multi-scale
- frequency adapter
- dual-path attention
- attention mechanism
-
Table 1 Comparison experiment results (each cell: MSE/MAE)
Dataset | Len | MFADA | Fredformer | Peri-midFormer | iTransformer | TFformer | PatchTST | MSGNet | TimesNet | TCM | DLinear
ECL | 96 | 0.139/0.234 | 0.147/0.241 | 0.141/0.235 | 0.148/0.239 | 0.151/0.251 | 0.181/0.270 | 0.165/0.274 | 0.168/0.272 | 0.153/0.253 | 0.197/0.282
ECL | 192 | 0.152/0.248 | 0.163/0.257 | 0.157/0.249 | 0.167/0.258 | 0.165/0.264 | 0.188/0.274 | 0.184/0.292 | 0.184/0.289 | 0.171/0.269 | 0.196/0.285
ECL | 336 | 0.166/0.266 | 0.180/0.276 | 0.174/0.267 | 0.179/0.272 | 0.180/0.278 | 0.204/0.293 | 0.195/0.302 | 0.198/0.300 | 0.183/0.283 | 0.209/0.301
ECL | 720 | 0.196/0.292 | 0.213/0.302 | 0.205/0.296 | 0.211/0.300 | 0.213/0.302 | 0.246/0.324 | 0.231/0.332 | 0.220/0.320 | 0.217/0.311 | 0.245/0.333
ECL | Avg | 0.163/0.261 | 0.176/0.269 | 0.169/0.262 | 0.176/0.267 | 0.177/0.274 | 0.205/0.290 | 0.194/0.300 | 0.192/0.295 | 0.181/0.279 | 0.212/0.300
Weather | 96 | 0.153/0.201 | 0.160/0.205 | 0.172/0.218 | 0.176/0.216 | 0.172/0.220 | 0.177/0.218 | 0.163/0.212 | 0.172/0.220 | 0.153/0.202 | 0.196/0.255
Weather | 192 | 0.205/0.248 | 0.208/0.249 | 0.217/0.256 | 0.225/0.257 | 0.219/0.259 | 0.225/0.259 | 0.212/0.254 | 0.219/0.261 | 0.203/0.249 | 0.237/0.296
Weather | 336 | 0.263/0.290 | 0.265/0.291 | 0.276/0.298 | 0.281/0.299 | 0.275/0.298 | 0.278/0.297 | 0.272/0.299 | 0.280/0.306 | 0.263/0.294 | 0.283/0.335
Weather | 720 | 0.340/0.340 | 0.343/0.341 | 0.356/0.349 | 0.358/0.350 | 0.350/0.347 | 0.354/0.348 | 0.350/0.348 | 0.365/0.359 | 0.344/0.345 | 0.345/0.381
Weather | Avg | 0.240/0.270 | 0.244/0.272 | 0.255/0.280 | 0.260/0.280 | 0.254/0.281 | 0.259/0.281 | 0.246/0.278 | 0.259/0.287 | 0.241/0.273 | 0.265/0.317
ETTm1 | 96 | 0.321/0.362 | 0.328/0.363 | 0.331/0.368 | 0.342/0.377 | 0.334/0.370 | 0.329/0.367 | 0.319/0.366 | 0.338/0.375 | 0.311/0.352 | 0.345/0.372
ETTm1 | 192 | 0.354/0.378 | 0.367/0.382 | 0.372/0.390 | 0.383/0.396 | 0.373/0.390 | 0.367/0.385 | 0.376/0.397 | 0.374/0.387 | 0.368/0.384 | 0.380/0.389
ETTm1 | 336 | 0.384/0.400 | 0.395/0.403 | 0.411/0.420 | 0.418/0.418 | 0.405/0.417 | 0.399/0.410 | 0.417/0.422 | 0.410/0.411 | 0.395/0.402 | 0.413/0.413
ETTm1 | 720 | 0.449/0.438 | 0.454/0.440 | 0.472/0.453 | 0.487/0.457 | 0.471/0.453 | 0.454/0.439 | 0.481/0.458 | 0.478/0.450 | 0.462/0.440 | 0.474/0.453
ETTm1 | Avg | 0.377/0.395 | 0.386/0.397 | 0.397/0.408 | 0.408/0.412 | 0.396/0.408 | 0.387/0.400 | 0.398/0.411 | 0.400/0.406 | 0.384/0.395 | 0.403/0.407
ETTm2 | 96 | 0.177/0.260 | 0.178/0.261 | 0.178/0.260 | 0.186/0.272 | 0.176/0.261 | 0.175/0.259 | 0.247/0.307 | 0.187/0.267 | 0.173/0.258 | 0.193/0.292
ETTm2 | 192 | 0.241/0.300 | 0.244/0.303 | 0.248/0.306 | 0.254/0.314 | 0.245/0.305 | 0.241/0.302 | 0.312/0.346 | 0.249/0.309 | 0.246/0.306 | 0.284/0.362
ETTm2 | 336 | 0.299/0.339 | 0.302/0.341 | 0.308/0.342 | 0.316/0.351 | 0.304/0.342 | 0.305/0.343 | 0.314/0.348 | 0.321/0.351 | 0.302/0.341 | 0.369/0.427
ETTm2 | 720 | 0.394/0.395 | 0.397/0.396 | 0.419/0.404 | 0.414/0.407 | 0.400/0.398 | 1.730/1.042 | 0.414/0.403 | 0.408/0.522 | 0.406/0.400 | 0.421/0.415
ETTm2 | Avg | 0.278/0.323 | 0.280/0.325 | 0.288/0.328 | 0.292/0.336 | 0.281/0.327 | 0.613/0.487 | 0.322/0.351 | 0.358/0.404 | 0.282/0.326 | 0.350/0.401
ETTh1 | 96 | 0.367/0.392 | 0.376/0.394 | 0.380/0.400 | 0.387/0.405 | 0.370/0.394 | 0.414/0.419 | 0.389/0.411 | 0.384/0.402 | 0.374/0.395 | 0.376/0.400
ETTh1 | 192 | 0.431/0.424 | 0.439/0.425 | 0.433/0.432 | 0.441/0.436 | 0.432/0.425 | 0.460/0.445 | 0.442/0.418 | 0.436/0.429 | 0.436/0.421 | 0.420/0.432
ETTh1 | 336 | 0.472/0.439 | 0.473/0.440 | 0.480/0.453 | 0.491/0.462 | 0.475/0.443 | 0.501/0.466 | 0.480/0.468 | 0.491/0.469 | 0.475/0.442 | 0.481/0.459
ETTh1 | 720 | 0.479/0.461 | 0.490/0.466 | 0.547/0.511 | 0.509/0.494 | 0.481/0.463 | 0.500/0.488 | 0.494/0.488 | 0.521/0.500 | 0.476/0.463 | 0.478/0.453
ETTh1 | Avg | 0.437/0.429 | 0.445/0.432 | 0.460/0.449 | 0.457/0.449 | 0.440/0.431 | 0.469/0.454 | 0.451/0.446 | 0.458/0.450 | 0.440/0.430 | 0.433/0.447
ETTh2 | 96 | 0.290/0.341 | 0.293/0.343 | 0.296/0.342 | 0.301/0.350 | 0.294/0.344 | 0.302/0.348 | 0.328/0.371 | 0.340/0.374 | 0.294/0.346 | 0.333/0.387
ETTh2 | 192 | 0.364/0.388 | 0.370/0.390 | 0.392/0.406 | 0.380/0.399 | 0.375/0.391 | 0.388/0.400 | 0.402/0.414 | 0.402/0.414 | 0.383/0.399 | 0.477/0.476
ETTh2 | 336 | 0.379/0.406 | 0.385/0.413 | 0.428/0.434 | 0.424/0.432 | 0.388/0.415 | 0.426/0.433 | 0.435/0.443 | 0.412/0.424 | 0.413/0.424 | 0.594/0.541
ETTh2 | 720 | 0.407/0.431 | 0.419/0.439 | 0.479/0.470 | 0.430/0.447 | 0.423/0.440 | 0.431/0.446 | 0.417/0.441 | 0.462/0.468 | 0.427/0.440 | 0.831/0.657
ETTh2 | Avg | 0.360/0.391 | 0.367/0.396 | 0.399/0.413 | 0.384/0.407 | 0.370/0.398 | 0.387/0.407 | 0.396/0.417 | 0.414/0.427 | 0.379/0.402 | 0.559/0.515
Solar-Energy | 96 | 0.191/0.225 | 0.195/0.251 | 0.198/0.251 | 0.208/0.238 | 0.197/0.252 | 0.234/0.286 | 0.228/0.263 | 0.250/0.292 | 0.312/0.399 | 0.290/0.378
Solar-Energy | 192 | 0.221/0.251 | 0.227/0.259 | 0.237/0.259 | 0.240/0.264 | 0.228/0.260 | 0.267/0.310 | 0.248/0.275 | 0.296/0.318 | 0.339/0.416 | 0.320/0.398
Solar-Energy | 336 | 0.252/0.287 | 0.247/0.275 | 0.248/0.276 | 0.249/0.274 | 0.253/0.287 | 0.290/0.315 | 0.291/0.301 | 0.319/0.330 | 0.368/0.430 | 0.353/0.415
Solar-Energy | 720 | 0.245/0.281 | 0.253/0.283 | 0.261/0.283 | 0.250/0.275 | 0.252/0.283 | 0.289/0.317 | 0.291/0.306 | 0.338/0.337 | 0.370/0.425 | 0.356/0.413
Solar-Energy | Avg | 0.228/0.261 | 0.230/0.267 | 0.236/0.267 | 0.237/0.263 | 0.233/0.271 | 0.270/0.307 | 0.265/0.286 | 0.301/0.319 | 0.347/0.417 | 0.330/0.400
Traffic | 96 | 0.421/0.290 | 0.404/0.274 | 0.435/0.294 | 0.392/0.268 | 0.525/0.342 | 0.462/0.295 | 0.594/0.336 | 0.593/0.321 | 0.508/0.342 | 0.650/0.396
Traffic | 192 | 0.443/0.292 | 0.427/0.288 | 0.451/0.299 | 0.413/0.277 | 0.514/0.346 | 0.466/0.296 | 0.615/0.347 | 0.617/0.336 | 0.609/0.387 | 0.598/0.370
Traffic | 336 | 0.464/0.320 | 0.440/0.294 | 0.463/0.302 | 0.425/0.283 | 0.531/0.357 | 0.482/0.304 | 0.623/0.351 | 0.629/0.336 | 0.640/0.402 | 0.605/0.373
Traffic | 720 | 0.487/0.329 | 0.466/0.307 | 0.496/0.321 | 0.460/0.301 | 0.569/0.373 | 0.514/0.322 | 0.608/0.343 | 0.640/0.350 | 0.715/0.442 | 0.645/0.394
Traffic | Avg | 0.454/0.308 | 0.434/0.291 | 0.461/0.304 | 0.422/0.282 | 0.535/0.355 | 0.481/0.304 | 0.610/0.344 | 0.620/0.336 | 0.618/0.393 | 0.625/0.383
Table 2 Ablation experiment results
Model | ECL (MSE/MAE) | Weather (MSE/MAE) | ETTh1 (MSE/MAE) | ETTh2 (MSE/MAE)
Fredformer | 0.176/0.269 | 0.244/0.272 | 0.445/0.432 | 0.367/0.396
w/o MFA | 0.172/0.266 | 0.242/0.272 | 0.439/0.431 | 0.361/0.392
Re MDA | 0.170/0.265 | 0.243/0.271 | 0.440/0.430 | 0.363/0.394
MFADA | 0.163/0.261 | 0.240/0.270 | 0.437/0.429 | 0.360/0.391
-
References:
[1] KONG Xiangjie, CHEN Zhenghao, LIU Weiyao, et al. Deep learning for time series forecasting: A survey[J]. International Journal of Machine Learning and Cybernetics, 2025, 16(5): 5079–5112. doi: 10.1007/s13042-025-02560-w.
[2] ZHONG Weiyi, ZHAI Dengshuai, XU Wenran, et al. Accurate and efficient daily carbon emission forecasting based on improved ARIMA[J]. Applied Energy, 2024, 376: 124232. doi: 10.1016/j.apenergy.2024.124232.
[3] PAN Jinwei, WANG Yiqiao, ZHONG Bo, et al. Statistical feature-based search for multivariate time series forecasting[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3276–3284. doi: 10.11999/JEIT231264.
[4] DA SILVA D G and DE MOURA MENESES A A M. Comparing long short-term memory (LSTM) and bidirectional LSTM deep neural networks for power consumption prediction[J]. Energy Reports, 2023, 10: 3315–3334. doi: 10.1016/j.egyr.2023.09.175.
[5] ZHENG Qinghe, LI Binglin, YU Zhiguo, et al. Research progress of deep learning enabled automatic modulation classification technology[J]. Journal of Electronics & Information Technology, 2025, 47(11): 4096–4111. doi: 10.11999/JEIT250674.
[6] LIU Hui, FENG Haoran, MA Jiani, et al. Spatial self-attention incorporated imputation algorithm for severely missing multivariate time series[J]. Journal of Electronics & Information Technology, 2025, 47(10): 3917–3928. doi: 10.11999/JEIT250220.
[7] RABBANI M B A, MUSARAT M A, ALALOUL W S, et al. A comparison between seasonal autoregressive integrated moving average (SARIMA) and exponential smoothing (ES) based on time series model for forecasting road accidents[J]. Arabian Journal for Science and Engineering, 2021, 46(11): 11113–11138. doi: 10.1007/s13369-021-05650-3.
[8] WU Haixu, HU Tengge, LIU Yong, et al. TimesNet: Temporal 2D-variation modeling for general time series analysis[C]. Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023.
[9] COUTINHO E R, MADEIRA J G F, BORGES D G F, et al. Multi-step forecasting of meteorological time series using CNN-LSTM with decomposition methods[J]. Water Resources Management, 2025, 39(7): 3173–3198. doi: 10.1007/s11269-025-04102-z.
[10] CAI Wanlin, LIANG Yuxuan, LIU Xianggen, et al. MSGNet: Learning multi-scale inter-series correlations for multivariate time series forecasting[C]. Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 11141–11149. doi: 10.1609/aaai.v38i10.28991.
[11] YUNITA A, PRATAMA M H D I, ALMUZAKKI M Z, et al. Performance analysis of neural network architectures for time series forecasting: A comparative study of RNN, LSTM, GRU, and hybrid models[J]. MethodsX, 2025, 15: 103462. doi: 10.1016/j.mex.2024.103462.
[12] YADAV H and THAKKAR A. NOA-LSTM: An efficient LSTM cell architecture for time series forecasting[J]. Expert Systems with Applications, 2024, 238: 122333. doi: 10.1016/j.eswa.2023.122333.
[13] UBAL C, DI-GIORGI G, CONTRERAS-REYES J E, et al. Predicting the long-term dependencies in time series using recurrent artificial neural networks[J]. Machine Learning and Knowledge Extraction, 2023, 5(4): 1340–1358. doi: 10.3390/make5040068.
[14] ZENG Ailing, CHEN Muxi, ZHANG Lei, et al. Are transformers effective for time series forecasting?[C]. Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023: 11121–11128. doi: 10.1609/aaai.v37i9.26317.
[15] JIANG Hongwei, LIU Dongsheng, DING Xinyi, et al. TCM: An efficient lightweight MLP-based network with affine transformation for long-term time series forecasting[J]. Neurocomputing, 2025, 617: 128960. doi: 10.1016/j.neucom.2024.128960.
[16] ZHOU Haoyi, ZHANG Shanghang, PENG Jieqi, et al. Informer: Beyond efficient transformer for long sequence time-series forecasting[C]. Proceedings of the 35th AAAI Conference on Artificial Intelligence, Palo Alto, USA, 2021: 11106–11115. doi: 10.1609/aaai.v35i12.17325.
[17] WU Haixu, XU Jiehui, WANG Jianmin, et al. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting[C]. Proceedings of the 35th Conference on Neural Information Processing Systems, Red Hook, USA, 2021: 22419–22430.
[18] ZHOU Tian, MA Ziqing, WEN Qingsong, et al. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting[C]. Proceedings of the International Conference on Machine Learning, Baltimore, USA, 2022: 27268–27286.
[19] NIE Yuqi, NGUYEN N H, SINTHONG P, et al. A time series is worth 64 words: Long-term forecasting with transformers[C]. Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023.
[20] WU Qiang, YAO Gechang, FENG Zhixi, et al. Peri-midFormer: Periodic pyramid transformer for time series analysis[C]. Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 415. doi: 10.52202/079017-0415.
[21] LIU Yong, HU Tengge, ZHANG Haoran, et al. iTransformer: Inverted transformers are effective for time series forecasting[C]. Proceedings of the 12th International Conference on Learning Representations, Vienna, Austria, 2024.
[22] ZHAO Tianlong, FANG Lexin, MA Xiang, et al. TFformer: A time-frequency domain bidirectional sequence-level attention based transformer for interpretable long-term sequence forecasting[J]. Pattern Recognition, 2025, 158: 110994. doi: 10.1016/j.patcog.2024.110994.
[23] ZHOU Tian, NIU Peisong, WANG Xue, et al. One fits all: Power general time series analysis by pretrained LM[C]. Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, USA, 2023: 1877.
[24] PIAO Xihao, CHEN Zheng, MURAYAMA T, et al. Fredformer: Frequency debiased transformer for time series forecasting[C]. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 2024: 2400–2410. doi: 10.1145/3637528.3671928.
[25] GAO Shixuan, ZHANG Pingping, YAN Tianyu, et al. Multi-scale and detail-enhanced segment anything model for salient object detection[C]. Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, Australia, 2024: 9894–9903. doi: 10.1145/3664647.3680650.
[26] SI Yunzhong, XU Huiying, ZHU Xinzhong, et al. SCSA: Exploring the synergistic effects between spatial and channel attention[J]. Neurocomputing, 2025, 634: 129866. doi: 10.1016/j.neucom.2025.129866.