Literature review: Transformers in Autoscaling

A review of 11 papers on transformer-based cloud autoscaling

The review offers a big-picture comparison of the papers and a pedagogical reading order.

Transformers applied to autoscaling

A taxonomy, a comparison table, and a MAPE-K (Monitor-Analyze-Plan-Execute over a shared Knowledge base) walk-through have been prepared. The reviewed papers are summarized in the following table:

| # | Note | Full title | Venue |
|---|------|------------|-------|
| 1 | Predictive Auto-scaler for Kubernetes, Shim et al. (2023) | Predictive Auto-scaler for Kubernetes Cloud | IEEE International Systems Conference (SysCon) |
| 2 | Predictive Autoscaling with Transformers, Shrestha & Tuz Sabiha (2025) | Enhancing Cloud Resource Utilization with Predictive Autoscaling Using Transformer Models | 9th Int. Conf. on Cloud and Big Data Computing (ICCBDC) |
| 3 | InformerAutoScale, Kumar et al. (2025) | Optimizing resource allocation in cloud-native applications through proactive autoscaling with the InformerAutoScale model | The Journal of Supercomputing 81(9):1077 |
| 4 | MV-Transformer MAPE Framework, Kumar et al. (2025) | A multivariate transformer-based monitor-analyze-plan-execute (MAPE) autoscaling framework for dynamic resource allocation in cloud environment | Computing 107(3):69 |
| 5 | Transformer Workload Prediction & Adaptive Auto-Scaling, G et al. (2026) | Transformer-Based Workload Prediction and Adaptive Auto-Scaling in Cloud Data Centers | IEEE Int. Conf. on Computing, Communication, Control and Networking (I3CTCON) |
| 6 | CATScaler, Meng et al. (2025) | CATScaler: A Convolution-Augmented Transformer Scaling Framework for Cloud-Native Applications | IEEE Transactions on Services Computing 18(5):2659–2672 |
| 7 | WGAN-gp Transformer, Arbat et al. (2022) | Wasserstein Adversarial Transformer for Cloud Workload Prediction | Proceedings of the AAAI Conference on Artificial Intelligence 36(11):12433–12439 |
| 8 | Diffusion Autoformer (job-arrival), Kumar et al. (2025) | Predicting Cloud Workload Job Arrival Rates Using a Diffusion Autoformer Model | IEEE International Conference on Big Data (BigData) |
| 9 | Fremer, Ye et al. (2025) | Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services | Proceedings of the VLDB Endowment 18(11):3812–3825 |
| 10 | FELT, Ding et al. (2025) | FELT: Large-Scale Cloud Workload Prediction Through Adaptive Feature-Enhanced and Similarity-Aware Transformer | Tsinghua Science and Technology |
| 11 | CloudFormer, Shahbazinia et al. (2025) | CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload | Preprint, arXiv:2509.03394 |

Forecasting foundations

A comparison of forecasting models is available. The general time-series forecasting transformers that may underpin the autoscaling forecasters above are:

| # | Note | Full title | Venue |
|---|------|------------|-------|
| 1 | Autoformer, Wu et al. (2021) | Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting | NeurIPS 2021 |
| 2 | Informer, Zhou et al. (2020) | Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting | AAAI 2021 (arXiv:2012.07436) |
| 3 | PatchTST, Nie et al. (2022) | A Time Series is Worth 64 Words: Long-term Forecasting with Transformers | ICLR 2023 (arXiv:2211.14730) |
| 4 | TimesFM, Das et al. (2023) | A decoder-only foundation model for time-series forecasting | ICML 2024 (arXiv:2310.10688) |

References

  1. Shim, S., Dhokariya, A., Doshi, D., Upadhye, S., Patwari, V., & Park, J.-Y. (2023). Predictive Auto-scaler for Kubernetes Cloud. 2023 IEEE International Systems Conference (SysCon), 1–8. 10.1109/syscon53073.2023.10131106
  2. Shrestha, R., & Tuz Sabiha, F. (2025). Enhancing Cloud Resource Utilization with Predictive Autoscaling Using Transformer Models. 2025 9th International Conference on Cloud and Big Data Computing (ICCBDC), 24–29. 10.1109/iccbdc67784.2025.00011
  3. Kumar, S., Chauhan, M. K., Priyadarshni, Tripathi, S., Misra, R., & Singh, T. N. (2025). Predicting Cloud Workload Job Arrival Rates Using a Diffusion Autoformer Model. 2025 IEEE International Conference on Big Data (BigData), 6102–6107. 10.1109/bigdata66926.2025.11401693
  4. G, C. N., O, B. C., R, J. K., Raj B N, P., Naik, P. N., & G, M. B. (2026). Transformer-Based Workload Prediction and Adaptive Auto-Scaling in Cloud Data Centers. 2026 IEEE International Conference for Convergence in Computing Technology (I3CTCON), 1–6. 10.1109/i3ctcon68242.2026.11507247
  5. Meng, F., Dai, H., Cong, G., Zhu, B., & Zhao, H. (2025). CATScaler: A Convolution-Augmented Transformer Scaling Framework for Cloud-Native Applications. IEEE Transactions on Services Computing, 18(5), 2659–2672. 10.1109/tsc.2025.3592383
  6. Arbat, S., Jayakumar, V. K., Lee, J., Wang, W., & Kim, I. K. (2022). Wasserstein Adversarial Transformer for Cloud Workload Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12433–12439. 10.1609/aaai.v36i11.21509
  7. Ye, H., Chen, J., Jiang, F., He, X., Zhang, T., Chen, J., & Gao, X. (2025). Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services. Proceedings of the VLDB Endowment, 18(11), 3812–3825. 10.14778/3749646.3749656
  8. Ding, Z., Feng, B., & Yu, W. (2025). FELT: Large-Scale Cloud Workload Prediction Through Adaptive Feature-Enhanced and Similarity-Aware Transformer. Tsinghua Science and Technology. 10.26599/tst.2025.9010102
  9. Shahbazinia, A., Huang, D., Costero, L., & Atienza, D. (2025). CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload. arXiv. 10.48550/ARXIV.2509.03394
  10. Wu, H., Xu, J., Wang, J., & Long, M. (2021). Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. arXiv. 10.48550/ARXIV.2106.13008
  11. Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., & Zhang, W. (2020). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. arXiv. 10.48550/ARXIV.2012.07436
  12. Nie, Y., Nguyen, N. H., Sinthong, P., & Kalagnanam, J. (2022). A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv. 10.48550/ARXIV.2211.14730
  13. Das, A., Kong, W., Sen, R., & Zhou, Y. (2023). A decoder-only foundation model for time-series forecasting. arXiv. 10.48550/ARXIV.2310.10688