Literature review: Transformers in Autoscaling
This is a literature review of 11 papers on transformer-based cloud autoscaling, with a big-picture comparison and a pedagogical reading order.
Transformers applied to autoscaling
A taxonomy, a comparison table, and a MAPE-K walk-through have been prepared. The reviewed papers are summarized in the following table:
| # | Short name (citation) | Full title | Venue |
|---|---|---|---|
| 1 | Predictive Auto-scaler for Kubernetes (Shim et al., 2023) | Predictive Auto-scaler for Kubernetes Cloud | IEEE International Systems Conference (SysCon) |
| 2 | Predictive Autoscaling with Transformers (Shrestha & Tuz Sabiha, 2025) | Enhancing Cloud Resource Utilization with Predictive Autoscaling Using Transformer Models | 9th Int. Conf. on Cloud and Big Data Computing (ICCBDC) |
| 3 | InformerAutoScale (Kumar et al., 2025) | Optimizing resource allocation in cloud-native applications through proactive autoscaling with the InformerAutoScale model | The Journal of Supercomputing 81(9):1077 |
| 4 | MV-Transformer MAPE Framework (Kumar et al., 2025) | A multivariate transformer-based monitor-analyze-plan-execute (MAPE) autoscaling framework for dynamic resource allocation in cloud environment | Computing 107(3):69 |
| 5 | Transformer Workload Prediction & Adaptive Auto-Scaling (G et al., 2026) | Transformer-Based Workload Prediction and Adaptive Auto-Scaling in Cloud Data Centers | IEEE Int. Conf. on Computing, Communication, Control and Networking (I3CTCON) |
| 6 | CATScaler (Meng et al., 2025) | CATScaler: A Convolution-Augmented Transformer Scaling Framework for Cloud-Native Applications | IEEE Transactions on Services Computing 18(5):2659–2672 |
| 7 | WGAN-GP Transformer (Arbat et al., 2022) | Wasserstein Adversarial Transformer for Cloud Workload Prediction | Proceedings of the AAAI Conference on Artificial Intelligence 36(11):12433–12439 |
| 8 | Diffusion Autoformer, job-arrival (Kumar et al., 2025) | Predicting Cloud Workload Job Arrival Rates Using a Diffusion Autoformer Model | IEEE International Conference on Big Data (BigData) |
| 9 | Fremer (Ye et al., 2025) | Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services | Proceedings of the VLDB Endowment 18(11):3812–3825 |
| 10 | FELT (Ding et al., 2025) | FELT: Large-Scale Cloud Workload Prediction Through Adaptive Feature-Enhanced and Similarity-Aware Transformer | Tsinghua Science and Technology |
| 11 | CloudFormer (Shahbazinia et al., 2025) | CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload | Preprint, arXiv:2509.03394 |
Forecasting foundations
A comparison of forecasting models is available. The general time-series forecasting transformers that may underpin the autoscaling forecasters above are:
| # | Short name (citation) | Full title | Venue |
|---|---|---|---|
| 1 | Autoformer (Wu et al., 2021) | Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting | NeurIPS 2021 |
| 2 | Informer (Zhou et al., 2020) | Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting | AAAI 2021 (arXiv:2012.07436) |
| 3 | PatchTST (Nie et al., 2022) | A Time Series is Worth 64 Words: Long-term Forecasting with Transformers | ICLR 2023 (arXiv:2211.14730) |
| 4 | TimesFM (Das et al., 2023) | A decoder-only foundation model for time-series forecasting | ICML 2024 (arXiv:2310.10688) |

References
- Shim, S., Dhokariya, A., Doshi, D., Upadhye, S., Patwari, V., & Park, J.-Y. (2023). Predictive Auto-scaler for Kubernetes Cloud. 2023 IEEE International Systems Conference (SysCon), 1–8. 10.1109/syscon53073.2023.10131106
- Shrestha, R., & Tuz Sabiha, F. (2025). Enhancing Cloud Resource Utilization with Predictive Autoscaling Using Transformer Models. 2025 9th International Conference on Cloud and Big Data Computing (ICCBDC), 24–29. 10.1109/iccbdc67784.2025.00011
- Kumar, S., Chauhan, M. K., Priyadarshni, Tripathi, S., Misra, R., & Singh, T. N. (2025). Predicting Cloud Workload Job Arrival Rates Using a Diffusion Autoformer Model. 2025 IEEE International Conference on Big Data (BigData), 6102–6107. 10.1109/bigdata66926.2025.11401693
- G, C. N., O, B. C., R, J. K., Raj B N, P., Naik, P. N., & G, M. B. (2026). Transformer-Based Workload Prediction and Adaptive Auto-Scaling in Cloud Data Centers. 2026 IEEE International Conference for Convergence in Computing Technology (I3CTCON), 1–6. 10.1109/i3ctcon68242.2026.11507247
- Meng, F., Dai, H., Cong, G., Zhu, B., & Zhao, H. (2025). CATScaler: A Convolution-Augmented Transformer Scaling Framework for Cloud-Native Applications. IEEE Transactions on Services Computing, 18(5), 2659–2672. 10.1109/tsc.2025.3592383
- Arbat, S., Jayakumar, V. K., Lee, J., Wang, W., & Kim, I. K. (2022). Wasserstein Adversarial Transformer for Cloud Workload Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12433–12439. 10.1609/aaai.v36i11.21509
- Ye, H., Chen, J., Jiang, F., He, X., Zhang, T., Chen, J., & Gao, X. (2025). Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services. Proceedings of the VLDB Endowment, 18(11), 3812–3825. 10.14778/3749646.3749656
- Ding, Z., Feng, B., & Yu, W. (2025). FELT: Large-Scale Cloud Workload Prediction Through Adaptive Feature-Enhanced and Similarity-Aware Transformer. Tsinghua Science and Technology. 10.26599/tst.2025.9010102
- Shahbazinia, A., Huang, D., Costero, L., & Atienza, D. (2025). CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload. arXiv. 10.48550/ARXIV.2509.03394
- Wu, H., Xu, J., Wang, J., & Long, M. (2021). Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. arXiv. 10.48550/ARXIV.2106.13008
- Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., & Zhang, W. (2020). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. arXiv. 10.48550/ARXIV.2012.07436
- Nie, Y., Nguyen, N. H., Sinthong, P., & Kalagnanam, J. (2022). A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv. 10.48550/ARXIV.2211.14730
- Das, A., Kong, W., Sen, R., & Zhou, Y. (2023). A decoder-only foundation model for time-series forecasting. arXiv. 10.48550/ARXIV.2310.10688