Papers Comparison

A map for newcomers. Read this alongside the individual paper summaries.

Common backbone: the MAPE-K loop

MAPE-K stands for Monitor, Analyze, Plan, Execute over shared Knowledge. Almost every paper in this review innovates in Analyze (a better forecaster) and reuses or hand-waves Plan and Execute. A minority build the whole loop.
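To make the loop's shape concrete, here is a minimal sketch of one MAPE-K iteration. Every name in it (`Knowledge`, `monitor`, `analyze`, `plan`, `execute`, `per_pod_capacity`) is hypothetical, and the forecaster is a plain moving average standing in for whatever model a paper proposes; none of it is taken from the reviewed papers.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared state (the K in MAPE-K): metric history and current replica count."""
    history: list = field(default_factory=list)
    replicas: int = 1


def monitor(k, observed_rps):
    # Monitor: record the latest observation in the knowledge base.
    k.history.append(observed_rps)


def analyze(k, window=3):
    # Analyze: placeholder forecaster, a moving average over the last `window` points.
    recent = k.history[-window:]
    return sum(recent) / len(recent)


def plan(forecast_rps, per_pod_capacity=100.0, min_replicas=1):
    # Plan: turn forecast load into a replica count (capacity is an assumed constant).
    return max(min_replicas, math.ceil(forecast_rps / per_pod_capacity))


def execute(k, target):
    # Execute: a real system would call the orchestrator here; this only updates state.
    k.replicas = target


def mape_k_step(k, observed_rps):
    monitor(k, observed_rps)
    execute(k, plan(analyze(k)))
    return k.replicas


k = Knowledge()
trace = [120.0, 180.0, 450.0, 900.0]  # synthetic requests/s, one value per loop tick
decisions = [mape_k_step(k, rps) for rps in trace]
print(decisions)
```

In this framing, a paper that improves only the forecaster is swapping out the body of `analyze` and leaving the other three functions untouched.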


Taxonomy

Pure forecasters

Analyze only; hand off to an external scaler.

Contribution is forecast accuracy/efficiency, not actuation.

Actuation is then delegated: “a standard HPA does the rest”.
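For readers unfamiliar with what the HPA actually does with a metric, the sketch below follows the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). Feeding a forecast value in place of the live metric is an assumption about how such a hand-off could work, not something the papers specify; the function name is hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas, metric_value, target_value, tolerance=0.1):
    """Replica count from the documented HPA ratio rule.

    `metric_value` may be a live reading or, as assumed here, a forecast.
    """
    ratio = metric_value / target_value
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


scale_up = hpa_desired_replicas(4, metric_value=150.0, target_value=100.0)
hold = hpa_desired_replicas(4, metric_value=105.0, target_value=100.0)
scale_down = hpa_desired_replicas(4, metric_value=50.0, target_value=100.0)
print(scale_up, hold, scale_down)
```

The rule is purely proportional, which is why forecast quality dominates the outcome when a pure forecaster relies on it.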

End-to-end frameworks

Implement the full MAPE-K loop.

These papers carry all four MAPE steps and loop every 30–60 s.

Performance predictors

Predict slowdown, not workload.

Papers notable for a special technique

The technique is often the whole reason to read it.

| Technique | Paper(s) | What it provides |
| --- | --- | --- |
| Diffusion (generative, probabilistic) | DAF | Outputs a confidence band, not a point forecast; tune for SLA-safety vs. cost. |
| Adversarial / GAN (WGAN-gp) | WGAN-gp | A critic pushes forecasts to look realistic; better on bursty traffic, ~5x faster than LSTM. |
| Frequency-domain (FFT) | Fremer | Reads repeating cycles instead of the raw curve; ~12x smaller, ~3x faster, handles multi-period workloads. |
| Convolution-augmented | CATScaler (and CloudFormer’s system branch) | Convolution catches short local spikes the attention smooths over. |
| Similarity-aware (shared model) | FELT | One model serves thousands of containers by grouping look-alikes via attention masks. |
| Efficient long-sequence attention (Informer / ProbSparse) | PredictiveK8s, InformerAutoScale | Cuts attention cost from O(n²) to ~O(n log n) for long inputs and multi-step horizons. |
| Dual-branch (time + cross-metric) | CloudFormer | One branch reads time, one reads the metrics; generalizes to unseen apps. |
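The frequency-domain idea can be illustrated without any model at all: a discrete Fourier transform exposes the repeating cycles in a workload as a few large spectral peaks. The sketch below is not Fremer's architecture; it is a naive O(n²) DFT on a synthetic series with two assumed periods (24 and 8 samples), and the helper names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(series):
    """Return (frequency_index, magnitude) pairs for a mean-centred series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    mags = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        mags.append((k, abs(coeff)))
    return mags


def dominant_periods(series, top=2):
    """Periods (in samples) of the `top` strongest spectral peaks."""
    n = len(series)
    ranked = sorted(dft_magnitudes(series), key=lambda km: km[1], reverse=True)
    return [n / k for k, _ in ranked[:top]]


# Synthetic workload: a strong 24-sample cycle plus a weaker 8-sample cycle.
series = [
    10 * math.sin(2 * math.pi * t / 24) + 4 * math.sin(2 * math.pi * t / 8)
    for t in range(96)
]
periods = dominant_periods(series)
print(periods)
```

A 96-point curve collapses to two numbers here, which gives some intuition for why a model that works on the spectrum can be much smaller than one that works on the raw curve.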

Comparison table

CloudFormer predicts degradation of an ongoing run (to pre-empt a QoS violation), not a fixed time-ahead forecast.


Model Training

Every paper in this review trains its model from scratch on cloud workload/trace data. None uses a pretrained time-series foundation model: no borrowed backbone, no fine-tuning, no zero-shot transfer. The contribution is always a task-specific architecture fit directly to the traces.

| Paper | Optimizer (LR) | Loss | Epochs | Train/val/test split |
| --- | --- | --- | --- | --- |
| MV-Transformer | not stated | MSE/MAE/RMSE/MAPE | 10 / 50 / 100 | 80 / 20 |
| AdaptiveAutoScaling | Adam (1e-4) | multi-step MSE | ≤100, early stop | 70 / 15 / 15 |
| PredictiveAutoscaling | Darts defaults (unstated) | unstated | 200 | 70 / 30 |
| PredictiveK8s | not stated | MSE + early stop | 10 (NASA) / 4 (FIFA) | train + last-16-days test |
| DAF | AdamW (1e-4) | denoising + MSE recon | early stop | train + val split |
| FELT | not stated | (classification) | not stated | 7 : 1 : 2 |
| Fremer | Adam (1e-3) | MSE | not stated | 8 : 2 |
| WGAN-gp | MADGRAD (1e-3) | MAE + WGAN-gp critic | 1000 | 60 / 20 / 20 |
| CATScaler | Adam | SmoothL1 (Alibaba) / MSE (Huawei) | early stop (5) | 80 / 10 / 10 |
| InformerAutoScale | Adam (1e-3) | MSE/MAE/RMSE | 10 / 20 / 50 | not stated |
| CloudFormer | Adam (1e-5) | log-cosh | not stated | 7 train / 4 unseen apps |
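The loss column mixes three regression losses that behave differently on large errors. As a rough comparison, the sketch below implements MSE, SmoothL1 (in the common form with threshold `beta`, quadratic below it and linear above) and log-cosh in plain Python. The function names and the toy data are assumptions for illustration; the papers' exact settings (for example, the SmoothL1 threshold) are not stated in this review.

```python
import math


def _errors(y_true, y_pred):
    return [p - t for t, p in zip(y_true, y_pred)]


def mse(y_true, y_pred):
    errs = _errors(y_true, y_pred)
    return sum(e * e for e in errs) / len(errs)


def smooth_l1(y_true, y_pred, beta=1.0):
    # Quadratic for |e| < beta, linear beyond it, so outliers are penalised less than in MSE.
    total = 0.0
    for e in _errors(y_true, y_pred):
        a = abs(e)
        total += 0.5 * a * a / beta if a < beta else a - 0.5 * beta
    return total / len(y_true)


def log_cosh(y_true, y_pred):
    # Roughly e^2 / 2 for small errors and |e| - log(2) for large ones.
    # math.cosh overflows for very large |e|; acceptable for a sketch.
    errs = _errors(y_true, y_pred)
    return sum(math.log(math.cosh(e)) for e in errs) / len(errs)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.5, 6.0]  # the last point is a deliberate outlier
losses = {
    "mse": mse(y_true, y_pred),
    "smooth_l1": smooth_l1(y_true, y_pred),
    "log_cosh": log_cosh(y_true, y_pred),
}
print(losses)
```

On this toy data the single outlier dominates MSE, while SmoothL1 and log-cosh grow only linearly with it, which is one plausible reason to prefer them on bursty traces.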

Two “false friends” worth flagging — wording that looks like pretraining but isn’t:

Other notables:


MAPE Stage-by-stage

Monitor

No novelties, but richness varies. The table below lists the metrics each paper actually feeds the model and where they come from.

| Paper | Input metrics | Uni/Multi | Source |
| --- | --- | --- | --- |
| MV-Transformer | CPU, memory, disk R/W, network throughput (Pearson selects throughput↔memory; predicts throughput) | Multi | Trace files (Bitbrains, Google, Azure Functions) |
| AdaptiveAutoScaling | CPU, memory, request arrival rate, queue length | Multi | Google Cluster trace, in simulator |
| CATScaler | CPU, memory, RPS per API + machine specs | Multi | Prometheus; Alibaba & Huawei traces |
| InformerAutoScale | CPU, memory, request rate (→ one aggregated workload target) | Multi | Metrics Server + cAdvisor (Prometheus secondary) |
| CloudFormer | 206 host metrics = 103 × (target VM + neighbors); each 103 = 53 VM (libvirt) + 38 Linux perf counters + 12 Intel Top-Down | Multi | libvirt API, Linux perf, Intel Top-Down |
| PredictiveAutoscaling | Ingress request rate (RPS); CPU/mem/latency tracked but not model inputs | Uni | Prometheus + Grafana (live, via KEDA) |
| PredictiveK8s | HTTP requests/min | Uni | Prometheus architecturally; experiments on offline NASA/FIFA logs |
| DAF | Job-arrival rate + hour/day context | Uni | Traces (Google, Azure, Alibaba, Facebook, Wikipedia) |
| WGAN-gp | Job-arrival rate | Uni | Traces (Facebook, Alibaba, Google, Wiki, Azure) |
| Fremer | CPU or QPS per instance (CPU for IaaS/PaaS, QPS for FaaS/RDS) | Uni per-series | ByteDance + Materna traces |
| FELT | CPU only, compressed into 6 features/container: min, Q1, median, Q3, max + 1 waveform class | Uni (per-source) | Alibaba microservices, Fisher |
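FELT's six-feature compression in the last row is easy to picture in code: a five-number summary of the CPU series plus one class label. The sketch below uses the standard library's `statistics.quantiles`; the quartile interpolation method is an assumption, and because this review does not say how the waveform class is derived, it is passed in by the caller. Function names are hypothetical.

```python
import statistics


def five_number_summary(cpu_samples):
    """(min, Q1, median, Q3, max) of a CPU-utilisation series."""
    # method="inclusive" treats the samples as the full population (an assumption).
    q1, median, q3 = statistics.quantiles(cpu_samples, n=4, method="inclusive")
    return (min(cpu_samples), q1, median, q3, max(cpu_samples))


def container_features(cpu_samples, waveform_class):
    """Six features per container: five-number summary + a caller-supplied class id."""
    return five_number_summary(cpu_samples) + (waveform_class,)


features = container_features([10, 20, 30, 40, 50], waveform_class=2)
print(features)
```

Containers with similar six-tuples are candidates for the "look-alike" grouping that lets one shared model serve many of them.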

Takeaways:

Analyze

Plan

Where approaches diverge most sharply:

Execute


In one sentence each


Acronyms & short names

Models / methods named in this review

| Short name | Stands for | What it is |
| --- | --- | --- |
| DAF | Diffusion Autoformer | Autoformer (trend+seasonal decomposition) with a diffusion decoder → probabilistic JAR forecast with a confidence band. |
| MV-Transformer | Multivariate Transformer | Encoder-decoder transformer fed several metrics at once (CPU, mem, disk, net) to exploit cross-metric correlations. |
| WGAN-gp | Wasserstein Generative Adversarial Network with gradient penalty | A stable GAN variant: a transformer generator forecasts the series; an MLP critic scores realism via Wasserstein (earth-mover’s) distance, kept 1-Lipschitz by a gradient penalty. |
| FELT | (model name) | Encoder-only, similarity-aware transformer; one shared model classifies workload for thousands of containers. |
| Fremer | Frequency transformer | Frequency-domain (FFT) transformer that forecasts a spectrum instead of the raw curve. |
| CATScaler | Convolution-Augmented Transformer Scaler | Convolution-augmented transformer forecaster + LightGBM pod calculator, full live loop. |
| CloudFormer | (model name) | Dual-branch transformer predicting performance-degradation ratio from black-box host metrics. |
| Informer | (model name) | Efficient long-sequence transformer using ProbSparse attention (O(n log n)). |
| LightGBM | Light Gradient-Boosting Machine | A fast gradient-boosting decision-tree model (not a neural net), strong at learning nonlinear tabular input→output mappings; used as CATScaler’s “how many pods?” calculator. |
| RevIN | Reversible Instance Normalization | Normalize each input window, then exactly reverse it on the output; cheap defense against distribution shift. |
| Time2Vec | Time-to-Vector | Learnable encoding of timestamps as features. |
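The RevIN entry above describes a normalize-then-reverse trick that is short enough to sketch. The version below keeps only the per-window mean and standard deviation and omits the learnable affine parameters of the published method; the "model" is a persistence forecast used purely as a stand-in, and all names are hypothetical.

```python
import math


def revin_normalize(window, eps=1e-5):
    """Standardise one input window; return the values and the stats needed to undo it."""
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in window], (mean, std)


def revin_denormalize(values, stats):
    """Reverse the normalisation on model outputs using the same window stats."""
    mean, std = stats
    return [v * std + mean for v in values]


window = [100.0, 110.0, 120.0]
normalized, stats = revin_normalize(window)

# Stand-in model: predict the last normalised value (persistence forecast).
forecast = revin_denormalize([normalized[-1]], stats)

# The same shape at a 10x higher level looks identical to the model after normalisation.
shifted_normalized, _ = revin_normalize([1100.0, 1110.0, 1120.0])
print(forecast, normalized, shifted_normalized)
```

Because the model only ever sees the normalised shape, a level shift in the workload changes the statistics that are restored on the output rather than the input the model has to handle.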

Domain & infrastructure terms

| Acronym | Stands for | Meaning |
| --- | --- | --- |
| JAR | Job Arrival Rate | Incoming jobs/requests per unit time. |
| HPA | Horizontal Pod Autoscaler | Kubernetes’ built-in autoscaler that adds/removes pods. |
| KEDA | Kubernetes Event-Driven Autoscaling | Extends HPA to scale on custom/external metrics. |
| RPS / QPS | Requests / Queries Per Second | Traffic-rate inputs. |
| SLA / QoS | Service-Level Agreement / Quality of Service | The performance guarantees autoscaling tries not to violate. |
| RRS / CPT | Reactive Rescaling System / Cool-down Period Time | Anti-thrashing guards in the Plan stage. |
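As an illustration of what an anti-thrashing guard does, the sketch below holds the replica count steady for a cool-down period after any change. It is a generic cool-down guard written for this note, not the RRS/CPT mechanism of any specific paper; the class name, the 120 s period, and the proposal sequence are all assumptions.

```python
class CooldownGuard:
    """Suppress replica changes that arrive too soon after the previous change."""

    def __init__(self, cooldown_s):
        self.cooldown_s = cooldown_s
        self.last_change_at = None  # time of the last applied change, in seconds

    def apply(self, now_s, current, proposed):
        if proposed == current:
            return current
        in_cooldown = (
            self.last_change_at is not None
            and now_s - self.last_change_at < self.cooldown_s
        )
        if in_cooldown:
            return current  # hold: still inside the cool-down window
        self.last_change_at = now_s
        return proposed


guard = CooldownGuard(cooldown_s=120)
replicas = 2
applied = []
# (time in seconds, replica count proposed by the planner)
for now_s, proposed in [(0, 4), (30, 2), (60, 5), (120, 3)]:
    replicas = guard.apply(now_s, replicas, proposed)
    applied.append(replicas)
print(applied)
```

The proposals at 30 s and 60 s are ignored, so a noisy forecast cannot flip the deployment up and down on every loop tick; the cost is slower reaction to a genuine change inside the window.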

References
  1. Kumar, B., Verma, A., Verma, P., & Bennour, A. (2025). Optimizing resource allocation in cloud-native applications through proactive autoscaling with the InformerAutoScale model. The Journal of Supercomputing, 81(9). 10.1007/s11227-025-07500-7
  2. Ding, Z., Feng, B., & Yu, W. (2025). FELT: Large-Scale Cloud Workload Prediction Through Adaptive Feature-Enhanced and Similarity-Aware Transformer. Tsinghua Science and Technology. 10.26599/tst.2025.9010102
  3. Ye, H., Chen, J., Jiang, F., He, X., Zhang, T., Chen, J., & Gao, X. (2025). Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services. Proceedings of the VLDB Endowment, 18(11), 3812–3825. 10.14778/3749646.3749656
  4. Arbat, S., Jayakumar, V. K., Lee, J., Wang, W., & Kim, I. K. (2022). Wasserstein Adversarial Transformer for Cloud Workload Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12433–12439. 10.1609/aaai.v36i11.21509
  5. G, C. N., O, B. C., R, J. K., Raj B N, P., Naik, P. N., & G, M. B. (2026). Transformer-Based Workload Prediction and Adaptive Auto-Scaling in Cloud Data Centers. 2026 IEEE International Conference for Convergence in Computing Technology (I3CTCON), 1–6. 10.1109/i3ctcon68242.2026.11507247
  6. Shrestha, R., & Tuz Sabiha, F. (2025). Enhancing Cloud Resource Utilization with Predictive Autoscaling Using Transformer Models. 2025 9th International Conference on Cloud and Big Data Computing (ICCBDC), 24–29. 10.1109/iccbdc67784.2025.00011
  7. Shim, S., Dhokariya, A., Doshi, D., Upadhye, S., Patwari, V., & Park, J.-Y. (2023). Predictive Auto-scaler for Kubernetes Cloud. 2023 IEEE International Systems Conference (SysCon), 1–8. 10.1109/syscon53073.2023.10131106
  8. Meng, F., Dai, H., Cong, G., Zhu, B., & Zhao, H. (2025). CATScaler: A Convolution-Augmented Transformer Scaling Framework for Cloud-Native Applications. IEEE Transactions on Services Computing, 18(5), 2659–2672. 10.1109/tsc.2025.3592383
  9. Shahbazinia, A., Huang, D., Costero, L., & Atienza, D. (2025). CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload. arXiv. 10.48550/ARXIV.2509.03394