DAF

A diffusion Autoformer for probabilistic cloud job-arrival forecasting with uncertainty-aware confidence bands

Kumar et al. (2025)


TL;DR

Cloud data centers want to add or remove servers before a traffic surge hits, not after. To do that, they need to forecast how many incoming jobs/requests will arrive in the near future (the job arrival rate, or JAR). This paper proposes the Diffusion Autoformer (DAF), a neural-network forecaster that does three things at once: (1) it splits the past workload signal into a smooth long-term trend plus a repeating seasonal pattern (the “Autoformer” idea), (2) it generates the future not as one single guessed line but as a range of plausible futures with a probability attached, using a diffusion model (the same family of generative AI that powers image generators), and (3) it mixes in context like time-of-day and day-of-week. The result is a forecast that is both more accurate (up to ~13% lower error than strong baselines) and uncertainty-aware (it tells you a confidence band, not just a point), while still being fast enough (~68 milliseconds per prediction) to feed a live autoscaler. The model only does the prediction part; a standard scaler (e.g. Kubernetes HPA) would act on its forecast.


The Problem (and why simple autoscaling isn’t enough)

Imagine an online store the night before a big sale. Traffic is calm now, but at midnight thousands of shoppers arrive at once. If the store only adds servers after it notices CPU is overloaded (this is reactive autoscaling), it is already too late — new servers take time to boot (the “cold start” / provisioning delay), so customers see slow pages or errors in the meantime. The opposite mistake — keeping tons of servers running “just in case” — wastes money.

The fix is proactive (predictive) autoscaling: forecast the demand spike and add servers ahead of time so they are warm and ready when the wave arrives. That makes the forecast the heart of the whole system. But forecasting real cloud workloads is hard, and older tools fall short on at least one axis.

DAF is built to hit four targets together: accuracy, uncertainty, context, and low latency.


Background

A few terms, defined once:


Contribution in Simple Terms

The genuinely new idea is fusing three previously separate techniques into one forecaster for cloud autoscaling:

  1. Autoformer-style trend/seasonal decomposition in the encoder — to cleanly capture long-term structure and cycles.

  2. A diffusion-based decoder — to turn forecasting into generation, so the model outputs a probabilistic forecast (a confidence band, not a single line). This is what gives operators uncertainty quantification.

  3. Exogenous attention — to condition the forecast on context (time-of-day, day-of-week).

Earlier work had pieces of this but not the combination: Autoformer decomposes but is deterministic; TimeGrad uses diffusion but without decomposition or external-feature conditioning; Temporal Fusion Transformer uses context but isn’t a diffusion model. DAF stitches them together and adds a practical speed trick (trend-guided initialization, explained below) so the diffusion process — normally slow — runs fast enough (~68 ms) for live autoscaling.

In one line: a transformer that forecasts cloud job arrivals as a probability band instead of a single guess, while staying fast enough to drive a real autoscaler.


How It Works, Step by Step

Training and inference, walked through:

  1. Decompose the input. Take the recent workload window X (length 96 time steps). Apply a moving-average filter (kernel size K) to extract the smooth Trend. Subtract it to get the Seasonal component (Seasonal = X − Trend). Trend models long-term structure; seasonal captures short-term cycles.

  2. Encode with the Autoformer encoder. Feed the seasonal component through multi-head self-attention (3 encoder layers). This produces a hidden embedding H_E summarizing the workload’s patterns. (The trend is kept aside for later, in step 5.)

  3. Encode the context. Project the exogenous features C (hour-of-day, day-of-week — encoded as sine/cosine waves so the model understands their cyclic nature) into an embedding.

  4. Fuse. Combine the workload embedding and the context embedding via cross-attention (E_C = Attention(C, H_E)), then concatenate into one conditioning vector Z_E = Concat(H_E, E_C). This Z_E is the “everything we know about the situation” summary that guides the generator.

  5. Generate the future with the diffusion decoder (2 decoder layers):

    • Training: take the real future Y and progressively add Gaussian noise over many steps (the forward diffusion process) until it is essentially pure noise. The model is trained to predict the noise that was added at each step, conditioned on Z_E. The loss is the denoising error (how well it predicts that noise) plus a forecast-reconstruction term (MSE between predicted and true future), with the reconstruction term weighted by λ.

    • Inference (the speed trick — Trend-Guided Initialization): instead of starting the denoising from pure random noise, start from the extrapolated future trend plus a little noise (y_N = Trend_future + noise). Because the starting point is already close to a sensible forecast, far fewer denoising steps are needed — the paper uses just 20 steps, which is what keeps latency low.

  6. Produce the probabilistic forecast. Run the reverse (denoising) process to generate the next 24 time steps. Because diffusion is stochastic, sampling it multiple times yields a distribution of futures — giving both a central prediction and a calibrated confidence band (its prediction intervals cover the truth >90% of the time).
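The input-side steps above (decomposition, cyclic context encoding, and trend-guided initialization) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the kernel size of 25, the linear extrapolation of the trend, the noise scale, and all function names are assumptions made for the example, since the summary does not specify them.

```python
import numpy as np

def decompose(x, kernel_size=25):
    """Split a 1-D series into a moving-average trend and a seasonal residual."""
    x = np.asarray(x, dtype=float)
    left = (kernel_size - 1) // 2
    right = kernel_size - 1 - left
    # Repeat the edge values so the moving average keeps the input length.
    padded = np.pad(x, (left, right), mode="edge")
    trend = np.convolve(padded, np.ones(kernel_size) / kernel_size, mode="valid")
    return trend, x - trend  # Seasonal = X - Trend

def cyclic_features(hours, weekdays):
    """Encode hour-of-day and day-of-week as sine/cosine pairs."""
    hour_angle = 2 * np.pi * np.asarray(hours) / 24
    day_angle = 2 * np.pi * np.asarray(weekdays) / 7
    return np.stack(
        [np.sin(hour_angle), np.cos(hour_angle), np.sin(day_angle), np.cos(day_angle)],
        axis=-1,
    )

def trend_guided_init(trend, horizon=24, fit_points=24, noise_scale=0.1, rng=None):
    """Build y_N = Trend_future + noise.

    Assumption: the future trend is a straight-line extrapolation of the last
    `fit_points` trend values; the paper's extrapolation rule may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(len(trend))
    slope, intercept = np.polyfit(t[-fit_points:], trend[-fit_points:], deg=1)
    t_future = np.arange(len(trend), len(trend) + horizon)
    trend_future = slope * t_future + intercept
    return trend_future + noise_scale * rng.standard_normal(horizon)

# Synthetic hourly workload: upward drift + daily cycle + noise (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(96)
x = 100 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 2, 96)

trend, seasonal = decompose(x)
context = cyclic_features(t % 24, (t // 24) % 7)
y_init = trend_guided_init(trend, rng=rng)
print(trend.shape, seasonal.shape, context.shape, y_init.shape)
# (96,) (96,) (96, 4) (24,)
```

The sine/cosine pairs place hour 23 next to hour 0, which a raw integer encoding would not. The denoising network itself is omitted here; `y_init` is only the starting point that the reverse process would refine.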

The whole training loop is given as Algorithm 1 in the paper: decompose → encode workload → attend over context → fuse → sample a diffusion timestep → noise the target → predict the noise → compute combined loss → update weights.
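To make the "noise the target, predict the noise, compute combined loss" part of that loop concrete, here is a hedged NumPy sketch of a single training step's arithmetic. It assumes a standard DDPM-style linear noise schedule with 1000 training steps, a closed-form forward process, and a reconstruction term obtained by inverting that forward process; none of these details are confirmed by the summary. The network is left out: `eps_hat` stands in for its output, and `lam=0.1` is a placeholder for λ.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed DDPM-style schedule; returns betas and cumulative products alpha_bar."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    return betas, alpha_bar

def forward_noise(y0, n, alpha_bar, rng):
    """Jump straight to diffusion step n: y_n = sqrt(a_bar) * y0 + sqrt(1 - a_bar) * eps."""
    eps = rng.standard_normal(y0.shape)
    y_n = np.sqrt(alpha_bar[n]) * y0 + np.sqrt(1.0 - alpha_bar[n]) * eps
    return y_n, eps

def combined_loss(y0, y_n, eps, eps_hat, n, alpha_bar, lam=0.1):
    """Denoising loss plus lam-weighted forecast-reconstruction MSE.

    Both terms are averaged over the horizon (a scaled version of the squared norm).
    Assumption: the predicted future is recovered by inverting the forward process.
    """
    denoise = np.mean((eps - eps_hat) ** 2)
    y0_hat = (y_n - np.sqrt(1.0 - alpha_bar[n]) * eps_hat) / np.sqrt(alpha_bar[n])
    recon = np.mean((y0 - y0_hat) ** 2)
    return denoise + lam * recon

rng = np.random.default_rng(0)
_, alpha_bar = linear_beta_schedule(1000)
y0 = rng.random(24)                   # a min-max normalized 24-step target
n = int(rng.integers(0, 1000))        # sample a diffusion timestep
y_n, eps = forward_noise(y0, n, alpha_bar, rng)

oracle = combined_loss(y0, y_n, eps, eps, n, alpha_bar)                  # perfect noise prediction
blind = combined_loss(y0, y_n, eps, np.zeros_like(eps), n, alpha_bar)    # predicts no noise at all
print(oracle < blind)
```

A perfect noise predictor drives both terms to zero, which is why the two objectives pull in the same direction. In a real training loop, `eps_hat` would come from the decoder conditioned on Z_E, and the loss would be backpropagated by a deep-learning framework rather than computed in NumPy.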


Inputs (what it consumes)


Outputs (what it produces)


How It Fits the Autoscaling Framework (MAPE-K)

DAF lives squarely in the ANALYZE stage of the MAPE-K loop (Monitor, Analyze, Plan, Execute over a shared Knowledge base): it is the forecasting “brain” that turns monitored metrics into a look-ahead prediction, which makes scaling proactive rather than reactive.

In short: DAF is the predictive engine that makes an otherwise-reactive autoscaler proactive — and adds a confidence band so the resulting scaling policy can be tuned for safety vs. cost. Closed-loop control (using reinforcement learning to actually act on the forecast) is listed as future work.
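As an illustration of how a confidence band could feed a scaling policy, the sketch below turns a set of sampled forecasts into quantiles and then into replica counts. It is hypothetical throughout: the paper does not prescribe a policy, the 5%/95% quantiles and the per-replica capacity of 40 jobs per step are made-up values, and the function names are introduced only for this example.

```python
import math
import numpy as np

def forecast_band(samples, lower_q=0.05, upper_q=0.95):
    """Summarize sampled futures (shape: num_samples x horizon) as lower/median/upper."""
    samples = np.asarray(samples, dtype=float)
    lower = np.quantile(samples, lower_q, axis=0)
    median = np.median(samples, axis=0)
    upper = np.quantile(samples, upper_q, axis=0)
    return lower, median, upper

def replicas_needed(arrival_rate, per_replica_capacity, min_replicas=1):
    """Smallest replica count whose total capacity covers the given arrival rate."""
    return max(min_replicas, math.ceil(arrival_rate / per_replica_capacity))

# Three sampled futures over a 2-step horizon (toy numbers, not real forecasts).
samples = [[100, 200], [120, 220], [140, 240]]
lower, median, upper = forecast_band(samples)

capacity = 40  # hypothetical jobs per step that one replica can serve
cost_lean = [replicas_needed(r, capacity) for r in median]
safety_first = [replicas_needed(r, capacity) for r in upper]
print(cost_lean, safety_first)
# [3, 6] [4, 6]
```

Provisioning for the upper quantile buys headroom at the first step (4 replicas instead of 3) at extra cost, while provisioning for the median is cheaper but risks under-provisioning. Choosing the quantile is the safety-versus-cost knob the band makes available.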


Evaluation (datasets & metrics, briefly)


Training & pre-training

Trained from scratch — no pretrained or foundation model.

DAF is initialized randomly and trained end-to-end on the cloud job-arrival traces themselves (Google, Azure, Alibaba, Facebook, Wikipedia). It is not adapted from any pretrained or foundation time-series model. The supervised loop (Algorithm 1, “Training the Diffusion Autoformer”) optimizes a combined objective: the diffusion denoising loss ||ε − ε_θ||² plus a λ-weighted forecast-reconstruction MSE. Optimization uses AdamW (lr 1e-4), batch 64, mixed precision, and early stopping on a validation split; sequences are min-max normalized with lookback 96, horizon 24, and 20 denoising steps, and every experiment is repeated 3 times with different seeds.
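The data preparation implied by those settings (min-max normalization, lookback 96, horizon 24) can be sketched as follows. This is an assumed preprocessing routine written for illustration, with hypothetical function names; the summary does not say how the paper computes the normalization statistics or strides its windows.

```python
import numpy as np

def min_max_normalize(series):
    """Scale a series to [0, 1]; returns the scaled series and (min, range) for inversion."""
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(), series.max()
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero on a flat series
    return (series - lo) / scale, (lo, scale)

def make_windows(series, lookback=96, horizon=24):
    """Slide over the series with stride 1, pairing each lookback window with its future."""
    n = len(series) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series is shorter than lookback + horizon")
    X = np.stack([series[i : i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback : i + lookback + horizon] for i in range(n)])
    return X, Y

# Toy trace of 200 time steps (illustrative values only).
raw = np.arange(200, dtype=float) * 3.0 + 50.0
scaled, (lo, scale) = min_max_normalize(raw)
X, Y = make_windows(scaled)
print(X.shape, Y.shape)
# (81, 96) (81, 24)
```

In practice the minimum and range would normally be computed on the training split only and reused for validation and test data, so that no information from later periods leaks into training. A forecast in normalized units is mapped back with `y * scale + lo`.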

One clarification to avoid a false friend: the paper calls DAF a “hybrid forecasting model”, but “hybrid” refers only to combining decomposition + diffusion + exogenous attention — it does not mean mixing pretrained and from-scratch components. There is no pretraining, fine-tuning, foundation model, or zero-shot transfer anywhere in the pipeline.


Strengths


Limitations


Glossary

References
  1. Kumar, S., Chauhan, M. K., Priyadarshni, Tripathi, S., Misra, R., & Singh, T. N. (2025). Predicting Cloud Workload Job Arrival Rates Using a Diffusion Autoformer Model. 2025 IEEE International Conference on Big Data (BigData), 6102–6107. doi:10.1109/bigdata66926.2025.11401693