defaultOptions) and in any of the location-specific options.
autoscaling.metric.
disabled
concurrency
(requests * requestDuration)/(timePeriod * replicas).
Example: (1000 * .05)/(1 * 5) = 10.
rps
cpu
latency
memory
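As a rough illustration of the concurrency formula above, the following Python sketch computes the average number of concurrent requests per replica. The function name, the parameter names, and the choice of seconds as the time unit are assumptions made for this example; they are not part of the platform's API.

```python
def concurrency_per_replica(requests: float, request_duration: float,
                            time_period: float, replicas: int) -> float:
    """Evaluate (requests * requestDuration) / (timePeriod * replicas)."""
    if time_period <= 0 or replicas <= 0:
        raise ValueError("time_period and replicas must be positive")
    return (requests * request_duration) / (time_period * replicas)


# Values taken from the worked example: 1000 requests, a duration of .05,
# a time period of 1, and 5 replicas.
print(concurrency_per_replica(1000, 0.05, 1, 5))
```

The guard against a zero time period or replica count is a choice of this sketch, added only to avoid a division by zero.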
latency scaling strategy or any of the Multi Metric scaling strategies.
concurrency scaling strategy
rps or concurrency scaling strategies.
latency scaling strategy
Because request latency is represented as a distribution, when using the latency scaling strategy, you must choose a metric percentile by setting the autoscaling.metricPercentile property to one of the following values:
p50
p75
p99
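A minimal sketch of the percentile rule described above, assuming the autoscaling options are held in a plain dictionary keyed by the property names used in this document (metric and metricPercentile). The helper check_latency_percentile is hypothetical and is not part of any platform SDK; treating a missing percentile as an error is a choice of this sketch.

```python
ALLOWED_PERCENTILES = {"p50", "p75", "p99"}


def check_latency_percentile(autoscaling: dict) -> None:
    """Raise ValueError if a latency strategy lacks a valid percentile."""
    if autoscaling.get("metric") != "latency":
        return  # the percentile only applies to the latency strategy

    percentile = autoscaling.get("metricPercentile")
    if percentile not in ALLOWED_PERCENTILES:
        allowed = ", ".join(sorted(ALLOWED_PERCENTILES))
        raise ValueError(
            f"metricPercentile must be one of {allowed}; got {percentile!r}"
        )


# Hypothetical options for illustration only.
check_latency_percentile({"metric": "latency", "metricPercentile": "p99"})
check_latency_percentile({"metric": "cpu"})
```

Both calls return without raising; a value such as p90 with the latency metric would raise ValueError.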
autoscaling.minScale
Maximum Scale inclusive.
autoscaling.maxScale
autoscaling.scaleToZeroDelay
autoscaling.maxConcurrency
autoscaling.metric
concurrency: Uses the number of concurrent requests for the target.
cpu: Uses % processor time for the target.
memory: Uses memory in Mi for the target.
rps: Uses requests per second for the target.
latency: Uses the average request response time for the target. Not available for Serverless workloads.
autoscaling.multi
autoscaling.metricPercentile
p50.