Scale-to-Zero
Scale-to-Zero automatically scales your Services running on Standard CPU and GPU Instances down to zero when there is no incoming traffic. This lets you optimize costs by paying only for the compute you actually use.
To enable Scale-to-Zero on your Services, you need to use a Standard CPU or GPU Instance and set the minimum number of Instances to zero.
When your Service remains idle for a given period of time without receiving requests, it automatically scales its active Instances down to zero and your Deployment status changes to Sleeping.
As soon as a new request is received, the Service wakes up and scales up to one or more Instances, depending on your autoscaling criteria.
How Scale-to-Zero works
Your Service is scaled down to zero if all of the following conditions are met for a period of time called the idle period:
- No traffic is received from the Internet.
- No connection (e.g. a WebSocket or HTTP/2 stream) is held open from the Internet to your Service.
- No new deployment has occurred.
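
The three conditions above can be read as a single boolean check. The following is a minimal sketch of that logic, not Koyeb's implementation: the `ServiceActivity` type and its field names are hypothetical, and the 300-second default mirrors the 5-minute default idle period described in this page.

```python
from dataclasses import dataclass


@dataclass
class ServiceActivity:
    """Hypothetical activity snapshot for a Service (not a Koyeb API type)."""
    seconds_since_last_request: float      # time since traffic last arrived
    open_connections: int                  # held WebSocket / HTTP/2 streams
    seconds_since_last_deployment: float   # time since the latest deployment


def should_scale_to_zero(activity: ServiceActivity,
                         idle_period_s: float = 300.0) -> bool:
    """Return True only if every idle condition held for the whole idle period."""
    return (
        activity.seconds_since_last_request >= idle_period_s
        and activity.open_connections == 0
        and activity.seconds_since_last_deployment >= idle_period_s
    )
```

For example, a Service that last received a request 10 minutes ago still counts as active under this sketch if a single WebSocket connection remains open.
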
Idle period
The default idle period for Standard CPU and GPU Instances is 5 minutes.
If your Organization is on the Pro, Scale, or Enterprise plan, you can override the default idle period in your Service configuration, up to the following limits:
- For organizations on the Pro plan: up to 6 hours
- For organizations on the Scale or Enterprise plan: up to 12 hours
This can be useful for low-traffic applications with slow start times, such as some machine learning workloads.
When to use Scale-to-Zero
Scale-to-Zero is ideal for a wide range of use cases that involve handling intermittent traffic, like:
- Inference Efficiency: Inference is compute intensive. You need high-performance GPUs to answer requests quickly, but you might only need them for a couple of minutes every few hours. Scale-to-Zero reduces costs and improves efficiency for inference tasks with intermittent traffic, without infrastructure management.
- Dedicated Services for Multi-Tenant SaaS and Platforms: Scale-to-Zero allows you to deploy dedicated and isolated services per tenant with controlled performance and costs. Operate fleets with thousands of services, paying only for real usage.
- Infinite Development Environments: Software engineering teams need environments identical to production to run integration tests. Creating dozens of services to replicate your production is now cost-effective thanks to Scale-to-Zero and our automation tools (API, CLI, Terraform, Pulumi). Every developer in your team can have a full replica of the production setup, billed per second of usage.
- Compute Efficiency: For apps with high CPU demands but intermittent traffic, Scale-to-Zero automatically optimizes your infrastructure and costs.
- Global Deployments: Multi-region deployment can quickly become expensive. With Scale-to-Zero you can deploy globally without incurring a base fee for each additional region that you add.
Light Sleep and Deep Sleep for CPU
When using Scale-to-Zero on the Starter, Pro, Scale, or Enterprise plan, your CPU Instances can be configured with two states: Light Sleep and Deep Sleep.
- The Deep Sleep state occurs by default when your Instances scale down to zero. This state leads to a cold start time of 1-5 seconds when spinning up the Service.
- The Light Sleep state utilizes snapshotting to enable Instances to spin up significantly faster after scaling to zero, allowing for a 200 ms start time.
You can enable and configure Light Sleep in your Service configuration through a configuration file or the Koyeb control panel.
For Pro, Scale, or Enterprise plans, you can set the time periods after which your Instances enter Light Sleep and Deep Sleep. When Scale-to-Zero is enabled on a Service, the Idle period sets when Instances enter Light Sleep, and the setting below it sets when Instances enter Deep Sleep.
By configuring Light Sleep and Deep Sleep times, you can maximize the advantages of Scale-to-Zero: reduce costs by scaling down when machines are not in use, and spin up quickly as needed.
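
To illustrate how the two periods interact, here is a small sketch that maps idle time to a sleep state. The function name, the state labels, and the boundary handling are assumptions made for illustration; the defaults (300 s and 3900 s) are the Starter plan values from this page.

```python
def sleep_state(idle_s: float,
                light_sleep_after_s: float = 300,
                deep_sleep_after_s: float = 3900) -> str:
    """Return the state a CPU Instance would be in after idle_s seconds idle.

    Hypothetical helper: assumes Deep Sleep never starts before Light Sleep.
    """
    if deep_sleep_after_s < light_sleep_after_s:
        raise ValueError("Deep Sleep must not start before Light Sleep")
    if idle_s < light_sleep_after_s:
        return "awake"
    if idle_s < deep_sleep_after_s:
        return "light_sleep"   # snapshot-based wake-up, around 200 ms
    return "deep_sleep"        # cold start, typically 1 to 5 seconds
```

With both periods set to the same value, this sketch skips Light Sleep entirely and moves straight to Deep Sleep.
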
The following table lists the minimum and maximum values for configuring Light Sleep and Deep Sleep start times:
Plan | Light Sleep Min | Light Sleep Max | Deep Sleep Min | Deep Sleep Max |
---|---|---|---|---|
Starter | 300s / 5min | 300s / 5min | 300s / 5min | 3900s / 1hr 5min |
Pro | 300s / 5min | 10800s / 3hr | 300s / 5min | 21600s / 6hr |
Scale | 300s / 5min | 21600s / 6hr | 300s / 5min | 43200s / 12hr |
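
Before applying a configuration, it can help to check values against these limits locally. The sketch below is illustrative only: `SleepBounds`, `PLAN_BOUNDS`, and `validate_sleep_config` are hypothetical names, not part of any Koyeb tool, and the Pro Light Sleep maximum is taken as 3 hours. The rule that Deep Sleep should not start before Light Sleep is an assumption.

```python
from typing import List, NamedTuple


class SleepBounds(NamedTuple):
    light_min: int
    light_max: int
    deep_min: int
    deep_max: int


# Limits in seconds, transcribed from the table above.
PLAN_BOUNDS = {
    "starter": SleepBounds(300, 300, 300, 3900),
    "pro": SleepBounds(300, 3 * 3600, 300, 6 * 3600),
    "scale": SleepBounds(300, 6 * 3600, 300, 12 * 3600),
}


def validate_sleep_config(plan: str, light_sleep_s: int,
                          deep_sleep_s: int) -> List[str]:
    """Return a list of problems; an empty list means the values fit the plan."""
    bounds = PLAN_BOUNDS.get(plan.lower())
    if bounds is None:
        raise ValueError(f"no limits known for plan {plan!r}")
    problems = []
    if not bounds.light_min <= light_sleep_s <= bounds.light_max:
        problems.append(
            f"Light Sleep must be {bounds.light_min}-{bounds.light_max}s")
    if not bounds.deep_min <= deep_sleep_s <= bounds.deep_max:
        problems.append(
            f"Deep Sleep must be {bounds.deep_min}-{bounds.deep_max}s")
    if deep_sleep_s < light_sleep_s:
        # Assumption: Deep Sleep is expected to follow Light Sleep.
        problems.append("Deep Sleep should not start before Light Sleep")
    return problems
```

For instance, `validate_sleep_config("starter", 600, 3600)` reports one problem, because the Starter plan fixes Light Sleep at 300 seconds.
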
Limitations
- Inbound requests to a Service in Deep Sleep may be slower because a cold start must create a new dedicated virtual machine, which typically takes 1 to 5 seconds.
- Scale-to-Zero works only for Services exposed to the Internet.
- HTTP/2 requests cannot be used to wake up a sleeping Service.
- You can wake a Service up using a WebSocket connection, but that connection may only live for a few minutes.
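
Given these limitations, a client that may hit a sleeping Service can send a plain HTTP/1.1 request with a timeout that leaves room for a cold start. The following is a minimal sketch under those assumptions: `wake_service` and its parameters are hypothetical, the retry policy is illustrative rather than a documented requirement, and Python's `urllib.request` is used because it speaks HTTP/1.1.

```python
import time
import urllib.request


def _http11_get(url: str, timeout_s: float) -> int:
    """Send a GET over HTTP/1.1 and return the response status code."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.status


def wake_service(url: str, fetch=_http11_get, timeout_s: float = 10.0,
                 attempts: int = 3, backoff_s: float = 1.0) -> int:
    """Request url, retrying on network errors while the Service wakes up."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url, timeout_s)
        except OSError as exc:  # covers URLError, timeouts, refused connections
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
    raise RuntimeError(
        f"no response from {url} after {attempts} attempts") from last_error
```

The `fetch` parameter exists so the retry logic can be exercised without network access, for example by passing a stub that fails once and then returns `200`.
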