Scale-to-Zero

Scale-to-Zero automatically scales Services running on Standard CPU and GPU Instances down to zero when there is no incoming traffic. This lets you optimize costs by paying only for the compute you actually use.

⚠️
Note: Scale-to-Zero is currently in public preview.

To enable Scale-to-Zero on your Services, you need to use a Standard CPU or GPU Instance and set the minimum number of Instances to zero.

When your Service remains idle for a given period of time without receiving requests, it will automatically scale down your active Instances to zero and update your Deployment to the Sleeping status.

As soon as a new request is received, the Service wakes up and scales up to one or more Instances, depending on your autoscaling criteria.

How Scale-to-Zero works

Your Service will be scaled down to zero if all of the following conditions are met for a given period of time, called the idle period:

  • No traffic is received from the Internet.
  • No connection (e.g. a WebSocket or HTTP/2 stream) is held open from the Internet to your Service.
  • No new deployment has occurred.
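The three conditions combine with a logical AND over the idle period. The following is a minimal sketch of that rule, not Koyeb's implementation: the `ServiceActivity` type and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceActivity:
    """Hypothetical activity snapshot; not a Koyeb API type."""
    seconds_since_last_request: float
    # 0 means a WebSocket or HTTP/2 stream is still held open right now.
    seconds_since_last_held_connection: float
    seconds_since_last_deployment: float


def is_idle(activity: ServiceActivity, idle_period_s: float = 300) -> bool:
    """Return True only if every condition has held for the whole idle period."""
    quietest = min(
        activity.seconds_since_last_request,
        activity.seconds_since_last_held_connection,
        activity.seconds_since_last_deployment,
    )
    return quietest >= idle_period_s
```

Any single recent event (a request, an open stream, or a deployment) is enough to keep the Service awake in this model.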

Idle period

The default idle period for Standard CPU and GPU Instances is 5 minutes.

If your Organization is on the Pro, Scale, or Enterprise plan, you can override the default idle period based on your needs in your Service configuration according to the following values:

  • For organizations on the Pro plan: up to 6 hours
  • For organizations on the Scale or Enterprise plan: up to 12 hours

This can be useful for low-traffic applications with slow start times, such as some machine learning workloads.

The Koyeb Free Instance automatically scales down to zero when it doesn’t receive any traffic for 1 hour. Scale-to-Zero cannot be disabled on this Instance, and the idle period cannot be customized.

When to use Scale-to-Zero

Scale-to-Zero is ideal for a wide range of use cases that involve handling intermittent traffic, like:

  • Inference Efficiency: Inference is compute intensive. You need high-performance GPUs to answer requests quickly, but you might only need them for a couple of minutes every few hours. Scale-to-Zero dramatically improves costs and efficiency for inference tasks with intermittent traffic, without infrastructure management.
  • Dedicated Services for Multi-Tenant SaaS and Platforms: Scale-to-Zero allows you to deploy dedicated and isolated services per tenant with controlled performance and costs. Operate fleets with thousands of services, paying only for real usage.
  • Infinite Development Environments: Software engineering teams need environments identical to production to run integration tests. Creating dozens of services to replicate your production is now cost-effective thanks to Scale-to-Zero and our automation tools (API, CLI, Terraform, Pulumi). Every developer in your team can have a full replica of the production setup, billed per second of usage.
  • Compute Efficiency: For apps with high CPU demands but intermittent traffic, Scale-to-Zero automatically optimizes your infrastructure and costs.
  • Global Deployments: Multi-region deployment can quickly become expensive. With Scale-to-Zero you can deploy globally without incurring a base fee for each additional region that you add.
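To get a feel for the savings with intermittent traffic, the arithmetic can be sketched as below. The hourly price and the traffic pattern are hypothetical placeholders, not Koyeb prices, and the sketch assumes the time spent waiting out the idle period counts as running time.

```python
def monthly_cost(price_per_hour: float, active_seconds_per_day: float, days: int = 30) -> float:
    """Cost of per-second billing for the given daily running time."""
    return price_per_hour * active_seconds_per_day * days / 3600


# Hypothetical workload: 8 bursts per day, each 10 minutes of work
# followed by a 5-minute idle period before scaling to zero.
HYPOTHETICAL_PRICE_PER_HOUR = 2.0
active_seconds = 8 * (10 * 60 + 5 * 60)

with_scale_to_zero = monthly_cost(HYPOTHETICAL_PRICE_PER_HOUR, active_seconds)
always_on = monthly_cost(HYPOTHETICAL_PRICE_PER_HOUR, 24 * 3600)
print(f"scale-to-zero: {with_scale_to_zero:.2f}, always on: {always_on:.2f}")
```

Substitute the real price of your Instance type and your own traffic pattern to estimate actual costs.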

Light Sleep and Deep Sleep for CPU

When using Scale-to-Zero on the Starter, Pro, Scale, or Enterprise plan, your CPU instances can be configured with two states: Light Sleep and Deep Sleep.

⚠️
Note: The Light Sleep feature is currently in public preview. During public preview, this feature is available at no cost. Once it is Generally Available, there will be a cost associated, and this pricing will be available on the pricing page.

  • The Deep Sleep state occurs by default when your Instances scale down to zero. This state leads to a cold start time of 1-5 seconds when spinning up the Service.
  • The Light Sleep state utilizes snapshotting to enable Instances to spin up significantly faster after scaling to zero, allowing for a 200 ms start time.

You can enable and configure Light Sleep in your Service configuration through a configuration file or the Koyeb control panel.

For Pro, Scale, or Enterprise plans, you can set the time periods after which your Instances enter Light Sleep and Deep Sleep. When Scale-to-Zero is enabled on a Service, the Idle period setting determines when Instances enter Light Sleep, and the setting below it determines when Instances enter Deep Sleep.

By configuring Light Sleep and Deep Sleep times, you can maximize the advantages of Scale-to-Zero: reduce costs by scaling down when machines are not in use, and spin up quickly as needed.

The following table outlines the minimum and maximum values for configuring Light Sleep and Deep Sleep start times:

Plan    | Light Sleep Min | Light Sleep Max | Deep Sleep Min | Deep Sleep Max
Starter | 300s / 5min     | 300s / 5min     | 300s / 5min    | 3900s / 1hr 5min
Pro     | 300s / 5min     | 10800s / 3hr    | 300s / 5min    | 21600s / 6hr
Scale   | 300s / 5min     | 21600s / 6hr    | 300s / 5min    | 43200s / 12hr
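Before applying a configuration, you may want to check your chosen values against these limits. The helper below is an illustrative sketch only: `SLEEP_LIMITS` and `validate_sleep_config` are hypothetical names, not part of any Koyeb tool, and the Pro Light Sleep maximum is taken as 3 hours (10800 seconds).

```python
# (min, max) in seconds per plan, transcribed from the table above.
# Assumption: the Pro Light Sleep maximum is 3 hr, i.e. 10800 s.
SLEEP_LIMITS = {
    "starter": {"light": (300, 300), "deep": (300, 3900)},
    "pro": {"light": (300, 10800), "deep": (300, 21600)},
    "scale": {"light": (300, 21600), "deep": (300, 43200)},
}


def validate_sleep_config(plan: str, light_sleep_s: int, deep_sleep_s: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    limits = SLEEP_LIMITS.get(plan.lower())
    if limits is None:
        return [f"no limits listed for plan {plan!r}"]
    errors = []
    for name, value in (("light", light_sleep_s), ("deep", deep_sleep_s)):
        low, high = limits[name]
        if not low <= value <= high:
            errors.append(f"{name} sleep must be between {low}s and {high}s, got {value}s")
    return errors
```

For example, `validate_sleep_config("Pro", 600, 3600)` returns an empty list, while a 600-second Light Sleep value on the Starter plan is reported as out of range.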

Limitations

  • Inbound requests to a Service in Deep Sleep may be slower due to a cold start: creating a new dedicated virtual machine typically takes 1 to 5 seconds.
  • Scale-to-Zero works only for Services exposed to the Internet.
  • HTTP/2 requests cannot be used to wake up a sleeping Service.
  • You can wake a Service up using a WebSocket connection, but that connection may only live for a few minutes.
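Clients that call a sleeping Service can absorb the cold start with a generous timeout and a few retries. The following is a minimal sketch using Python's standard `urllib`, which sends HTTP/1.1 requests. The function name, the retry counts, and the delays are illustrative assumptions, not recommended values.

```python
import time
import urllib.error
import urllib.request


def fetch_with_wakeup(url, attempts=3, timeout_s=10.0, backoff_s=1.0,
                      opener=urllib.request.urlopen, sleep=time.sleep) -> bytes:
    """GET a URL, retrying with exponential backoff while the Service wakes up."""
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout_s) as response:
                return response.read()
        except urllib.error.HTTPError:
            # The Service answered with an HTTP error, so it is awake: do not retry.
            raise
        except OSError as exc:
            # Covers connection failures (URLError) and timeouts.
            last_error = exc
            if attempt < attempts - 1:
                sleep(backoff_s * 2 ** attempt)
    raise RuntimeError(f"no response after {attempts} attempts") from last_error
```

The `opener` and `sleep` parameters exist only so the retry logic can be exercised without a network; in normal use you would call `fetch_with_wakeup("https://<your-service-url>/")` and leave them at their defaults.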