App Config

Compute

Pick the GPU type for your replica containers.

Container

Replica config for container image, entrypoint, disk size, and more.

Scaling

Scaling config for replicas: control the minimum, the maximum, and how fast to scale.

Environment

Environment variables to expose to your replica container.

App Config Details

Let’s get started. Follow these steps to create the app.

Compute

Pick the best GPU option depending on your application.

Container

App Name

string

required

Pick any name

Container Image

string

required

Define the container image. It can be hosted on Docker Hub, GCR, or a private registry. Example: 8scale/hello-world:latest

Registry Auth

string

Use the UI to create registry auth if necessary. We encrypt auth secrets so they are stored securely.

Entrypoint

string

Define the container entrypoint to run a different service than the image default.

Container Disk

number

default:"1"

required

Temporary disk, in GiB, given to a replica container while it is active. It is removed when the replica is not active.

Cache Volume

number

default:"0"

Persistent disk, in GiB, given to a replica container. It persists even when the replica is not active, so it can be used to cache artifacts. Multiple replicas on the same server share the same cache volume.

Cache Volume Path

string

The persistent disk is mounted in the replica container at this path. If the path already exists in the container image, the volume and its data take its place.
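As a rough illustration of how a replica might use the cache volume, the sketch below looks for an artifact under the mounted path before fetching it again. The function name `get_artifact`, the `fetch` callback, and the example mount path are assumptions made for this example; they are not part of the platform.

```python
import uuid
from pathlib import Path
from typing import Callable


def get_artifact(cache_dir: str, name: str, fetch: Callable[[], bytes]) -> bytes:
    """Return an artifact from the cache volume, calling fetch() only on a miss.

    cache_dir is whatever you set as Cache Volume Path (hypothetical: "/cache").
    """
    path = Path(cache_dir) / name
    if path.exists():
        # Cache hit: a previous replica run already stored this artifact.
        return path.read_bytes()

    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a uniquely named temp file and rename it into place, so other
    # replicas sharing the volume do not read a half-written file.
    tmp = path.with_name(f"{path.name}.{uuid.uuid4().hex}.tmp")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data
```

With Min Replica(s) set to 0, a pattern like this can shorten later cold starts, because large downloads such as model weights only happen the first time a server sees the artifact.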

Scaling

Min Replica(s)

number

default:"0"

required

Minimum replicas to keep active at all times. If set to 0, cost will be lower, but the first request may incur a higher cold start time while replicas scale up to become active.

Max Replica(s)

number

default:"1"

required

Maximum replicas the app can scale to during traffic spikes. You can also use this setting to control max spend per app.

Requests Per Replica

number

default:"1"

required

Number of requests a single replica can handle. If set to 1, every request requires its own replica and causes the app to scale up. If set to 20, the first replica can handle up to 20 requests before a scaling event starts another replica to handle more traffic.
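To make the arithmetic concrete, here is a minimal sketch of how the three settings above could combine into a replica count. It is one reading of the descriptions, not the platform's actual autoscaling algorithm; the function name `desired_replicas` and the `in_flight` parameter are assumptions for this example.

```python
import math


def desired_replicas(in_flight: int, requests_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 1) -> int:
    """Estimate how many replicas are needed for the current in-flight requests."""
    if requests_per_replica < 1:
        raise ValueError("requests_per_replica must be at least 1")
    # One replica per full or partial batch of requests_per_replica requests.
    needed = math.ceil(in_flight / requests_per_replica)
    # Never go below Min Replica(s) or above Max Replica(s).
    return max(min_replicas, min(needed, max_replicas))


print(desired_replicas(20, 20, max_replicas=5))   # 1: the first replica absorbs 20 requests
print(desired_replicas(21, 20, max_replicas=5))   # 2: the 21st request triggers a scale-up
print(desired_replicas(100, 1, max_replicas=5))   # 5: capped by Max Replica(s)
```

Under this reading, requests beyond what Max Replica(s) can serve have to wait, which is the trade-off for capping spend.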

Scale Down Delay

number

default:"60"

required

When a running replica gets no traffic, it waits this many seconds before scaling down and changing to the idle state. This helps curb premature scale-down and scale-up events, which cause more cold starts. Keeping app replicas active for longer can give your users a better experience.
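The effect of the delay can be sketched with a small simulation. It assumes a single replica, Min Replica(s) set to 0, and requests that finish instantly; the function name `count_cold_starts` and the timestamps are made up for illustration.

```python
from typing import Iterable


def count_cold_starts(request_times: Iterable[float],
                      scale_down_delay: float = 60.0) -> int:
    """Count requests that arrive after the replica has already scaled down.

    request_times are arrival times in seconds. The replica is assumed to go
    idle once scale_down_delay seconds pass without traffic.
    """
    cold_starts = 0
    last_request = None
    for t in sorted(request_times):
        # The first request, or one arriving after the delay has elapsed,
        # finds no active replica and pays a cold start.
        if last_request is None or t - last_request >= scale_down_delay:
            cold_starts += 1
        last_request = t
    return cold_starts


arrivals = [0, 30, 100, 250]
print(count_cold_starts(arrivals, scale_down_delay=60))   # 3
print(count_cold_starts(arrivals, scale_down_delay=120))  # 2
```

In this toy model a longer delay removes one cold start, at the cost of keeping the replica active (and billed) through the quiet gap.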

Environment

Define any environment variables your replica container may require.
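Inside the container, these values are read like any other environment variables. The sketch below shows one way to do that in Python; the variable names `MODEL_NAME` and `PORT` and their defaults are hypothetical and stand in for whatever your app defines.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read app settings from environment variables, with fallbacks."""
    env = os.environ if env is None else env
    return {
        # Hypothetical variable names; replace with the ones you define.
        "model_name": env.get("MODEL_NAME", "default-model"),
        # Environment values are always strings, so convert explicitly.
        "port": int(env.get("PORT", "8000")),
    }


settings = load_settings()
print(settings["model_name"], settings["port"])
```

Passing a plain dict as `env` makes the function easy to exercise outside the container.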

Need more help?

Please reach out to us on Discord.
