
Compute

Pick the GPU type for your replica containers.

Container

Replica config for container image, entrypoint, disk size, and more.

Scaling

Scaling config for replicas. You can control the minimum, the maximum, and how fast to scale.

Environment

Environment variables to expose to your replica container.

App Config Details

Let’s get started. Follow these steps to create the app.
1

Compute

Pick the GPU option that best fits your application.
2

Container

App Name
string
required
Pick any name
Container Image
string
required
The container image to run. It can be hosted on Docker Hub, GCR, or any other private repository. Example: 8scale/hello-world:latest
Registry Auth
string
Use the UI to create registry auth if necessary. Auth secrets are encrypted before they are stored.
Entrypoint
string
Define a container entrypoint to run a different service than the image default.
Container Disk
number
default:"1"
required
Temporary disk, in GiB, given to a replica container while it is active. It is removed when the replica is no longer active.
Cache Volume
number
default:"0"
Persistent disk, in GiB, given to a replica container. It persists even when the replica is not active, so it can be used to cache artifacts. Multiple replicas on the same server share the same cache volume.
Cache Volume Path
string
The persistent disk is mounted in the replica container at this path. If the path already exists in the container image, it is overwritten by this volume and its data.
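To make the Container fields above easier to reason about, here is a minimal Python sketch that mirrors them and checks a few obvious constraints before you fill in the form. The class name, attribute names, and the validation rules are assumptions made for illustration; they are not part of the platform's API. Only the defaults (1 GiB container disk, 0 GiB cache volume) come from the field descriptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerConfig:
    """Hypothetical mirror of the Container fields (not a platform API)."""

    app_name: str                             # required
    container_image: str                      # required, e.g. "8scale/hello-world:latest"
    registry_auth: Optional[str] = None       # optional
    entrypoint: Optional[str] = None          # optional
    container_disk_gib: int = 1               # documented default: 1
    cache_volume_gib: int = 0                 # documented default: 0
    cache_volume_path: Optional[str] = None   # optional

    def problems(self) -> List[str]:
        """Return a list of issues; an empty list means nothing obvious is wrong."""
        issues = []
        if not self.app_name:
            issues.append("app_name is required")
        if not self.container_image:
            issues.append("container_image is required")
        # Assumption: a replica needs a positive amount of temporary disk.
        if self.container_disk_gib <= 0:
            issues.append("container_disk_gib must be positive")
        if self.cache_volume_gib < 0:
            issues.append("cache_volume_gib cannot be negative")
        # Assumption: a cache volume is only useful with a mount path.
        if self.cache_volume_gib > 0 and not self.cache_volume_path:
            issues.append("cache_volume_path is needed when cache_volume_gib > 0")
        return issues


if __name__ == "__main__":
    cfg = ContainerConfig(app_name="demo", container_image="8scale/hello-world:latest")
    print(cfg.problems())
```

Because the cache volume replaces whatever the image has at the mount path, it is safer to pick a path that the image does not already use for files it needs.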
3

Scaling

Min Replica(s)
number
default:"0"
required
Minimum replicas to keep active at all times. If set to 0, cost will be lower, but the first request may incur a longer cold start while replicas scale up and become active.
Max Replica(s)
number
default:"1"
required
Maximum replicas the app can scale to during traffic spikes. You can also use this setting to control the maximum spend per app.
Requests Per Replica
number
default:"1"
required
Number of requests a single replica can handle. If set to 1, every request requires its own replica and causes the app to scale up. If set to 20, the first replica can handle up to 20 requests before a scaling event starts another replica to handle more traffic.
Scale Down Delay
number
default:"60"
required
When a running replica gets no traffic, it waits this many seconds before scaling down and changing to the idle state. This can help curb premature scale-down and scale-up events that lead to more cold starts. Keeping app replicas active for longer can provide a better experience for your users.
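The way the four scaling settings interact can be sketched in a few lines of Python. This is a simplified model, not the platform's actual autoscaler: it assumes "requests" means requests in flight at the same time, that the target is the ceiling of requests divided by Requests Per Replica, and that the result is clamped between Min and Max Replica(s). The function and parameter names are hypothetical.

```python
import math


def desired_replicas(in_flight: int,
                     requests_per_replica: int = 1,
                     min_replicas: int = 0,
                     max_replicas: int = 1) -> int:
    """Estimate how many replicas the settings would call for (simplified model)."""
    if requests_per_replica < 1:
        raise ValueError("requests_per_replica must be at least 1")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas cannot exceed max_replicas")
    # Each replica absorbs up to requests_per_replica requests.
    needed = math.ceil(in_flight / requests_per_replica)
    # Never go below the minimum or above the maximum.
    return max(min_replicas, min(needed, max_replicas))


def should_scale_down(idle_seconds: float, scale_down_delay: float = 60) -> bool:
    """A replica with no traffic goes idle only after the delay has elapsed."""
    return idle_seconds >= scale_down_delay


if __name__ == "__main__":
    # 21 requests at 20 per replica need a second replica.
    print(desired_replicas(21, requests_per_replica=20, max_replicas=5))
    # 30 seconds without traffic is still inside the default 60-second delay.
    print(should_scale_down(30))
```

Under this model, raising Requests Per Replica lowers the replica count for the same traffic, and Max Replica(s) caps the count regardless of load, which is why it also works as a spend limit.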
4

Environment

Define any environment variables your replica container may require.
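Inside the replica container, your service would typically read these values from the process environment at startup. The sketch below shows one way to do that in Python; the variable names `MODEL_NAME` and `PORT` and their fallback values are hypothetical examples, not variables the platform defines.

```python
import os
from typing import Mapping


def load_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Read app settings from environment variables, with fallbacks.

    MODEL_NAME and PORT are hypothetical names used for illustration.
    """
    return {
        "model_name": environ.get("MODEL_NAME", "default-model"),
        # Environment values are always strings, so convert explicitly.
        "port": int(environ.get("PORT", "8000")),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing the mapping as a parameter keeps the function easy to test without touching the real process environment.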

Need more help?

Please reach out to us on Discord.