AI Model Cache

The 8Scale model cache injects a Hugging Face model into your container. To use it, set the 8SCALE_HF_MODEL environment variable to the Hugging Face model name.

8SCALE_HF_MODEL = Qwen/Qwen2.5-3B

If the model is private or otherwise requires a Hugging Face token, also set the HF_TOKEN environment variable. The model is mounted in your container at /8scale_hf_model. Every replica goes through the following initialization steps:

Init - Download the container image.
Caching - Download the model from Hugging Face.
Idle - Wait for scale-up events.
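
Once the Caching step has finished, application code inside the container can read the model from the mount path. The following is a minimal sketch, not an official 8Scale API: the helper name resolve_model_source is hypothetical, and it assumes the mounted directory holds the model files in a layout that standard Hugging Face loaders accept. If the mount is missing or empty, it falls back to the model name in 8SCALE_HF_MODEL.

```python
import os
from pathlib import Path

# Mount point stated in the documentation above.
MOUNT_PATH = Path("/8scale_hf_model")


def resolve_model_source(mount_path=MOUNT_PATH, environ=None):
    """Return where to load the model from (hypothetical helper).

    Prefers the cache mount when it exists and is non-empty; otherwise
    falls back to the model name from the 8SCALE_HF_MODEL variable.
    """
    env = os.environ if environ is None else environ

    # A populated mount directory means the cached model is available.
    if mount_path.is_dir() and any(mount_path.iterdir()):
        return str(mount_path)

    # Fallback: the model name configured for the deployment.
    model_name = env.get("8SCALE_HF_MODEL")
    if not model_name:
        raise RuntimeError(
            "No cached model found and 8SCALE_HF_MODEL is not set"
        )
    return model_name
```

The returned string can then be passed to a loader that accepts either a local directory or a model name, for example AutoModelForCausalLM.from_pretrained(resolve_model_source()) from the transformers library.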

Get Started

How To Deploy