The 8Scale model cache injects a Hugging Face model into your container. To use it, set the 8SCALE_HF_MODEL environment variable to the Hugging Face model name.
8SCALE_HF_MODEL = Qwen/Qwen2.5-3B
If the model is private or otherwise requires a Hugging Face token, set the HF_TOKEN environment variable as well. The model is mounted in your container at /8scale_hf_model. Every replica goes through the following initialization steps:
  1. Init - The container image is downloaded.
  2. Caching - The model is downloaded from Hugging Face.
  3. Idle - The replica waits for scale-up events.
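
As a minimal sketch of how application code inside the container might locate the cached model, the Python helper below checks the mount point before any loading is attempted. The function name `resolve_cached_model` is hypothetical. The sketch also assumes that 8SCALE_HF_MODEL is visible to the process inside the container, and that a finished Caching step leaves at least one file under /8scale_hf_model. Neither assumption is confirmed by the text above.

```python
import os
from pathlib import Path
from typing import Mapping

# Mount point named in the documentation above.
MODEL_MOUNT = Path("/8scale_hf_model")


def resolve_cached_model(
    mount: Path = MODEL_MOUNT,
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Return the mounted model directory, or raise if it is not usable.

    Hypothetical helper: assumes the 8SCALE_HF_MODEL variable is visible
    inside the container and that the cache is a non-empty directory.
    """
    model_name = env.get("8SCALE_HF_MODEL")
    if not model_name:
        raise RuntimeError("8SCALE_HF_MODEL is not set")
    # An absent or empty directory suggests the Caching step has not finished.
    if not mount.is_dir() or not any(mount.iterdir()):
        raise FileNotFoundError(
            f"{mount} is missing or empty; expected files for {model_name}"
        )
    return mount
```

If the mounted directory holds a standard model snapshot, the returned path could then be passed to a loader that accepts a local directory, such as `from_pretrained` in the `transformers` library. The exact on-disk layout of the mount is not described here, so verify it before relying on this.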
I