8SCALE_HF_MODEL env variable with the Hugging Face model name.
8SCALE_HF_MODEL = Qwen/Qwen2.5-3B
HF_TOKEN env variable.
The model will be mounted in your container at /8scale_hf_model. Every replica goes through the following initialization steps:
- Init - Download the container image.
- Caching - Download the model from Hugging Face.
- Idle - The replica waits for scale up events.
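Once a replica has passed the Caching step, code inside the container can find the model at the mount path. The following is a minimal sketch of how that lookup might be written, not the platform's own code. It uses the variable name and mount path given above. The helper name `resolve_model_source` is hypothetical, and the assumption that a non-empty directory means the model is ready is ours, since the layout of the mounted files is not documented here.

```python
import os
from pathlib import Path

# Mount path taken from the text above.
MODEL_MOUNT = Path("/8scale_hf_model")


def resolve_model_source(env=os.environ, mount=MODEL_MOUNT):
    """Return (model_name, local_path), where local_path is None if not mounted.

    Hypothetical helper for illustration only. It treats a non-empty mount
    directory as "model available"; the real readiness signal may differ.
    """
    model_name = env.get("8SCALE_HF_MODEL")
    if not model_name:
        raise RuntimeError("8SCALE_HF_MODEL is not set")

    # Assumption: the model files sit directly under the mount directory.
    if mount.is_dir() and any(mount.iterdir()):
        return model_name, mount
    return model_name, None


if __name__ == "__main__":
    name, path = resolve_model_source()
    print(f"model={name} local_path={path}")
```

If the mounted directory holds a standard Hugging Face checkpoint, the returned path could be passed to a loader that accepts a local directory, for example `from_pretrained` in the `transformers` library. Verify this against the actual mount contents.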