dstack configuration
YAML dstack configurations
One of
Definitions
Credentials for pulling a private Docker image.
Attributes:
  username (str): The username
  password (str): The password or access token
The username
The password or access token
An enumeration.
Env represents a mapping of process environment variables, as in environ(7).
Environment variable values may be omitted; in that case, the :class:`EnvSentinel`
object is used as a placeholder.
To create an instance from a dict[str, str] or a list[str], use pydantic's
:meth:`BaseModel.parse_obj` method.
NB: this is NOT a CoreModel; pydantic-duality, which is used as the base for CoreModel, does not play well with custom root models.
An enumeration.
An enumeration.
The vendor of the GPU/accelerator, one of: nvidia, amd, google (alias: tpu), intel
The name of the GPU (e.g., A100 or H100)
The RAM size (e.g., 16GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB)
The total RAM size (e.g., 32GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB)
The minimum compute capability of the GPU (e.g., 7.5)
Disk size
The CPU requirements
{
"arch": null,
"count": {
"min": 2,
"max": null
}
}
The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this
The GPU requirements
{
"vendor": null,
"name": null,
"count": {
"min": 0,
"max": null
},
"memory": null,
"total_memory": null,
"compute_capability": null
}
The disk resources
{
"size": {
"min": 100.0,
"max": null
}
}
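Taken together, the defaults above correspond to a resources block like the following (a minimal sketch; the range syntax follows the descriptions above, and the values are illustrative):

```yaml
resources:
  cpu: 2..        # at least 2 CPUs (min: 2, max: unset)
  memory: 8GB..   # at least 8GB of RAM
  shm_size: 8GB   # optional; useful for parallel dataloaders
  gpu:
    vendor: nvidia
    name: A100
    count: 1
    memory: 40GB..
  disk: 100GB..
```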
The network volume name or the list of network volume names to mount. If a list is specified, one of the volumes in the list will be mounted. Specify volumes from different backends/regions to increase availability
The absolute container path to mount the volume at
The absolute path on the instance (host)
The absolute path in the container
Allow running without this volume in backends that do not support instance volumes
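The mount points described above can be sketched in YAML as follows (the short `host_path:container_path` instance-volume form and the `optional` key are assumptions based on the descriptions above):

```yaml
volumes:
  # Network volume: mounted at the given container path
  - name: my-volume
    path: /volume_data
  # Instance volume: host path mapped into the container
  - /mnt/cache:/root/.cache
  # Instance volume that may be skipped in unsupported backends
  - instance_path: /mnt/data
    path: /data
    optional: true
```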
An enumeration.
The path to the Git repo on the user's machine. Relative paths are resolved relative to the parent directory of the configuration file. Mutually exclusive with url
The Git repo URL. Mutually exclusive with local_path
The repo branch. Defaults to the active branch for local paths and the default branch for URLs
The commit hash
The repo path inside the run container. Relative paths are resolved relative to the working directory
The action to be taken if path exists and is not empty. One of: error, skip
The path on the user's machine. Relative paths are resolved relative to the parent directory of the configuration file
The path in the container. Relative paths are resolved relative to the working directory
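A sketch combining the repo and file mappings described above (the `repos` and `files` property names and the `local_path`/`url`/`path` keys are assumptions inferred from the field descriptions; the URL is illustrative):

```yaml
repos:
  - url: https://github.com/example/repo
    branch: main
    path: repo
files:
  - local_path: ~/.gitconfig
    path: /root/.gitconfig
```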
Attributes:
  AMDDEVCLOUD (BackendType): AMD Developer Cloud
  AWS (BackendType): Amazon Web Services
  AZURE (BackendType): Microsoft Azure
  CLOUDRIFT (BackendType): CloudRift
  CRUSOE (BackendType): Crusoe
  CUDO (BackendType): Cudo
  DATACRUNCH (BackendType): DataCrunch (for backward compatibility)
  DIGITALOCEAN (BackendType): DigitalOcean
  DSTACK (BackendType): dstack Sky
  GCP (BackendType): Google Cloud Platform
  HOTAISLE (BackendType): Hot Aisle
  KUBERNETES (BackendType): Kubernetes
  LAMBDA (BackendType): Lambda Cloud
  NEBIUS (BackendType): Nebius AI Cloud
  OCI (BackendType): Oracle Cloud Infrastructure
  RUNPOD (BackendType): Runpod Cloud
  TENSORDOCK (BackendType): TensorDock Marketplace
  VASTAI (BackendType): Vast.ai Marketplace
  VERDA (BackendType): Verda Cloud
  VULTR (BackendType): Vultr
An enumeration.
An enumeration.
The list of events that should be handled with retry. Supported events are no-capacity, interruption, error. Omit to retry on all events
The maximum period of retrying the run, e.g., 4h or 1d. For the no-capacity event, the period is measured as the run's age; for the interruption and error events, as the time since the last interruption or error.
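For instance, a retry policy covering only capacity shortages might look like the following (a sketch; the `on_events` and `duration` property names are assumptions inferred from the descriptions above):

```yaml
retry:
  on_events: [no-capacity]
  duration: 4h
```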
An enumeration.
Minimum required GPU utilization, in percent. If any GPU has utilization below the specified value during the whole time window, the run is terminated
The time window of metric samples taken into account when measuring utilization (e.g., 30m, 1h). Minimum is 5m
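A sketch of a utilization policy that terminates a run whose GPUs all stay under 30% for a full hour (the property names are assumptions inferred from the descriptions above):

```yaml
utilization_policy:
  min_gpu_utilization: 30
  time_window: 1h
```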
An enumeration.
An enumeration.
A cron expression or a list of cron expressions specifying the UTC time when the run needs to be started
Cross-project entity reference.
The entity name
The project name. If unspecified, refers to the current project
The IDE to pre-install. Supported values include vscode, cursor, and windsurf. Defaults to no IDE (SSH only)
The version of the IDE. For windsurf, the version is in the format version@commit
The shell commands to run on startup
[]
The maximum amount of time the dev environment can be inactive (e.g., 2h, 1d, etc). After it elapses, the dev environment is automatically stopped. Inactivity is defined as the absence of SSH connections to the dev environment, including VS Code connections, ssh <run name> shells, and attached dstack apply or dstack attach commands. Use off for unlimited duration. Can be updated in-place. Defaults to off
Port numbers/mapping to expose
[]
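Combining the dev-environment properties above, a minimal sketch (values are illustrative):

```yaml
type: dev-environment
name: my-ide
ide: vscode
init:
  - pip install -r requirements.txt
inactivity_duration: 2h
resources:
  gpu: 24GB
```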
The run name. If not specified, a random name is generated
The name of the Docker image to run
The user inside the container, user_name_or_id[:group_name_or_id] (e.g., ubuntu, 1000:1000). Defaults to the default user from the image
Run the container in privileged mode
The Docker entrypoint
The absolute path to the working directory inside the container. Defaults to the image's default working directory
Use image with NVIDIA CUDA Compiler (NVCC) included. Mutually exclusive with image and docker
Whether to clone and track only the current branch or all remote branches. Relevant only when using remote Git repos. Defaults to false for dev environments and to true for tasks and services
The shell used to run commands. Allowed values are sh, bash, or an absolute path, e.g., /usr/bin/zsh. Defaults to /bin/sh if the image is specified, /bin/bash otherwise
The resource requirements to run the configuration
{
"cpu": {
"min": 2,
"max": null
},
"memory": {
"min": 8.0,
"max": null
},
"shm_size": null,
"gpu": {
"vendor": null,
"name": null,
"count": {
"min": 0,
"max": null
},
"memory": null,
"total_memory": null,
"compute_capability": null
},
"disk": {
"size": {
"min": 100.0,
"max": null
}
}
}
The priority of the run, an integer between 0 and 100. dstack tries to provision runs with higher priority first. Defaults to 0
The volumes mount points
[]
Use Docker inside the container. Mutually exclusive with image, python, and nvcc. Overrides privileged
The local-to-container file path mappings
[]
[]
The backends to consider for provisioning (e.g., [aws, gcp])
The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope])
The availability zones to consider for provisioning (e.g., [eu-west-1a, us-west4-a])
The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4])
The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations, AWS Capacity Blocks, and GCP reservations
The policy for provisioning spot or on-demand instances: spot, on-demand, auto. Defaults to on-demand
The policy for resubmitting the run. Defaults to false
The maximum duration of a run (e.g., 2h, 1d, etc) in a running state, excluding provisioning and pulling. After it elapses, the run is automatically stopped. Use off for unlimited duration. Defaults to off
The maximum duration of a run's graceful stop. After it elapses, the run is force-stopped. This includes force-detaching volumes used by the run. Use off for unlimited duration. Defaults to 5m
The maximum instance price per hour, in dollars
The policy for using instances from fleets: reuse, reuse-or-create. Defaults to reuse-or-create
Time to wait before terminating idle instances. When the run reuses an existing fleet instance, the fleet's idle_duration applies. When the run provisions a new instance, the shorter of the fleet's and run's values is used. Defaults to 5m for runs and 3d for fleets. Use off for unlimited duration. Only applied for VM-based backends
The order in which master and workers jobs are started: any, master-first, workers-first. Defaults to any
The criteria determining when a multi-node run should be considered finished: all-done, master-done. Defaults to all-done
The fleets considered for reuse. For fleets owned by the current project, specify fleet names. For imported fleets, specify <project name>/<fleet name>
The custom tags to associate with the resource. The tags are also propagated to the underlying backend resources. If there is a conflict with backend-level tags, does not override them
Number of nodes
Port numbers/mapping to expose
[]
The shell commands to run
[]
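A minimal task sketch using the properties above (values are illustrative):

```yaml
type: task
name: train
nodes: 1
ports:
  - 6006
commands:
  - pip install -r requirements.txt
  - python train.py
```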
The run name. If not specified, a random name is generated
The name of the Docker image to run
The user inside the container, user_name_or_id[:group_name_or_id] (e.g., ubuntu, 1000:1000). Defaults to the default user from the image
Run the container in privileged mode
The Docker entrypoint
The absolute path to the working directory inside the container. Defaults to the image's default working directory
Use image with NVIDIA CUDA Compiler (NVCC) included. Mutually exclusive with image and docker
Whether to clone and track only the current branch or all remote branches. Relevant only when using remote Git repos. Defaults to false for dev environments and to true for tasks and services
The shell used to run commands. Allowed values are sh, bash, or an absolute path, e.g., /usr/bin/zsh. Defaults to /bin/sh if the image is specified, /bin/bash otherwise
The resource requirements to run the configuration
{
"cpu": {
"min": 2,
"max": null
},
"memory": {
"min": 8.0,
"max": null
},
"shm_size": null,
"gpu": {
"vendor": null,
"name": null,
"count": {
"min": 0,
"max": null
},
"memory": null,
"total_memory": null,
"compute_capability": null
},
"disk": {
"size": {
"min": 100.0,
"max": null
}
}
}
The priority of the run, an integer between 0 and 100. dstack tries to provision runs with higher priority first. Defaults to 0
The volumes mount points
[]
Use Docker inside the container. Mutually exclusive with image, python, and nvcc. Overrides privileged
The local-to-container file path mappings
[]
[]
The backends to consider for provisioning (e.g., [aws, gcp])
The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope])
The availability zones to consider for provisioning (e.g., [eu-west-1a, us-west4-a])
The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4])
The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations, AWS Capacity Blocks, and GCP reservations
The policy for provisioning spot or on-demand instances: spot, on-demand, auto. Defaults to on-demand
The policy for resubmitting the run. Defaults to false
The maximum duration of a run (e.g., 2h, 1d, etc) in a running state, excluding provisioning and pulling. After it elapses, the run is automatically stopped. Use off for unlimited duration. Defaults to off
The maximum duration of a run's graceful stop. After it elapses, the run is force-stopped. This includes force-detaching volumes used by the run. Use off for unlimited duration. Defaults to 5m
The maximum instance price per hour, in dollars
The policy for using instances from fleets: reuse, reuse-or-create. Defaults to reuse-or-create
Time to wait before terminating idle instances. When the run reuses an existing fleet instance, the fleet's idle_duration applies. When the run provisions a new instance, the shorter of the fleet's and run's values is used. Defaults to 5m for runs and 3d for fleets. Use off for unlimited duration. Only applied for VM-based backends
The order in which master and workers jobs are started: any, master-first, workers-first. Defaults to any
The criteria determining when a multi-node run should be considered finished: all-done, master-done. Defaults to all-done
The fleets considered for reuse. For fleets owned by the current project, specify fleet names. For imported fleets, specify <project name>/<fleet name>
The custom tags to associate with the resource. The tags are also propagated to the underlying backend resources. If there is a conflict with backend-level tags, does not override them
Mapping of the model for the OpenAI-compatible endpoint.
Attributes:
  type (str): The type of the model, e.g. "chat"
  name (str): The name of the model. This name will be used both to load model configuration from the HuggingFace Hub and in the OpenAI-compatible endpoint.
  format (str): The format of the model, e.g. "tgi" if the model is served with HuggingFace's Text Generation Inference.
  chat_template (Optional[str]): The custom prompt template for the model. If not specified, the default prompt template from the HuggingFace Hub configuration will be used.
  eos_token (Optional[str]): The custom end of sentence token. If not specified, the default end of sentence token from the HuggingFace Hub configuration will be used.
The name of the model
The serving format. Must be set to tgi
The type of the model
The custom prompt template for the model. If not specified, the default prompt template from the HuggingFace Hub configuration will be used
The custom end of sentence token. If not specified, the default end of sentence token from the HuggingFace Hub configuration will be used
Mapping of the model for the OpenAI-compatible endpoint.
Attributes:
type (str): The type of the model, e.g. "chat"
name (str): The name of the model. This name will be used both to load model configuration from the HuggingFace Hub and in the OpenAI-compatible endpoint.
format (str): The format of the model, i.e. "openai".
prefix (str): The base_url prefix: <http://hostname/{prefix}/chat/completions>. Defaults to /v1.
The name of the model
The serving format. Must be set to openai
The type of the model
The base_url prefix (after hostname)
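The full OpenAI-format model mapping can be written as follows (a sketch; the model name is illustrative):

```yaml
model:
  type: chat
  name: meta-llama/Meta-Llama-3.1-8B-Instruct
  format: openai
  prefix: /v1
```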
The target metric to track. Currently, the only supported value is rps (meaning requests per second)
The target value of the metric. The number of replicas is calculated based on this number and automatically adjusts (scales up or down) as this metric changes
The delay in seconds before scaling up
The delay in seconds before scaling down
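A sketch tying the scaling properties together (the `metric` and `target` keys follow the descriptions above; the `scale_up_delay` and `scale_down_delay` names are assumptions, with delays given in seconds):

```yaml
scaling:
  metric: rps
  target: 10
  scale_up_delay: 300
  scale_down_delay: 600
```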
Partitioning type
Name of the header to use for partitioning
Partitioning type
Max allowed number of requests per second. Requests are tracked at millisecond granularity. For example, rps: 10 means at most 1 request per 100ms
URL path prefix to which this limit is applied. If an incoming request matches several prefixes, the longest prefix is applied
The partitioning key. Each incoming request belongs to a partition and rate limits are applied per partition. Defaults to partitioning by client IP address
{
"type": "ip_address"
}
Max number of requests that can be passed to the service ahead of the rate limit
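Putting the rate-limit properties together, a sketch (the `key` and `burst` property names are assumptions inferred from the descriptions above; the header name is illustrative):

```yaml
rate_limits:
  - prefix: /v1/
    rps: 10    # at most 1 request per 100ms
    burst: 20
    key:
      type: header
      header: X-API-Key
```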
The name of the HTTP header
The value of the HTTP header
The probe type. Must be http
The URL to request. Defaults to /
The HTTP method to use for the probe (e.g., get, post, etc.). Defaults to get
The HTTP request body to send with the probe
Maximum amount of time the HTTP request is allowed to take. Defaults to 10s
Minimum amount of time between the end of one probe execution and the start of the next. Defaults to 15s
The number of consecutive successful probe executions required for the replica to be considered ready. Used during rolling deployments. Defaults to 1
If true, the probe will stop being executed as soon as it reaches the ready_after threshold of successful executions. Defaults to false
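An HTTP probe sketch based on the properties above (the `interval` key name for the time between executions is an assumption; values are illustrative):

```yaml
probes:
  - type: http
    url: /health
    method: get
    timeout: 10s
    interval: 15s
    ready_after: 3
```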
The number of replicas. Can be a number (e.g. 2) or a range (0..4 or 1..8). If it's a range, the scaling property is required
The name of the replica group. If not provided, defaults to '0', '1', etc. based on position.
The resource requirements for replicas in this group
{
"cpu": {
"min": 2,
"max": null
},
"memory": {
"min": 8.0,
"max": null
},
"shm_size": null,
"gpu": {
"vendor": null,
"name": null,
"count": {
"min": 0,
"max": null
},
"memory": null,
"total_memory": null,
"compute_capability": null
},
"disk": {
"size": {
"min": 100.0,
"max": null
}
}
}
The shell commands to run for replicas in this group
[]
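A replica-group sketch using the group properties above (the exact key names inside a group, in particular `count` for the per-group replica count, are assumptions inferred from the descriptions; values are illustrative):

```yaml
replicas:
  - name: small
    count: 1..2
    commands:
      - python serve.py --model small
    resources:
      gpu: 24GB
  - name: large
    count: 1
    commands:
      - python serve.py --model large
    resources:
      gpu: 80GB
```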
The router type
The routing policy. Options: random, round_robin, cache_aware, power_of_two
Enable PD disaggregation mode for the SGLang router
The port the application listens on
The name of the gateway. Specify boolean false to run without a gateway. Specify boolean true to run with the default gateway. Omit to run with the default gateway if there is one, or without a gateway otherwise
Strip the /proxy/services/<project name>/<run name>/ path prefix when forwarding requests to the service. Only takes effect when running the service without a gateway
Mapping of the model for the OpenAI-compatible endpoint provided by dstack. Can be a full model format definition or just a model name. If it's a name, the service is expected to expose an OpenAI-compatible API at the /v1 path
Enable HTTPS if running with a gateway. Set to auto to determine automatically based on gateway configuration. Defaults to true
Enable the authorization
The list of probes to determine service health. If model is set, defaults to a /v1/chat/completions probe. Set explicitly to override
The number of replicas or a list of replica groups. Can be an integer (e.g., 2), a range (e.g., 0..4), or a list of replica groups. Each replica group defines replicas with shared configuration (commands, resources, scaling). When replicas is a list of replica groups, top-level scaling, commands, and resources are not allowed and must be specified in each replica group instead.
Router configuration for the service. Requires a gateway with matching router enabled.
The shell commands to run
[]
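A minimal service sketch combining the properties above (the image command and model name are illustrative):

```yaml
type: service
name: llm-service
port: 8000
model: meta-llama/Meta-Llama-3.1-8B-Instruct
auth: true
replicas: 1..4
scaling:
  metric: rps
  target: 10
commands:
  - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --port 8000
resources:
  gpu: 80GB
```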
The run name. If not specified, a random name is generated
The name of the Docker image to run
The user inside the container, user_name_or_id[:group_name_or_id] (e.g., ubuntu, 1000:1000). Defaults to the default user from the image
Run the container in privileged mode
The Docker entrypoint
The absolute path to the working directory inside the container. Defaults to the image's default working directory
Use image with NVIDIA CUDA Compiler (NVCC) included. Mutually exclusive with image and docker
Whether to clone and track only the current branch or all remote branches. Relevant only when using remote Git repos. Defaults to false for dev environments and to true for tasks and services
The shell used to run commands. Allowed values are sh, bash, or an absolute path, e.g., /usr/bin/zsh. Defaults to /bin/sh if the image is specified, /bin/bash otherwise
The resource requirements to run the configuration
{
"cpu": {
"min": 2,
"max": null
},
"memory": {
"min": 8.0,
"max": null
},
"shm_size": null,
"gpu": {
"vendor": null,
"name": null,
"count": {
"min": 0,
"max": null
},
"memory": null,
"total_memory": null,
"compute_capability": null
},
"disk": {
"size": {
"min": 100.0,
"max": null
}
}
}
The priority of the run, an integer between 0 and 100. dstack tries to provision runs with higher priority first. Defaults to 0
The volumes mount points
[]
Use Docker inside the container. Mutually exclusive with image, python, and nvcc. Overrides privileged
The local-to-container file path mappings
[]
[]
The backends to consider for provisioning (e.g., [aws, gcp])
The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope])
The availability zones to consider for provisioning (e.g., [eu-west-1a, us-west4-a])
The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4])
The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations, AWS Capacity Blocks, and GCP reservations
The policy for provisioning spot or on-demand instances: spot, on-demand, auto. Defaults to on-demand
The policy for resubmitting the run. Defaults to false
The maximum duration of a run (e.g., 2h, 1d, etc) in a running state, excluding provisioning and pulling. After it elapses, the run is automatically stopped. Use off for unlimited duration. Defaults to off
The maximum duration of a run's graceful stop. After it elapses, the run is force-stopped. This includes force-detaching volumes used by the run. Use off for unlimited duration. Defaults to 5m
The maximum instance price per hour, in dollars
The policy for using instances from fleets: reuse, reuse-or-create. Defaults to reuse-or-create
Time to wait before terminating idle instances. When the run reuses an existing fleet instance, the fleet's idle_duration applies. When the run provisions a new instance, the shorter of the fleet's and run's values is used. Defaults to 5m for runs and 3d for fleets. Use off for unlimited duration. Only applied for VM-based backends
The order in which master and workers jobs are started: any, master-first, workers-first. Defaults to any
The criteria determining when a multi-node run should be considered finished: all-done, master-done. Defaults to all-done
The fleets considered for reuse. For fleets owned by the current project, specify fleet names. For imported fleets, specify <project name>/<fleet name>
The custom tags to associate with the resource. The tags are also propagated to the underlying backend resources. If there is a conflict with backend-level tags, does not override them
The IP address or domain of the proxy host
The user to log in with on the proxy host
The private key to use for the proxy host
The SSH port of the proxy host
2 nested properties
The IP address or domain to connect to
The SSH port to connect to for this host
The user to log in with for this host
The private key to use for this host
The internal IP of the host used for communication inside the cluster. If not specified, dstack will use the IP address from network or from the first found internal network.
2 nested properties
The number of blocks to split the instance into, a number or auto. auto means as many as possible. The number of GPUs and CPUs must be divisible by the number of blocks. Defaults to the top-level blocks value.
The per host connection parameters: a hostname or an object that overrides default ssh parameters
The user to log in with on all hosts
The SSH port to connect to
The private key to use for all hosts
2 nested properties
The network address for cluster setup in the format <ip>/<netmask>. dstack will use IP addresses from this network for communication between hosts. If not specified, dstack will use IPs from the first found internal network.
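An SSH fleet sketch tying the host parameters above together (IP addresses and the key path are illustrative):

```yaml
type: fleet
name: my-ssh-fleet
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.1.10
    # Per-host overrides of the default SSH parameters
    - hostname: 192.168.1.11
      internal_ip: 10.0.0.11
  network: 10.0.0.0/24
```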
The minimum number of instances to maintain in the fleet
The number of instances to provision on fleet apply; must satisfy min <= target <= max. Defaults to min
The maximum number of instances allowed in the fleet. Unlimited if not specified
An enumeration.
The fleet name
The number of instances in the cloud fleet
The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations, AWS Capacity Blocks, and GCP reservations
The resource requirements
{
"cpu": {
"min": 2,
"max": null
},
"memory": {
"min": 8.0,
"max": null
},
"shm_size": null,
"gpu": {
"vendor": null,
"name": null,
"count": {
"min": 0,
"max": null
},
"memory": null,
"total_memory": null,
"compute_capability": null
},
"disk": {
"size": {
"min": 100.0,
"max": null
}
}
}
The number of blocks to split the instance into, a number or auto. auto means as many as possible. The number of GPUs and CPUs must be divisible by the number of blocks. Defaults to 1, i.e. do not split
The backends to consider for provisioning (e.g., [aws, gcp])
The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope])
The availability zones to consider for provisioning (e.g., [eu-west-1a, us-west4-a])
The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4])
The policy for provisioning spot or on-demand instances: spot, on-demand, auto. Defaults to on-demand
The policy for provisioning retry. Defaults to false
The maximum instance price per hour, in dollars
Time to wait before terminating idle instances. Instances are not terminated if the fleet is already at nodes.min. Defaults to 5m for runs and 3d for fleets. Use off for unlimited duration
The custom tags to associate with the resource. The tags are also propagated to the underlying backend resources. If there is a conflict with backend-level tags, does not override them
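A cloud fleet sketch using the provisioning properties above (values are illustrative):

```yaml
type: fleet
name: my-fleet
nodes: 2
resources:
  gpu: 24GB
backends: [aws, gcp]
spot_policy: auto
idle_duration: 1h
```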
Gateway-level router configuration. Only type and policy apply here; pd_disaggregation is configured at the service level.
The router type enabled on this gateway.
The routing policy. Deprecated: prefer setting policy in the service's router config. Options: random, round_robin, cache_aware, power_of_two
Automatic certificates by Let's Encrypt
The ARN of the wildcard ACM certificate for the domain
Certificates by AWS Certificate Manager (ACM)
The gateway region
The gateway name
Make the gateway default
Backend-specific instance type to use for the gateway instance. Omit to use the backend's default, which is typically a small non-GPU instance
The router configuration for this gateway. E.g. { type: sglang, policy: round_robin }.
The gateway domain, e.g. example.com
Allocate public IP for the gateway
The SSL certificate configuration. Set to null to disable. Defaults to type: lets-encrypt
{
"type": "lets-encrypt"
}
The custom tags to associate with the gateway. The tags are also propagated to the underlying backend resources. If there is a conflict with backend-level tags, does not override them
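A gateway sketch based on the properties above (the backend, region, and domain are illustrative):

```yaml
type: gateway
name: example-gateway
backend: aws
region: eu-west-1
domain: example.com
certificate:
  type: lets-encrypt
```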
The volume region
The volume name
The volume availability zone
The volume size. Must be specified when creating new volumes
The volume ID. Must be specified when registering external volumes
Time to wait after the volume is no longer used by any job before deleting it. By default, the volume is kept indefinitely. Use the value 'off' or -1 to disable auto-cleanup.
The custom tags to associate with the volume. The tags are also propagated to the underlying backend resources. If there is a conflict with backend-level tags, does not override them
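A sketch for creating a new volume with the properties above (values are illustrative; per the descriptions, size is required when creating, while volume_id is used only when registering an existing external volume):

```yaml
type: volume
name: my-volume
backend: aws
region: eu-west-1
size: 100GB
```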