Type DevEnvironmentConfigurationRequest | TaskConfigurationRequest | ServiceConfigurationRequest | FleetConfigurationRequest | GatewayConfigurationRequest | VolumeConfigurationRequest
File match *.dstack.yml *.dstack.yaml
Schema URL https://catalog.lintel.tools/schemas/schemastore/dstack-configuration/latest.json
Source https://dstack-runner-downloads.s3.eu-west-1.amazonaws.com/latest/schemas/configuration.json

Validate with Lintel

npx @lintel/lintel check

One of

Definitions

PortMappingRequest object
container_port integer required
max=65536 exclusiveMin=0
local_port integer
max=65536 exclusiveMin=0
RegistryAuthRequest object

Credentials for pulling a private Docker image.

Attributes:
username (str): The username
password (str): The password or access token

username string required

The username

password string required

The password or access token

PythonVersion string

An enumeration.

EnvSentinelRequest object
key string required
Env string[] | object

Env represents a mapping of process environment variables, as in environ(7). Environment values may be omitted; in that case, an EnvSentinel object is used as a placeholder.

To create an instance from a dict[str, str] or a list[str], use pydantic's BaseModel.parse_obj(dict | list) method.

NB: this is NOT a CoreModel. pydantic-duality, which is used as the base for CoreModel, doesn't play well with custom root models.

CPUArchitecture string

An enumeration.

Range_int_ object
min integer
max integer
CPUSpecRequest object
arch

The CPU architecture, one of: x86, arm

All of: CPUArchitecture string
count Range_int_ | integer | string

The number of CPU cores

Default:
{
  "min": 2,
  "max": null
}
Range_Memory_ object
min number
max number
AcceleratorVendor string

An enumeration.

GPUSpecRequest object
vendor

The vendor of the GPU/accelerator, one of: nvidia, amd, google (alias: tpu), intel

All of: AcceleratorVendor string
name array | string

The name of the GPU (e.g., A100 or H100)

count Range_int_ | integer | string

The number of GPUs

Default:
{
  "min": 1,
  "max": null
}
memory Range_Memory_ | integer | string

The RAM size (e.g., 16GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB)

total_memory Range_Memory_ | integer | string

The total RAM size (e.g., 32GB). Can be set to a range (e.g. 16GB.., or 16GB..80GB)

compute_capability array

The minimum compute capability of the GPU (e.g., 7.5)

DiskSpecRequest object
size Range_Memory_ | integer | string required

Disk size

ResourcesSpecRequest object
cpu CPUSpecRequest | Range_int_ | integer | string

The CPU requirements

Default:
{
  "arch": null,
  "count": {
    "min": 2,
    "max": null
  }
}
memory Range_Memory_ | integer | string

The RAM size (e.g., 8GB)

Default:
{
  "min": 8.0,
  "max": null
}
shm_size number | integer | string

The size of shared memory (e.g., 8GB). If you are using parallel communicating processes (e.g., dataloaders in PyTorch), you may need to configure this

gpu GPUSpecRequest | integer | string

The GPU requirements

Default:
{
  "vendor": null,
  "name": null,
  "count": {
    "min": 0,
    "max": null
  },
  "memory": null,
  "total_memory": null,
  "compute_capability": null
}
disk DiskSpecRequest | integer | string

The disk resources

Default:
{
  "size": {
    "min": 100.0,
    "max": null
  }
}
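As an illustration of the shorthand forms listed for ResourcesSpecRequest, a resources block might look like the sketch below. The sizes and the GPU name are placeholder values chosen for the example, not recommendations. Ranges use the min..max string syntax mentioned in the GPU memory description (e.g. 16GB.. or 16GB..80GB).

```yaml
# Sketch of a resources block; all values are illustrative placeholders.
resources:
  cpu: 4                # integer shorthand for the CPU requirement
  memory: 16GB..        # open-ended range: 16GB or more
  shm_size: 8GB         # shared memory, e.g. for PyTorch dataloaders
  gpu:
    vendor: nvidia
    name: A100
    count: 1
    memory: 40GB..80GB  # closed range
  disk: 200GB           # string shorthand for the disk size
```

Any property left out falls back to the defaults shown above (for example, a minimum of 2 CPU cores and 100GB of disk).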
VolumeMountPointRequest object
name string | string[] required

The network volume name or the list of network volume names to mount. If a list is specified, one of the volumes in the list will be mounted. Specify volumes from different backends/regions to increase availability

path string required

The absolute container path to mount the volume at

InstanceMountPointRequest object
instance_path string required

The absolute path on the instance (host)

path string required

The absolute path in the container

optional boolean

Allow running without this volume in backends that do not support instance volumes

Default: false
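The two mount point types can be mixed in the volumes list of a run configuration. The following sketch assumes hypothetical volume names and paths; only the property names come from the definitions above.

```yaml
# Sketch of a volumes list; volume names and paths are hypothetical.
volumes:
  # VolumeMountPointRequest: a single network volume
  - name: my-volume
    path: /data
  # VolumeMountPointRequest: one volume from the list will be mounted
  - name: [my-volume-aws, my-volume-gcp]
    path: /checkpoints
  # InstanceMountPointRequest: a host directory, skipped on backends
  # that do not support instance volumes
  - instance_path: /mnt/cache
    path: /cache
    optional: true
```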
RepoExistsAction string

An enumeration.

RepoSpecRequest object
local_path string

The path to the Git repo on the user's machine. Relative paths are resolved relative to the parent directory of the configuration file. Mutually exclusive with url

url string

The Git repo URL. Mutually exclusive with local_path

branch string

The repo branch. Defaults to the active branch for local paths and the default branch for URLs

hash string

The commit hash

path string

The repo path inside the run container. Relative paths are resolved relative to the working directory

Default: "."
if_exists

The action to be taken if path exists and is not empty. One of: error, skip

Default: "error"
All of: RepoExistsAction string
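A single RepoSpecRequest entry could be written as in the sketch below. The URL, branch name, and target path are placeholders. Because url and local_path are mutually exclusive, only one of them appears.

```yaml
# Sketch of one RepoSpecRequest object; URL and names are placeholders.
url: https://github.com/example/project.git
branch: main
path: project       # relative to the working directory in the container
if_exists: skip     # instead of the default "error"
```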
FilePathMappingRequest object
local_path string required

The path on the user's machine. Relative paths are resolved relative to the parent directory of the configuration file

path string required

The path in the container. Relative paths are resolved relative to the working directory

BackendType string

Attributes:
AMDDEVCLOUD (BackendType): AMD Developer Cloud
AWS (BackendType): Amazon Web Services
AZURE (BackendType): Microsoft Azure
CLOUDRIFT (BackendType): CloudRift
CRUSOE (BackendType): Crusoe
CUDO (BackendType): Cudo
DATACRUNCH (BackendType): DataCrunch (for backward compatibility)
DIGITALOCEAN (BackendType): DigitalOcean
DSTACK (BackendType): dstack Sky
GCP (BackendType): Google Cloud Platform
HOTAISLE (BackendType): Hot Aisle
KUBERNETES (BackendType): Kubernetes
LAMBDA (BackendType): Lambda Cloud
NEBIUS (BackendType): Nebius AI Cloud
OCI (BackendType): Oracle Cloud Infrastructure
RUNPOD (BackendType): Runpod Cloud
TENSORDOCK (BackendType): TensorDock Marketplace
VASTAI (BackendType): Vast.ai Marketplace
VERDA (BackendType): Verda Cloud
VULTR (BackendType): Vultr

SpotPolicy string

An enumeration.

RetryEvent string

An enumeration.

ProfileRetryRequest object
on_events RetryEvent[]

The list of events that should be handled with retry. Supported events are no-capacity, interruption, error. Omit to retry on all events

duration integer | string

The maximum period of retrying the run, e.g., 4h or 1d. For no-capacity events, the period is measured as the age of the run. For interruption and error events, it is measured as the time elapsed since the last interruption or error.
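For example, a retry policy that only reacts to capacity shortages and spot interruptions could be sketched as follows. The 4h duration is an arbitrary example value.

```yaml
# Sketch of a retry policy (ProfileRetryRequest form).
retry:
  on_events: [no-capacity, interruption]  # omit to retry on all events
  duration: 4h                            # stop retrying after this period
```

Since retry also accepts a boolean, the value false (the documented default) disables resubmission altogether.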

CreationPolicy string

An enumeration.

UtilizationPolicyRequest object
min_gpu_utilization integer required

Minimum required GPU utilization, percent. If any GPU has utilization below the specified value during the whole time window, the run is terminated

min=0 max=100
time_window integer | string required

The time window of metric samples taken into account to measure utilization (e.g., 30m, 1h). Minimum is 5m
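A minimal sketch of a utilization policy, with example thresholds chosen only for illustration:

```yaml
# Terminate the run if any GPU stays below 10% utilization for a full hour.
utilization_policy:
  min_gpu_utilization: 10   # percent, 0 to 100
  time_window: 1h           # must be at least 5m
```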

StartupOrder string

An enumeration.

StopCriteria string

An enumeration.

ScheduleRequest object
cron string[] | string required

A cron expression or a list of cron expressions specifying the UTC time when the run needs to be started
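As a sketch, the schedule below starts a run on weekday mornings and again on Sunday night. It assumes the conventional five-field cron syntax (minute, hour, day of month, month, day of week); the times themselves are arbitrary and are interpreted as UTC.

```yaml
# Sketch of a schedule with a list of cron expressions (UTC).
schedule:
  cron:
    - "0 8 * * 1-5"   # 08:00 UTC, Monday through Friday
    - "30 22 * * 0"   # 22:30 UTC on Sundays
```

A single expression can also be given directly as a string instead of a list.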

EntityReferenceRequest object

Cross-project entity reference.

name string required

The entity name

project string

The project name. If unspecified, refers to the current project

DevEnvironmentConfigurationRequest object
ide string | string | string

The IDE to pre-install. Supported values include vscode, cursor, and windsurf. Defaults to no IDE (SSH only)

version string

The version of the IDE. For windsurf, the version is in the format version@commit

init string[]

The shell commands to run on startup

Default:
[]
inactivity_duration string | integer | boolean | string

The maximum amount of time the dev environment can be inactive (e.g., 2h, 1d, etc). After it elapses, the dev environment is automatically stopped. Inactivity is defined as the absence of SSH connections to the dev environment, including VS Code connections, ssh <run name> shells, and attached dstack apply or dstack attach commands. Use off for unlimited duration. Can be updated in-place. Defaults to off

ports integer | string | PortMappingRequest[]

Port numbers/mapping to expose

Default:
[]
type string
Default: "dev-environment"
Values: "dev-environment"
name string

The run name. If not specified, a random name is generated

image string

The name of the Docker image to run

user string

The user inside the container, user_name_or_id[:group_name_or_id] (e.g., ubuntu, 1000:1000). Defaults to the default user from the image

privileged boolean

Run the container in privileged mode

Default: false
entrypoint string

The Docker entrypoint

working_dir string

The absolute path to the working directory inside the container. Defaults to the image's default working directory

home_dir string
Default: "/root"
registry_auth

Credentials for pulling a private Docker image

All of: RegistryAuthRequest object
python

The major version of Python. Mutually exclusive with image and docker

All of: PythonVersion string
nvcc boolean

Use image with NVIDIA CUDA Compiler (NVCC) included. Mutually exclusive with image and docker

single_branch boolean

Whether to clone and track only the current branch or all remote branches. Relevant only when using remote Git repos. Defaults to false for dev environments and to true for tasks and services

env

The mapping or the list of environment variables

Default:
{
  "__root__": {}
}
All of: Env string[] | object
shell string

The shell used to run commands. Allowed values are sh, bash, or an absolute path, e.g., /usr/bin/zsh. Defaults to /bin/sh if the image is specified, /bin/bash otherwise

resources

The resource requirements to run the configuration

Default:
{
  "cpu": {
    "min": 2,
    "max": null
  },
  "memory": {
    "min": 8.0,
    "max": null
  },
  "shm_size": null,
  "gpu": {
    "vendor": null,
    "name": null,
    "count": {
      "min": 0,
      "max": null
    },
    "memory": null,
    "total_memory": null,
    "compute_capability": null
  },
  "disk": {
    "size": {
      "min": 100.0,
      "max": null
    }
  }
}
All of: ResourcesSpecRequest object
priority integer

The priority of the run, an integer between 0 and 100. dstack tries to provision runs with higher priority first. Defaults to 0

min=0 max=100
volumes VolumeMountPointRequest | InstanceMountPointRequest | string[]

The volume mount points

Default:
[]
docker boolean

Use Docker inside the container. Mutually exclusive with image, python, and nvcc. Overrides privileged

repos

The list of Git repos

Default:
[]
files FilePathMappingRequest | string[]

The local to container file path mappings

Default:
[]
setup string[]
Default:
[]
backends BackendType[]

The backends to consider for provisioning (e.g., [aws, gcp])

regions string[]

The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope])

availability_zones string[]

The availability zones to consider for provisioning (e.g., [eu-west-1a, us-west4-a])

instance_types string[]

The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4])

reservation string

The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations, AWS Capacity Blocks, and GCP reservations

spot_policy

The policy for provisioning spot or on-demand instances: spot, on-demand, auto. Defaults to on-demand

All of: SpotPolicy string
retry ProfileRetryRequest | boolean

The policy for resubmitting the run. Defaults to false

max_duration string | integer | boolean | string

The maximum duration of a run (e.g., 2h, 1d, etc) in a running state, excluding provisioning and pulling. After it elapses, the run is automatically stopped. Use off for unlimited duration. Defaults to off

stop_duration string | integer | boolean | string

The maximum duration of a run's graceful stop. After it elapses, the run is forcibly stopped. This includes force-detaching volumes used by the run. Use off for unlimited duration. Defaults to 5m

max_price number

The maximum instance price per hour, in dollars

exclusiveMin=0.0
creation_policy

The policy for using instances from fleets: reuse, reuse-or-create. Defaults to reuse-or-create

All of: CreationPolicy string
idle_duration integer | string

Time to wait before terminating idle instances. When the run reuses an existing fleet instance, the fleet's idle_duration applies. When the run provisions a new instance, the shorter of the fleet's and run's values is used. Defaults to 5m for runs and 3d for fleets. Use off for unlimited duration. Only applied for VM-based backends

utilization_policy

Run termination policy based on utilization

All of: UtilizationPolicyRequest object
startup_order

The order in which master and worker jobs are started: any, master-first, workers-first. Defaults to any

All of: StartupOrder string
stop_criteria

The criteria determining when a multi-node run should be considered finished: all-done, master-done. Defaults to all-done

All of: StopCriteria string
schedule

The schedule for starting the run at a specified time

All of: ScheduleRequest object
fleets EntityReferenceRequest | string[]

The fleets considered for reuse. For fleets owned by the current project, specify fleet names. For imported fleets, specify <project name>/<fleet name>

tags Record<string, string>

The custom tags to associate with the resource. The tags are also propagated to the underlying backend resources. If there is a conflict with backend-level tags, the custom tags do not override them

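Putting several of the properties above together, a dev environment configuration (for instance, in a file matching *.dstack.yml) might look like the following sketch. The run name, init command, port, and price cap are hypothetical values.

```yaml
# Sketch of a dev environment configuration; values are illustrative.
type: dev-environment
name: my-dev-env
ide: vscode
init:
  - pip install -r requirements.txt
inactivity_duration: 2h   # stop automatically after 2h without SSH connections
ports:
  - 8888
resources:
  gpu: 1
spot_policy: auto
max_price: 2.5            # dollars per hour
```
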
TaskConfigurationRequest object
nodes integer

Number of nodes

Default: 1
min=1
ports integer | string | PortMappingRequest[]

Port numbers/mapping to expose

Default:
[]
commands string[]

The shell commands to run

Default:
[]
type string
Default: "task"
Values: "task"
name string

The run name. If not specified, a random name is generated

image string

The name of the Docker image to run

user string

The user inside the container, user_name_or_id[:group_name_or_id] (e.g., ubuntu, 1000:1000). Defaults to the default user from the image

privileged boolean

Run the container in privileged mode

Default: false
entrypoint string

The Docker entrypoint

working_dir string

The absolute path to the working directory inside the container. Defaults to the image's default working directory

home_dir string
Default: "/root"
registry_auth

Credentials for pulling a private Docker image

All of: RegistryAuthRequest object
python

The major version of Python. Mutually exclusive with image and docker

All of: PythonVersion string
nvcc boolean

Use image with NVIDIA CUDA Compiler (NVCC) included. Mutually exclusive with image and docker

single_branch boolean

Whether to clone and track only the current branch or all remote branches. Relevant only when using remote Git repos. Defaults to false for dev environments and to true for tasks and services

env

The mapping or the list of environment variables

Default:
{
  "__root__": {}
}
All of: Env string[] | object
shell string

The shell used to run commands. Allowed values are sh, bash, or an absolute path, e.g., /usr/bin/zsh. Defaults to /bin/sh if the image is specified, /bin/bash otherwise

resources

The resource requirements to run the configuration

Default:
{
  "cpu": {
    "min": 2,
    "max": null
  },
  "memory": {
    "min": 8.0,
    "max": null
  },
  "shm_size": null,
  "gpu": {
    "vendor": null,
    "name": null,
    "count": {
      "min": 0,
      "max": null
    },
    "memory": null,
    "total_memory": null,
    "compute_capability": null
  },
  "disk": {
    "size": {
      "min": 100.0,
      "max": null
    }
  }
}
All of: ResourcesSpecRequest object
priority integer

The priority of the run, an integer between 0 and 100. dstack tries to provision runs with higher priority first. Defaults to 0

min=0 max=100
volumes VolumeMountPointRequest | InstanceMountPointRequest | string[]

The volume mount points

Default:
[]
docker boolean

Use Docker inside the container. Mutually exclusive with image, python, and nvcc. Overrides privileged

repos

The list of Git repos

Default:
[]
files FilePathMappingRequest | string[]

The local to container file path mappings

Default:
[]
setup string[]
Default:
[]
backends BackendType[]

The backends to consider for provisioning (e.g., [aws, gcp])

regions string[]

The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope])

availability_zones string[]

The availability zones to consider for provisioning (e.g., [eu-west-1a, us-west4-a])

instance_types string[]

The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4])

reservation string

The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations, AWS Capacity Blocks, and GCP reservations

spot_policy

The policy for provisioning spot or on-demand instances: spot, on-demand, auto. Defaults to on-demand

All of: SpotPolicy string
retry ProfileRetryRequest | boolean

The policy for resubmitting the run. Defaults to false

max_duration string | integer | boolean | string

The maximum duration of a run (e.g., 2h, 1d, etc) in a running state, excluding provisioning and pulling. After it elapses, the run is automatically stopped. Use off for unlimited duration. Defaults to off

stop_duration string | integer | boolean | string

The maximum duration of a run's graceful stop. After it elapses, the run is forcibly stopped. This includes force-detaching volumes used by the run. Use off for unlimited duration. Defaults to 5m

max_price number

The maximum instance price per hour, in dollars

exclusiveMin=0.0
creation_policy

The policy for using instances from fleets: reuse, reuse-or-create. Defaults to reuse-or-create

All of: CreationPolicy string
idle_duration integer | string

Time to wait before terminating idle instances. When the run reuses an existing fleet instance, the fleet's idle_duration applies. When the run provisions a new instance, the shorter of the fleet's and run's values is used. Defaults to 5m for runs and 3d for fleets. Use off for unlimited duration. Only applied for VM-based backends

utilization_policy

Run termination policy based on utilization

All of: UtilizationPolicyRequest object
startup_order

The order in which master and worker jobs are started: any, master-first, workers-first. Defaults to any

All of: StartupOrder string
stop_criteria

The criteria determining when a multi-node run should be considered finished: all-done, master-done. Defaults to all-done

All of: StopCriteria string
schedule

The schedule for starting the run at a specified time

All of: ScheduleRequest object
fleets EntityReferenceRequest | string[]

The fleets considered for reuse. For fleets owned by the current project, specify fleet names. For imported fleets, specify <project name>/<fleet name>

tags Record<string, string>

The custom tags to associate with the resource. The tags are also propagated to the underlying backend resources. If there is a conflict with backend-level tags, the custom tags do not override them

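A multi-node task using the properties above could be sketched as follows. The image, command, and variable names are hypothetical, and the NAME=value form of list entries in env is an assumption based on the environ(7) convention referenced in the Env definition.

```yaml
# Sketch of a two-node task configuration; values are illustrative.
type: task
name: train
nodes: 2
image: python:3.12-slim      # hypothetical image
commands:
  - python train.py          # hypothetical script
env:
  - HF_TOKEN                 # value omitted, to be supplied at run time
  - BATCH_SIZE=32            # assumed NAME=value form
startup_order: master-first
stop_criteria: master-done
max_duration: 1d
files:
  - local_path: ./configs    # relative to the configuration file's directory
    path: configs            # relative to the working directory
```
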
TGIChatModelRequest object

Mapping of the model for the OpenAI-compatible endpoint.

Attributes:
type (str): The type of the model, e.g. "chat"
name (str): The name of the model. This name will be used both to load model configuration from the HuggingFace Hub and in the OpenAI-compatible endpoint.
format (str): The format of the model, e.g. "tgi" if the model is served with HuggingFace's Text Generation Inference.
chat_template (Optional[str]): The custom prompt template for the model. If not specified, the default prompt template from the HuggingFace Hub configuration will be used.
eos_token (Optional[str]): The custom end of sentence token. If not specified, the default end of sentence token from the HuggingFace Hub configuration will be used.

name string required

The name of the model

format string required

The serving format. Must be set to tgi

Values: "tgi"
type string

The type of the model

Default: "chat"
Values: "chat"
chat_template string

The custom prompt template for the model. If not specified, the default prompt template from the HuggingFace Hub configuration will be used

eos_token string

The custom end of sentence token. If not specified, the default end of sentence token from the HuggingFace Hub configuration will be used

OpenAIChatModelRequest object

Mapping of the model for the OpenAI-compatible endpoint.

Attributes:
type (str): The type of the model, e.g. "chat"
name (str): The name of the model. This name will be used both to load model configuration from the HuggingFace Hub and in the OpenAI-compatible endpoint.
format (str): The format of the model, i.e. "openai".
prefix (str): The base_url prefix: <http://hostname/{prefix}/chat/completions>. Defaults to /v1.

name string required

The name of the model

format string required

The serving format. Must be set to openai

Values: "openai"
type string

The type of the model

Default: "chat"
Values: "chat"
prefix string

The base_url prefix (after hostname)

Default: "/v1"
ScalingSpecRequest object
metric string required

The target metric to track. Currently, the only supported value is rps (meaning requests per second)

Values: "rps"
target number required

The target value of the metric. The number of replicas is calculated based on this number and automatically adjusts (scales up or down) as this metric changes

exclusiveMin=0
scale_up_delay integer

The delay in seconds before scaling up

Default: 300
scale_down_delay integer

The delay in seconds before scaling down

Default: 600
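A sketch of a scaling block follows. The target of 10 requests per second is an arbitrary example, and the two delays simply restate the documented defaults.

```yaml
# Sketch of auto-scaling rules (ScalingSpecRequest).
scaling:
  metric: rps            # the only supported metric
  target: 10             # replica count is derived from this target
  scale_up_delay: 300    # seconds (default)
  scale_down_delay: 600  # seconds (default)
```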
IPAddressPartitioningKeyRequest object
type string

Partitioning type

Default: "ip_address"
Values: "ip_address"
HeaderPartitioningKeyRequest object
header string required

Name of the header to use for partitioning

maxLength=500 pattern=^[a-zA-Z0-9-_]+$
type string

Partitioning type

Default: "header"
Values: "header"
RateLimitRequest object
rps number required

Max allowed number of requests per second. Requests are tracked at millisecond granularity. For example, rps: 10 means at most 1 request per 100ms

min=0.016666666666666666 max=153722867280912930
prefix string

URL path prefix to which this limit is applied. If an incoming request matches several prefixes, the longest prefix is applied

Default: "/"
maxLength=4094 pattern=^/[^\s\\{}]*$

key

The partitioning key. Each incoming request belongs to a partition and rate limits are applied per partition. Defaults to partitioning by client IP address

Default:
{
  "type": "ip_address"
}
burst integer

Max number of requests that can be passed to the service ahead of the rate limit

Default: 0
min=0 max=9223372036854775807
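To illustrate how prefix, rps, and burst interact, the sketch below defines two rules for the rate_limits property of a service. The path prefixes and numbers are hypothetical. A request to /api/upload matches both prefixes, so the longer one applies.

```yaml
# Sketch of rate limiting rules; prefixes and limits are hypothetical.
rate_limits:
  - prefix: /api/
    rps: 10      # at most 1 request per 100ms per partition
    burst: 5     # up to 5 requests may pass ahead of the limit
  - prefix: /api/upload
    rps: 1       # stricter limit for the longer, more specific prefix
```

With no partitioning key specified, both rules are applied per client IP address.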
HTTPHeaderSpecRequest object
name string required

The name of the HTTP header

minLength=1 maxLength=256
value string required

The value of the HTTP header

minLength=1 maxLength=2048
ProbeConfigRequest object
type string required

The probe type. Must be http

Values: "http"
url string

The URL to request. Defaults to /

method string

The HTTP method to use for the probe (e.g., get, post, etc.). Defaults to get

Values: "get" "post" "put" "delete" "patch" "head"

headers

A list of HTTP headers to include in the request

Default:
[]
maxItems=16
body string

The HTTP request body to send with the probe

minLength=1 maxLength=2048
timeout integer | string

Maximum amount of time the HTTP request is allowed to take. Defaults to 10s

interval integer | string

Minimum amount of time between the end of one probe execution and the start of the next. Defaults to 15s

ready_after integer

The number of consecutive successful probe executions required for the replica to be considered ready. Used during rolling deployments. Defaults to 1

min=1
until_ready boolean

If true, the probe will stop being executed as soon as it reaches the ready_after threshold of successful executions. Defaults to false

ReplicaGroupRequest object
count required

The number of replicas. Can be a number (e.g. 2) or a range (0..4 or 1..8). If it's a range, the scaling property is required

All of: Range[int] object
name string

The name of the replica group. If not provided, defaults to '0', '1', etc. based on position.

scaling

The auto-scaling rules. Required if count is set to a range

All of: ScalingSpecRequest object
resources

The resource requirements for replicas in this group

Default:
{
  "cpu": {
    "min": 2,
    "max": null
  },
  "memory": {
    "min": 8.0,
    "max": null
  },
  "shm_size": null,
  "gpu": {
    "vendor": null,
    "name": null,
    "count": {
      "min": 0,
      "max": null
    },
    "memory": null,
    "total_memory": null,
    "compute_capability": null
  },
  "disk": {
    "size": {
      "min": 100.0,
      "max": null
    }
  }
}
All of: ResourcesSpecRequest object
commands string[]

The shell commands to run for replicas in this group

Default:
[]
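As a sketch of replica groups, the replicas list below defines one fixed-size group and one auto-scaled group. Group names, commands, and sizes are hypothetical. The second group uses a range for count, so it carries its own scaling block.

```yaml
# Sketch of two replica groups; names and commands are hypothetical.
replicas:
  - name: baseline
    count: 1                 # fixed number of replicas
    resources:
      gpu: 1
    commands:
      - python serve.py
  - name: elastic
    count: 0..4              # a range, so scaling is required
    scaling:
      metric: rps
      target: 5
    commands:
      - python serve.py
```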
SGLangServiceRouterConfigRequest object
type string

The router type

Default: "sglang"
Values: "sglang"
policy string

The routing policy. Options: random, round_robin, cache_aware, power_of_two

Default: "cache_aware"
Values: "random" "round_robin" "cache_aware" "power_of_two"
pd_disaggregation boolean

Enable PD disaggregation mode for the SGLang router

Default: false
ServiceConfigurationRequest object
port integer | string | PortMappingRequest required

The port the application listens on

gateway boolean | string

The name of the gateway. Specify boolean false to run without a gateway. Specify boolean true to run with the default gateway. Omit to run with the default gateway if there is one, or without a gateway otherwise

strip_prefix boolean

Strip the /proxy/services/<project name>/<run name>/ path prefix when forwarding requests to the service. Only takes effect when running the service without a gateway

Default: true

model

Mapping of the model for the OpenAI-compatible endpoint provided by dstack. Can be a full model format definition or just a model name. If it's a name, the service is expected to expose an OpenAI-compatible API at the /v1 path

https boolean | string

Enable HTTPS if running with a gateway. Set to auto to determine automatically based on gateway configuration. Defaults to true

auth boolean

Enable authorization

Default: true
scaling

The auto-scaling rules. Required if replicas is set to a range

All of: ScalingSpecRequest object
rate_limits RateLimitRequest[]

Rate limiting rules

Default:
[]

probes

The list of probes to determine service health. If model is set, defaults to a /v1/chat/completions probe. Set explicitly to override

replicas ReplicaGroupRequest[] | Range_int_ | integer | string

The number of replicas or a list of replica groups. Can be an integer (e.g., 2), a range (e.g., 0..4), or a list of replica groups. Each replica group defines replicas with shared configuration (commands, resources, scaling). When replicas is a list of replica groups, top-level scaling, commands, and resources are not allowed and must be specified in each replica group instead.

router

Router configuration for the service. Requires a gateway with matching router enabled.

commands string[]

The shell commands to run

Default:
[]
type string
Default: "service"
Values: "service"
name string

The run name. If not specified, a random name is generated

image string

The name of the Docker image to run

user string

The user inside the container, user_name_or_id[:group_name_or_id] (e.g., ubuntu, 1000:1000). Defaults to the default user from the image

privileged boolean

Run the container in privileged mode

Default: false
entrypoint string

The Docker entrypoint

working_dir string

The absolute path to the working directory inside the container. Defaults to the image's default working directory

home_dir string
Default: "/root"
registry_auth

Credentials for pulling a private Docker image

All of: RegistryAuthRequest object
python

The major version of Python. Mutually exclusive with image and docker

All of: PythonVersion string
nvcc boolean

Use image with NVIDIA CUDA Compiler (NVCC) included. Mutually exclusive with image and docker

single_branch boolean

Whether to clone and track only the current branch or all remote branches. Relevant only when using remote Git repos. Defaults to false for dev environments and to true for tasks and services

env

The mapping or the list of environment variables

Default:
{
  "__root__": {}
}
All of: Env string[] | object
shell string

The shell used to run commands. Allowed values are sh, bash, or an absolute path, e.g., /usr/bin/zsh. Defaults to /bin/sh if the image is specified, /bin/bash otherwise

resources

The resource requirements to run the configuration

Default:
{
  "cpu": {
    "min": 2,
    "max": null
  },
  "memory": {
    "min": 8.0,
    "max": null
  },
  "shm_size": null,
  "gpu": {
    "vendor": null,
    "name": null,
    "count": {
      "min": 0,
      "max": null
    },
    "memory": null,
    "total_memory": null,
    "compute_capability": null
  },
  "disk": {
    "size": {
      "min": 100.0,
      "max": null
    }
  }
}
All of: ResourcesSpecRequest object
priority integer

The priority of the run, an integer between 0 and 100. dstack tries to provision runs with higher priority first. Defaults to 0

min=0 max=100
volumes VolumeMountPointRequest | InstanceMountPointRequest | string[]

The volume mount points

Default:
[]
docker boolean

Use Docker inside the container. Mutually exclusive with image, python, and nvcc. Overrides privileged

repos

The list of Git repos

Default:
[]
files FilePathMappingRequest | string[]

The local to container file path mappings

Default:
[]
setup string[]
Default:
[]
backends BackendType[]

The backends to consider for provisioning (e.g., [aws, gcp])

regions string[]

The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope])

availability_zones string[]

The availability zones to consider for provisioning (e.g., [eu-west-1a, us-west4-a])

instance_types string[]

The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4])

reservation string

The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations, AWS Capacity Blocks, and GCP reservations

spot_policy

The policy for provisioning spot or on-demand instances: spot, on-demand, auto. Defaults to on-demand

All of: SpotPolicy string
retry ProfileRetryRequest | boolean

The policy for resubmitting the run. Defaults to false

max_duration string | integer | boolean | string

The maximum duration of a run (e.g., 2h, 1d, etc) in a running state, excluding provisioning and pulling. After it elapses, the run is automatically stopped. Use off for unlimited duration. Defaults to off

stop_duration string | integer | boolean | string

The maximum duration of a run's graceful stop. After it elapses, the run is forcibly stopped. This includes force-detaching volumes used by the run. Use off for unlimited duration. Defaults to 5m

max_price number

The maximum instance price per hour, in dollars

exclusiveMin=0.0
creation_policy

The policy for using instances from fleets: reuse, reuse-or-create. Defaults to reuse-or-create

All of: CreationPolicy string
idle_duration integer | string

Time to wait before terminating idle instances. When the run reuses an existing fleet instance, the fleet's idle_duration applies. When the run provisions a new instance, the shorter of the fleet's and run's values is used. Defaults to 5m for runs and 3d for fleets. Use off for unlimited duration. Only applied for VM-based backends

utilization_policy

Run termination policy based on utilization

All of: UtilizationPolicyRequest object
startup_order

The order in which master and worker jobs are started: any, master-first, workers-first. Defaults to any

All of: StartupOrder string
stop_criteria

The criteria determining when a multi-node run should be considered finished: all-done, master-done. Defaults to all-done

All of: StopCriteria string
schedule

The schedule for starting the run at a specified time

All of: ScheduleRequest object
fleets EntityReferenceRequest | string[]

The fleets considered for reuse. For fleets owned by the current project, specify fleet names. For imported fleets, specify <project name>/<fleet name>

tags Record<string, string>

The custom tags to associate with the resource. The tags are also propagated to the underlying backend resources. If a tag conflicts with a backend-level tag, the backend-level tag is not overridden
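As an illustration of how the duration, price, and fleet-reuse properties above combine, here is a minimal sketch of a configuration fragment. It assumes these keys sit at the top level of a run configuration; the fleet names, tag values, and numbers are hypothetical placeholders, not recommendations.

```yaml
# Fragment only: the rest of the run configuration is omitted.
spot_policy: auto                # spot, on-demand, or auto
max_duration: 2h                 # stop the run after 2 hours in a running state
stop_duration: 5m                # allow 5 minutes for graceful stopping
max_price: 1.5                   # dollars per hour; must be greater than 0
creation_policy: reuse-or-create # reuse a fleet instance or provision a new one
idle_duration: 5m                # terminate idle instances after 5 minutes
fleets:
  - my-fleet                     # hypothetical fleet in the current project
  - other-project/shared-fleet   # hypothetical imported fleet
tags:
  team: research                 # hypothetical tag
```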

SSHKeyRequest object
public string required
private string
SSHProxyParamsRequest object
hostname string required

The IP address or domain of the proxy host

user string required

The user to log in with on the proxy host

identity_file string required

The private key to use for the proxy host

port integer

The SSH port of the proxy host

ssh_key object
2 nested properties
public string required
private string
SSHHostParamsRequest object
hostname string required

The IP address or domain to connect to

port integer

The SSH port to connect to for this host

user string

The user to log in with for this host

identity_file string

The private key to use for this host

proxy_jump

The SSH proxy configuration for this host

All of: SSHProxyParamsRequest object
internal_ip string

The internal IP of the host, used for communication inside the cluster. If not specified, dstack uses an IP address from the configured network or, failing that, from the first internal network found.

ssh_key object
2 nested properties
public string required
private string
blocks string | integer

The number of blocks to split the instance into: a number or auto. auto means as many as possible. The number of GPUs and CPUs must be divisible by the number of blocks. Defaults to the top-level blocks value.

SSHParamsRequest object
hosts SSHHostParamsRequest | string[] required

The per-host connection parameters: a hostname, or an object that overrides the default SSH parameters

user string

The user to log in with on all hosts

port integer

The SSH port to connect to

identity_file string

The private key to use for all hosts

ssh_key object
2 nested properties
public string required
private string
proxy_jump

The SSH proxy configuration for all hosts

All of: SSHProxyParamsRequest object
network string

The network address for cluster setup in the format <ip>/<netmask>. dstack will use IP addresses from this network for communication between hosts. If not specified, dstack will use IPs from the first found internal network.
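To show how SSHParamsRequest, SSHHostParamsRequest, and SSHProxyParamsRequest nest, here is a minimal sketch of an SSH fleet. It assumes the ssh_config key of a fleet configuration; all hostnames, addresses, users, and key paths are hypothetical.

```yaml
type: fleet
name: on-prem-fleet              # hypothetical fleet name
ssh_config:
  # Defaults applied to every host unless overridden per host
  user: ubuntu
  port: 22
  identity_file: ~/.ssh/id_rsa
  network: 10.0.0.0/24           # <ip>/<netmask> used for inter-host traffic
  hosts:
    - 10.0.0.11                  # plain hostname: uses the defaults above
    - hostname: 10.0.0.12        # object form: overrides selected defaults
      port: 2222
      user: admin
      blocks: auto               # split this instance into as many blocks as possible
      proxy_jump:                # reach this host through a proxy host
        hostname: bastion.example.com
        user: ubuntu
        identity_file: ~/.ssh/bastion_key
```

The first host entry is a bare string, while the second is an object, which matches the `SSHHostParamsRequest | string[]` type of hosts.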

FleetNodesSpecRequest object
min integer required

The minimum number of instances to maintain in the fleet

target integer required

The number of instances to provision on fleet apply. Must satisfy min <= target <= max. Defaults to min

max integer

The maximum number of instances allowed in the fleet. Unlimited if not specified
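The relationship between min, target, and max can be sketched with the object form of nodes in a fleet configuration. The fleet name and the numbers are illustrative assumptions.

```yaml
type: fleet
name: cloud-fleet   # hypothetical fleet name
nodes:
  min: 1            # never drop below one instance
  target: 2         # provision two instances on apply (min <= target <= max)
  max: 4            # never grow beyond four instances
# nodes also accepts a plain integer, e.g. `nodes: 2`
```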

InstanceGroupPlacement string

An enumeration.

FleetConfigurationRequest object
type string
Default: "fleet"
Values: "fleet"
name string

The fleet name

env

The mapping or the list of environment variables

Default:
{
  "__root__": {}
}
All of: Env string[] | object
ssh_config

The parameters for adding instances via SSH

All of: SSHParamsRequest object
nodes FleetNodesSpecRequest | integer | string

The number of instances in a cloud fleet

placement

The placement of instances: any or cluster

All of: InstanceGroupPlacement string
reservation string

The existing reservation to use for instance provisioning. Supports AWS Capacity Reservations, AWS Capacity Blocks, and GCP reservations

resources

The resource requirements

Default:
{
  "cpu": {
    "min": 2,
    "max": null
  },
  "memory": {
    "min": 8.0,
    "max": null
  },
  "shm_size": null,
  "gpu": {
    "vendor": null,
    "name": null,
    "count": {
      "min": 0,
      "max": null
    },
    "memory": null,
    "total_memory": null,
    "compute_capability": null
  },
  "disk": {
    "size": {
      "min": 100.0,
      "max": null
    }
  }
}
All of: ResourcesSpecRequest object
blocks string | integer

The number of blocks to split the instance into: a number or auto. auto means as many as possible. The number of GPUs and CPUs must be divisible by the number of blocks. Defaults to 1, i.e. do not split

Default: 1
backends BackendType[]

The backends to consider for provisioning (e.g., [aws, gcp])

regions string[]

The regions to consider for provisioning (e.g., [eu-west-1, us-west4, westeurope])

availability_zones string[]

The availability zones to consider for provisioning (e.g., [eu-west-1a, us-west4-a])

instance_types string[]

The cloud-specific instance types to consider for provisioning (e.g., [p3.8xlarge, n1-standard-4])

spot_policy

The policy for provisioning spot or on-demand instances: spot, on-demand, auto. Defaults to on-demand

All of: SpotPolicy string
retry ProfileRetryRequest | boolean

The policy for provisioning retry. Defaults to false

max_price number

The maximum instance price per hour, in dollars

exclusiveMin=0.0
idle_duration integer | string

Time to wait before terminating idle instances. Instances are not terminated if the fleet is already at nodes.min. Defaults to 5m for runs and 3d for fleets. Use off for unlimited duration

tags Record<string, string>

The custom tags to associate with the resource. The tags are also propagated to the underlying backend resources. If a tag conflicts with a backend-level tag, the backend-level tag is not overridden
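A cloud fleet that uses the provisioning constraints above might look like the following sketch. The name, backend, region, GPU model, price, and tag are hypothetical; the gpu object uses the name and count fields from GPUSpecRequest.

```yaml
type: fleet
name: gpu-fleet            # hypothetical fleet name
nodes: 2
placement: any             # any or cluster
backends: [aws]
regions: [eu-west-1]
resources:
  gpu:
    name: A100
    count: 1
spot_policy: auto          # spot, on-demand, or auto
max_price: 2.0             # dollars per hour; must be greater than 0
idle_duration: 1h          # overrides the 3d default for fleets
tags:
  project: demo            # hypothetical tag
```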

SGLangGatewayRouterConfigRequest object

Gateway-level router configuration. Only type and policy are set here; pd_disaggregation is configured at the service level.

type string

The router type enabled on this gateway.

Default: "sglang"
Values: "sglang"
policy string

The routing policy. Deprecated: prefer setting policy in the service's router config. Options: random, round_robin, cache_aware, power_of_two

Default: "cache_aware"
Values: "random" "round_robin" "cache_aware" "power_of_two"
LetsEncryptGatewayCertificateRequest object
type string

Automatic certificates by Let's Encrypt

Default: "lets-encrypt"
Values: "lets-encrypt"
ACMGatewayCertificateRequest object
arn string required

The ARN of the wildcard ACM certificate for the domain

type string

Certificates by AWS Certificate Manager (ACM)

Default: "acm"
Values: "acm"
GatewayConfigurationRequest object
backend required

The gateway backend

All of: BackendType string
region string required

The gateway region

type string
Default: "gateway"
Values: "gateway"
name string

The gateway name

default boolean

Make the gateway default

Default: false
instance_type string

Backend-specific instance type to use for the gateway instance. Omit to use the backend's default, which is typically a small non-GPU instance

minLength=1
router

The router configuration for this gateway. E.g. { type: sglang, policy: round_robin }.

domain string

The gateway domain, e.g. example.com

public_ip boolean

Allocate public IP for the gateway

Default: true
certificate

The SSL certificate configuration. Set to null to disable. Defaults to type: lets-encrypt

Default:
{
  "type": "lets-encrypt"
}
tags Record<string, string>

The custom tags to associate with the gateway. The tags are also propagated to the underlying backend resources. If a tag conflicts with a backend-level tag, the backend-level tag is not overridden
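Putting the gateway properties together, a configuration could be sketched as follows. The name, backend, region, and domain are hypothetical. The `certificate` key name is taken from dstack's gateway configuration and should be checked against the schema; the ARN is left as a placeholder.

```yaml
type: gateway
name: example-gateway      # hypothetical gateway name
backend: aws               # required
region: eu-west-1          # required
domain: example.com
default: true              # make this the default gateway
public_ip: true
certificate:               # assumed key name; omit to use lets-encrypt
  type: acm
  arn: <certificate-arn>   # placeholder: ARN of the wildcard ACM certificate
router:
  type: sglang             # gateway-level policy is deprecated, so it is omitted
```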

VolumeConfigurationRequest object
backend required

The volume backend

All of: BackendType string
region string required

The volume region

type string
Default: "volume"
Values: "volume"
name string

The volume name

availability_zone string

The volume availability zone

size number

The volume size. Must be specified when creating new volumes

volume_id string

The volume ID. Must be specified when registering external volumes

auto_cleanup_duration integer | string

Time to wait after the volume is no longer used by any job before deleting it. By default, the volume is kept indefinitely. Use off or -1 to disable auto-cleanup.

tags Record<string, string>

The custom tags to associate with the volume. The tags are also propagated to the underlying backend resources. If a tag conflicts with a backend-level tag, the backend-level tag is not overridden
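The two ways of using a volume configuration, creating a new volume with size or registering an existing one with volume_id, can be sketched as two separate documents. Each would normally live in its own file. Names, region, and durations are hypothetical; the unit of size is not stated in this listing and is assumed here to be gigabytes.

```yaml
# New volume: size is required
type: volume
name: new-volume                 # hypothetical volume name
backend: aws
region: eu-west-1
size: 100                        # assumed to be in GB
auto_cleanup_duration: 72h       # delete 72 hours after the last job stops using it
---
# External volume: volume_id is required
type: volume
name: external-volume            # hypothetical volume name
backend: aws
region: eu-west-1
volume_id: <existing-volume-id>  # placeholder for the backend's volume ID
```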