Ray
Ray autoscaler cluster configuration file
| Type | object |
|---|---|
| File match | `ray-*-cluster.yaml` |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/ray/latest.json |
| Source | https://raw.githubusercontent.com/ray-project/ray/master/python/ray/autoscaler/ray-schema.json |
Validate with Lintel
`npx @lintel/lintel check`
Ray autoscaler schema
Properties
- A unique identifier for the head node and workers of this cluster.
- Cloud-provider-specific configuration *(24 nested properties)*:
  - e.g. aws, azure, gcp, ...
  - e.g. us-east-1
  - module, if using an external node provider
  - GCP project id, if using GCP
  - local cluster head node
  - don't require public IPs
  - k8s namespace, if using k8s
  - Azure location
  - Azure resource group
  - Azure user-defined tags
  - Azure subscription id
  - User-defined managed identity (generated by config)
  - User-defined managed identity principal id (generated by config)
  - Network subnet id
  - k8s autoscaler permissions, if using k8s
  - k8s autoscaler permissions, if using k8s
  - k8s autoscaler permissions, if using k8s
  - Whether to try to reuse previously stopped nodes instead of launching new nodes. This also causes the autoscaler to stop nodes instead of terminating them. Only implemented for AWS.
  - GCP availability zone
  - GCP globally unique project id
  - AWS security group *(2 nested properties)*:
    - Security group name
    - Security group inbound rules
  - Disables node updaters if set to True. Default is False. (For Kubernetes operator usage.)
  - Credentials for authenticating with the GCP client *(2 nested properties)*:
    - Credentials type: either a temporary OAuth 2.0 token or a permanent service account credentials blob.
    - OAuth token or JSON string constituting service account credentials
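A sketch of how the provider settings above can appear in a cluster YAML. The key names (`type`, `region`, `availability_zone`, `cache_stopped_nodes`) follow Ray's published example configs; the values are placeholders, not defaults.

```yaml
provider:
  type: aws                   # cloud provider: aws, azure, gcp, ...
  region: us-east-1           # provider region
  availability_zone: us-east-1a
  cache_stopped_nodes: true   # reuse stopped nodes instead of launching new ones (AWS only)
```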
- The maximum number of worker nodes to launch in addition to the head node. This should be no smaller than the sum of min_workers across all available node types.
- The autoscaler will scale up the cluster faster with a higher upscaling speed. E.g., if a task requires adding more nodes, the autoscaler gradually scales up the cluster in chunks of upscaling_speed * currently_running_nodes. This number should be > 0.
- If a node is idle for this many minutes, it will be removed.
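The three scaling knobs above sit at the top level of the cluster config. A sketch with placeholder values (key names as used in Ray's example configs):

```yaml
max_workers: 10           # worker nodes beyond the head node
upscaling_speed: 1.0      # scale up in chunks of upscaling_speed * currently_running_nodes
idle_timeout_minutes: 5   # remove a node after it has been idle this long
```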
- How Ray will authenticate with newly launched nodes *(4 nested properties)*:
  - A value for the ProxyCommand ssh option, for connecting through proxies. Example: `nc -x proxy.example.com:1234 %h %p`
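A sketch of the auth block, including the ProxyCommand example from above. The key names follow Ray's example configs; the user and key path are placeholders.

```yaml
auth:
  ssh_user: ubuntu                  # user Ray uses to SSH into new nodes
  ssh_private_key: ~/.ssh/id_rsa    # private key for that user
  ssh_proxy_command: nc -x proxy.example.com:1234 %h %p
```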
- Docker configuration. If this is specified, all setup and start commands will be executed in the container. *(11 nested properties)*:
  - the Docker image name
  - run `docker pull` first
  - shared options for starting the head/worker docker containers
  - image for the head node; takes precedence over 'image' if specified
  - head-specific run options, appended to run_options
  - analogous to head_image
  - analogous to head_run_options
  - prevent Ray from automatically using the NVIDIA runtime, even if available
  - prevent Ray from automatically detecting the /dev/shm size for the container
  - use the 'podman' command in place of 'docker'
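A sketch of a docker block exercising several of the options above. The key names follow Ray's example configs; the image name and run options are placeholders.

```yaml
docker:
  image: rayproject/ray:latest        # the docker image name
  pull_before_run: true               # run docker pull first
  run_options:                        # shared options for head/worker containers
    - --shm-size=8g
  head_image: rayproject/ray:latest   # takes precedence over 'image' for the head node
  use_podman: false                   # use 'podman' in place of 'docker'
```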
- If using multiple node types, specifies the head node type.
- Map of remote paths to local paths, e.g. {"/tmp/data": "/my/local/data"}
- List of paths on the head node which should sync to the worker nodes, e.g. ["/some/data/somewhere"]
- If enabled, file mounts will sync continuously between the head node and the worker nodes. The nodes will not re-run setup commands if only the contents of the file-mount folders change.
- File patterns not to sync up or down when using the rsync command. Matches the format of rsync's --exclude param.
- Pattern files to look up patterns to exclude when using rsync up or rsync down. This file is checked for recursively in all directories. For example, if .gitignore is provided here, the behavior will match git's .gitignore behavior.
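The file-sync settings above can be sketched as follows. Key names follow Ray's example configs; all paths and patterns are placeholders.

```yaml
file_mounts:
  /tmp/data: /my/local/data     # remote path: local path
cluster_synced_files:
  - /some/data/somewhere        # synced from head node to workers
file_mounts_sync_continuously: false
rsync_exclude:
  - "**/.git"                   # rsync --exclude format
rsync_filter:
  - ".gitignore"                # pattern files looked up recursively
```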
- Metadata field that can be used to store user-defined data in the cluster config. Ray does not interpret these fields.
- Whether to avoid restarting the cluster during updates. This field is controlled by the ray --no-restart flag and cannot be set by the user.
- A list of node types for multi-node-type autoscaling.
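Multi-node-type autoscaling ties the head node type to an entry in the node-type list. A minimal sketch, assuming the type names and per-type `resources`/`min_workers`/`max_workers` keys used in Ray's example configs; `node_config` contents are provider-specific and omitted here.

```yaml
head_node_type: ray.head.default
available_node_types:
  ray.head.default:
    node_config: {}          # provider-specific instance settings
    resources: {"CPU": 2}
  ray.worker.default:
    node_config: {}
    resources: {"CPU": 4}
    min_workers: 1           # autoscaler keeps at least this many of this type
    max_workers: 8           # and at most this many
```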