Type: object
File match: *.sl.yml
Schema URL: https://catalog.lintel.tools/schemas/schemastore/starlake-data-pipeline/latest.json
Source: https://www.schemastore.org/starlake.json

Validate with Lintel:

npx @lintel/lintel check

JSON Schema for Starlake Data Pipeline

Properties

version integer required
Values: 1

All of

1. StarlakeV1Base object

Definitions

ConvertibleToString string | boolean | number | integer | null
MergeOnV1 const: "TARGET" | const: "SOURCE_AND_TARGET"
PrimitiveTypeV1 string | boolean | number | integer | null
TrimV1 string | boolean | number | integer | null
TableSync string | boolean | number | integer | null
TableDdlV1 object

DDL used to create a table

createSql string | boolean | number | integer | null required
pingSql string | boolean | number | integer | null
selectSql string | boolean | number | integer | null
Materialization string | boolean | number | integer | null
TableTypeBase string | boolean | number | integer | null
TableTypeV1 string | boolean | number | integer | null
TypeV1 object

Custom type definition. Custom types are defined in the types/types.sl.yml file

name string | boolean | number | integer | null required
pattern string | boolean | number | integer | null required
primitiveType const: "string" | const: "long" | const: "int" | const: "short" | const: "double" | const: "boolean" | const: "byte" | const: "date" | const: "timestamp" | const: "decimal" | const: "variant" | const: "struct"

Define the value type

zone string | boolean | number | integer | null
sample string | boolean | number | integer | null
comment string | boolean | number | integer | null
ddlMapping Record<string, string | boolean | number | integer | null>

Map of string
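
As an illustrative sketch (the top-level `types:` list wrapper and all concrete values below are assumptions, not taken from this schema extract), a custom type in types/types.sl.yml could combine these properties like so:

```yaml
# Hypothetical custom type entry in types/types.sl.yml.
types:
  - name: "email"                     # required
    pattern: "^[^@\\s]+@[^@\\s]+$"    # required: regex that valid values must match
    primitiveType: "string"           # underlying value type
    sample: "user@example.com"
    comment: "Loosely validated email address"
    ddlMapping:                       # per-target DDL type names (assumed keys)
      bigquery: "STRING"
      snowflake: "VARCHAR"
```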

PositionV1 object

First and last char positions of an attribute in a fixed length record

first number required

Zero based position of the first character for this attribute

last number required

Zero based position of the last character to include in this attribute
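
Since first and last are both zero based and inclusive, the extraction semantics can be sketched in Python (the record layout below is invented for illustration):

```python
def extract_field(line: str, first: int, last: int) -> str:
    """Extract a fixed-width field; positions are zero-based and inclusive."""
    # "last" is inclusive, so the slice must extend to last + 1.
    return line[first:last + 1]

# Hypothetical 11-character record: id in positions 0-3, name in positions 4-10.
record = "0042JohnDoe"
print(extract_field(record, 0, 3))   # -> 0042
print(extract_field(record, 4, 10))  # -> JohnDoe
```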

ConnectionV1 object

Connection properties for a data warehouse.

type string | boolean | number | integer | null required
sparkFormat string | boolean | number | integer | null
loader string | boolean | number | integer | null
quote string | boolean | number | integer | null
separator string | boolean | number | integer | null

Map of string

DagGenerationConfigV1 object

Dag configuration.

template string | boolean | number | integer | null required
filename string | boolean | number | integer | null required
comment string | boolean | number | integer | null

Map of string
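
A minimal sketch of a DAG generation config; the `dag:` wrapper key, the template reference and the filename pattern are illustrative assumptions only:

```yaml
# Hypothetical DAG generation config.
dag:
  template: "airflow_scheduled_load"   # assumed template reference
  filename: "{{domain}}_dag.py"        # assumed substitution syntax
  comment: "Generated load DAG"
```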

RowLevelSecurityV1 object

Row level security policy to apply to the output data.

name string | boolean | number | integer | null required
grants ConvertibleToString[] required

Users / groups / service accounts to which this security level is applied, e.g. user:[email protected],group:[email protected],serviceAccount:[email protected]

predicate string | boolean | number | integer | null
description string | boolean | number | integer | null
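
A hypothetical row-level security entry combining these properties (the `rls:` list wrapper, the predicate and the grantees are invented for illustration):

```yaml
# Hypothetical row-level security policy: only US rows visible to these grantees.
rls:
  - name: "us_rows_only"
    predicate: "country = 'US'"       # rows a grantee is allowed to see
    description: "Restrict analysts to US data"
    grants:
      - "group:analysts@example.com"
      - "user:jane@example.com"
```
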
AccessControlEntryV1 object

Column level security policy to apply to the attribute.

role string | boolean | number | integer | null required
grants ConvertibleToString[] required

Users / groups / service accounts to which this security level is applied, e.g. user:[email protected],group:[email protected],serviceAccount:[email protected]

name string | boolean | number | integer | null
FormatV1 string | boolean | number | integer | null
MapString Record<string, string | boolean | number | integer | null>

Map of string

MapConnectionV1 Record<string, object>

Map of jdbc engines

MapJdbcEngineV1 Record<string, object>

Map of jdbc engines

MapTableDdlV1 Record<string, object>

Map of table ddl

JdbcEngineV1 object

Jdbc engine

tables Record<string, object> required

Map of table ddl

quote string required

How to quote identifiers

strategyBuilder string required

Override the default strategy builder used to write data. A strategy is a folder located under metadata/templates/write-strategies/[strategyBuilder]

viewPrefix string

When creating views, how they should be prefixed. Some databases like Redshift require view names to be prefixed by the character '#'. This is not required for other databases like Snowflake or BigQuery. Default is an empty string.

preActions string

SQL statements to execute immediately after the database connection is opened (e.g., SET commands)

partitionBy string

Keyword used to partition the table. Default is PARTITION BY

clusterBy string

Keyword used to cluster the table. Default is CLUSTER BY

columnRemarks string

How to get column remarks

tableRemarks string

How to get table remarks
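
An illustrative JDBC engine definition; the `jdbcEngines` wrapper key and all concrete values are assumptions, not taken from this schema extract:

```yaml
# Hypothetical JDBC engine definition for Redshift.
jdbcEngines:
  redshift:
    quote: "\""              # identifier quoting character
    viewPrefix: "#"          # Redshift requires view names prefixed with '#'
    partitionBy: "PARTITION BY"
    clusterBy: "CLUSTER BY"
    tables:
      audit:
        createSql: "CREATE TABLE IF NOT EXISTS audit (id VARCHAR(64))"
        pingSql: "SELECT 1"
```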

PrivacyV1 object

Map of string

InternalV1 object

Configure Spark internal options

cacheStorageLevel string | boolean | number | integer | null
intermediateBigqueryFormat string | boolean | number | integer | null
temporaryGcsBucket string | boolean | number | integer | null
substituteVars boolean

Internal use. Do not modify.

bqAuditSaveInBatchMode boolean

Should BigQuery audit logs be saved in batch or interactive mode? Interactive by default (false)

AccessPoliciesV1 object
apply boolean

Should access policies be enforced?

location string | boolean | number | integer | null
database string | boolean | number | integer | null
taxonomy string | boolean | number | integer | null
SparkSchedulingV1 object
maxJobs integer

Max number of Spark jobs to run in parallel, default is 1

poolName string | boolean | number | integer | null
mode string | boolean | number | integer | null
file string | boolean | number | integer | null
ExpectationsConfigV1 object
path string | boolean | number | integer | null
active boolean

Should expectations be executed?

failOnError boolean

Should load / transform fail on expectation error?

ExpectationItemV1 object
expect string | boolean | number | integer | null
failOnError boolean

Should load / transform fail on expectation error?

MetricsV1 object
path string | boolean | number | integer | null
discreteMaxCardinality integer

Max number of unique values accepted for a discrete column. Default is 10

active boolean

Should metrics be computed?

AllSinksV1 object
connectionRef string | boolean | number | integer | null

FS or BQ: List of attributes to use for clustering

days number

BQ: Number of days before this table is set as expired and deleted. Never by default.

requirePartitionFilter boolean

BQ: Should we require a partition filter on every request? No by default.

materializedView const: "TABLE" | const: "VIEW" | const: "MATERIALIZED_VIEW" | const: "HYBRID"

Table types supported by the Sink option

enableRefresh boolean

BQ: Enable automatic refresh of the materialized view? false by default.

refreshIntervalMs number

BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.

id string | boolean | number | integer | null
format string | boolean | number | integer | null
extension string | boolean | number | integer | null

Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}

partition string[]

FS or BQ: List of partition attributes

coalesce boolean

When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.

path string

Optional path attribute if you want to save the file outside of the default location (datasets folder)

Map of string

WriteStrategyTypeBase string | boolean | number | integer | null
WriteStrategyTypeV1 string | boolean | number | integer | null
OpenWriteStrategyTypeV1 string | boolean | number | integer | null
WriteStrategyV1 object

Write strategy type including custom strategies. Allows predefined strategies or custom strategy names

types Record<string, ConvertibleToString>

Map of connection type to write strategy. Allows different strategies per target database

List of columns to use as key(s) for the target table. This is used to update existing records in the target table.

timestamp string | boolean | number | integer | null
queryFilter string | boolean | number | integer | null
on const: "TARGET" | const: "SOURCE_AND_TARGET"
startTs string | boolean | number | integer | null
endTs string | boolean | number | integer | null
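
A sketch of a write strategy block; note that the strategy name and the key-column property name below (`type` and `key`) are assumptions, since this extract only lists the remaining properties:

```yaml
# Hypothetical write strategy: upsert matching on a key, keeping the most
# recent row per key according to a timestamp column.
writeStrategy:
  type: "UPSERT_BY_KEY_AND_TIMESTAMP"   # assumed property and value
  key: ["id"]                           # assumed property: merge key column(s)
  timestamp: "updated_at"
  on: "SOURCE_AND_TARGET"
```
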
MetadataV1 object
format const: "DATAFRAME" | const: "DSV" | const: "POSITION" | const: "JSON" | const: "JSON_ARRAY" | const: "JSON_FLAT" | const: "XML" | const: "TEXT_XML" | const: "KAFKA" | const: "KAFKASTREAM" | const: "GENERIC" | const: "PARQUET"

DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-format file where values are located at an exact position in each line.
- JSON_FLAT: For optimisation purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT since it is much faster.
- XML: XML files

encoding string | boolean | number | integer | null
multiline boolean

Are JSON objects on a single line or multiple lines? Single line by default. false means single line; false is also faster.

array boolean

Is the JSON stored as a single object array? false by default, meaning one JSON document per line.

withHeader boolean

Does the dataset have a header? true by default

separator string | boolean | number | integer | null
quote string | boolean | number | integer | null
escape string | boolean | number | integer | null
sink object
15 nested properties
connectionRef string | boolean | number | integer | null

FS or BQ: List of attributes to use for clustering

days number

BQ: Number of days before this table is set as expired and deleted. Never by default.

requirePartitionFilter boolean

BQ: Should we require a partition filter on every request? No by default.

materializedView const: "TABLE" | const: "VIEW" | const: "MATERIALIZED_VIEW" | const: "HYBRID"

Table types supported by the Sink option

enableRefresh boolean

BQ: Enable automatic refresh of the materialized view? false by default.

refreshIntervalMs number

BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.

id string | boolean | number | integer | null
format string | boolean | number | integer | null
extension string | boolean | number | integer | null

Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}

partition string[]

FS or BQ: List of partition attributes

coalesce boolean

When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.

path string

Optional path attribute if you want to save the file outside of the default location (datasets folder)

Map of string

directory string | boolean | number | integer | null

Recognized filename extensions. json, csv, dsv and psv are recognized by default. Only files with these extensions will be moved to the stage folder.

ack string | boolean | number | integer | null

Map of string

loader string | boolean | number | integer | null
emptyIsNull boolean

Treat empty columns as null in DSV files. Defaults to false

dagRef string | boolean | number | integer | null
freshness object
2 nested properties
warn string | boolean | number | integer | null
error string | boolean | number | integer | null
nullValue string | boolean | number | integer | null
schedule string | boolean | number | integer | null
writeStrategy object
8 nested properties

Write strategy type including custom strategies. Allows predefined strategies or custom strategy names

types Record<string, ConvertibleToString>

Map of connection type to write strategy. Allows different strategies per target database

List of columns to use as key(s) for the target table. This is used to update existing records in the target table.

timestamp string | boolean | number | integer | null
queryFilter string | boolean | number | integer | null
on const: "TARGET" | const: "SOURCE_AND_TARGET"
startTs string | boolean | number | integer | null
endTs string | boolean | number | integer | null
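
Tying the metadata properties together, an illustrative block for a semicolon-separated CSV load (all concrete values, including the freshness duration syntax, are assumptions):

```yaml
# Hypothetical metadata for a delimiter-separated load.
metadata:
  format: "DSV"
  separator: ";"
  withHeader: true
  encoding: "UTF-8"
  emptyIsNull: true        # treat empty DSV columns as null
  freshness:
    warn: "2d"             # assumed duration syntax
    error: "7d"
```
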
AreaV1 object
incoming string | boolean | number | integer | null
stage string | boolean | number | integer | null
unresolved string | boolean | number | integer | null
archive string | boolean | number | integer | null
ingesting string | boolean | number | integer | null
replay string | boolean | number | integer | null
hiveDatabase string | boolean | number | integer | null
FreshnessV1 object
warn string | boolean | number | integer | null
error string | boolean | number | integer | null
TableV1 object

Table Schema definition.

name string | boolean | number | integer | null required
pattern string | boolean | number | integer | null required
attributes AttributeV1[] required

Attributes parsing rules.

metadata object
20 nested properties
format const: "DATAFRAME" | const: "DSV" | const: "POSITION" | const: "JSON" | const: "JSON_ARRAY" | const: "JSON_FLAT" | const: "XML" | const: "TEXT_XML" | const: "KAFKA" | const: "KAFKASTREAM" | const: "GENERIC" | const: "PARQUET"

DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-format file where values are located at an exact position in each line.
- JSON_FLAT: For optimisation purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT since it is much faster.
- XML: XML files

encoding string | boolean | number | integer | null
multiline boolean

Are JSON objects on a single line or multiple lines? Single line by default. false means single line; false is also faster.

array boolean

Is the JSON stored as a single object array? false by default, meaning one JSON document per line.

withHeader boolean

Does the dataset have a header? true by default

separator string | boolean | number | integer | null
quote string | boolean | number | integer | null
escape string | boolean | number | integer | null
sink object
15 nested properties
connectionRef string | boolean | number | integer | null

FS or BQ: List of attributes to use for clustering

days number

BQ: Number of days before this table is set as expired and deleted. Never by default.

requirePartitionFilter boolean

BQ: Should we require a partition filter on every request? No by default.

materializedView const: "TABLE" | const: "VIEW" | const: "MATERIALIZED_VIEW" | const: "HYBRID"

Table types supported by the Sink option

enableRefresh boolean

BQ: Enable automatic refresh of the materialized view? false by default.

refreshIntervalMs number

BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.

id string | boolean | number | integer | null
format string | boolean | number | integer | null
extension string | boolean | number | integer | null

Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}

partition string[]

FS or BQ: List of partition attributes

coalesce boolean

When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.

path string

Optional path attribute if you want to save the file outside of the default location (datasets folder)

Map of string

directory string | boolean | number | integer | null

Recognized filename extensions. json, csv, dsv and psv are recognized by default. Only files with these extensions will be moved to the stage folder.

ack string | boolean | number | integer | null

Map of string

loader string | boolean | number | integer | null
emptyIsNull boolean

Treat empty columns as null in DSV files. Defaults to false

dagRef string | boolean | number | integer | null
freshness object
2 nested properties
warn string | boolean | number | integer | null
error string | boolean | number | integer | null
nullValue string | boolean | number | integer | null
schedule string | boolean | number | integer | null
writeStrategy object
8 nested properties

Write strategy type including custom strategies. Allows predefined strategies or custom strategy names

types Record<string, ConvertibleToString>

Map of connection type to write strategy. Allows different strategies per target database

List of columns to use as key(s) for the target table. This is used to update existing records in the target table.

timestamp string | boolean | number | integer | null
queryFilter string | boolean | number | integer | null
on const: "TARGET" | const: "SOURCE_AND_TARGET"
startTs string | boolean | number | integer | null
endTs string | boolean | number | integer | null
comment string | boolean | number | integer | null

attach streams to table (Snowflake only)

Reserved for future use.

List of SQL requests to execute after the table has been loaded.

Set of string to attach to this Schema

Row level security on this schema.

expectations ExpectationItemV1[]

Expectations to check after Load / Transform has succeeded

List of columns that make up the primary key

Map of rolename -> List[Users].

rename string | boolean | number | integer | null
sample string | boolean | number | integer | null
filter string | boolean | number | integer | null
patternSample string | boolean | number | integer | null
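
An illustrative table definition combining the required name, pattern and attributes (the `table:` wrapper key and all concrete values are assumptions):

```yaml
# Hypothetical table schema: incoming files matching "pattern" are loaded
# into this table using the given attribute parsing rules.
table:
  name: "customers"
  pattern: "customers-.*\\.csv"
  metadata:
    format: "DSV"
    separator: ","
  attributes:
    - name: "id"
      type: "string"
      required: true
    - name: "signup_date"
      type: "date"
      rename: "signupDate"   # column name in the target table
```
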
MetricTypeV1 string | boolean | number | integer | null
AttributeV1 object
name string | boolean | number | integer | null required
type string | boolean | number | integer | null
array boolean

Is this attribute an array/list of values? Default is false

required boolean

Should this attribute always be present in the source? Defaults to true.

privacy string | boolean | number | integer | null
comment string | boolean | number | integer | null
rename string | boolean | number | integer | null
sample string | boolean | number | integer | null
metricType const: "DISCRETE" | const: "CONTINUOUS" | const: "TEXT" | const: "NONE"

Used to compute metrics on column values.

attributes AttributeV1[]

List of sub-attributes (valid for JSON and XML files only)

position object

First and last char positions of an attribute in a fixed length record

2 nested properties
first number required

Zero based position of the first character for this attribute

last number required

Zero based position of the last character to include in this attribute

default string | boolean | number | integer | null

Tags associated with this attribute

trim const: "LEFT" | const: "RIGHT" | const: "BOTH" | const: "NONE"

How to trim the input string

script string | boolean | number | integer | null
foreignKey string | boolean | number | integer | null
ignore boolean

Should this attribute be ignored on ingestion? Defaults to false

accessPolicy string | boolean | number | integer | null
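
For POSITION files, an attribute can carry its fixed-width coordinates via the position object; an invented example:

```yaml
# Hypothetical attribute for a fixed-width (POSITION format) record.
- name: "zipcode"
  type: "string"
  trim: "BOTH"        # strip padding on both sides
  position:
    first: 10         # zero-based, inclusive
    last: 14
```
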
AutoTaskDescV1 object
name string | boolean | number | integer | null
sql string | boolean | number | integer | null

attach streams to task (Snowflake only)

List of columns that make up the primary key for the output table

database string | boolean | number | integer | null
domain string | boolean | number | integer | null
table string | boolean | number | integer | null

List of columns used for partitioning the output.

List of SQL requests to execute before the main SQL request is run

List of SQL requests to execute after the main SQL request is run

sink object
15 nested properties
connectionRef string | boolean | number | integer | null

FS or BQ: List of attributes to use for clustering

days number

BQ: Number of days before this table is set as expired and deleted. Never by default.

requirePartitionFilter boolean

BQ: Should we require a partition filter on every request? No by default.

materializedView const: "TABLE" | const: "VIEW" | const: "MATERIALIZED_VIEW" | const: "HYBRID"

Table types supported by the Sink option

enableRefresh boolean

BQ: Enable automatic refresh of the materialized view? false by default.

refreshIntervalMs number

BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.

id string | boolean | number | integer | null
format string | boolean | number | integer | null
extension string | boolean | number | integer | null

Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}

partition string[]

FS or BQ: List of partition attributes

coalesce boolean

When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.

path string

Optional path attribute if you want to save the file outside of the default location (datasets folder)

Map of string

expectations ExpectationItemV1[]

Expectations to check after Load / Transform has succeeded

Map of rolename -> List[Users].

comment string | boolean | number | integer | null
freshness object
2 nested properties
warn string | boolean | number | integer | null
error string | boolean | number | integer | null
attributes AttributeV1[]

Attributes

python string | boolean | number | integer | null

Set of string to attach to the output table

writeStrategy object
8 nested properties

Write strategy type including custom strategies. Allows predefined strategies or custom strategy names

types Record<string, ConvertibleToString>

Map of connection type to write strategy. Allows different strategies per target database

List of columns to use as key(s) for the target table. This is used to update existing records in the target table.

timestamp string | boolean | number | integer | null
queryFilter string | boolean | number | integer | null
on const: "TARGET" | const: "SOURCE_AND_TARGET"
startTs string | boolean | number | integer | null
endTs string | boolean | number | integer | null
schedule string | boolean | number | integer | null
dagRef string | boolean | number | integer | null
taskTimeoutMs integer

Number of milliseconds before a communication timeout.

parseSQL boolean

Should we parse this SQL and update the table according to the write strategy, or just execute it as-is?

connectionRef string

Used when the default connection ref present in the application.sl.yml file is not the one to use to run the SQL request for this task.

syncStrategy const: "NONE" | const: "ADD" | const: "ALL"

Should this YAML table schema be synchronized with the source table?

dataset_triggering_strategy string

Dataset triggering strategy to determine when this task should be executed based on dataset changes. The & and | operators are allowed, e.g. (dataset1 & dataset2) | dataset3

LockV1 object
path string | boolean | number | integer | null
timeout integer

reserved

pollTime integer

Default 5 seconds

refreshTime integer

Default 5 seconds

AuditV1 object
path string | boolean | number | integer | null
sink object
15 nested properties
connectionRef string | boolean | number | integer | null

FS or BQ: List of attributes to use for clustering

days number

BQ: Number of days before this table is set as expired and deleted. Never by default.

requirePartitionFilter boolean

BQ: Should we require a partition filter on every request? No by default.

materializedView const: "TABLE" | const: "VIEW" | const: "MATERIALIZED_VIEW" | const: "HYBRID"

Table types supported by the Sink option

enableRefresh boolean

BQ: Enable automatic refresh of the materialized view? false by default.

refreshIntervalMs number

BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.

id string | boolean | number | integer | null
format string | boolean | number | integer | null
extension string | boolean | number | integer | null

Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}

partition string[]

FS or BQ: List of partition attributes

coalesce boolean

When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.

path string

Optional path attribute if you want to save the file outside of the default location (datasets folder)

Map of string

maxErrors string | boolean | number | integer | null
database string | boolean | number | integer | null
domain string | boolean | number | integer | null
domainExpectation string | boolean | number | integer | null
domainRejected string | boolean | number | integer | null
detailedLoadAudit boolean

Create an individual audit entry for each ingested file instead of a global one. Default: false

active boolean

Enable or disable audit logging. Default is true

sql string | boolean | number | integer | null
DomainV1 object

A schema in JDBC database or a folder in HDFS or a dataset in BigQuery.

name string | boolean | number | integer | null
metadata object
20 nested properties
format const: "DATAFRAME" | const: "DSV" | const: "POSITION" | const: "JSON" | const: "JSON_ARRAY" | const: "JSON_FLAT" | const: "XML" | const: "TEXT_XML" | const: "KAFKA" | const: "KAFKASTREAM" | const: "GENERIC" | const: "PARQUET"

DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-format file where values are located at an exact position in each line.
- JSON_FLAT: For optimisation purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT since it is much faster.
- XML: XML files

encoding string | boolean | number | integer | null
multiline boolean

Are JSON objects on a single line or multiple lines? Single line by default. false means single line; false is also faster.

array boolean

Is the JSON stored as a single object array? false by default, meaning one JSON document per line.

withHeader boolean

Does the dataset have a header? true by default

separator string | boolean | number | integer | null
quote string | boolean | number | integer | null
escape string | boolean | number | integer | null
sink object
15 nested properties
connectionRef string | boolean | number | integer | null

FS or BQ: List of attributes to use for clustering

days number

BQ: Number of days before this table is set as expired and deleted. Never by default.

requirePartitionFilter boolean

BQ: Should we require a partition filter on every request? No by default.

materializedView const: "TABLE" | const: "VIEW" | const: "MATERIALIZED_VIEW" | const: "HYBRID"

Table types supported by the Sink option

enableRefresh boolean

BQ: Enable automatic refresh of the materialized view? false by default.

refreshIntervalMs number

BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.

id string | boolean | number | integer | null
format string | boolean | number | integer | null
extension string | boolean | number | integer | null

Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}

partition string[]

FS or BQ: List of partition attributes

coalesce boolean

When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.

path string

Optional path attribute if you want to save the file outside of the default location (datasets folder)

Map of string

directory string | boolean | number | integer | null

Recognized filename extensions. json, csv, dsv and psv are recognized by default. Only files with these extensions will be moved to the stage folder.

ack string | boolean | number | integer | null

Map of string

loader string | boolean | number | integer | null
emptyIsNull boolean

Treat empty columns as null in DSV files. Defaults to false

dagRef string | boolean | number | integer | null
freshness object
2 nested properties
warn string | boolean | number | integer | null
error string | boolean | number | integer | null
nullValue string | boolean | number | integer | null
schedule string | boolean | number | integer | null
writeStrategy object
8 nested properties

Write strategy type including custom strategies. Allows predefined strategies or custom strategy names

types Record<string, ConvertibleToString>

Map of connection type to write strategy. Allows different strategies per target database

List of columns to use as key(s) for the target table. This is used to update existing records in the target table.

timestamp string | boolean | number | integer | null
queryFilter string | boolean | number | integer | null
on const: "TARGET" | const: "SOURCE_AND_TARGET"
startTs string | boolean | number | integer | null
endTs string | boolean | number | integer | null
comment string | boolean | number | integer | null

Set of string to attach to this domain

rename string | boolean | number | integer | null
database string | boolean | number | integer | null
AutoJobDescV1 object
name string | boolean | number | integer | null
comment string | boolean | number | integer | null
default object
27 nested properties
name string | boolean | number | integer | null
sql string | boolean | number | integer | null

attach streams to task (Snowflake only)

List of columns that make up the primary key for the output table

database string | boolean | number | integer | null
domain string | boolean | number | integer | null
table string | boolean | number | integer | null

List of columns used for partitioning the output.

List of SQL requests to execute before the main SQL request is run

List of SQL requests to execute after the main SQL request is run

sink object
15 nested properties
connectionRef string | boolean | number | integer | null

FS or BQ: List of attributes to use for clustering

days number

BQ: Number of days before this table is set as expired and deleted. Never by default.

requirePartitionFilter boolean

BQ: Should we require a partition filter on every request? No by default.

materializedView const: "TABLE" | const: "VIEW" | const: "MATERIALIZED_VIEW" | const: "HYBRID"

Table types supported by the Sink option

enableRefresh boolean

BQ: Enable automatic refresh of the materialized view? false by default.

refreshIntervalMs number

BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.

id string | boolean | number | integer | null
format string | boolean | number | integer | null
extension string | boolean | number | integer | null

Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}

partition string[]

FS or BQ: List of partition attributes

coalesce boolean

When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.

path string

Optional path attribute if you want to save the file outside of the default location (datasets folder)

Map of string

expectations ExpectationItemV1[]

Expectations to check after Load / Transform has succeeded

Map of rolename -> List[Users].

comment string | boolean | number | integer | null
freshness object
2 nested properties
warn string | boolean | number | integer | null
error string | boolean | number | integer | null
attributes AttributeV1[]

Attributes

python string | boolean | number | integer | null

Set of string to attach to the output table

writeStrategy object
8 nested properties

Write strategy type including custom strategies. Allows predefined strategies or custom strategy names

types Record<string, ConvertibleToString>

Map of connection type to write strategy. Allows different strategies per target database

List of columns to use as key(s) for the target table. This is used to update existing records in the target table.

timestamp string | boolean | number | integer | null
queryFilter string | boolean | number | integer | null
on const: "TARGET" | const: "SOURCE_AND_TARGET"
startTs string | boolean | number | integer | null
endTs string | boolean | number | integer | null
schedule string | boolean | number | integer | null
dagRef string | boolean | number | integer | null
taskTimeoutMs integer

Number of milliseconds before a communication timeout.

parseSQL boolean

Should we parse this SQL and update the table according to the write strategy, or just execute it as-is?

connectionRef string

Used when the default connection ref present in the application.sl.yml file is not the one to use to run the SQL request for this task.

syncStrategy const: "NONE" | const: "ADD" | const: "ALL"

Should this YAML table schema be synchronized with the source table?

dataset_triggering_strategy string

Dataset triggering strategy that determines when this task should be executed, based on dataset changes. The & and | operators are allowed, e.g. (dataset1 & dataset2) | dataset3
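As a hedged illustration of how these task-level properties combine in a transform file, the sketch below uses invented domain, table, and column names, and assumes UPSERT_BY_KEY_AND_TIMESTAMP is one of the predefined write strategy names:

```yaml
# my_domain/my_task.sl.yml -- illustrative sketch, all names are invented
version: 1
task:
  name: customer_summary
  domain: analytics
  table: customer_summary
  writeStrategy:
    type: UPSERT_BY_KEY_AND_TIMESTAMP   # assumed predefined strategy name
    key: [customer_id]                  # columns used to match existing records
    timestamp: updated_at
    on: TARGET
  schedule: "0 4 * * *"                 # cron expression, daily at 04:00
  taskTimeoutMs: 600000                 # time out after 10 minutes
  dataset_triggering_strategy: "(orders & customers) | backfill_signal"
```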

JDBCTableV1 object
name string | boolean | number | integer | null
sql string | boolean | number | integer | null
columns ConvertibleToString | object[]

List of columns to extract. All columns by default.

minItems=1
partitionColumn string | boolean | number | integer | null
numPartitions integer

Number of data partitions to create. Scope: Data extraction.

connectionOptions Record<string, string | boolean | number | integer | null>

Map of string

fetchSize integer

Number of rows to be fetched from the database when additional rows are needed. By default, most JDBC drivers use a fetch size of 10, so if you are reading 1000 objects, increasing the fetch size to 256 can significantly reduce the time required to fetch the query's results. The optimal fetch size is not always obvious. Scope: Data extraction.

fullExport boolean

If true, extract all data from the table. Scope: Data extraction.

filter string | boolean | number | integer | null
stringPartitionFunc string | boolean | number | integer | null
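To make JDBCTableV1 concrete, here is a hedged fragment; the table and column names are invented, and the surrounding extract configuration that would wrap this `tables` list is assumed:

```yaml
# Fragment of an extract config -- illustrative only
tables:
  - name: CUSTOMERS
    columns: [ID, NAME, UPDATED_AT]   # all columns are extracted if omitted
    partitionColumn: ID
    numPartitions: 4                  # parallelize extraction into 4 partitions
    fetchSize: 1000                   # larger fetch size reduces round trips
    fullExport: false                 # true would extract all data from the table
    filter: "UPDATED_AT > '2024-01-01'"
```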
OutputV1 object

Output configuration for a domain

encoding string | boolean | number | integer | null
withHeader boolean

If true, writes the names of columns as the first line.

separator string | boolean | number | integer | null
quote string | boolean | number | integer | null
escape string | boolean | number | integer | null
nullValue string | boolean | number | integer | null
datePattern string | boolean | number | integer | null
timestampPattern string | boolean | number | integer | null
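A hedged OutputV1 sketch for a semicolon-separated CSV export; the values are illustrative choices, and the placement of the `output` key within a domain config is an assumption:

```yaml
# Illustrative output settings for a CSV-style export
output:
  encoding: UTF-8
  withHeader: true            # write column names as the first line
  separator: ";"
  quote: "\""
  escape: "\\"
  nullValue: ""               # how null values are rendered
  datePattern: "yyyy-MM-dd"
  timestampPattern: "yyyy-MM-dd HH:mm:ss"
```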
JDBCSchemaBase object
catalog string | boolean | number | integer | null
schema string | boolean | number | integer | null
tableRemarks string | boolean | number | integer | null
columnRemarks string | boolean | number | integer | null
tableTypes TableTypeV1[]

One or many of the predefined table types. Scope: Schema and Data extraction.

template string | boolean | number | integer | null
pattern string | boolean | number | integer | null
numericTrim const: "LEFT" | const: "RIGHT" | const: "BOTH" | const: "NONE"

How to trim the input string

partitionColumn string | boolean | number | integer | null
numPartitions integer

Number of data partitions to create. Scope: Data extraction.

connectionOptions Record<string, string | boolean | number | integer | null>

Map of string

fetchSize integer

Number of rows to be fetched from the database when additional rows are needed. By default, most JDBC drivers use a fetch size of 10, so if you are reading 1000 objects, increasing the fetch size to 256 can significantly reduce the time required to fetch the query's results. The optimal fetch size is not always obvious. Scope: Data extraction.

stringPartitionFunc string | boolean | number | integer | null
fullExport boolean

Defines whether the entire table should be fetched or not. If not, the maximum value of partitionColumn seen during the last extraction is used to fetch incremental data. Scope: Data extraction.

sanitizeName boolean

Sanitize the domain's name by keeping alphanumeric characters only. Scope: Schema and Data extraction.

DefaultJDBCSchemaV1 object
catalog string | boolean | number | integer | null
schema string | boolean | number | integer | null
tableRemarks string | boolean | number | integer | null
columnRemarks string | boolean | number | integer | null
tableTypes TableTypeV1[]

One or many of the predefined table types. Scope: Schema and Data extraction.

template string | boolean | number | integer | null
pattern string | boolean | number | integer | null
numericTrim const: "LEFT" | const: "RIGHT" | const: "BOTH" | const: "NONE"

How to trim the input string

partitionColumn string | boolean | number | integer | null
numPartitions integer

Number of data partitions to create. Scope: Data extraction.

connectionOptions Record<string, string | boolean | number | integer | null>

Map of string

fetchSize integer

Number of rows to be fetched from the database when additional rows are needed. By default, most JDBC drivers use a fetch size of 10, so if you are reading 1000 objects, increasing the fetch size to 256 can significantly reduce the time required to fetch the query's results. The optimal fetch size is not always obvious. Scope: Data extraction.

stringPartitionFunc string | boolean | number | integer | null
fullExport boolean

Defines whether the entire table should be fetched or not. If not, the maximum value of partitionColumn seen during the last extraction is used to fetch incremental data. Scope: Data extraction.

sanitizeName boolean

Sanitize the domain's name by keeping alphanumeric characters only. Scope: Schema and Data extraction.
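The JDBCSchemaBase / DefaultJDBCSchemaV1 properties above might combine as follows; the connection name and schema are invented, and the extract/jdbcSchemas wrapper is assumed:

```yaml
# Hedged extraction sketch -- names are illustrative
extract:
  connectionRef: my_postgres        # hypothetical connection defined elsewhere
  jdbcSchemas:
    - schema: public
      tableTypes: [TABLE]           # restrict to plain tables
      fetchSize: 1000
      numPartitions: 4
      partitionColumn: id
      fullExport: false             # incremental: resume from last max(id)
      sanitizeName: true            # keep alphanumeric characters only
```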

JDBCSchemaV1 object
JDBCSchemasV1 object
OpenAPIObjectSchemasV1 object

List of regexes used to include OpenAPI schemas (#/components/schemas). Defaults to ['.*']. 'Includes' is evaluated before 'excludes'

minItems=1

List of regexes used to exclude OpenAPI schemas (#/components/schemas). Defaults to []

OpenAPIRouteObjectExplosionV1 object
on string

Explode the route's object into additional object definitions. Uses the object's path combined with the route path as the final name. Defaults to ALL

Any of: const: "ALL" (keep properties of type object or array); const: "OBJECT" (keep properties of type object, don't dive into array types); const: "ARRAY" (keep properties of type array; when an object is encountered, dive deeper)

Filter out on field path. Fields are separated by _. Defaults to []

Regex applied to the object path. If it matches, the given name is used; otherwise falls back to route_path + object path as the final name

OpenAPIRoutesV1 object

List of regexes used to include OpenAPI paths. Defaults to ['.*']

minItems=1
as string | boolean | number | integer | null
operations const: "GET" | const: "POST"[]

List of operations to retrieve schema from. Defaults to ['GET']. Supported values are GET and POST.

minItems=1

List of regexes used to exclude OpenAPI paths. Defaults to []

minItems=1
excludeFields ConvertibleToString[]

List of regexes used to exclude fields. Fields and their subfields are separated by _.

minItems=1
explode object
3 nested properties
on string

Explode the route's object into additional object definitions. Uses the object's path combined with the route path as the final name. Defaults to ALL

Any of: const: "ALL" (keep properties of type object or array); const: "OBJECT" (keep properties of type object, don't dive into array types); const: "ARRAY" (keep properties of type array; when an object is encountered, dive deeper)

Filter out on field path. Fields are separated by _. Defaults to []

Regex applied to the object path. If it matches, the given name is used; otherwise falls back to route_path + object path as the final name

OpenAPIDomainV1 object
name string | boolean | number | integer | null required
basePath string | boolean | number | integer | null
schemas object
2 nested properties

List of regexes used to include OpenAPI schemas (#/components/schemas). Defaults to ['.*']. 'Includes' is evaluated before 'excludes'

minItems=1

List of regexes used to exclude OpenAPI schemas (#/components/schemas). Defaults to []

Describe what to fetch from data connection. Scope: Schema and Data extraction.

minItems=1
OpenAPIV1 object
basePath string | boolean | number | integer | null
formatTypeMapping Record<string, string | boolean | number | integer | null>

Map of string

Describe what to fetch from data connection. Scope: Schema and Data extraction.

minItems=1
OpenAPIsV1 object
ExtractV1Base object
sanitizeAttributeName string
Any of: const: "ON_EXTRACT" const: "ON_EXTRACT", attribute name is sanitized and stored as rename property when attribute's name differs from sanitized name const: "ON_LOAD"
connectionRef string | boolean | number | integer | null
InputRefV1 object

Input for ref object

table string | boolean | number | integer | null required
database string | boolean | number | integer | null
domain string | boolean | number | integer | null
OutputRefV1 object

Output for ref object

database string | boolean | number | integer | null required
domain string | boolean | number | integer | null required
table string | boolean | number | integer | null required
RefV1 object

Describe how to resolve a reference in a transform task

input object required

Input for ref object

3 nested properties
table string | boolean | number | integer | null required
database string | boolean | number | integer | null
domain string | boolean | number | integer | null
output object required

Output for ref object

3 nested properties
database string | boolean | number | integer | null required
domain string | boolean | number | integer | null required
table string | boolean | number | integer | null required
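A RefV1 entry maps an input reference, as written in SQL, to a fully qualified output location. A hedged example with invented project and domain names:

```yaml
# Any unqualified reference to "customers" in a transform query
# resolves to my-prod-project.analytics.customers (names invented)
refs:
  - input:
      table: customers
    output:
      database: my-prod-project
      domain: analytics
      table: customers
```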
KafkaTopicConfigV1
topicName string | boolean | number | integer | null
maxRead integer

Maximum number of records to read from the topic in a single batch. Default is unlimited

List of fields to extract from Kafka messages

partitions integer

Number of partitions for the Kafka topic when creating it

replicationFactor integer

Replication factor for the Kafka topic when creating it

createOptions Record<string, string | boolean | number | integer | null>

Map of string

accessOptions Record<string, string | boolean | number | integer | null>

Map of string

headers Record<string, object>

HTTP headers to include when accessing Kafka via HTTP proxy

KafkaConfigV1 object
serverOptions Record<string, string | boolean | number | integer | null>

Map of string

topics Record<string, KafkaTopicConfigV1>

Map of topic name to topic configuration

cometOffsetsMode string | boolean | number | integer | null
customDeserializers Record<string, string | boolean | number | integer | null>

Map of string
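These Kafka properties might be assembled as below; the broker address, topic, and consumer group are invented, and the option keys under serverOptions/accessOptions are assumed to be standard Kafka client settings:

```yaml
# Hedged KafkaConfigV1 sketch -- all values illustrative
kafka:
  serverOptions:
    bootstrap.servers: "localhost:9092"
  topics:
    orders_in:
      topicName: orders
      maxRead: 100000            # cap records read per batch
      partitions: 3              # used when creating the topic
      replicationFactor: 1
      accessOptions:
        group.id: starlake-orders-loader
```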

DagRefV1 object
load string | boolean | number | integer | null
transform string | boolean | number | integer | null
GizmoV1 object
url string

Gizmo server URL. Default is 'http://localhost:10900'

apiKey string

API key for authenticating with the Gizmo server

HttpV1 object
interface string | boolean | number | integer | null
port integer

Port number for the HTTP server. Default is 8080

AppConfigV1 object
env string | boolean | number | integer | null
datasets string | boolean | number | integer | null
incoming string | boolean | number | integer | null
dags string | boolean | number | integer | null
types string | boolean | number | integer | null
macros string | boolean | number | integer | null
tests string | boolean | number | integer | null
prunePartitionOnMerge boolean

Pre-compute incoming partitions to prune partitions on merge statement

writeStrategies string | boolean | number | integer | null
loadStrategies string | boolean | number | integer | null
metadata string | boolean | number | integer | null
metrics object
3 nested properties
path string | boolean | number | integer | null
discreteMaxCardinality integer

Max number of unique values accepted for a discrete column. Default is 10

active boolean

Should metrics be computed?

validateOnLoad boolean

Validate the YAML file when loading it. If set to true, fails on any error.

rejectWithValue boolean

Add the value along with the rejection error. Not enabled by default for security reasons. Default: false

audit object
10 nested properties
path string | boolean | number | integer | null
sink object
15 nested properties
connectionRef string | boolean | number | integer | null

FS or BQ: List of attributes to use for clustering

days number

BQ: Number of days before this table is set as expired and deleted. Never by default.

requirePartitionFilter boolean

BQ: Should a partition filter be required on every request? No by default.

materializedView const: "TABLE" | const: "VIEW" | const: "MATERIALIZED_VIEW" | const: "HYBRID"

Table types supported by the Sink option

enableRefresh boolean

BQ: Enable automatic refresh of the materialized view? False by default.

refreshIntervalMs number

BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.

id string | boolean | number | integer | null
format string | boolean | number | integer | null
extension string | boolean | number | integer | null

Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}.

partition string[]

FS or BQ: List of partition attributes

coalesce boolean

When outputting files, should they be coalesced into a single file? Useful when CSV is the output format.

path string

Optional path attribute if you want to save the file outside of the default location (datasets folder)

Map of string

maxErrors string | boolean | number | integer | null
database string | boolean | number | integer | null
domain string | boolean | number | integer | null
domainExpectation string | boolean | number | integer | null
domainRejected string | boolean | number | integer | null
detailedLoadAudit boolean

Create individual entry for each ingested file instead of a global one. Default: false

active boolean

Enable or disable audit logging. Default is true

sql string | boolean | number | integer | null
archive boolean

Should ingested files be archived after ingestion?

sinkReplayToFile boolean

Should invalid records be stored in a replay file?

lock object
4 nested properties
path string | boolean | number | integer | null
timeout integer

reserved

pollTime integer

Default 5 seconds

refreshTime integer

Default 5 seconds

defaultWriteFormat string | boolean | number | integer | null
defaultRejectedWriteFormat string | boolean | number | integer | null
defaultAuditWriteFormat string | boolean | number | integer | null
csvOutput boolean

Output files in CSV format? Default is false

csvOutputExt string | boolean | number | integer | null
privacyOnly boolean

Only generate privacy tasks. Reserved for internal use

emptyIsNull boolean

Should empty strings be considered as null values?

loader string | boolean | number | integer | null
rowValidatorClass string | boolean | number | integer | null
loadStrategyClass string | boolean | number | integer | null
grouped boolean

Should all files destined for the same table be loaded in a single task, or one by one?

groupedMax integer

Maximum number of files to be stored in the same table in a single task

scd2StartTimestamp string | boolean | number | integer | null
scd2EndTimestamp string | boolean | number | integer | null
area object
7 nested properties
incoming string | boolean | number | integer | null
stage string | boolean | number | integer | null
unresolved string | boolean | number | integer | null
archive string | boolean | number | integer | null
ingesting string | boolean | number | integer | null
replay string | boolean | number | integer | null
hiveDatabase string | boolean | number | integer | null

Map of string

connections Record<string, object>

Map of jdbc engines

jdbcEngines Record<string, object>

Map of jdbc engines

privacy object
1 nested properties

Map of string

root string | boolean | number | integer | null
internal object

configure Spark internal options

5 nested properties
cacheStorageLevel string | boolean | number | integer | null
intermediateBigqueryFormat string | boolean | number | integer | null
temporaryGcsBucket string | boolean | number | integer | null
substituteVars boolean

Internal use. Do not modify.

bqAuditSaveInBatchMode boolean

Should audit logs be saved in batch or interactive mode when using BigQuery? Interactive by default (false)

accessPolicies object
4 nested properties
apply boolean

Should access policies be enforced?

location string | boolean | number | integer | null
database string | boolean | number | integer | null
taxonomy string | boolean | number | integer | null
sparkScheduling object
4 nested properties
maxJobs integer

Max number of Spark jobs to run in parallel, default is 1

poolName string | boolean | number | integer | null
mode string | boolean | number | integer | null
file string | boolean | number | integer | null
udfs string | boolean | number | integer | null
expectations object
3 nested properties
path string | boolean | number | integer | null
active boolean

Should expectations be executed?

failOnError boolean

Should load / transform fail on expectation errors?

sqlParameterPattern string | boolean | number | integer | null
rejectAllOnError string | boolean | number | integer | null
rejectMaxRecords integer

Maximum number of records to reject when an error occurs. Default is 100

maxParCopy integer

Maximum number of parallel file copy operations during import. Default is 1

kafka object
4 nested properties
serverOptions Record<string, string | boolean | number | integer | null>

Map of string

topics Record<string, KafkaTopicConfigV1>

Map of topic name to topic configuration

cometOffsetsMode string | boolean | number | integer | null
customDeserializers Record<string, string | boolean | number | integer | null>

Map of string

dsvOptions Record<string, string | boolean | number | integer | null>

Map of string

forceViewPattern string | boolean | number | integer | null
forceDomainPattern string | boolean | number | integer | null
forceTablePattern string | boolean | number | integer | null
forceJobPattern string | boolean | number | integer | null
forceTaskPattern string | boolean | number | integer | null
useLocalFileSystem string | boolean | number | integer | null
sessionDurationServe string | boolean | number | integer | null
database string | boolean | number | integer | null
tenant string | boolean | number | integer | null
connectionRef string | boolean | number | integer | null
loadConnectionRef string | boolean | number | integer | null
transformConnectionRef string | boolean | number | integer | null
schedulePresets Record<string, string | boolean | number | integer | null>

Map of string

maxParTask integer

How many jobs to run simultaneously in dev mode (experimental)

refs RefV1[]

Reference mappings for resolving table references in SQL queries across different environments

dagRef object
2 nested properties
load string | boolean | number | integer | null
transform string | boolean | number | integer | null
forceHalt boolean

Force the application to stop even when there are pending threads.

jobIdEnvName string | boolean | number | integer | null
archiveTablePattern string | boolean | number | integer | null
archiveTable boolean

Enable table archiving before overwrite operations. Default is false

version string | boolean | number | integer | null
autoExportSchema boolean

Automatically export table schemas after load/transform operations. Default is false

longJobTimeoutMs integer

Timeout in milliseconds for long-running jobs. Default is 3600000 (1 hour)

shortJobTimeoutMs integer

Timeout in milliseconds for short-running jobs. Default is 300000 (5 minutes)

createSchemaIfNotExists boolean

Automatically create database schema/dataset if it does not exist. Default is true

http object
2 nested properties
interface string | boolean | number | integer | null
port integer

Port number for the HTTP server. Default is 8080

timezone string | boolean | number | integer | null
maxInteractiveRecords integer

Maximum number of records to return in interactive query mode. Default is 1000

duckdbMode boolean

Is DuckDB mode active?

duckdbExtensions string

Comma-separated list of DuckDB extensions to load. Default is spatial, json, httpfs

duckdbPath string | boolean | number | integer | null
testCsvNullString string | boolean | number | integer | null
hiveInTest string | boolean | number | integer | null
spark object

Map of string

extra object

Map of string

duckDbEnableExternalAccess boolean

Allow DuckDB to load / save data from / to external sources. Defaults to true

syncSqlWithYaml boolean

Update attributes in the YAML file when the SQL is updated. Defaults to true

syncYamlWithDb boolean

Update the database with the YAML when a transform is run. Defaults to true

onExceptionRetries integer

Number of retries on transient exceptions

pythonLibsDir string

Directory containing python libraries to use instead of pip install

gizmosql object
2 nested properties
url string

Gizmo server URL. Default is 'http://localhost:10900'

apiKey string

API key for authenticating with the Gizmo server

StarlakeV1Base object
types TypeV1[]
dag object

Dag configuration.

4 nested properties
template string | boolean | number | integer | null required
filename string | boolean | number | integer | null required
comment string | boolean | number | integer | null

Map of string

load object

A schema in JDBC database or a folder in HDFS or a dataset in BigQuery.

6 nested properties
name string | boolean | number | integer | null
metadata object
20 nested properties
format const: "DATAFRAME" | const: "DSV" | const: "POSITION" | const: "JSON" | const: "JSON_ARRAY" | const: "JSON_FLAT" | const: "XML" | const: "TEXT_XML" | const: "KAFKA" | const: "KAFKASTREAM" | const: "GENERIC" | const: "PARQUET"

DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-format file where values are located at an exact position in each line.
- JSON_FLAT: For optimisation purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT since it is much faster.
- XML: XML files

encoding string | boolean | number | integer | null
multiline boolean

Are JSON objects on a single line or multiple lines? Single by default. false means single; false is also faster.

array boolean

Is the JSON stored as a single object array? false by default, which means one JSON document per line.

withHeader boolean

Does the dataset have a header? true by default

separator string | boolean | number | integer | null
quote string | boolean | number | integer | null
escape string | boolean | number | integer | null
sink object
directory string | boolean | number | integer | null

Recognized filename extensions. json, csv, dsv and psv are recognized by default. Only files with these extensions will be moved to the stage folder.

ack string | boolean | number | integer | null

Map of string

loader string | boolean | number | integer | null
emptyIsNull boolean

Treat empty columns as null in DSV files. Defaults to false

dagRef string | boolean | number | integer | null
freshness object
nullValue string | boolean | number | integer | null
schedule string | boolean | number | integer | null
writeStrategy object
comment string | boolean | number | integer | null

Set of string to attach to this domain

rename string | boolean | number | integer | null
database string | boolean | number | integer | null
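The load domain properties above can be sketched as a minimal _config.sl.yml; the domain name and freshness durations are invented, and the duration syntax is an assumption:

```yaml
# Hedged load domain sketch (sales/_config.sl.yml) -- names invented
version: 1
load:
  name: sales
  metadata:
    format: DSV
    separator: ";"
    withHeader: true
    encoding: UTF-8
    emptyIsNull: true        # treat empty DSV columns as null
    freshness:
      warn: "2h"             # assumed duration syntax
      error: "1d"
  comment: "Raw sales files landed daily"
```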
transform object
4 nested properties
name string | boolean | number | integer | null
comment string | boolean | number | integer | null
default object
27 nested properties
name string | boolean | number | integer | null
sql string | boolean | number | integer | null

attach streams to task (Snowflake only)

List of columns that make up the primary key for the output table

database string | boolean | number | integer | null
domain string | boolean | number | integer | null
table string | boolean | number | integer | null

List of columns used for partitioning the output.

List of SQL requests to execute before the main SQL request is run

List of SQL requests to execute after the main SQL request is run

sink object
expectations ExpectationItemV1[]

Expectations to check after Load / Transform has succeeded

Map of rolename -> List[Users].

comment string | boolean | number | integer | null
freshness object
attributes AttributeV1[]

Attributes

python string | boolean | number | integer | null

Set of string to attach to the output table

writeStrategy object
schedule string | boolean | number | integer | null
dagRef string | boolean | number | integer | null
taskTimeoutMs integer

Number of milliseconds before a communication timeout.

parseSQL boolean

Should we parse this SQL and update the table according to the write strategy, or just execute it as-is?

connectionRef string

Used when the default connection ref present in the application.sl.yml file is not the one to use to run the SQL request for this task.

syncStrategy const: "NONE" | const: "ADD" | const: "ALL"

Should this YAML table schema be synchronized with the source table?

dataset_triggering_strategy string

Dataset triggering strategy that determines when this task should be executed, based on dataset changes. The & and | operators are allowed, e.g. (dataset1 & dataset2) | dataset3

task object
27 nested properties
name string | boolean | number | integer | null
sql string | boolean | number | integer | null

attach streams to task (Snowflake only)

List of columns that make up the primary key for the output table

database string | boolean | number | integer | null
domain string | boolean | number | integer | null
table string | boolean | number | integer | null

List of columns used for partitioning the output.

List of SQL requests to execute before the main SQL request is run

List of SQL requests to execute after the main SQL request is run

sink object
15 nested properties
connectionRef string | boolean | number | integer | null

FS or BQ: List of attributes to use for clustering

days number

BQ: Number of days before this table is set as expired and deleted. Never by default.

requirePartitionFilter boolean

BQ: Should a partition filter be required on every request? No by default.

materializedView const: "TABLE" | const: "VIEW" | const: "MATERIALIZED_VIEW" | const: "HYBRID"

Table types supported by the Sink option

enableRefresh boolean

BQ: Enable automatic refresh of the materialized view? False by default.

refreshIntervalMs number

BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.

id string | boolean | number | integer | null
format string | boolean | number | integer | null
extension string | boolean | number | integer | null

Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}.

partition string[]

FS or BQ: List of partition attributes

coalesce boolean

When outputting files, should they be coalesced into a single file? Useful when CSV is the output format.

path string

Optional path attribute if you want to save the file outside of the default location (datasets folder)

Map of string

expectations ExpectationItemV1[]

Expectations to check after Load / Transform has succeeded

Map of rolename -> List[Users].

comment string | boolean | number | integer | null
freshness object
2 nested properties
warn string | boolean | number | integer | null
error string | boolean | number | integer | null
attributes AttributeV1[]

Attributes

python string | boolean | number | integer | null

Set of string to attach to the output table

writeStrategy object
8 nested properties

Write strategy type including custom strategies. Allows predefined strategies or custom strategy names

types Record<string, ConvertibleToString>

Map of connection type to write strategy. Allows different strategies per target database

List of columns to use as key(s) for the target table. This is used to update existing records in the target table.

timestamp string | boolean | number | integer | null
queryFilter string | boolean | number | integer | null
on const: "TARGET" | const: "SOURCE_AND_TARGET"
startTs string | boolean | number | integer | null
endTs string | boolean | number | integer | null
schedule string | boolean | number | integer | null
dagRef string | boolean | number | integer | null
taskTimeoutMs integer

Number of milliseconds before a communication timeout.

parseSQL boolean

Should we parse this SQL and update the table according to the write strategy, or just execute it as-is?

connectionRef string

Used when the default connection ref present in the application.sl.yml file is not the one to use to run the SQL request for this task.

syncStrategy const: "NONE" | const: "ADD" | const: "ALL"

Should this YAML table schema be synchronized with the source table?

dataset_triggering_strategy string

Dataset triggering strategy that determines when this task should be executed, based on dataset changes. The & and | operators are allowed, e.g. (dataset1 & dataset2) | dataset3

Map of string

table object

Table Schema definition.

17 nested properties
name string | boolean | number | integer | null required
pattern string | boolean | number | integer | null required
attributes AttributeV1[] required

Attributes parsing rules.

metadata object
20 nested properties
format const: "DATAFRAME" | const: "DSV" | const: "POSITION" | const: "JSON" | const: "JSON_ARRAY" | const: "JSON_FLAT" | const: "XML" | const: "TEXT_XML" | const: "KAFKA" | const: "KAFKASTREAM" | const: "GENERIC" | const: "PARQUET"

DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-format file where values are located at an exact position in each line.
- JSON_FLAT: For optimisation purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT since it is much faster.
- XML: XML files

encoding string | boolean | number | integer | null
multiline boolean

Are JSON objects on a single line or multiple lines? Single by default. false means single; false is also faster.

array boolean

Is the JSON stored as a single object array? false by default, which means one JSON document per line.

withHeader boolean

Does the dataset have a header? true by default

separator string | boolean | number | integer | null
quote string | boolean | number | integer | null
escape string | boolean | number | integer | null
sink object
directory string | boolean | number | integer | null

Recognized filename extensions. json, csv, dsv and psv are recognized by default. Only files with these extensions will be moved to the stage folder.

ack string | boolean | number | integer | null

Map of string

loader string | boolean | number | integer | null
emptyIsNull boolean

Treat empty columns as null in DSV files. Defaults to false

dagRef string | boolean | number | integer | null
freshness object
nullValue string | boolean | number | integer | null
schedule string | boolean | number | integer | null
writeStrategy object
comment string | boolean | number | integer | null

attach streams to table (Snowflake only)

Reserved for future use.

List of SQL requests to execute after the table has been loaded.

Set of string to attach to this Schema

Row level security on this schema.

expectations ExpectationItemV1[]

Expectations to check after Load / Transform has succeeded

List of columns that make up the primary key

Map of rolename -> List[Users].

rename string | boolean | number | integer | null
sample string | boolean | number | integer | null
filter string | boolean | number | integer | null
patternSample string | boolean | number | integer | null
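A table schema combining the required name, pattern, and attributes fields might look like this hedged sketch; the file pattern, column names, and types are invented, and the attribute fields shown follow an assumed AttributeV1 shape:

```yaml
# Hedged table schema sketch (sales/orders.sl.yml) -- illustrative only
version: 1
table:
  name: orders
  pattern: "orders-.*\\.csv"     # incoming filenames matched by regex
  attributes:
    - name: order_id
      type: long
      required: true
    - name: amount
      type: decimal
    - name: order_date
      type: date
  metadata:
    format: DSV
    withHeader: true
```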
refs RefV1[]
application object
90 nested properties
env string | boolean | number | integer | null
datasets string | boolean | number | integer | null
incoming string | boolean | number | integer | null
dags string | boolean | number | integer | null
types string | boolean | number | integer | null
macros string | boolean | number | integer | null
tests string | boolean | number | integer | null
prunePartitionOnMerge boolean

Pre-compute incoming partitions to prune partitions in the merge statement.

writeStrategies string | boolean | number | integer | null
loadStrategies string | boolean | number | integer | null
metadata string | boolean | number | integer | null
metrics object
3 nested properties
path string | boolean | number | integer | null
discreteMaxCardinality integer

Max number of unique values accepted for a discrete column. Default is 10

active boolean

Should metrics be computed?

validateOnLoad boolean

Validate the YAML file when loading it. If set to true, fails on any error.

rejectWithValue boolean

Add the value along with the rejection error. Not enabled by default for security reasons. Default: false.
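
Putting the validation and metrics settings above together, an `application.sl.yml` fragment might look like the following sketch (values are illustrative):

```yaml
version: 1
application:
  validateOnLoad: true          # fail on any YAML error at load time
  rejectWithValue: false        # keep rejected values out of error logs (the default)
  metrics:
    active: true                # compute column metrics
    discreteMaxCardinality: 10  # max distinct values for a discrete column (the default)
```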

audit object
10 nested properties
path string | boolean | number | integer | null
sink object
maxErrors string | boolean | number | integer | null
database string | boolean | number | integer | null
domain string | boolean | number | integer | null
domainExpectation string | boolean | number | integer | null
domainRejected string | boolean | number | integer | null
detailedLoadAudit boolean

Create individual entry for each ingested file instead of a global one. Default: false

active boolean

Enable or disable audit logging. Default is true

sql string | boolean | number | integer | null
archive boolean

Should ingested files be archived after ingestion?

sinkReplayToFile boolean

Should invalid records be stored in a replay file?

lock object
4 nested properties
path string | boolean | number | integer | null
timeout integer

reserved

pollTime integer

Default 5 seconds

refreshTime integer

Default 5 seconds

defaultWriteFormat string | boolean | number | integer | null
defaultRejectedWriteFormat string | boolean | number | integer | null
defaultAuditWriteFormat string | boolean | number | integer | null
csvOutput boolean

Output files in CSV format? Default is false.

csvOutputExt string | boolean | number | integer | null
privacyOnly boolean

Only generate privacy tasks. Reserved for internal use

emptyIsNull boolean

Should empty strings be considered null values?

loader string | boolean | number | integer | null
rowValidatorClass string | boolean | number | integer | null
loadStrategyClass string | boolean | number | integer | null
grouped boolean

Should all files targeting the same table be loaded in a single task, or one by one?

groupedMax integer

Maximum number of files to be stored in the same table in a single task

scd2StartTimestamp string | boolean | number | integer | null
scd2EndTimestamp string | boolean | number | integer | null
area object
7 nested properties
incoming string | boolean | number | integer | null
stage string | boolean | number | integer | null
unresolved string | boolean | number | integer | null
archive string | boolean | number | integer | null
ingesting string | boolean | number | integer | null
replay string | boolean | number | integer | null
hiveDatabase string | boolean | number | integer | null

Map of string

connections Record<string, object>

Map of connection name to connection configuration.

jdbcEngines Record<string, object>

Map of jdbc engines
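
A sketch of a single named connection and the `connectionRef` that selects it. The connection name, option keys, and placeholder variables are assumptions for illustration, not a definitive configuration:

```yaml
version: 1
application:
  connectionRef: "pg"      # default connection used by load and transform
  connections:
    pg:                    # hypothetical JDBC connection name
      type: "jdbc"
      options:
        url: "jdbc:postgresql://localhost:5432/mydb"   # illustrative URL
        user: "{{PG_USER}}"                            # assumed env substitution
        password: "{{PG_PASSWORD}}"
```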

privacy object
1 nested properties

Map of string

root string | boolean | number | integer | null
internal object

Configure Spark internal options.

5 nested properties
cacheStorageLevel string | boolean | number | integer | null
intermediateBigqueryFormat string | boolean | number | integer | null
temporaryGcsBucket string | boolean | number | integer | null
substituteVars boolean

Internal use. Do not modify.

bqAuditSaveInBatchMode boolean

When using BigQuery, should audit logs be saved in batch or interactive mode? Interactive by default (false).

accessPolicies object
4 nested properties
apply boolean

Should access policies be enforced?

location string | boolean | number | integer | null
database string | boolean | number | integer | null
taxonomy string | boolean | number | integer | null
sparkScheduling object
4 nested properties
maxJobs integer

Max number of Spark jobs to run in parallel, default is 1

poolName string | boolean | number | integer | null
mode string | boolean | number | integer | null
file string | boolean | number | integer | null
udfs string | boolean | number | integer | null
expectations object
3 nested properties
path string | boolean | number | integer | null
active boolean

Should expectations be executed?

failOnError boolean

Should load/transform fail on expectation errors?

sqlParameterPattern string | boolean | number | integer | null
rejectAllOnError string | boolean | number | integer | null
rejectMaxRecords integer

Maximum number of records to reject when an error occurs. Default is 100

maxParCopy integer

Maximum number of parallel file copy operations during import. Default is 1

kafka object
4 nested properties
serverOptions Record<string, string | boolean | number | integer | null>

Map of string

topics Record<string, KafkaTopicConfigV1>

Map of topic name to topic configuration

cometOffsetsMode string | boolean | number | integer | null
customDeserializers Record<string, string | boolean | number | integer | null>

Map of string

dsvOptions Record<string, string | boolean | number | integer | null>

Map of string

forceViewPattern string | boolean | number | integer | null
forceDomainPattern string | boolean | number | integer | null
forceTablePattern string | boolean | number | integer | null
forceJobPattern string | boolean | number | integer | null
forceTaskPattern string | boolean | number | integer | null
useLocalFileSystem string | boolean | number | integer | null
sessionDurationServe string | boolean | number | integer | null
database string | boolean | number | integer | null
tenant string | boolean | number | integer | null
connectionRef string | boolean | number | integer | null
loadConnectionRef string | boolean | number | integer | null
transformConnectionRef string | boolean | number | integer | null
schedulePresets Record<string, string | boolean | number | integer | null>

Map of string

maxParTask integer

How many jobs to run simultaneously in dev mode (experimental).

refs RefV1[]

Reference mappings for resolving table references in SQL queries across different environments

dagRef object
2 nested properties
load string | boolean | number | integer | null
transform string | boolean | number | integer | null
forceHalt boolean

Force the application to stop even when there are pending threads.

jobIdEnvName string | boolean | number | integer | null
archiveTablePattern string | boolean | number | integer | null
archiveTable boolean

Enable table archiving before overwrite operations. Default is false

version string | boolean | number | integer | null
autoExportSchema boolean

Automatically export table schemas after load/transform operations. Default is false

longJobTimeoutMs integer

Timeout in milliseconds for long-running jobs. Default is 3600000 (1 hour)

shortJobTimeoutMs integer

Timeout in milliseconds for short-running jobs. Default is 300000 (5 minutes)

createSchemaIfNotExists boolean

Automatically create database schema/dataset if it does not exist. Default is true

http object
2 nested properties
interface string | boolean | number | integer | null
port integer

Port number for the HTTP server. Default is 8080

timezone string | boolean | number | integer | null
maxInteractiveRecords integer

Maximum number of records to return in interactive query mode. Default is 1000

duckdbMode boolean

Is DuckDB mode active?

duckdbExtensions string

Comma-separated list of DuckDB extensions to load. Default is spatial, json, httpfs.

duckdbPath string | boolean | number | integer | null
testCsvNullString string | boolean | number | integer | null
hiveInTest string | boolean | number | integer | null
spark object

Map of string

extra object

Map of string

duckDbEnableExternalAccess boolean

Allow DuckDB to load/save data from/to external sources. Defaults to true.

syncSqlWithYaml boolean

Update attributes in the YAML file when the SQL is updated. Defaults to true.

syncYamlWithDb boolean

Update the database from the YAML when a transform is run. Defaults to true.

onExceptionRetries integer

Number of retries on transient exceptions

pythonLibsDir string

Directory containing Python libraries to use instead of pip install.

gizmosql object
2 nested properties
url string

Gizmo server URL. Default is 'http://localhost:10900'

apiKey string

API key for authenticating with the Gizmo server