Starlake Data Pipeline
| Type | object |
|---|---|
| File match | `*.sl.yml` |
| Schema URL | https://catalog.lintel.tools/schemas/schemastore/starlake-data-pipeline/latest.json |
| Source | https://www.schemastore.org/starlake.json |
Validate with Lintel:

```sh
npx @lintel/lintel check
```
JSON Schema for Starlake Data Pipeline
Properties
All of
Definitions
DDL used to create a table
Custom type definition. Custom types are defined in the types/types.sl.yml file
Define the value type
Map of string
First and last char positions of an attribute in a fixed length record
Zero based position of the first character for this attribute
Zero based position of the last character to include in this attribute
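These two fields describe attributes in POSITION (fixed-length) records. A hedged sketch of what such an attribute might look like in a `*.sl.yml` file (the `position.first` / `position.last` key names follow the descriptions above; verify them against the schema):

```yaml
# Illustrative only: zero-based character positions in a fixed-length record.
attributes:
  - name: customer_id
    type: string
    position:
      first: 0   # position of the first character
      last: 9    # position of the last character, inclusive
  - name: birth_date
    type: date
    position:
      first: 10
      last: 17
```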
Connection properties to a data warehouse.
Map of string
Dag configuration.
Map of string
Row level security policy to apply to the output data.
Users, groups, and service accounts to which this security level is applied, e.g. user:[email protected],group:[email protected],serviceAccount:[email protected]
Column level security policy to apply to the attribute.
Users, groups, and service accounts to which this security level is applied, e.g. user:[email protected],group:[email protected],serviceAccount:[email protected]
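Both row- and column-level policies grant access using the `user:` / `group:` / `serviceAccount:` prefixes shown above. A minimal sketch, assuming an `rls` entry with `predicate` and `grants` keys (these key names are not confirmed by this page):

```yaml
rls:
  - name: eu_only
    predicate: "region = 'EU'"   # rows visible to the grantees below
    grants:
      - "user:[email protected]"
      - "group:[email protected]"
      - "serviceAccount:[email protected]"
```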
Map of string
Map of jdbc engines
Map of jdbc engines
Map of table ddl
Jdbc engine
Map of table ddl
How to quote identifiers
Override the default strategy builder used to write data. A strategy is a folder located under metadata/templates/write-strategies/[strategyBuilder]
When creating views, how they should be prefixed. Some databases, like Redshift, require view names to be prefixed with the character '#'; this is not required for databases like Snowflake or BigQuery. Default is the empty string.
SQL statements to execute immediately after the database connection is opened (e.g., SET commands)
keyword used to partition the table. Default is PARTITION BY
keyword used to cluster the table. Default is CLUSTER BY
How to get column remarks
How to get table remarks
Map of string
configure Spark internal options
Internal use. Do not modify.
Should audit logs be saved in batch or interactive mode when using BigQuery? Interactive by default (false).
Should access policies be enforced?
Max number of Spark jobs to run in parallel. Default is 1.
Should expectations be executed?
Should load / transform fail on expectation error?
Should load / transform fail on expectation error?
Max number of unique values accepted for a discrete column. Default is 10.
Should metrics be computed?
FS or BQ: List of attributes to use for clustering
BQ: Number of days before this table is set as expired and deleted. Never by default.
BQ: Should a partition filter be required on every request? No by default.
Table types supported by the Sink option
BQ: Enable automatic refresh of materialized view? False by default.
BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.
Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}
FS or BQ: List of partition attributes
When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.
Optional path attribute if you want to save the file outside of the default location (datasets folder)
Map of string
Write strategy type including custom strategies. Allows predefined strategies or custom strategy names
Map of connection type to write strategy. Allows different strategies per target database
List of columns to use as key(s) for the target table. This is used to update existing records in the target table.
DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-length format file where values are located at an exact position in each line.
- JSON_FLAT: For optimization purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT, which is much faster.
- XML: XML files
Are JSON objects on a single line or multiple lines? Single by default: false means single, and false is also faster.
Is the JSON stored as a single object array? False by default, which means one JSON document per line.
Does the dataset have a header? True by default.
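The format-related settings above can be sketched as a `metadata` block; the `multiline`, `array`, and `withHeader` key names are assumptions inferred from the descriptions and should be checked against the schema:

```yaml
metadata:
  format: "DSV"     # DSV, POSITION, JSON_FLAT, JSON or XML
  separator: ";"    # delimiter for DSV files
  withHeader: true  # the dataset has a header line
  multiline: false  # JSON objects on a single line (faster)
  array: false      # not a single object array: one JSON document per line
```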
15 nested properties
FS or BQ: List of attributes to use for clustering
BQ: Number of days before this table is set as expired and deleted. Never by default.
BQ: Should a partition filter be required on every request? No by default.
Table types supported by the Sink option
BQ: Enable automatic refresh of materialized view? False by default.
BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.
Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}
FS or BQ: List of partition attributes
When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.
Optional path attribute if you want to save the file outside of the default location (datasets folder)
Map of string
recognized filename extensions. json, csv, dsv, psv are recognized by default. Only files with these extensions will be moved to the stage folder.
Map of string
Treat empty columns as null in DSV files. Default to false
2 nested properties
8 nested properties
Write strategy type including custom strategies. Allows predefined strategies or custom strategy names
Map of connection type to write strategy. Allows different strategies per target database
List of columns to use as key(s) for the target table. This is used to update existing records in the target table.
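A write strategy combines a strategy type with the key columns used to match existing records in the target table. A hedged sketch (`UPSERT_BY_KEY` is one plausible predefined strategy name; check the schema's allowed values):

```yaml
writeStrategy:
  type: UPSERT_BY_KEY  # predefined or custom strategy name
  key:
    - id               # column(s) identifying existing records to update
```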
Table Schema definition.
Attributes parsing rules.
20 nested properties
DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-length format file where values are located at an exact position in each line.
- JSON_FLAT: For optimization purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT, which is much faster.
- XML: XML files
Are JSON objects on a single line or multiple lines? Single by default: false means single, and false is also faster.
Is the JSON stored as a single object array? False by default, which means one JSON document per line.
Does the dataset have a header? True by default.
15 nested properties
FS or BQ: List of attributes to use for clustering
BQ: Number of days before this table is set as expired and deleted. Never by default.
BQ: Should a partition filter be required on every request? No by default.
Table types supported by the Sink option
BQ: Enable automatic refresh of materialized view? False by default.
BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.
Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}
FS or BQ: List of partition attributes
When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.
Optional path attribute if you want to save the file outside of the default location (datasets folder)
Map of string
recognized filename extensions. json, csv, dsv, psv are recognized by default. Only files with these extensions will be moved to the stage folder.
Map of string
Treat empty columns as null in DSV files. Default to false
2 nested properties
8 nested properties
Write strategy type including custom strategies. Allows predefined strategies or custom strategy names
Map of connection type to write strategy. Allows different strategies per target database
List of columns to use as key(s) for the target table. This is used to update existing records in the target table.
attach streams to table (Snowflake only)
Reserved for future use.
List of SQL requests to execute after the table has been loaded.
Set of string to attach to this Schema
Row level security on this schema.
Expectations to check after Load / Transform has succeeded
List of columns that make up the primary key
Map of rolename -> List[Users].
Is this attribute an array/list of values? Default is false
Should this attribute always be present in the source. Default to true.
Used to compute metrics on column values.
List of sub-attributes (valid for JSON and XML files only)
First and last char positions of an attribute in a fixed length record
2 nested properties
Zero based position of the first character for this attribute
Zero based position of the last character to include in this attribute
Tags associated with this attribute
How to trim the input string
Should this attribute be ignored on ingestion? Defaults to false.
attach streams to task (Snowflake only)
List of columns that make up the primary key for the output table
List of columns used for partitioning the output.
List of SQL requests to execute before the main SQL request is run
List of SQL requests to execute after the main SQL request is run
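Pre/post SQL lists bracket the task's main request. A hypothetical task sketch (the `presql` / `sql` / `postsql` key names follow the descriptions above and should be verified against the schema):

```yaml
task:
  name: customer_stats
  presql:
    - "DELETE FROM audit.tmp_customer_stats"        # runs before the main SQL
  sql: "SELECT id, COUNT(*) AS orders FROM sales GROUP BY id"
  postsql:
    - "GRANT SELECT ON customer_stats TO reporting" # runs after the main SQL
```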
15 nested properties
FS or BQ: List of attributes to use for clustering
BQ: Number of days before this table is set as expired and deleted. Never by default.
BQ: Should a partition filter be required on every request? No by default.
Table types supported by the Sink option
BQ: Enable automatic refresh of materialized view? False by default.
BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.
Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}
FS or BQ: List of partition attributes
When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.
Optional path attribute if you want to save the file outside of the default location (datasets folder)
Map of string
Expectations to check after Load / Transform has succeeded
Map of rolename -> List[Users].
2 nested properties
Attributes
Set of string to attach to the output table
8 nested properties
Write strategy type including custom strategies. Allows predefined strategies or custom strategy names
Map of connection type to write strategy. Allows different strategies per target database
List of columns to use as key(s) for the target table. This is used to update existing records in the target table.
Number of milliseconds before a communication timeout.
Should we parse this SQL and have it update the table according to the write strategy, or just execute it?
Used when the default connection ref present in the application.sl.yml file is not the one to use to run the SQL request for this task.
Should this YAML table schema be synchronized with the source table?
Dataset triggering strategy to determine when this task should be executed based on dataset changes. The & and | operators are allowed: (dataset1 & dataset2) | dataset3
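The triggering expression combines dataset names with `&` (and) and `|` (or). For illustration only; the exact property holding this expression is not named on this page:

```yaml
# Hypothetical: run once both sales and refunds have changed, or when fx_rates changes.
trigger: "(sales & refunds) | fx_rates"
```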
reserved
Default 5 seconds
Default 5 seconds
15 nested properties
FS or BQ: List of attributes to use for clustering
BQ: Number of days before this table is set as expired and deleted. Never by default.
BQ: Should a partition filter be required on every request? No by default.
Table types supported by the Sink option
BQ: Enable automatic refresh of materialized view? False by default.
BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.
Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}
FS or BQ: List of partition attributes
When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.
Optional path attribute if you want to save the file outside of the default location (datasets folder)
Map of string
Create individual entry for each ingested file instead of a global one. Default: false
Enable or disable audit logging. Default is true
A schema in JDBC database or a folder in HDFS or a dataset in BigQuery.
20 nested properties
DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-length format file where values are located at an exact position in each line.
- JSON_FLAT: For optimization purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT, which is much faster.
- XML: XML files
Are JSON objects on a single line or multiple lines? Single by default: false means single, and false is also faster.
Is the JSON stored as a single object array? False by default, which means one JSON document per line.
Does the dataset have a header? True by default.
15 nested properties
FS or BQ: List of attributes to use for clustering
BQ: Number of days before this table is set as expired and deleted. Never by default.
BQ: Should a partition filter be required on every request? No by default.
Table types supported by the Sink option
BQ: Enable automatic refresh of materialized view? False by default.
BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.
Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}
FS or BQ: List of partition attributes
When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.
Optional path attribute if you want to save the file outside of the default location (datasets folder)
Map of string
recognized filename extensions. json, csv, dsv, psv are recognized by default. Only files with these extensions will be moved to the stage folder.
Map of string
Treat empty columns as null in DSV files. Default to false
2 nested properties
8 nested properties
Write strategy type including custom strategies. Allows predefined strategies or custom strategy names
Map of connection type to write strategy. Allows different strategies per target database
List of columns to use as key(s) for the target table. This is used to update existing records in the target table.
Set of string to attach to this domain
27 nested properties
attach streams to task (Snowflake only)
List of columns that make up the primary key for the output table
List of columns used for partitioning the output.
List of SQL requests to execute before the main SQL request is run
List of SQL requests to execute after the main SQL request is run
15 nested properties
FS or BQ: List of attributes to use for clustering
BQ: Number of days before this table is set as expired and deleted. Never by default.
BQ: Should a partition filter be required on every request? No by default.
Table types supported by the Sink option
BQ: Enable automatic refresh of materialized view? False by default.
BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.
Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}
FS or BQ: List of partition attributes
When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.
Optional path attribute if you want to save the file outside of the default location (datasets folder)
Map of string
Expectations to check after Load / Transform has succeeded
Map of rolename -> List[Users].
2 nested properties
Attributes
Set of string to attach to the output table
8 nested properties
Write strategy type including custom strategies. Allows predefined strategies or custom strategy names
Map of connection type to write strategy. Allows different strategies per target database
List of columns to use as key(s) for the target table. This is used to update existing records in the target table.
Number of milliseconds before a communication timeout.
Should we parse this SQL and have it update the table according to the write strategy, or just execute it?
Used when the default connection ref present in the application.sl.yml file is not the one to use to run the SQL request for this task.
Should this YAML table schema be synchronized with the source table?
Dataset triggering strategy to determine when this task should be executed based on dataset changes. The & and | operators are allowed: (dataset1 & dataset2) | dataset3
List of columns to extract. All columns by default.
Number of data partitions to create. Scope: Data extraction.
Map of string
Number of rows to be fetched from the database when additional rows are needed. By default, most JDBC drivers use a fetch size of 10, so if you are reading 1000 objects, increasing the fetch size to 256 can significantly reduce the time required to fetch the query's results. The optimal fetch size is not always obvious. Scope: Data extraction.
If true, extract all data from the table. Scope: Data extraction.
Output configuration for a domain
If true, writes the names of columns as the first line.
One or many of the predefined table types. Scope: Schema and Data extraction.
How to trim the input string
Number of data partitions to create. Scope: Data extraction.
Map of string
Number of rows to be fetched from the database when additional rows are needed. By default, most JDBC drivers use a fetch size of 10, so if you are reading 1000 objects, increasing the fetch size to 256 can significantly reduce the time required to fetch the query's results. The optimal fetch size is not always obvious. Scope: Data extraction.
Define whether we should fetch the entire table or not. If not, the maximum value of partitionColumn seen during the last extraction is used to fetch incremental data. Scope: Data extraction.
Sanitize the domain's name by keeping alphanumeric characters only. Scope: Schema and Data extraction.
One or many of the predefined table types. Scope: Schema and Data extraction.
How to trim the input string
Number of data partitions to create. Scope: Data extraction.
Map of string
Number of rows to be fetched from the database when additional rows are needed. By default, most JDBC drivers use a fetch size of 10, so if you are reading 1000 objects, increasing the fetch size to 256 can significantly reduce the time required to fetch the query's results. The optimal fetch size is not always obvious. Scope: Data extraction.
Define whether we should fetch the entire table or not. If not, the maximum value of partitionColumn seen during the last extraction is used to fetch incremental data. Scope: Data extraction.
Sanitize the domain's name by keeping alphanumeric characters only. Scope: Schema and Data extraction.
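The extraction options above might combine as follows; every key name here (`jdbcSchemas`, `fullExport`, `fetchSize`, ...) is an assumption inferred from the descriptions and should be checked against the schema:

```yaml
extract:
  jdbcSchemas:
    - schema: "sales"
      sanitizeName: true        # keep alphanumeric characters only
      tables:
        - name: "orders"
          fullExport: false     # incremental: resume from max(partitionColumn)
          partitionColumn: "order_id"
          numPartitions: 4      # number of data partitions to create
          fetchSize: 1000       # larger fetch sizes speed up big reads
```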
List of regexes used to include OpenAPI schemas (#/components/schemas). Defaults to ['.*']. 'Includes' is evaluated before 'excludes'.
List of regexes used to exclude OpenAPI schemas (#/components/schemas). Defaults to [].
Explode the route's object into more object definitions. Uses the object's path with the route path as the final name. Defaults to ALL
Filter out on field path. Each field is separated by _. Defaults to []
Regex applied on the object path. If it matches, use the given name; otherwise fall back to route_path + object path as the final name
List of operations to retrieve schemas from. Defaults to ['GET']. Supported values are GET and POST.
List of regexes used to exclude fields. Fields and their subfields are separated by _.
3 nested properties
Explode the route's object into more object definitions. Uses the object's path with the route path as the final name. Defaults to ALL
Filter out on field path. Each field is separated by _. Defaults to []
Regex applied on the object path. If it matches, use the given name; otherwise fall back to route_path + object path as the final name
2 nested properties
List of regexes used to include OpenAPI schemas (#/components/schemas). Defaults to ['.*']. 'Includes' is evaluated before 'excludes'.
List of regexes used to exclude OpenAPI schemas (#/components/schemas). Defaults to [].
Describe what to fetch from data connection. Scope: Schema and Data extraction.
Map of string
Describe what to fetch from data connection. Scope: Schema and Data extraction.
Input for ref object
Output for ref object
Describe how to resolve a reference in a transform task
Input for ref object
3 nested properties
Output for ref object
3 nested properties
Maximum number of records to read from the topic in a single batch. Default is unlimited
List of fields to extract from Kafka messages
Number of partitions for the Kafka topic when creating it
Replication factor for the Kafka topic when creating it
Map of string
Map of string
HTTP headers to include when accessing Kafka via HTTP proxy
Map of string
Map of topic name to topic configuration
Map of string
Gizmo server URL. Default is 'http://localhost:10900'
API key for authenticating with the Gizmo server
Port number for the HTTP server. Default is 8080
Pre-compute incoming partitions to prune partitions on merge statement
3 nested properties
Max number of unique values accepted for a discrete column. Default is 10.
Should metrics be computed?
Validate the YAML file when loading it. If set to true, fails on any error.
Add the value along with the rejection error. Not enabled by default for security reasons. Default: false
10 nested properties
15 nested properties
FS or BQ: List of attributes to use for clustering
BQ: Number of days before this table is set as expired and deleted. Never by default.
BQ: Should a partition filter be required on every request? No by default.
Table types supported by the Sink option
BQ: Enable automatic refresh of materialized view? False by default.
BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.
Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}
FS or BQ: List of partition attributes
When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.
Optional path attribute if you want to save the file outside of the default location (datasets folder)
Map of string
Create individual entry for each ingested file instead of a global one. Default: false
Enable or disable audit logging. Default is true
Should ingested files be archived after ingestion?
Should invalid records be stored in a replay file?
4 nested properties
reserved
Default 5 seconds
Default 5 seconds
Output files in CSV format? Default is false.
Only generate privacy tasks. Reserved for internal use
Should empty strings be considered as null values?
Should we load all files targeting the same table in a single task, or one by one?
Maximum number of files to be stored in the same table in a single task
7 nested properties
Map of string
Map of jdbc engines
Map of jdbc engines
1 nested property
Map of string
configure Spark internal options
5 nested properties
Internal use. Do not modify.
Should audit logs be saved in batch or interactive mode when using BigQuery? Interactive by default (false).
4 nested properties
Should access policies be enforced?
4 nested properties
Max number of Spark jobs to run in parallel. Default is 1.
3 nested properties
Should expectations be executed?
Should load / transform fail on expectation error?
Maximum number of records to reject when an error occurs. Default is 100
Maximum number of parallel file copy operations during import. Default is 1
4 nested properties
Map of string
Map of topic name to topic configuration
Map of string
Map of string
Map of string
How many jobs to run simultaneously in dev mode (experimental)
Reference mappings for resolving table references in SQL queries across different environments
2 nested properties
Force the application to stop even when there are pending threads.
Enable table archiving before overwrite operations. Default is false
Automatically export table schemas after load/transform operations. Default is false
Timeout in milliseconds for long-running jobs. Default is 3600000 (1 hour)
Timeout in milliseconds for short-running jobs. Default is 300000 (5 minutes)
Automatically create database schema/dataset if it does not exist. Default is true
2 nested properties
Port number for the HTTP server. Default is 8080
Maximum number of records to return in interactive query mode. Default is 1000
Is DuckDB mode active?
Comma-separated list of DuckDB extensions to load. Default is spatial, json, httpfs
Map of string
Map of string
Allow DuckDB to load / save data from / to external sources. Defaults to true.
Update attributes in the YAML file when the SQL is updated. Defaults to true.
Update the database when the YAML transform is run. Defaults to true.
Number of retries on transient exceptions
Directory containing python libraries to use instead of pip install
2 nested properties
Gizmo server URL. Default is 'http://localhost:10900'
API key for authenticating with the Gizmo server
Dag configuration.
4 nested properties
Map of string
A schema in JDBC database or a folder in HDFS or a dataset in BigQuery.
6 nested properties
20 nested properties
DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-length format file where values are located at an exact position in each line.
- JSON_FLAT: For optimization purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT, which is much faster.
- XML: XML files
Are JSON objects on a single line or multiple lines? Single by default: false means single, and false is also faster.
Is the JSON stored as a single object array? False by default, which means one JSON document per line.
Does the dataset have a header? True by default.
recognized filename extensions. json, csv, dsv, psv are recognized by default. Only files with these extensions will be moved to the stage folder.
Map of string
Treat empty columns as null in DSV files. Default to false
Set of string to attach to this domain
4 nested properties
27 nested properties
attach streams to task (Snowflake only)
List of columns that make up the primary key for the output table
List of columns used for partitioning the output.
List of SQL requests to execute before the main SQL request is run
List of SQL requests to execute after the main SQL request is run
Expectations to check after Load / Transform has succeeded
Map of rolename -> List[Users].
Attributes
Set of string to attach to the output table
Number of milliseconds before a communication timeout.
Should we parse this SQL and have it update the table according to the write strategy, or just execute it?
Used when the default connection ref present in the application.sl.yml file is not the one to use to run the SQL request for this task.
Should this YAML table schema be synchronized with the source table?
Dataset triggering strategy to determine when this task should be executed based on dataset changes. The & and | operators are allowed: (dataset1 & dataset2) | dataset3
27 nested properties
attach streams to task (Snowflake only)
List of columns that make up the primary key for the output table
List of columns used for partitioning the output.
List of SQL requests to execute before the main SQL request is run
List of SQL requests to execute after the main SQL request is run
15 nested properties
FS or BQ: List of attributes to use for clustering
BQ: Number of days before this table is set as expired and deleted. Never by default.
BQ: Should a partition filter be required on every request? No by default.
Table types supported by the Sink option
BQ: Enable automatic refresh of materialized view? False by default.
BQ: Refresh interval in milliseconds. Defaults to the BigQuery default value.
Columns to use for sharding. The table will be named table_{sharding(0)}_{sharding(1)}
FS or BQ: List of partition attributes
When outputting files, should we coalesce them into a single file? Useful when CSV is the output format.
Optional path attribute if you want to save the file outside of the default location (datasets folder)
Map of string
Expectations to check after Load / Transform has succeeded
Map of rolename -> List[Users].
2 nested properties
Attributes
Set of string to attach to the output table
8 nested properties
Write strategy type including custom strategies. Allows predefined strategies or custom strategy names
Map of connection type to write strategy. Allows different strategies per target database
List of columns to use as key(s) for the target table. This is used to update existing records in the target table.
Number of milliseconds before a communication timeout.
Should we parse this SQL and have it update the table according to the write strategy, or just execute it?
Used when the default connection ref present in the application.sl.yml file is not the one to use to run the SQL request for this task.
Should this YAML table schema be synchronized with the source table?
Dataset triggering strategy to determine when this task should be executed based on dataset changes. The & and | operators are allowed: (dataset1 & dataset2) | dataset3
Map of string
Table Schema definition.
17 nested properties
Attributes parsing rules.
20 nested properties
DSV by default. Supported file formats are:
- DSV: Delimiter-separated values file. The delimiter value is specified in the "separator" field.
- POSITION: Fixed-length format file where values are located at an exact position in each line.
- JSON_FLAT: For optimization purposes, we differentiate JSON with top-level values from JSON with deep-level fields. JSON_FLAT files are JSON files with top-level fields only.
- JSON: Deep JSON file. Use only when your JSON documents contain sub-documents; otherwise prefer JSON_FLAT, which is much faster.
- XML: XML files
Are JSON objects on a single line or multiple lines? Single by default: false means single, and false is also faster.
Is the JSON stored as a single object array? False by default, which means one JSON document per line.
Does the dataset have a header? True by default.
recognized filename extensions. json, csv, dsv, psv are recognized by default. Only files with these extensions will be moved to the stage folder.
Map of string
Treat empty columns as null in DSV files. Default to false
attach streams to table (Snowflake only)
Reserved for future use.
List of SQL requests to execute after the table has been loaded.
Set of string to attach to this Schema
Row level security on this schema.
Expectations to check after Load / Transform has succeeded
List of columns that make up the primary key
Map of rolename -> List[Users].
90 nested properties
Pre-compute incoming partitions to prune partitions on merge statement
3 nested properties
Max number of unique values accepted for a discrete column. Default is 10.
Should metrics be computed?
Validate the YAML file when loading it. If set to true, fails on any error.
Add the value along with the rejection error. Not enabled by default for security reasons. Default: false
10 nested properties
Create individual entry for each ingested file instead of a global one. Default: false
Enable or disable audit logging. Default is true
Should ingested files be archived after ingestion?
Should invalid records be stored in a replay file?
4 nested properties
reserved
Default 5 seconds
Default 5 seconds
Output files in CSV format? Default is false.
Only generate privacy tasks. Reserved for internal use
Should empty strings be considered as null values?
Should we load all files targeting the same table in a single task, or one by one?
Maximum number of files to be stored in the same table in a single task
7 nested properties
Map of string
Map of jdbc engines
Map of jdbc engines
1 nested property
Map of string
configure Spark internal options
5 nested properties
Internal use. Do not modify.
Should audit logs be saved in batch or interactive mode when using BigQuery? Interactive by default (false).
4 nested properties
Should access policies be enforced?
4 nested properties
Max number of Spark jobs to run in parallel. Default is 1.
3 nested properties
Should expectations be executed?
Should load / transform fail on expectation error?
Maximum number of records to reject when an error occurs. Default is 100
Maximum number of parallel file copy operations during import. Default is 1
4 nested properties
Map of string
Map of topic name to topic configuration
Map of string
Map of string
Map of string
How many jobs to run simultaneously in dev mode (experimental)
Reference mappings for resolving table references in SQL queries across different environments
2 nested properties
Force the application to stop even when there are pending threads.
Enable table archiving before overwrite operations. Default is false
Automatically export table schemas after load/transform operations. Default is false
Timeout in milliseconds for long-running jobs. Default is 3600000 (1 hour)
Timeout in milliseconds for short-running jobs. Default is 300000 (5 minutes)
Automatically create database schema/dataset if it does not exist. Default is true
2 nested properties
Port number for the HTTP server. Default is 8080
Maximum number of records to return in interactive query mode. Default is 1000
Is DuckDB mode active?
Comma-separated list of DuckDB extensions to load. Default is spatial, json, httpfs
Map of string
Map of string
Allow DuckDB to load / save data from / to external sources. Defaults to true.
Update attributes in the YAML file when the SQL is updated. Defaults to true.
Update the database when the YAML transform is run. Defaults to true.
Number of retries on transient exceptions
Directory containing python libraries to use instead of pip install
2 nested properties
Gizmo server URL. Default is 'http://localhost:10900'
API key for authenticating with the Gizmo server