Reference Guide
Metadata Tables
PROCESS
This is the table where you define information about a data pipeline, e.g., the name of the pipeline and other details.
Column Name | Description |
---|---|
PROCESS_ID | Sequentially generated ID. This is defined as an auto-increment column on the table, so it does not need to be included in DMLs |
PROCESS_NAME | Enter the name of the Data Pipeline. This value is passed as a parameter when TiPS is executed. |
PROCESS_DESCRIPTION | Description of the Data Pipeline. This is an optional field, but it is good practice to provide a proper description of the pipeline so that others can easily understand it
ACTIVE | Active (Y) / Inactive (N) flag. When set to N (inactive), the data pipeline is disabled and TiPS will not execute it
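For illustration, a new pipeline can be registered with a single INSERT; the pipeline name and description below are hypothetical, and PROCESS_ID is omitted because it auto-increments.

```sql
-- Hypothetical example: register a new data pipeline.
-- PROCESS_ID is auto-incremented, so it is not included in the DML.
INSERT INTO PROCESS (PROCESS_NAME, PROCESS_DESCRIPTION, ACTIVE)
VALUES ('PUBLISH_CUSTOMER', 'Publishes customer data to the reporting layer', 'Y');
```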
PROCESS_CMD
This is the table where you populate information about each step in a data pipeline. There are several columns in this table, some of which are specific to particular command types; of those, some are mandatory and some are optional. Further details about command types are documented below.
Column Name | Description |
---|---|
PROCESS_ID | Foreign key to PROCESS_ID in the PROCESS table. This needs to match the ID of the data pipeline defined in the PROCESS table |
PROCESS_CMD_ID | Manually assigned sequential ID for the Data Pipeline Step. When TiPS is executed, the steps of a data pipeline run serially in sequential order, and the order of the steps is determined by this ID. It is advisable to leave gaps between the IDs when setting up the data pipeline initially, so that if there is a need to add an interim step later, it can be inserted in between without needing to reassign all IDs. E.g., 10,20,30,40... or 100,200,300,400... can be assigned initially, which leaves sufficient gaps for steps to be inserted in between later, if need be. Also, in a big team where multiple data engineers might be working on the same data pipeline in the same development iteration (sprint), a mechanism needs to be in place so that data engineers don't end up using overlapping IDs |
CMD_TYPE | This describes the operation type, e.g., APPEND / REFRESH / DELETE / TRUNCATE etc. Further details are given below |
CMD_SRC | Specify the name of the data source here. This is usually a table or a view that encapsulates the transformation business logic. |
CMD_TGT | Specify the name of the target destination for the data here. This is usually a table. |
CMD_WHERE | Here you can specify a filter clause that gets added to the source as a WHERE clause at run time. The WHERE keyword should not be included, and numbered bind variables can be used, for which the actual bind replacements are listed in CMD_BINDS. E.g., "C_MKTSEGMENT = :2 AND COBID = :1" |
CMD_BINDS | If bind variables are used in the step, here you specify the names of the bind variables, delimited by the pipe "|" symbol. Bind variable values are passed in at runtime in JSON format. The names of the bind variables defined here are interpreted as keys of the variable-values JSON object passed in at run time |
REFRESH_TYPE | This is only applicable for the REFRESH command type. Acceptable values are DI/TI/OI. DI (Delete Insert) - before inserting the data into the target, a DELETE command is run (optionally with a filter clause added through CMD_WHERE). TI (Truncate Insert) - before inserting the data into the target, a TRUNCATE command is run on the target. OI (Overwrite Insert) - before inserting the data into the target, any existing data is removed from the target. This works similarly to truncate, with the caveat that TRUNCATE is a DDL command that commits the open transaction, whereas OVERWRITE does not commit immediately after the delete, so the table is rolled back to its previous state if the INSERT DML throws an error. |
BUSINESS_KEY | This is only applicable for the PUBLISH_SCD2_DIM command type. It is the column(s), delimited by the pipe "|" symbol, that define a business key (also referred to as a natural key) for a dimension table. For a slowly changing dimension, this is the combination of key columns that uniquely identifies a row in the dimension (not a surrogate key), excluding record effective dates and/or the current record flag |
MERGE_ON_FIELDS | This is only applicable for the MERGE command type. Here you specify the columns to be used in the ON join clause of the generated MERGE SQL. Multiple fields are delimited by the pipe "|" symbol |
GENERATE_MERGE_MATCHED_CLAUSE | This is only applicable for the MERGE command type. Here you specify whether a WHEN MATCHED clause is to be included in the generated MERGE DML. Accepted values are Y/N. When Y is selected, the WHEN MATCHED clause is generated, which runs an UPDATE operation |
GENERATE_MERGE_NON_MATCHED_CLAUSE | This is only applicable for the MERGE command type. Here you specify whether a WHEN NOT MATCHED clause is to be included in the generated MERGE DML. Accepted values are Y/N. When Y is selected, the WHEN NOT MATCHED clause is generated, which runs an INSERT operation |
ADDITIONAL_FIELDS | This is where you specify any additional columns/fields to be added to the generated SELECT clause from the source, which are not available in the source. E.g., TO_NUMBER(:1) COBID would add a column to the generated SELECT statement, transforming the bind variable value passed in at run time and aliasing it as COBID, which would then become part of the INSERT/MERGE statement |
TEMP_TABLE | Acceptable values - Y/N/NULL. When set to Y, this triggers creation of a temporary table with the same name as the target, in the same schema as the target, before the operation of the step is run. This feature utilises Snowflake functionality whereby a permanent (or transient) table and a temporary table can have the same name, with the temporary table taking priority in the current session. In TiPS, this allows a data pipeline to be run concurrently within multiple sessions, each with its own bind variables, with datasets consistently transformed and published at session level |
CMD_PIVOT_BY | This is only applicable for COPY_INTO_FILE command type. If you are looking to apply a PIVOT in the SQL query in the COPY INTO statement, then this can be used. This field will dictate which column's values are to be pivoted e.g., REPORTING_CCY |
CMD_PIVOT_FIELD | This is only applicable for the COPY_INTO_FILE command type. If you are looking to apply a PIVOT in the SQL query in the COPY INTO statement, then this can be used. This field is the aggregate expression applied to the pivoted values, e.g., SUM(REPORTING_CURRENCY) |
ACTIVE | Active (Y) / Inactive (N) flag. When set to N (inactive), the data pipeline step is disabled and TiPS will skip that step while executing the pipeline |
FILE_FORMAT_NAME | This option is applicable to COPY_INTO_FILE and COPY_INTO_TABLE command types. If a file format has been defined in the database, that can be used. |
COPY_INTO_FILE_PARITITION_BY | This option is applicable to COPY_INTO_FILE command type. This adds PARTITION BY clause in generated COPY INTO FILE command. COPY_INTO_FILE_PARITITION_BY field needs to be an SQL expression that outputs a string. The dataset specified by CMD_SRC will then be split into individual files based on the output of the expression. A directory will be created in the stage specified by CMD_TGT which will be named the same as the partition clause. The data will then be output into this location in the stage. |
COPY_AUTO_MAPPING | Only applicable for COPY_INTO_TABLE command type with CSV source files. Acceptable values - Y/N/NULL When set to 'Y', only fields that exist in both the source and target will be loaded. Source file must have a header. This option can be used to reorder fields from source file to target table or omit fields from the source. |
COPY_INTO_FORCE | Only applicable for COPY_INTO_TABLE command type. Acceptable values - Y/N/NULL When set to 'Y', this option will include the FORCE = TRUE option on COPY INTO commands. Data is loaded regardless of whether source file is unchanged or data already exists in target table. Can produce duplicate data in target table |
PARENT_PROCESS_CMD_ID | This is populated with the PROCESS_CMD_ID of the preceding step. Where the current step has multiple predecessors, pipe-delimited PROCESS_CMD_IDs should be entered. For steps with no preceding steps, "NONE" should be used as the DEFAULT value. |
WAREHOUSE_SIZE | If a step requires a different warehouse size to run instead of the default one used to execute the process, then the warehouse size (t-shirt size) can be specified here. Expected values are NULL, "XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL", "5XL", "6XL". At run time, the part of the default warehouse name (used to execute the process) following the last underscore is replaced with this value. This setting is only applicable when running the process in parallel mode. |
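As a sketch (the PROCESS_ID and the source/target object names are assumptions), a single APPEND step for the pipeline above could be registered as follows; at run time the bind values would be supplied as a JSON object whose keys match CMD_BINDS, e.g. {"COBID": "20231231", "MARKET_SEGMENT": "AUTOMOBILE"}.

```sql
-- Hypothetical example: one APPEND step, with gaps left in PROCESS_CMD_ID numbering.
INSERT INTO PROCESS_CMD
  (PROCESS_ID, PROCESS_CMD_ID, CMD_TYPE, CMD_SRC, CMD_TGT,
   CMD_WHERE, CMD_BINDS, ADDITIONAL_FIELDS, ACTIVE, PARENT_PROCESS_CMD_ID)
VALUES
  (1, 10, 'APPEND', 'TRANSFORM.VW_CUSTOMER', 'PUBLISH.CUSTOMER',
   'C_MKTSEGMENT = :2', 'COBID|MARKET_SEGMENT', 'TO_NUMBER(:1) COBID', 'Y', 'NONE');
```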
PROCESS_LOG
This table holds logging information about each run of TiPS. This table is populated automatically at the end of execution of TiPS
Column Name | Description |
---|---|
PROCESS_LOG_ID | Sequentially generated ID |
PROCESS_NAME | Name of Data Pipeline that has been executed. |
PROCESS_LOG_CREATED_AT | Timestamp at which log records have been inserted to the table |
PROCESS_START_TIME | Timestamp of start of execution of Data Pipeline |
PROCESS_END_TIME | Timestamp of completion of execution of Data Pipeline |
PROCESS_ELAPSED_TIME_IN_SECONDS | Total time (in seconds) taken in execution of Data Pipeline |
EXECUTE_FLAG | Indicates whether the Data Pipeline was invoked with execution enabled. When EXECUTE_FLAG is passed as N, SQL statements are only generated and logged, not executed in the database |
STATUS | Status of Data Pipeline Execution |
ERROR_MESSAGE | If any steps errored, top level error message is populated here |
LOG_JSON | Complete Log information of Data Pipeline in JSON format. View VW_PROCESS_LOG displays flattened information of this column |
RUN_ID | Internal ID generated at process level. When the process is executed in parallel mode, each step creates a record in PROCESS_LOG with a different PROCESS_LOG_ID but the same RUN_ID |
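To review recent runs of a pipeline, the log can be queried directly (or through VW_PROCESS_LOG for the flattened step-level detail); the pipeline name below is illustrative.

```sql
-- List the most recent executions of a pipeline (name is illustrative).
SELECT PROCESS_LOG_ID, RUN_ID, STATUS, PROCESS_START_TIME,
       PROCESS_ELAPSED_TIME_IN_SECONDS, ERROR_MESSAGE
FROM   PROCESS_LOG
WHERE  PROCESS_NAME = 'PUBLISH_CUSTOMER'
ORDER BY PROCESS_START_TIME DESC
LIMIT 10;
```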
PROCESS_DQ_TEST
This table is populated with data relating to Data Quality Tests. This table is shipped with some standard DQ Test definitions.
Column Name | Description |
---|---|
PROCESS_DQ_TEST_ID | Sequentially generated ID |
PROCESS_DQ_TEST_NAME | Uniquely identifiable Name for Data Quality Test |
PROCESS_DQ_TEST_DESCRIPTION | Descriptive information about Data Quality Test |
PROCESS_DQ_TEST_QUERY_TEMPLATE | Query template to be used when running Data Quality Test. Identifiers within curly braces {} are replaced with actual values at run time. Bind variables can also be specified inside curly braces, e.g. {:1} , which is then replaced by QUERY_BINDS in PROCESS_CMD_TGT_DQ_TEST at runtime |
PROCESS_DQ_TEST_ERROR_MESSAGE | Error Message to display when Test fails |
ACTIVE | TRUE/FALSE. When FALSE, the data quality test will not run |
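As a rough sketch only (the placeholder identifiers are assumptions and may not match the shipped test definitions), a "not null"-style template could look like the following; a test query is expected to return the offending values, so an empty result means the test passed.

```sql
-- Hypothetical query template; {TGT_NAME} and {ATTRIBUTE_NAME} are assumed placeholder
-- names, replaced at run time with values from PROCESS_CMD_TGT_DQ_TEST.
SELECT {ATTRIBUTE_NAME}
FROM   {TGT_NAME}
WHERE  {ATTRIBUTE_NAME} IS NULL;
```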
PROCESS_CMD_TGT_DQ_TEST
This is the table that you populate to apply a predefined Data Quality Test to a target (table) and, additionally, an attribute (column).
Column Name | Description |
---|---|
PROCESS_CMD_TGT_DQ_TEST_ID | Sequentially generated ID |
TGT_NAME | Specify the name of the target on which the Data Quality Test is to be run. This is usually a table. This should match the target name (CMD_TGT) defined in the PROCESS_CMD table |
ATTRIBUTE_NAME | Enter column name on which Data Quality Test is to be run |
ACCEPTED_VALUES | For the "Accepted Values" test, this should contain comma-separated values that are acceptable in the target. |
ERROR_AND_ABORT | TRUE/FALSE, indicating whether the process (data pipeline) should raise an error and abort execution when this data quality test fails. When FALSE, the process just logs a warning and continues |
ACTIVE | TRUE/FALSE. When FALSE, the data quality test will not run |
QUERY_BINDS | Here you can enter any arbitrary bind values that you want to be used in the query template. Multiple values can be entered, delimited by pipe. Bind variables defined in PROCESS_DQ_TEST_QUERY_TEMPLATE in PROCESS_DQ_TEST are replaced by these values in order (starting from :1) at runtime |
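The assignments defined here can be reviewed with a simple query; the target name below is hypothetical, and the column that links each row to its definition in PROCESS_DQ_TEST is not listed in the table above, so it is omitted from the sketch.

```sql
-- List active DQ test assignments for a target (target name is hypothetical).
SELECT TGT_NAME, ATTRIBUTE_NAME, ACCEPTED_VALUES, ERROR_AND_ABORT, QUERY_BINDS
FROM   PROCESS_CMD_TGT_DQ_TEST
WHERE  TGT_NAME = 'PUBLISH.CUSTOMER'
AND    ACTIVE = TRUE;
```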
PROCESS_DQ_LOG
This table holds logging information about each Data Quality Test expected within a data pipeline when TiPS is run. This log is also associated with data in the PROCESS_LOG table.
Column Name | Description |
---|---|
PROCESS_DQ_LOG_ID | Sequentially generated ID |
PROCESS_LOG_ID | ID linking to record in PROCESS_LOG table |
TGT_NAME | Name of target on which data quality test was executed |
ATTRIBUTE_NAME | Attribute/Column on which data quality test was executed |
DQ_TEST_NAME | Data Quality Test Name |
DQ_TEST_QUERY | Transposed Query executed for the test |
DQ_TEST_RESULT | Array of values causing failure. For successful test, it should be an empty array [] |
START_TIME | Timestamp of start of execution of DQ Test Query |
END_TIME | Timestamp of completion of execution of DQ Test Query |
ELAPSED_TIME_IN_SECONDS | Total time (in seconds) taken in execution of DQ Test Query |
STATUS | Status [PASSED or ERROR or WARNING] of DQ Test |
STATUS_MESSAGE | Warning or Error Message returned |
RUN_ID | Internal ID generated at process level. When the process is executed in parallel mode, each step creates a record in PROCESS_LOG with a different PROCESS_LOG_ID but the same RUN_ID |
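Because PROCESS_LOG_ID links each DQ log record to its pipeline run, failed or warning tests can be pulled together with the run details; the filter is illustrative.

```sql
-- Join DQ results to the parent pipeline run and show failures/warnings.
SELECT l.PROCESS_NAME, d.TGT_NAME, d.ATTRIBUTE_NAME, d.DQ_TEST_NAME,
       d.STATUS, d.STATUS_MESSAGE, d.ELAPSED_TIME_IN_SECONDS
FROM   PROCESS_DQ_LOG d
JOIN   PROCESS_LOG l ON l.PROCESS_LOG_ID = d.PROCESS_LOG_ID
WHERE  d.STATUS IN ('ERROR', 'WARNING')
ORDER BY d.START_TIME DESC;
```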
Command Types
APPEND
This effectively generates an "INSERT INTO [target table] ([columns]) SELECT [columns] FROM [source]" statement, with a WHERE clause added if applicable.
Following are the fields applicable for APPEND command type:
Field Name | Mandatory? | Description |
---|---|---|
CMD_SRC | Yes | Specify the name of the data source here. This is usually a table or a view that encapsulates the transformation business logic. |
CMD_TGT | Yes | Specify the name of the target destination for the data here. This is usually a table. |
CMD_WHERE | No | Here you can specify a filter clause that gets added to the source as a WHERE clause at run time. The WHERE keyword should not be included, and numbered bind variables can be used, for which the actual bind replacements are listed in CMD_BINDS. E.g., "C_MKTSEGMENT = :2 AND COBID = :1" |
CMD_BINDS | No | If bind variables are used in the step, here you specify the names of the bind variables, delimited by the pipe "|" symbol. Bind variable values are passed in at runtime in JSON format. The names of the bind variables defined here are interpreted as keys of the variable-values JSON object passed in at run time |
ADDITIONAL_FIELDS | No | This is where you specify any additional columns/fields to be added to the generated SELECT clause from the source, which are not available in the source. E.g., TO_NUMBER(:1) COBID would add a column to the generated SELECT statement, transforming the bind variable value passed in at run time and aliasing it as COBID, which would then become part of the INSERT/MERGE statement |
TEMP_TABLE | No | Acceptable values - Y/N/NULL. When set to Y, this triggers creation of a temporary table with the same name as the target, in the same schema as the target, before the operation of the step is run. This feature utilises Snowflake functionality whereby a permanent (or transient) table and a temporary table can have the same name, with the temporary table taking priority in the current session. In TiPS, this allows a data pipeline to be run concurrently within multiple sessions, each with its own bind variables, with datasets consistently transformed and published at session level |
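Putting those fields together, the generated statement takes roughly the following shape; the object and column names are illustrative, and binds are resolved from CMD_BINDS at run time.

```sql
-- Illustrative shape of the SQL generated for an APPEND step.
INSERT INTO PUBLISH.CUSTOMER (C_CUSTKEY, C_NAME, C_MKTSEGMENT, COBID)
SELECT C_CUSTKEY, C_NAME, C_MKTSEGMENT,
       TO_NUMBER(:1) COBID              -- from ADDITIONAL_FIELDS
FROM   TRANSFORM.VW_CUSTOMER
WHERE  C_MKTSEGMENT = :2;               -- from CMD_WHERE
```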
COPY_INTO_FILE
This command type is for outputting data from a table/view into a file in an internal user stage, an internal named stage, or an external stage.
Following are the fields applicable for COPY_INTO_FILE command type:
Field Name | Mandatory? | Description |
---|---|---|
CMD_SRC | Yes | Specify the name of the data source here. This is usually a table or a view that provides the data in the desired form. |
CMD_TGT | Yes | This is the file location into which data from the source is output. If COPY_INTO_FILE_PARITITION_BY is not being used, then CMD_TGT should be the exact path and filename to be created. If COPY_INTO_FILE_PARITITION_BY is being used, then this field should only contain a stage and a path. For example, if the file should go into a user stage in the XYZ directory, it should be @~/XYZ. In both cases, it is permissible to use bind variables as part of the name, i.e., @~/:1/XYZ would create a directory based on the first bind variable followed by XYZ. |
CMD_WHERE | No | Here you can specify a filter clause that gets added to the source as a WHERE clause at run time. The WHERE keyword should not be included, and numbered bind variables can be used, for which the actual bind replacements are listed in CMD_BINDS. E.g., "C_MKTSEGMENT = :2 AND COBID = :1" |
CMD_BINDS | No | If bind variables are used in the step, here you specify the names of the bind variables, delimited by the pipe "|" symbol. Bind variable values are passed in at runtime in JSON format. The names of the bind variables defined here are interpreted as keys of the variable-values JSON object passed in at run time |
CMD_PIVOT_BY | No | If you are looking to apply a PIVOT in the SQL query in the COPY INTO statement, then this can be used. This field will dictate which column's values are to be pivoted e.g., REPORTING_CCY |
CMD_PIVOT_FIELD | No | If you are looking to apply a PIVOT in the SQL query in the COPY INTO statement, then this can be used. This field is the aggregate expression applied to the pivoted values, e.g., SUM(REPORTING_CURRENCY) |
FILE_FORMAT_NAME | No | If a file format has been defined in the database with all applicable configurations, then this field can be used. Please include the schema name with the file format name, e.g. [SCHEMA NAME].[FILE FORMAT NAME], all in CAPS. If this field is omitted, then file type "CSV" and compression "GZIP" are used by default |
COPY_INTO_FILE_PARITITION_BY | No | This field is to be populated when you want to apply a PARTITION BY clause in generated COPY INTO FILE command. COPY_INTO_FILE_PARITITION_BY field needs to be an SQL expression that outputs a string. The dataset specified by CMD_SRC will then be split into individual files based on the output of the expression. A directory will be created in the stage specified by CMD_TGT which will be named the same as the partition clause. The data will then be output into this location in the stage. |
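A generated command for this type might look roughly like the sketch below; the stage path, file format name, and partition expression are assumptions, with bind values shown already resolved.

```sql
-- Illustrative shape of a generated COPY INTO <location> command.
COPY INTO @~/EXTRACTS/20231231/PUBLISH_CUSTOMER
FROM (SELECT * FROM PUBLISH.VW_CUSTOMER_EXTRACT WHERE COBID = 20231231)
PARTITION BY ('REGION=' || REGION_CODE)                  -- from COPY_INTO_FILE_PARITITION_BY
FILE_FORMAT = (FORMAT_NAME = 'PUBLISH.CSV_GZIP_FORMAT'); -- from FILE_FORMAT_NAME
```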
COPY_INTO_TABLE
This command type is for loading data from a staged file into a database table. The staged file can be stored in an internal user stage, an internal named stage, or an external stage.
Following are the fields applicable for COPY_INTO_TABLE command type:
Field Name | Mandatory? | Description |
---|---|---|
CMD_SRC | Yes | This is the file location containing the data to be loaded. CMD_SRC should be the exact path and filename to be loaded. It is permissible to use bind variables as part of the name, i.e., @~/:1/XYZ/ABC.csv would resolve the path based on the first bind variable followed by XYZ. E.g., @tips/EXTRACTS/:1/PUBLISH_CUSTOMER/CUSTOMER.csv. When using a File Format, the extension can be omitted |
CMD_TGT | Yes | This is the name of the table into which data from the file is to be loaded. Please include the schema name along with the table name, all in CAPS |
CMD_BINDS | No | If bind variables are used in the step, here you specify the names of the bind variables, delimited by the pipe "|" symbol. Bind variable values are passed in at runtime in JSON format. The names of the bind variables defined here are interpreted as keys of the variable-values JSON object passed in at run time |
FILE_FORMAT_NAME | No | If a file format has been defined in the database with all applicable configurations, then this field can be used. Please include the schema name with the file format name, e.g. [SCHEMA NAME].[FILE FORMAT NAME], all in CAPS. If this field is omitted, then file type "CSV" and compression "GZIP" are used by default |
COPY_AUTO_MAPPING | No | Acceptable values - Y/N/NULL. When set to 'Y', only fields that exist in both the source file and target table will be loaded. This option can be used to reorder fields from the source file to the target table or omit fields from the source. **NOTE: Source file must be a CSV with a header.** |
COPY_INTO_FORCE | No | Acceptable values - Y/N/NULL. When set to 'Y', this option will include the FORCE = TRUE option on COPY INTO commands. Data is loaded regardless of whether the source file is unchanged or the data already exists in the target table. This can produce duplicate data in the target table |
ADDITIONAL_FIELDS | No | Specify any additional columns/fields to be added to the generated SELECT clause from the source file, that are not available in the source. E.g., TO_NUMBER(:1) COBID would add a column to the generated SELECT statement, transforming the bind variable value passed in at run time and aliasing it as COBID, which would then become part of the COPY INTO statement. **NOTE: Source file must be a CSV.** |
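The generated load might look roughly like the following; the stage path (with the bind already resolved) and file format name are assumptions.

```sql
-- Illustrative shape of a generated COPY INTO <table> command.
COPY INTO PUBLISH.CUSTOMER
FROM @tips/EXTRACTS/20231231/PUBLISH_CUSTOMER/CUSTOMER.csv
FILE_FORMAT = (FORMAT_NAME = 'PUBLISH.CSV_GZIP_FORMAT')
FORCE = TRUE;                       -- added only when COPY_INTO_FORCE = 'Y'
```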
DELETE
This effectively generates a "DELETE FROM [target table]" statement, with a WHERE clause added if applicable.
Following are the fields applicable for DELETE command type:
Field Name | Mandatory? | Description |
---|---|---|
CMD_TGT | Yes | Specify the name of the table from which data is to be deleted. Please include the schema name with the object name, e.g. [SCHEMA NAME].[OBJECT NAME], all in CAPS |
CMD_WHERE | No | Here you can specify a filter clause that gets added as a WHERE clause at run time. The WHERE keyword should not be included, and numbered bind variables can be used, for which the actual bind replacements are listed in CMD_BINDS. E.g., "C_MKTSEGMENT = :2 AND COBID = :1" |
CMD_BINDS | No | If bind variables are used in the step, here you specify the names of the bind variables, delimited by the pipe "|" symbol. Bind variable values are passed in at runtime in JSON format. The names of the bind variables defined here are interpreted as keys of the variable-values JSON object passed in at run time |
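The generated statement is of the following shape; the table name is illustrative and binds are resolved from CMD_BINDS at run time.

```sql
-- Illustrative shape of the SQL generated for a DELETE step.
DELETE FROM PUBLISH.CUSTOMER
WHERE C_MKTSEGMENT = :2 AND COBID = :1;   -- from CMD_WHERE
```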
DQ_TEST
This command type is for specifying Data Quality Tests within a data pipeline. All active Data Quality Tests defined on the target specified here are run within this step.
Following are the fields applicable for DQ_TEST command type:
Field Name | Mandatory? | Description |
---|---|---|
CMD_TGT | Yes | Specify the name of the table/view on which data quality tests are to be run. This can also accept multiple pipe-delimited targets, which gives the flexibility to run DQ Tests on multiple targets in a single step. Please include the schema name with the object name, e.g. [SCHEMA NAME].[OBJECT NAME], all in CAPS |
CMD_WHERE | No | Here you can specify a filter clause that gets added as a WHERE clause at run time. The WHERE keyword should not be included, and numbered bind variables can be used, for which the actual bind replacements are listed in CMD_BINDS. E.g., "C_MKTSEGMENT = :2 AND COBID = :1" |
CMD_BINDS | No | If bind variables are used in the step, here you specify the names of the bind variables, delimited by the pipe "|" symbol. Bind variable values are passed in at runtime in JSON format. The names of the bind variables defined here are interpreted as keys of the variable-values JSON object passed in at run time |
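For illustration, a hypothetical step that runs all active tests against two published tables in one go could be registered as follows; the IDs and object names are assumptions.

```sql
-- Hypothetical example: one DQ_TEST step covering two pipe-delimited targets.
INSERT INTO PROCESS_CMD
  (PROCESS_ID, PROCESS_CMD_ID, CMD_TYPE, CMD_TGT, ACTIVE, PARENT_PROCESS_CMD_ID)
VALUES
  (1, 40, 'DQ_TEST', 'PUBLISH.CUSTOMER|PUBLISH.ORDERS', 'Y', '30');
```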
MERGE
This command type is used when you want to generate a MERGE statement. It supports WHEN MATCHED, WHEN NOT MATCHED, or both.
Following are the fields applicable for MERGE command type:
Field Name | Mandatory? | Description |
---|---|---|
CMD_SRC | Yes | Specify the name of the data source here. This is usually a table or a view that encapsulates the transformation business logic. |
CMD_TGT | Yes | Specify the name of the target destination for the data here. This is usually a table. |
CMD_WHERE | No | Here you can specify a filter clause that gets added to the source as a WHERE clause at run time. The WHERE keyword should not be included, and numbered bind variables can be used, for which the actual bind replacements are listed in CMD_BINDS. E.g., "C_MKTSEGMENT = :2 AND COBID = :1" |
CMD_BINDS | No | If bind variables are used in the step, here you specify the names of the bind variables, delimited by the pipe "|" symbol. Bind variable values are passed in at runtime in JSON format. The names of the bind variables defined here are interpreted as keys of the variable-values JSON object passed in at run time |
MERGE_ON_FIELDS | Yes | Here you specify the columns to be used in the ON join clause of the generated MERGE SQL. Multiple fields are delimited by the pipe "|" symbol |
GENERATE_MERGE_MATCHED_CLAUSE | No | Acceptable values are Y/N. When set to Y, a "WHEN MATCHED UPDATE" clause is added to the generated MERGE statement. Either this field or GENERATE_MERGE_NON_MATCHED_CLAUSE should be set to Y |
GENERATE_MERGE_NON_MATCHED_CLAUSE | No | Acceptable values are Y/N. When set to Y, a "WHEN NOT MATCHED INSERT" clause is added to the generated MERGE statement. Either this field or GENERATE_MERGE_MATCHED_CLAUSE should be set to Y |
ADDITIONAL_FIELDS | No | This is where you specify any additional columns/fields to be added to the generated SELECT clause from the source, which are not available in the source. E.g., TO_NUMBER(:1) COBID would add a column to the generated SELECT statement, transforming the bind variable value passed in at run time and aliasing it as COBID, which would then become part of the INSERT/MERGE statement |
TEMP_TABLE | No | Acceptable values - Y/N/NULL. When set to Y, this triggers creation of a temporary table with the same name as the target, in the same schema as the target, before the operation of the step is run. This feature utilises Snowflake functionality whereby a permanent (or transient) table and a temporary table can have the same name, with the temporary table taking priority in the current session. In TiPS, this allows a data pipeline to be run concurrently within multiple sessions, each with its own bind variables, with datasets consistently transformed and published at session level |
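With both clause flags set to Y, the generated statement takes roughly the following shape; the object and column names are illustrative, and binds are resolved from CMD_BINDS at run time.

```sql
-- Illustrative shape of a generated MERGE statement.
MERGE INTO PUBLISH.CUSTOMER tgt
USING (SELECT * FROM TRANSFORM.VW_CUSTOMER WHERE C_MKTSEGMENT = :2) src
  ON  tgt.C_CUSTKEY = src.C_CUSTKEY              -- from MERGE_ON_FIELDS
WHEN MATCHED THEN UPDATE SET                     -- GENERATE_MERGE_MATCHED_CLAUSE = 'Y'
      tgt.C_NAME = src.C_NAME,
      tgt.C_MKTSEGMENT = src.C_MKTSEGMENT
WHEN NOT MATCHED THEN INSERT                     -- GENERATE_MERGE_NON_MATCHED_CLAUSE = 'Y'
      (C_CUSTKEY, C_NAME, C_MKTSEGMENT)
      VALUES (src.C_CUSTKEY, src.C_NAME, src.C_MKTSEGMENT);
```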
PUBLISH_SCD2_DIM
This command type is specifically created for populating data to SCD (Slowly Changing Dimension) Type 2, where updates to attributes of dimension are handled by creating a version of record with latest values and closing off previous version. This is done by setting appropriate values to "EFFECTIVE_START_DATE", "EFFECTIVE_END_DATE" and "IS_CURRENT_ROW" columns.
Following are the fields applicable for PUBLISH_SCD2_DIM command type:
Field Name | Mandatory? | Description |
---|---|---|
CMD_SRC | Yes | Specify the name of the data source here. This is usually a table or a view that encapsulates the transformation business logic. |
CMD_TGT | Yes | Specify the name of the target destination for the data here. This is usually a table. |
BUSINESS_KEY | Yes | It is the column(s), delimited by the pipe "|" symbol, that define a business key (also referred to as a natural key) for a dimension table. For a slowly changing dimension, this is the combination of key columns that uniquely identifies a row in the dimension (not a surrogate key), excluding record effective dates and/or the current record flag |
CMD_WHERE | No | Here you can specify a filter clause that gets added to the source as a WHERE clause at run time. The WHERE keyword should not be included, and numbered bind variables can be used, for which the actual bind replacements are listed in CMD_BINDS. E.g., "C_MKTSEGMENT = :2 AND COBID = :1" |
CMD_BINDS | No | If bind variables are used in the step, here you specify the names of the bind variables, delimited by the pipe "|" symbol. Bind variable values are passed in at runtime in JSON format. The names of the bind variables defined here are interpreted as keys of the variable-values JSON object passed in at run time |
ADDITIONAL_FIELDS | No | This is where you specify any additional columns/fields to be added to the generated SELECT clause from the source, which are not available in the source. E.g., TO_NUMBER(:1) COBID would add a column to the generated SELECT statement, transforming the bind variable value passed in at run time and aliasing it as COBID, which would then become part of the INSERT/MERGE statement |
TEMP_TABLE | No | Acceptable values - Y/N/NULL. When set to Y, this triggers creation of a temporary table with the same name as the target, in the same schema as the target, before the operation of the step is run. This feature utilises Snowflake functionality whereby a permanent (or transient) table and a temporary table can have the same name, with the temporary table taking priority in the current session. In TiPS, this allows a data pipeline to be run concurrently within multiple sessions, each with its own bind variables, with datasets consistently transformed and published at session level |
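As a sketch of the kind of target this command type maintains (all names and data types below are assumptions, apart from the three SCD2 control columns named above), a customer dimension keyed on a single business-key column might look like this:

```sql
-- Hypothetical SCD2 dimension target; BUSINESS_KEY for the step would be 'C_CUSTKEY'.
CREATE TABLE PUBLISH.DIM_CUSTOMER (
    DIM_CUSTOMER_KEY     NUMBER IDENTITY,   -- surrogate key
    C_CUSTKEY            NUMBER,            -- business (natural) key
    C_NAME               VARCHAR,
    C_MKTSEGMENT         VARCHAR,
    EFFECTIVE_START_DATE TIMESTAMP_NTZ,     -- set when a version becomes current
    EFFECTIVE_END_DATE   TIMESTAMP_NTZ,     -- set when a version is closed off
    IS_CURRENT_ROW       BOOLEAN            -- TRUE on the latest version of each key
);
```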
REFRESH
This command type is for running a DELETE/TRUNCATE SQL command on the target and then running an INSERT SQL command. The REFRESH command type supports "DELETE then INSERT", "OVERWRITE INSERT" or "TRUNCATE then INSERT", one of which should be specified via the "REFRESH_TYPE" field.
Following are the fields applicable for REFRESH command type:
Field Name | Mandatory? | Description |
---|---|---|
CMD_SRC | Yes | Specify the name of the data source here. This is usually a table or a view that encapsulates the transformation business logic. |
CMD_TGT | Yes | Specify the name of the target destination for the data here. This is usually a table. |
REFRESH_TYPE | Yes | Acceptable values are DI/TI/OI. DI (Delete Insert) - before inserting the data into the target, a DELETE command is run (optionally with a filter clause added through CMD_WHERE). TI (Truncate Insert) - before inserting the data into the target, a TRUNCATE command is run on the target. OI (Overwrite Insert) - before inserting the data into the target, any existing data is removed from the target. This works similarly to truncate, with the caveat that TRUNCATE is a DDL command that commits the open transaction, whereas OVERWRITE does not commit immediately after the delete, so the table is rolled back to its previous state if the INSERT DML throws an error. |
CMD_WHERE | No | Here you can specify a filter clause that gets added to the source as a WHERE clause at run time. The WHERE keyword should not be included, and numbered bind variables can be used, for which the actual bind replacements are listed in CMD_BINDS. E.g., "C_MKTSEGMENT = :2 AND COBID = :1" |
CMD_BINDS | No | If bind variables are used in the step, here you specify the names of the bind variables, delimited by the pipe "|" symbol. Bind variable values are passed in at runtime in JSON format. The names of the bind variables defined here are interpreted as keys of the variable-values JSON object passed in at run time |
ADDITIONAL_FIELDS | No | This is where you specify any additional columns/fields to be added to the generated SELECT clause from the source, which are not available in the source. E.g., TO_NUMBER(:1) COBID would add a column to the generated SELECT statement, transforming the bind variable value passed in at run time and aliasing it as COBID, which would then become part of the INSERT/MERGE statement |
TEMP_TABLE | No | Acceptable values - Y/N/NULL. When set to Y, this triggers creation of a temporary table with the same name as the target, in the same schema as the target, before the operation of the step is run. This feature utilises Snowflake functionality whereby a permanent (or transient) table and a temporary table can have the same name, with the temporary table taking priority in the current session. In TiPS, this allows a data pipeline to be run concurrently within multiple sessions, each with its own bind variables, with datasets consistently transformed and published at session level |
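Depending on REFRESH_TYPE, the step resolves roughly to one of the sequences below; object names are illustrative, the column lists TiPS would normally generate are simplified to SELECT *, and the OI variant is assumed here to behave like Snowflake's INSERT OVERWRITE.

```sql
-- DI (Delete Insert): the optional CMD_WHERE filter applies to the DELETE.
DELETE FROM PUBLISH.CUSTOMER WHERE COBID = :1;
INSERT INTO PUBLISH.CUSTOMER SELECT * FROM TRANSFORM.VW_CUSTOMER;

-- TI (Truncate Insert): TRUNCATE is DDL and commits the open transaction.
TRUNCATE TABLE PUBLISH.CUSTOMER;
INSERT INTO PUBLISH.CUSTOMER SELECT * FROM TRANSFORM.VW_CUSTOMER;

-- OI (Overwrite Insert): existing data is replaced as part of the insert itself,
-- so a failed INSERT leaves the table in its previous state.
INSERT OVERWRITE INTO PUBLISH.CUSTOMER SELECT * FROM TRANSFORM.VW_CUSTOMER;
```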
TRUNCATE
This command type is for running TRUNCATE SQL command on the defined target.
Following are the fields applicable for TRUNCATE command type:
Field Name | Mandatory? | Description |
---|---|---|
CMD_TGT | Yes | Specify the name of the table to be truncated. Please include the schema name with the object name, e.g. [SCHEMA NAME].[OBJECT NAME], all in CAPS |
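The generated statement is simply a TRUNCATE TABLE against the target, for example (object name is illustrative):

```sql
-- Illustrative generated statement for a TRUNCATE step.
TRUNCATE TABLE PUBLISH.CUSTOMER;
```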