Snowflake is a cloud-based data warehousing platform that provides a variety of options for storing and managing data. One of its key features is support for multiple types of tables, each with its own characteristics and use cases. In addition to these table types, Snowflake provides features for managing and optimizing table performance, such as clustering keys, automatic compression, and automatic query optimization.
Overall, Snowflake's support for multiple types of tables provides a high degree of flexibility and scalability for data warehousing and analytics workloads. By choosing the right type of table for each use case, organizations can optimize their data management and analysis capabilities to meet their specific needs.
Let's take a closer look at each type of Snowflake table.
Standard tables (Snowflake calls them permanent tables) are the most commonly used type of table in Snowflake. They are created and managed entirely within Snowflake and can store both structured and semi-structured data. Standard tables are designed to provide high performance and scalability for data warehousing workloads. Snowflake divides their data into micro-partitions and compresses it automatically, and you can optionally define a clustering key on large tables.
CREATE TABLE sales (
id NUMBER,
date DATE,
customer_name TEXT,
amount FLOAT
);
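As a rough sketch of the clustering feature mentioned above, a clustering key could be added to the sales table like this. Clustering on the date column is only an assumption for illustration; clustering keys generally pay off only on large tables that are frequently filtered on the chosen column.

```sql
-- Sketch: define a clustering key on the sales table created above.
ALTER TABLE sales CLUSTER BY (date);

-- Inspect how well the table is currently clustered on that key.
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(date)');
```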
Views in Snowflake are virtual tables that do not store any data on their own. Instead, they are defined by a SQL query and provide a way to access and analyze data from one or more tables in a structured way. Views can be created using the CREATE VIEW statement.
CREATE VIEW monthly_sales AS
SELECT
date_trunc('month', date) AS month,
SUM(amount) AS total_sales
FROM sales
GROUP BY month;
Temporary tables exist only for the session in which they are created and are dropped automatically when that session ends. They are not visible to other sessions or users. Temporary tables can be created with an explicit column list or from query results, and they are useful for storing intermediate data or breaking down complex queries into smaller parts.
CREATE TEMPORARY TABLE temp_sales AS
SELECT *
FROM sales
WHERE date BETWEEN '2022-01-01' AND '2022-01-31';
External tables in Snowflake are tables that reference data stored outside of Snowflake, such as in cloud object storage systems like Amazon S3 or Azure Blob Storage. External tables allow you to query data without having to load it into Snowflake first, which can be useful for working with very large datasets or data that is generated outside of Snowflake.
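-- Assumes an external stage named sales_stage that points at s3://mybucket/.
CREATE EXTERNAL TABLE ext_sales (
id NUMBER AS (value:c1::NUMBER),
date DATE AS (value:c2::DATE),
customer_name TEXT AS (value:c3::TEXT),
amount FLOAT AS (value:c4::FLOAT)
)
LOCATION = @sales_stage
FILE_FORMAT = (TYPE = CSV);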
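An external table tracks which files it covers in its own metadata, so new files are only visible once that metadata is refreshed. The statements below are a minimal sketch of a manual refresh, followed by a query that uses the METADATA$FILENAME pseudocolumn to show how many rows come from each file. The table name ext_sales is the one used in this article.

```sql
-- Re-scan the external location and register new or removed files.
ALTER EXTERNAL TABLE ext_sales REFRESH;

-- Count rows per source file using the METADATA$FILENAME pseudocolumn.
SELECT metadata$filename AS source_file, COUNT(*) AS row_count
FROM ext_sales
GROUP BY metadata$filename;
```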
Materialized views in Snowflake are precomputed views whose results are stored physically in Snowflake. They can improve query performance by precomputing and storing the results of expensive queries, which is often faster than running those queries every time. Snowflake keeps materialized views up to date automatically through a background service when the underlying table changes; you do not define a refresh schedule. Materialized views require Enterprise Edition or higher and can query only a single table.
CREATE MATERIALIZED VIEW monthly_sales_mv AS
SELECT
date_trunc('month', date) AS month,
SUM(amount) AS total_sales
FROM sales
GROUP BY month;
Stages in Snowflake are used to load data into Snowflake from external sources, such as files on a local file system or in cloud storage. Despite sometimes being called "stage tables", stages are not tables: a stage is a named location that holds data files before they are loaded into a table with COPY INTO. Stages are created with the CREATE STAGE statement, and a temporary stage is dropped at the end of the session.
CREATE TEMPORARY STAGE stage_sales
FILE_FORMAT = (TYPE = CSV);
-- Upload files to the stage first, for example with the PUT command from SnowSQL.
COPY INTO sales
FROM @stage_sales;
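Before running COPY INTO, it can help to check what is actually sitting in the stage. The sketch below lists the staged files and previews their contents by column position. It assumes the stage stage_sales has a CSV file format attached and that the files have four columns in the same order as the sales table; the aliases are illustrative.

```sql
-- Show the files currently held in the stage.
LIST @stage_sales;

-- Preview staged rows by column position, without loading them.
SELECT t.$1 AS id, t.$2 AS sale_date, t.$3 AS customer_name, t.$4 AS amount
FROM @stage_sales t
LIMIT 10;
```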
Transient tables sit between temporary and permanent tables. Unlike temporary tables, they are not tied to a session: a transient table persists, and is visible to other users with the right privileges, until it is explicitly dropped. Unlike permanent tables, they have no Fail-safe period and at most one day of Time Travel retention, which reduces storage costs. This makes them a good fit for intermediate data in ETL or data transformation processes, and for ad hoc analysis or testing where the data can be recreated if it is lost. In this example, a transient table called temp_sales is created to hold intermediate data. Data is inserted into the table using a SELECT statement from another table called source_sales, and then the data is queried to perform some analysis. Finally, the transient table is dropped explicitly when it is no longer needed.
CREATE TRANSIENT TABLE temp_sales (
sales_id INTEGER,
sales_amount DECIMAL(10,2),
sales_date DATE
);
INSERT INTO temp_sales (sales_id, sales_amount, sales_date)
SELECT id, amount, date
FROM source_sales
WHERE date >= '2022-01-01';
SELECT sales_date, SUM(sales_amount) AS total_sales
FROM temp_sales
GROUP BY sales_date;
DROP TABLE temp_sales;
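Transient tables have no Fail-safe period, and their Time Travel retention can be at most one day. If even that is not needed, retention can be switched off at creation time, as in the sketch below. The table name scratch_sales is hypothetical.

```sql
-- Transient table with Time Travel disabled (retention of 0 days).
CREATE TRANSIENT TABLE scratch_sales (
  sales_id INTEGER,
  sales_amount DECIMAL(10,2),
  sales_date DATE
)
DATA_RETENTION_TIME_IN_DAYS = 0;

-- The "kind" and "retention_time" columns of the output confirm the settings.
SHOW TABLES LIKE 'scratch_sales';
```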
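Clone tables in Snowflake are copies of an existing table, including its schema and data. They are created with the CREATE TABLE statement and the CLONE option. Cloning is "zero-copy": the clone initially shares the source table's underlying storage, and additional storage is used only as the clone or the source changes afterwards. Clone tables are useful for creating test environments or taking a quick snapshot of important data.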
CREATE TABLE sales_clone CLONE sales;
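Cloning can also be combined with Time Travel to capture a table as it looked at an earlier point. The sketch below assumes the sales table still has at least one hour of Time Travel history available; the new table names are hypothetical.

```sql
-- Clone the table as it existed one hour ago (the offset is in seconds).
CREATE TABLE sales_one_hour_ago CLONE sales AT (OFFSET => -3600);

-- For comparison: copy only the structure, without any data.
CREATE TABLE sales_empty LIKE sales;
```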
Secure views in Snowflake are views that provide additional security and privacy controls over the underlying data. Secure views can be used to restrict access to certain columns or rows of data, or to mask sensitive data such as credit card numbers or social security numbers.
Suppose we have a table employee_info that contains sensitive information about employees, including their salaries. We want to create a view that shows only employee names and job titles, without exposing salaries.
To create a secure view, we can follow these steps:
Create a role for users who should have access to the secure view. Do not grant this role SELECT on the employee_info table itself, because that would let its members read the salaries directly:
CREATE ROLE view_users;
Create the view with the SECURE keyword, selecting only the non-sensitive columns:
CREATE SECURE VIEW secure_employee_info
AS
SELECT name, job_title
FROM employee_info;
Grant the role access to the view:
GRANT SELECT ON VIEW secure_employee_info TO ROLE view_users;
Assign the role to the users who need it:
GRANT ROLE view_users TO USER alice;
GRANT ROLE view_users TO USER bob;
Now Alice and Bob can query the secure_employee_info view to see employee names and job titles, but they do not have access to the sensitive salary information in the employee_info table. (The role also needs USAGE on the database and schema that contain the view.)
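Secure views can also mask a column rather than omit it. The following is a sketch only: it assumes employee_info has a salary column and that a role named HR_ADMIN exists, neither of which is spelled out in the example above.

```sql
-- Show salary only when the caller's current role is HR_ADMIN (hypothetical role).
CREATE SECURE VIEW employee_salary_masked AS
SELECT
  name,
  job_title,
  CASE WHEN CURRENT_ROLE() = 'HR_ADMIN' THEN salary ELSE NULL END AS salary
FROM employee_info;

-- The "is_secure" column of the output confirms the view is secure.
SHOW VIEWS LIKE 'employee_salary_masked';
```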
"Transit tables" are sometimes listed as a Snowflake table type, but Snowflake has no such object, and CREATE TRANSIT TABLE is not valid Snowflake SQL. Moving data between Snowflake accounts or regions is handled by other features, mainly Secure Data Sharing and database replication. Within a single account, a transient table can play the role of an intermediate hand-off table. Here is an example of that pattern:
CREATE TRANSIENT TABLE my_transit_table LIKE my_source_table;
GRANT SELECT ON TABLE my_transit_table TO ROLE my_role;
INSERT INTO my_transit_table
SELECT *
FROM my_source_table;
INSERT INTO my_destination_table
SELECT *
FROM my_transit_table;
For transfers between Snowflake accounts or regions, Secure Data Sharing and replication let you make data available elsewhere without exporting it to and importing it from an intermediate location.
Hybrid tables in Snowflake are a table type built for low-latency transactional workloads, as part of Snowflake's Unistore offering. They use row-oriented storage, require a primary key, enforce primary key, unique, and foreign key constraints, and support secondary indexes, which makes single-row reads and writes fast. Hybrid tables can be joined with standard tables in the same query. They are not a mix of standard and external tables, although the term is sometimes used loosely that way.
The example below illustrates that looser sense: combining data held outside Snowflake with data held inside it. Let's say we have two tables - sales and inventory - and we want to join them on the product_id column. We also want to store the results of this join as a materialized view so that we can easily query it later. Here is how we can do that with an external table, an internal table, and a table that combines them:
CREATE STAGE sales_stage
URL='s3://my-bucket/sales'
CREDENTIALS=(AWS_KEY_ID='my_key_id' AWS_SECRET_KEY='my_secret_key');
CREATE OR REPLACE EXTERNAL TABLE sales_ext (
product_id INT AS (value:c1::INT),
sales_amount FLOAT AS (value:c2::FLOAT)
)
LOCATION = @sales_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);
CREATE OR REPLACE TABLE inventory_int (
product_id INT,
units_in_stock INT
);
CREATE TABLE sales_inventory_hybrid (
product_id INT,
sales_amount FLOAT,
units_in_stock INT
)
AS SELECT s.product_id, s.sales_amount, i.units_in_stock
FROM sales_ext s
JOIN inventory_int i
ON s.product_id = i.product_id;
CREATE MATERIALIZED VIEW sales_inventory_mv
AS SELECT *
FROM sales_inventory_hybrid;
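Now we have a materialized view called sales_inventory_mv that contains the results of the join between the sales and inventory tables. The table sales_inventory_hybrid holds the same data as an ordinary Snowflake table: CREATE TABLE ... AS SELECT copied the join result into Snowflake storage at creation time, so it does not change when the external files or the inventory table change. The pattern combines externally stored and internally stored data in one query, but the resulting table is not a hybrid table in Snowflake's sense of the term.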
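For comparison, an actual Snowflake hybrid table is created with CREATE HYBRID TABLE and must declare a primary key. The sketch below is illustrative only: the table name, the warehouse_code column, and the index are hypothetical, and hybrid tables are not available in every region or account type.

```sql
-- Hybrid table: row-oriented storage, enforced primary key, optional secondary index.
CREATE HYBRID TABLE inventory_oltp (
  product_id INT PRIMARY KEY,
  units_in_stock INT NOT NULL,
  warehouse_code VARCHAR(10),
  INDEX idx_warehouse_code (warehouse_code)
);

-- Typical access pattern: single-row writes and point lookups by key.
INSERT INTO inventory_oltp (product_id, units_in_stock, warehouse_code)
VALUES (42, 100, 'EU-1');

UPDATE inventory_oltp
SET units_in_stock = units_in_stock - 1
WHERE product_id = 42;

SELECT units_in_stock
FROM inventory_oltp
WHERE product_id = 42;
```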
Event tables are used to capture log messages and trace events emitted by code running inside Snowflake, such as stored procedures and user-defined functions. They can be useful for auditing, troubleshooting, or other purposes. A true event table is created with CREATE EVENT TABLE and has a fixed, predefined set of columns. Failed login attempts are not written to event tables; they are recorded in Snowflake's login history instead. The example below takes a simpler, hand-rolled approach: it creates an ordinary table called "my_event_table" with columns for "event_id", "event_timestamp", "event_type", "user_id", and "event_data". You can modify the table name and column names/types to suit your needs.
CREATE TABLE my_event_table (
event_id NUMBER,
event_timestamp TIMESTAMP,
event_type VARCHAR(50),
user_id VARCHAR(50),
event_data VARIANT
);
With a custom log table like this, you insert rows explicitly with SQL statements, for example from inside a stored procedure. Snowflake's built-in event tables, by contrast, are populated automatically when code calls the logging and tracing APIs. The following example creates a log table and a stored procedure that writes a row to it each time it runs:
CREATE TABLE event_logs (
timestamp TIMESTAMP_NTZ(9),
event_type STRING,
event_details VARIANT
);
CREATE OR REPLACE PROCEDURE my_procedure()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
-- Record that the procedure ran.
INSERT INTO event_logs (timestamp, event_type, event_details)
SELECT CURRENT_TIMESTAMP(), 'my_procedure', PARSE_JSON('{"parameter": "value"}');
-- Rest of the procedure code
RETURN 'done';
END;
$$;
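For comparison with the hand-rolled log table, here is a sketch of Snowflake's built-in event table mechanism. The names my_db, my_schema, my_events, and log_demo are hypothetical, setting the account-level event table requires a suitably privileged role such as ACCOUNTADMIN, and logged messages can take a little while to appear in the table.

```sql
-- Create an event table (its columns are predefined) and make it the active one.
CREATE EVENT TABLE my_db.my_schema.my_events;
ALTER ACCOUNT SET EVENT_TABLE = my_db.my_schema.my_events;

-- Capture messages at INFO level and above for this session.
ALTER SESSION SET LOG_LEVEL = INFO;

CREATE OR REPLACE PROCEDURE log_demo()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
  -- Emit a log message; Snowflake routes it to the active event table.
  SYSTEM$LOG('info', 'log_demo started');
  RETURN 'done';
END;
$$;

CALL log_demo();

-- Read back the captured log records.
SELECT timestamp, record['severity_text'] AS severity, value AS message
FROM my_db.my_schema.my_events
WHERE record_type = 'LOG'
ORDER BY timestamp DESC;
```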
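Iceberg tables are a newer table type in Snowflake that stores data in the open Apache Iceberg format, in cloud object storage that you manage, while the data is still queried through Snowflake. They are designed to handle large-scale data sets and support incremental updates in a cost-effective manner, which makes them suitable for use cases such as ad hoc analytics, machine learning, and interactive applications.
Here is an example of creating an Iceberg table in Snowflake:
-- Assumes an external volume named my_external_volume has already been created.
CREATE ICEBERG TABLE my_iceberg_table
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'my_external_volume'
BASE_LOCATION = 'my_iceberg_table'
AS SELECT * FROM my_source_table;
This statement creates an Iceberg table called my_iceberg_table and populates it with the data from my_source_table. CATALOG = 'SNOWFLAKE' makes Snowflake the Iceberg catalog for the table, while EXTERNAL_VOLUME and BASE_LOCATION determine where the Iceberg data and metadata files are written. The Spark-style clause USING iceberg is not Snowflake syntax.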
Once the Iceberg table is created, you can query it like any other table in Snowflake:
SELECT * FROM my_iceberg_table WHERE some_column = 'some_value';
INSERT INTO my_iceberg_table (col1, col2, col3) VALUES (1, 'abc', '2022-03-15');
UPDATE my_iceberg_table SET col1 = 2 WHERE col2 = 'abc';
DELETE FROM my_iceberg_table WHERE col3 < '2022-03-01';
Iceberg tables in Snowflake offer several benefits, such as efficient data storage and query performance, support for incremental updates and deletes, and compatibility with existing Apache Iceberg ecosystems.
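Dynamic tables store the results of a query and keep those results up to date automatically. You define the table with a SELECT statement and a target lag, and Snowflake refreshes the table in the background so that its contents are never staler than that lag. Unlike temporary tables, dynamic tables persist until they are dropped. They are useful for building data pipelines declaratively and for breaking large transformations down into smaller, more manageable steps. The basic syntax is:
CREATE DYNAMIC TABLE <table_name>
TARGET_LAG = '<num> { seconds | minutes | hours | days }'
WAREHOUSE = <warehouse_name>
AS
<SELECT statement>;
Here <table_name> is the name of the dynamic table you want to create, TARGET_LAG is the maximum staleness you are willing to accept, WAREHOUSE is the warehouse used to run refreshes, and <SELECT statement> is the SQL statement that defines the data pipeline. This SQL statement can include joins, aggregates, window functions, and other complex transformations.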
Here's an example of creating a dynamic table in Snowflake:
-- Assumes a warehouse named my_wh exists; the lag of 1 hour is illustrative.
CREATE DYNAMIC TABLE dynamic_sales_data
TARGET_LAG = '1 hour'
WAREHOUSE = my_wh
AS
SELECT
date_trunc('month', sale_date) AS month,
product_name,
SUM(sales_amount) AS total_sales
FROM sales
JOIN products ON sales.product_id = products.product_id
GROUP BY 1, 2;
This creates a dynamic table called dynamic_sales_data that summarizes sales data by month and product. As new data is added to the sales table, Snowflake refreshes the dynamic table automatically so that it lags the source by no more than the target lag.
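Once a dynamic table exists, its refresh behavior can be adjusted without recreating it. The statements below are a minimal sketch of common maintenance operations on the dynamic_sales_data table from the example; the new lag value is arbitrary.

```sql
-- Trigger a refresh immediately instead of waiting for the scheduler.
ALTER DYNAMIC TABLE dynamic_sales_data REFRESH;

-- Accept staler data in exchange for fewer refreshes.
ALTER DYNAMIC TABLE dynamic_sales_data SET TARGET_LAG = '6 hours';

-- Pause and later resume automatic refreshes, for example during maintenance.
ALTER DYNAMIC TABLE dynamic_sales_data SUSPEND;
ALTER DYNAMIC TABLE dynamic_sales_data RESUME;
```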
Challenges that Dynamic Tables solve
When to use Dynamic Tables
Overall, dynamic tables in Snowflake provide a powerful tool for building complex data pipelines using simple SQL statements, and can help data engineers create more resilient and cost-effective data processing solutions.
In 2022 and 2023, Snowflake introduced several new table types and enhancements to its platform, including:
Dynamic Tables: This table type allows teams to use simple SQL statements to declaratively define the result of data pipelines. Dynamic Tables automatically refresh as the underlying data changes, simplifying data engineering and enabling cost-effective pipelines.
Event Tables: This table type makes logging in Snowflake easier and more streamlined. It enables logging around Snowflake stored procedures and will be used to enhance Snowpark, Python, and other capabilities in the future.
Iceberg Tables: This table type brings Apache Iceberg's open-source format to Snowflake, providing users with an efficient, scalable way to manage large datasets. It enables incremental data loading and better support for large data sets.
Hybrid Tables: This table type, part of Snowflake's Unistore offering, uses row-oriented storage with enforced primary keys and secondary indexes, supporting low-latency transactional reads and writes alongside analytical queries on the same platform.
Overall, these enhancements and new table types provide Snowflake users with greater flexibility, scalability, and efficiency in managing their data.