I’ve spent the last seven years using Amazon Redshift, and during that time Snowflake became the standard modern data warehouse. (I did a survey of relevant companies’ data stacks when we evaluated Looker, and Snowflake was the overwhelming favorite warehouse.) Odds are pretty good that I’ll use Snowflake in my next job. So how will I go from training colleagues on efficient Redshift queries (and Introduction to SQL for interns) to writing Snowflake queries?
I started with fairly obvious Google queries, “snowflake sql tutorial” and “snowflake sql query examples”, then added “query from redshift to snowflake” to find people’s comments about migrating. I knew what types of differences I was looking for:
- Structural differences in how data is stored/accessed
- Syntax differences
- Timestamps and dates – always painful
- Special capabilities / functions
Data engineering posts about migrating your warehouse are good for the first three; analytics and analytics engineering posts about advanced query tips are good for the fourth. Snowflake’s documentation and writing are good for filling in syntax details and expert advice once you know what to look for.
Most helpful resources
- Migrating from Redshift to Snowflake: How our Data Engineering team completed the massive “lift and shift” by Tamir Rozenberg, Instacart – code samples for rewriting Redshift query syntax into Snowflake syntax, timestamp issues, uppercase identifiers
- The great migration from Redshift to Snowflake by Rafay Aleem, Faire – syntax rewriting, uppercase identifiers, tagging for PII masking and cost tracking
- How Compatible are Redshift and Snowflake’s SQL Syntaxes? by Tristan Handy, dbt (2018) – out of date (Snowflake has improved their window functions), but gives examples of specific timestamp and uppercase identifier issues
- 5 Snowflake Query Tricks You Aren’t Using but Should Be by Anthony Li – QUALIFY, IFF, PIVOT, TRY_TO_DATE()
- Advanced Cool Things in Snowflake by Alisa Aylward – window functions beyond simple ROW_NUMBER(), data cleaning for numbers with REPLACE or REGEX or TRY_TO_NUMBER()
What I learned
Unattributed code is from Snowflake’s documentation.
Similarity: columnar database
I knew this one before I started, but “don’t use SELECT *” should be the first line of any efficient-query post about either of these. They’re columnar databases, meaning columns of data are stored and compressed together, so fetching all columns means looking in many places. Just fetch the columns you care about.
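For example (my own toy query against a hypothetical events table):
SELECT user_id, event_type, created_at -- reads only these three columns' storage
FROM events
WHERE created_at >= '2023-01-01';
-- versus SELECT * FROM events, which fetches and decompresses every column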
Structural difference: compute resources
I knew this one before I started researching too – it’s why Snowflake won.
Redshift now separates storage and compute, and offers both provisioned and on-demand compute (pricing page). Neither was true when I started using it, with coupled, provisioned resources. I’ve regularly asked analysts to delete old tables or to optimize particular queries because they were using too much storage or compute.
Snowflake separates storage and compute, and provisions warehouses on demand for compute (including multiple warehouses using data concurrently). So someone needs to be watching that tasks use appropriately sized warehouses and that queries are optimized. Who does that might be a question I ask as an interviewee for a more engineering-oriented role, or during onboarding for a more analysis-oriented role.
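The knobs involved look something like this (the warehouse name, size, and timeout are my made-up choices, not recommendations):
CREATE WAREHOUSE transform_wh     -- hypothetical name
    WAREHOUSE_SIZE = 'XSMALL'     -- start small; resize only if queries need it
    AUTO_SUSPEND = 60             -- suspend after 60 idle seconds to stop paying for compute
    AUTO_RESUME = TRUE;           -- wake automatically when a query arrives
ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'MEDIUM'; -- scale up for a heavy job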
Storage is now cheap enough for data at the scale of tracking human online activity, not just explicitly entered information. Bigger data like video would require more thought.
Structural difference: distribution vs clustering
Redshift is efficient if you specify how your table is distributed across its nodes (or now across S3) and sorted in each chunk, and then join using those columns to another table that’s distributed and sorted in the same way. For example: I often join data for the same user, so I distribute all my tables by user_id. I often filter by a time range, so I sort all my event tables by created_at. My query was slow when I joined by username and didn’t mention user_id, because Redshift looked for that username in every chunk of data instead of only the chunks holding that user_id’s data. I don’t know whether the cost of maintaining distkeys and sortkeys is noticeable with Redshift’s on-demand compute.
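The Redshift DDL for that setup looks roughly like this (the events table and its columns are my hypothetical example, not from any of the posts):
CREATE TABLE events (
    user_id    BIGINT,
    event_type VARCHAR(64),
    created_at TIMESTAMP
)
DISTKEY (user_id)     -- colocate each user's rows on one node for user_id joins
SORTKEY (created_at); -- sort each chunk by time for range filters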
Snowflake generally doesn’t need analysts to think about clustering, it seems, since I only found references to it in data engineering migrate-your-warehouse posts. Clustering is aimed at colocating data that will often be selected or sorted together, so analysts probably benefit from clustering keys without extra prompting. The company has useful commentary, including its documentation and Tips for Clustering in Snowflake by Brad Culberson. Maintaining clustering costs compute, so it’s most useful for often-queried, rarely updated data.
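For the record, a clustering key is a single statement to declare (hypothetical table and column again):
ALTER TABLE events CLUSTER BY (created_at); -- Snowflake reclusters in the background, which costs compute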
Syntax difference: timestamps
Timestamps and dates and time zone conversions and date math are terrible things. I look up details every time. The two warehouse-migration posts have translations, but since I don’t have the Redshift syntax memorized, that’s not so relevant for me. Snowflake’s defaults do seem better chosen for analysis, but their three timestamp types seem likely to cause more than the usual confusion and bugs. I hope most organizations enforce a single choice.
Sample code snippets:
CURRENT_TIMESTAMP() -- current timestamp in local time zone
SYSDATE() -- current timestamp in UTC, TIMESTAMP_NTZ type
DATE '2023-06-19' -- literal
SELECT TO_DATE('2019-02-28') + INTERVAL '1 year, 1 day' -- 2020-02-29
SELECT DATEADD(MONTH, 1, '2000-01-31'::DATE) -- 2000-02-29
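Two more I will look up every time: CONVERT_TIMEZONE for explicit conversions, and casting between the three timestamp types (the example values here are my own):
SELECT CONVERT_TIMEZONE('UTC', 'America/Los_Angeles',
                        '2023-06-19 12:00:00'::TIMESTAMP_NTZ); -- 2023-06-19 05:00:00, LA wall-clock time, still TIMESTAMP_NTZ
SELECT '2023-06-19 12:00:00'::TIMESTAMP_NTZ::TIMESTAMP_TZ;     -- attaches the session time zone to the bare value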
Syntax difference: JSON
You can parse JSON data in Redshift. It’s another “look syntax up every time” area, and the SQL is hard to read. I’d want to extract the data and use Python instead.
Snowflake makes parsing JSON data fairly easy, e.g. <column>:<level1_element>."<level2 element with spaces>"[0]. Use FLATTEN to turn nested objects into multiple rows.
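A sketch of both, using a hypothetical orders table whose payload column holds an items array:
SELECT o.order_id,
       item.value:sku::STRING AS sku, -- cast extracted VARIANT values to real types
       item.value:qty::NUMBER AS qty
FROM orders AS o,
     LATERAL FLATTEN(input => o.payload:items) AS item; -- one output row per array element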
Special capability: attempted type conversions
select -- quoting Anthony Li
    coalesce(
        try_to_date(freeflow_sale_date),
        try_to_date(freeflow_sale_date, 'DD Mon YYYY'), -- 19 September 2020
        try_to_date(freeflow_sale_date, 'DD/MM/YYYY'),  -- 14/12/2020
        try_to_date(freeflow_sale_date, 'YYYY/MM/DD')   -- 2020/12/08
    ) as sale_date
If you’re cleaning input data, you can try several type conversions in one step. This code will take the first one that works, so don’t trust it as-is for mixed MM/DD/YYYY and DD/MM/YYYY date formats.
Also see REPLACE and REGEXP_REPLACE for data cleaning / transformation before trying to cast.
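For example, stripping currency formatting before the attempted cast (raw_price and imports are my hypothetical names):
SELECT TRY_TO_NUMBER(REPLACE(REPLACE(raw_price, '$', ''), ',', ''), 10, 2) AS price
FROM imports; -- '$1,234.56' -> 1234.56; anything still unparseable becomes NULL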
Special capability: IFF
CASE statements with one clause are wordy, so replace them with IFF. I wish this had a different name, since my math background says “iff” means “if and only if,” but the syntax otherwise nicely matches Excel’s, which I use all the time.
IFF(<condition> , <expr1> , <expr2>) AS <column_name>
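For example, a one-clause CASE collapses nicely (my own toy example):
SELECT IFF(amount < 0, 'refund', 'sale') AS order_type -- same as CASE WHEN amount < 0 THEN 'refund' ELSE 'sale' END
FROM orders;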
Special capability: PIVOT
Use PIVOT to turn a long table of categories plus values into one column per category, like Excel pivot tables. This is clever, but I’d rather use a BI tool.
SELECT *
FROM monthly_sales
PIVOT(SUM(amount) FOR MONTH IN ('JAN', 'FEB', 'MAR', 'APR'))
AS p (EMP_ID_renamed, JAN, FEB, MAR, APR)
ORDER BY EMP_ID_renamed;
+----------------+-------+-------+-------+-------+
| EMP_ID_RENAMED | JAN   | FEB   | MAR   | APR   |
|----------------+-------+-------+-------+-------|
| 1              | 10400 |  8000 | 11000 | 18000 |
| 2              | 39500 | 90700 | 12000 |  5300 |
+----------------+-------+-------+-------+-------+
Special capability: QUALIFY
Last because it’s most beautiful! Think of this as a follow-up to WHERE clauses (for ordinary selected data) and HAVING clauses (for aggregated data from GROUP BY operations) that allows you to filter results based on window functions. Alisa Aylward’s post How the Snowflake Qualify Clause Changed My Life includes this useful example for finding all rows with duplicate IDs:
SELECT * FROM cat_table -- quoting Alisa Aylward
QUALIFY COUNT(*) OVER (PARTITION BY cat_id) > 1;
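The other pattern I expect to reach for constantly is deduplication, keeping only the latest row per ID (my own sketch; the updated_at column is hypothetical):
SELECT *
FROM cat_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY cat_id ORDER BY updated_at DESC) = 1; -- keep the newest row per cat_id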