With code examples on how to use them
If you’re a new Redshift user, you may find that the SQL syntax differs from the SQL you’ve written in other data warehouses.
Each data warehouse has its own flavor of SQL and Redshift is no exception.
At first, it can be frustrating to discover that your favorite functions do not exist. However, there are a lot of great Redshift functions that you can take advantage of in your code.
In this article, I will walk you through the most helpful Redshift functions I’ve discovered in my work. Each function includes a definition and code example of how to use it.
PIVOT is built into Redshift's FROM clause and allows you, well, to pivot your data. What do I mean by this? Pivoting reshapes your data so that the values in rows become columns. Its counterpart, UNPIVOT, goes the other way and turns columns into rows.
PIVOT can help you:
- count values in a column
- aggregate row values
- derive boolean fields based on column or row values
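As a rough illustration of the "aggregate row values" case, here is a minimal sketch. The orders table and its customer_id, status, and amount columns are hypothetical, invented only for this example. Every column in the subquery that is not aggregated or pivoted (here, customer_id) acts as the grouping key.

```sql
-- Hypothetical table: orders(customer_id, status, amount)
-- Produces one row per customer_id, with one summed column per status.
SELECT *
FROM (SELECT customer_id, status, amount FROM orders)
PIVOT (
    SUM(amount)
    FOR status IN (
        'shipped' AS shipped_total,
        'pending' AS pending_total
    )
);
```

A customer with no rows for a given status should get NULL in that column, so wrapping the result in COALESCE may be worthwhile if you need zeros.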
I recently used PIVOT in Redshift to find whether different pages were active or not for each user. To do this, I needed to PIVOT the page_type field and group the data by user, using the user's id field.
Inside PIVOT, I used COUNT(*) as the aggregate for each page type. Because each user could only have one page of each type, the count is always 0 or 1, which maps directly to a boolean.
Keep in mind that if a user can have multiple pages of the same type, then using COUNT on its own to return a boolean will not work.
The code looked like this:
SELECT
    id,
    has_homepage::boolean,
    has_contact_page::boolean,
    has_about_page::boolean
FROM (SELECT id, page_type FROM user_pages WHERE is_active)
PIVOT (
    COUNT(*) FOR page_type IN (
        'home' AS has_homepage,
        'contact' AS has_contact_page,
        'about' AS has_about_page
    )
);
Without the use of PIVOT, I would have had to create a separate CTE for each page_type and then JOIN all of these together in the final CTE. Using PIVOT made my code much more clear and concise.
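If a user could have several active pages of the same type, one possible workaround is to pivot the counts under neutral names and compare each one with zero in the outer SELECT. This is a sketch rather than the query I actually ran. It reuses the user_pages table from above, and the home_count, contact_count, and about_count aliases are made up for illustration.

```sql
-- Variant for users who may have more than one page per type.
-- The *_count aliases are illustrative names, not from the original query.
SELECT
    id,
    (home_count > 0)    AS has_homepage,
    (contact_count > 0) AS has_contact_page,
    (about_count > 0)   AS has_about_page
FROM (SELECT id, page_type FROM user_pages WHERE is_active)
PIVOT (
    COUNT(*) FOR page_type IN (
        'home' AS home_count,
        'contact' AS contact_count,
        'about' AS about_count
    )
);
```

Comparing against zero makes the intent explicit: any count of one or more becomes true, so the result no longer depends on each user having at most one page per type.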