As the popularity of ClickHouse increased over the years, more and more people know about the core features of ClickHouse like its incredible performance, its high compression ratio or the huge capabilities of reading and writing nearly all data formats there are. But ClickHouse also has a lot of hidden gems, which can help in your day to day work, a lot of the people don’t know about. Even so, most of them are well documented in the official Documentation, if you don’t know you are looking for them, you will not find them. In this blog post, I’ll highlight some of my favorite small features that more people should know about.
In column based DBMS like ClickHouse, queries like SELECT * FROM table should generally be avoided. At least that’s true when it comes to regular queries issued by your application. The typical day to day work of a DBA or database developer however often includes these types of queries, as otherwise it would take a huge amount of time to type down all X columns of a table manually. But what if you want to run a lot of operations on some columns? Imagine you have the following table:
“`html
CREATE TABLE customer (
customerId UInt64,
custom_num_1 UInt64,
custom_num_2 UInt64,
… custom_num_50 UInt64,
custom_string_1 String,
custom_string_2 String,
… custom_string_50 String
);
“`
When optimizing your table, you might be interested in the average length of the String columns in your table, as well as the maximum value of your number columns. If you want to handwrite a query to collect this information, you would have to write the correct function for 100 columns this way. In some databases (like MySQL but also ClickHouse) you can utilize the INFORMATION_SCHEMA.COLUMNS table to build your query. This can be convenient for a lot of DBAs as they might already be used to this, but ClickHouse provides an even faster way to achieve your goal: Select Modifiers
Utilizing a combination of modifiers, our task comes down to a simple query:
“`html
SELECT
COLUMNS(‘.*num.*’) APPLY max,
COLUMNS(‘.*string.*’) APPLY length APPLY max
FROM customer
“`
We are utilizing the COLUMNS modifier to apply a regex to the column names to only get columns with names num or string in it, and on all those columns we apply the function max if it has been in the set of number columns, or we first apply the function length and afterwards the function max. This gives us the wanted results, and takes you way less time than building a query via the information schema, or writing down 100 columns manually.
Source link