3 Swaps to Improve Your SQL Query Performance

SQL

Jan 16

With SQL Code Examples

Whether you’re a data analyst, data engineer, data scientist, or analytics engineer, it’s imperative that you write clean, fast code. You shouldn’t be the only one that can understand what you are writing. It needs to be readable by others on your team so that it can be used and improved for years to come. It also needs to be optimized for query performance.

One of my biggest annoyances is when code takes forever to run because someone didn’t use the most efficient functions. This is a constant struggle in my current role where I am rewriting all of our core data models.

Believe it or not, there are good and bad ways of writing SQL code. Just because it runs and gets you the result you’re looking for, does NOT mean it is well-written code. It needs to be accurate, readable AND fast to be considered good code.

In this article I include SQL code examples, including the queries and their answers, as well as explanation of the important SQL commands used in them. We’ll go over the SQL row number function, SQL subqueries and CTEs, and the SQL between function. These swaps are guaranteed to improve the query performance of your SQL code.

SQL ROW NUMBER INSTEAD OF TOP

While rewriting old data models, I often came across subqueries using the SQL function TOP. While this may get you the solution you are looking for, it is hard to read and can take a long time to run.

Here’s a piece of example code highlighting what I found:

SELECT
   customer_id,   
   (
   SELECT TOP 
       order_id 
   FROM orders 
   WHERE orders.customer_id = customers.customer_id 
   ORDER BY date DESC
   )
FROM customers 

As engineers and analysts, we want to make sure we are writing the cleanest and most efficient code possible. So, if you’re using TOP, I recommend using the SQL row number function instead.

ROW_NUMBER() is a window function that allows you to assign a sequence number to each row that meets a certain condition. These conditions are specified using ORDER BY and PARTITION BY.

ORDER BY specifies how you wish to order the values in your table. A date is typically used for this.

PARTITION BY specifies the column you want to group your values by. This depends on the purpose of your query.

With this example, we would want to ORDER BY date and PARTITION BY customer_id. Since we are trying to find the most recent order for each customer, partitioning by the customer’s id will count a new sequence for each unique id.

SELECT    
    customer_id,    
    order_id,   
    ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY date) AS order_sequence_number 
FROM orders

Now, this will result in a table with the same number of rows as the original table, however, this time there is a new column order_sequence_number that indicates the order sequence for each customer. In order to get a customer’s first order, we need to write another query that chooses only the sequence numbers that are equal to 1.

WITH

order_row_number_calculated AS (   
    SELECT       
        customer_id,       
        order_id,      
        ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY date) AS order_sequence_number    
    FROM orders)

SELECT
   customer_id,   
   order_id 
FROM order_row_number_calculated
WHERE order_sequence_number = 1

Now we have a CTE which finds the row number of each row in the first query and then filters that query by looking for only the row numbers that are first in their grouping. This is much easier for anyone to read and will typically run faster.

Subqueries → CTEs

That last SQL subquery example shown above can also be used for this swap! If you look above, we start with a subquery within a query and then finish by using CTEs.

What exactly is a CTE? CTE stands for common table expression. It creates a temporary set of results that you can use in your proceeding queries. It requires that you start it with WITH and use table_name AS before each individual CTE. Your last CTE in the sequence should be a simple SELECT statement without a table_name nor AS.

Here’s a SQL CTE example: what it should look like:

WITH money_summed AS (   SELECT       customer_id,      date,      SUM(balance) AS balance,      SUM(debt) AS debt   FROM bank_balance    GROUP BY customer_id, date),money_available_calculated AS (   SELECT       customer_id,      date,      (balance - debt) AS money_available    FROM money_summed    GROUP BY customer_id, date)SELECT   customer_id,   money_available FROM money_available_calculated WHERE date = '2022-01-01'

Notice that before the last SELECT statement that we don’t use a table name or set of parentheses. There is also no comma after the previous set of parenthesis. The last query in a series of CTEs is always written as a normal query. Be sure to remember this because errors due to formatting are common when using CTEs!

SQL BETWEEN FUNCTION INSTEAD OF > or <

It is very common to see two tables being joined together using a date column. Most of the time, when I do see this, they are being joined by using ≥ and ≤. While this works, it looks messy. Luckily, there is a better way to do this!

BETWEEN is a SQL function that allows you to select a range between two given values. It returns the values between the values you feed it. It can be used with numbers, dates, and even strings.

However, the most important thing to note is that this SQL between function is inclusive. This means that the values specified will also be included in your result. So, it can only really replace ≤ or ≥ rather than < or >.

First, let’s look at a query that uses ≤ and ≥ signs to join two tables using a date.

SELECT   customers.customer_id,   customers.activated_at,    campaigns.campaign_idFROM customersLEFT JOIN campaigns ON customers.campaign_id = campaigns.campaign_id WHERE customers.activated_at >= campaigns.started_at AND customers.activated_at <= campaigns.ended_at

Within the WHERE clause, we need to specify two different statements using activated_at. One where this column is ≥ started_at and another where it is ≤ ended_at.

Now, let’s use the BETWEEN function.

SELECT   customers.customer_id,   customers.activated_at,    campaigns.campaign_idFROM customersLEFT JOIN campaigns ON customers.campaign_id = campaigns.campaign_id WHERE customers.activated_at BETWEEN campaigns.started_at AND campaigns.ended_at

Here, we only have to specify activated_at once rather than twice.

Which query is easier to read and understand? While the first may not be too complicated, it is small differences in your code that make a huge difference. It’s what separates the good from the great.

Conclusion

When first learning SQL, you want to focus on being able to solve the problems given to you and get the right answer. As you develop your skills and become more confident, you need to work on improving how you’re finding the solutions. This includes switching out less efficient functions for more efficient ones like SQL CTEs, the SQL row number function, and between SQL function.

You are no longer just writing working code, you are writing code to optimize your SQL query performance. You want your models to run as fast as possible!

Not only this, every engineer and analyst want their teammates to be able to read their code and easily understand it. Not only will you hate having to answer a million questions, but they’ll also hate having to ask you about it. Work on making your SQL code efficient and readable. If there’s a SQL function that acts as a shortcut, use it!

Also, don’t forget the importance of documentation. Commenting your code can make someone else’s job a lot easier and save the need for questions. If you think something you wrote may be hard to understand, explain what it means! I’m a firm believer that code can never have too many comments.

sqlimprove query performancesql functionsdatacodinganalytics engineeringanalytics

Madison Schott

A software engineer and data enthusiast, my services focus on health and wellness brands. I cater to food brands, health practitioners, fitness studios, and wellness services. I will provide you design, website, and business expertise to attract fellow gurus in this space to your product or service.