Get the most recent transaction from each customer with SQL

David Hamrick

Note: Don’t do it this way - it’s very slow! I benchmarked different approaches here

I recently was trying to solve a problem and needed to write a SQL query to solve it. The answer wasn’t immediately obvious to me so I wanted to share my solution. Partially so the next time I searched for it I would hopefully find my own answer.

I wanted to send customers an email one year after they purchased our software, VueScan But only if they had not purchased any other product since then. So really what I needed was a table with the most recent purchase a customer had made. I knew I could use a view or CTE for that part, but I wasn’t initially sure how to get the last purchase a customer had made.

Here is an example csv file that we will load into a database

id,purchase_date,email,sku
1,2022-01-01,customer1@test.com,Standard Edition
2,2022-01-02,customer2@test.com,Professional Edition
3,2022-01-03,customer3@test.com,Basic Edition
4,2022-01-04,customer1@test.com,Professional Edition

Here is how you can import that into a sqlite table

bash> sqlite3 test.db

sqlite> .import --csv test.csv customers
sqlite> select * from customers;

1|2022-01-01|customer1@test.com|Standard Edition
2|2022-01-02|customer2@test.com|Professional Edition
3|2022-01-03|customer3@test.com|Basic Edition
4|2022-01-04|customer1@test.com|Professional Edition

Since we want to send an email 1 year after their first purchase only, we need to create a query that only selects their last purchase. We can do this first by grouping by email, but we can’t just group by the email.

SELECT
    purchase_date, email, sku
FROM customers c
WHERE
    id=(SELECT id FROM customers WHERE email=c.email ORDER BY purchase_date DESC LIMIT 1)
ORDER BY purchase_date;

2022-01-02|customer2@test.com|Professional Edition
2022-01-03|customer3@test.com|Basic Edition
2022-01-04|customer1@test.com|Professional Edition

This gave the result that I was looking for. It correctly filtered out the duplicate orders from customer1@test.com and returned the more recent one. This query uses a SQL feature called a correlated subquery. Depending on the database, for each row in the outer query (the initial select clause) the sql engine will run the subquery and filter out the results. As a result, it can be an inefficient way of accomplishing this. However, if you are just doing a quick and dirty ad-hoc query, this is a nice way of doing it. You might

And for completeness, even though it isn’t needed for this example and won’t work because we only have sample data for a few days, I’ll add the part of the query to select for one year ago.

SELECT
    purchase_date, email, sku
FROM customers c
WHERE
    id=(SELECT id FROM customers WHERE email=c.email ORDER BY purchase_date DESC LIMIT 1)
    AND date(purchase_date) == date('now','-1 year')
ORDER BY purchase_date;

2022-01-02|customer2@test.com|Professional Edition
2022-01-03|customer3@test.com|Basic Edition
2022-01-04|customer1@test.com|Professional Edition