Get the most recent transaction from each customer with SQL
Note: Don’t do it this way - it’s very slow! I benchmarked different approaches here
I recently was trying to solve a problem and needed to write a SQL query to solve it. The answer wasn’t immediately obvious to me so I wanted to share my solution. Partially so the next time I searched for it I would hopefully find my own answer.
I wanted to send customers an email one year after they purchased our software, VueScan But only if they had not purchased any other product since then. So really what I needed was a table with the most recent purchase a customer had made. I knew I could use a view or CTE for that part, but I wasn’t initially sure how to get the last purchase a customer had made.
Here is an example csv file that we will load into a database
id,purchase_date,email,sku
1,2022-01-01,customer1@test.com,Standard Edition
2,2022-01-02,customer2@test.com,Professional Edition
3,2022-01-03,customer3@test.com,Basic Edition
4,2022-01-04,customer1@test.com,Professional Edition
Here is how you can import that into a sqlite table
bash> sqlite3 test.db
sqlite> .import --csv test.csv customers
sqlite> select * from customers;
1|2022-01-01|customer1@test.com|Standard Edition
2|2022-01-02|customer2@test.com|Professional Edition
3|2022-01-03|customer3@test.com|Basic Edition
4|2022-01-04|customer1@test.com|Professional Edition
Since we want to send an email 1 year after their first purchase only, we need to create a query that only selects their last purchase. We can do this first by grouping by email, but we can’t just group by the email.
SELECT
purchase_date, email, sku
FROM customers c
WHERE
id=(SELECT id FROM customers WHERE email=c.email ORDER BY purchase_date DESC LIMIT 1)
ORDER BY purchase_date;
2022-01-02|customer2@test.com|Professional Edition
2022-01-03|customer3@test.com|Basic Edition
2022-01-04|customer1@test.com|Professional Edition
This gave the result that I was looking for. It correctly filtered out the duplicate orders from customer1@test.com
and returned the more recent one. This query uses a SQL feature called a correlated subquery. Depending on the database, for each row in the outer query (the initial select clause) the sql engine will run the subquery and filter out the results. As a result, it can be an inefficient way of accomplishing this. However, if you are just doing a quick and dirty ad-hoc query, this is a nice way of doing it. You might
And for completeness, even though it isn’t needed for this example and won’t work because we only have sample data for a few days, I’ll add the part of the query to select for one year ago.
SELECT
purchase_date, email, sku
FROM customers c
WHERE
id=(SELECT id FROM customers WHERE email=c.email ORDER BY purchase_date DESC LIMIT 1)
AND date(purchase_date) == date('now','-1 year')
ORDER BY purchase_date;
2022-01-02|customer2@test.com|Professional Edition
2022-01-03|customer3@test.com|Basic Edition
2022-01-04|customer1@test.com|Professional Edition