[Postgres] - DISTINCT ON in Postgres
Truong Bui
Published on:
•
Last updated:
In PostgreSQL, the
DISTINCT ON
expression is a very useful feature that allows us to select
distinct rows based on a certain column or set of columns.
Let's consider a simple example. Suppose we have a table orders
with the following data:
File name: orders.sql
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT,
order_date DATE,
amount DECIMAL
);
INSERT INTO orders (customer_id, order_date, amount)
VALUES (1, '2023-01-01', 100),
(1, '2023-01-02', 200),
(2, '2023-01-01', 150),
(2, '2023-01-03', 300);
Now, if we want to get the latest order of each customer, we can use the
DISTINCT ON
expression as follows:
SELECT DISTINCT ON (customer_id) customer_id, order_date, amount
FROM orders
ORDER BY customer_id, order_date DESC;
This query will return the latest order of each customer. The
DISTINCT ON
expression is used in conjunction with the
ORDER BY
clause to determine which row among the duplicates should be returned.
customer_id
|
order_date
|
amount
|
---|---|---|
1 |
2023-01-02 |
200 |
2 |
2023-01-03 |
300 |
If we use DISTINCT
only
SELECT customer_id, order_date, amount
FROM orders
ORDER BY customer_id, order_date DESC;
This query will return all distinct rows in the table, not just the latest order of each customer.
customer_id
|
order_date
|
amount
|
---|---|---|
1 |
2023-01-02 |
200 |
1 |
2023-01-01 |
100 |
2 |
2023-01-03 |
300 |
2 |
2023-01-01 |
150 |
WARNING:
Be aware that the
DISTINCT ON
expression is a PostgreSQL-specific feature and may not be available in other SQL databases.
Hope you find this article useful. If you have any questions, please let me know in the comment section below. 👇