How to write a SQL query with no duplicates correctly ?
There are two tables. Table 1 CUSTOMERS and table 2 ORDERS
CUSTOMERS (ID, FIRSTNAME, LASTNAME, ADDRESS);
ORDERS (ID, PRODUCT_NAME, PRODUCT_PRICE, DATE_ORDER, ID_CUSTOMER, AMOUNT);
ID in CUSTOMERS table is a primary key, and ORDERS table has ID_CUSTOMER a foreign key How should I write a query to show the ID of a customers without duplicates, who have ordered 'Apple MacBook Air 13' ?
CUSTOMERS (ID, FIRSTNAME, LASTNAME, ADDRESS);
ORDERS (ID, PRODUCT_NAME, PRODUCT_PRICE, DATE_ORDER, ID_CUSTOMER, AMOUNT);
SELECT DISTINCT CUSTOMERS.ID, ORDERS.PRODUCT_NAME
FROM CUSTOMERS
INNER JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.ID_CUSTOMERS
WHERE PRODUCT_NAME = ‘Apple MacBook Air 13’;
I have written a query but it seems unclear whether to use DISTINCT operator with INNER JOIN. Is it possible to have DISTINCT and joins ?
DISTINCT
is a meant to remove duplicates. This is useful sometimes. But it is also often an indicator for a badly written query, because: why are there duplicate rows in the result in the first place?
As to your query: Why are you joining the customer table? It doesn't provide anything we don't get from the orders table already. Your query can be shortened to:
SELECT DISTINCT id_customers, 'Apple MacBook Air 13'
FROM orders
WHERE product_name = 'Apple MacBook Air 13';
Or, as it's only about one product anyway:
SELECT DISTINCT id_customers
FROM orders
WHERE product_name = 'Apple MacBook Air 13';
We are using DISTINCT
here, because a customer can order a product multiple times and would hence show up in the results multiple times, didn't we use DISTINCT
. However, we read all 'Apple MacBook Air 13' rows, only to remove some or even many of them in the end.
We can write the query differently to avoid that. We start with the customers table and show those IDs / rows where we find at least one order on 'Apple MacBook Air 13'.
SELECT id
FROM customers
WHERE id IN
(
SELECT id_customers
FROM orders
WHERE product_name = 'Apple MacBook Air 13'
);
or
SELECT id
FROM customers c
WHERE EXISTS
(
SELECT *
FROM orders o
WHERE o.id_customers = c.id
AND o.product_name = 'Apple MacBook Air 13'
);
In these queries we only select from the customers table (the ID here, but we could just as well show the customer's name etc.) and use the orders table for lookup. Once the DBMS finds an 'Apple MacBook Air 13' order for acustomer, it doesn't have to look for more such orders for the same customer. That can save quite some time. And we only get each customer once and don't have to apply DISTINCT
(for which all result rows would have to be sorted and compared). The queries are also very readable (albeit a tad longer), because we clearly separate what table we are selecting rows from (FROM
clause) and what the search conditions are (WHERE
clause).