Search code examples
mysqlsqldistinct-values

How to write a SQL query with no duplicates


How to write a SQL query with no duplicates correctly ?

There are two tables. Table 1 CUSTOMERS and table 2 ORDERS

CUSTOMERS (ID, FIRSTNAME, LASTNAME, ADDRESS); 
ORDERS (ID, PRODUCT_NAME, PRODUCT_PRICE, DATE_ORDER, ID_CUSTOMER, AMOUNT);

ID in CUSTOMERS table is a primary key, and ORDERS table has ID_CUSTOMER a foreign key How should I write a query to show the ID of a customers without duplicates, who have ordered 'Apple MacBook Air 13' ?

 CUSTOMERS (ID, FIRSTNAME, LASTNAME, ADDRESS); 
 ORDERS (ID, PRODUCT_NAME, PRODUCT_PRICE, DATE_ORDER, ID_CUSTOMER, AMOUNT); 

SELECT DISTINCT CUSTOMERS.ID, ORDERS.PRODUCT_NAME 
FROM CUSTOMERS 
INNER JOIN ORDERS 
ON CUSTOMERS.ID = ORDERS.ID_CUSTOMERS
WHERE PRODUCT_NAME = ‘Apple MacBook Air 13’; 

I have written a query but it seems unclear whether to use DISTINCT operator with INNER JOIN. Is it possible to have DISTINCT and joins ?


Solution

  • DISTINCT is a meant to remove duplicates. This is useful sometimes. But it is also often an indicator for a badly written query, because: why are there duplicate rows in the result in the first place?

    As to your query: Why are you joining the customer table? It doesn't provide anything we don't get from the orders table already. Your query can be shortened to:

    SELECT DISTINCT id_customers, 'Apple MacBook Air 13'
    FROM orders 
    WHERE product_name = 'Apple MacBook Air 13';
    

    Or, as it's only about one product anyway:

    SELECT DISTINCT id_customers
    FROM orders 
    WHERE product_name = 'Apple MacBook Air 13';
    

    We are using DISTINCT here, because a customer can order a product multiple times and would hence show up in the results multiple times, didn't we use DISTINCT. However, we read all 'Apple MacBook Air 13' rows, only to remove some or even many of them in the end.

    We can write the query differently to avoid that. We start with the customers table and show those IDs / rows where we find at least one order on 'Apple MacBook Air 13'.

    SELECT id
    FROM customers
    WHERE id IN
    (
      SELECT id_customers
      FROM orders
      WHERE product_name = 'Apple MacBook Air 13'
    );
    

    or

    SELECT id
    FROM customers c
    WHERE EXISTS
    (
      SELECT *
      FROM orders o
      WHERE o.id_customers = c.id
      AND o.product_name = 'Apple MacBook Air 13'
    );
    

    In these queries we only select from the customers table (the ID here, but we could just as well show the customer's name etc.) and use the orders table for lookup. Once the DBMS finds an 'Apple MacBook Air 13' order for acustomer, it doesn't have to look for more such orders for the same customer. That can save quite some time. And we only get each customer once and don't have to apply DISTINCT (for which all result rows would have to be sorted and compared). The queries are also very readable (albeit a tad longer), because we clearly separate what table we are selecting rows from (FROM clause) and what the search conditions are (WHERE clause).