Search code examples
sqlpostgresqlrelational-division

Selecting all rows for names that satisfy condition


I always come across these kinds of SQL problems and find it confusing what to do. For example, suppose there is a table of : names, year, food.

`CREATE TABLE t (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    food VARCHAR(50)
);
INSERT INTO t (name, food) VALUES
('john', 'pizza'),
('john', 'cake'),
('andrew', 'pizza'),
('andrew', 'pizza'),
('andrew', 'pizza'),
('matt', 'pizza'),
('matt', 'pizza'),
('matt', 'burger'),
('david', 'cake'),
('david', 'pizza'),
('david', 'pizza'),
('elen', 'cake'),
('elen', 'pizza'),
('elen', 'donuts'),
('claire', 'cake'),
('claire', 'donuts'),
('claire', 'tacos'),
('john', 'pizza'),
('john', 'cake'),
('matt', 'apples'),
('matt', 'tacos');

` If a name has some rows that meet a certain condition, I want to select all rows and all columns for that name (even the rows that for that name that don't directly match). For example:

  • Problem 1 Select all rows/columns for names that ONLY like pizza.

Types of rows that would be selected:

 john : pizza ;
 andrew: pizza, pizza, pizza 

Types of rows that would not be selected:

 matt: pizza, pizza, burger
  • Problem 2: Select all rows/columns for names that ONLY like pizza and cake

Types of rows that would be selected:

 john: cake, pizza ;
 david: cake, pizza, pizza

Types of rows that would not be selected:

elen : cake, pizza, donuts
  • Problem 3: Select all rows/columns for names that DONT like pizza at all

Types of rows that would be selected:

claire: cake, donuts, tacos

Types of rows that would be not be selected:

 john: pizza, cake 
  • Problem 4: Select all rows/columns for names that DONT like pizza and cake at all

Types of rows that would be selected:

matt : apples, tacos

Types of rows that would not be selected:

john : cake, apples, donuts;
elen: pizza, chocolate;
tyler: pizza, cake;

I never how to write the query for these problems.

For example:

Problem 1:

SELECT *
FROM t
WHERE name IN (
  SELECT name
  FROM t
  WHERE food IN ('pizza')
  GROUP BY name
  HAVING COUNT(DISTINCT food) = 1 AND MIN(CASE WHEN food NOT IN ('pizza') THEN 1 ELSE 0 END) = 0
);

Problem 2:

SELECT *
FROM t
WHERE name IN (
  SELECT name
  FROM t
  WHERE food IN ('pizza', 'cake')
  GROUP BY name
  HAVING COUNT(DISTINCT food) <= 2 AND MIN(CASE WHEN food NOT IN ('pizza', 'cake') THEN 1 ELSE 0 END) = 0
);

Problem 3:

SELECT *
FROM t
WHERE name NOT IN (
  SELECT name
  FROM t
  WHERE food IN ('pizza')
);

Problem 4:

SELECT *
FROM t
WHERE name NOT IN (
  SELECT name
  FROM t
  WHERE food IN ('pizza', 'cake')
);

I get so confused when writing these SQL statements, thinking about which conditions are needed to correctly identify the logical conditions. Is there a way to simplify the logic for these types of queries?


Solution

  • I think queries 3 and 4 are perfect like that, and what you need to do in 1 and 2 is try to think more in terms of sets, like in 3 and 4, instead of trying to find conditions based on counts of rows.
    Here's how I would do them

    Problem 1:

    Basically select all the names with food pizza, and that are not included in the set of names with any other food. If I understood you correctly we should only get andrew here (because jonh also liked cake)

    select * 
    from t 
    where food = 'pizza' 
    and not exists (
    select 1 from t tnp where tnp.food <> 'pizza' and tnp.name = t.name
    );
    

    Problem 2:

    Select all names that are included in the set of names that liked pizza and/or cake, and that are not included in the set of names that liked something different than cake/pizza

    select * 
    from t 
    where 
    exists (select 1 from t tpc where tpc.food in ('pizza', 'cake') and tpc.name = t.name) 
    and not exists (select 1 from t tot where tot.food not in ('pizza', 'cake') and tot.name = t.name);
    

    Be careful in problem 2, the query above is for names that liked pizza AND/OR cake, and nothing else. If you wanted to get names that liked pizza AND cake and nothing else (eliminating andrew, who liked pizza but not cake), then you would have to add the condition that both pizza and cake should exist in the foods:

    select * 
    from t 
    where 
    exists (select 1 from t tp where tp.food in ('pizza') and tp.name = t.name) 
    and exists (select 1 from t tc where food in ('cake') and tc.name = t.name) 
    and not exists (select 1 from t tot where tot.food not in ('pizza', 'cake') and tot.name = t.name);
    

    If you are not used to the exists/not exists operators you can easily change the queries to use in/not in instead:

    Problem 1:

    select * 
    from t where food = 'pizza' 
    and name not in (select name from t tnp where tnp.food <> 'pizza');
    

    Problem 2:

    select * 
    from t 
    where 
    name in (select name from t tpc where tpc.food in ('pizza', 'cake')) 
    and name not in (select name from t tot where tot.food not in ('pizza', 'cake'));