Search code examples
sqlpostgresqllogic

SQL Query to return a table of specific matching values based on a criteria


I have 3 tables in PostgreSQL database:

person (id, first_name, last_name, age)
interest (id, title, person_id REFERENCES person)
location (id, city, state text NOT NULL, country, person_id REFERENCES person)

city can be null, but state and country cannot. A person can have many interests but only one location. My challenge is to return a table of people who share the same interest and location.

All ID's are serialized and thus created automatically. Let's say I have 4 people living in "TX", they each have two interests a piece, BUT only person 1 and 3 share a similar interest, lets say "Guns" (cause its Texas after all). I need to select all people from person table where the person's interest title (because the id is auto generated, two Guns interest would result in two different ID keys) equals that of another persons interest title AND the city or state is also equal.

I was looking at the answer to this question here Select Rows with matching columns from SQL Server and I feel like the logic is sort of similar to my question, the difference is he has two tables, to join together where I have three.


Solution

  • return a table of people who share the same interest and location.

    I'll interpret this as "all rows from table person where another rows exists that shares at least one matching row in interest and a matching row in location. No particular order."

    A simple solution with a window function in a subquery:

    SELECT p.*
    FROM  (
       SELECT person_id AS id, i.title, l.city, l.state, l.country
            , count(*) OVER (PARTITION BY i.title, l.city, l.state, l.country) AS ct
       FROM   interest i
       JOIN   location l USING (person_id)
       ) x
    JOIN   person p USING (id)
    WHERE  x.ct > 1;
    

    This treats NULL values as "equal". (You did not specify clearly.)

    Depending on undisclosed cardinalities, there may be faster query styles. (Like reducing to duplicative interests and / or locations first.)

    Asides 1:

    It's almost always better to have a column birthday (or year_of_birth) than age, which starts to bit-rot immediately.

    Asides 2:

    A person can have [...] only one location.

    You might at least add a UNIQUE constraint on location.person_id to enforce that. (If you cannot make it the PK or just append location columns to the person table.)