Search code examples
mysqlsqlentity-attribute-valuerelational-division

Search a table based on multiple rows in another table


Basically I have three MySQL tables:

Users - contains base information on users
Fields - describes additional fields for said users (e.g. location, dob etc.)
Data - Contains user data described via links to the fields table

With the basic design as follows (the below is a stripped down version)

Users:

 ID | username | password | email | registered_date

Fields

 ID | name | type

Data:

 ID | User_ID | Field_ID | value

what I want to do is search Users by the values for the fields they have, e.g. example fields might be:

Full Name
Town/City
Postcode
etc.

I've got the following, which works when you're only wanting to search by one field:

SELECT `users`.`ID`,
       `users`.`username`,
       `users`.`email`,
       `data`.`value`,
       `fields`.`name`

FROM `users`,
     `fields`,
     `data`

WHERE `data`.`Field_ID` = '2'
AND `data`.`value` LIKE 'london'
AND `users`.`ID` = `data`.`User_ID`
AND `data`.`Field_ID` = `fields`.`ID`

GROUP BY `users`.`ID`

But what about if you want to search for Multiple fields? e.g. say I want to search for Full Name "Joe Bloggs" With Town/City set to "London"? This is the real sticking point for me.

Is something like this possible with MySQL?


Solution

  • I'm going with the assumption that "searching multiple fields" is talking about the Entity-Attribute-Value structure.

    In that case, I propose that the first step is to create a derived query - basically, we want to limit the "EAV data joined" to only include the records that have the values we are interested in finding. (I've altered some column names, but the same premise holds.)

    SELECT d.userId
    FROM data d
    JOIN fields f
      ON f.fieldId = d.fieldId
    -- now that we establish data/field relation, filter rows
    WHERE f.type = "location" AND d.value = "london"
       OR f.type = "job" AND d.value = "programmer"
    

    This resulting rows are derived from the filtered EAV triplets that match our conditions. Only the userId is selected in this case (as it will be used to join against the user relation), but it is also possible to push fieldId/value/etc through.

    Then we can use all of this as a derived query:

    SELECT * 
    FROM users u
    JOIN (
      -- look, just goes in here :)
      SELECT DISTINCT d.userId
      FROM data d
      JOIN fields f
        ON f.fieldId = d.fieldId
      WHERE f.type = "location" AND d.value = "london"
         OR f.type = "job" AND d.value = "programmer"
    ) AS e
    ON e.userId = u.userId
    

    Notes:

    1. The query planner will figure all the RA stuff out peachy keen; don't worry about this "nesting" as there is no dependent subquery.
    2. I avoid the use of implicit cross-joins as I feel they muddle most queries, this case being a particularly good example.
    3. I've "cheated" and added a DISTINCT to the derived query. This will ensure that at most one record will be joined/returned per user and avoids the use of GROUP BY.

    While the above gets "OR" semantics well (it's both easier and I may have misread the question), modifications are required to get "AND" semantics. Here are some ways that the derived query can be written to get such. (And at this point I must apologize to Tony - I forget that I've already done all the plumbing to generate such queries trivially in my environment.)

    Count the number of matches to ensure that all rows match. This will only work if each entity is unique per user. It also eliminates the need for DISTINCT to maintain correct multiplicity.

    SELECT d.userId
    FROM data d
    JOIN fields f
      ON f.fieldId = d.fieldId
    -- now that we establish data/field relation, filter rows
    WHERE f.type = "location" AND d.value = "london"
       OR f.type = "job" AND d.value = "programmer"
    GROUP BY d.userId
    HAVING COUNT(*) = 2
    

    Find the intersecting matches:

    SELECT d.userId
    FROM data d
    JOIN fields f ON f.fieldId = d.fieldId  
    WHERE f.type = "location" AND d.value = "london"
    INTERSECT
    SELECT d.userId
    FROM data d
    JOIN fields f ON f.fieldId = d.fieldId  
    WHERE f.type = "job" AND d.value = "programmer"
    

    Using JOINS (see Tony's answer).

    SELECT d1.userId
    FROM data d1
    JOIN data d2   ON d2.userId = d1.userId
    JOIN fields f1 ON f1.fieldId = d1.fieldId
    JOIN fields f2 ON f2.fieldId = d2.fieldId
    -- requires AND here across row
    WHERE f1.type = "location" AND d1.value = "london"
      AND f2.type = "job"      AND d2.value = "programmer"
    

    An inner JOIN itself provides conjunction semantics when applied outside of the condition. In this case I show "re-normalize" the data. This can also be written such that [sub-]selects appear in the select clause.

    SELECT userId
    FROM (
      -- renormalize, many SO questions on this
      SELECT q1.userId, q1.value as location, q2.value as job
      FROM (SELECT d.userId, d.value
         FROM data d
         JOIN fields f ON f.fieldId = d.fieldId
         WHERE f.type = "location") AS q1
      JOIN (SELECT d.userId, d.value
         FROM data d
         JOIN fields f ON f.fieldId = d.fieldId
         WHERE f.type = "job") AS q2
      ON q1.userId = q2.userId
    ) AS q
    WHERE location = "london"
      AND job = "programmer"
    

    The above duplicity is relatively easy to generate via code and some databases (such as SQL Server) support CTEs which make writing such much simpler. YMMV.