Search code examples
mysqlsqlentity-attribute-value

SQL - querying Entity-Attribute-Value (EAV) with missing attributes


I have database that is equivalent to following table:

id |  foo |  bar
---+------+-----
 1 |    5 |    6
 2 |    7 | NULL

but, unfortunately, implemented as Entity-Attribute-Value:

CREATE TABLE obj(id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE attrdef(id INTEGER NOT NULL PRIMARY KEY, name VARCHAR(4));
CREATE TABLE attr(obj_id INTEGER NOT NULL, attrdef_id INTEGER NOT NULL, value INTEGER NOT NULL);

INSERT INTO obj VALUES(1);
INSERT INTO obj VALUES(2);

INSERT INTO attrdef VALUES(3, 'foo');
INSERT INTO attrdef VALUES(4, 'bar');

INSERT INTO attr VALUES(1,3,5);
INSERT INTO attr VALUES(1,4,6);
INSERT INTO attr VALUES(2,3,7);

I need to query that database to get data in "proper" form - like in example table. I have tried:

SELECT obj.id, foo.value, bar.value
FROM obj 
LEFT JOIN attr foo ON (obj.id = foo.obj_id)
LEFT JOIN attrdef foo_def ON (foo.attrdef_id = foo_def.id)
LEFT JOIN attr bar ON (obj.id = bar.obj_id)
LEFT JOIN attrdef bar_def ON (bar.attrdef_id = bar_def.id)
WHERE foo_def.name = 'foo' AND bar_def.name = 'bar';

but the second row is missing:

id |  foo |  bar
---+------+-----
 1 |    5 |    6

and

SELECT obj.id,
    MAX(CASE WHEN name='foo' THEN value ELSE NULL END) foo,
    MAX(CASE WHEN name='bar' THEN value ELSE NULL END) bar
FROM obj LEFT JOIN attr ON (obj.id = attr.obj_id)
LEFT JOIN attrdef ON (attr.attrdef_id = attrdef.id)
GROUP BY obj.id;

which gives correct result:

id |  foo |  bar
---+------+-----
 1 |    5 |    6
 2 |    7 | NULL

but performance of this query is unacceptable.

I'd like standard SQL query, but good MySQL-specific solution would be appreciated.


Solution

  • You just need to move the conditions to the on clauses:

    SELECT obj.id, foo.value, bar.value
    FROM obj LEFT JOIN
         attr foo
         ON obj.id = foo.obj_id LEFT JOIN
         attrdef foo_def
         ON foo.attrdef_id = foo_def.id AND foo_def.name = 'foo' LEFT JOIN
         attr bar
         ON obj.id = bar.obj_id LEFT JOIN
         attrdef bar_def
         ON bar.attrdef_id = bar_def.id AND bar_def.name = 'bar';
    

    For the aggregation method, I would go with:

    SELECT obj.id,
           MAX(CASE WHEN name = 'foo' THEN value END) foo,
           MAX(CASE WHEN name = 'bar' THEN value END) bar
    FROM obj LEFT JOIN
         attr
         ON obj.id = attr.obj_id LEFT JOIN
         attrdef
         ON attr.attrdef_id = attrdef.id
    WHERE name IN ('foo', 'bar')
    GROUP BY obj.id;
    

    The left joins might not be necessary in this case (depending on the distribution of missing values). In any case, if you start looking at more and more attributes, the JOIN method takes longer and longer. The GROUP BY method has about the same performance.

    EDIT:

    The correct query is:

    SELECT obj.id, foo.value, bar.value
    FROM obj LEFT JOIN
         (attr foo JOIN
          attrdef foo_def
          ON foo.attrdef_id = foo_def.id AND foo_def.name = 'foo'
         )
         ON obj.id = foo.obj_id LEFT JOIN
         (attr bar JOIN
          attrdef bar_def
          ON bar.attrdef_id = bar_def.id AND bar_def.name = 'bar'
         )
         ON obj.id = bar.obj_id ;
    

    Here is the SQL Fiddle.