I have 3 Mysql tables:
[block_value]
[metadata]
[metadata_value]
In these tables, there are pairs: metadata_name
= value
And list of pairs are put in blocks (id_block_value
)
(A) If I want height = 1080:
SELECT DISTINCT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "height" and value = "1080");
+---------+
| file_id |
+---------+
| 21 |
| 22 |
(...)
| 6962 |
(...)
| 8146 |
| 8147 |
+---------+
794 rows in set (0.06 sec)
(B) If I want file extension = mpeg:
SELECT DISTINCT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "file extension" and value = "mpeg");
+---------+
| file_id |
+---------+
| 6889 |
| 6898 |
| 6962 |
+---------+
3 rows in set (0.06 sec)
BUT, if I want:
Then, I don't know what is the best.
For A or B
, I tried A union B
which seems to do the trick.
SELECT DISTINCT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "height" and value = "1080")
UNION
SELECT DISTINCT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "file extension" and value = "mpeg");
+---------+
| file_id |
+---------+
| 21 |
| 22 |
| 34 |
(...)
| 6889 |
| 6898 |
+---------+
796 rows in set (0.13 sec)
For A and B
, since there are no intersect
in Mysql, I tried A and file_id in(B)
, but look at perfs (>4mn)...
SELECT DISTINCT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "height" and value = "1080")
and file_id in(
SELECT DISTINCT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "file extension" and value = "mpeg"));
+---------+
| file_id |
+---------+
| 6962 |
+---------+
1 row in set (4 min 36.22 sec)
I tried B and file_id in(A)
too, which is a lot better, but I will never know how which one to put first.
SELECT DISTINCT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "file extension" and value = "mpeg")
and file_id in(
SELECT DISTINCT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "height" and value = "1080"));
+---------+
| file_id |
+---------+
| 6962 |
+---------+
1 row in set (0.75 sec)
So... what do I do now? Is there any better way for boolean operations? Any tip? Did I miss something?
EDIT: what data looks like:
This database contains a row in FILE
table for each audio/video file inserted:
There is a row in METADATA
table for each potential information:
Then, a row in BLOCK
table define a container:
A file can have several blocks of metadata, a BLOCK_VALUE
table contains instances of BLOCKS:
In this example, file 10 has 5 blocks: 3 Video (101) + 1 Audio (102) + 1 General (104)
Values are stored in METADATA_VALUE
I'm opening a new post only to keep the "correct" solution tidy..
Ok, sorry, it seemed that I was making the wrong assumption. I never thought about two blocks being defined exactly the same way.
So, since I'm a copycat, and I like my getting the AND from OR solution (:P), I got to these two solutions..
ORing: I like Chris's solution better...
SELECT DISTINCT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "height" and value = "1080")
OR (metadata_name = "file extension" and value = "mpeg")
ANDing: I'll use your ORing version (the one with the UNION all
SELECT FILE_ID FROM (
SELECT DISTINCT 1, file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "height" and value = "1080")
UNION ALL
SELECT DISTINCT 2, file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "file extension" and value = "mpeg")
) IHATEAND
GROUP BY FILE_ID
HAVING COUNT(1)>1
Which gives:
+---------+
| FILE_ID |
+---------+
| 6962 |
+---------+
1 row in set (0.24 sec)
it should be a little less fast than the ORing seeing the performances you pasted and mines (I am 3 times as slow, time to upgrade -.-), but still significantly faster than the previous queries ;)
Anyway, how does the ANDing work? Put pretty simply, it just does the two separate queries and names the records according to the branch they come from, then counts the different file ids coming from them
UPDATE: another way of doing it without having to "name" the branches:
SELECT FILE_ID FROM (
SELECT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "height" and value = "1080")
GROUP BY FILE_ID
UNION ALL
SELECT file_id
FROM metadata_value MV
INNER JOIN metadata M ON MV.meta_id = M.id_metadata
INNER JOIN block_value BV ON MV.blockvalue_id = BV.id_block_value
WHERE (metadata_name = "file extension" and value = "mpeg")
GROUP BY FILE_ID
) IHATEAND
GROUP BY FILE_ID
HAVING COUNT(1)>1
Here the results are the same (and performances as well) and I'm exploiting the fact that while UNION automatically sorts the duplicates and removes the duplicates, UNION ALL does not... which is perfect since I don't want them removed (and in general union all is also faster than union :) ), this way I can forget about naming.