I'm working on a user database where the profile data has been changed from a simple table into a Entity-Attribute-Value table.
Where as before the structure was along these lines:
userid (int)
address 1 (varchar)
city (varchar)
country (varchar)
It's now along these lines:
userid (int)
key (varchar)
value (varchar)
eg
userid key value
150 city London
150 country UK
151 city New York
151 country USA
152 country Mexico
I need to get a distinct list of city / country pairs and a count of all users for each country:
city country count
London UK 18
New York USA 25
There is no guarantee each key value pair will exist for each user, i.e there could be city, or country or both or neither as well as any number of other key values pairs.
This was straightforward with the old structure, but I can't even think how to begin on this, and would be grateful for some pointers
Your best solution is to go back to the traditional table because EAV makes most querying much harder than it should be - witness your problems here. You're going to be doing self-joins until you're sick of them, remanufacturing the table structure that allows you to perform sensible queries.
Cities and countries for each user ID:
SELECT a.userID, a.value AS city, b.value AS country
FROM EAV AS a
JOIN EAV AS b ON a.UserID = b.UserID
WHERE a.key = 'city'
AND b.key = 'country';
So, you end up with:
SELECT city, country, count(*)
FROM (SELECT a.userID, a.value AS city, b.value AS country
FROM EAV AS a
JOIN EAV AS b ON a.UserID = b.UserID
WHERE a.key = 'city'
AND b.key = 'country'
) AS c
GROUP BY city, country;
If there's a chance that someone might have two city or two country records, this will give you a Cartesian product with as many rows for that user as the product of the number of city and country records for that user.
This quite deliberately and consciously ignores users who have a city and no country or a country and no city (let alone those who have neither). Extending the solution to deal with those is only modestly painful - you end up with a 3-way UNION, I think, though you might be able to devise something with multiple left outer joins. But the fact that data can be entered into an EAV system without the necessary constraints to ensure that there is a city and a country for a user is simply one of the many reasons for rejecting EAV.
I'm sorry you had this foisted on you. I recommend looking at http://careers.stackoverflow.com/ as a way out of your pain, for this is only the beginning of it.
Dealing with users without either city or country or both. I think this will more or less do it:
SELECT a.userID, b.value AS city, c.value AS country
FROM (SELECT DISTINCT UserID FROM EAV) AS a
LEFT JOIN EAV AS b ON a.UserID = b.UserID
LEFT JOIN EAV AS c ON a.UserID = c.UserID
WHERE b.key = 'city'
AND c.key = 'country';
This should give you one record per user as long as there are no multiple city or country records for that user. The a
scan gives you the list of unique user IDs that exist in the EAV table; the two outer joins give you the corresponding city or cities and corresponding country or countries for each such user ID, with nulls being generated if there is no city record or country record (or both) for the given user ID.