I'm trying to build a searchable database of the languages spoken by my users.
For example I might have
$john = array("english", "french", "spanish");
$jack = array("french", "spanish");
$jill = array("english", "spanish");
And I want to save them to a MySQL database so that I can later run something along the lines of (pseudo code)
SELECT * FROM users WHERE spoken_languages = "french" and "spanish"
I know if I had speaks_english
, speaks_french
, and speaks_spanish
columns then I could search for
SELECT * FROM users WHERE speaks_french = "true" and speaks_spanish = "true"
But the scalability of adding a new column every time I encounter a new language is not very good. I've considered a table such as
john | english
john | french
john | spanish
jack | french
jack | spanish
jill | english
jill | spanish
Because at least then to get back languages spoken by a user I could just run
SELECT * FROM spoken_languages WHERE user = "jack"
But in order to search this for people that speak both french and spanish I would need to query for all of the users that speak french, all of the users that speak spanish and then calculate the intersections. That seems horribly inefficient.
So I ask you, how can I save this array of spoken languages so that I can search the database later without destroying the server?
You have the right solution in your question, with the person_language
table that looks like this
john | english
john | french
jack | spanish
You can query it like so.
SELECT person
FROM person_language
WHERE language IN ( 'english', 'spanish')
GROUP BY person
HAVING COUNT(*) = 2
Put an index on (language, person)
and this will scale up fine.
If you want everybody who speaks Spanish and at least one other language you can do this.
SELECT a.person
FROM person_language AS a
JOIN ( SELECT person
FROM person_language
GROUP BY person
HAVING COUNT(*) >= 2
) AS b ON a.person = b.person
WHERE a.language = 'spanish'
This uses a JOIN directive to take the intersection of the people who speak spanish with the people who speak two or more languages.