These are the only two tables that are germane. No need to bother you with the others.
mysql> describe skill_usage;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO | MUL | NULL | |
| job_id | int(11) | NO | MUL | NULL | |
+----------+---------+------+-----+---------+-------+
mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| skill_id | int(11) | NO | PRI | NULL | auto_increment |
| skill_name | char(32) | NO | MUL | NULL | |
+------------+----------+------+-----+---------+----------------+
Basically, users enter a boolean search string, using skill names.
I will convert the skill snames to skill_id
, and then want to generate a MySql query to get all matching job_id
from table skill_usage
, by parsing the user's search string.
Strings can contain skill name, the operators AND and OR, plus brackets for precedence.
Some examples might be
But there is no limit to the complexity of expression allowed - and that's my problem.
I'm no SQL guru, so correct me if I am wrong. I think that I want to start SELECT job_id FROM skill_usage
then parse, and build up the rest of the query.
For the first example, just skill name C, I want to add WHERE skillId = X
, where X is gotten from the table skill_names
.
For a simple OR
, like C OR C++
, I can use an IN
clause - WHERE skillId IN (X, Y)
(again, X & Y are lookups of the skill names to get a skill_id
).
For a simple AND
, like C++ AND UML
, I reckon I need an INNER JOIN
, something like WHERE skill_id = X INNER JOIN skill_usage ON skill_usage.skill_id = Y
(where X is the skill_id
for C++ and Y for UML).
I think that is roughly correct, for those simple cases (?).
But, when I get to even slightly more complex cases like (C AND kernel) OR (C++ AND UML)
, I get confused.
Would a regex or an algorithm be appropriate here?
@AnthonyVallée-Dubois answer to this question looks like I might be able to modify it, but it also seems very complex. I am hoping to make something simpler, but am unsure how to start (the PHP coding is not my problem, just the regex or algorithm).
I am trying to separate the parsing from the queries, and am using this question to sort out the queries.
I am getting answers like
SELECT job_id
FROM skill_usage
WHERE skill_id IN (3, 4)
GROUP BY job_id
HAVING MIN(skill_id) <> MAX(skill_id);
and
select s1.job_id
from skill_usage s1
where s1.skill_id = 3
and s1.job_id in (
select s2.job_id
from skill_usage s2
where s2.skill_id = 4
)
Where the latter looks more extensible.
Whereas my pseudo-code for the PHP to convert a search string to an SQL query is roughly
fail if mis-matched brackets
reduce multiple spaces to single
removes spaces before and after closing/opening bracket "( " & " )"
foreach c in string
if c == (
else
if c === )
else
if AND
else
if OE
else
# it's a skill name
Simple query generator, assuming PDO
## for simple tokenisation, the terms are separated by space here.
## ###############################################################
$string = "( C AND kernel ) OR ( C++ AND UML )";
function emit_term( $tag ) {
$res = " EXISTS (
SELECT *
FROM skill_usage su
JOIN skill_names sn ON sn.skill_id = su.skill_id
WHERE su.Job_id = j.job_id
AND sn.skillname = :" . $tag . ")\n";
return $res;
}
$fixed_part ="
SELECT job_id, job_name
FROM jobs j
WHERE 1=1
AND \n" ;
# $tokens = explode( ' ' , $string ); #splits on any single space
$tokens = preg_split( '/[\s]+/' , $string ); # accepts multiple whitespace
# print_r ( $tokens );
$query = $fixed_part;
$args = array();
$num = 1;
foreach ( $tokens as $tok ) {
switch ($tok) {
case '': # skip empty tokens
case ';': # No, you should not!
case '"':
case "'":
case ';': break;
case '(': $query .= '('; break;
case ')': $query .= ')'; break;
case '&':
case 'AND': $query .= ' AND '; break;
case '|':
case 'OR': $query .= ' OR '; break;
case '!':
case 'NOT': $query .= ' NOT '; break;
default:
$tag = '_q' . $num ;
$query .= emit_term ( $tag );
$args[$tag] = $tok;
$num += 1;
break;
}
}
$query .= ";\n\n";
echo "Query + parameters (for PDO):\n" ;
echo $query;
print_r ( $args) ;
Output:
SELECT job_id, job_name
FROM jobs j
WHERE 1=1
AND
( EXISTS (
SELECT *
FROM skill_usage su
JOIN skill_names sn ON sn.skill_id = su.skill_id
WHERE su.Job_id = j.job_id
AND sn.skillname = :_q1)
AND EXISTS (
SELECT *
FROM skill_usage su
JOIN skill_names sn ON sn.skill_id = su.skill_id
WHERE su.Job_id = j.job_id
AND sn.skillname = :_q2)
) OR ( EXISTS (
SELECT *
FROM skill_usage su
JOIN skill_names sn ON sn.skill_id = su.skill_id
WHERE su.Job_id = j.job_id
AND sn.skillname = :_q3)
AND EXISTS (
SELECT *
FROM skill_usage su
JOIN skill_names sn ON sn.skill_id = su.skill_id
WHERE su.Job_id = j.job_id
AND sn.skillname = :_q4)
);
Array
(
[_q1] => C
[_q2] => kernel
[_q3] => C++
[_q4] => UML
)