I'm trying to write a search query to find articles in a database. I would like to take the search string the user enters and derive a specific set of possible search terms from it. If the user entered the search string "listing of average salaries in germany for 2011", I would like to generate a list of terms to hunt for. I figured I would look for the whole string and for partial strings of consecutive words. That is, I want to search for "listing of average salaries" and "germany for 2011" but not "listing germany 2011".
So far I have this bit of code to generate my search terms:
$searchString = "listing of average salaries in germany for 2011";
$searchTokens = explode(" ", $searchString);
$searchTerms = array($searchString);
$tokenCount = count($searchTokens);
for ($max = $tokenCount - 1; $max > 0; $max--) {
    $termA = "";
    $termB = "";
    for ($i = 0; $i < $max; $i++) {
        $termA .= $searchTokens[$i] . " ";
        $termB .= $searchTokens[($tokenCount - $max) + $i] . " ";
    }
    array_push($searchTerms, $termA);
    array_push($searchTerms, $termB);
}
print_r($searchTerms);
and it's giving me this list of terms:
What I'm not sure how to get are the missing terms:
Update
I'm not looking for a "power set", so answers like this or this aren't valid. For example, I do not want these in my list of terms:
I'm looking for consecutive words only.
You want all contiguous sub-sequences of the exploded string: for each offset, starting at offset=0, slice the array with every length from length=1 up to count-offset:
$search_string = 'listing of average salaries in germany for 2011';
$search_array = explode(' ', $search_string);
$count = count($search_array);
$search_matches = array(); // collects every run of consecutive words
$min_length = 1;           // shortest phrase to generate, in words
for ($offset = 0; $offset < $count; $offset++) {
    // From this starting word, take every length that still fits in the array
    for ($length = $min_length; $length <= $count - $offset; $length++) {
        $match = array_slice($search_array, $offset, $length);
        $search_matches[] = join(' ', $match);
    }
}
print_r($search_array);
print_r($search_matches);
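If you need this in more than one place, the same two loops can be wrapped in a function. The following is only a sketch: the name `contiguousPhrases` and the `$minLength` parameter are my own assumptions, not anything built into PHP, and it splits on runs of whitespace rather than a single space so that repeated spaces do not produce empty words.

```php
<?php
// Hypothetical helper (name and parameters are assumptions): returns every
// run of consecutive words in $searchString that has at least $minLength words.
function contiguousPhrases(string $searchString, int $minLength = 1): array
{
    // Split on any run of whitespace and drop empty tokens.
    $tokens = preg_split('/\s+/', trim($searchString), -1, PREG_SPLIT_NO_EMPTY);
    $count = count($tokens);
    $minLength = max(1, $minLength); // a phrase needs at least one word
    $phrases = array();

    for ($offset = 0; $offset < $count; $offset++) {
        // Every length that still fits between this offset and the end
        for ($length = $minLength; $length <= $count - $offset; $length++) {
            $phrases[] = implode(' ', array_slice($tokens, $offset, $length));
        }
    }

    return $phrases;
}

$phrases = contiguousPhrases('listing of average salaries in germany for 2011');
echo count($phrases), "\n"; // 8 words give 8 * 9 / 2 = 36 phrases
```

A string of n words yields n(n+1)/2 phrases when the minimum length is 1, so the list grows quadratically. Raising `$minLength` to 2 or 3 is one way to drop single-word terms such as "of" or "in" if they match too much.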