Right now I have a function that searches all posts of a certain user for keywords (specified by the user) and returns any posts that match all of the keywords.
public function fullTextSearch($text, $userId, $offset = 0, $limit = 0) {
    $tokens = explode(' ', trim($text, ' '));
    $requiredMatches = count($tokens);
    $matchingId = array();
    $result = false;
    // Select the id as well, since it is read below; the alias matches the key used in the loop.
    $sql = "SELECT posts.id, posts.content AS ent_posts_content "
         . "FROM posts "
         . "WHERE posts.user_id = '" . $userId . "'";
    $primaryResults = $db->fetchAll($sql);
    foreach ($primaryResults as $primaryResult) { // results from query
        $postTokens = explode(' ', $primaryResult['ent_posts_content']);
        $foundMatches = 0;
        foreach ($tokens as $token) { // each of the required words
            foreach ($postTokens as $postToken) { // each of the words in the post
                $distance = levenshtein(strtolower($token), strtolower(rtrim($postToken)));
                if ($distance < 2) {
                    $foundMatches++;
                    break; // count each required word at most once
                }
            }
        }
        // check once per post, after all required words have been tried
        if ($foundMatches >= $requiredMatches) {
            $matchingId[] = $primaryResult['id'];
        }
    }
    // ... (rest of the function omitted)
}
The issue I am having is that one of my users likes to title his posts and then search for them by that makeshift 'title'. For example:
My Radio
It plays all the music
As you can see in the code, I rtrim each token from the post's content to try to avoid this issue. But when I search for Radio with the code above, that post is not returned. I thought the cause was a whitespace character at the end of Radio throwing off levenshtein, but that does not seem to be the case, since I am already rtrim-ing the post token.
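A quick way to check what the tokenizer actually produces is sketched below. It assumes the title and the body are separated by a newline rather than a space (the sample string is just the example post above, with an assumed "\n" between the two lines).

```php
<?php
// Assumption: the title and body are separated by "\n", not by a space.
$content = "My Radio\nIt plays all the music";

// explode(' ') only splits on spaces, so the newline stays inside one token.
$postTokens = explode(' ', $content);

// The second token is "Radio\nIt": the newline sits in the middle of the token,
// so rtrim() has no trailing whitespace to strip.
$token = rtrim($postTokens[1]);

// "radio" vs "radio\nit" needs three insertions, which fails the "< 2" check.
$distance = levenshtein('radio', strtolower($token));

var_dump($token === "Radio\nIt"); // bool(true)
var_dump($distance);              // int(3)
```

If that assumption holds, the whitespace is not at the end of the token at all, which would explain why rtrim makes no difference.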
I ended up using a regular expression to replace every run of whitespace (newlines included) with a single " " so that the string tokenizes properly.
$pregTokens = preg_replace('/\s+/', ' ', $primaryResult['ent_posts_content']);
$postTokens = explode(' ', $pregTokens);
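The same result can probably be had in one step by splitting on whitespace directly with preg_split. This is only a sketch: tokenizePost is a hypothetical helper name, not something from my original function.

```php
<?php
// Hypothetical helper (not part of the original code): split a post into words
// on any run of whitespace (spaces, tabs, newlines).
function tokenizePost(string $content): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty tokens that leading or trailing
    // whitespace would otherwise produce; fall back to [] if preg_split fails.
    return preg_split('/\s+/', $content, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

$postTokens = tokenizePost("My Radio\nIt plays all the music");
print_r($postTokens); // My, Radio, It, plays, all, the, music
```

Because the tokens come out without any surrounding whitespace, the rtrim call inside the matching loop would no longer be needed with this approach.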