Search code examples
phpsortinglevenshtein-distance

Sort Array by combining orders from multiple Arrays


I'm making a simple search engine, and I have already indexed a lot of websites in a MySQL database. Now I would like to get a relevant list of results by keywords.

Websites are indexed in my database with the following columns : hostname (without protocol an port), title, description. (We don't care about path)

When I type some keywords on my search engine homepage, it first starts by fetching 50 websites using FULLTEXT indexes.

Now, and because using Levenshtein algorithm in MySQL is really slow, I would like to sort those results with the Levenshtein PHP function for each columns I listed previously.

I would like to sort them in this order (most important first) : hostname, title, and then description.

So I have five arrays :

  • $results, returned by MySQL
  • $sorted_by_mysql, containing keys of $results in the original order : 0, 1, 2, ...
  • $sorted_by_hostname, containing keys of $results sorted by hostname's relevance using Levenshtein, ex: 3, 0, 1, 2, ...
  • $sorted_by_title, containing keys of $results sorted by title's relevance using Levenshtein, ex: 0, 2, 1, 3, ...
  • $sorted_by_description, containing keys of $results sorted by description's relevance using Levenshtein, ex: 1, 3, 0, 2, ...

Here's the code :

$results = $req->fetchAll();
$search = strtolower($q);
$temp_arr = [];
$sorted_by_mysql = $sorted_by_hostname = $sorted_by_title = $sorted_by_description = [];

// We keep the original order in an array
for($i = 0; $i < count($results); $i++) $sorted_by_mysql[] = $i;

// Sort by hostname
for($i = 0; $i < count($results); $i++) $temp_arr[$i] = levenshtein($search, strtolower($results[$i]->hostname));
asort($temp_arr);
foreach($temp_arr as $k => $v) $sorted_by_hostname[] = $k;

// Sort by title
for($i = 0; $i < count($results); $i++) $temp_arr[$i] = levenshtein($search, strtolower($results[$i]->title));
asort($temp_arr);
foreach($temp_arr as $k => $v) $sorted_by_title[] = $k;

// Sort by description
for($i = 0; $i < count($results); $i++) $temp_arr[$i] = levenshtein($search, strtolower($results[$i]->description));
asort($temp_arr);
foreach($temp_arr as $k => $v) $sorted_by_description[] = $k;

Finally I would like to sort $results by combining (by priority) all thoses different arrays. But I have no idea on how, so here's where I need some help !

EDIT : Solution !

$data = $req->fetchAll();
$search = strtolower($q);
$temp = [];

    foreach($data as $i => $row) {
        $temp[] = [
            'id' => $i,
            'lev1' => levenshtein($search, strtolower($row->hostname)),
            'lev2' => levenshtein($search, strtolower($row->title)),
            'lev3' => levenshtein($search, strtolower($row->description))
        ];
    }

$sorted = array_orderby($temp, 'lev1', SORT_ASC, 'lev2', SORT_ASC, 'lev3', SORT_ASC, 'id', SORT_ASC);

$results = [];

    foreach($sorted as $row) {
        $results[] = $data[$row['id']];
    }

// Perfectly sorted !

Here's array_orderby function :

// Credits :  jimpoz at jimpoz dot com (PHP.net)
function array_orderby()
{
    $args = func_get_args();
    $data = array_shift($args);
    foreach ($args as $n => $field) {
        if (is_string($field)) {
            $tmp = array();
            foreach ($data as $key => $row)
                $tmp[$key] = $row[$field];
            $args[$n] = $tmp;
            }
    }
    $args[] = &$data;
    call_user_func_array('array_multisort', $args);
    return array_pop($args);
}

Solution

  • See the answer to this SO question, they have a similar need but have structured their data in a way that makes the answer easier. It looks like PHP supports sorting by multiple attributes (in descending priority) as long as those attributes are built into the associative array that's being sorted.

    To apply this approach to your data, you'll probably want to restructure your results into one giant associative array where each element of the array contains a value for each "field" you're aiming to sort by. Does that make sense?

    Good luck!