Search code examples
phpsqldatabase-designquerying

How to best compare these two strings for values even though they are in random order?


I have a string in my database that is four characters long, such as A487. The user can search that database by typing in four characters. I am having a hard time conceptualizing how to implement this elegantly because of these factors:

  1. The user's input can be in any order. For example, the user may type in 2345 and this would need to match the database entry for 5432.
  2. The input needs to account for pairs of numbers. For example, one entry may be 2245. If the user enters 4252, it needs to find this entry.

In short, I need the search to return all results where there is an exact character for character match regardless of the order of the characters themselves. Any thoughts?

Solution:

I decided to go with the method of creating a sorted column on import so that I could then sort the user's input using the same method and compare exact strings. To do this:

I used PHP's natural sorting algorithm (it doesn't matter how you actually sort these, as long as it's consistent among both the database value and user input).

$hand = "AT77";    

// Create an array with each card in it.
$cards = [$hand[0], $hand[1], $hand[2], $hand[3]];

// Sort the array using natural sort algorithim
natsort($cards);

// Create a string out of array
$sortedHand = implode($cards);

I insert this value in the database under hand_sorted, and then I can simply use the same natsort on an array with the user's input to compare for exact strings. There was no discernible damage in terms of import time for my exact project, where the user is uploading CSVs between 40k and 100k rows long. Ended up being the pefect solution.


Solution

  • To allow for simple querying, I'll recommend that you add a new column to your database table that stores your 4-char strings with each letter occurring alphabetically. Then you can simply alphabetize your input string and seek exact matches. This should be highly efficient.

    ----------------------------------
    col1 | unsorted  | sorted | col2 |
    ----------------------------------
    blah | 3542      | 2345   | blah |
    blah | 4533      | 3345   | blah |
    blah | 4253      | 2345   | blah |
    ----------------------------------
    

    This keeps your data true and also allows for easy querying. You only need to query on the sorted column values.