java algorithm string sorting comparison

Sort on a string that may contain a number


I need to write a Java Comparator class that compares Strings, but with one twist. If the two strings being compared are the same at the beginning and the end, and the middle part that differs is an integer, then compare based on the numeric values of those integers. For example, I want the following strings to end up in the order they're shown:

  • aaa
  • bbb 3 ccc
  • bbb 12 ccc
  • ccc 11
  • ddd
  • eee 3 ddd jpeg2000 eee
  • eee 12 ddd jpeg2000 eee

As you can see, there might be other integers in the string, so I can't just use a regular expression to break out any integer. I'm thinking of walking the strings from the beginning until I find a bit that doesn't match, then walking in from the end until I find a bit that doesn't match, and then testing the bit in the middle of each string against the regular expression "[0-9]+". If both middles match, I'd do a numeric comparison; otherwise, a lexical comparison.

Is there a better way?

Update: I don't think I can guarantee that the other numbers in the string, the ones that may match, don't have spaces around them, or that the ones that differ do have spaces.
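For reference, the walk-in-from-both-ends idea described above could be sketched roughly as follows. The class name `MiddleNumberComparator` is made up for illustration. One step is an assumption that goes beyond the description: after finding the common prefix and suffix, the sketch hands back any shared digits that touch the differing middle, so that "x 19 y" and "x 100 y" compare as 19 and 100 rather than "9" and "00". Because the rule is defined pair by pair, I have not verified that it gives a consistent total order for arbitrary inputs.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical class name; a sketch of the "walk in from both ends" idea.
class MiddleNumberComparator implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int minLen = Math.min(a.length(), b.length());

        // Length of the common prefix.
        int prefix = 0;
        while (prefix < minLen && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        // Length of the common suffix, never overlapping the prefix.
        int suffix = 0;
        while (suffix < minLen - prefix
                && a.charAt(a.length() - 1 - suffix) == b.charAt(b.length() - 1 - suffix)) {
            suffix++;
        }
        // Assumption: give back shared digits that touch the middle, so the
        // whole number is compared, not just the digits that happen to differ.
        while (prefix > 0 && isDigit(a.charAt(prefix - 1))) {
            prefix--;
        }
        while (suffix > 0 && isDigit(a.charAt(a.length() - suffix))) {
            suffix--;
        }

        String midA = a.substring(prefix, a.length() - suffix);
        String midB = b.substring(prefix, b.length() - suffix);
        if (midA.matches("[0-9]+") && midB.matches("[0-9]+")) {
            // BigInteger avoids overflow on very long digit runs.
            int byValue = new BigInteger(midA).compareTo(new BigInteger(midB));
            if (byValue != 0) {
                return byValue;
            }
        }
        // Not numeric (or numerically equal, e.g. "007" vs "7"): plain lexical order.
        return a.compareTo(b);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>(Arrays.asList(
                "eee 12 ddd jpeg2000 eee", "bbb 12 ccc", "ddd", "aaa",
                "eee 3 ddd jpeg2000 eee", "ccc 11", "bbb 3 ccc"));
        items.sort(new MiddleNumberComparator());
        items.forEach(System.out::println);
    }
}
```

Widening only over digits means the sketch does not depend on spaces surrounding the numbers, which matches the update above.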


Solution

  • The Alphanum Algorithm

    From the website:

    "People sort strings with numbers differently than software. Most sorting algorithms compare ASCII values, which produces an ordering that is inconsistent with human logic. Here's how to fix it."

    Edit: Here's a link to the Java Comparator Implementation from that site.
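    To give a feel for the general idea, here is a minimal sketch of a chunk-based "natural order" comparator. This is not the implementation from that site; the class name `NaturalOrderComparator` and the helper `chunkEnd` are made up for illustration. It assumes each string is split into alternating runs of ASCII digits and non-digits, that two digit runs are compared by numeric value, and that everything else is compared lexically.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical class name; a sketch of chunk-by-chunk natural ordering.
class NaturalOrderComparator implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int iEnd = chunkEnd(a, i);
            int jEnd = chunkEnd(b, j);
            String chunkA = a.substring(i, iEnd);
            String chunkB = b.substring(j, jEnd);

            int result;
            if (isDigit(chunkA.charAt(0)) && isDigit(chunkB.charAt(0))) {
                // Both chunks are digit runs: compare by numeric value.
                result = new BigInteger(chunkA).compareTo(new BigInteger(chunkB));
            } else {
                // At least one chunk is text: compare lexically.
                result = chunkA.compareTo(chunkB);
            }
            if (result != 0) {
                return result;
            }
            i = iEnd;
            j = jEnd;
        }
        // One string ran out of chunks: the one with text left sorts later.
        int rest = Integer.compare(a.length() - i, b.length() - j);
        // Final tie-break keeps "007" and "7" from comparing as equal.
        return rest != 0 ? rest : a.compareTo(b);
    }

    // Returns the end index of the run of digits or non-digits starting at 'start'.
    private static int chunkEnd(String s, int start) {
        boolean digits = isDigit(s.charAt(start));
        int end = start + 1;
        while (end < s.length() && isDigit(s.charAt(end)) == digits) {
            end++;
        }
        return end;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>(Arrays.asList(
                "eee 12 ddd jpeg2000 eee", "bbb 12 ccc", "ddd", "aaa",
                "eee 3 ddd jpeg2000 eee", "ccc 11", "bbb 3 ccc"));
        items.sort(new NaturalOrderComparator());
        items.forEach(System.out::println);
    }
}
```

    Unlike a comparison that looks only at the one differing middle section, this treats every digit run numerically, so it does not care whether the numbers have spaces around them.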