java regex string readability literate-programming

Help on a better way to parse digits from a String in Java


I have a string which contains digits and letters. I wish to split the string into contiguous chunks of digits and contiguous chunks of letters.

Consider the String "34A312O5M444123A".

I would like to output: ["34", "A", "312", "O", "5", "M", "444123", "A"]

I have code which works and looks like:

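import java.util.ArrayList;
import java.util.List;

// Splits str into runs of digits, with each non-digit character as its own element.
List<String> digitsAsElements(String str){
  StringBuilder digitCollector = new StringBuilder();

  List<String> output = new ArrayList<String>();

  for (int i = 0; i < str.length(); i++){
    char cChar = str.charAt(i);

    if (Character.isDigit(cChar))
       digitCollector.append(cChar);
    else{
      // Flush the pending digits, skipping the empty chunk that would
      // otherwise appear when no digit precedes this character.
      if (digitCollector.length() > 0)
        output.add(digitCollector.toString());
      output.add(""+cChar);

      digitCollector = new StringBuilder();
    }
  }

  // Flush digits left over when the string ends with a digit.
  if (digitCollector.length() > 0)
    output.add(digitCollector.toString());

  return output;
}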

I considered splitting str twice, once to get an array of all the digit chunks and once to get an array of all the letter chunks, and then merging the results. I shied away from this as it would harm readability.

I have intentionally avoided solving this with a regex pattern as I find regex patterns to be a major impediment to readability.

  • Debuggers don't handle them well.
  • They interrupt the flow of someone reading source code.
  • Over time, regexes grow organically and become monsters.
  • They are deeply non-intuitive.

My questions are:

  • How could I improve the readability of the above code?
  • Is there a better way to do this? Is there a utility class that solves this problem elegantly?
  • Where do you draw the line between using a regex and coding something similar to what I've written above?
  • How do you increase the readability/maintainability of regexes?

Solution

  • Would you be willing to use regexes if it meant solving the problem in one line of code?

    // Split at any position that's either:
    // preceded by a digit and followed by a non-digit, or
    // preceded by a non-digit and followed by a digit.
    String[] parts = str.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
    

    With the comment to explain the regex, I think that's more readable than any of the non-regex solutions (or any of the other regex solutions, for that matter).
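
To try the one-liner above on the example string from the question, here is a minimal runnable sketch. The class name DigitLetterSplitter, the constant DIGIT_BOUNDARY and the method splitAtDigitBoundaries are made up for illustration; only the pattern itself comes from the answer. Giving the pattern a name is one way to keep the regex from interrupting the flow of the calling code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper class, named here only for illustration.
class DigitLetterSplitter {

    // Zero-width boundary that is either:
    //   preceded by a digit and followed by a non-digit, or
    //   preceded by a non-digit and followed by a digit.
    private static final String DIGIT_BOUNDARY =
            "(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)";

    // Splits str into alternating runs of digits and non-digits.
    static List<String> splitAtDigitBoundaries(String str) {
        return Arrays.asList(str.split(DIGIT_BOUNDARY));
    }

    public static void main(String[] args) {
        // prints [34, A, 312, O, 5, M, 444123, A]
        System.out.println(splitAtDigitBoundaries("34A312O5M444123A"));
    }
}
```

Two things to keep in mind with this sketch: consecutive non-digits stay together as one chunk (for example "AB12" gives "AB" and "12"), and \D matches any non-digit character, not only letters, so punctuation or spaces would be grouped with the letters.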