Tags: regex, java-8, split, trim

How to trim substrings after a non-letter token in Java


I have a string of space-separated words. For each word, I'm trying to cut off everything from the first non-letter character onward, if there is one. What would be a better way to do that than the code below?

I tried split, replaceAll and matches with regular expressions, but couldn't come up with a good solution.

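String initialString = "Brown 1fox jum'ps over 9 the_t la8zy dog.";
// Split the input into words on single spaces
String[] splitString = initialString.split(" ");
String finalString = "";
for (int i = 0; i < splitString.length; i++) {
    // Keep only the part of each word before its first non-letter character
    // (limit 2: split at most once, so index 0 always exists)
    finalString += splitString[i].split("[^a-zA-Z]", 2)[0] + " ";
}
// Drop the trailing space and collapse the gaps left by emptied words
finalString = finalString.trim().replaceAll("\\s+", " ");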

Actual result (as expected): "Brown jum over the la dog"
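For comparison, here is a minimal sketch of the same per-word logic written as a Java 8 stream pipeline. The class name TrimAfterNonLetter and the method trimAfterNonLetter are hypothetical; splitting on \s+ and filtering out empty words takes the place of the loop's final trim() and whitespace-collapsing step.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class TrimAfterNonLetter {

    // Hypothetical helper: for every whitespace-separated word, keep only the
    // letters before its first non-letter character, then drop words that end up empty.
    static String trimAfterNonLetter(String input) {
        return Arrays.stream(input.split("\\s+"))
                // limit 2: split at most once, so index 0 always exists
                .map(word -> word.split("[^a-zA-Z]", 2)[0])
                // words such as "1fox" or "9" become empty and are removed here
                .filter(word -> !word.isEmpty())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String initialString = "Brown 1fox jum'ps over 9 the_t la8zy dog.";
        System.out.println(trimAfterNonLetter(initialString)); // Brown jum over the la dog
    }
}
```

Collectors.joining(" ") only puts separators between the remaining words, so no trailing space has to be trimmed afterwards.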


Solution

  • As an alternative, you could use [^a-zA-Z ]+\S* to replace each match (a run of characters that are neither letters nor spaces, plus the rest of that word) with an empty string, and then use \s{2,} to replace runs of two or more whitespace characters with a single space:

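    String string = "Brown 1fox jum'ps over 9 the_t la8zy dog.";
    // Remove each run of non-letter, non-space characters plus the rest of that word,
    // then collapse the leftover runs of whitespace into single spaces
    String result = string.replaceAll("[^a-zA-Z ]+\\S*", "").replaceAll("\\s{2,}", " ");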
    

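Below is a self-contained sketch of the replaceAll approach, wrapped in a hypothetical helper method so it can be run directly. The trailing trim() is an addition that is not part of the answer: when the input starts with a word such as "1fox", the first replaceAll leaves a single leading space, which \s{2,} does not match.

```java
public class RegexTrimDemo {

    // Hypothetical helper around the two replaceAll calls from the answer.
    static String trimAfterNonLetter(String input) {
        return input
                // Remove a run of non-letter, non-space characters plus the rest of that word
                .replaceAll("[^a-zA-Z ]+\\S*", "")
                // Collapse the gaps left by emptied words into single spaces
                .replaceAll("\\s{2,}", " ")
                // Added: strip a single leading/trailing space left at the string edges
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(trimAfterNonLetter("Brown 1fox jum'ps over 9 the_t la8zy dog.")); // Brown jum over the la dog
        System.out.println(trimAfterNonLetter("1fox jum'ps")); // jum
    }
}
```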