I'm searching for state abbreviations in a string. Here's an example input string:
String inputStr = 'Albany, NY + Chicago, IL and IN, NY, OH and WI';
The pattern that I'm using to match state abbreviations is:
String patternStr = '(^|\\W|\\G)[a-zA-Z]{2}($|\\W)';
I'm looping through the matches and stripping out the non-alpha characters during the loop, but I know that I should be able to do that in one pass. Here's the current approach:
Pattern myPattern = Pattern.compile(patternStr);
Matcher myMatcher = myPattern.matcher(inputStr);
Pattern alphasOnly = Pattern.compile('[a-zA-Z]+');
String[] states = new String[]{};
while (myMatcher.find()) {
    String rawMatch = inputStr.substring(myMatcher.start(), myMatcher.end());
    Matcher alphaMatcher = alphasOnly.matcher(rawMatch);
    while (alphaMatcher.find()) {
        states.add(rawMatch.substring(alphaMatcher.start(), alphaMatcher.end()));
    }
}
System.debug(states);
|DEBUG|(NY, IL, IN, NY, OH, WI)
This works, but it's verbose and probably inefficient. What's the one-pass way to get this done in Java/Apex?
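You need to use Matcher.group(). Wrap the two letters in their own capturing group and read just that group; here it is group 2, since group 1 is the leading delimiter and group 3 the trailing one. Try this: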
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Escaping
{
    public static void main(String[] args)
    {
        String inputStr = "Albany, NY + Chicago, IL and IN, NY, OH and WI";
        // Group 1: leading delimiter, group 2: the two letters, group 3: trailing delimiter.
        String patternStr = "(^|\\W|\\G)([a-zA-Z]{2})($|\\W)";
        Pattern myPattern = Pattern.compile(patternStr);
        Matcher myMatcher = myPattern.matcher(inputStr);
        StringBuilder states = new StringBuilder();
        while (myMatcher.find())
        {
            // group(2) returns only the letters, so no second pass is needed.
            states.append(myMatcher.group(2));
            states.append(" ");
        }
        System.out.println(states);
    }
}
Output: NY IL IN NY OH WI
In a real system, you'd want to check each match against a list of valid state abbreviations; otherwise any two-letter word, such as "to" or "of", gets picked up as well.
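As a rough sketch of that validation step, the version below keeps the same pattern and only collects a match if it appears in a set of known abbreviations. The class name StateFilter, the method extractStates, and the choice to uppercase each candidate before the lookup are assumptions made for illustration, not part of the original code; the set holds the 50 state codes only, so add DC or territories if you need them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StateFilter
{
    // Same pattern as in the answer; group 2 holds the two letters.
    private static final Pattern TWO_LETTERS =
        Pattern.compile("(^|\\W|\\G)([a-zA-Z]{2})($|\\W)");

    // The 50 US state abbreviations (hypothetical lookup table for this sketch).
    private static final Set<String> VALID_STATES = new HashSet<>(Arrays.asList(
        "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
        "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
        "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
        "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
        "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"));

    public static List<String> extractStates(String input)
    {
        List<String> states = new ArrayList<>();
        Matcher m = TWO_LETTERS.matcher(input);
        while (m.find())
        {
            // Assumption: abbreviations are compared case-insensitively.
            String candidate = m.group(2).toUpperCase(Locale.ROOT);
            if (VALID_STATES.contains(candidate))
            {
                states.add(candidate);
            }
        }
        return states;
    }

    public static void main(String[] args)
    {
        // "XX" matches the pattern but is dropped by the lookup.
        System.out.println(
            extractStates("Albany, NY + Chicago, IL and IN, NY, OH and WI, XX"));
    }
}
```

Output: [NY, IL, IN, NY, OH, WI]

Note that a set lookup can't tell the word "in" from Indiana. Because this sketch uppercases before comparing, ordinary words like "in" or "or" still pass; drop the toUpperCase call if your input reliably uses uppercase abbreviations.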