java, java-stream

Branching-like syntax in the Stream API


I hope you're doing well. I have this Java code:

String expr = "490 + 34 + 8 * 42 /45 -cos(90)/sin(90)";

List<String> tokens = new ArrayList<>();
for (int i = 0, j = 0, len = expr.length(); i < len; i += j) {
    String s = expr.chars()
                   .skip(i)
                   .filter(x -> !Character.isWhitespace(x))
                   .takeWhile(Character::isDigit)
                   .takeWhile(Character::isLetter)
                   .mapToObj(Character::toString)
                   .reduce("", (acc, v) -> acc.concat(v));
    j += s.length();
    tokens.add(s);
}
System.out.println(tokens);

The goal is to get the following result when printing the value of the variable tokens:

[490,+,34,+,8,*,42,/,45,-,cos,(,90,),/,sin,(,90,)]

If you run this code, you'll get an infinite loop. The problem is that .takeWhile(Character::isDigit) returns a stream consisting of "4", "9", "0", and .takeWhile(Character::isLetter) then operates on that stream, which results in an empty stream. .reduce("", (acc, v) -> acc.concat(v)) therefore returns an empty string, so j is always 0 and i never advances.
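The effect can be reproduced in isolation. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes Java 11 or later because of Character.toString(int). It shows that two chained takeWhile calls only let through elements that satisfy both predicates, so the result is empty for any input.

```java
import java.util.stream.Collectors;

class ChainedTakeWhileDemo {
    // Hypothetical helper: the second takeWhile only ever sees what the
    // first one let through, so an element must be a digit AND a letter.
    static String chained(String input) {
        return input.chars()
                .takeWhile(Character::isDigit)
                .takeWhile(Character::isLetter)
                .mapToObj(Character::toString)
                .collect(Collectors.joining());
    }

    // Hypothetical helper: a single takeWhile, for comparison.
    static String digitsOnly(String input) {
        return input.chars()
                .takeWhile(Character::isDigit)
                .mapToObj(Character::toString)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println("[" + chained("490+34") + "]");    // prints []
        System.out.println("[" + chained("cos(90)") + "]");   // prints []
        System.out.println("[" + digitsOnly("490+34") + "]"); // prints [490]
    }
}
```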

To achieve the expected result, .takeWhile(Character::isLetter) should be executed if and only if .takeWhile(Character::isDigit) returns an empty stream.
Example
On the first iteration, the result of .takeWhile(Character::isDigit) is "490", so execution should jump directly to .mapToObj(Character::toString) and proceed normally. On the second iteration, .skip(i) skips "490" (i.e. the first 3 chars). .takeWhile(Character::isDigit) then yields an empty stream, and in this case .takeWhile(Character::isLetter) should operate on the result of .filter(x -> !Character.isWhitespace(x)), so it would return "+" (just assume it's a letter). Processing continues like this until the entire string has been parsed.

From my research, I have found that this is not really possible, and the examples I have found are very complicated and don't solve my problem. So, you guys are my last hope. Do you have any ideas on how to achieve this?

I've tried different approaches, including Collectors.groupingBy() and flatMap(), but none of them work. I know for sure that I can do it with a simple for loop, but I really want to explore the Stream API and go as far as I can with it.


Solution

  • Don’t implement a character-wise operation like that.

    First define what comprises a token, e.g. via regex:

    static final Pattern TOKEN = Pattern.compile("\\w+|\\S");
    

    This simple example defines a token as either a word consisting of letters, digits and/or underscores (which includes identifiers and numbers) or a single non-space character, which cannot be a word character because it didn’t match the word pattern. See Pattern for reference.
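To see where this pattern draws token boundaries on less tidy input, here is a small sketch using a plain Matcher.find() loop. The class name, the tokenize helper and the sample inputs are assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenPatternDemo {
    static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    // Hypothetical helper: collects every match of TOKEN, left to right.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An underscore counts as a word character, a dot does not.
        System.out.println(tokenize("x_1 + 3.14")); // prints [x_1, +, 3, ., 14]
        // Adjacent digits and letters end up in one token.
        System.out.println(tokenize("2pi"));        // prints [2pi]
    }
}
```

If decimal numbers or implicit multiplication such as 2pi matter, the pattern would need a dedicated alternative for numbers; the simple version treats them as shown.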

    Then, you may split a string into token strings as simply as

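    // Needs java.util.List, java.util.regex.MatchResult and java.util.regex.Pattern.
    // Matcher.results() requires Java 9+, Stream.toList() requires Java 16+.
    String expr = "490 + 34 + 8 * 42 /45 -cos(90)/sin(90)";
    List<String> tokens = TOKEN.matcher(expr).results()
        .map(MatchResult::group).toList();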
    
    System.out.println(tokens);
    

    which prints

    [490, +, 34, +, 8, *, 42, /, 45, -, cos, (, 90, ), /, sin, (, 90, )]
    

    See Matcher.results(). Alternatively you may use a Scanner, e.g.

    List<String> tokens = new Scanner(expr).findAll(TOKEN)
        .map(MatchResult::group).toList();
    

    This does not provide an advantage for a simple in-memory String, but a Scanner may also operate on files or byte/character streams without loading the entire input into memory.
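As a rough sketch of the Scanner variant on a character stream: the class name and the tokenize helper below are made up for illustration, and a StringReader stands in for a file or network source so the example stays self-contained. Scanner.findAll requires Java 9 or later.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ScannerTokenizerDemo {
    static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    // Hypothetical helper: tokenizes whatever the Reader delivers.
    // The stream is fully consumed before the Scanner is closed;
    // closing the Scanner also closes the underlying Reader.
    static List<String> tokenize(Reader source) {
        try (Scanner scanner = new Scanner(source)) {
            return scanner.findAll(TOKEN)
                    .map(MatchResult::group)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // StringReader is only a stand-in for a FileReader or similar.
        Reader source = new StringReader("490 + 34 + 8 * 42 /45 -cos(90)/sin(90)");
        System.out.println(tokenize(source));
    }
}
```

Collecting into a list is done here only to print the result; for large inputs, processing the stream inside the try block without collecting it keeps memory use low.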