
Do I scan twice if I call scanner.hasNext and then scanner.next?

Do I scan twice if I call scanner.hasNext(pattern) and then scanner.next(pattern) with the same pattern on java.util.Scanner?

Let's say I have this code with a lot of cases (I'm trying to make a lexer):

import java.util.*;
import java.util.regex.Pattern;

public class MainClass {
   public static void main(String[] args) {
      Scanner scanner = new Scanner("Hello World! 3 + 3.0 = 6 ");

      Pattern a = Pattern.compile("..rld!");
      Pattern b = Pattern.compile("...llo");


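      while (scanner.hasNext()) {
         if (scanner.hasNext(a)) {
            scanner.next(a);
            /*Do something meaningful with it like create a token*/
         }
         else if (scanner.hasNext(b)) {
            scanner.next(b);
         }
         /*...*/
         else {
            // No pattern matched: consume the token anyway,
            // otherwise the loop never advances and never terminates.
            scanner.next();
         }
      }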


      // close the scanner
      scanner.close();
   }
}

My questions are:

  • Does hasNext(pattern) somehow cache the result of the search, so that next(pattern) doesn't search for the same pattern a second time?
  • Is this slower or faster than using try { scanner.next(pattern) } catch { ... }?
  • Or is there an easier way (without third-party libraries) to tokenize based on regex patterns?

Solution

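  • OK, so I think the answer is:

    The documentation doesn't say anything about caching, so an implementation may do it, but it is not guaranteed and I wouldn't rely on it.

    Also, I was primarily asking because I wanted to use it for parsing more complex things like string literals, not just whitespace-separated tokens. I found out that Scanner still splits the input on its delimiter first (whitespace by default) and only then checks whether the resulting token matches the pattern. A string literal containing a space is therefore never seen as a single token, which makes Scanner useless for my use case.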
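
    For the third question, one way to tokenize by regex without third-party libraries is to drive java.util.regex.Matcher directly, using region() and lookingAt() to try each pattern at the current position. The following is only a minimal sketch of that idea, not a full lexer: the class name RegexLexerSketch, the method names, and the four token patterns are made up for illustration. It also includes a small check showing the Scanner limitation described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; names and patterns are illustrative assumptions.
class RegexLexerSketch {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Order matters: the first pattern that matches at the current position wins.
    private static final Pattern[] TOKEN_PATTERNS = {
        Pattern.compile("\"[^\"]*\""),      // string literal, may contain spaces
        Pattern.compile("\\d+(\\.\\d+)?"),  // integer or decimal number
        Pattern.compile("[A-Za-z_]\\w*"),   // identifier
        Pattern.compile("[+=!]")            // single-character operator
    };

    // Splits the input into tokens by trying each pattern at the current position.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WHITESPACE.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Skip whitespace between tokens.
            m.usePattern(WHITESPACE).region(pos, input.length());
            if (m.lookingAt()) {
                pos = m.end();
                continue;
            }
            boolean matched = false;
            for (Pattern p : TOKEN_PATTERNS) {
                // lookingAt() only succeeds if the match starts exactly at pos.
                m.usePattern(p).region(pos, input.length());
                if (m.lookingAt()) {
                    tokens.add(m.group());
                    pos = m.end();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
        }
        return tokens;
    }

    // Returns whether Scanner, with its default whitespace delimiter,
    // reports the next token as a complete string literal.
    static boolean scannerSeesLiteral(String input) {
        try (Scanner scanner = new Scanner(input)) {
            return scanner.hasNext(TOKEN_PATTERNS[0]);
        }
    }

    public static void main(String[] args) {
        String input = "Hello \"two words\" 3 + 3.0 = 6";
        System.out.println(tokenize(input));
        System.out.println(scannerSeesLiteral("\"two words\""));
    }
}
```

    Because each pattern is anchored at the current position instead of being applied to a pre-split token, a literal such as "two words" comes out as one token, while Scanner only ever sees "two as its first token. Trying the patterns one by one is simple but scans the same position several times; if that matters, combining the patterns into one alternation with named groups is a common variation.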