I am using jparsec to parse strings like:
[1,2, 3]
[ 3, 4]
[3 ,4,56, 7 ]
[]
I have written a few classes (implementing my Token interface) to represent the tokens:
final class OpenListToken
final class CommaToken
final class CloseListToken
final class NumberToken // Has a public final property "value" that contains the int
I have also implemented tokenizers for each:
static final Parser<OpenListToken> openListTokenParser
static final Parser<CommaToken> commaTokenParser
static final Parser<CloseListToken> closeListTokenParser
static final Parser<NumberToken> numberTokenParser
These all work at a character level. For example:
final NumberToken t = numberTokenParser.parse("123");
// t.value == 123
final OpenListToken u = openListTokenParser.parse("[");
// Succeeds
Now I would like to combine them to make a parser of ListExpression, which is a class that represents a list of numbers. I have tried something like:
openListTokenParser
.next(numberTokenParser.sepBy(commaTokenParser))
.followedBy(closeListTokenParser)
This works for strings like [1,2,3] but obviously not for strings like [ 1, 2 ].
Is there an operator that takes some parsers and puts whitespace* between them?
Or is it possible to make my ListExpression parser work on a stream of my Token interface instances instead of characters?
You can build a tokenizer directly using the functions from the Terminals class. In your case, this would look like the following:
First define the set of our terminals, e.g. operators, keywords, words...
Terminals terminals = Terminals.operators("[", "]", ",");
Our tokens are then produced either by the tokenizer of our terminals or by the IntegerLiteral tokenizer:
Parser<?> tokens = Parsers.or(terminals.tokenizer(), IntegerLiteral.TOKENIZER);
Our final result comes from a syntactic parser for integers (built from tokens tagged as INTEGER), separated by our comma token, between our bracket tokens. We ignore any whitespace between tokens (this is the second argument to from()):
Parser<?> parser = IntegerLiteral.PARSER.sepBy(terminals.token(",")).between(terminals.token("["), terminals.token("]"))
.from(tokens, Scanners.WHITESPACES.many().cast());
Et voilà:
System.out.println(parser.parse( "[1,2,3]"));
System.out.println(parser.parse( "[ 1, 2 , 3 ] "));
System.out.println(parser.parse( " [1,2,3 ]"));
System.out.println(parser.parse( "[1, 2 , 3]"));
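As a rough illustration of why this two-phase setup copes with whitespace, here is a library-free sketch of the same idea: a first pass turns characters into tokens and drops whitespace, and a second pass parses the token stream. This is not how jparsec is implemented; the class ListSketch and its methods are hypothetical names invented for this example, and it returns a List<Integer> rather than the ListExpression class from the question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of two-phase parsing; not jparsec's implementation. */
final class ListSketch {

    /** Phase 1: split the input into "[", "]", "," and digit-run tokens, skipping whitespace. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // delimiter: dropped here, so the parser never sees it
            } else if (c == '[' || c == ']' || c == ',') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (isAsciiDigit(c)) {
                int start = i;
                while (i < input.length() && isAsciiDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    /** Phase 2: parse the token stream as '[' (number (',' number)*)? ']'. */
    static List<Integer> parse(String input) {
        List<String> tokens = tokenize(input);
        List<Integer> values = new ArrayList<>();
        int pos = expect(tokens, 0, "[");
        if (pos < tokens.size() && !tokens.get(pos).equals("]")) {
            values.add(number(tokens, pos++));
            while (pos < tokens.size() && tokens.get(pos).equals(",")) {
                pos++; // consume the comma
                values.add(number(tokens, pos++));
            }
        }
        pos = expect(tokens, pos, "]");
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected tokens after ']'");
        }
        return values;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Checks that the token at pos equals the expected one and returns the next position. */
    private static int expect(List<String> tokens, int pos, String expected) {
        if (pos >= tokens.size() || !tokens.get(pos).equals(expected)) {
            throw new IllegalArgumentException("Expected '" + expected + "' at token " + pos);
        }
        return pos + 1;
    }

    /** Reads the token at pos as an integer, failing if it is missing or not a number. */
    private static int number(List<String> tokens, int pos) {
        if (pos >= tokens.size() || !isAsciiDigit(tokens.get(pos).charAt(0))) {
            throw new IllegalArgumentException("Expected a number at token " + pos);
        }
        return Integer.parseInt(tokens.get(pos));
    }
}
```

Because whitespace is discarded in the first pass, the grammar in the second pass never has to mention it, which is the same separation of concerns that from(tokens, delimiter) gives you in jparsec.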