java regex ebnf

Splitting by regex or EBNF


I've got a string like:

create Person +fname : String, +lname: String, -age:int;

Is it possible to split it using a regex or an EBNF grammar, so that every part matching something like [a-zA-Z0-9] (the parts we don't know in advance) is stored in an array?

In other words, by using this regexp:

^create [a-zA-Z][a-zA-Z0-9]* [s|b]?[+|[-]|=][a-zA-Z][a-zA-Z0-9]*[ ]?:[ ]?[a-zA-Z][a-zA-Z0-9]*(, [s|b]?[+|[-]|=][a-zA-Z][a-zA-Z0-9]*[ ]?:[ ]?[a-zA-Z][a-zA-Z0-9]*)*;

I want to obtain this array:

  • Person
  • +
  • fname
  • String
  • +
  • lname
  • String
  • -
  • age
  • int

Solution

  • You can try splitting it this way:

    String[] tokens = "create Person +fname : String, +lname: String, -age:int;"
            .split("[\\s:;,]+|(?<=[+\\-])");
            // split on runs of whitespace, ':', ';' or ',', OR at the empty position right after '+' or '-'
    for (String s : tokens)
        System.out.println("=> " + s);
    

    Output:

    => create
    => Person
    => +
    => fname
    => String
    => +
    => lname
    => String
    => -
    => age
    => int
    

    As you can see, this puts create at the start of the array, so just start iterating from tokens[1].

    You could try to add ^create\\s to the splitting rule, but split() keeps an empty leading string whenever a non-empty match occurs at the very beginning of the input, so tokens[0] would just be "" instead of "create" and nothing is gained.
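If you would rather get an array without the leading keyword, here are two possible sketches. Neither is part of the original answer: the class name TokenExtractor, both method names and the TOKEN pattern are made up for illustration. The first keeps the split rule from above and copies the array without its first element. The second removes "create" up front and collects tokens with Matcher.find(), treating +, - and = as one-character tokens and anything matching [a-zA-Z][a-zA-Z0-9]* as a name. Both assume the input really starts with "create", and neither handles the optional s/b prefix from the regex in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class TokenExtractor {

    // Same split rule as in the answer: separators, or the empty position after + or -.
    private static final String SPLIT_RULE = "[\\s:;,]+|(?<=[+\\-])";

    // Assumed token shape: a visibility sign, or an identifier as in the question's regex.
    private static final Pattern TOKEN = Pattern.compile("[+\\-=]|[a-zA-Z][a-zA-Z0-9]*");

    // Variant 1: split as before, then drop the first token (assumed to be "create").
    static String[] splitAndDropKeyword(String input) {
        String[] tokens = input.split(SPLIT_RULE);
        if (tokens.length == 0) {
            return tokens; // nothing to drop
        }
        return Arrays.copyOfRange(tokens, 1, tokens.length);
    }

    // Variant 2: strip the keyword first, then collect every token match in order.
    static List<String> findTokens(String input) {
        String body = input.replaceFirst("^create\\s+", "");
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(body);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "create Person +fname : String, +lname: String, -age:int;";
        System.out.println(Arrays.toString(splitAndDropKeyword(input)));
        System.out.println(findTokens(input));
    }
}
```

For the string from the question, both variants should give the ten tokens listed there (Person, +, fname, String, +, lname, String, -, age, int). Note that neither one checks that the input is well formed. If that matters, validate it first with a pattern like the one in the question and only then extract the tokens.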