I need to make a parser to be able to extract logical structure from a text input in order to construct a query for some web service.
I tried to use regular expressions but it gets really complicated to handle imbrication logic, so I decided to ask for help, maybe I am doing it the wrong way.
ex:
( (foo1 and bar) or (foo2 and bar2) ) and ( (foo3 and bar3) or foo4 ) and "this is quoted"
the result should be something like this:
{
{
foo1
AND
bar
}
OR
{
foo2
AND
bar2
}
}
AND
{
{
foo3
AND
bar3
}
OR
foo4
}
AND
{
"this is quoted"
}
Language used is actionscript 3, but I could adapt Java version.
well, the parser is quite simple ...
first you will need quite a lot of stuff (i'll omit constructors, since i guess you can write them on your own):
expressions (output):
class Expression {}
class Operation extends Expression {
public var operand1:Expression;
public var operator:String;
public var operand2:Expression;
}
class Atom extends Expression {
public var ident:String;
}
tokens (intermediary format):
class Token {
public var source:String;
public var pos:uint;
}
class Identiefier extends Token {
public var ident:String;
}
class OpenParenthesis extends Token {}
class CloseParenthesis extends Token {}
class Operator extends Token {
public var operator:String;
}
class Eof extends Token {}
and a tokenizer, that should implement this interface
interface TokenStream {
function read():Token;
}
i guess you'll figure out yourself how to tokenize ...
so the way is source --(tokenizer)--> tokens --(parser)--> expressions ...
and here the parsing routine, with a little helper:
function parse(t:TokenStream):Expression {
var tk:Token = t.read();
switch ((tk as Object).constructor) {//this is a really weird thing about AS3 ... need to cast to object, before you can access the constructor
case OpenParanthesis:
var e1:Expression = parse(t);
tk = t.read();
switch ((tk as Object).constructor) {
case CloseParenthesis:
return e1;
case Operator:
var op:String = (tk as Operator).operator;
var e2:Expression = parse(t);
tk = t.read();
if (tk is CloseParenthesis)
return new Operation(e1,op,e2);
else
unexpected(tk);
}
else
unexpected(tk);
break;
case Identifier:
return new Atom((tk as Identifier).ident);
default:
unexpected(tk);
}
}
function unexpected(tk:Token) {
throw "unexpected token "+tk.source+" at position "+tk.pos;
}
this is not a particularly good parser, but it shows the bare fundamentals of parsing routines ... well, actually, i didn't check the implementation, but it should work ... it is very primitive and unpermissive ... things like operator precedence etc. are completely missing, and so on ... but if you want that, have a go at it ...
btw. using Haxe with enums, the whole code would look much shorter and much more beautiful ... you may want to have a look at it ...