Search code examples
.netparsingxpathf#fxsl

Steps and involvement of implementing a parser (in .Net - and in this case XPath 2.0)


In the lack of any good free XPath 2.0 implementations for .Net build upon Linq to XML I have thought about implementing my own (also for the experience). But just to be clear (and not building something that exists) these are the XPath 2.0 implementations I have found:

  • Saxon .Net
  • Query Machine - I had problems with this - exceptions with the examples
  • XQSharp - may be good, but is commercial (single developer ~300 $)

Now, I want some thoughts on how difficult it is to implementing some language such as XPath 2.0 expressions. I have found this link which have a EBNF for XPath 2.0 expression: http://www.w3.org/TR/2007/REC-xpath20-20070123/#id-grammar and I'm thinking of making it in F# with the fslex/fsyacc combination.

My background (subjective): I have played with these tools before, but only for some simple expressions and a very simple programming language. Furthermore, I have read most of the Dragon book and Appel´s Modern compiler implementation in ML - but unfortunately, I have not put the theory in practice while reading. I've studied computer science in a year now where I have completed courses with theory about ex finite automaton, CFL and algorithms but I have been a developer for years before university (a few years with professional jobs - back-end of websites mainly).

Now, the steps of parsing and what I tend to cover:

  1. Lex - Parsing - Reductions: FsLex/FsYacc. I will properly not cover ALL of Xpath 2.0 at first but at least all of what XPath 1.0 can do + a little more.
  2. Sematic analysis - I'm not sure about how much there is to this
  3. Optimization - I do not tend to cover this (at least not at first)
  4. Actual traversing etc.
  5. ...?

Now, the concrete questions in addition to the above:

  1. How difficult is it to make a parser of this size? based on my background, would I could to it?
  2. Is there any crucial steps I have missed in regards to XPath 2.0 in particular?
  3. Is there any technology I have missed; Do I have to cover more than just XPath 2.0 and XDocument etc. to be able to make the parser?

To be clear: I want to make a XPath 2.0 expression parser and traverse XDocument etc. with this parsed expression. Which I guess combined is a query engine.

Update: I found this: http://www.w3.org/2007/01/applets/xpathApplet.html which contains code to parsing and traversing. I think it would be a nice start or reference :-)

Your answers will be appreciated.


Solution

  • I implemented an XPath 2.0 parser entirely in XSLT 2.0 three years ago.

    I used my LR Parsing Framework in FXSL and this was not so difficult. The grammar is quite big -- 209 rules, if I remember well. I used a modificationn of YACC (done by me) which I call Yaccx to generate the parsing tables as XML. These are the input for the general LR Parser, written in XSLT.

    For such kind of project you need to allocate at least 6 months full time, maybe 1 year. The difficulty is in implementing the enormous function library (F & O).

    Also, XPath is not a standalone language -- it must be hosted by another language. Due to this reason I didn't use this parser for anything meaningful, as I didn't have the access, influence and possibility to alter an existing hosting language.

    So, be prepared for all these difficulties.