
Custom Parser for EXPRESS data modeling language


I need to write a custom parser for EXPRESS, which is described as a data modeling language used to define and exchange construction information between CAD applications. Here are a couple of resources:

https://en.wikipedia.org/wiki/EXPRESS_(data_modeling_language)
https://www.loc.gov/preservation/digital/formats/fdd/fdd000449.shtml

I have no idea which specifics I need to pay attention to before I can start implementing a decent parser for this "data modeling language". In what ways should I analyze this text-based format before deciding how to parse it and how to represent it in a meaningful way?

What do I specifically need to know about this "data modeling language" and its syntax so that I can come up with a reasonable parser?


Solution

  • Grammars for the EXPRESS language in Backus-Naur form (BNF) are available on GitHub. Parser generators such as bison take a BNF-style grammar and generate a parser from it; libraries such as Boost.Spirit let you write the grammar rules directly in C++.

    Either approach gives you a working parser for the syntax. The next step is to give the parsed text meaning. An EXPRESS schema usually describes a class hierarchy and certain constraints, so you need to build a model of that from the parser's output.

    You might want to look at existing implementations, for example STEPcode. It has an EXPRESS parser that takes an EXPRESS schema and generates a STEP parser, which can then load files described by that schema.

    Be aware that EXPRESS and STEP are very powerful and extensive, so consider using or modifying an existing implementation instead of rolling your own.
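
To make the two steps above concrete (parse the text, then give it meaning), here is a minimal hand-written sketch in Python. It is not a real EXPRESS parser: it assumes a tiny subset of the language (SCHEMA, ENTITY, SUBTYPE OF, simple attributes, and the two remark styles) and ignores TYPE declarations, aggregation types, WHERE rules, and everything else. The names `tokenize`, `Parser`, `Entity`, and `all_attributes` are invented for this illustration.

```python
import re
from dataclasses import dataclass, field

# Remarks and whitespace are skipped; identifiers and punctuation become tokens.
TOKEN_RE = re.compile(r"""
    (?P<skip>\s+|--[^\n]*|\(\*.*?\*\))   # whitespace, tail remark, embedded remark
  | (?P<ident>[A-Za-z][A-Za-z0-9_]*)
  | (?P<punct>[;:,()])
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "skip":
            tok = m.group()
            # EXPRESS is case-insensitive, so normalise identifiers and keywords.
            tokens.append(tok.lower() if m.lastgroup == "ident" else tok)
        pos = m.end()
    return tokens


@dataclass
class Entity:
    name: str
    supertypes: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)  # attribute name -> type name


class Parser:
    """Recursive-descent parser for the assumed subset only."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def parse_schema(self):
        self.take("schema")
        name = self.take()
        self.take(";")
        entities = {}
        while self.peek() == "entity":
            ent = self.parse_entity()
            entities[ent.name] = ent
        self.take("end_schema")
        self.take(";")
        return name, entities

    def parse_entity(self):
        self.take("entity")
        ent = Entity(self.take())
        if self.peek() == "subtype":
            self.take("subtype"); self.take("of"); self.take("(")
            ent.supertypes.append(self.take())
            while self.peek() == ",":
                self.take(",")
                ent.supertypes.append(self.take())
            self.take(")")
        self.take(";")
        while self.peek() != "end_entity":
            attr = self.take()
            self.take(":")
            if self.peek() == "optional":
                self.take("optional")  # optionality is discarded in this sketch
            ent.attributes[attr] = self.take()
            self.take(";")
        self.take("end_entity")
        self.take(";")
        return ent


def all_attributes(entities, name):
    """Semantic step: attributes of an entity, including inherited ones."""
    merged = {}
    for parent in entities[name].supertypes:
        merged.update(all_attributes(entities, parent))
    merged.update(entities[name].attributes)
    return merged


SAMPLE = """
SCHEMA demo;
  ENTITY shape;            -- tail remark
    label : STRING;
  END_ENTITY;
  ENTITY circle SUBTYPE OF (shape);
    radius : REAL;
    (* embedded remark *)
    note : OPTIONAL STRING;
  END_ENTITY;
END_SCHEMA;
"""

schema_name, entities = Parser(tokenize(SAMPLE)).parse_schema()
print(schema_name, all_attributes(entities, "circle"))
```

Even this toy version shows what to look for when analyzing the format: the lexical rules (case-insensitivity, remark syntax), the declaration grammar, and the semantic relationships between declarations, such as inheritance. A generated parser covers the first two; the last is the model you have to build yourself.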