Tags: algorithm, compiler-construction

An algorithm for compiler design?


Recently I have been thinking about an algorithm I came up with myself. I call it Replacement Compiling. It works as follows:

Define a language as well as its operators' precedence, such as

(1) store <value> as <id>, replace with: var <id> = <value>, precedence: 1
(2) add <num> to <num>, replace with: <num> + <num>, precedence: 2
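
To make this concrete, here is a rough sketch of how such rules might be stored as data. The field names and the {0}/{1} placeholder syntax are my own invention, not part of the definition above; the parentheses produced by the later "packing" step are folded into the second template here.

    const RULES = [
      // rule (1): store <value> as <id>  ->  var <id> = <value>
      { pattern: ["store", "<value>", "as", "<id>"], template: "var {1} = {0}", precedence: 1 },
      // rule (2): add <num> to <num>     ->  <num> + <num>
      { pattern: ["add", "<num>", "to", "<num>"],    template: "({0} + {1})",   precedence: 2 },
    ];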

Accept a line of input, such as store add 1 to 2 as a;

Tokenize it: <kw,store><kw,add><num,1><kw,to><num,2><kw,as><id,a><EOF>
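
A minimal tokenizer for that input might look like the sketch below. The keyword list and the token shape are assumptions on my part, and a real lexer would scan character by character rather than splitting on whitespace.

    const KEYWORDS = new Set(["store", "add", "to", "as"]);

    function tokenize(source) {
      const tokens = [];
      for (const word of source.trim().replace(/;$/, "").split(/\s+/)) {
        if (KEYWORDS.has(word))      tokens.push({ kind: "kw",  text: word }); // keyword
        else if (/^\d+$/.test(word)) tokens.push({ kind: "num", text: word }); // number literal
        else                         tokens.push({ kind: "id",  text: word }); // identifier
      }
      tokens.push({ kind: "eof", text: "" }); // end-of-file marker
      return tokens;
    }

    // tokenize("store add 1 to 2 as a;")
    //   -> <kw,store> <kw,add> <num,1> <kw,to> <num,2> <kw,as> <id,a> <EOF>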

Then scan through the tokens until the end-of-file token is reached, find the operation with the highest precedence, and "pack" that operation:

<kw,store>(<kw,add><num,1><kw,to><num,2>)<kw,as><id,a><EOF>

Replace the "sub-statement", i.e. the expression in parentheses, with the defined replacement:

<kw,store>(1 + 2)<kw,as><id,a><EOF>
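
As I read it, one "pack and replace" step could be sketched roughly like this, treating tokens as plain strings and using the rule format from the earlier sketch. The slot-matching scheme (a "<...>" entry captures exactly one token) is an assumption I'm making to keep the example short.

    // Try to match one rule starting at position `at`; return the captured slot values,
    // or null if the rule does not fit there.
    function matchAt(tokens, at, rule) {
      const captures = [];
      for (let i = 0; i < rule.pattern.length; i++) {
        const expected = rule.pattern[i];
        const actual = tokens[at + i];
        if (actual === undefined) return null;
        if (expected.startsWith("<")) captures.push(actual); // slot: capture any single token
        else if (expected !== actual) return null;           // keyword must match exactly
      }
      return captures;
    }

    // Apply the highest-precedence rule that matches anywhere in the token list (one step).
    function packOnce(tokens, rules) {
      const byPrecedence = [...rules].sort((a, b) => b.precedence - a.precedence);
      for (const rule of byPrecedence) {
        for (let at = 0; at < tokens.length; at++) {
          const captures = matchAt(tokens, at, rule);
          if (captures) {
            const text = rule.template.replace(/\{(\d+)\}/g, (_, n) => captures[Number(n)]);
            // Splice the replacement back in as a single "packed" token.
            return [...tokens.slice(0, at), text, ...tokens.slice(at + rule.pattern.length)];
          }
        }
      }
      return null; // no rule matched
    }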

Repeat until there are no more sub-statements left:

(<kw,store>(1 + 2)<kw,as><id,a>)<EOF>
(var a = (1 + 2))
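
Putting the pieces together, the repeat-until-done loop might look like this. It relies on the tokenize, packOnce, and RULES sketches above, so it is only meant to illustrate the overall flow.

    function compile(source, rules) {
      // Start from the token texts, dropping the end-of-file marker.
      let tokens = tokenize(source).filter(t => t.kind !== "eof").map(t => t.text);
      let packed = packOnce(tokens, rules);
      while (packed !== null) {          // repeat until no more sub-statements are left
        tokens = packed;
        packed = packOnce(tokens, rules);
      }
      return tokens.join(" ");
    }

    // compile("store add 1 to 2 as a;", RULES)  ->  "var a = (1 + 2)"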

Then evaluate the code with the built-in function, eval().

eval("var a = (1 + 2)")

My question, then, is: would this algorithm work, and what are its limitations? Does it work better on simple languages?


Solution

  • This won't work as-is, because there's no general way to decide the precedence of operations and keywords, but you have essentially defined parsing (and thrown in an interpretation step at the end). It looks pretty close to operator-precedence parsing, though I could be wrong about the details of your vision. The real keys to what makes a parsing algorithm are the direction/precedence in which it reads the code, whether decisions are made top-down (figure out what kind of statement you have, then apply its rules) or bottom-up (assemble small pieces into larger components until the types of statements become apparent), and whether the grammar is encoded as code or as data for a generic parser. (I'm probably overlooking something, but this should give you a starting point for making sense of further reading.)

    More typically, code is parsed with an LR technique (or LL if it's top-down) driven from a state machine with look-ahead and next-step information, though you'll also find the occasional recursive-descent parser. Since they're all doing very similar things (just implemented differently), your rough algorithm could probably be refined to look a lot like any of them.

    For most people learning about parsing, recursive descent is the way to go, since everything lives in the code instead of in what amounts to an interpreter for a state-machine definition (a small recursive-descent sketch for your toy language follows at the end of this answer). Most parser generators, however, build an LL or LR parser.

    And I'm obviously over-simplifying the field, since you can see at the bottom of the Wikipedia pages that there's a smattering of related systems that partly revolve around the kind of grammar you have available. But for most languages, those are the big-three algorithms.
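
    Since recursive descent is the suggested learning route, here is a minimal hand-written sketch for the toy language in the question. The grammar it assumes (stmt := "store" expr "as" identifier; expr := "add" expr "to" expr | number) is my own reading of the example, not something specified above, and it emits the same JavaScript string the question builds up.

        function parseStatement(tokens) {
          let pos = 0;

          function expect(word) {
            if (tokens[pos] !== word) {
              throw new Error(`expected '${word}', got '${tokens[pos]}'`);
            }
            pos++;
          }

          // expr := "add" expr "to" expr | number
          function expr() {
            if (tokens[pos] === "add") {
              pos++;
              const left = expr();
              expect("to");
              const right = expr();
              return `(${left} + ${right})`;
            }
            return tokens[pos++]; // a number literal
          }

          // stmt := "store" expr "as" identifier
          expect("store");
          const value = expr();
          expect("as");
          const name = tokens[pos++];
          return `var ${name} = ${value}`;
        }

        // parseStatement(["store", "add", "1", "to", "2", "as", "a"])
        //   -> "var a = (1 + 2)"

    Note how the grammar lives directly in the call structure of parseStatement() and expr(); that is what "everything is in the code" means here.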