
How to implement a Pine Script preprocessor


The required steps for implementing a Pine Script preprocessor are documented here:

https://www.tradingview.com/pine-script-docs/en/v3/appendix/Pine_Script_v2_preprocessor.html

It is an 8-step process. Quoting:

Algorithm of @version=2 Pine Script preprocessor in pseudo-code:

  1. Remove comments.
  2. Replace \r\n or \r with just \n.
  3. Add \n to the end of the text if it’s missing.
  4. Lines that contain only whitespace replace with just empty strings.
  5. Add |INDENT| tokens. They indicate that statement is in a block of code, such as function body, if or for body. Every tab or four spaces are replaced with token |INDENT|.
  6. Add |B| and |E| tokens. They indicate line begin and line end. Replace empty lines with |EMPTY| tokens.
  7. Join lines that represent one splitted statement.
  8. Add code block tokens (|BEGIN| — beginning of the block, |END| — end of the block, |PE| — possible end of the block).
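Steps 1 to 6 are plain text transformations that need no lexing. Below is a minimal Python sketch of them. It assumes that Pine comments are `//` line comments, so stripping them per line after normalizing newlines gives the same result as doing step 1 first. It also assumes that step 5 only affects leading whitespace, and it ignores escaped quotes inside strings. `steps_1_to_6` and its helpers are hypothetical names, not part of any TradingView API.

```python
INDENT = "|INDENT|"


def strip_comment(line: str) -> str:
    """Step 1 (assumed form): drop a // comment, ignoring // inside '...' or "..."."""
    quote = None  # quote character of the string literal we are inside, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif line.startswith("//", i):
            return line[:i]
    return line


def add_indent_tokens(line: str) -> str:
    """Step 5: turn each leading tab or run of four spaces into |INDENT|.
    Any 1-3 leftover leading spaces are kept as-is."""
    tokens, i = [], 0
    while True:
        if line.startswith("\t", i):
            tokens.append(INDENT)
            i += 1
        elif line.startswith("    ", i):
            tokens.append(INDENT)
            i += 4
        else:
            break
    return "".join(tokens) + line[i:]


def steps_1_to_6(source: str) -> list[str]:
    # Step 2: normalize line endings (\r\n first, then any lone \r).
    text = source.replace("\r\n", "\n").replace("\r", "\n")
    # Step 3: make sure the text ends with \n.
    if not text.endswith("\n"):
        text += "\n"
    out = []
    # Every line is now \n-terminated, so the last split element is always "".
    for line in text.split("\n")[:-1]:
        line = strip_comment(line)  # Step 1 (line comments only)
        if line.strip() == "":      # Step 4: whitespace-only line -> empty
            out.append("|EMPTY|")   # Step 6: empty line -> |EMPTY|
        else:                       # Steps 5-6: |INDENT| tokens, then |B|...|E|
            out.append("|B|" + add_indent_tokens(line) + "|E|")
    return out
```

Note that a continuation line indented by five spaces comes out as `|B||INDENT| ...|E|`, with one space left over, which is exactly the shape seen in the documentation's step 6 example.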

Now, I find step 7 rather puzzling. We're building a preprocessor, so nothing has been lexed or parsed yet. How, then, can we tell which lines represent "one splitted statement"?

As per their example:

After step 6):

"|EMPTY|
|B|study('Preprocessor example')|E|
|B|fun(x, y) =>|E|
|B||INDENT|if close > open |E|
|B||INDENT||INDENT|x + y |E|
|B||INDENT|else |E|
|B||INDENT||INDENT|x - y|E|
|EMPTY|
|EMPTY|
|B|a = sma(close, 10)|E|
|B|b = fun(a, 123)|E|
|B|c = security(tickerid, period, b)|E|
|B|plot(c, title='Out', color=c > c[1] ? lime : red, |E|
|B||INDENT| style=linebr, trackprice=true) |E|
|B|alertcondition(c > 100)|E|
|EMPTY|"

After step 7). Note that the line with plot(c, title= has been joined with the next line:

"|EMPTY|
|B|study('Preprocessor example')|E|
|B|fun(x, y) =>|E|
|B||INDENT|if close > open |E|
|B||INDENT||INDENT|x + y |E|
|B||INDENT|else |E|
|B||INDENT||INDENT|x - y|E|
|EMPTY|
|EMPTY|
|B|a = sma(close, 10)|E|
|B|b = fun(a, 123)|E|
|B|c = security(tickerid, period, b)|E|
|B|plot(c, title='Out', color=c > c[1] ? lime : red, style=linebr, trackprice=true) |E|
|EMPTY|
|B|alertcondition(c > 100)|E|
|EMPTY|"

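Note that not only are the lines joined, but the first line's trailing |E| and the continuation line's leading |B||INDENT| are removed, and an |EMPTY| token appears after the joined line in place of the continuation line.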

Suggestions, anybody?


Solution

  • According to the documentation, a continuation line is indented by at least as many INDENTs (four spaces each) as the line being continued, plus a number of extra spaces that is not a multiple of four.

    In other words, after converting groups of four leading spaces to INDENTs, if there are still leading spaces, it's a continuation line. So no tokenisation or parsing is necessary.
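Applied to the step 6 output, this rule turns step 7 into a single regular-expression check per line. The Python sketch below is one way to do it, not TradingView's implementation. `join_continuations` and `CONTINUATION` are hypothetical names. The check skips the "at least as many INDENTs" condition, as the answer's simplified rule does. Joining with exactly one space and emitting one |EMPTY| per consumed line are assumptions chosen to reproduce the documentation's example.

```python
import re

# A continuation line: after |B| and any number of |INDENT| tokens,
# one or more leading spaces are still left over.
CONTINUATION = re.compile(r"^\|B\|(?:\|INDENT\|)* +")


def join_continuations(lines: list[str]) -> list[str]:
    """Step 7 sketch: merge continuation lines into the statement they continue."""
    result: list[str] = []
    stmt_idx = None  # index in result of the statement being continued, if any
    for line in lines:
        if stmt_idx is not None and CONTINUATION.match(line):
            # Drop the statement's trailing |E| (and spaces before it) ...
            head = result[stmt_idx][: -len("|E|")].rstrip()
            # ... and the continuation's |B|, |INDENT| tokens and leading spaces.
            tail = CONTINUATION.sub("", line, count=1)
            result[stmt_idx] = head + " " + tail
            result.append("|EMPTY|")  # placeholder for the consumed line
        else:
            result.append(line)
            stmt_idx = len(result) - 1 if line.startswith("|B|") else None
    return result
```

Fed the last lines of the step 6 example, it yields the joined `plot(...)` line followed by `|EMPTY|`, matching the step 7 example.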