parsing translation transformation text-processing context-free-grammar

Using Context-Free Grammar To Parse Options Spread Order Strings?

I need to build a tool that reads an options spread order given as a string and outputs it in a human-readable format.

Examples:

Input:

BUY +6 VERTICAL LUV 100 (Weeklys) 28 AUG 20 37.5/36.5 PUT @.49 LMT

Output:

VERTICAL
BUY +6 LUV 28 AUG 20 (Weeklys) 37.5 PUT
SELL -6 LUV 28 AUG 20 (Weeklys) 36.5 PUT
.49 DEBIT LMT

Input:

BUY +1 DIAGONAL AMGN 100 (Weeklys) 4 SEP 20/28 AUG 20 245/240 CALL @.07 LMT

Output:

DIAGONAL
BUY +1 AMGN 4 SEP 20 (Weeklys) 245 CALL
SELL -1 AMGN 28 AUG 20 (Weeklys) 240 CALL
-.07 CREDIT LMT

On the surface, a context-free grammar appears to be a good way to express the various syntaxes (diagonal spreads are more complicated). But having almost no experience with context-free grammars, I am not sure how I would carry the numbers over, or how I would add the SELL orders, which are not explicitly mentioned in the original order string. The SELL leg is implied by the spread type, for example by the order being a vertical spread.

Hope this makes sense even if you are not an option trader ;-) The basic idea here is that translating the original string requires a bit of intelligence and is not just a matter of generating different text.

Any insights and pointers would be welcome.

Solution

It's a little hard to tell from only two examples, but my guess is that a context-free grammar is probably overkill, especially if you have almost no experience with them. The grammar itself would probably be simple enough, but you would also need to either attach 'actions' that transform the recognized input into the desired output, or have the parser build a syntax tree and then write code to extract the data from the tree and generate the output.
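To give a sense of what the grammar route involves, here is a minimal sketch of a hand-written recursive-descent parser whose 'actions' build the output lines directly. The grammar, the token pattern, and the names `Parser`, `pair` and `translate` are my own assumptions, inferred only from the two examples in the question, so treat this as an illustration rather than a definition of the order format.

```python
import re

# Tokens: "(Weeklys)"-style series, signed/decimal numbers, words, "/" and "@".
# Characters that match no token are silently skipped in this sketch.
TOKEN_RE = re.compile(r"\(\w+\)|[+-]?\d*\.?\d+|[A-Za-z]+|[/@]")


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    # date := INT WORD INT
    def date(self):
        return " ".join([self.take(), self.take(), self.take()])

    # pair(item) := item ('/' item)?
    def pair(self, item):
        first = item()
        if self.peek() == "/":
            self.take("/")
            return first, item()
        # Action: a single value applies to both legs (e.g. a vertical's expiry).
        return first, first

    # order := 'BUY' QTY STRATEGY SYMBOL MULT SERIES
    #          pair(date) pair(STRIKE) RIGHT '@' PRICE 'LMT'
    def order(self):
        self.take("BUY")
        qty = self.take().lstrip("+")
        strategy = self.take()
        symbol = self.take()
        self.take()  # contract multiplier (100), not used in the output
        series = self.take()
        dates = self.pair(self.date)
        strikes = self.pair(self.take)
        right = self.take()
        self.take("@")
        price = self.take()
        self.take("LMT")
        if self.peek() is not None:
            raise ValueError(f"unexpected trailing token {self.peek()!r}")
        # Action: emit the explicit BUY leg plus the implied SELL leg.
        # DEBIT is hard-coded because the rule for CREDIT is not specified.
        return [
            strategy,
            f"BUY +{qty} {symbol} {dates[0]} {series} {strikes[0]} {right}",
            f"SELL -{qty} {symbol} {dates[1]} {series} {strikes[1]} {right}",
            f"{price} DEBIT LMT",
        ]


def translate(text):
    """Return the human-readable lines for one order string."""
    return Parser(text).order()
```

Even for this small format, the tokenizer, the grammar rules and the output-building actions all have to be written and kept in sync, which is the extra work referred to above.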

It would be simpler to use regular expressions with capturing groups. For instance, here's some Python 3 code that pretty much handles your two examples:

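import re
import sys

# The second expiry date is optional: only DIAGONAL orders carry one.
ORDER_RE = re.compile(
    r'BUY \+(\d+) (VERTICAL|DIAGONAL) (\S+) 100 \(Weeklys\) '
    r'(\d+ \w+ \d+)(?:/(\d+ \w+ \d+))? '
    r'([\d.]+)/([\d.]+) (PUT|CALL) @(\d*\.\d+) LMT'
)

for line in sys.stdin:
    # Strip the newline so the last line matches even without one.
    mo = ORDER_RE.fullmatch(line.strip())
    if mo is None:
        print(f"unrecognized order: {line.strip()!r}", file=sys.stderr)
        continue
    (n_units, vert_or_diag, name, date1, date2,
     price1, price2, put_or_call, limit) = mo.groups()

    if vert_or_diag == 'VERTICAL':
        # Both legs of a vertical spread share one expiry date.
        assert date2 is None
        date2 = date1

    print()
    print(vert_or_diag)
    print(f"BUY +{n_units} {name} {date1} (Weeklys) {price1} {put_or_call}")
    print(f"SELL -{n_units} {name} {date2} (Weeklys) {price2} {put_or_call}")
    # DEBIT is hard-coded: the rule for choosing CREDIT is not specified.
    print(f"{limit} DEBIT LMT")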

It's not perfect, because the problem isn't perfectly specified: for example, it's unclear what makes the human-readable output show a positive DEBIT in one case and a negative CREDIT in the other, so the code above always prints DEBIT. The space of inputs is also no doubt larger than the regex currently handles.

The point is just to show that, based on the examples given, regular expressions could be a compact solution to the general problem.
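If the set of accepted inputs does grow, one way to keep the pattern maintainable is to write it with `re.VERBOSE` and named groups, so each field is labelled where it is matched. The following is only a sketch under the same assumptions as the two examples; the group names and the helper `parse_order` are hypothetical names chosen for illustration.

```python
import re

# In VERBOSE mode unescaped whitespace is ignored, so literal spaces
# are written as "\ " and each line can carry its own comment.
ORDER_RE = re.compile(r"""
    BUY\ \+(?P<qty>\d+)\                       # quantity of the bought leg
    (?P<strategy>VERTICAL|DIAGONAL)\           # spread type
    (?P<symbol>\S+)\ 100\ \(Weeklys\)\         # underlying, multiplier, series
    (?P<date1>\d+\ \w+\ \d+)                   # expiry of the bought leg
    (?:/(?P<date2>\d+\ \w+\ \d+))?\            # optional second expiry
    (?P<strike1>[\d.]+)/(?P<strike2>[\d.]+)\   # strikes of the two legs
    (?P<right>PUT|CALL)\                       # option type
    @(?P<limit>\d*\.\d+)\ LMT                  # limit price
""", re.VERBOSE)


def parse_order(text):
    """Return the order's fields as a dict, or None if the text does not match."""
    mo = ORDER_RE.fullmatch(text.strip())
    if mo is None:
        return None
    fields = mo.groupdict()
    if fields["date2"] is None:
        # Only one expiry was given, so both legs share it.
        fields["date2"] = fields["date1"]
    return fields
```

Returning a dict keeps parsing separate from formatting, so a new output layout or a DEBIT/CREDIT rule, once it is known, can be added without touching the pattern.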