Search code examples
parsingf#abstract-syntax-tree

Implementing a function for building a very simple tree (AST) using tokens/symbols in F#


I looked at how to make a tree from a given data with F# and https://citizen428.net/blog/learning-fsharp-binary-search-tree/

Basically what I am attempting to do is to implementing a function for building an extremely simple AST using discriminated unions (DU) to represent the tree.

I want to use tokens/symbols to build the tree. I think these could also be represented by DU. I am struggling to implement the insert function.

Let's just say we use the following to represent the tree. The basic idea is that for addition and subtraction of integers I'll only need binary tree. The Expression could either be an operator or a constant. This might be the wrong way of implementing the tree, but I'm not sure.

type Tree =
| Node of Tree * Expression * Tree
| Empty
and Expression =
| Operator //could be a token or another type
| Constant of int

And let's use the following for representing tokens. There's probably a smarter way of doing this. This is just an example.

type Token =
    | Integer
    | Add
    | Subtract

How should I implement the insert function? I've written the function below and tried different ways of inserting elements.

let rec insert tree element = 
    match element, tree with
    //use Empty to initalize
    | x, Empty -> Node(Empty, x, Empty)
    | x, Node(Empty,y,Empty) when (*x is something here*) -> Node((*something*))
    | _, _ -> failwith "Missing case"

If you got any advice or maybe a link then I would appreciate it.


Solution

  • I think that thinking about the problem in terms of tree insertion is not very helpful, because what you really want to do is to parse a sequence of tokens. So, a plain tree insertion is not very useful. You instead need to construct the tree (expression) in a more specific way.

    For example, say I have:

    let input = [Integer 1; Add; Integer 2; Subtract; Integer 1;]
    

    Say I want to parse this sequence of tokens to get a representation of 1 + (2 - 1) (which has parentheses in the wrong way, but it makes it easier to explain the idea).

    My approach would be to define a recursive Expression type rather than using a general tree:

    type Token =
      | Integer of int
      | Add
      | Subtract
    
    type Operator = 
      | AddOp | SubtractOp
    
    type Expression =
      | Binary of Operator * Expression * Expression
      | Constant of int
    

    To parse a sequence of tokens, you can write something like:

    let rec parse input = 
      match input with 
      | Integer i::Add::rest ->
          Binary(AddOp, Constant i, parse rest)
      | Integer i::Subtract::rest ->
          Binary(SubtractOp, Constant i, parse rest)
      | Integer i::[] ->
          Constant i
      | _ -> failwith "Unexpected token"
    

    This looks for lists starting with Integer i; Add; ... or similar with subtract and constructs a tree recursively. Using the above input, you get:

    > parse input;;
    val it : Expression =
      Binary (AddOp, Constant 1, 
        Binary (SubtractOp, Constant 2, Constant 1))