Search code examples
parsingabstract-syntax-treebisonflex-lexeryacc

Representing an AST in C with different structs for types of nodes


I have many structs which look something like:

typedef struct ast_function_node
{        
    int node_type;
    ast_node* arguments;
    symbol* sym;
} ast_function_node;

typedef struct ast_while_node
{
    int node_type;
    ast_node* condition;
    ast_node* while_branch;
} ast_while_node;

typedef struct ast_assignment_node
{
    int node_type;
    symbol* sym;
    ast_node* value;
} ast_assignment_node;

typedef struct ast_number_node
{
    int node_type;
    double value;
} ast_number_node;

typedef struct ast_string_node 
{
    int node_type;
    char* value;
} ast_string_node;

etc...

And a base struct:

typedef struct // Basic AST node
{
  int node_type;
  struct ast_node* left;
  struct ast_node* right;
} ast_node;

I can populate this AST pretty easily, but when it comes to traversing it I get stuck in a hell of typecasting. If I wanted to visit each node, look at its type and then do something accordingly, what is the best way of doing this? Just casting it to the base node of ast_node won't do it of course.


Solution

  • The way I'd do this in C is to use a struct with a union member:

    typedef struct ast_function
    {        
        ast_node* arguments;
        symbol* sym;
    } ast_function;
    
    typedef struct ast_while
    {
        ast_node* condition;
        ast_node* while_branch;
    } ast_while;
    
    typedef struct ast_assignment
    {
        symbol* sym;
        ast_node* value;
    } ast_assignment;
    
    /* Etc. */
    
    typedef struct ast_node {
      int node_type;
      /* See anonymous unions in any C reference */
      union {
        ast_function   function_data;
        ast_while      while_data;
        ast_assignment assignment_data;
        /* Etc. */
      };
    }
    

    Then you don't need casts at all:

    switch (node->node_type) {
      case AST_FUNCTION:
        handle_function(&node->function_data); 
        break;
      /* Etc. */
    }
    

    If you make node_type an enum instead of an int, the compiler will be able to warn you if you miss a possibility in your switch statement.