Search code examples
c#dictionarycompiler-constructiontreelookup

Optimal data structure for lookup within nested scopes


Say, there's a tree data structure, each leaf of which defines a set of keys for lookup:

*
|- A = 1, B = 2
|- *
   |- C = 4
   |- *
      |- D = 5
      |- D = 6, E = 7

I need a way of finding the value of the key for any given leaf while the tree is being traversed depth-first.

There are two approaches I have thought of:

  1. If the value is not found in current leaf, check the dictionary of its parent and so on back to the root of the tree.

  2. There is a global dictionary and each leaf inserts/removes its keys when being traversed. The lookup in performed in this global dictionary.

It's most likely that there will be many leafs with a few keys in each, and about 3-4 lookups for each key.

Which approach is more efficient? Or, maybe, there's another way of doing it that is better than both?


Solution

  • The programming language you are implementing will for sure define exact rules for name resolution. I don't think it will lead to a depth-first search. Name resolution rules, very fruequently, look somethink like this:

    1. search the current scope, frequently considering only what has been declared “up” from the current position in the source code;
    2. when some specific rules are met, e.g. there is some form of a using / import or whatever other construct linking to some other scope, perform a search in that other scope (all of such scopes, sequentially), and recurse within it:
      1. search the given scope,
      2. recurse for any relevant nested scopes;
    3. go to the immediatelly enclosing scope;
    4. repeat from (1), where the 'current' scope is that determined in (3).

    In other words, you gradually go up in the tree of enclosing scopes, and decide whether to search any 'foreign' referenced scopes. Statements such as using / import lead to references among scopes, which in turn cause what is viewed as a tree of scopes to be in fact a directed graph.

    Regarding the lookup table construction, I'd start with a simple hashtable. Prefix trees (tries) work well for these scenarios, too.

    Last but not least, I wouldn't care much of lookup performance unless I'd face a real performance problem when compiling tens or maybe hundreds of thousands of lines of code.