Search code examples
graphcontext-free-grammarrascalconnected-components

How to get the strongly connected components of a Rascal context-free grammar


as the question states, I would like to be able to convert a grammar to a set of strongly connected (non-terminal only) components. I want to do this by constructing a graph from the grammar and then call the connected-components function. Which brings me to the real problem: how to construct this directed graph that has edges of the type:

For each A -> xBz where A,B in Non-terminals (N) and x and y in (sigma U N)*:
    construct an edge from A to B

Is something that does similar things to this build in, or would I have to completely implement this myself? If so, could you help me get started, by for example showing how to get only terminals, non-terminals and production rules from a grammar? Instead of only by machine parseable grammar-data structure?

If there is a better way to find the components instead of constructing the graph that would be a great answer as well of course. I hope this is clear enough, if not, just let me know!

Edit: The algorithm is relatively easy I think, I just don't see how to do this in Rascal. Here is a picture of the algorithm in pseudo-code. Here grammar.V are its non-terminals and P its production rules (different definition, so different naming :s)


Solution

  • First get a grammar for your syntax like so:

    import Grammar;
    gr = grammar(#YourTopNonterminal);
    

    Then you could use this library module (with example code on how to extract dependencies):

    import analysis::grammars::Dependency;
    deps = symbolDependencies(gr);
    

    And you'd get a binary relation between dependent symbols like this:

    rascal>symbolDependencies(g)
    Graph[Symbol]: {
      <sort("A"),sort("B")>,
      <sort("B"),sort("C")>
    }
    

    The basic code for symbolDependencies is this:

    Graph[Symbol] symbolDependencies(Grammar g) =
      { <from,to> | /prod(Symbol from,[_*,Symbol elem,_*],_) := g, /Symbol to := elem}
    

    The comprehension loops over all rules of the grammar, takes the head from and then finds all symbols to in the rule (possibly nested due to regular expressions) and creates a tuple for each pair.

    After that you'd start analyzing and transforming this relation to get the strongly connected components. The library module analysis::graphs::Graph has an example function which computes connected components (not strongly connected components, so you'd have to adapt that).

    rascal>import analysis::graphs::Graph;
    ok
    rascal>connectedComponents(symbolDependencies(g))
    set[set[Symbol]]: {{
        sort("A"),
        sort("C"),
        sort("B")
      }}
    

    Finally, to print a symbol like sort("A") back to a pretty name can come in handy, especially if you have regular expressions over non-terminals (like * and +):

    rascal>t = type(sort("A"),());
    type[value]: type(
      sort("A"),
      ())
    rascal>"<t>"
    str: "A"
    

    I'd also recommend visualizing the graphs using viz::Figure