as the question states, I would like to be able to convert a grammar to a set of strongly connected (non-terminal only) components. I want to do this by constructing a graph from the grammar and then call the connected-components function. Which brings me to the real problem: how to construct this directed graph that has edges of the type:
For each A -> xBz where A,B in Non-terminals (N) and x and y in (sigma U N)*:
construct an edge from A to B
Is something that does similar things to this build in, or would I have to completely implement this myself? If so, could you help me get started, by for example showing how to get only terminals, non-terminals and production rules from a grammar? Instead of only by machine parseable grammar-data structure?
If there is a better way to find the components instead of constructing the graph that would be a great answer as well of course. I hope this is clear enough, if not, just let me know!
Edit: The algorithm is relatively easy I think, I just don't see how to do this in Rascal. Here is a picture of the algorithm in pseudo-code. Here grammar.V are its non-terminals and P its production rules (different definition, so different naming :s)
First get a grammar for your syntax like so:
import Grammar;
gr = grammar(#YourTopNonterminal);
Then you could use this library module (with example code on how to extract dependencies):
import analysis::grammars::Dependency;
deps = symbolDependencies(gr);
And you'd get a binary relation between dependent symbols like this:
rascal>symbolDependencies(g)
Graph[Symbol]: {
<sort("A"),sort("B")>,
<sort("B"),sort("C")>
}
The basic code for symbolDependencies
is this:
Graph[Symbol] symbolDependencies(Grammar g) =
{ <from,to> | /prod(Symbol from,[_*,Symbol elem,_*],_) := g, /Symbol to := elem}
The comprehension loops over all rules of the grammar, takes the head from
and then finds all symbols to
in the rule (possibly nested due to regular expressions) and creates a tuple for each pair.
After that you'd start analyzing and transforming this relation to get the strongly connected components. The library module analysis::graphs::Graph
has an example function which computes connected components (not strongly connected components, so you'd have to adapt that).
rascal>import analysis::graphs::Graph;
ok
rascal>connectedComponents(symbolDependencies(g))
set[set[Symbol]]: {{
sort("A"),
sort("C"),
sort("B")
}}
Finally, to print a symbol like sort("A")
back to a pretty name can come in handy, especially if you have regular expressions over non-terminals (like * and +):
rascal>t = type(sort("A"),());
type[value]: type(
sort("A"),
())
rascal>"<t>"
str: "A"
I'd also recommend visualizing the graphs using viz::Figure