I know there are a lot of questions on this but I still haven't been able to find what I'm looking for.
As far as the procedure for finding the Follow sets for a given grammar, I've seen a lot of versions of it, but let's stick to the one given here since it's the one I've seen the most. I'll copy it here so that you don't have to have it open:
- First put $ (the end of input marker) in Follow(S) (S is the start symbol)
- If there is a production A → aBb, (where a can be a whole string) then everything in FIRST(b) except for ε is placed in FOLLOW(B).
- If there is a production A → aB, then everything in FOLLOW(A) is in FOLLOW(B)
- If there is a production A → aBb, where FIRST(b) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
Question 1: Are the rules 2 and 4 mutually exclusive or they can both be applied in the same "iteration" (writing it like this because that's actually going to be one of my questions)? What I mean here is that they match in the first part, that is they're both applied "If there is a production A -> aBb".
Does this mean that if I encounter a production A -> aBb such that ε is in First(b) I should apply both the second and the fourth rule, or only the fourth?
Question 2: What exactly are 'a' and 'b'? It's not formally specified anywhere other then partly in the rule 2, where it says that "a can be a whole string". What about rules 3 and 4? Let me actually generalize this question even further. If I have a production of the form: A -> abcd....efgBxyzw..... can any of these rules be applied? That is, do these rules require the production to literally contain only three elements on their right hand side? Or can it be interpreted like this:
A -> abcd...ef [gBx] yzw.... , where the gBx part would now correspond to the aBb part within the rules 2 and 4. Or does this only work for the rule 2 and only the 'a' part, as in:
A -> abcd....ef[gBx] , where the left part can be a whole string while the right part has to be exactly one symbol?
Note: The square brackets are not part of the syntax, I just use them to separate stuff so I can explain what I mean.
Question 3: Is this procedure even deterministic? The thing is, they don't mention how long we're supposed to do this and in which order. Well, actually, I've seen some sources say 'as long as there is anything left to add'. What about the order in which we're supposed to do this? I suppose we just take productions at random and apply them as long as we can. Are we supposed to try and derive further productions from the grammar rules that we have? Or is the procedure designed to 'catch' those stuff indirectly? Another very confusing thing here. Let's look at the following scenario:
I'm looking at one production and, by applying the rules, I determine that everything from Follow(A) should go into Follow(B). Let's informally write it as:
Follow(B) += Follow(A)
And let Follow(A), for now (based on my latest calculations), contain some elements, say {x, y, z}. Should I immediately write Follow(B) = {..., x, y, z} or should I wait til the very end? Why? Well what if, a few productions later, I encounter a production that modifies Follow(A), in any way, let's say it adds 'w' to it, so that now Follow(A) = {x, y, z, w}. Does this mean that, assuming I've written Follow(B) = {..., x, y, z} right away instead of waiting, I now need to go back there and add 'w' to it, or is the procedure supposed to 'catch' that 'w' in some of the later iterations?
Thanks in advance.
Question 1: No, They both cannot be applied in the same iteration. That is because the except for ε
part in it means that if there is no ε.
Question 2: 'a' and 'b' are α
(alpha) and β
(beta). They both stand for multiple symbols. (like A → Z b B n m
could be A → α B β
; Z b
was shortened to α
and n m
to β
). So no, it isn't limited to only 3 symbols on the rhs.
Question 3: Yes, this is detrministic. What you do, is you have a map (like std::unordered_map
to store the follow set), and then you start a while loop-- the condition being while changes were made during the current iteration to the follow set. Inside the the loop body, you put the rules. An example:
follow_sets = {} # map
changes = True
while changes:
changes = False
# follow set rules
# once a rule is applied, and things are added to the follow sets map, set changes to True
Now, your rule set is good, but not descriptive enough. This is the rule set that I use to build follow sets:
Follow(START) has $
If A → α B β (Provided β ≠ ε):
Follow(B) → First(β)
If A → α B:
Follow(B) → Follow(A)
If A → α B β (provided β → ε):
Follow(B) → { First(β) - ε } ∪ First(A)