Syntactic analysis - prolog

I have a Definite Clause Grammar: S → a S b | ɛ .

This was also written as the following rules: s --> [a], s, [b]. and s --> [].

This was translated to Prolog as follows:

s --> [a], s, [b].   became   s([a|S2],R) :- s(S2,[b|R]).
s --> [].            became   s(S,S).

Can somebody tell me what the value of S2 is here? Where does S2 come from?

Solution

It looks like this is a program to determine if a list is of the form aⁿ . X . bⁿ, where aⁿ means n iterations of a, and similarly, bⁿ means n iterations of b.

s --> [a], s, [b]. becomes s([a|S2],R) :- s(S2,[b|R]).
s --> [].          becomes s(S,S).

Here S2 is the middle of the list, a free variable that a part of the list can be bound to.

Naming it S2 is completely arbitrary. It could also have been called S. The only thing it should not be called is R, as that is already used for another variable in that statement.

What matters is what is bound to it - the tail of the list. If this predicate is tried on any list starting with a, then S2 will be bound to the tail.

A few examples to illustrate it:
If the input was [a,a,b,b], the value of S2 would be [a,b,b].
If the input was [a], the value of S2 would be the empty list [].
If the input was [a,x,y,z], the value of S2 would be [x,y,z].
If the input was [b,c,d,e], then it would not match, and S2 would not be bound to anything; instead, the predicate would fail.

Note that [a,x,y,z] actually matches the predicate, despite the fact that it is not of the form aⁿ . X . bⁿ.
The rule only looks at the first item, a, and notices that that matches. So it derives s([x,y,z],[b|R]). Then it will try to continue verifying the input. Only in a later derivation step will it notice that [x,y,z] does not start with a.

Let's go through this step by step.

First we have:
s([a,x,y,z],R) :- s([x,y,z],[b|R]). This works, and Prolog binds S2 to [x,y,z].
Then it gets s([x,y,z],R), but it cannot match that to s([a|S2]), because it does not start with a, and so this time the rule fails.
Now it tries the next rule: s(S,S). It fills in: s([x,y,z],[x,y,z]). With this, it goes back to its earlier call, and tries to match s([x,y,z],[x,y,z]) to its earlier rule s([x,y,z],[b|R]).
It cannot match the [x,y,z] on the rigthand side to [b|R] because it does not start with b. And this is where the rule fails - Prolog has decided that the string is not of the form aⁿ.bⁿ.

To see how R is used, let's look at a trace of a list that does match.

s([a,a,b,b],R):-s([a,b,b],[b|R]). /* We haven't got a value for R yet. */
s([a,b,b],R):-s([b,b],[b|R]).     /* We haven't got a value for R yet. */
s([b,b],[b,b]).                   /* Stop condition for the recursion. */

At this point, the righthand side is instantiated to [b,b].
Prolog had s([a,b,b],[b|R]) in the previous step, and it can now make this succeed by setting R=[b].
It then further unwinds the recursion, filling in values for the righthand side of the rule and applying these values to the lefthand side. Eventually it returns to where it started, and has the value s([a,a,b,b],[a,a,b,b]).