Search code examples
pythonqueuegrammarformal-languages

Trying to generate all sentences of a simple formal grammar


I am new to python and trying to generate all sentences possible in the grammar. Here is the grammar:

  #set of non terminals
  N = ('<subject>', '<predicate>', '<noun phrase>', '<noun>', '<article>', '<verb>',   '<direct object>')
  #set of teminals

  T = ('the', 'boy', 'dog', 'bit')
  #productions
  P = [ ('Sigma',           ['<subject>', '<predicate>']), \
  ('<subject>',       ['<noun phrase>']),            \
  ('<predicate>',     ['<verb>']),                   \
  ('<predicate>',     ['<verb>','<direct object>']), \
  ('<noun phrase>',   ['<article>','<noun>']),       \
  ('<direct object>', ['<noun phrase>']),            \
  ('<noun>',          ['boy']),                      \
  ('<noun>',          ['dog']),                      \
  ('<article>',       ['the']),                      \
  ('<verb>',          ['bit'])                       ]

Here is my attempt, I am using a queue class to implement it methodically,

# language defined by the previous grammar.
Q = Queue()
Q.enqueue(['Sigma'])
found = 0
while 0 < len(Q):
    print "One while loop done"
    # Get the next sentential form
    sf = Q.dequeue()
    sf1 = [y for y in sf]
    for production in P:
        for i in range(len(sf1)):
                if production[0] == sf1[i]:
                        sf[i:i+1] = [x for x in production[1]]
                        Q.enqueue(sf)
                        Q.printQ()

I am getting in infinite loop, and also I am facing some issue with shallow-deep copy, if I change one copy of sf, everything in queue changes too. Any help is appreciated, any directions, tips would be great

Here is the expected output:

       The dog bit the boy
       The boy bit the dog
       The boy bit the boy
       The dog bit the dog
       The dog bit
       The boy bit

Solution

  • I am facing some issue with shallow-deep copy, if I change one copy of sf, everything in queue changes too

    Yes. In Python, a list is an object with its own identity. So:

    Q.enqueue(['Sigma'])
    

    creates a (one-element) list and enqueues a reference to it.

    sf = Q.dequeue()
    

    pops that reference from Q and assigns it to variable 'sf'.

    sf[i:i+1] = ...
    

    makes a change to that list (the one that 'sf' refers to).

    Q.enqueue(sf)
    

    enqueues a reference to that same list.

    So there's only one list object involved, and Q just contains multiple references to it.

    Instead, you presumably want each entry in Q to be a reference to a separate list (sentential form), so you have to create a new list for each call to Q.enqueue.

    Depending on how you fix that, there might or might not be other problems in the code. Consider:

    (1) Each sentence has multiple derivations, and you only need to 'find' one (e.g., the leftmost derivation).

    (2) In general, though not in your example grammar, a production's RHS might have more than one occurrence of a non-terminal (e.g. if COND then STMT else STMT), and those occurrences need not derive the same sub-forms.

    (3) In general, a grammar can generate an infinite set of sentences.


    By the way, to copy a list in Python, instead of saying

    copy = [x for x in original]
    

    it's simpler to say:

    copy = original[:]