Parsing with nltk of a Recursive Grammar

I must be missing something fundamental about recursively defined nonterminals giving issue, but all I want to do is to recognize something like a regular expression, where a series of numbers followed by a series of letters.

from nltk import CFG
import nltk

grammar = CFG.fromstring("""
S -> N L
N -> N | '1' | '2' | '3'
L -> L | 'A' | 'B' | 'C'
""")

from nltk.parse import BottomUpChartParser
parser = nltk.ChartParser(grammar)

sentence = '1 2 1 3 A C B C'.split()

for t in parser.parse(sentence):
    print(t)

the above code returns an empty parse and nothing is printed

Solution

The grammar would need to be something like:

from nltk import CFG, ChartParser

grammar = CFG.fromstring("""
S -> N L
N -> N N | '1' | '2' | '3'
L -> L L | 'A' | 'B' | 'C'
""")

parser = ChartParser(grammar)
sentence = '1 2 1 3 A C B C'.split()
for t in parser.parse(sentence):
    print(t)
    break

<script src="https://cdn.jsdelivr.net/gh/pysnippet/pysnippet@latest/snippet.min.js"></script>

Using N -> N N as an example: the first N could be "eaten up" and transformed into a 1 when parsing the sentence, leaving the next N to go on and produce another N -> N N.

But this will result in a lot of possible parses, for something more efficient you probably want something like this:

from nltk import CFG, ChartParser

grammar = CFG.fromstring("""
S -> N L
N -> '1' N | '2' N | '3' N | '1' | '2' | '3'
L -> 'A' L | 'B' L | 'C' L | 'A' | 'B' | 'C'
""")

parser = ChartParser(grammar)
sentence = '1 2 1 3 A C B C'.split()
for t in parser.parse(sentence):
    print(t)
    break

<script src="https://cdn.jsdelivr.net/gh/pysnippet/pysnippet@latest/snippet.min.js"></script>

Regular Version. The language from the question: "one or more numbers followed by one or more letters" or (1,2,3)+(A,B,C)+ is a regular language, so we can represent it with a regular grammar:

from nltk import CFG, ChartParser

grammar = CFG.fromstring("""
S -> N
N -> '1' N | '2' N | '3' N | '1' | '2' | '3' | L
L -> 'A' L | 'B' L | 'C' L | 'A' | 'B' | 'C'
""")

parser = ChartParser(grammar)
sentence = '1 2 1 3 A C B C'.split()
for t in parser.parse(sentence):
    print(t)
    break

<script src="https://cdn.jsdelivr.net/gh/pysnippet/pysnippet@latest/snippet.min.js"></script>

Try all three out and see what the parses look like on different inputs!