
Compute concrete in PGF


I have these two files that run perfectly fine in the GF shell

My GF code

Test.gf

abstract Test = {
    cat
        Sentence; Noun;
    fun
        MySentence : Noun -> Sentence;
}

TestEng.gf

concrete TestEng of Test = open SyntaxEng, ParadigmsEng, DictEng in {
    lincat
        Sentence = NP;
        Noun = N;
    lin
        MySentence noun = mkNP (aSg_Det) (noun);
}

The way I run them in the GF shell is as follows:

> i -retain TestEng.gf
> cc -one MySentence dog_N
a dog

Which gives the expected result.

My PGF code

Then I compiled this file into `.pgf` format using the command

> gf -make --output-format=haskell TestEng.gf
linking ... OK
Writing Test.pgf...
Writing Test.hs...

which outputs the two files Test.hs and Test.pgf

Question

My Python code

test.py

import pgf


gr = pgf.readPGF("Test.pgf")
e = pgf.readExpr("MySentence dog_N")

print(gr.languages.keys())          #To check all languages
eng = gr.languages["TestEng"]
print(eng.linearize(e))

When I run the above code I get the following output:

> python3 test.py
dict_keys(['TestEng'])
a [dog_N]

Why does Python output a [dog_N] and not a dog?


Solution

  • I will first give you three alternative ways to make the grammar work. Then I will explain the rest of the mysteries: why cc works with your initial approach but parsing and linearisation don't, and how to actually use cc from Python (just not with the PGF library).

    1. Fixing your grammar

    (a) Large lexicon, application grammar as a layer on top of RGL

    In your example, you are opening DictEng, so I assume that you would like your application to have a large lexicon.

    If you want to be able to parse with a large lexicon, it needs to be part of the abstract syntax of your grammar. The first mistake is that you're opening DictEng as a resource instead of extending it. (See the tutorial to refresh your memory.)

    So if you want your abstract syntax to contain a lexicon entry called dog_N, which you can give as an argument to the function MySentence, you will need to modify your grammar as follows.

    Abstract:

    abstract Test = Cat, DictEngAbs ** {
        flags startcat = NP ;
        fun
            MySentence : N -> NP ;
    }
    

    Concrete:

    concrete TestEng of Test = CatEng, DictEng ** open SyntaxEng in {
        lin
            MySentence noun = mkNP aSg_Det noun ;
    }
    

    In this solution, I'm keeping the constraint that the tree MySentence dog_N has to stay valid, and changing everything else. So the changes are:

    • Removed your cats (Noun and Sentence)—instead, inherit the Cat module from the RGL abstract syntax.
    • The type of MySentence works now on the RGL cats N and NP. In your original approach, these were the lincats of your custom cats.

    So this grammar is an extension of a fragment of the RGL. In particular, we are reusing RGL types and lexicon, but none of the syntactic functions.

    (In fact, we are also using RGL syntactic functions, but via the API, not via extending the RGL abstract syntax! The mkNP oper comes from the RGL API, and we have it in scope because we open SyntaxEng in the concrete syntax.)
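    With this grammar, parsing and linearisation should now work directly in the GF shell (compiled normally, without -retain). Assuming it compiles with your RGL setup, a session looks roughly like this:

    Test> p "a dog"
    MySentence dog_N
    Test> l MySentence dog_N
    a dog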

    (b) Small lexicon, pure application grammar, RGL is only a resource

    Here I decide to keep your custom cats and their lincats. This means that I need to add lexicon explicitly. Like this:

    abstract Test = {
      flags startcat = Sentence ;
      cat
        Sentence; Noun;
      fun
        MySentence : Noun -> Sentence;
    
        -- Lexicon needs to be given explicitly
        dog_N : Noun ;
        cat_N : Noun ;
    }
    

    If I don't extend DictEngAbs as in the previous approach, but still want to have something in scope called dog_N, I must create it myself. And in order to parse or linearise anything, it must be in the abstract syntax.

    So in the concrete, we are opening DictEng again, and using it to linearise the lexical items of this abstract syntax.

    concrete TestEng of Test = open SyntaxEng, DictEng in {
      lincat
        Sentence = NP;
        Noun = N;
      lin
        MySentence noun = mkNP aSg_Det noun ;
    
        -- Lexicon can use the definitions from DictEng
        dog_N = DictEng.dog_N ;
        cat_N = DictEng.cat_N ;
    
    }
    

    Of course, if you want a large lexicon, this is not so useful. But if you actually don't care about a large lexicon, this results in the simplest and smallest grammar: the RGL is used purely as a resource, strictly via the API. As we can see, this grammar just opens SyntaxEng and DictEng; all the cats and funs it has are defined in its own abstract syntax. Nothing hidden, nothing surprising, no bulk. Also no coverage: this grammar can literally only say "a dog" and "a cat".
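    To check this version, compile it without -retain; since the startcat is Sentence, parsing and linearisation of the two lexical items should work along these lines:

    Test> p "a cat"
    MySentence cat_N
    Test> l MySentence dog_N
    a dog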

    (c) Large lexicon, extend RGL but keep your custom cats too

    This is effectively the same as solution (a), but I'm just showing how to extend the RGL fragment and keep your custom cats, if you wanted to do that.

    Here's the abstract syntax.

    abstract Test = Cat, DictEngAbs ** {
      flags startcat = Sentence ;
      cat
        Sentence; Noun;
      fun
        MySentence : Noun -> Sentence;
    
        -- applied to lexical items from the dictionary
        n2noun : N -> Noun ;
    }
    

    We start again by extending Cat and DictEngAbs, but we also define our own cats. The reason this works is our coercion function, n2noun : N -> Noun.

    • We have both RGL cats and our custom cats in scope
    • Our lexicon is all in RGL cats (DictEngAbs)
    • Our custom syntactic function works on our custom cats
    • Hence, we need a conversion from the N in DictEngAbs to Noun. Because MySentence only accepts Nouns.

    So n2noun does the conversion for us. Here's the concrete syntax:

    concrete TestEng of Test = CatEng, DictEng ** open SyntaxEng in {
      lincat
        Sentence = NP;
        Noun = N;
      lin
        MySentence noun = mkNP aSg_Det noun ;
    
        n2noun n = n ;
    }
    

    The syntax trees look like this now:

    Test> gr -number=3 | l -treebank
    Test: MySentence (n2noun hairstylist_N)
    TestEng: a hairstylist
    Test: MySentence (n2noun nomia_N)
    TestEng: a nomia
    Test: MySentence (n2noun seminar_N)
    TestEng: a seminar
    

    If you prefer the syntax trees shorter, just MySentence hairstylist_N, then go for solution (a).

    I can't think of concrete benefits compared to (a) for such a small example, but in a larger system it can be useful for adding restrictions. For instance, suppose you added more Nouns from another source and didn't want to give them as arguments to other functions, while the RGL Ns can be arguments to all functions; then it's useful to separate the cats and connect them with coercion functions.
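    As a hypothetical sketch of such a restriction (the names RareNoun, rare2noun and unicorn_N are mine, not from the RGL), the extra nouns get their own cat, and their only way into the rest of the grammar is through a coercion:

    abstract Test = Cat, DictEngAbs ** {
      cat
        Sentence ; Noun ;
        RareNoun ;                      -- nouns from another source
      fun
        MySentence : Noun -> Sentence ;
        n2noun : N -> Noun ;            -- any RGL N is usable everywhere
        rare2noun : RareNoun -> Noun ;  -- extra nouns only enter via this coercion
        unicorn_N : RareNoun ;
    }

    Any other function can then take a plain N as its argument, which keeps the extra nouns out of it, while MySentence, which takes a Noun, accepts both kinds.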

    2. Remaining questions

    I touched on some things already in my three alternative solutions, but there are still issues I didn't explain. Here are the rest of the mysteries.

    Why did your first approach work in GF shell but not in Python?

    Because you didn't try to parse and linearise, instead you used cc (compute concrete) with -retain flag. When you open a grammar in GF shell with -retain, it keeps all local, temporary helper stuff in scope—this includes modules that you open. So dog_N from DictEng was in scope, but only for cc in the GF shell.

    Did you try to parse and linearise in the GF shell? If you try, you will already run into failure there:

    Test> l MySentence dog_N
    Function dog_N is not in scope
    

    In contrast to cc, parsing and linearisation cannot depend on local definitions: everything they use has to be in the abstract syntax, otherwise it doesn't exist. And if you want to access a grammar from Python using the PGF library, the grammar must be compiled into the PGF format, which doesn't retain local definitions.

    Actually using cc from Python

    Technically, you can use cc from Python, but not with the PGF library. It works if you open the GF shell as a subprocess and give it the uncompiled GF file as input. The following works; I put it in a file called test.py:

    from subprocess import Popen, PIPE
    
    
    gfscript = ['i -retain TestEng.gf',
                'cc -one MySentence dog_N']
    command = 'gf -run'.split()
    gfinput = '\n'.join(gfscript)
    gf = Popen(command, stdin=PIPE, stdout=PIPE)
    stdout, _stderr = gf.communicate(gfinput.encode('utf-8'))
    stdout = stdout.decode('utf-8')
    print(stdout)
    

    And running it on the command line, with your original grammar in the same directory, gives me the desired answer.

    > python3 test.py
    a dog
    

    Remember, you can't parse anything with cc, neither in the GF shell nor from a Python subprocess. It's just for generating output.
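    If you use this approach more than once, the subprocess call can be wrapped in a small helper. This is just a sketch under the same assumptions as above (the gf binary on your PATH, the grammar file in the working directory); the function names are my own:

```python
from subprocess import Popen, PIPE


def gf_script(grammar, commands):
    """Build the newline-separated script that is fed to `gf -run` on stdin."""
    return '\n'.join([f'i -retain {grammar}'] + list(commands))


def compute_concrete(grammar, expr):
    """Compute the concrete form of expr with `cc -one` in a GF subprocess."""
    script = gf_script(grammar, [f'cc -one {expr}'])
    gf = Popen(['gf', '-run'], stdin=PIPE, stdout=PIPE)
    stdout, _stderr = gf.communicate(script.encode('utf-8'))
    return stdout.decode('utf-8').strip()
```

    With your original grammar in the same directory, compute_concrete('TestEng.gf', 'MySentence dog_N') should again give you "a dog".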

    Compilation to PGF

    Final minor nitpick: you don't need the flag --output-format=haskell if you don't need a Haskell version of the abstract syntax. Just gf -make TestEng.gf is enough to produce the PGF.