I have these two files that run perfectly fine within the GF shell.
Test.gf
abstract Test = {
cat
Sentence; Noun;
fun
MySentence : Noun -> Sentence;
}
TestEng.gf
concrete TestEng of Test = open SyntaxEng, ParadigmsEng, DictEng in {
lincat
Sentence = NP;
Noun = N;
lin
MySentence noun = mkNP (aSg_Det) (noun);
}
The way I run them in the GF shell is as follows:
> i -retain TestEng.gf
> cc -one MySentence dog_N
a dog
Which gives the expected result.
Then I used the following command on Linux to compile the grammar into `.pgf` format:
> gf -make --output-format=haskell TestEng.gf
linking ... OK
Writing Test.pgf...
Writing Test.hs...
which outputs these two files, Test.hs and Test.pgf.
test.py
import pgf
gr = pgf.readPGF("Test.pgf")
e = pgf.readExpr("MySentence dog_N")
print(gr.languages.keys()) #To check all languages
eng = gr.languages["TestEng"]
print(eng.linearize(e))
When I run the above code I get the following output:
> python3 test.py
dict_keys(['TestEng'])
a [dog_N]
Why does Python output "a [dog_N]" and not "a dog"?
I will first give you three alternatives for how to make the grammar work. Then I will explain the rest of the mysteries: why cc works with your initial approach but parsing/linearisation doesn't, and also how to actually use cc from Python (just not with the PGF library).
In your example, you are opening DictEng, so I assume that you would like your application to have a large lexicon.
If you want to be able to parse with a large lexicon, it needs to be part of the abstract syntax of your grammar. The first mistake is that you're opening DictEng as a resource instead of extending it. (See the tutorial to refresh your memory.)
So if you want your abstract syntax to contain a lexicon entry called dog_N, which you can give as an argument to the function MySentence, you will need to modify your grammar as follows.
Abstract:
abstract Test = Cat, DictEngAbs ** {
flags startcat = NP ;
fun
MySentence : N -> NP ;
}
Concrete:
concrete TestEng of Test = CatEng, DictEng ** open SyntaxEng in {
lin
MySentence noun = mkNP aSg_Det noun ;
}
In this solution, I'm keeping the constraint that dog_N has to be correct, and changing everything else. So the changes are:
- The custom cats (Noun and Sentence) are gone—instead, we inherit the Cat module from the RGL abstract syntax.
- MySentence works now on the RGL cats N and NP. In your original approach, these were the lincats of your custom cats.
So this grammar is an extension of a fragment of the RGL. In particular, we are reusing RGL types and lexicon, but none of the syntactic functions.
(In fact, we are also using RGL syntactic functions, but via the API, not via extending the RGL abstract syntax! The mkNP oper comes from the RGL API, and we have it in scope because we open SyntaxEng in the concrete syntax.)
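With this version of the grammar compiled (gf -make TestEng.gf), your original Python script should give the expected answer. A minimal sketch, assuming the new Test.pgf sits next to the script:

import pgf

# Load the compiled grammar and pick the English concrete syntax
gr = pgf.readPGF("Test.pgf")
eng = gr.languages["TestEng"]

# dog_N is now part of the abstract syntax (inherited from DictEngAbs),
# so its linearisation is found instead of falling back to [dog_N]
e = pgf.readExpr("MySentence dog_N")
print(eng.linearize(e))   # expected output: a dog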
Here I decide to keep your custom cats and their lincats. This means that I need to add lexicon explicitly. Like this:
abstract Test = {
flags startcat = Sentence ;
cat
Sentence; Noun;
fun
MySentence : Noun -> Sentence;
-- Lexicon needs to be given explicitly
dog_N : Noun ;
cat_N : Noun ;
}
If I don't extend DictEngAbs, like in the previous approach, and I want to have something in scope that is called dog_N, I must create it myself. In order to be able to parse or linearise anything, it must be in the abstract syntax.
So in the concrete, we are opening DictEng again, and using it to linearise the lexical items of this abstract syntax.
concrete TestEng of Test = open SyntaxEng, DictEng in {
lincat
Sentence = NP;
Noun = N;
lin
MySentence noun = mkNP aSg_Det noun ;
-- Lexicon can use the definitions from DictEng
dog_N = DictEng.dog_N ;
cat_N = DictEng.cat_N ;
}
Of course, if you want a large lexicon, this is not so useful. But if you actually didn't care about a large lexicon, this results in the simplest and smallest grammar. The RGL is just used as a resource, strictly via the API. As we can see, this grammar just opens SyntaxEng and DictEng; all cats and funs it has are defined in the abstract. Nothing hidden, nothing surprising, no bulk. Also no coverage: this grammar can literally only say "a dog" and "a cat".
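Because dog_N and cat_N are now functions in the abstract syntax, parsing from Python works as well. A small sketch under the same assumptions (Test.pgf compiled from this version); in the PGF library, parse returns an iterator of (probability, tree) pairs:

import pgf

gr = pgf.readPGF("Test.pgf")
eng = gr.languages["TestEng"]

# With no category given, parsing uses the startcat flag (here Sentence)
prob, expr = next(eng.parse("a dog"))
print(expr)                 # expected: MySentence dog_N
print(eng.linearize(expr))  # expected: a dog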
This is effectively the same as solution (a), but I'm just showing how to extend the RGL fragment and keep your custom cats, if you wanted to do that.
Here's the abstract syntax.
abstract Test = Cat, DictEngAbs ** {
flags startcat = Sentence ;
cat
Sentence; Noun;
fun
MySentence : Noun -> Sentence;
-- applied to lexical items from the dictionary
n2noun : N -> Noun ;
}
We start again by extending Cat and DictEngAbs. But we also define our own cats. The reason why it works is our coercion function, n2noun : N -> Noun. The lexicon in DictEngAbs is of category N, but MySentence only accepts Nouns, so n2noun does the conversion for us. Here's the concrete syntax:
concrete TestEng of Test = CatEng, DictEng ** open SyntaxEng in {
lincat
Sentence = NP;
Noun = N;
lin
MySentence noun = mkNP aSg_Det noun ;
n2noun n = n ;
}
The syntax trees look like this now:
Test> gr -number=3 | l -treebank
Test: MySentence (n2noun hairstylist_N)
TestEng: a hairstylist
Test: MySentence (n2noun nomia_N)
TestEng: a nomia
Test: MySentence (n2noun seminar_N)
TestEng: a seminar
If you prefer the syntax trees shorter, just MySentence hairstylist_N, then go for solution (a).
I can't think of concrete benefits compared to (a) for such a small example, but in a larger system it can be useful for adding restrictions. For instance, suppose you added more Nouns from another source and didn't want to give them as arguments to other functions, while the RGL Ns can be arguments to all functions; then it's useful to have a separation of cats with coercion functions.
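These longer trees linearise from Python just the same. A sketch, assuming this variant is the one compiled to Test.pgf:

import pgf

gr = pgf.readPGF("Test.pgf")
eng = gr.languages["TestEng"]

# Note the extra n2noun wrapper around the dictionary entry
e = pgf.readExpr("MySentence (n2noun dog_N)")
print(eng.linearize(e))   # expected: a dog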
I touched on some things already in my three alternative solutions, but there are still issues I didn't explain. Here are the rest of the mysteries.
Why did cc work with your initial approach? Because you didn't try to parse and linearise; instead you used cc (compute concrete) with the -retain flag. When you open a grammar in the GF shell with -retain, it keeps all local, temporary helper stuff in scope—this includes modules that you open. So dog_N from DictEng was in scope, but only for cc in the GF shell.
Did you try to parse and linearise in the GF shell? If you try, you will already run into failure there:
Test> l MySentence dog_N
Function dog_N is not in scope
In contrast to cc, parsing and linearisation cannot depend on local definitions: a function has to be in the abstract syntax, otherwise it doesn't exist. This is also why your Python script printed a [dog_N] rather than an error: dog_N is not a function in the compiled PGF, so the linearizer just falls back to showing its name in brackets. And if you want to access a grammar from Python using the PGF library, then the grammar must be compiled into the PGF format, and the PGF format doesn't retain local definitions.
Technically, you can use cc from Python, but not using the PGF library. It works if you open the GF shell as a subprocess and give it the uncompiled GF file as input. This works; I put it in a file called test.py:
from subprocess import Popen, PIPE

# The same commands you would type in the GF shell
gfscript = ['i -retain TestEng.gf',
            'cc -one MySentence dog_N']

# 'gf -run' starts the GF shell in batch mode, without prompts
command = 'gf -run'.split()
gfinput = '\n'.join(gfscript)

# Feed the script to GF and capture its output
gf = Popen(command, stdin=PIPE, stdout=PIPE)
stdout, _stderr = gf.communicate(gfinput.encode('utf-8'))
stdout = stdout.decode('utf-8')
print(stdout)
And running it on the command line, with your original grammar in the same directory, gives me the desired answer.
python3 test.py
a dog
Remember, you can't parse anything with cc, not in the GF shell, not from a Python subprocess. It's just for generating output.
Final minor nitpick: you don't need the flag --output-format=haskell if you don't need a Haskell version of the abstract syntax. Just gf -make TestEng.gf is enough to produce the PGF.