I've created a record that contains several different categories, but this caused GF to generate huge gfo files (150 MB), or even to break down for some languages after compiling for more than 10 minutes. My record only contains 9 elements (8 different categories).
Take the following small example: let's say I want to create a self-introduction text using GF. To generate such a text, I created a record named Person. The Person record contains information such as name, age, address, hobbies, human traits, etc. Just creating this small record made GF generate a 1.28 MB gfo file, and it takes more than one second to compile.
Here's my code:
Abstract: test.gf
abstract Test = {
flags
startcat = Sentence;
cat
Sentence; Human;
fun
MySentence : Human -> Sentence;
Joan : Human;
}
Concrete: TestSpa.gf
concrete TestSpa of Test = open SyntaxSpa, ParadigmsSpa, Predef, DictSpa in{
lincat
Sentence = Text;
Human = Person;
lin
MySentence person = generateIntro person;
Joan = {name = "Joan" ; det = aSg_Det ; job = mkN ("Doctor") ; age = mkNumeral("22") ; hobby = bailar_V ; lastMeal = mkV2(hablar_V) ;
food = mkN ("spaghetti") ; prep = mkPrep ("at") ; location = mkN("restaurant") ; meal = Lunch ; feeling = mkA("happy")};
oper
Person : Type = {name : Str ;
det : Det;
job : N ;
age : Numeral ;
hobby : V ;
lastMeal : V2 ;
food : N;
prep : Prep;
location : N;
meal : Meal;
feeling : A};
generateIntro : Person -> Text =
\per -> mkText (mkUtt (mkNP (mkDet(i_Pron)) (per.job)));
param
Meal = Breakfast | Lunch | Dinner;
}
I noticed that certain categories, when combined in one record, make the run time noticeably longer, while others have much less effect.
Questions:
1- Is there a way other than records to keep values of different categories together, without using dependent types in the abstract syntax?
2- Is there a way to solve this problem and let GF run fast, given that all the values in the record are needed, even though they will not appear in every sentence?
Yes, the second version is smaller, because the lincats have fewer fields each. The original was:
Person : Type = {name : Str ;
det : Det;
job : N ;
age : Numeral ;
hobby : V ;
lastMeal : V2 ;
food : N;
prep : Prep;
location : N;
meal : Meal;
feeling : A};
And in your new version, this very large lincat is broken up into several smaller ones. That's why the PGF is smaller.
First, a little background. I have a full blog post on the topic here; I recommend reading it if you are interested. But the short version is:
param
Bool = True | False ;
lincat
MyCat = {s : Str ; b : Bool} ;
The category MyCat becomes 2 concrete categories in the PGF: one where the b field is True, and another where it's False.
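To make the expansion concrete, here is a minimal Python sketch. It is not how the GF compiler is implemented; the `expand_lincat` helper and the dictionary layout are hypothetical, chosen only to show that each combination of parameter values yields one concrete category, while Str fields contribute nothing.

```python
from itertools import product

def expand_lincat(param_fields):
    """Return one dict per combination of parameter values.

    param_fields maps a field name to the list of values its param type
    can take. Str fields are left out: they do not multiply anything.
    """
    names = list(param_fields)
    combos = product(*(param_fields[n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]

# MyCat = {s : Str ; b : Bool}: only b is a parameter field.
my_cat = expand_lincat({"b": ["True", "False"]})
print(len(my_cat))  # 2
print(my_cat)       # [{'b': 'True'}, {'b': 'False'}]
```

Adding a second parameter field, say one of type Meal with three values, would give 2 * 3 = 6 combinations under the same model.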
Now look at the concrete categories of the RGL categories you use in your lincat of Person. The name field is just a Str, so that doesn't contribute. But next is det : Det. In the Romance languages in the RGL, determiners have inherent parameters for number and for whether they are negative: see the implementation here. So we get 2 * 2 = 4 for Det. Next we have N: Romance languages have inherent gender for nouns, so N compiles to 2 concrete categories. Let me just annotate the rest with the numbers:
Person : Type = {name : Str ; -- 1
det : Det; -- 4
job : N ; -- 2
age : Numeral ; -- 2
hobby : V ; -- 2
lastMeal : V2 ; -- 16
food : N; -- 2
prep : Prep; -- 8
location : N; -- 2
meal : Meal; -- 3
feeling : A}; -- 8
Here we get the multiplication 4 * 2 * 2 * 2 * 16 * 2 * 8 * 2 * 3 * 8, which equals 393,216 concrete categories for Person.
In contrast, your new 4 types are much smaller:
So altogether 8 + 64 + 32 + 24 = 128 concrete categories for the different Persons.
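The arithmetic can be checked with a few lines of Python. This is only a sanity check of the numbers quoted in this answer: the per-field counts are copied from the annotated record, and the four totals for the smaller lincats are taken as given.

```python
from math import prod

# Concrete-category counts per field, copied from the annotated Person record.
person_fields = {
    "name": 1, "det": 4, "job": 2, "age": 2, "hobby": 2, "lastMeal": 16,
    "food": 2, "prep": 8, "location": 2, "meal": 3, "feeling": 8,
}
big_record = prod(person_fields.values())

# Totals quoted for the four smaller lincats.
split_records = [8, 64, 32, 24]
small_total = sum(split_records)

print(big_record)                 # 393216
print(small_total)                # 128
print(big_record // small_total)  # 3072
```

The point is that fields inside one record multiply, while separate categories only add up.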
If I were writing this grammar, I would just let the grammar generate all combinations, and restrict them in an external programming language.
The category of Person would just include the name, and all other things, like job, hobby and meal, would be separate cats and funs. For example:
cat
Person ; Hobby ; Meal ; Sentence ;
fun
Joan, Maria : Person ;
Singing, Dancing : Hobby ;
Pizza, Dumplings : Meal ;
hasHobby : Person -> Hobby -> Sentence ;
favFood : Person -> Meal -> Sentence ;
This grammar would construct all combinations of persons, hobbies and meals. Maybe in real life only Joan sings and Maria dances, but the grammar also generates "Joan likes dancing" and "Maria likes singing".
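As a rough illustration of "all combinations", the following Python sketch lists every tree of category Sentence that the abstract syntax above allows. It works purely on strings; the lists simply mirror the funs declared in the grammar, and nothing here calls GF.

```python
from itertools import product

persons = ["Joan", "Maria"]
hobbies = ["Singing", "Dancing"]
meals = ["Pizza", "Dumplings"]

# Every fun of type Person -> X -> Sentence applies to every pair.
trees = [f"hasHobby {p} {h}" for p, h in product(persons, hobbies)]
trees += [f"favFood {p} {m}" for p, m in product(persons, meals)]

for tree in trees:
    print(tree)
print(len(trees))  # 8
```

So this small grammar yields 2 * 2 + 2 * 2 = 8 sentences, including ones that are not true of the real Joan and Maria.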
Benefits to this design are:
(Meal is not an RGL param; it's custom-created for this application.)
This grammar generates more sentences than the restrictive versions in Yousef's posts, but the PGF is much smaller. That's just how GF works.
So then in my external application, I would control which GF trees are generated and linearised. I may have a spreadsheet where I store people's favourite foods and hobbies, and from that spreadsheet I can just generate GF trees. For example:
# people.csv
Person,favFood,hasHobby
"Maria","dumplings","dancing"
"Joan","pizza","singing"
Then write a script to generate GF trees:
favFood Maria Dumplings
hasHobby Maria Dancing
favFood Joan Pizza
hasHobby Joan Singing
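Such a script could look roughly like the Python sketch below. It rests on two assumptions that are mine, not part of the grammar: every column after Person is named exactly like a GF fun, and capitalising a cell gives the name of the matching GF constructor. The helper name `rows_to_trees` is hypothetical, and the CSV is inlined so the example runs on its own.

```python
import csv
import io

# Same data as people.csv, inlined so the sketch is self-contained.
PEOPLE_CSV = '''Person,favFood,hasHobby
"Maria","dumplings","dancing"
"Joan","pizza","singing"
'''

def rows_to_trees(csv_file):
    """Yield one GF tree (as a string) per non-empty cell of each row.

    Assumes every column after Person is named after a GF fun, and that
    capitalising a cell gives the matching GF constructor name.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        person = row["Person"]
        for fun, value in row.items():
            if fun == "Person" or not value:
                continue
            yield f"{fun} {person} {value.capitalize()}"

trees = list(rows_to_trees(io.StringIO(PEOPLE_CSV)))
for tree in trees:
    print(tree)
```

This prints the four trees listed above, in the same order. If the GF Python bindings (the `pgf` module) are available, each string could then be read with `pgf.readExpr` and linearised with the concrete syntax's `linearize` method; that step is left out here because it needs a compiled PGF file.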