Updated: see below
I have figured out how to get the Key Name field filled in for my dataset, so I have now reduced the write ops by 2 (from 8 down to 6). But I still have this extra empty column, "ID". I have tried various configurations of bulkloader.yaml, but I am unable to populate that column and hopefully reduce my write ops further...
Here's what I have:
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users
transformers:
- kind: Word
  connector: csv
  property_map:
    - property: __key__
      external_name: word
      export_transform: transform.key_id_or_name_as_string

    - property: ID
      external_name: ID
      # How do I configure this one to use up that silly empty column called "ID"?

    - property: otherlangs
      external_name: otherlangs
Here is the header of my csv file and some sample rows:
$ head allting.csv
ID,word,otherlangs
100,a,it|uno|
200,aaltos,fi|aaltojen|
300,aardvark,is|jarðsvín|nl|aardvarken|
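The otherlangs column packs alternating language codes and translations, pipe-separated with a trailing pipe. A small helper (my own sketch, not part of the bulkloader) shows how the format decodes:

```python
def parse_otherlangs(field):
    # "is|jarðsvín|nl|aardvarken|" -> {'is': 'jarðsvín', 'nl': 'aardvarken'}
    # Split on the pipe, drop empty pieces (from the trailing pipe),
    # then pair up even positions (language codes) with odd ones (words).
    parts = [p for p in field.split('|') if p]
    return dict(zip(parts[0::2], parts[1::2]))

print(parse_otherlangs('it|uno|'))  # {'it': 'uno'}
print(parse_otherlangs('is|jarðsvín|nl|aardvarken|'))
```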
Update: Okay, well, I found out how to populate the "ID" column at the expense of the "Key Name" column...
I changed my bulkloader.yaml to look like this:
transformers:
- kind: Word
  connector: csv
  connector_options:
    encoding: utf-8
    columns: from_header
  property_map:
    - property: __key__
      external_name: id
      export_transform: transform.key_id_or_name_as_string
      import_transform: transform.create_foreign_key('id', key_is_id=True)

    - property: word
      external_name: word

    - property: otherlangs
      external_name: otherlangs
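As I understand it, key_is_id=True tells the loader to treat the CSV value as the entity's numeric ID rather than a string key name — which would also explain why one of the two columns is always empty: a key carries either a numeric ID or a string name, never both. Roughly (my simplified stand-in, not the real transform code):

```python
def make_key(kind, value, key_is_id=False):
    # key_is_id=True: the value becomes the entity's numeric ID
    # (shows up in the "ID" column of the Datastore Viewer).
    # key_is_id=False: the value stays a string key name
    # (shows up in the "Key Name" column instead).
    if key_is_id:
        return (kind, int(value))
    return (kind, value)

print(make_key('Word', '100', key_is_id=True))  # ('Word', 100)
print(make_key('Word', 'abalone'))              # ('Word', 'abalone')
```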
The csv file looks like this:
id,word,otherlangs
100,a,it|uno|
200,aaltos,fi|aaltojen|
...
And in the Datastore Viewer, the "ID" column is now populated, but "Key Name" is empty instead.
Still wondering if there's any way to populate both the ID and the "Key Name" columns and get the write ops down to 4?
Since no one has any thoughts on this question, I will document this workaround I came up with.
I was able to get the write ops down to 4 by doing this to the bulkloader.yaml file:
...
transformers:
- kind: Word
  connector: csv
  connector_options:
    encoding: utf-8
    columns: from_header
  property_map:
    - property: __key__
      external_name: word
      export_transform: transform.key_id_or_name_as_string

    - property: otherlangs
      external_name: otherlangs
and by changing my class to look like this:
class Word(db.Model):
    word = db.StringProperty(multiline=False)
    otherlangs = db.StringProperty(multiline=True)

    def __str__(self):
        return ("word: " + self.key().name().encode('utf8')
                + ", otherlangs: " + self.otherlangs.encode('utf8'))
And now the write ops are down to 4, which is nice.
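The numbers line up with the old High Replication Datastore billing model, assuming a put costs 2 write ops for the entity plus 2 per indexed property value (this accounting is my assumption, not something the Datastore Viewer reports):

```python
def put_cost(num_indexed_properties):
    # 2 ops for the entity itself, plus 2 per indexed property value
    # (one ascending and one descending built-in index entry each).
    return 2 + 2 * num_indexed_properties

print(put_cost(3))  # id, word, otherlangs all as indexed properties -> 8
print(put_cost(1))  # only otherlangs left once word moved into the key -> 4
```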
Querying in the interactive console got trickier; it took me a while to figure that out.
from google.appengine.api import users
from google.appengine.ext.db.metadata import Namespace
import words
foo = words.Word.get_by_key_name('abalone')
print foo
which produces:
word: abalone, otherlangs: fr|ormeaux|it|orecchie di mare|
What I don't understand is how I can dynamically add new words so that the word becomes the key, the way the bulk uploader does it for me. But I'm not going to worry about that too much; as long as the bulk loader can do it, that's good enough for the moment, I guess.
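For what it's worth, db.Model accepts a key_name argument at construction time, so the real call would presumably be Word(key_name='abalone', otherlangs='fr|ormeaux|').put(). A minimal in-memory sketch of that key_name behaviour (plain-Python stand-in, since the real thing needs the App Engine SDK):

```python
# In-memory stand-in for the datastore, keyed by (kind, key_name).
_datastore = {}

class Word(object):
    def __init__(self, key_name, otherlangs):
        self.key_name = key_name      # string key name, like the bulkloader's __key__
        self.otherlangs = otherlangs

    def put(self):
        # Store under (kind, key_name), mimicking db.Model.put().
        _datastore[('Word', self.key_name)] = self

    @classmethod
    def get_by_key_name(cls, key_name):
        # Mimics db.Model.get_by_key_name(): None if absent.
        return _datastore.get(('Word', key_name))

Word(key_name='abalone', otherlangs='fr|ormeaux|').put()
print(Word.get_by_key_name('abalone').otherlangs)  # fr|ormeaux|
```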