I'm using the bulk loader to download my Review kind from one GAE app and upload it to another. The Review kind contains about 30,000 entities.
This is the command I use to upload:
appcfg.py upload_data --kind=Review --filename=review.csv --application=<MY_APP_ID> --url=http://<MY_APP_ID>.appspot.com/remote_api --rps_limit=500 --bandwidth_limit=2500000 --batch_size=100
NOTE: The entire error is pretty long, so I've included it at the bottom of this question.
The upload doesn't really start at all; it dies right after this line: [INFO ] Opening database: bulkloader-progress-20110611.222101.sql3
At the bottom of the trace I see: lambda val: _EPOCH + datetime.timedelta(microseconds=val), OverflowError: date value out of range
I suspect this is caused by badly formatted dates in my publisheddate property (see below for the entire Review kind).
I ran the GQL SELECT * FROM Review where publisheddate = null in the GAE admin portal and found around 15 entities with null values for the publisheddate property. I deleted those, redownloaded, and tried reuploading the Review kind, but I still get the same error.
Since I'm using the upload command without a config file parameter (i.e. no --config_file=bulkloader.yaml, as you can see in the upload command above), I'm relying on the datastore statistics to determine what is downloaded to my review.csv file.
And since I only just deleted the entities with the null publisheddate, redownloading and immediately reuploading does nothing different from my initial download and upload, because the datastore statistics haven't been updated yet.
As there are 30,000 entities, I'm unsure how to locate the ones with badly formatted dates. Is there a quick way to find entities with bad publisheddate values? (One possible approach is sketched after the Review class below.)
Does anyone know how to fix this issue? I'm not sure from the error trace what the real problem is.
I'm using objectify3.0.jar to manipulate my GAE datastore, so my Review kind looks like this:
public class Review {
    @Id private Long id; // This is my key property, auto-generated by objectify
    private String reviewlink;
    private Date publisheddate;
    private double rating;
    private Long gameid; // to make querying easier
    // omitting getters and setters
}
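For anyone hunting for the same thing, this is the kind of one-off scan I had in mind. It is only a sketch against the Objectify 3.x query API: the getters are the ones omitted from the class above, the 1900 cutoff is an arbitrary sanity threshold of mine, and it would need to run somewhere the datastore is reachable (e.g. from a servlet in the app):

import java.util.Date;
import java.util.GregorianCalendar;

import com.googlecode.objectify.Objectify;
import com.googlecode.objectify.ObjectifyService;

public class BadDateScan {
    static {
        ObjectifyService.register(Review.class);
    }

    public static void scan() {
        Objectify ofy = ObjectifyService.begin();

        // Entities where publisheddate was never set at all.
        for (Review r : ofy.query(Review.class).filter("publisheddate", null)) {
            System.out.println("null date: Review " + r.getId());
        }

        // Entities whose date is implausibly old -- anything before the
        // cutoff is a candidate for a mis-parsed value like 00/00/0000.
        Date cutoff = new GregorianCalendar(1900, 0, 1).getTime();
        for (Review r : ofy.query(Review.class).filter("publisheddate <", cutoff)) {
            System.out.println("suspect date: Review " + r.getId()
                    + " -> " + r.getPublisheddate());
        }
    }
}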
10:20:48>appcfg.py upload_data --kind=Review --filename=review.csv --application=<MY_APP_ID> --url=http://<MY_APP_ID>.appspot.com/remote_api --rps_limit=500 --bandwidth_limit=2500000 --batch_size=100
Uploading data records.
[INFO ] Logging to bulkloader-log-20110611.222101
[INFO ] Throttling transfers:
[INFO ] Bandwidth: 2500000 bytes/second
[INFO ] HTTP connections: 8/second
[INFO ] Entities inserted/fetched/modified: 500/second
[INFO ] Batch Size: 100
[INFO ] Opening database: bulkloader-progress-20110611.222101.sql3
Traceback (most recent call last):
  File "/usr/local/bin/appcfg.py", line 76, in <module>
    run_file(__file__, globals())
  File "/usr/local/bin/appcfg.py", line 72, in run_file
    execfile(script_path, globals_)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/appcfg.py", line 3592, in <module>
    main(sys.argv)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/appcfg.py", line 3583, in main
    result = AppCfgApp(argv).Run()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/appcfg.py", line 2229, in Run
    self.action(self)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/appcfg.py", line 3368, in __call__
    return method()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/appcfg.py", line 3195, in PerformUpload
    run_fn(args)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/appcfg.py", line 3087, in RunBulkloader
    sys.exit(bulkloader.Run(arg_dict))
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 4360, in Run
    return _PerformBulkload(arg_dict)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 4204, in _PerformBulkload
    loader.initialize(filename, loader_opts)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 2879, in initialize
    self.high_id_table = self._find_high_id(self.generate_records(filename))
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 2896, in _find_high_id
    for values in record_generator:
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 2920, in generate_records
    yield datastore.Entity._FromPb(fixed_entity_proto)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/datastore.py", line 958, in _FromPb
    value = datastore_types.FromPropertyPb(prop)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/datastore_types.py", line 1792, in FromPropertyPb
    value = conversion(value)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/datastore_types.py", line 1722, in <lambda>
    lambda val: _EPOCH + datetime.timedelta(microseconds=val),
OverflowError: date value out of range
The method I used to find the problematic row was to run the bulk loader WITH a config file, downloading each row until it reached a row it couldn't download. Once I corrected the problematic row(s), the download/upload worked fine.
NOTE: In fact there were multiple bad rows, but once I found the first one it was just a matter of finding/replacing the others containing the offending data.
The command I used to find the row was:
appcfg.py upload_data --config_file=bulkloader.yaml --kind=Review --filename=review.csv --application=<MY_APP_ID> --url=http://<MY_APP_ID>.appspot.com/remote_api --rps_limit=500 --bandwidth_limit=2500000 --batch_size=100
NOTE: In my question above I was NOT using a config file; in the command just above I am. The config file is generated from the datastore statistics with this command:
appcfg.py create_bulkloader_config --filename=bulkloader.yaml --application=<appname> --url=http://<appname>.appspot.com/remote_api
The primary reason for using the config file is that it downloads my datastore data as a properly formatted CSV file. For some reason, when you don't specify a config file, the downloaded file is all messy, i.e. it contains a lot of random characters (judging by the datastore.Entity._FromPb(fixed_entity_proto) call in the trace above, it presumably holds serialized entity protocol buffers rather than plain text). However, there are no problems uploading these files back to your datastore.
Why is it that both the properly formatted CSV files and these messy files work when uploading to the datastore?
The actual problem was that some of the data for the publisheddate property had the value 00/00/0000 (MM/DD/YYYY). Java converted this to a Date with a value of Sunday 2nd January 0001 (don't quote me on the exact value, but I recall the year was 0001).
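That kind of value is what java.text.SimpleDateFormat produces when it parses 00/00/0000 leniently: month 00 and day 00 get rolled backwards instead of rejected. Here is a minimal sketch of the effect (the MM/dd/yyyy pattern is my assumption about how the date was originally parsed):

import java.text.SimpleDateFormat;
import java.util.Date;

public class LenientDateDemo {
    public static void main(String[] args) throws Exception {
        // SimpleDateFormat is lenient by default: month 00 and day 00 are
        // rolled back rather than rejected, so "00/00/0000" silently becomes
        // a Date around year 1 BCE instead of failing to parse.
        SimpleDateFormat fmt = new SimpleDateFormat("MM/dd/yyyy");
        Date bad = fmt.parse("00/00/0000");
        System.out.println(bad);

        // Turning leniency off makes the same input throw a ParseException,
        // which is one way to keep values like this out of the datastore.
        fmt.setLenient(false);
        fmt.parse("00/00/0000"); // throws ParseException
    }
}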
This got inserted into the GAE datastore fine, but when the bulk loader tries to download or upload these values, it complains with the error posted above: lambda val: _EPOCH + datetime.timedelta(microseconds=val), OverflowError: date value out of range. That fits: the bulk loader is Python, and Python's datetime cannot represent years before 1 (datetime.MINYEAR), so rebuilding a timestamp that far before the epoch falls below datetime.min and the addition overflows.