
Can't load bad data with Anzograph


I'm trying to load a filtered Wikidata dump with Anzograph using LOAD WITH 'global' <file:wdump-749.nt.gz> INTO GRAPH <WD_749>. The file exists, but Anzograph returns this error:

Error - At Turtle production subject=<http://www.wikidata.org/entity/Q144> predicate=<http://www.wikidata.org/prop/direct/P1319> file=wdump-749.nt.gz line=3229 details: -34000-01-01T00:00:00Z:Datum is not a datetime, use setting 'load_normalize_datetime' to patch bad data

I've set load_normalize_datetime=true in settings.conf and settings_anzograph.conf inside Anzograph's filesystem and restarted the server, but I still can't load the dump: I get the exact same error.


Solution

  • load_normalize_datetime does not take a boolean. Its value is the datetime that bad datetimes are changed to during loads, e.g. 0001-01-01T00:00:00Z

    So instead try setting:

    load_normalize_datetime=0001-01-01T00:00:00Z

    in your settings.conf, which worked for me on that specific file using the command you listed.

    WD_749 has 38,131,614 statements and loaded in 372 seconds on my ThinkPad. That is relatively slow (102k triples per second) because the dump is a single file. If you break it up into smaller pieces (you can do this with the COPY command, dumping the graph to a directory such as dir:/mydir/wdump-749.nt.gz), it will load in parallel (for me 114 seconds, 335k tps).
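
If the graph can't be loaded yet and so can't be dumped with COPY, the same kind of split can be done outside Anzograph. The following is only a minimal sketch of that alternative, not the COPY approach described above: it relies on N-Triples being one statement per line, so the lines can be dealt out round-robin into several gzipped files. The function name split_ntriples, the default of 8 parts, and the output file naming are all assumptions made for illustration.

```python
import gzip
from pathlib import Path


def split_ntriples(src, out_dir, parts=8):
    """Deal the lines of a gzipped N-Triples file round-robin into `parts` files.

    Hypothetical helper: N-Triples holds one statement per line, so splitting
    on line boundaries keeps every statement intact.
    Returns (list of output paths, number of lines written).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Derive a base name, e.g. "wdump-749.nt.gz" -> "wdump-749".
    name = Path(src).name
    suffix = ".nt.gz"
    stem = name[: -len(suffix)] if name.endswith(suffix) else name

    paths = [out_dir / f"{stem}-{i:03d}.nt.gz" for i in range(parts)]
    outs = [gzip.open(p, "wb") for p in paths]
    count = 0
    try:
        with gzip.open(src, "rb") as f:
            for line in f:
                # Line i goes to file i mod parts, so the parts stay balanced.
                outs[count % parts].write(line)
                count += 1
    finally:
        for out in outs:
            out.close()
    return paths, count
```

Calling split_ntriples("wdump-749.nt.gz", "/mydir/wdump-749.nt.gz") would write eight part files into that directory, which could then be loaded using the dir: form mentioned above. Whether the parallel speedup matches the figures quoted here will depend on the machine and the number of parts.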