Import status lingers after GraphDB repository is deleted


GraphDB Free/9.4.1, RDF4J/3.3.1

I'm using the /rest/data/import/server/{repo-id} endpoint to start importing an RDF/XML file.

Steps:

  1. Put SysML.owl in the ${graphdb.workbench.importDirectory} directory and make it readable: chmod a+r SysML.owl

  2. Create repository test1 in the Workbench, using all defaults except RepositoryID := "test1".

  3. curl http://127.0.0.1:7200/rest/data/import/server/test1 => as expected: [{"name":"SysML.owl","status":"NONE"..."timestamp":1606848520821,...]

  4. curl -XPOST --header 'Content-Type: application/json' --header 'Accept: application/json' -d ' { "fileNames":[ "SysML.owl" ] }' http://127.0.0.1:7200/rest/data/import/server/test1 => HTTP status 202 (Accepted)

  5. After 60 seconds, curl http://127.0.0.1:7200/rest/data/import/server/test1 => [{"name":"SysML.owl","status":"DONE","message":"Imported successfully in 7s.","context":null,"replaceGraphs":[],"baseURI": "file:/home/steve/graphdb-import/SysML.owl", "forceSerial":false,"type":"file","format":null,"data":null,"timestamp": 1606848637716, [...other JSON content deleted]. Repository test1 now has the 263,119 statements (824 inferred) from SysML.owl loaded.
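For reference, steps 3 to 5 can be sketched in Python using only the standard library. This is a minimal, hypothetical sketch: the helper names (`import_url`, `start_import`, `list_status`, `find_entry`) are made up for illustration, and the base URL is assumed from the steps. The endpoint, JSON body and headers are the ones used in the curl commands.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: GraphDB on the default local port, as in the steps above.
BASE_URL = "http://127.0.0.1:7200"


def import_url(repo_id: str) -> str:
    """The server-file import endpoint used in steps 3-5."""
    return f"{BASE_URL}/rest/data/import/server/{repo_id}"


def start_import(repo_id: str, file_names: list[str]) -> int:
    """Step 4: POST {"fileNames": [...]} and return the HTTP status code (202 above)."""
    body = json.dumps({"fileNames": file_names}).encode("utf-8")
    req = urllib.request.Request(
        import_url(repo_id),
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def list_status(repo_id: str) -> list[dict]:
    """Steps 3 and 5: GET every file in the import directory with its status record."""
    req = urllib.request.Request(import_url(repo_id), headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def find_entry(entries: list[dict], file_name: str) -> dict | None:
    """Return the status record for one file, or None if it is not listed."""
    for entry in entries:
        if entry.get("name") == file_name:
            return entry
    return None
```

With a server running, `find_entry(list_status("test1"), "SysML.owl")["status"]` would be expected to read "NONE" before step 4 and "DONE" after it, matching the outputs shown above.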

BUT if I then

  1. Delete the repository using the Workbench page at http://localhost:7200/repository, then wait 180 seconds.
  2. curl http://127.0.0.1:7200/rest/data/import/server/test1 => same output as in step 5 above, even though the repository has been deleted. Meanwhile, curl -X GET --header 'Accept: application/json' 'http://localhost:7200/rest/repositories' => test1 is not listed.
  3. Create the repository again in the Workbench with the same settings as before, then wait 60 seconds. The initial 70 statements are present.
  4. curl http://127.0.0.1:7200/rest/data/import/server/test1 => the same output as before, from the prior repository instance: "status":"DONE" with the same timestamp, which predates the deletion and re-creation of test1.

The main-2020-12-01.log shows the INFO messages for repository test1, plugin registrations, etc., but nothing indicating why the prior repository instance's import status lingers.

This concerns me because I was planning to poll the status to determine when the data is loaded, so my processing can proceed. The good news is that I can issue the server file import request again, and after 60 seconds the 263,119 statements are present. But the timestamp on the import status is still the earlier repository instance's timestamp; the new import request did not reset it.
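One way to avoid being fooled by a leftover record while polling is to accept "DONE" only if the record's `timestamp` (milliseconds since the epoch, e.g. 1606848637716 above) is not older than the moment the import was requested. The sketch below is a hypothetical helper (`import_finished` and `now_ms` are illustrative names) under that assumption, and it also assumes the client and server clocks agree (both are on 127.0.0.1 here). Given the observation that re-issuing the import did not refresh the timestamp, this check can only recognize the stale record; on its own it cannot tell when the new import has finished.

```python
from __future__ import annotations

import time


def now_ms() -> int:
    """Current time in milliseconds since the epoch, the unit of the `timestamp` field."""
    return time.time_ns() // 1_000_000


def import_finished(entry: dict | None, requested_at_ms: int) -> bool:
    """True only if the record says DONE *and* was written no earlier than the request.

    A DONE record older than `requested_at_ms` is treated as a leftover from an
    earlier import (or an earlier repository instance), not as a completed import.
    Assumption: client and server clocks agree.
    """
    if entry is None or entry.get("status") != "DONE":
        return False
    timestamp = entry.get("timestamp")
    return isinstance(timestamp, int) and timestamp >= requested_at_ms
```

Typical use would be to record `requested = now_ms()` just before sending the POST, then poll until `import_finished(entry, requested)` holds or a deadline passes.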

I'm probably missing some cleanup step(s); I'm hoping someone knows which.

Thanks, -Steve


Solution

  • The status is only for your reference and doesn't reflect whether the data is actually present in the repository. You would see the same mismatch if you simply cleared all data in the repository without recreating it.

    If you really need to rely on those status records, you can clear the status for a given file once you have polled it and determined it's done (or before starting an import) with this curl:

    curl -X DELETE http://127.0.0.1:7200/rest/data/import/server/test1/status \
         -H 'content-type: application/json' -d '["SysML.owl"]'
    

    Note that this is an undocumented API and it may change without notice.
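Combining this with the polling the question asks about, a possible sequence is: clear the file's status record, start the import, then poll until the file reaches a final state or a timeout expires. The following is a minimal sketch in Python (standard library only), not an official client. The helper names are hypothetical; the DELETE body and URLs are the ones from the curl commands above; and it assumes DONE and ERROR are the final states (only NONE and DONE actually appear in this thread), treating any other status as "still running". Because the endpoint is undocumented, its behaviour may change.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Callable

# Assumption: local GraphDB on the default port, as in the question.
BASE_URL = "http://127.0.0.1:7200"
# Assumption: DONE and ERROR are final; any other status means "keep waiting".
FINAL_STATES = {"DONE", "ERROR"}


def _request(method: str, url: str, payload: object = None) -> bytes:
    """Send one JSON request and return the raw response body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def wait_for_final_state(
    fetch_status: Callable[[], str | None],
    timeout_s: float,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str | None:
    """Poll until fetch_status() returns a final state; return None on timeout."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in FINAL_STATES:
            return status
        if clock() >= deadline:
            return None
        sleep(interval_s)


def import_and_wait(repo_id: str, file_name: str, timeout_s: float = 600.0) -> str | None:
    url = f"{BASE_URL}/rest/data/import/server/{repo_id}"
    # 1. Drop any leftover status record (the undocumented DELETE .../status call).
    _request("DELETE", f"{url}/status", [file_name])
    # 2. Start the server-file import with the same body as in the question.
    _request("POST", url, {"fileNames": [file_name]})

    # 3. Poll the import listing until this file reaches a final state.
    def fetch_status() -> str | None:
        for entry in json.loads(_request("GET", url)):
            if entry.get("name") == file_name:
                return entry.get("status")
        return None

    return wait_for_final_state(fetch_status, timeout_s)
```

Called as `import_and_wait("test1", "SysML.owl")`, it would return "DONE" or "ERROR" once the status reaches one of them, or None if the timeout expires first. Clearing the record before the POST is what lets a later "DONE" refer to this import rather than a previous one. The sleep and clock are injectable so the polling loop can be exercised without a server.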