Tags: sparql, dbpedia, linked-data

What are the data differences between live.dbpedia.org, dbpedia.org, and the dbpedia data dump?


I understand that live.dbpedia.org is closer to a real-time version of the dbpedia.org data, but that raises the question: how often is the regular DBpedia extraction/update process run? How often are the data dumps updated? It has also been said that the main endpoint incorporates other datasets in addition to what is extracted from Wikipedia.

What are the differences in data between dbpedia.org, live.dbpedia.org, and the data dumps?


Solution

  • I did some research on DBpedia for a project and will share what I found out:

    Q: The live updates of DBpedia (changesets) have the structure year/month/day/hour/xxxx.nt.gz. What does it mean if there are gaps in between, e.g. when the folder for some hour is missing?

    A: This means that the service was down at that time.

    And "DBpedia Live - 3. New features" (Wayback Machine link) says:

    5. Development of synchronization tool: The synchronization tool enables a DBpedia Live mirror to stay in synchronization with our live endpoint. It downloads the changeset files sequentially, decompresses them, and integrates them with another DBpedia Live mirror.

    So I think that if you stay in sync with the live endpoint by applying the changesets as they are published, your mirror ends up applying the same changes the live endpoint applies. A minimal sketch of fetching one changeset file follows below.
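
    For illustration, here is a small Python sketch of downloading and decompressing one changeset file. The base URL, the example path, and the .added.nt.gz file naming are assumptions inferred from the year/month/day/hour/xxxx.nt.gz structure quoted above, not details confirmed in this answer:

        import gzip
        import urllib.request

        # Assumed (not confirmed) locations: the changeset server base URL and
        # a hypothetical hourly file following the year/month/day/hour layout.
        BASE = "http://live.dbpedia.org/changesets"
        PATH = "2016/01/15/07/000000.added.nt.gz"

        def fetch_changeset(base, path):
            """Download one gzipped N-Triples changeset and return its statements."""
            with urllib.request.urlopen(f"{base}/{path}") as resp:
                raw = resp.read()
            text = gzip.decompress(raw).decode("utf-8")
            # Each non-empty line is one N-Triples statement to add to a local
            # mirror (statements in *.removed.nt.gz files would be deleted instead).
            return [line for line in text.splitlines() if line.strip()]

        if __name__ == "__main__":
            triples = fetch_changeset(BASE, PATH)
            print(f"{len(triples)} triples to apply from {PATH}")

    A mirror that applies these files sequentially, hour by hour, is essentially what the quoted synchronization tool automates.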