
How to generate UUIDs for updates via SPARQL


Using an RDF database, accessed via a SPARQL endpoint, what's the best way of generating new UUID IRIs and using them for new resources?

Here is an overview of some approaches I've tried. I'm sharing it because I would have liked to find this question already answered. My favourite is the last approach, but I'd say it's still up for debate.

Generate a UUID in the client and use it in the update request

  • pro: fast
  • con: you cannot be sure the UUID is unique in the database. The chance of a collision is small, though.
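A minimal Python sketch of this first approach. It assumes the new resource is identified by a urn:uuid: IRI (the same scheme SPARQL's UUID() produces) and uses a hypothetical ex:name property in the http://example.org/ namespace. Sending the update string to the store's update endpoint is left out.

```python
import uuid

# Hypothetical vocabulary, for illustration only.
EX = "http://example.org/"


def sparql_string(value: str) -> str:
    """Quote a Python string as a SPARQL string literal (basic escapes only)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def insert_new_resource(name: str) -> tuple[str, str]:
    """Return (iri, update) for a new resource with a client-generated UUID IRI."""
    # uuid4() is a random (version 4) UUID; .urn gives 'urn:uuid:<hex form>'.
    iri = uuid.uuid4().urn
    update = (
        f"PREFIX ex: <{EX}>\n"
        f"INSERT DATA {{ <{iri}> ex:name {sparql_string(name)} . }}"
    )
    return iri, update


if __name__ == "__main__":
    iri, update = insert_new_resource("Alice")
    print(update)  # send this to the store's update endpoint (not shown)
```

No round trip to the store is needed before the update, which is where the speed advantage comes from.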

Generate a UUID in the client and check if the RDF store contains triples with that id. Iterate until the UUID is new.

  • pro: you can be reasonably sure that the UUID is unique in the db (except for ones added in concurrent updates)
  • con: quite slow
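The check-and-retry loop of the second approach could look like the sketch below. The `ask` parameter is a hypothetical callable that sends a SPARQL ASK query to the store and returns its boolean result; how it reaches the endpoint is left open. The ASK pattern only inspects the default graph, which is an assumption: stores that keep data in named graphs may need a GRAPH ?g wrapper.

```python
import uuid
from typing import Callable


def ask_query_for(iri: str) -> str:
    """ASK query that is true if the IRI occurs as subject, predicate or object."""
    # Assumption: only the default graph is checked.
    return (
        "ASK { "
        f"{{ <{iri}> ?p ?o }} UNION {{ ?s <{iri}> ?o }} UNION {{ ?s ?p <{iri}> }}"
        " }"
    )


def unused_uuid_iri(
    ask: Callable[[str], bool],
    new_uuid: Callable[[], uuid.UUID] = uuid.uuid4,
    max_tries: int = 10,
) -> str:
    """Draw UUID IRIs until the store reports one that does not occur yet."""
    for _ in range(max_tries):
        iri = new_uuid().urn
        if not ask(ask_query_for(iri)):  # one extra round trip per candidate
            return iri
    raise RuntimeError(f"no unused UUID found after {max_tries} tries")
```

As noted above, this still does not protect against a concurrent update inserting the same IRI between the ASK and the INSERT.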

Ask the RDF store for a UUID and use it

Query: SELECT (UUID() AS ?id) WHERE {}

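  • pro: the UUID is generated by the store itself (note that SPARQL 1.1 only promises that each call of UUID() returns a different UUID; it is not checked against the data already in the store)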
  • con: it's an additional request (but it's a quick one)
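For the third approach, the client only has to send the query and read one binding from the result. This sketch assumes a SPARQL 1.1 Protocol endpoint that accepts GET requests and returns the SPARQL 1.1 Query Results JSON format; any endpoint URL you pass in is your own.

```python
import json
import urllib.parse
import urllib.request

UUID_QUERY = "SELECT (UUID() AS ?id) WHERE {}"


def uuid_query_url(endpoint: str) -> str:
    """GET URL for the query operation of the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urllib.parse.urlencode({"query": UUID_QUERY})


def iris_from_results(body: str) -> list[str]:
    """Extract all ?id values from a SPARQL JSON results document."""
    data = json.loads(body)
    return [row["id"]["value"] for row in data["results"]["bindings"]]


def fetch_uuid_iri(endpoint: str) -> str:
    """Ask the store for one UUID IRI (one extra, but cheap, request)."""
    req = urllib.request.Request(
        uuid_query_url(endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return iris_from_results(resp.read().decode("utf-8"))[0]
```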

Ask the RDF store for N >> 1 UUIDs before doing a larger batch of updates

Query (returns 1000 result rows):

SELECT (UUID() as ?id) WHERE {
   VALUES ?index1 { 0 1 2 3 4 5 6 7 8 9 } 
   VALUES ?index2 { 0 1 2 3 4 5 6 7 8 9 } 
   VALUES ?index3 { 0 1 2 3 4 5 6 7 8 9 } 
}
  • pro: probably the fastest per UUID when many (N >> 1) are needed
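  • pro: the UUIDs are generated by the store itself (SPARQL 1.1 only promises that each UUID() call returns a different UUID; they are not checked against existing data)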
  • con: this approach requires more client-side programming
  • open question: this is an unusual query; is there a better way to achieve this?
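One way to make the fourth approach less ad hoc is to generate the query for any N: k VALUES clauses with ten digits each yield 10^k solutions, and LIMIT cuts the result down to exactly N rows. This is a sketch of that idea, not a store-specific feature; the variable names ?i0, ?i1, ... are arbitrary.

```python
def batch_uuid_query(n: int) -> str:
    """Build a SELECT that returns n rows, each with a fresh UUID() IRI in ?id."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Smallest k with 10**k >= n; k VALUES clauses give 10**k solutions.
    k = 1
    while 10 ** k < n:
        k += 1
    digits = " ".join(str(d) for d in range(10))
    values = "\n".join(f"  VALUES ?i{j} {{ {digits} }}" for j in range(k))
    # UUID() is called once per solution, so every row gets its own UUID.
    return f"SELECT (UUID() AS ?id) WHERE {{\n{values}\n}}\nLIMIT {n}"


if __name__ == "__main__":
    print(batch_uuid_query(250))  # 3 VALUES clauses, LIMIT 250
```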



Solution

  • Following AndyS's comment, I looked this up again and adjusted my expectations: for random (version 4) UUIDs, you would have to generate 1 billion UUIDs per second for about 85 years to reach a 50% probability of at least one collision. Therefore the first approach is best:

    Generate a UUID in the client and use it in the update request.
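The collision estimate follows from the birthday-bound approximation n ≈ sqrt(2 · N · ln(1/(1 − p))), where N = 2^122 is the number of possible random (version 4) UUIDs and p is the collision probability. The sketch below checks the figure; it assumes a uniformly random generator (Python's uuid.uuid4() draws its bits from os.urandom).

```python
import math

RANDOM_BITS = 122  # a version 4 UUID has 122 random bits; 6 bits are fixed


def uuids_for_collision_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Birthday bound: number of UUIDs n at which P(any collision) is about p."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


SECONDS_PER_YEAR = 365.25 * 24 * 3600

n = uuids_for_collision_probability(0.5)
years = n / 1e9 / SECONDS_PER_YEAR  # at one billion UUIDs per second
print(f"{n:.3g} UUIDs, about {years:.0f} years at 1e9 per second")
```

For p = 0.5 this gives roughly 2.7 × 10^18 UUIDs, i.e. about 86 years at one billion per second, in line with the estimate above.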