How to distinguish between two Blank Nodes in RDF?


I am having difficulty understanding a passage from w3.org. The confusing passage may be an error, or I may just be confused.

The following is Section 6.6 of the RDF Concepts Specification,

6.6 Blank Nodes

The blank nodes in an RDF graph are drawn from an infinite set. This set of blank nodes, the set of all RDF URI references and the set of all literals are pairwise disjoint.

Otherwise, this set of blank nodes is arbitrary.

RDF makes no reference to any internal structure of blank nodes. Given two blank nodes, it is possible to determine whether or not they are the same.

So, the thing I'm confused about is: if there is no way to know the "internal structure of blank nodes", how can one tell them apart? Is this a typo?


Solution

  • It is not a typo, and I agree that it is not straightforward to understand. This is also a recurring question. Blank nodes exist because sometimes there is no way to create a URI to represent a node. This happens all the time in OWL when constructing constraints, for example.

    A blank node ID is normally created when the RDF file is parsed, and it must be unique. So, by definition, you shouldn't find two blank nodes with the same identifier. That is the sense in which the specification says you can determine whether two blank nodes are the same: you compare their identity, not any internal structure. Another way of distinguishing between two blank nodes is to look at all of their incoming and outgoing predicates, plus the subjects and objects those predicates connect to, and check whether the connected subgraphs are identical. This is hard to implement, and it can be very expensive to compute for large graphs.

    This problem has been widely discussed in connection with finding differences between RDF graphs. One very interesting article is TimBL's design issue "Delta: an ontology for the distribution of differences between RDF graphs". Also have a look at the "How to diff RDF graphs" wiki page from the W3C.

    If you are the data publisher, try to avoid blank nodes if possible. If you need blank nodes, try to come up with a hash function that derives an ID from how each blank node is constructed, so that two blank nodes with the same graph structure get the same ID, and you can therefore tell them apart from blank nodes with a different structure.
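
The hashing idea above can be sketched roughly as follows. This is only an illustration under several assumptions, not a standard algorithm: triples are plain Python tuples of strings, any term starting with `_:` is treated as a blank node, and the helper names `is_blank` and `blank_node_signature` are made up for this example. It hashes only the one-hop neighbourhood of a node, so it is not a full graph isomorphism check; blank nodes that are linked to other blank nodes would need a more careful, iterative scheme.

```python
import hashlib

# Triples are plain (subject, predicate, object) tuples of strings.
# Assumption for this sketch: any term starting with "_:" is a blank node.
def is_blank(term):
    return term.startswith("_:")

def blank_node_signature(graph, node):
    """Hash the one-hop neighbourhood of `node`, ignoring blank node labels."""
    def mask(term):
        # Labels are assigned by the parser and are arbitrary,
        # so they must not influence the signature.
        return "_:" if is_blank(term) else term

    edges = []
    for s, p, o in graph:
        if s == node:
            edges.append(("out", p, mask(o)))  # outgoing predicate + object
        if o == node:
            edges.append(("in", p, mask(s)))   # incoming predicate + subject
    edges.sort()  # make the result independent of triple order
    return hashlib.sha256(repr(edges).encode("utf-8")).hexdigest()

# Hypothetical example data.
graph = [
    ("_:b1", "ex:name", '"Alice"'),
    ("_:b1", "ex:age", '"30"'),
    ("_:b2", "ex:name", '"Alice"'),
    ("_:b2", "ex:age", '"30"'),
    ("_:b3", "ex:name", '"Bob"'),
    ("ex:doc", "ex:author", "_:b3"),
]

sig_b1 = blank_node_signature(graph, "_:b1")
sig_b2 = blank_node_signature(graph, "_:b2")
sig_b3 = blank_node_signature(graph, "_:b3")

print(sig_b1 == sig_b2)  # same structure, different labels
print(sig_b1 == sig_b3)  # different structure
```

Here `_:b1` and `_:b2` are still two distinct nodes in the graph, because their labels differ, but they share a signature because their surrounding triples match. `_:b3` gets a different signature. Whether nodes with equal signatures should be treated as "the same" is a decision for the data publisher, not something RDF itself prescribes.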