Full meaning of pound-sign / hash-mark in RDFa

Normally, in a URL, the #, when prefixed onto a short string of text, marks a "fragment identifier." In an (X)HTML page it merely tells the browser to jump, after the whole page has been loaded, to the element whose @id attribute has that text as its value.

In RDFa, it seems that people tend to use this when creating vocabularies to avoid having to create a different URL for each term. Suppose the primary URI is something like "www.example.com/vocabulary/"; then the vocabulary creator can give the terms URIs like "www.example.com/vocabulary/term1" or like "www.example.com/vocabulary#term1". If these URIs were only ever going to be used as ethereal URIs that do not resolve to an actual web page, it would be a distinction without a difference. However, if the creator intends to publish a web page describing the vocabulary, the latter form may be easier, because the description of the whole vocabulary can then live on one web page with a URL of "www.example.com/vocabulary/index.html", and #term1 will cause the browser to jump to the XHTML element whose @id attribute is set to term1.

However, and here is the question, I have also seen the hash-mark used in a different way in various explanations and tutorials about RDFa. I have seen it used within the @about attribute like this:

<span about="#jane">
    <!-- Other RDFa or XHTML in here. -->
</span>

In this case, these tutorials claim that #jane is now a subject URI about which one could write predicates and objects. But if #jane is a URI, what would be the full URI for "her"? Would it be the current base URI of the page with #jane appended to the end of it? If so, does the about="#jane" attribute serve the same function as an id="jane" attribute on the same XHTML element? But about="#jane" could be used on many different XHTML elements, which would give them all identical IDs, and that is illegal.

Have we created a blank node (bNode) that uses #jane as its node ID and then started saying things about that blank node? But I thought the correct way to create a blank node would be about="[_:jane]", so I am confused.

Or are we talking about some XHTML element, containing an ID="jane" attribute, elsewhere on the same page that may or may not have been created but is simply not mentioned in the examples?

Or are all the writers of all these tutorials and examples simply using a shorthand that is commonly accepted within tutorials, but without explaining what the heck they are doing? If so, I have got a lot of dudes that I am gonna smack upside the head when I meet them.

Solution

In RDFa, about="#xyz" assigns a relative URI as the identifier for the subject of the RDF statements that follow. The formal specification of how a global URI is derived from a local identifier is a bit complicated, but in practice it will be the base URI of the resource plus the fragment part. If the base URI of the document is not set explicitly, this will be the URI from which the representation was retrieved.

So if you have a file product.html and make this available under the URI

http://www.example.org/product.html (note that the local filename and the public URI are not necessarily tied to each other),

then a node

<div typeof="http://purl.org/goodrelations/v1#Offering" about="#offer">
...
</div>

will assign this data element the global URI

http://www.example.org/product.html#offer
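The resolution step can be approximated with Python's standard `urllib.parse.urljoin`, which resolves a relative reference against a base URI in the style of RFC 3986. This is a sketch of the idea, not the RDFa specification's own algorithm; the helper name `resolve_about` is hypothetical, and it assumes the page has no `<base>` element, so the base URI is simply the retrieval URI.

```python
from urllib.parse import urljoin


def resolve_about(base_uri: str, about: str) -> str:
    """Hypothetical helper: resolve an @about value against the document's base URI."""
    # A fragment-only reference such as "#offer" keeps the base URI's
    # scheme, host, path and query and only replaces the fragment part.
    return urljoin(base_uri, about)


# Assumed base URI: the URI the page was retrieved from (no <base> element).
print(resolve_about("http://www.example.org/product.html", "#offer"))
# prints http://www.example.org/product.html#offer
```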

Now, what do you need this for?

The most popular case is that you may want to make statements about this entity in another resource; all information about the same object can then be collated in a giant graph.

But you can also use the technique for collating metadata scattered around the same HTML document, because it is perfectly legal to use "about" with the same identifier multiple times.

For instance, you could use this

<div typeof="http://purl.org/goodrelations/v1#Offering" about="#offer">
<span property="gr:name">ACME Anvil</span>
</div>

and 500 lines below:

<div typeof="http://purl.org/goodrelations/v1#Offering" about="#offer">
<span property="gr:description">The ACME Anvil is the most advanced anvil that money can buy</span>
</div>

An RDFa parser will then know that both the gr:name and the gr:description properties belong to the same object.
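To make the collation concrete, here is a deliberately tiny Python sketch built on the standard library's `html.parser` and `urllib.parse`. It is not a real RDFa processor: the class and function names are hypothetical, only @about, @typeof and literal @property values are handled, prefixes such as `gr:` are kept as plain strings instead of being expanded, and the input is assumed to be well-formed markup without void elements. It only illustrates that both snippets end up attached to one subject URI.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class TinyRDFaCollector(HTMLParser):
    """Hypothetical, heavily simplified reader for @about/@typeof/@property."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.frames = []                # one (subject, property, text_parts) per open element
        self.graph = defaultdict(dict)  # subject URI -> {property: literal value}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Without @about, an element inherits the subject of its parent.
        parent_subject = self.frames[-1][0] if self.frames else self.base_uri
        # about="#offer" is resolved against the base URI like any relative URI.
        subject = urljoin(self.base_uri, a["about"]) if a.get("about") else parent_subject
        if a.get("typeof"):
            self.graph[subject]["rdf:type"] = a["typeof"]
        self.frames.append((subject, a.get("property"), []))

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries @property.
        for _, prop, parts in self.frames:
            if prop:
                parts.append(data)

    def handle_endtag(self, tag):
        subject, prop, parts = self.frames.pop()
        if prop:
            self.graph[subject][prop] = "".join(parts).strip()


def collect(base_uri, html_text):
    parser = TinyRDFaCollector(base_uri)
    parser.feed(html_text)
    parser.close()
    return dict(parser.graph)


page = """
<div typeof="http://purl.org/goodrelations/v1#Offering" about="#offer">
<span property="gr:name">ACME Anvil</span>
</div>
<!-- ... 500 lines of other markup ... -->
<div typeof="http://purl.org/goodrelations/v1#Offering" about="#offer">
<span property="gr:description">The ACME Anvil is the most advanced anvil that money can buy</span>
</div>
"""

for subject, props in collect("http://www.example.org/product.html", page).items():
    print(subject, props)
```

Because both `div` elements resolve to the same subject URI, the name and the description land in one entry of the resulting dictionary.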

In microdata syntax, you have the slightly more elegant "itemref" attribute for directly linking the HTML elements that contain metadata about the same object. In RDFa, you can only join content scattered around the HTML indirectly, by reusing the same identifier in "about".

So in a nutshell:

Using about in RDFa is a very good practice, because it helps link your data with other data. Do it!
Do not reuse a fragment that you use for navigational purposes with id="xyz" for the about attribute.

So if the fragment identifier to jump to the product is

<div id="product"> blablabla </div>

do not use about="#product", but e.g. about="#product_data" for identifying the data object.
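If you want to check a page for this mistake, a small lint along these lines could help. It is a hypothetical sketch using only Python's `html.parser`: it collects every `id` value and every same-document `about="#..."` fragment, then reports the names that appear in both. It assumes fragment-only @about values; absolute URIs in @about are simply ignored.

```python
from html.parser import HTMLParser


class FragmentClashFinder(HTMLParser):
    """Hypothetical lint: collects id values and fragment-only @about values."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.about_fragments = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)
            elif name == "about" and value and value.startswith("#"):
                self.about_fragments.add(value[1:])


def clashes(html_text):
    """Return fragment names used both as a navigation id and as an @about subject."""
    finder = FragmentClashFinder()
    finder.feed(html_text)
    finder.close()
    return sorted(finder.ids & finder.about_fragments)


print(clashes('<div id="product">x</div><div about="#product">y</div>'))       # ['product']
print(clashes('<div id="product">x</div><div about="#product_data">y</div>'))  # []
```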

It is perfectly okay to use the about attribute multiple times with the same identifier, as long as you are talking about the same object (e.g. the same product, the same company).

Now, why is a fragment identifier so popular in examples? Because every individual page generated from the same template easily gets its own global identifier. Think of a shop with 1,000 items on sale. If you add

about="#product_data"

to the product data markup element, then each single product will have a global identifier that others can refer to.
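The effect of a single template can be simulated with `urljoin`. The product page URLs below are made up for illustration, and each page is assumed to be served at its own URI with no `<base>` element, so its base URI is its retrieval URI.

```python
from urllib.parse import urljoin

# Hypothetical URLs of 1,000 product pages rendered from one shop template.
page_urls = [f"http://www.example.org/products/{n}.html" for n in range(1, 1001)]

# Every page carries identical markup: about="#product_data".
subjects = {urljoin(url, "#product_data") for url in page_urls}

print(len(subjects))                            # 1000
print(urljoin(page_urls[0], "#product_data"))  # http://www.example.org/products/1.html#product_data
```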

Caveat: Some templates set the base URI of all individual pages to the main page. In this case, all products would get the same URI. So relative URIs don't work if you set the base URI to anything but the canonical URI of that individual page.
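The caveat can be reproduced with a short sketch that picks the base URI the way described above: the `href` of a `<base>` element if the page has one, otherwise the retrieval URL. The function names are hypothetical, the @about value is passed in rather than read from the markup, and only the first `<base>` element is considered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseHrefFinder(HTMLParser):
    """Records the href of the first <base> element, if any."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def subject_uri(retrieval_url, html_text, about="#product_data"):
    """Resolve @about against <base href> if present, else against the retrieval URL."""
    finder = BaseHrefFinder()
    finder.feed(html_text)
    finder.close()
    base = retrieval_url
    if finder.base_href:
        # A relative <base href> is itself resolved against the retrieval URL.
        base = urljoin(retrieval_url, finder.base_href)
    return urljoin(base, about)


# Template that points every page's base URI at the main page.
shared = '<html><head><base href="http://www.example.org/"></head><body>...</body></html>'
# Template without a <base> element.
plain = "<html><head></head><body>...</body></html>"

for n in (1, 2):
    url = f"http://www.example.org/products/{n}.html"
    print(subject_uri(url, shared), subject_uri(url, plain))
```

With the shared base, both products collapse onto `http://www.example.org/#product_data`; without it, each page keeps its own identifier.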