
Correctly expanding XML namespaces without a defined end character into valid URIs


As far as I know, the semantic web consists of triples of URIs. Namespace shorthands are widely used to abbreviate them in daily use. I thought namespace shorthands would be expanded to URIs by simple concatenation. For example, the famous dc:title in the well-known dc: namespace (which is defined as http://purl.org/dc/elements/1.1/; note that the last character is a /) would be expanded to, and hence be semantically equal to, http://purl.org/dc/elements/1.1/title.
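The simple-concatenation reading described above can be sketched in Python roughly like this. The `PREFIXES` table and the `expand` helper are hypothetical names used only for illustration; they are not part of any RDF library.

```python
# Hypothetical prefix table: prefix label -> namespace string.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(prefixed_name: str, prefixes: dict) -> str:
    """Expand 'prefix:local' by plain string concatenation."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    # No separator is inserted: the namespace string must already end
    # with whatever character ('/', '#', ...) the vocabulary expects.
    return prefixes[prefix] + local


print(expand("dc:title", PREFIXES))  # http://purl.org/dc/elements/1.1/title
```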

Then I came across some namespace definitions that lack a sensible separation character at their end. Some examples from http://live.dbpedia.org/sparql?nsdecl:

and some from the Most common RDF namespaces list:

How to expand such namespaces into valid linked data URIs?

The W3C Recommendation Namespaces in XML defines:

An expanded name is a pair consisting of a namespace name and a local name.

And Fredrik Lundh writes on effbot.org:

In an Element tree, qualified names are stored as universal names in Clark’s notation, which combines the URI and the local part into a single string, given as ‘{uri}local’.

This may be suitable for a wide range of use cases, but it doesn't fit the idea that linked data consists of URIs, which cannot start with a {.
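For instance, Python's standard `xml.etree.ElementTree` module hands back exactly this Clark-notation form. The one-element document below is a made-up input that just uses the xsd: prefix.

```python
import xml.etree.ElementTree as ET

# Made-up minimal document using the xsd: prefix.
doc = '<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="x"/>'

root = ET.fromstring(doc)
# ElementTree stores the (namespace name, local name) pair as '{uri}local'.
print(root.tag)  # {http://www.w3.org/2001/XMLSchema}element
```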

I would have thought that, in linked data, xsd:element should be expanded neither to http://www.w3.org/2001/XMLSchemaelement nor to {http://www.w3.org/2001/XMLSchema}element. Is that right? How should this be implemented correctly?


Solution

  • From the RDF/XML Syntax Specification (Revised) [emphasis added]:

    In order to encode the graph in XML, the nodes and predicates have to be represented in XML terms — element names, attribute names, element contents and attribute values. RDF/XML uses XML QNames as defined in Namespaces in XML [XML-NS] to represent RDF URI references. All QNames have a namespace name which is a URI reference and a short local name. In addition, QNames can either have a short prefix or be declared with the default namespace declaration and have none (but still have a namespace name)

    The RDF URI reference represented by a QName is determined by appending the local name part of the QName after the namespace name (URI reference) part of the QName. This is used to shorten the RDF URI references of all predicates and some nodes. RDF URI references identifying subject and object nodes can also be stored as XML attribute values. RDF literals, which can only be object nodes, become either XML element text content or XML attribute values.
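A minimal sketch of that rule, assuming Python's `xml.etree.ElementTree` as the XML parser: it reads a property element's Clark-notation tag and rebuilds the RDF URI reference by appending the local name to the namespace name. `clark_to_rdf_uri` is a hypothetical helper and `http://example.org/book` is a made-up resource; a real RDF/XML parser also has to handle `rdf:about`, literals, nesting and much more.

```python
import xml.etree.ElementTree as ET

RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book">
    <dc:title>An Example</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def clark_to_rdf_uri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' into namespace + local."""
    if not tag.startswith("{"):
        raise ValueError(f"element {tag!r} has no namespace name")
    namespace, local = tag[1:].split("}", 1)
    # RDF/XML rule: append the local name to the namespace name.
    return namespace + local


root = ET.fromstring(RDF_XML)
for description in root:          # rdf:Description elements
    for prop in description:      # property elements such as dc:title
        print(clark_to_rdf_uri(prop.tag), "->", prop.text)
# http://purl.org/dc/elements/1.1/title -> An Example
```

Applied to the xsd:element tag from the question, the same rule yields `http://www.w3.org/2001/XMLSchemaelement`.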

    It is simple concatenation, and it's the concatenated result that matters. This means that I could write:

    @prefix dcterms:  <http://purl.org/dc/terms/> .
    @prefix dctermsx: <http://purl.org/dc/terms/accrual> .
    
    # dcterms:accrualPolicy      expands to <http://purl.org/dc/terms/accrualPolicy>
    # dctermsx:Policy            expands to <http://purl.org/dc/terms/accrualPolicy>
    # dcterms:accrualPeriodicity expands to <http://purl.org/dc/terms/accrualPeriodicity>
    # dctermsx:Periodicity       expands to <http://purl.org/dc/terms/accrualPeriodicity>
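The equivalences in that snippet can be checked mechanically with a small sketch; the `expand` helper here is hypothetical and only performs plain string concatenation of namespace and local part.

```python
# The two prefix declarations from the Turtle snippet.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "dctermsx": "http://purl.org/dc/terms/accrual",
}


def expand(prefixed_name: str) -> str:
    prefix, _, local = prefixed_name.partition(":")
    # Plain string concatenation of namespace and local part.
    return PREFIXES[prefix] + local


# Different prefixed names, identical URIs.
assert expand("dcterms:accrualPolicy") == expand("dctermsx:Policy")
assert expand("dcterms:accrualPeriodicity") == expand("dctermsx:Periodicity")
print(expand("dctermsx:Policy"))  # http://purl.org/dc/terms/accrualPolicy
```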
    

    It's interesting that the RDF/XML syntax specification has to define how QNames are interpreted. Why didn't it just inherit the meaning from the XML QName specifications? The answer is in the article that you cited:

    The XML Namespaces specification doesn’t explicitly state how an application should treat the (URI, local part) pair. While most applications treat them as two distinct components, some applications expect you to combine them in different ways.

    In RDF/XML, applications treat the (URI, local part) pair as a reference to the URI that is the concatenation of uri and local, as stated in the initial quotation from the RDF syntax document. The convention, of course, is that URIs defined by a vocabulary are such that there is a common namespace and that the terms are easy to write using that namespace as an XML prefix, so in practice you won't see the sort of namespace mangling that I showed above with the DCMI terms.

    In ElementTree, the QName corresponds to {uri}local. That's how that application treats the (URI, local part) pair.

    There are complications that arise from the fact that RDF/XML serializations have to be valid XML. Not every URI can be represented as a QName, because in a QName prefix:localname there are restrictions on which characters can appear in the prefix and in the local name. For instance, for http://127.0.0.1/789234 you can't use a nice QName like localhost:789234 (with localhost bound to http://127.0.0.1/), because the local name cannot start with a number. (See, for example, this thread on the Jena-users mailing list.)
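One way to see this restriction is a sketch that tries to split a URI into a namespace part and an XML-legal local name, roughly what an RDF/XML writer must do before it can emit a property element. `split_for_qname` is a hypothetical helper, and its character classes are a simplified, ASCII-only approximation of the XML NCName production, not the full rule.

```python
import re
from typing import Optional, Tuple

# Simplified, ASCII-only approximation of an XML NCName; the real
# production also admits many non-ASCII letters and combining characters.
NCNAME_START = re.compile(r"[A-Za-z_]")
NCNAME_CHAR = re.compile(r"[A-Za-z0-9_.\-]")


def split_for_qname(uri: str) -> Optional[Tuple[str, str]]:
    """Split uri into (namespace, local) with local usable as an XML local name.

    Returns None when no non-empty suffix of uri can serve as a local name.
    """
    # Walk back over the longest run of NCName characters at the end.
    start = len(uri)
    while start > 0 and NCNAME_CHAR.fullmatch(uri[start - 1]):
        start -= 1
    # A local name must begin with a letter or underscore, so skip any
    # leading digits, dots or hyphens in that run.
    while start < len(uri) and not NCNAME_START.fullmatch(uri[start]):
        start += 1
    if start == len(uri):
        return None
    return uri[:start], uri[start:]


print(split_for_qname("http://purl.org/dc/terms/accrualPolicy"))
# ('http://purl.org/dc/terms/', 'accrualPolicy')
print(split_for_qname("http://127.0.0.1/789234"))
# None
```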

    Another complication or confusion arises from the fact that there are RDF serializations other than RDF/XML, and some of these adopt a prefix/suffix notation that is superficially similar to XML QNames, but relaxes some of these constraints, so you may see prefix/suffix combinations that wouldn't be valid XML QNames, but that's OK for those formats.

    The prefixes defined on the DBpedia SPARQL endpoint highlight this issue. From the SPARQL standard, section 4.1.1.1 Prefixed Names [emphasis added]:

    The PREFIX keyword associates a prefix label with an IRI. A prefixed name is a prefix label and a local part, separated by a colon ":". A prefixed name is mapped to an IRI by concatenating the IRI associated with the prefix and the local part. The prefix label or the local part may be empty. Note that SPARQL local names allow leading digits while XML local names do not. SPARQL local names also allow the non-alphanumeric characters allowed in IRIs via backslash character escapes (e.g. ns:id\=123). SPARQL local names have more syntactic restrictions than CURIEs.

    In this context, while a prefix like

    amz => http://webservices.amazon.com/AWSECommerceService/2005-10-05
    

    would be useless in an RDF/XML serialization, because you'd need to write illegal things like amz:#something or amz:/something, it would be useful (if possibly inconvenient) in SPARQL, where you can write amz:\#something and amz:\/something.
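A rough sketch of how such a SPARQL prefixed name could be expanded, assuming the escapable characters listed in the SPARQL 1.1 grammar's `PN_LOCAL_ESC` production: the backslashes are removed from the local part and the result is concatenated onto the prefix IRI. `expand_sparql_pname` is a hypothetical helper, and it does not validate the rest of the prefixed-name grammar.

```python
import re

# The amz prefix from the DBpedia example.
PREFIXES = {
    "amz": "http://webservices.amazon.com/AWSECommerceService/2005-10-05",
}

# Characters that SPARQL 1.1 allows to be backslash-escaped in a local name.
PN_LOCAL_ESC = re.compile(r"\\([_~.\-!$&'()*+,;=/?#@%])")


def expand_sparql_pname(pname: str) -> str:
    prefix, _, local = pname.partition(":")
    # Drop the backslash from each escape, then concatenate.
    return PREFIXES[prefix] + PN_LOCAL_ESC.sub(r"\1", local)


print(expand_sparql_pname(r"amz:\#something"))
# http://webservices.amazon.com/AWSECommerceService/2005-10-05#something
print(expand_sparql_pname(r"amz:\/something"))
# http://webservices.amazon.com/AWSECommerceService/2005-10-05/something
```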