Less stringent N-Quads parsing in Sesame


The Sesame parser for N-Quads is rather strict (generally not a bad thing!). In addition to parsing IRI terms according to [1], it also enforces requirement [2] (both from the N-Quads 1.1 specification).

[1]   IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
[2]   IRIs may be written only as absolute IRIs.

Is there a way to perform parsing according to [1] only? I have already turned off the additional configuration settings I could find (e.g., do not interpret lexical expressions according to their datatype) but have not yet found a setting for disabling absolute IRI checking or an overview of all settings.
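
To make the difference between [1] and [2] concrete, here is a minimal sketch in plain Java (no Sesame classes involved). The class name `IriRefCheck` and its methods are hypothetical, and the "absolute" test is a simplified stand-in that only looks for a leading scheme; it is not how the Sesame parser performs the check.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sesame: contrasts production [1] with requirement [2].
public class IriRefCheck {
    // UCHAR: a backslash, then 'u' plus 4 hex digits or 'U' plus 8 hex digits.
    private static final String UCHAR = "\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}";

    // Production [1]: '<' ( any char except #x00-#x20 < > " { } | ^ ` and backslash, or UCHAR )* '>'
    private static final Pattern IRIREF =
            Pattern.compile("<(?:[^\\x00-\\x20<>\"{}|^`\\\\]|" + UCHAR + ")*>");

    // Simplified assumption for [2]: an absolute IRI starts with "scheme:".
    private static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*:");

    /** True if the token, including its angle brackets, matches production [1]. */
    public static boolean matchesIriRef(String token) {
        return IRIREF.matcher(token).matches();
    }

    /** True if the token matches [1] and the enclosed IRI starts with a scheme. */
    public static boolean isAbsoluteIriRef(String token) {
        if (!matchesIriRef(token)) {
            return false;
        }
        String iri = token.substring(1, token.length() - 1);
        return SCHEME.matcher(iri).find();
    }

    public static void main(String[] args) {
        String[] tokens = {"<http://example.org/a>", "<relative/path>", "<bad iri>"};
        for (String t : tokens) {
            System.out.println(t + "  [1]=" + matchesIriRef(t)
                    + "  [1]+[2]=" + isAbsoluteIriRef(t));
        }
    }
}
```

With these definitions, `<relative/path>` satisfies [1] but not [2], which is exactly the kind of term I would like the parser to accept; `<bad iri>` fails [1] already because of the space.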


Solution

  • No, there isn't, currently. The N-Quads format (just like its sister format, N-Triples) specifically requires that only absolute IRIs are used. Any document that contains relative IRIs is by definition invalid.

    While not a good idea from an interoperability perspective, it technically wouldn't be hard to add such a feature though. A base URI is already provided with every Sesame parser, and this could easily be used to resolve relative IRIs against (in fact all the code necessary for this is already in place, it's just that the N-Quads parser doesn't make use of it).

    Feel free to log a feature request with the Sesame development team to include this. In the meantime you could easily tweak the parser yourself by making sure its parsing of URI references uses the method AbstractRDFParser.resolveURI to take relative URIs into account. It shouldn't be hard to create a task-specific subclass of NQuadsParser that does this.

    As an aside: while there is no extensive documentation on parser config, every Sesame Rio parser implements the getSupportedSettings method, which returns the list of parser settings that that parser understands and uses.
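
To illustrate what resolving relative IRIs against a base URI means in practice, here is a rough sketch using only `java.net.URI` from the JDK. It does not use or reproduce AbstractRDFParser.resolveURI; the class `RelativeIriResolver` and its `resolve` method are hypothetical names chosen for this example, and a real NQuadsParser subclass would delegate to the parser's own resolution code instead.

```java
import java.net.URI;

// Hypothetical stand-alone illustration; not Sesame code.
public class RelativeIriResolver {

    /**
     * Returns the IRI unchanged if it already has a scheme, otherwise
     * resolves it against the given base. URI.create throws
     * IllegalArgumentException for strings java.net.URI cannot parse
     * (for example, strings containing spaces).
     */
    public static String resolve(String baseIri, String iri) {
        URI reference = URI.create(iri);
        if (reference.isAbsolute()) {
            return iri; // nothing to resolve
        }
        return URI.create(baseIri).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/data/"; // assumed base URI for the example
        System.out.println(resolve(base, "doc#frag"));
        System.out.println(resolve(base, "/abs/path"));
        System.out.println(resolve(base, "urn:example:already-absolute"));
    }
}
```

With the base `http://example.org/data/`, the relative reference `doc#frag` becomes `http://example.org/data/doc#frag`, while an IRI that already has a scheme is passed through untouched. Note that `java.net.URI` follows the older RFC 2396 resolution rules, so edge cases may differ from what an RDF toolkit does.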