
How to configure RDF4J Rio writer to write IRIs with special characters?


I want to write an rdf4j.model.Model in Turtle format. The model should contain IRIs that include the characters { and }.

When I write the RDF model with rdf4j.rio.Rio, the {} characters come out as %7B%7D. Is there a way around this? For example, can I create an rdf4j.model.IRI with path and query variables, or configure the writer to preserve the {} characters?

I am using org.eclipse.rdf4j:rdf4j-runtime:3.6.2.

An example snippet:

import org.eclipse.rdf4j.model.BNode;
import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.impl.SimpleValueFactory;
import org.eclipse.rdf4j.model.util.ModelBuilder;
import org.eclipse.rdf4j.rio.*;
import org.eclipse.rdf4j.rio.helpers.BasicWriterSettings;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExamplePathVariable {

    private final static Logger LOG = Logger.getLogger(ExamplePathVariable.class.getCanonicalName());
    public static void main(String[] args) {

        SimpleValueFactory rdf = SimpleValueFactory.getInstance();
        ModelBuilder modelBuilder = new ModelBuilder();

        BNode subject = rdf.createBNode();
        IRI predicate = rdf.createIRI("http://example.org/onto#hasURI");

        // IRI with special characters !
        IRI object = rdf.createIRI("http://example.org/{token}");

        modelBuilder.add(subject, predicate, object);

        String turtleStr = writeToString(RDFFormat.TURTLE, modelBuilder.build());
        LOG.log(Level.INFO, turtleStr);
    }

    static String writeToString(RDFFormat format, Model model) {
        OutputStream out = new ByteArrayOutputStream();

        try {
            Rio.write(model, out, format,
                    new WriterConfig().set(BasicWriterSettings.INLINE_BLANK_NODES, true));
        } finally {
            try {
                out.close();
            } catch (IOException e) {
                LOG.log(Level.WARNING, e.getMessage());
            }
        }

        return out.toString();
    }
}

This is what I get:

INFO: 
[] <http://example.org/onto#hasURI> <http://example.org/%7Btoken%7D> .

Solution

  • There is no easy way to do what you want, because the result would be a syntactically invalid IRI in Turtle.

    The characters '{' and '}' are not reserved characters in URIs, but they are still not allowed to appear un-encoded in a URI or IRI (see https://datatracker.ietf.org/doc/html/rfc3987). The only legal way to serialize them is to percent-encode them, which is what the writer does.

    As an aside, the only reason this line of code:

    IRI object = rdf.createIRI("http://example.org/{token}");
    

    succeeds is that the SimpleValueFactory you are using does not validate characters (for performance reasons). If you instead use the approach recommended since RDF4J 3.5, the static Values factory:

    IRI object = Values.iri("http://example.org/{token}");
    

    ...you would immediately have gotten a validation error.

    If you need to accept a string without knowing in advance whether it contains invalid characters, and want a best-effort conversion to a legal IRI, you can use ParsedIRI.create (org.eclipse.rdf4j.common.net.ParsedIRI), which percent-encodes the offending characters:

    IRI object = Values.iri(ParsedIRI.create("http://example.org/{token}").toString());
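
    The point about braces being illegal unless percent-encoded can be checked without RDF4J at all. The following is a minimal sketch that uses only the JDK's java.net.URI, which follows the older URI RFCs rather than RFC 3987, so treat it as an approximation of what an IRI validator does. The class and method names (BraceEncodingSketch, parsesAsUri, encodeHttpPath) are hypothetical and exist only for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative sketch only: not part of RDF4J, names are hypothetical.
class BraceEncodingSketch {

    // Returns true if the JDK's strict single-argument parser accepts the string as-is.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds an http URI from its parts. The multi-argument constructor
    // percent-encodes characters that are illegal in the path component.
    static String encodeHttpPath(String host, String path) {
        try {
            return new URI("http", host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Raw braces are rejected by the parser.
        System.out.println(parsesAsUri("http://example.org/{token}"));      // false
        // Letting the constructor quote the path yields the encoded form.
        System.out.println(encodeHttpPath("example.org", "/{token}"));      // http://example.org/%7Btoken%7D
        // The encoded form parses cleanly.
        System.out.println(parsesAsUri("http://example.org/%7Btoken%7D"));  // true
    }
}
```

    The encoded string is the same form the Turtle writer produced in the question, so if the braces denote template variables, one option is to keep the template as a plain string literal rather than as an IRI.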