
Upload of TTL by SPARQL Update query into GraphDB fails on diacritics


Upload of Turtle data using the following bash script:

#!/usr/bin/env bash
RDF4J_ENDPOINT=endpoint_uri
DIR="$HOME/modelio/workspace/IPR"   # "~" is not expanded inside double quotes, so use $HOME
IFS=
FILE=tmp.rq

function runUpdateQuery() {
    cp $1 $FILE
    sed -i -e "s!__VOC_IRI__!$2!g" $FILE
    curl --netrc-file .netrc -X POST -H "Content-type: application/sparql-update" -T $FILE $RDF4J_ENDPOINT/statements
}

function transform() {
    VOC_IRI=$1
    PREFIX=$2

    URL="$RDF4J_ENDPOINT/rdf-graphs/service?graph=$VOC_IRI"
    curl --netrc-file .netrc -X POST -H "Content-type: text/turtle" -T "$DIR/$PREFIX-model.ttl" $URL
}

transform http://onto.fel.cvut.cz/ontologies/slovník/datový-psp-2016 psp-2016

fails on diacritics in the vocabulary IRI (.../slovník/datový-...) with the following error:

<!doctype html><html lang="en"><head><title>HTTP Status 400 – Bad Request</title><style type="text/css">h1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} h2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} h3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} body {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} b {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} p {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;} a {color:black;} a.name {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 400 – Bad Request</h1><hr class="line" /><p><b>Type</b> Exception Report</p><p><b>Message</b> Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986</p><p><b>Description</b> The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).</p><p><b>Exception</b></p><pre>java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986
    org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:467)
    org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:294)
    org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
    org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:834)
    org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1417)
    org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    java.lang.Thread.run(Thread.java:748)
</pre><p><b>Note</b> The full stack trace of the root cause is available in the server logs.</p><hr class="line" /><h3>Apache Tomcat/9.0.14</h3></body></html>

When the diacritics are removed, it works well. Any idea what is wrong?


Solution

  • GraphDB uses Unicode, and in particular the UTF-8 encoding, for all communication over HTTP. To pass anything non-ASCII in the URL, it must be percent-encoded as UTF-8 bytes. curl does not do that automatically when the IRI is written directly into the URL, which is why Tomcat rejects the request target. You can either URL-encode the UTF-8 representation of "í" and "ý" manually (%C3%AD and %C3%BD) or use this curl feature:

    curl -X POST -H "Content-type: text/turtle" -T file.ttl \
         -G --data-urlencode "graph=http://onto.fel.cvut.cz/ontologies/slovník/datový-psp-2016" \
         http://hostname:7200/repositories/repo/rdf-graphs/service
    

    The crucial part is the -G option, which tells curl to append the URL-encoded parameter to the URL as a query string instead of sending it in the request body.
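
Applied to the script from the question, the `transform` function could look roughly like the sketch below. It only changes how the `graph` parameter is passed. The endpoint URL and the `DIR` value are placeholders taken from the question and the answer, not a verified setup, and the sketch assumes the shell passes the IRI to curl as UTF-8.

```shell
#!/usr/bin/env bash
# Assumed placeholders: replace with your repository endpoint and data directory.
RDF4J_ENDPOINT="http://hostname:7200/repositories/repo"
DIR="$HOME/modelio/workspace/IPR"

# Upload "$DIR/<prefix>-model.ttl" into the named graph <vocabulary IRI>.
transform() {
    local voc_iri=$1 prefix=$2
    # -G moves the --data-urlencode value into the query string, so curl
    # percent-encodes the IRI (e.g. "í" is sent as %C3%AD) instead of
    # putting raw non-ASCII bytes into the request line.
    curl --netrc-file .netrc -X POST \
         -H "Content-type: text/turtle" \
         -T "$DIR/$prefix-model.ttl" \
         -G --data-urlencode "graph=$voc_iri" \
         "$RDF4J_ENDPOINT/rdf-graphs/service"
}
```

It would then be called as before, for example `transform "http://onto.fel.cvut.cz/ontologies/slovník/datový-psp-2016" psp-2016`. Quoting the IRI keeps the shell from interpreting any special characters in it.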