rdf sparql sesame

Unable to upload webdatacommons example file into OpenRDF Sesame 2.7.0 (seemingly) because of encoded Chinese characters


I just tried uploading the example webdatacommons RDF file into Sesame 2.7.0 and got this message:

"'洪雄熊' was not recognised as a language literal, and could not be verified, with language zh_tw [line 3931]"

I checked that line in the file and it is as follows:

<http://bearhungfactory.mysinablog.com/index.php> <http://creativecommons.org/ns#attributionName> "\u6D2A\u96C4\u718A"@zh_tw <http://bearhungfactory.mysinablog.com/index.php>   .

Is there a way to relax validation in Sesame so that I can upload these files anyway? If not, is there another workaround for loading webdatacommons data into Sesame, or a SPARQL endpoint for this data that I can use?

Here is the full exception:

WARNING: org.openrdf.workbench.exceptions.BadRequestException: '洪雄熊' was not recognised as a language literal, and could not be verified, with language zh_tw [line 3931]
org.openrdf.workbench.exceptions.BadRequestException: '洪雄熊' was not recognised as a language literal, and could not be verified, with language zh_tw [line 3931]
    at org.openrdf.workbench.commands.AddServlet.add(AddServlet.java:117)
    at org.openrdf.workbench.commands.AddServlet.doPost(AddServlet.java:69)
    at org.openrdf.workbench.base.TransformationServlet.service(TransformationServlet.java:95)
    at org.openrdf.workbench.base.BaseServlet.service(BaseServlet.java:137)
    at org.openrdf.workbench.proxy.ProxyRepositoryServlet.service(ProxyRepositoryServlet.java:104)
    at org.openrdf.workbench.proxy.WorkbenchServlet.service(WorkbenchServlet.java:222)
    at org.openrdf.workbench.proxy.WorkbenchServlet.handleRequest(WorkbenchServlet.java:151)
    at org.openrdf.workbench.proxy.WorkbenchServlet.service(WorkbenchServlet.java:119)
    at org.openrdf.workbench.proxy.WorkbenchGateway.service(WorkbenchGateway.java:131)
    at org.openrdf.workbench.base.BaseServlet.service(BaseServlet.java:137)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.openrdf.workbench.proxy.CookieCacheControlFilter.doFilter(CookieCacheControlFilter.java:63)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:947)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1009)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.openrdf.rio.RDFParseException: '洪雄熊' was not recognised as a language literal, and could not be verified, with language zh_tw [line 3931]
    at org.openrdf.http.client.SesameHTTPClient.upload(SesameHTTPClient.java:646)
    at org.openrdf.http.client.SesameHTTPClient.upload(SesameHTTPClient.java:563)
    at org.openrdf.repository.http.HTTPRepositoryConnection.add(HTTPRepositoryConnection.java:412)
    at org.openrdf.workbench.commands.AddServlet.add(AddServlet.java:114)
    ... 28 more

I am using a "Native Java Store RDF Schema and Direct Type Hierarchy" repository on Ubuntu 12.04 LTS, 64-bit with JDK 1.6 and Tomcat 7.0.

I'd appreciate any help or general advice on this. Thanks.


Solution

  • Answers from answers.semanticweb and from the Sesame mailing list:

    http://answers.semanticweb.com/questions/22526/unable-to-upload-webdatacommons-example-file-into-openrdf-sesame-270-seemingly-because-of-encoded-chinese-characters

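    Summary: zh_tw is not a valid language tag, because language tags separate their subtags with a hyphen, not an underscore. Change the tag to zh-tw in the data file before uploading it.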
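
One way to apply that conversion across a whole dump is to preprocess the file before uploading it. The following is a minimal sketch, not part of the original answers: it assumes the input is line-based N-Quads like the line quoted in the question, and the names `fix_language_tags`, `fix_file`, and the file names in the usage note are hypothetical. It only rewrites the language tag that follows a quoted literal, so underscores inside IRIs or inside the literal text are left alone.

```python
import re

# A quoted literal (backslash escapes allowed) followed by a language tag
# whose subtags may be separated by "_" or "-", e.g. "..."@zh_tw
LITERAL_WITH_LANG = re.compile(
    r'("(?:[^"\\]|\\.)*")@([A-Za-z]+(?:[-_][A-Za-z0-9]+)*)'
)


def fix_language_tags(line: str) -> str:
    """Replace underscores with hyphens in the language tag of each literal."""
    return LITERAL_WITH_LANG.sub(
        lambda m: m.group(1) + "@" + m.group(2).replace("_", "-"),
        line,
    )


def fix_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path line by line, fixing language tags on the way."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(fix_language_tags(line))
```

Calling `fix_file("example.nq", "example-fixed.nq")` (both file names are placeholders) would write a copy in which `"\u6D2A\u96C4\u718A"@zh_tw` becomes `"\u6D2A\u96C4\u718A"@zh-tw`. The fixed copy can then be uploaded in place of the original.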