jsp sparql ontology urdu

Urdu results from an ontology display incorrectly in JSP


I'm new to SPARQL. I'm trying to retrieve Urdu results from a SPARQL query. The code works fine in Java forms, but when I print the result in a JSP page it shows garbled text like "ا�? ر_ب�".

    String novelname = request.getParameter("Id");
    novelname = novelname.replaceAll("\\s", "");
    OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
    FileManager.get().readModel(model, "C:/Users/Bisma/Documents/NetBeansProjects/Novelmania/web/novelname.owl");
    String queryStrings = "prefix uni: <http://www.semanticweb.org/novelname.owl#>" +
        "select * {uni:" + novelname + " uni:translate ?Novelname. }";
    Query query = QueryFactory.create(queryStrings);
    QueryExecution qe = QueryExecutionFactory.create(query, model);
    org.apache.jena.query.ResultSet resultset = qe.execSelect();
    java.io.ByteArrayOutputStream baos = new java.io.ByteArrayOutputStream();
    ResultSetFormatter.outputAsCSV(baos, resultset);
    String answer = new String(baos.toString().getBytes("ISO8859_1"), "UTF-8");

    answer = java.util.Arrays.toString(answer.split("http://www.semanticweb.org/novelname.owl#"));
    String[] arrays = answer.split(",");
    String nam = arrays[1];
    nam = nam.substring(0, nam.length() - 1);
    nam = nam.replaceAll("\\s", "");
    out.print(nam);

The output looks something like this: ? �?"


Solution

  • Your code sample is a bit messy, so it's hard to see exactly what is going wrong, but I think that part of the problem is that you are incorrectly decoding the Urdu characters:

    String answer = new String(baos.toString().getBytes("ISO8859_1"), "UTF-8");
    

    So, you have a ByteArrayOutputStream on which you call toString(), which decodes the byte array into a String using the platform default encoding. This only works correctly if the byte array was produced with that same encoding. If that is not the case, be explicit about which encoding you want: toString(charsetName).

    On the string you just produced you then call getBytes("ISO8859_1"), which converts it back into bytes using ISO-8859-1. "ISO8859_1" is a legacy alias that Java still accepts; the canonical name is "ISO-8859-1", and the StandardCharsets.ISO_8859_1 constant is better still. The real problem is the encoding itself: ISO-8859-1 only covers the Latin characters used by Western European languages, so it cannot represent Urdu, and any character outside its range is replaced with "?".

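    Then, finally, you convert this second byte array back into a String, this time using UTF-8. That only recovers the original text if the first decoding step happened to use ISO-8859-1, because that particular round trip preserves every byte. With any other platform default, bytes have already been lost or altered by this point, so decoding the result as UTF-8 produces garbage.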
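
The three steps described above can be reproduced without Jena. The following is a minimal sketch, not the asker's actual code: the class name RoundTripDemo and the roundTrip helper are hypothetical, the platform default is passed in as a parameter so the effect can be shown on any machine, and US-ASCII stands in for "some default other than ISO-8859-1".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Mimics new String(baos.toString().getBytes("ISO8859_1"), "UTF-8"):
    // decode with the (assumed) platform default, re-encode as ISO-8859-1,
    // then decode the result as UTF-8.
    static String roundTrip(byte[] utf8Bytes, Charset platformDefault) {
        String decoded = new String(utf8Bytes, platformDefault);
        byte[] reencoded = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(reencoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word "Urdu" in Urdu script, written with escapes so the
        // source file's own encoding does not matter.
        String original = "\u0627\u0631\u062F\u0648";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Default is ISO-8859-1: every byte survives the round trip.
        System.out.println(original.equals(roundTrip(utf8, StandardCharsets.ISO_8859_1))); // true

        // Default is US-ASCII: non-ASCII bytes are replaced during decoding
        // and the text cannot be recovered.
        System.out.println(original.equals(roundTrip(utf8, StandardCharsets.US_ASCII))); // false
    }
}
```

This would explain why the same code can appear to work in one environment and fail in another: the outcome depends on a default that the code never states.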

    In short, it's a mess. I think all you really need to do is this:

    String answer = baos.toString(charsetName);
    

    ...and then figure out what charsetName should be, that is, which character set encoding was used to create the byte array.
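
As a sketch of what that could look like, assuming the bytes are UTF-8: the SPARQL CSV results format is specified as UTF-8, so that is a reasonable first guess for Jena's CSV output, but it is an assumption worth verifying. The class DecodeDemo and its decode helper are hypothetical, and a hand-written CSV string stands in for the call to ResultSetFormatter.outputAsCSV.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes the buffer with an explicit charset instead of relying on
    // the platform default. UTF-8 is assumed here, not confirmed.
    static String decode(ByteArrayOutputStream baos) throws IOException {
        return baos.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        // Stand-in for ResultSetFormatter.outputAsCSV(baos, resultset):
        // a header row and one Urdu value, written as UTF-8 bytes.
        String urdu = "\u0627\u0631\u062F\u0648";
        baos.write(("Novelname\r\n" + urdu + "\r\n").getBytes(StandardCharsets.UTF_8));

        String answer = decode(baos);
        System.out.println(answer.contains(urdu)); // true
    }
}
```

This only covers decoding the byte array. It assumes the JSP response is itself sent as UTF-8; if it is not, correctly decoded text can still be garbled on the way to the browser.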