Search code examples
oraclehibernatespecial-characters

Hibernate Insert Greek Characters into Oracle


I have a requirement of inserting greek characters, such as 'ϕ', into Oracle. My existing DB structure wasn't supporting it. On investigating found various solutions and I adopted the solution of using NCLOB instead of CLOB. It works perfectly fine when I use unicode, 03A6, for 'ϕ', and use UNISTR function in SQL editor to insert. Like the one below.

UPDATE config set CLOB = UNISTR('\03A6')

However, it fails when I try to insert the character through my application, using hibernate. On debugging, I find that the string before inserting is '\u03A6'. After insert, I see it as ¿.

Can some one please help me how I can resolve this? How do I make use of UNISTR?

PN: I don't use any native sqls or hqls. I use entity object of the table.

Edit:

Hibernate version used is 3.5.6. Cannot change the version as there are so many other plugins dependent on this. Thus, cannot use @Nationalized or @Type(type="org.hibernate.type.NClobType") on my field in Hibernate Entity


Solution

  • After racking my brain on different articles and trying so many options, I finally decided to tweak my code a bit to handle this in Java and not through hibernate or Oracle DB.

    Before inserting into the DB, I identify unicode characters in the string and format it to append &#x in the beginning and ; at the end. This encoded string will be displayed with its actual unicode in the UI (JSP, HTML) that is UTF-8 compliant. Below is the code.

    Formatter formatter = new Formatter();
        for (char c : text.toCharArray()) {
            if (CharUtils.isAscii(c)) {
                formatter.format("%c", c);
            } else {
                formatter.format("&#x%04x;", (int) c);
            }
        }
    
        String result = formatter.toString();
    

    Ex:

    String test = "ABC \uf06c DEF \uf06cGHI";
    

    will be formatted to

    test = "ABC  DEF GHI";

    This string when rendered in UI (and even in word doc) it displays it as

    ABC  DEF GHI

    I tried it with various unicode characters and it works fine.