Search code examples
oracleplsqlencodingutf-8sqlplus

Encoding issues when spooling files from Oracle Database with UTF-8 encoding


Issue description:

I have a script which runs on Oracle Databases (Windows, or Unix OS). It extracts data and then spools it to .txt files.

To ensure the file is unchanged, the data is hashed while running the script and this hash is later recalculated in a webapplication. This works 9/10 times, but sometimes it provides a mismatch, even though the files are identical and I isolated this to be an encoding issue.

In order to determine the encoding used for the files, the script writes 3 NONASCII characters to a file, which are encoded differently in different encoding schemes. These are later mapped on the backend.

--Encoding related information
SPOOL &&file_desc/Encoding.txt
SELECT ('€'||';'||'ƒ'||';'||'‰') FROM sys.dual;
SPOOL off

Expected result

On a database with UTF-8 encoding, data with NONASCII chars should be spooled correctly and the 3 NONASCII characters should also be spooled correctly.

Actual result

When using .AL32UTF8 system charset (same as DB), the data is spooled correctly, but the 3 characters for encoding are not. This makes it impossible for me to determine which encoding scheme was used.

Database has the following charactersets (obtained from database_properties):

NLS_CHARACTESET: AL32UTF8

NLS_NCHAR_CHARACTERSET: AL16UTF16

SQL-Developer works

When using SQL-Developer (after setting the encoding to UTF8)), I have no issues. Both the Japanese and Greek characters show up correctly and the characters used for encoding correctly show up, leading to a successful hashing match when later recalculated.

SQL*Plus doesn’t work

I need it to work in SQL*Plus as well though and I’ve been running into issues. I’ve tried a range of different variations. DB is Oracle 18c express edition:

Variations tried on SQL*Plus

  1. Set just char code page to utf-8 corresponding to DB chcp 65001 (code for utf-8) NLS_LANG charset: .WE8MSWIN1252 The file with the tablename including the japanese characters gives an encoding ‘error’: JAPANESE¿ The file with the 3 chars used to determine encoding does work well: €;ƒ;‰

  2. No changes to code page, but updated NLS_LANG charset NLS_LANG charset: .AL32UTF8 The file with the tablename including the japanese character shows fine now: JAPANESE世 The file with the 3 chars used to determine encoding is now suddenly ‘empty’ however: ;;

  3. Set NLS_LANG to same as DB and updated code pages chcp 65001 (code for utf-8) NLS_LANG charset: .AL32UTF8 The file with the tablename including the japanese character shows fine now: JAPANESE世 The file with the 3 chars used to determine encoding is now suddenly ‘empty’ however: ;;

  4. Set NLS_LANG to NLS_NCHAR_Characterset. Just in case I also tried to set the system charset to AL16UTF16 which is equal to the NLS_NCHAR_Characterset, thinking that might help solve the issue, but then I get the following error: Error 19 initializing SQL*PLUS Invalid NLS character set for this OS environment

Conclusion / question

The code page doesn't really matter as it just concerns itself with outputting to the command window.

As could be expected, when using the WE8MSWIN1252 the spooling of NONASCII characters doesn't work, because it doesn't know the characters.

Yet when using AL32UTF8, the spooling of the 3 NONASCII characters form sys.dual no longer works, even though it does know the characters.

I've spent several days isolating this issue, I'm at a loss why the latter is the case, can anyone help me?


*Edit, kfinity offerd the solution, although not 100% certain yet why this is the case. So if anyone knows?

If we replace the select statement to the following, it does work:

select unistr('\20AC;\0192;\2030') from dual;

Solution

  • I'm not really sure what the issue is, but if you try doing:

    select unistr('\20AC;\0192;\2030') from dual;
    

    I think it would bypass the issue of getting SQL*Plus to read the correct character values from the .sql file.