On my coldfusion page I call a PL/SQL procedure. The HTML form is encoded using UTF-8, the oracle DB has the encoding WE8ISO8859P1
. So what kind of conversion happens in the case of i_value
below?
I'm especially interested in what happens to MS Word left quote (which is encoded in UTF-8 as E28098
). When I decode i_value
in PL/SQL this becomes 18
(Hex). Considering that it is U+2018 this makes some sense, but I still wonder why the 20
is swallowed.
<CFSTOREDPROC PROCEDURE = "my_schema.lib.write_field" datasource="#datasource#">
<cfprocparam cfsqltype="CF_SQL_VARCHAR" variable="i_name" value="remark" type="In">
<cfprocparam cfsqltype="CF_SQL_VARCHAR" variable="i_value" value="#Form.remark#" type="In">
</CFSTOREDPROC>
ColdFusion is Java based so I guess the string variables are encoded using UTF-16.
When going from UTF-16
(2 byte encoding) to a 1 byte encoding(e.g. WE8ISO8859P1
) as is the case in my example, the MSB is ignored and only the LSB is considered.
This would explain the observed behavior: I noticed that some of the bytes just seem to get swallowed when sent to the DB, e.g. the left single quotation mark (0x2018 in UTF-16) becomes just 0x18 in the DB.