I am very confused on the behavior of the system on the case where both collation and datatype differences are involved.
As a minimal example, I am inputting the same Unicode value to the single column of two different tables. In one table the column is varchar
and of a certain collation, and on the other it's nvarchar
and of another collation. Code and results:
create table cn(code nvarchar(max) collate Latin1_General_CI_AS)
create table cv(code varchar(max) collate SQL_Latin1_General_CP1253_CI_AI)
insert cn select N'3VT18021δ'
insert cv select N'3VT18021δ'
select * from cn
select * from cv
--1.
select * from cn inner join cv on cn.code=cv.code
-- Cannot resolve the collation conflict between "SQL_Latin1_General_CP1253_CI_AI" and "Latin1_General_CI_AS" in the equal to operation.
--2.
select * from cn inner join cv on cn.code=cv.code collate SQL_Latin1_General_CP1253_CI_AI
-- returns one row
--3.
select * from cn inner join cv on cn.code =cv.code collate Latin1_General_CI_AS
-- returns 0 rows
--4.
select * from cn inner join cv on cn.code collate SQL_Latin1_General_CP1253_CI_AI =cv.code
-- returns one row
--5.
select * from cn inner join cv on cn.code collate Latin1_General_CI_AS =cv.code
-- returns one row
My notes:
Case 1: collation difference, I understand
Cases 2 and 5: return (correctly) one row. Why does collating a field to its own collation do any good?
Cases 3 and 4: Why converting one's collation to the other works one time, but not the other?
Of course, all these get further complication from the datatype difference.
Cases 2 and 5: return (correctly) one row. Why does collating a field to its own collation do any good?
When you explicitly use COLLATE
on a value in a clause both sides of the expression are explicitly converted to that collation, thus there is no conflict.
Cases 3 and 4: Why converting one's collation to the other works one time, but not the other?
One of your columns is a varchar
, so when it's changed from one collation to the other, its value changes. This is, specifically, when you COLLATE
the value in your table cv
to the collation Latin1_General_CI_AS
. As 'δ'
isn't a character available in the collation for a varchar
, it changes to a 'd'
and '3VT18021d'
does not equal N'3VT18021δ'
. You can see this with the below:
SELECT code COLLATE Latin1_General_CI_AS
FROM cv;
You would need to explicitly convert the value to a nvarchar
first:
select *
from cn
inner join cv on cn.code = CONVERT(nvarchar(MAX),cv.code) collate Latin1_General_CI_AS;
--Returns one row now
Edit: To explain why Query 3 does not return data, and Query 5 does, this is because of the positioning of the COLLATE
s and when the implicit conversion happens.
cn.code =cv.code collate Latin1_General_CI_AS --3
cn.code collate Latin1_General_CI_AS =cv.code --5
For Query 3, the COLLATE
expression is on cv.code
, which is the varchar
. As a result the value has it's collation changed first and the character 'δ'
is lost. Then it is implicitly converted to an nvarchar
, due to data type precedence.
For Query 5, however, the COLLATE
is on cn.code
the nvarchar
. As a result when the value's collation is changed no characters are lost. As cv.code
doesn't have an explicit COLLATE
, it is instead first converted to an nvarchar
(due to data type precendence) and then collated; causing no loss of characters.