Search code examples
sql-serverunicodenon-ascii-characters

Defeat these dashed dashes in SQL server


I have a table that contains the names of various recording artists. One of them has a dash in their name. If I run the following:

Select artist
, substring(artist,8,1) as substring_artist
, ascii(substring(artist,8,1)) as ascii_table
, ascii('-') as ascii_dash_key /*The dash key next to zero */
, len(artist) as len_artist 
From [dbo].[mytable] where artist like 'Sleater%'

Then the following is returned. This seems to indicate that a dash (ascii 45) is being stored in the artist column

enter image description here

However, if I change the where clause to:

From [dbo].[mytable] where artist like 'Sleater' + char(45) + '%'

I get no results returned. If I copy and paste the output from the artist column into a hex editor, I can see that the dash is actually stored as E2 80 90, the Unicode byte sequence for the multi-byte hyphen character.

enter image description here

So, I'd like to find and replace such occurrences with a standard ascii hyphen, but I'm am at a loss as to what criteria to use to find these E2 80 90 hyphens?


Solution

  • Your char is the hyphen, information on it here :

    https://www.charbase.com/2010-unicode-hyphen

    You can see that the UTF16 code is 2010 so in T-SQL you can build it with

    SELECT NCHAR(2010) 
    

    From there you can use any SQL command with that car, for example in a select like :

    Select artist
    From [dbo].[mytable] where artist like N'Sleater' + NCHAR(2010) + '%'
    

    or as you want in a

    REPLACE( artist, NCHAR(2010), '-' )
    

    with a "real" dash

    EDIT:

    If the collation of your DB give you some trouble with the NCHAR(2010) you can also try to use the car N'‐' that you'll copy/paste from the charbase link I gave you so :

    REPLACE( artist , N'‐' , '-'  ) 
    

    that you can even take from the string here (made with the special car) so all made for you :

    update mytable set artist=REPLACE( artist, N'‐' , '-' )