Search code examples
sql-serverextended-ascii

Why T sql Char() function returns values diffrent from whats specified on asciitable.com


All,I am trying to create a T sql scalar function that removes any extended ascii characters in a passed in string value.I am doing this using the PatIndex() function as below.One thing I noticed is that the T sql Char() and ascii() functions return a different character/decimal value when invoked with the correct inputs from whats mentioned in the asciitable.com .Any idea why this is diffrent? The collation used on the database(Sql server 2012) is SQL_Latin1_General_CP1_CI_AS

I use this url for copying and pasting extended ascii characters http://www.rapidtables.com/code/text/ascii-table.htm

SELECT
  ASCII('Ñ') AS tsql_decimal_value,
  165 AS value_from_ascii_table


SELECT
  CHAR(165) AS tsql_value,
  'Ñ' AS value_from_asciitable

My Tsql Function

CREATE FUNCTION dbo.udf_Remove_NON_ASCII_characters (@value AS nvarchar(2000))
RETURNS varchar(2000)
AS
BEGIN

  DECLARE @incorrectcharLoc smallint --Position of bad character
  DECLARE @pattern varchar(140)  --Bad characters to look for

  SELECT
    @pattern = '%[' + CHAR(128) + .... + CHAR(255) + ']%'

  SELECT
    @incorrectcharLoc = PATINDEX(@pattern, @value)

  WHILE @incorrectcharLoc > 0
  BEGIN
    SELECT
      @value = STUFF(@value, @IncorrectcharLoc, 1, '')
    SELECT
      @IncorrectcharLoc = PATINDEX(@Pattern, @value)
  END
  RETURN @value
END

Please see attched screenshot for sql server result setsenter image description here Any help would be really appreciated


Solution

  • Having reached my own computer, I first checked the documentation on CHAR() and ASCII().

    CHAR() returns an INT ASCII character code

    ASCII() returns the ASCII code value of the leftmost character of a character expression

    With these presuppositions and for kicks using the letter ñ, I ran the following statement:

    SELECT  UPPER('ñ') AS Character         -- returns Ñ
          , ASCII( UPPER('ñ') ) ASCII_Value -- returns 209
          , CHAR( ASCII( UPPER('ñ') ) )     -- returns Ñ
          , CHAR(165) AS CHAR_VALUE2        -- returns ¥
    

    Notice that the ASCII code for the value Ñ is 209, and not 165. Also note that there are disparate unofficial versions that list the ASCII codes, such as ASCII-code.com, which does not list Ñ in the first 200 INT values. The trouble is unofficial sources.

    Therefore, we can conclude:

    • A) Online sources should always be taken with some measure of skepticism, particularly if they are not officially sanctioned documentation (I noticed on my computer that the original source was just a picture, rather than a table)
    • B) The ASCII table does not have Ñ in the first 200 INT values.

    One solution could be to build out your own mapping of the ASCII tables and reference that. Whatever you choose, SQL Server uses the implementation of the ISO ASCII standard, so you might as well make SQL the SOT for these issues.