Tags: c#, string, unicode, strip

Question Mark ("?") Getting Appended to String


I am writing a program that uses an existing library written by someone else. The library calls TheMovieDatabase.com and retrieves information about a movie, including the YouTube trailer name, for example 'sErD7Y00R_8'.

When I debug and inspect the string variable that holds the trailer name, it appears as 'sErD7Y00R_8'. However, when the value is inserted into my database or printed to the console, a ? (question mark) is appended and it appears as 'sErD7Y00R_8?'.

This is obviously causing me problems. I cannot figure out why it happens or how to fix it. My only guess is that the string contains some non-standard text character, but that is only a guess.

Here is the link to the wrapper library: https://github.com/LordMike/TMDbLib/

This is the method I call in the wrapper library, passing in the ID 143049:

TMDbLib.Objects.Movies.Movie tmdbMovie = client.GetMovie(id, MovieMethods.Credits | MovieMethods.Keywords | MovieMethods.Images | MovieMethods.Trailers | MovieMethods.Reviews | MovieMethods.Releases);

and here is the print to console immediately after:

Console.WriteLine("'" + tmdbMovie.Trailers.Youtube[i].Source + "'");

The .Length property returns 12, although only 11 characters are visible, so there appears to be one extra character that the debugger does not show but that prints as a ? in the console.

Following a suggestion in a comment, I printed out the Encoding.GetBytes details:

Encoding the entire string:
System.Text.UTF7Encoding       : 20  38  :73 45 72 44 37 59 30 30 52 2B 41 46 38 2D 38 2B 49 41 34 2D 
System.Text.UTF8Encoding       : 14  39  :73 45 72 44 37 59 30 30 52 5F 38 E2 80 8E 
System.Text.UnicodeEncoding    : 24  26  :73 00 45 00 72 00 44 00 37 00 59 00 30 00 30 00 52 00 5F 00 38 00 0E 20 
System.Text.UnicodeEncoding    : 24  26  :00 73 00 45 00 72 00 44 00 37 00 59 00 30 00 30 00 52 00 5F 00 38 20 0E 
System.Text.UTF32Encoding      : 48  52  :73 00 00 00 45 00 00 00 72 00 00 00 44 00 00 00 37 00 00 00 59 00 00 00 30 00 00 00 30 00 00 00 52 00 00 00 5F 00 00 00 38 00 00 00 0E 20 00 00 
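
The trailing bytes in the dump (E2 80 8E in UTF-8, 0E 20 in little-endian UTF-16) correspond to the code point U+200E. A minimal sketch of how each character could be listed by code point and Unicode category to confirm this is shown below. It is written as top-level statements (C# 9 or later); the `source` variable is a hypothetical stand-in for `tmdbMovie.Trailers.Youtube[i].Source`, and `Describe` is a helper name made up for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for tmdbMovie.Trailers.Youtube[i].Source:
// the visible ID followed by U+200E, matching the byte dump above.
string source = "sErD7Y00R_8\u200E";

// Format one UTF-16 code unit as "U+XXXX (UnicodeCategory)".
static string Describe(char c) =>
    $"U+{(int)c:X4} ({CharUnicodeInfo.GetUnicodeCategory(c)})";

Console.WriteLine($"Length: {source.Length}");
foreach (char c in source)
{
    // The last line printed for this sample is "U+200E (Format)".
    Console.WriteLine(Describe(c));
}
```

A character in the Format category has no visible glyph, which would explain why the debugger shows nothing for it while .Length still counts it.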

Debug screenshot


Solution

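  • The question mark appears because the string ends with a non-ASCII character: the trailing UTF-8 bytes E2 80 8E in the dump are U+200E (LEFT-TO-RIGHT MARK), an invisible formatting character. The debugger does not render it, and an output encoding that cannot represent it typically substitutes a ?. Since the trailer name should contain only ASCII characters, removing every non-ASCII character resolves the problem.

    To do so, use Regex to find non-ASCII characters ([^\u0000-\u007F]) and replace them with an empty string: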

    str = Regex.Replace(str, @"[^\u0000-\u007F]", string.Empty); // needs: using System.Text.RegularExpressions;
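
The regex above removes every non-ASCII character, which is fine for a YouTube ID but would also strip legitimate accented or non-Latin text from other fields. As a hedged alternative, not part of the original answer, the sketch below drops only characters in the Unicode Format category (which includes U+200E) and keeps everything else. `StripFormatChars` is a hypothetical helper name, and the sample string is an assumption based on the byte dump in the question.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Drop only characters in the Unicode "Format" category (Cf), such as U+200E,
// and keep everything else, including legitimate non-ASCII text.
static string StripFormatChars(string input) =>
    new string(input
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.Format)
        .ToArray());

// Hypothetical sample: the visible ID followed by U+200E.
string raw = "sErD7Y00R_8\u200E";
string clean = StripFormatChars(raw);

Console.WriteLine($"'{clean}' ({clean.Length} chars)"); // prints 'sErD7Y00R_8' (11 chars)
```

Note that the Format category also covers characters such as the zero-width joiner, so this approach suits identifiers like trailer IDs better than free text that may rely on those characters.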