
Pex and Unicode encoding


I'm playing around with Pex and have a simple class. The code is:

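using System.Diagnostics.Contracts;
using System.IO;

public class FileWriter
{
    private Stream _stream;

    public FileWriter(string path) { Path = path; }

    public string Path { get; private set; }

    public void WriteLine(string line)
    {
        Contract.Requires(line != null);
        // Open the file lazily on the first write and keep it open.
        if (_stream == null)
            _stream = getStream();

        var writer = new StreamWriter(_stream);
        writer.WriteLine(line);
    }

    private Stream getStream()
    {
        return File.Open(Path, FileMode.Append, FileAccess.Write);
    }
}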

I created the following PexMethod:

[PexMethod(MaxRunsWithoutNewTests = 200)]
public void WriteLine(string line)
{
    var ms = new MemoryStream();

    // Unless detoured below, members of File, FileStream and StreamWriter throw NotImplementedException.
    MFile.BehaveAsNotImplemented();
    MFileStream.BehaveAsNotImplemented();
    MStreamWriter.BehaveAsNotImplemented();

    MFile.OpenStringFileModeFileAccess = (p, m, a) => new FileStream(p, m);
    MFileStream.ConstructorStringFileMode = (s, p, m) => new StreamWriter(ms);
    MStreamWriter.AllInstances.BaseStreamGet = sw => ms;
    MStreamWriter.ConstructorStream = (sw, s) =>
    {
        // No-op: skip the real StreamWriter constructor.
    };
    // Redirect WriteLine(string) into the MemoryStream as UTF-16 (Encoding.Unicode) bytes.
    MTextWriter.AllInstances.WriteLineString = (tw, l) =>
    {
        var buf = Encoding.Unicode.GetBytes(l);
        ms.Write(buf, 0, buf.Length);
    };

    // Verbatim string: in a regular literal, "\t" would be a tab character.
    var path = @"C:\test.txt";
    var target = new FileWriter(path);
    target.WriteLine(line);

    // Decode the captured bytes and compare them with the input.
    var buffer = ms.ToArray();
    var result = Encoding.Unicode.GetString(buffer);
    PexAssert.AreEqual<string>(line, result);
}
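Stripped of Pex and Moles, the detoured WriteLine above amounts to an Encoding.Unicode round trip through a MemoryStream. The following BCL-only sketch (the Utf16RoundTrip class and its RoundTrip method are hypothetical names, not part of the original code) suggests that this round trip preserves ordinary text, including a correctly paired surrogate such as U+1F600:

```csharp
using System;
using System.IO;
using System.Text;

static class Utf16RoundTrip
{
    // Mirrors the detoured WriteLine plus the assertion in the PexMethod:
    // encode with Encoding.Unicode (UTF-16LE), buffer in a MemoryStream, decode again.
    public static string RoundTrip(string line)
    {
        var ms = new MemoryStream();
        byte[] buf = Encoding.Unicode.GetBytes(line);
        ms.Write(buf, 0, buf.Length);
        return Encoding.Unicode.GetString(ms.ToArray());
    }
}

class Program
{
    static void Main()
    {
        // "\uD83D\uDE00" is U+1F600 written as a high/low surrogate pair.
        string[] samples = { "hello", "caf\u00E9", "\uD83D\uDE00" };
        foreach (string s in samples)
        {
            // Prints True for each sample: well-formed UTF-16 survives the round trip.
            Console.WriteLine(Utf16RoundTrip.RoundTrip(s) == s);
        }
    }
}
```

So the test logic itself is sound for well-formed strings; the failure only shows up for the unusual input Pex generates.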

Pex Exploration came up with this unit test:

[TestMethod]
[PexGeneratedBy(typeof(FileWriterTest))]
[PexRaisedException(typeof(PexAssertFailedException))]
[HostType("Moles")]
public void WriteLineThrowsPexAssertFailedException25()
{
    this.WriteLine("\udc00");
}

The strange thing is that as soon as execution leaves the generated unit test with "\udc00" and enters the parameterized test, the parameter line is displayed as '�'.

As you can see, I'm doing all the buffering with Unicode. When I finally read the string back from the memory stream, I get a weird symbol that looks like a diamond with a question mark inside.

The error I get is: "PexAssertFailedException" "Expected 'weird symbol', got '�'"

Does anyone know what's going on?
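Because both the expected and the actual value render as placeholder glyphs, comparing them by eye is misleading. Dumping the raw UTF-16 code units makes the difference visible. A minimal sketch, where CodeUnits.Dump is a hypothetical helper name:

```csharp
using System;
using System.Linq;
using System.Text;

static class CodeUnits
{
    // Formats each UTF-16 code unit (char) as U+XXXX, e.g. "U+DC00".
    public static string Dump(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")).ToArray());
    }
}

class Program
{
    static void Main()
    {
        string line = "\udc00";   // the input Pex generated
        byte[] buf = Encoding.Unicode.GetBytes(line);
        string result = Encoding.Unicode.GetString(buf);

        Console.WriteLine(CodeUnits.Dump(line));    // U+DC00
        Console.WriteLine(CodeUnits.Dump(result));  // U+FFFD
    }
}
```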


Solution

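  • Encoding.Unicode is UTF-16 (little-endian). In UTF-16 the surrogate code units U+D800 to U+DFFF are only valid in pairs: a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF), which together encode one character above U+FFFF. A lone surrogate such as U+DC00 is not a character at all, so it can't be encoded; by default the encoder replaces it with the replacement character U+FFFD (�).

    A .NET string, however, is just a sequence of UTF-16 code units (char values), so it can hold an unpaired surrogate. When you encode such a string and decode it again, "\udc00" comes back as "\ufffd" and no longer matches the original, which is exactly the mismatch the failed PexAssert reports.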
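Two ways to deal with this are sketched below, assuming lone surrogates are not meaningful input for FileWriter: detect ill-formed strings up front (for example as the condition of a PexAssume.IsTrue call that filters them out of the exploration), or construct a UnicodeEncoding with throwOnInvalidBytes set to true so the problem surfaces as an EncoderFallbackException instead of a silent U+FFFD. The Utf16Check.HasLoneSurrogate helper is a hypothetical name, not an existing API:

```csharp
using System;
using System.Text;

static class Utf16Check
{
    // True if s contains a high surrogate not followed by a low surrogate,
    // or a low surrogate not preceded by a high surrogate.
    public static bool HasLoneSurrogate(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]))
            {
                if (i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                    i++;            // well-formed pair: skip its low half
                else
                    return true;    // high surrogate without a partner
            }
            else if (char.IsLowSurrogate(s[i]))
            {
                return true;        // low surrogate without a preceding high one
            }
        }
        return false;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Utf16Check.HasLoneSurrogate("\udc00"));        // True
        Console.WriteLine(Utf16Check.HasLoneSurrogate("\uD83D\uDE00"));  // False

        // bigEndian: false, byteOrderMark: false, throwOnInvalidBytes: true
        var strict = new UnicodeEncoding(false, false, true);
        try
        {
            strict.GetBytes("\udc00");
        }
        catch (EncoderFallbackException)
        {
            Console.WriteLine("lone surrogate rejected");
        }
    }
}
```

Which option fits depends on whether FileWriter should reject such strings or merely document that they are written as U+FFFD; in the latter case the PexMethod could instead expect "\ufffd" for inputs where HasLoneSurrogate returns true.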