Tags: c#, unit-testing, assert

Assert.AreEqual failing when comparing a string to a file read from disk


I'm writing a code generator that accepts a DataTable and generates boilerplate C# code from what it finds in it.

I've reached the point where I have a C# code file that I created myself, and I'm comparing it to the string produced by my code generator.

I read the code file from disk into one string and pass it, together with the generated string, to Assert.AreEqual, which fails. If I write the generated string out to a text file and compare the two files, the text appears identical. However, the file sizes are slightly different, and a file compare utility shows what looks like an extra upper-ASCII character at the end of the file created by my code generator.

Regarding the "upper ASCII" characters: comparing the files in a hex editor shows a few extra bytes at the beginning and end of the file created with Visual Studio that do not exist in the file created by my application. The bytes at the beginning are "EF BB BF" and the bytes at the end are "0D 0A".

An additional clue that might explain something: When I add the generated file to a project in Visual Studio, I'm presented with the message: "The line endings in the following file are not consistent. Do you want to normalize the line endings?"
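Visual Studio shows that prompt when a file contains more than one kind of line break, for example some lines ending in CRLF ("\r\n") and others in bare LF ("\n"). One way to confirm this for the generated string is to count each kind of line break. The sketch below is hypothetical (`LineEndingReport` is not an existing .NET API). One plausible cause, not confirmed by the question, is a generator that mixes literal "\n" with `Environment.NewLine` or `StringBuilder.AppendLine`, both of which emit "\r\n" on Windows.

```csharp
using System;

// Hypothetical diagnostic helper: counts the kinds of line breaks in a string
// to explain Visual Studio's "line endings are not consistent" prompt.
public static class LineEndingReport
{
    // Returns how many CRLF ("\r\n"), bare LF ("\n") and bare CR ("\r") breaks occur.
    public static (int CrLf, int Lf, int Cr) Count(string text)
    {
        int crLf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\r')
            {
                if (i + 1 < text.Length && text[i + 1] == '\n')
                {
                    crLf++;
                    i++; // skip the '\n' that belongs to this CRLF pair
                }
                else
                {
                    cr++;
                }
            }
            else if (text[i] == '\n')
            {
                lf++;
            }
        }
        return (crLf, lf, cr);
    }

    // True when more than one kind of line break appears in the text.
    public static bool IsMixed(string text)
    {
        var (crLf, lf, cr) = Count(text);
        int kinds = (crLf > 0 ? 1 : 0) + (lf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0);
        return kinds > 1;
    }
}
```

For example, `LineEndingReport.Count(generator.GetGeneratedCode())` inspected in the debugger would show whether the generated code contains both CRLF and bare LF breaks.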

Contents of the unit test:

    [TestMethod]
    public void TestGenerateBDO()
    {

        const string originalCodePath = @"c:\temp\UnitTestGenerator\BugSource.cs";

        BusinessDomainGenerator generator = 
            new BusinessDomainGenerator(new System.Data.DataTable(), "BugsBDO", "Bug");

        // this adds the body of the text file
        AddTestGenerateBDOCodeLines(generator);

        // I've tried using the 2nd parameter of ReadAllText to pass
        //  different encodings - no difference
        string originalCode = System.IO.File.ReadAllText(originalCodePath);
        string formattedCode = generator.GetGeneratedCode();

        Assert.AreEqual(originalCode, formattedCode); 

    }

Solution

  • What I normally do in these situations:

    1. Debug the unit test until execution reaches the line where the two strings are compared.
    2. Open each string in the "Text Visualizer" and copy it into a text editor.

       [Screenshot: how to open the Text Visualizer]

    3. Use a diff tool if the difference is not obvious.
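The steps above can be partly automated so the failure message itself points at the difference. This is a minimal sketch, assuming you can add a helper class to the test project; `StringDiff`, `FirstDifference`, `Visible` and `Describe` are hypothetical names, not part of MSTest or .NET. It finds the first index where the strings differ and shows the surrounding characters with \r, \n, tabs and non-ASCII characters escaped, so a stray line break or BOM becomes visible.

```csharp
using System;
using System.Text;

// Hypothetical debugging helper for comparing expected and generated code.
public static class StringDiff
{
    // Returns -1 when the strings are identical; otherwise the first differing index,
    // which equals the shorter length when one string is a prefix of the other.
    public static int FirstDifference(string expected, string actual)
    {
        int shared = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < shared; i++)
        {
            if (expected[i] != actual[i])
                return i;
        }
        return expected.Length == actual.Length ? -1 : shared;
    }

    // Escapes characters that are invisible in a text editor or test output.
    public static string Visible(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            switch (c)
            {
                case '\r': sb.Append("\\r"); break;
                case '\n': sb.Append("\\n"); break;
                case '\t': sb.Append("\\t"); break;
                default:
                    if (char.IsControl(c) || c > '\u007E')
                        sb.Append("\\u").Append(((int)c).ToString("X4")); // e.g. a BOM shows as \uFEFF
                    else
                        sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    // One-line description of the first difference, with up to 2 * context characters shown.
    public static string Describe(string expected, string actual, int context = 10)
    {
        int i = FirstDifference(expected, actual);
        if (i < 0)
            return "strings are identical";

        int start = Math.Max(0, i - context);
        string Window(string s) =>
            Visible(s.Substring(start, Math.Min(s.Length - start, 2 * context)));

        return $"first difference at index {i}: expected \"{Window(expected)}\" but was \"{Window(actual)}\"";
    }
}
```

In `TestGenerateBDO` the description could be passed as the failure message: `Assert.AreEqual(originalCode, formattedCode, StringDiff.Describe(originalCode, formattedCode));`.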

    Note that 0D 0A is a carriage return followed by a line feed (\r\n), i.e. a Windows new line. This could well be your issue, since a string that ends with \r\n is not equal to one that doesn't. If so, you can probably handle it by calling Trim() on both strings before comparing them.
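A minimal sketch of the Trim() suggestion, plus one assumption that goes beyond it: because Visual Studio also reported inconsistent line endings, the generated code may mix CRLF and LF inside the text, and Trim() only removes leading and trailing white space. `CodeComparison` and its methods are hypothetical names.

```csharp
using System;

// Hypothetical helpers for comparing hand-written and generated code.
public static class CodeComparison
{
    // Trim() removes leading and trailing white space, including a trailing "\r\n",
    // but leaves line breaks inside the text untouched.
    public static bool EqualIgnoringOuterWhitespace(string expected, string actual) =>
        string.Equals(expected.Trim(), actual.Trim(), StringComparison.Ordinal);

    // Assumption beyond the answer: treat CRLF, bare CR and bare LF as the same line break.
    public static string NormalizeLineEndings(string text) =>
        text.Replace("\r\n", "\n").Replace('\r', '\n');

    // Normalizes line breaks first, then trims, then compares ordinally.
    public static bool EqualIgnoringLineEndings(string expected, string actual) =>
        string.Equals(NormalizeLineEndings(expected).Trim(),
                      NormalizeLineEndings(actual).Trim(),
                      StringComparison.Ordinal);
}
```

In the test, `Assert.AreEqual(CodeComparison.NormalizeLineEndings(originalCode).Trim(), CodeComparison.NormalizeLineEndings(formattedCode).Trim());` keeps Assert.AreEqual's expected/actual output on failure, which `Assert.IsTrue` would not. The alternative is to fix the generator so it always emits the same line break.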

    EF BB BF is the UTF-8 byte order mark (BOM); it appears at the start of the file to indicate that the file is encoded in UTF-8. When reading the file, File.ReadAllText uses it to detect the encoding, but the BOM is not part of the returned string, so it wouldn't cause your test to fail.
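A small round trip illustrates this: the bytes on disk start with EF BB BF, but the string returned by File.ReadAllText does not. `BomDemo` and `WriteAndReadBack` are hypothetical names, and the path is whatever temporary file you pass in. Relatedly, `File.WriteAllText(path, text)` without an encoding argument writes UTF-8 without a BOM, which could explain why the generated file lacks those three bytes, if that is how it was saved.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo: writes a UTF-8 file with a BOM and reads it back two ways.
public static class BomDemo
{
    public static (byte[] Bytes, string Text) WriteAndReadBack(string path, string content)
    {
        // UTF8Encoding(true) emits the EF BB BF preamble, like the file saved by Visual Studio.
        File.WriteAllText(path, content, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));

        byte[] bytes = File.ReadAllBytes(path); // raw bytes, BOM included
        string text = File.ReadAllText(path);   // encoding detected from the BOM; BOM not in the string
        return (bytes, text);
    }
}
```

Writing "class Bug { }\r\n" this way produces 18 bytes on disk (3 BOM bytes plus 15 ASCII bytes ending in 0D 0A), while the string read back is exactly the original 15 characters.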