c# unit-testing parsing flat-file mspec

How to effectively test a fixed length flat file parser using MSpec?


I have this method signature: List<ITMData> Parse(string[] lines)

ITMData has 35 properties.

How would you effectively test such a parser?

Questions:

  • Should I load the whole file (may I use System.IO)?
  • Should I put a line from the file into a string constant?
  • Should I test one or more lines?
  • Should I test each property of ITMData or should I test the whole object?
  • What about the naming of my test?

EDIT

I changed the method signature to ITMData Parse(string line).

Test Code:

[Subject(typeof(ITMFileParser))]
public class When_parsing_from_index_59_to_79
{
    private const string Line = ".........";
    private static ITMFileParser _parser;
    private static ITMData _data;

    private Establish context = () => { _parser = new ITMFileParser(); };

    private Because of = () => { _data = _parser.Parse(Line); };

    private It should_get_fldName = () => _data.FldName.ShouldBeEqualIgnoringCase("HUMMELDUMM");
}

EDIT 2

I am still not sure whether I should test only one property per class. In my opinion, this lets me put more information into the specification, namely that when I parse a single line from index 59 to index 79, I get fldName. If I test all properties within one class, I lose this information. Am I overspecifying my tests?

My tests now look like this:

[Subject(typeof(ITMFileParser))]
public class When_parsing_single_line_from_ITM_file
{
    const string Line = "";

    static ITMFileParser _parser;
    static ITMData _data;

    Establish context = () => { _parser = new ITMFileParser(); };

    private Because of = () => { _data = _parser.Parse(Line); };

    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    It should_get_fld??? = () => _data.Fld???.ShouldEqual(???);
    ...

}

Solution

  • Here's what I would normally do when facing such a problem:

    One short disclaimer up front: I would lean toward "integration testing", that is, testing the parser as a whole, rather than testing individual lines. More than once I've had lots of implementation details leak into my tests, which forced me to change the tests whenever I changed those details. A typical case of overspecification, I guess ;-/

    1. I wouldn't include file loading in the parser. As @mquander suggested, I would rather take a TextReader or an IEnumerable<string> as the input parameter. This results in much faster tests, since you can specify the parser input in memory and don't have to touch the file system.
    2. I'm not a big fan of hand-rolling test data, so in most cases I use embedded resources and load the test data directly from the specification assembly via Assembly.GetManifestResourceStream(). I typically have a bunch of extension methods in my solution to streamline the reading of resources (something like TextReader TextResource.Load("NAME_OF_SOME_RESOURCE")).
    3. Regarding MSpec: I use one class per file to parse. For each property that is tested in the parsed result I have a separate It assertion. These are normally one-liners, so the additional amount of coding isn't that big. In terms of documentation and diagnostics it's imho a huge plus: when a property isn't parsed correctly, you can see directly which assertion failed without having to look into the source or search for line numbers. It also appears in your MSpec result file. Besides, you don't hide other failed assertions (the situation where you fix one assertion only to see the spec fail on the next line with the next assertion). This of course forces you to think more about the wording you use in your specifications, but for me that's also a huge plus, since I'm a proponent of the idea that language forms thinking. In other words, if you have no clue how to frackin name your assertion, there's probably something fishy about either your specification or your implementation.
    4. Regarding your method signature for the parser: I wouldn't return a concrete type like List<T> or an array, and in particular not the mutable List<T>. What you're basically saying is: "Hey, you can muck around with the parsing result after I've finished", which in most cases is probably not what you want. I would suggest returning IEnumerable<T> instead (or ICollection<T> if you REALLY need to modify it afterwards).
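
A minimal sketch of how points 1 and 4 might fit together, not the actual parser: the input is an in-memory sequence of lines (or a TextReader) and the result is a lazily produced IEnumerable<ITMData>. The trimmed-down ITMData with a single FldName property, the field offsets (start 59, length 20, guessed from the spec name "from index 59 to 79"), and the trimming of padding are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the question's ITMData (the real one has 35 properties).
public class ITMData
{
    public string FldName { get; set; } = "";
}

public class ITMFileParser
{
    // Assumed layout: FldName starts at index 59 and is 20 characters wide.
    // Whether index 79 is inclusive in the real format is not known; this is a guess.
    private const int FldNameStart = 59;
    private const int FldNameLength = 20;

    // Accepts any sequence of lines, so tests can pass an in-memory array.
    // Results are yielded lazily; callers get no mutable list to tamper with.
    public IEnumerable<ITMData> Parse(IEnumerable<string> lines)
    {
        foreach (var line in lines)
        {
            yield return ParseLine(line);
        }
    }

    // Overload for a TextReader (a StringReader in tests, a StreamReader in production).
    public IEnumerable<ITMData> Parse(TextReader reader)
    {
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            yield return ParseLine(line);
        }
    }

    // Slices the fixed-width field and trims its padding.
    // Substring throws if the line is shorter than the assumed layout.
    private static ITMData ParseLine(string line)
    {
        return new ITMData
        {
            FldName = line.Substring(FldNameStart, FldNameLength).Trim()
        };
    }
}
```

In a spec, the input can then be built in memory, for example new string(' ', 59) + "HUMMELDUMM".PadRight(20), and passed as a one-element array, with no file system access involved. Because the result is lazy, a spec would typically materialize it once in Because (for example with ToList()) before the It assertions run.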