
C# Processing Fixed Width Files - Solution Not Working


I have implemented Cuong's solution here: C# Processing Fixed Width Files

Here is my code:

        var lines = File.ReadAllLines(@fileFull);
        var widthList = lines.First().GroupBy(c => c)
        .Select(g => g.Count())
        .ToList();

        var list = new List<KeyValuePair<int, int>>();

        int startIndex = 0;

        for (int i = 0; i < widthList.Count(); i++)
        {
            var pair = new KeyValuePair<int, int>(startIndex, widthList[i]);
            list.Add(pair);

            startIndex += widthList[i];
        }

        var csvLines = lines.Select(line => string.Join(",",
        list.Select(pair => line.Substring(pair.Key, pair.Value))));

        File.WriteAllLines(filePath + "\\" + fileName + ".csv", csvLines);

fileFull is the full path and name of the input file.

The problem is that the first line of the input file also contains digits, so it could be something like AAAAAABBC111111111DD2EEEEEE. For some reason, the output from Cuong's code gives me CSV headings like 1111RRRR and 222223333.

Does anyone know why this is and how I would fix it?


Header row example:

AAAAAAAAAAAAAAAABBBBBBBBBBCCCCCCCCDEFCCCCCCCCCGGGGGGGGHHHHHHHHIJJJJJJJJKKKKLLLLMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOPPPPQQQQ1111RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR222222222333333333444444444555555555666666666777777777888888888999999999S00001111TTTTTTTTTTTTUVWXYZ!"£$$$$$$%&  

Converted header row:

AAAAAAAAAAAAAAAA    BBBBBBBBBB  CCCCCCCCDEFCCCCCC   C   C   C   GGGGGGGG    HHHHHHHH    I   JJJJJJJJ    KKKK    LLLL    MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN  OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  PPPP    QQQQ    1111RRRR    RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR2222    222223333   333334444   444445555   555556666   666667777   777778888   888889999   99999S000   0   1111    TTTTTTTTTTTT    U   V   W   X   Y   Z   !   ",�,$$$$$$,%,&,"  

Jodrell, I implemented your suggestion, but the header output looks like this:

BBBBBBBBBBCCCCCC    CCCCCCCCD   DEFCCCC             GGGGGGGG    HHHHHHH IJJJJJJ     KKKKLLL LLL MMM NNNNNNNNNNNNNNNNNNNNNNNNNNNNN   OOOOOOOOOOOOOOOOOOOOOOOOOOOOO   PPPPQQQQ1111RRRRRRRRRRRRRRRRR   QQQ 111 RRR 33333333    44444444    55555555    66666666    77777777    88888888    99999999    S0000111        111 TTT UVWXYZ!"�$$                                       %&

Solution

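  • As Jodrell already mentioned, your code doesn't work because it assumes that each column is marked by a distinct character in the header. GroupBy(c => c) collects every occurrence of a character from the whole line, regardless of position, so a character that is reused for a later column (the second run of C, or the second run of 1 near the end) is counted into a single width, and every column after it is shifted. That is why the C column comes out 17 characters wide and you get headings like 1111RRRR. Changing the code that parses the header widths so that it counts runs of consecutive identical characters fixes it.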

    Replace:

    var widthList = lines.First().GroupBy(c => c)
    .Select(g => g.Count())
    .ToList();
    

    With:

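    // Count runs of consecutive identical characters; each run is one column.
    var widthList = new List<int>();
    var header = lines.First();
    for (int i = 0; i < header.Length; i++)
    {
        // The first character, or any change of character, starts a new column.
        if (i == 0 || header[i] != header[i - 1])
            widthList.Add(0);
        widthList[widthList.Count - 1]++;
    }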
    

    Parsed header columns:

    AAAAAAAAAAAAAAAA    BBBBBBBBBB  CCCCCCCC    D   E   F   CCCCCCCCC   GGGGGGGG    HHHHHHHH    I   JJJJJJJJ    KKKK    LLLL    MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM  NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN  OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO  PPPP    QQQQ    1111    RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR    222222222   333333333   444444444   555555555   666666666   777777777   888888888   999999999   S   0000    1111    TTTTTTTTTTTT    U   V   W   X   Y   Z   !   "   £   $$$$$$  %   &
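For reference, here is a minimal sketch of the same run-based parsing packaged as a helper. The names FixedWidth, RunWidths, and Split are hypothetical and not part of the original answer. The clamping of short lines is also an assumption on my part: the Substring call in the question throws if a data line is shorter than the header, so this version returns short or empty fields instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; names are illustrative only.
public static class FixedWidth
{
    // Widths of runs of consecutive identical characters, in order of appearance.
    public static List<int> RunWidths(string header)
    {
        var widths = new List<int>();
        for (int i = 0; i < header.Length; i++)
        {
            // The first character, or any change of character, starts a new column.
            if (i == 0 || header[i] != header[i - 1])
                widths.Add(0);
            widths[widths.Count - 1]++;
        }
        return widths;
    }

    // Cuts one line into fields using the given widths.
    // Assumption: a line shorter than the header yields short or empty
    // trailing fields rather than throwing.
    public static IEnumerable<string> Split(string line, IReadOnlyList<int> widths)
    {
        int start = 0;
        foreach (int width in widths)
        {
            int available = Math.Max(0, Math.Min(width, line.Length - start));
            yield return available == 0 ? "" : line.Substring(start, available);
            start += width;
        }
    }
}
```

With a header of "AABBA", FixedWidth.RunWidths returns the widths 2, 2, 1, whereas the GroupBy version returns 3, 2 because both runs of A are merged. In the original code, the csvLines projection would then become lines.Select(line => string.Join(",", FixedWidth.Split(line, widthList))). Note that any trailing whitespace in the header line is also treated as a column.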