Search code examples
c#parsingfixed-width

Parsing variable length strings of fixed column widths


I am trying to parse a text report that is formatted into columns. Each column appears to be right justified with a fixed length. For each line, there are times that not all the columns are used. In that case it appears that spaces are used to justify each column in the line. Example input:

031   91    1221,154
043   66     312,222    1      3,047                       3,047    1.5%    .9%
040  118     529,626    1      1,842                       1,842     .8%    .3%
037   45     427,710
019   80     512,153    1     14,685                      14,685    1.2%   2.8%
009   68     520,301                      1    16,085     16,085    1.4%   3.0%
030   13     106,689                      1     1,581      1,581    7.6%   1.4%
008   54     377,593    1      7,098                       7,098    1.8%   1.8%
018   24     171,264
022   25       8,884    1        433                         433    4.0%   4.8%
035    9      42,043
041   13     112,355

The column widths appear to be as follows (in character counts including white spaces): 3,5,12,6,10,7,10,11,8,7.

What is a good way to parse this? I have tried using a regular expression to do it, but it obviously fails on the first line being read in because I am using an expression that expects the whole line to have data:

string pattern = @"^(?.{3})(?.{5})(?.{12})(?thirtyeightyninenumber>.{6})(?{10})(?.{7}(?.{10})(?.{11})(?.{8})(?.{7})";

Looking for a good way to read this into appropriate variables depending on whether that column has data or not. I feel like I need to throw a bunch of if checks in, but am hoping there is a better way I am not thinking of.

Thanks for any help.

BTW - I am reading the lines using a StreamReader and ReadLine.


Solution

  • Don't use regular expressions for this. You know the number of columns and the widths of those columns, so just use String.Substring and String.Trim:

    string field1 = line.Substring(0, 5).Trim();
    string field2 = line.Substring(5, 3).Trim();
    string field3 = line.Substring(12, 8).Trim();
    /* etc, etc */