c# regex split streamreader fixed-width

How to split lines with regex C#


Does anyone know how to split this file with a regex?

1 TESTAAA      SERNUM    A DESCRIPTION
2 TESTBBB      ANOTHR    ANOTHER DESCRIPTION
3 TESTXXX      BLAHBL

The length of each column:

{id} {firsttext} {serialhere} {description}
 4    22          6            30+

I'm planning to do it with a regex to store all my values in a string[] like this.

        using (StreamReader sr = new StreamReader("c:\\file.txt"))
        {
            string line = string.Empty;
            string[] source = null;
            while ((line = sr.ReadLine()) != null)
            {
                source = Regex.Split(line, @"(.{4})(.{22})(.{6})(.+)", RegexOptions.Singleline);
            }

        }

But I have 2 problems.

  1. The split creates 6 elements, with source[0] = "" and source[5] = "", even though, as you can see, I have only 4 elements (columns) per line.
  2. In the case of the 3rd line and its 4th column: if there are blank spaces, an element is created for it, but if there are no blank spaces, the column is missing.

So what would be the best pattern to split this with a regex? Any other solution would be appreciated too. I want to split on fixed widths. Thanks.


Solution

  • Using a regular expression seems like overkill when you already know exactly where the data is. Use the Substring method to get the parts of the string:

    string[] source = new string[]{
      line.Substring(0, 4),
      line.Substring(4, 22),
      line.Substring(26, 6),
      line.Substring(32)
    };
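
One caveat with the snippet above: Substring throws an ArgumentOutOfRangeException when the line is shorter than the requested offsets, which can happen if trailing blanks were trimmed from a line such as the 3rd one. A minimal sketch of one way around that, assuming missing trailing columns should simply come back blank or empty (the class and method names here are hypothetical, not part of the original answer):

```csharp
using System;

static class FixedWidth
{
    // Hypothetical helper: pads short lines so every Substring call stays in range.
    public static string[] SplitLine(string line)
    {
        // 4 + 22 + 6 = 32 characters precede the description column.
        string padded = line.PadRight(32);
        return new string[]
        {
            padded.Substring(0, 4),
            padded.Substring(4, 22),
            padded.Substring(26, 6),
            padded.Substring(32)   // empty string when there is no description
        };
    }
}
```

Inside the ReadLine loop this would be called as `source = FixedWidth.SplitLine(line);`. The columns keep their padding, so call Trim() on each element if the blanks are not wanted.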
    

    Edit:

    To make it more configurable, you can use column widths from an array:

    int[] cols = new int[] { 4, 22, 6 };
    
    string[] source = new string[cols.Length + 1];
    int ofs = 0;
    for (int i = 0; i < cols.Length; i++) {
      source[i] = line.Substring(ofs, cols[i]);
      ofs += cols[i];
    }
    source[cols.Length] = line.Substring(ofs);
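
If a regular expression is still preferred, the two problems in the question can be explained by how Regex.Split works. It treats the pattern as a delimiter and also returns the text of every capture group. Here the whole line is the "delimiter", so the result is an empty string, the four groups, and another empty string. The final `(.+)` also requires at least one character, so a line that ends right after the serial column does not match at all. A rough sketch of the alternative, using Regex.Match with `(.*)` for the last column, follows. The class and method names are hypothetical, and it assumes every line is at least 32 characters long; shorter lines yield null here.

```csharp
using System;
using System.Text.RegularExpressions;

static class FixedWidthRegex
{
    // Anchored pattern: three fixed-width columns, then whatever is left (possibly nothing).
    private static readonly Regex Columns = new Regex(@"^(.{4})(.{22})(.{6})(.*)$");

    // Returns the four columns, or null when the line is shorter than 32 characters.
    public static string[] SplitLine(string line)
    {
        Match m = Columns.Match(line);
        if (!m.Success) return null;
        return new string[]
        {
            m.Groups[1].Value,
            m.Groups[2].Value,
            m.Groups[3].Value,
            m.Groups[4].Value
        };
    }
}
```

Groups[0] is the whole match, which is why the columns start at index 1.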