Search code examples
c#csvcsvhelper

Looping through tabular CSV files doesn't return intended fields


I have a 20,000 line CSV file with this data:

Location, Light, Proximity, Ax, Ay, Az, Gx, Gy, Gz
SidePocket 2.0 0.0 -1.1259307861328125 -10.622817993164063 0.8393707275390625 0.7456817626953125 -2.3446502685546875 -0.6551361083984375
HandBag 2.0 0.0 -1.1259307861328125 -10.622817993164063 0.8393707275390625 0.8383636474609375 -3.1872711181640625 -0.064971923828125
SidePocket 2.0 0.0 0.5566253662109375 -9.675201416015625 1.7905426025390625 0.8383636474609375 -3.1872711181640625 -0.064971923828125
SidePocket 2.0 0.0 0.5566253662109375 -9.675201416015625 1.7905426025390625 0.170440673828125 -2.976348876953125 0.05218505859375
BackPocket 2.0 0.0 -0.3665771484375 -9.739242553710938 2.12567138671875 0.170440673828125 -2.976348876953125 0.05218505859375
SidePocket 2.0 0.0 -0.3665771484375 -9.739242553710938 2.12567138671875 -0.1981201171875 -1.846099853515625 0.290802001953125
Ear 2.0 0.0 -0.490264892578125 -9.91455078125 1.34954833984375 -0.1981201171875 -1.846099853515625 0.290802001953125

I want something to loop through all rows of a column based on the column header and do some computation. I tried using the CsvHelper library:

private void btnBrowse_Click(object sender, RoutedEventArgs e)
{
    OpenFileDialog openFileDialog = new OpenFileDialog();
    if (openFileDialog.ShowDialog() == true)
        textReader = File.ReadAllText(openFileDialog.FileName);
    stringParse = new StringReader(textReader);
    txtOutput.Text = "";
}

private void btnParse_Click(object sender, RoutedEventArgs e)
{
    var csv = new CsvReader(stringParse);
    while( csv.Read() )
    {               
        var stringField = csv.GetField<string>( "Location" );
        txtOutput.Text += DoSomething(stringField.ToString()) + "\n";
    }
}

The problem I face are:

  • Firstly, this is very, very slow. Halts my small WPF app.
  • Secondly, it doesn't return the values for the intended field. It dumps the whole CSV file in the output.

Where have I made a mistake?

Here's how the actual CSV file looks:

Location, Light, Proximity, Ax, Ay, Az, Gx, Gy, Gz
"SidePocket" 2.0 0.0 -1.1259307861328125 -10.622817993164063 0.8393707275390625 0.7456817626953125 -2.3446502685546875 -0.6551361083984375
"HandBag" 2.0 0.0 -1.1259307861328125 -10.622817993164063 0.8393707275390625 0.8383636474609375 -3.1872711181640625 -0.064971923828125
"SidePocket" 2.0 0.0 0.5566253662109375 -9.675201416015625 1.7905426025390625 0.8383636474609375 -3.1872711181640625 -0.064971923828125

Solution

  • I have no problem reading a csv file of 4 times the size of your 20,000 lines in under 2 seconds. I don't know what is happening in DoSomething so that might be your next call of investigation.

    I have made the following changes to address the two issues you describe.

    First: use a StringBuilder to prevent the creation and discarding of lots of strings in your loop.

    Second: Set the Delimiter in the Configuration object of the CsvHelper to a Space but do include the comma of your header field name in that case:

    var sw = new Stopwatch();
    sw.Start();
    using(var csv = new CsvReader(new StreamReader(@"csv-test.txt")))
    {
        csv.Configuration.Delimiter=" "; // space
    
        var sb = new StringBuilder();
    
        while (csv.Read())
        {
            var stringField = csv.GetField<string>("Location,"); // the comma is relevant
            // or use sb.AppendFormat("{0}\n", DoSomething(stringField));
            sb.AppendLine(stringField);
        }
        txtOutput.Text = sb.ToString();
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    

    For 102,000 lines above code runs on my box in 1,7 seconds.