Modify CSV file headers/column names using Cinchoo ETL


I have a .NET Core application in which I want to change the column names of a CSV file. I'm using the Cinchoo ETL library. I have tried the following:

string csv = "../../../../data.csv";
using (var w = new ChoCSVWriter(csv).WithFirstLineHeader().Setup(s => s.FileHeaderWrite += (o, e) =>
{
    e.HeaderText = "Test,Test2";
}))
{
    w.Write(csv);
}

This is what my data.csv file looks like:

ID,Name
1, David
2, Bob

This is what my csv looks like after running my code:

Test,Test2
../../../../data.csv

The CSV header names have changed, but all of my data was deleted and the path to the file was written in its place. Any ideas why that happens?


Solution

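  • The code in the question never reads data.csv: it only opens a writer on that file and passes the path string to Write, which is why the output contains the new header followed by the path itself. To rename the columns you need a reader as well as a writer. There are a couple of ways to rename the columns and produce the CSV output.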

    Option 1: project each record into a new object whose property names are the new column names.

    StringBuilder csvIn = new StringBuilder(@"ID,Name
    1, David
    2, Bob");
    
    StringBuilder csvOut = new StringBuilder();
    
    using (var r = new ChoCSVReader(csvIn)
        .WithFirstLineHeader()
        )
    {
        using (var w = new ChoCSVWriter(csvOut)
            .WithFirstLineHeader()
            )
            w.Write(r.Select(r1 => new { Test1 = r1.ID, Test2 = r1.Name }));
    }
    
    Console.WriteLine(csvOut.ToString());
    

    Option 2: write the records unchanged and override the header text in the FileHeaderWrite event.

    StringBuilder csvIn = new StringBuilder(@"ID,Name
    1, David
    2, Bob");
    
    StringBuilder csvOut = new StringBuilder();
    
    using (var r = new ChoCSVReader(csvIn)
        .WithFirstLineHeader()
        )
    {
        using (var w = new ChoCSVWriter(csvOut)
            .WithFirstLineHeader()
            .Setup(s => s.FileHeaderWrite += (o, e) =>
            {
                e.HeaderText = "Test,Test2";
            })
            )
            w.Write(r);
    }
    
    Console.WriteLine(csvOut.ToString());
    

    UPDATE:

    Using CSV files instead of text input

    string csvInFilePath = @"C:\CSVIn.csv";
    string csvOutFilePath = @"C:\CSVOut.csv";
    
    using (var r = new ChoCSVReader(csvInFilePath)
        .WithFirstLineHeader()
        )
    {
        using (var w = new ChoCSVWriter(csvOutFilePath)
            .WithFirstLineHeader()
            )
            w.Write(r.Select(r1 => new { Test1 = r1.ID, Test2 = r1.Name }));
    }
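
The file-based example above writes to a different file than it reads. If the goal is to change the header of the same file, as in the question, one approach is to write to a temporary file and then swap it into place. The following is a minimal sketch using only System.IO and LINQ, not Cinchoo ETL; the class name, the method name ReplaceHeader and the ".tmp" suffix are assumptions made for illustration, and it treats the first line as the header without handling quoted multi-line fields.

```csharp
using System;
using System.IO;
using System.Linq;

public static class CsvHeaderRename
{
    // Replaces the first line of the file at 'path' with 'newHeader'
    // and keeps every following line unchanged.
    public static void ReplaceHeader(string path, string newHeader)
    {
        string tempPath = path + ".tmp"; // hypothetical temp-file name

        // File.ReadLines streams the input lazily, so the output must go
        // to a separate file while the input is still being read.
        File.WriteAllLines(
            tempPath,
            new[] { newHeader }.Concat(File.ReadLines(path).Skip(1)));

        // Swap the rewritten file into place once it is fully written.
        File.Delete(path);
        File.Move(tempPath, path);
    }

    public static void Main()
    {
        // Sample data matching the question, written to the temp directory.
        string path = Path.Combine(Path.GetTempPath(), "data.csv");
        File.WriteAllText(path, "ID,Name\n1, David\n2, Bob\n");

        ReplaceHeader(path, "Test,Test2");

        Console.WriteLine(File.ReadAllText(path));
    }
}
```

Writing to a temporary file first avoids the problem seen in the question, where opening the original file for writing discards its contents before anything has been read.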
    

    UPDATE:

    To get the header names, cast each record to IDictionary<string, object> and read its Keys property:

    string csvInFilePath = @"C:\CSVIn.csv";
    string csvOutFilePath = @"C:\CSVOut.csv";
    
    using (var r = new ChoCSVReader(csvInFilePath)
        .WithFirstLineHeader()
        )
    {
        foreach (IDictionary<string, object> rec in r)
        {
            var keys = rec.Keys.ToArray();
        }
    }
    

    To auto-discover the data types of the CSV columns, set MaxScanRows on the reader (WithMaxScanRows in the code below). Otherwise, all columns are treated as strings.

    StringBuilder csvIn = new StringBuilder(@"ID,Name,Date
    1, David, 1/1/2018
    2, Bob, 2/12/2019");
    
    using (var r = new ChoCSVReader(csvIn)
        .WithFirstLineHeader()
        .WithMaxScanRows(2)
        )
    {
        foreach (IDictionary<string, object> rec in r.Take(1))
        {
            foreach (var kvp in rec)
                Console.WriteLine($"{kvp.Key} - {r.Configuration[kvp.Key].FieldType}");
        }
    }
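
To give a feel for what scanning rows to discover a column type involves, here is a small conceptual sketch in plain C#. It is not Cinchoo ETL's actual algorithm: the class name, the InferType method and the choice of candidate types (int, DateTime, string) are assumptions made for illustration. It tries each candidate type against every sampled value of a column and falls back to string.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ColumnTypeSketch
{
    // Returns int or DateTime if every sampled value parses as that type,
    // otherwise string. An empty sample is treated as string.
    public static Type InferType(string[] samples)
    {
        var culture = CultureInfo.InvariantCulture;

        if (samples.Length == 0)
            return typeof(string);

        if (samples.All(s => int.TryParse(s.Trim(), NumberStyles.Integer, culture, out _)))
            return typeof(int);

        if (samples.All(s => DateTime.TryParse(s.Trim(), culture, DateTimeStyles.None, out _)))
            return typeof(DateTime);

        return typeof(string);
    }

    public static void Main()
    {
        // Column values taken from the sample CSV above.
        Console.WriteLine(InferType(new[] { "1", "2" }));                  // System.Int32
        Console.WriteLine(InferType(new[] { " 1/1/2018", " 2/12/2019" })); // System.DateTime
        Console.WriteLine(InferType(new[] { " David", " Bob" }));          // System.String
    }
}
```

The more rows are sampled, the less likely a column is misclassified by a few early values, which is the trade-off a scan-row limit controls.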
    

    Hope it helps.