c#, csv, csvhelper

How to create a new CSV file and copy some of the columns from a huge old CSV file?


I'm a beginner in C#.

I have a huge CSV file with thousands of rows and columns. Using a filter, I have created a list called requiredheaders that contains only the header names I need for further processing.

Based on this header list, I want to create a new CSV file and copy into it only those columns (header names and data) whose headers appear in the requiredheaders list.

var requiredheaders = new [] { "Column B", "Column D" };

For example, the old CSV:

Column A  Column B  Column C  Column D
DEF       ABC       BBB       ABC
ABC       DEF       ABC       BBB
BBB       ABC       DEF       ABC
ABC       BBB       BBB       DEF
BBB       ABC       ABC       BBB
DEF       DEF       BBB       DEF

The new CSV should be:

Column B  Column D
ABC       ABC
DEF       BBB
ABC       ABC
BBB       DEF
ABC       BBB
DEF       DEF

I used the CsvHelper package and tried the following solution, but my requiredheaders list is dynamic and changes based on some filters.

using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

class Program
{
    static void Main()
    {
        // Specify the paths to the old and new CSV files
        string oldCsvFilePath = "old_data.csv";
        string newCsvFilePath = "new_data.csv";

        // Create a StreamWriter for the new CSV file
        using (var newFileWriter = new StreamWriter(newCsvFilePath))
        using (var csv = new CsvWriter(newFileWriter, new CsvConfiguration(CultureInfo.InvariantCulture)))
        {
            // Create a CsvReader for the old CSV file
            using (var oldFileReader = new StreamReader(oldCsvFilePath))
            using (var csvReader = new CsvReader(oldFileReader, new CsvConfiguration(CultureInfo.InvariantCulture)))
            {
                // Read the header row so that columns can be mapped by name
                csvReader.Read();
                csvReader.ReadHeader();

                // Write the header row of the new file
                csv.WriteHeader<NewClass>();
                csv.NextRecord();

                // Read the records from the old file and write selected data to the new file
                while (csvReader.Read())
                {
                    var record = csvReader.GetRecord<NewClass>();

                    // Placeholder filter; replace with your own condition
                    if (!string.IsNullOrEmpty(record.Name))
                    {
                        csv.WriteRecord(record);
                        csv.NextRecord(); // End the current row
                    }
                }
            }
        }
    }
}


public class NewClass
{
    // Define properties that match the columns in your CSV file
    public int Id { get; set; }
    public string Name { get; set; }
    // Add more properties as needed
}


Solution

  • I maintain some packages that should make this very easy: Sylvan.Data.Csv and Sylvan.Data. The Sylvan.Data.Csv package provides the CSV reader and writer, while the Sylvan.Data package provides the Select method, a LINQ-like projection over DbDataReader, which CsvDataReader derives from.

    Here is a complete C# program demonstrating how these libraries can be used to accomplish your goal. There is no need to define a strongly typed class to represent your data, and requiredheaders can be generated dynamically at runtime if needed.

    using Sylvan.Data;
    using Sylvan.Data.Csv;
    
    string oldCsvFilePath = "old_data.csv";
    string newCsvFilePath = "new_data.csv";
    var requiredheaders = new string[] { "Column B", "Column D" };
    
    using var reader = CsvDataReader.Create(oldCsvFilePath);
    using var writer = CsvDataWriter.Create(newCsvFilePath);
    
    writer.Write(reader.Select(requiredheaders));
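
For readers who would rather stay with CsvHelper, as used in the question, the same column selection can probably be done without a mapped class by reading and writing individual fields by header name. The following is only a minimal sketch under that assumption: `CsvColumnCopier` and `CopyColumns` are hypothetical names introduced for illustration, not part of CsvHelper, and the sketch assumes every name in requiredheaders exists in the source header row (with the default configuration, a missing header is expected to raise an exception).

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;

// Hypothetical helper for illustration; not part of CsvHelper.
public static class CsvColumnCopier
{
    // Copies only the columns named in requiredheaders from input to output,
    // streaming one row at a time so a large file is never fully loaded.
    public static void CopyColumns(
        TextReader input, TextWriter output, IReadOnlyList<string> requiredheaders)
    {
        // leaveOpen: true leaves ownership of the reader/writer with the caller.
        using var csvReader = new CsvReader(input, CultureInfo.InvariantCulture, leaveOpen: true);
        using var csvWriter = new CsvWriter(output, CultureInfo.InvariantCulture, leaveOpen: true);

        // Read the header row so fields can be looked up by name.
        csvReader.Read();
        csvReader.ReadHeader();

        // Write the header row of the new file.
        foreach (var header in requiredheaders)
        {
            csvWriter.WriteField(header);
        }
        csvWriter.NextRecord();

        // Copy the selected fields of each data row.
        while (csvReader.Read())
        {
            foreach (var header in requiredheaders)
            {
                csvWriter.WriteField(csvReader.GetField(header));
            }
            csvWriter.NextRecord();
        }
    }
}
```

A caller would open a StreamReader on old_data.csv and a StreamWriter on new_data.csv, each in a using statement, and pass them to CopyColumns together with the requiredheaders list built at runtime. Because the column list is an ordinary collection, it can change with every filter without touching any class definition.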