Search code examples
c#filecsvheadertableheader

CSV File Splitting with specific size


Hi guys I've a function which will create multiple CSV files from a DataTable in smaller chunks based on size passed through app.config key/value pair.

Issues with the code below:

  1. I've hardcoded the file size to 1 kb, when I'll pass a value of 20, it should created csv file of 20kb. Currently it's creating a file size of 5kb for the same value.
  2. For the last left records it's not creating any file.

Kindly help me to fix this. Thanks!

code :

public static void CreateCSVFile(DataTable dt, string CSVFileName)
    {

        int size = Int32.Parse(ConfigurationManager.AppSettings["FileSize"]);
        size *= 1024; //1 KB size
        string CSVPath = ConfigurationManager.AppSettings["CSVPath"];

        StringBuilder FirstLine = new StringBuilder();
        StringBuilder records = new StringBuilder();

        int num = 0;
        int length = 0;

        IEnumerable<string> columnNames = dt.Columns.Cast<DataColumn>().Select(column => column.ColumnName);
        FirstLine.AppendLine(string.Join(",", columnNames));
        records.AppendLine(FirstLine.ToString());

        length += records.ToString().Length;

        foreach (DataRow row in dt.Rows)
        {
            //Putting field values in double quotes
            IEnumerable<string> fields = row.ItemArray.Select(field =>
                string.Concat("\"", field.ToString().Replace("\"", "\"\""), "\""));

            records.AppendLine(string.Join(",", fields));
            length += records.ToString().Length;

            if (length > size)
            {
                //Create a new file
                num++;
                File.WriteAllText(CSVPath + CSVFileName + DateTime.Now.ToString("yyyyMMddHHmmss") + num.ToString("_000") + ".csv", records.ToString());
                records.Clear();
                length = 0;
                records.AppendLine(FirstLine.ToString());
            }

        }            
    }  

Solution

  • Use File.ReadLines, Linq means deferred execution will be performed.

    foreach(var line in File.ReadLines(FilePath))
    {
       // logic here.
    }
    

    From MSDN

    The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.

    Now so, you could rewrite your method as below.

        public static void SplitCSV(string FilePath, string FileName)
        {
            //Read Specified file size
            int size = Int32.Parse(ConfigurationManager.AppSettings["FileSize"]);
    
            size *= 1024 * 1024;  //1 MB size
    
            int total = 0;
            int num = 0;
            string FirstLine = null;   // header to new file                  
            var writer = new StreamWriter(GetFileName(FileName, num));
    
            // Loop through all source lines
            foreach (var line in File.ReadLines(FilePath))
            {
                if (string.IsNullOrEmpty(FirstLine)) FirstLine = line;
                // Length of current line
                int length = line.Length;
    
                // See if adding this line would exceed the size threshold
                if (total + length >= size)
                {
                    // Create a new file
                    num++;
                    total = 0;
                    writer.Dispose();
                    writer = new StreamWriter(GetFileName(FileName, num));
                    writer.WriteLine(FirstLine);
                    length += FirstLine.Length;
                }
    
                // Write the line to the current file                
                writer.WriteLine(line);
    
                // Add length of line in bytes to running size
                total += length;
    
                // Add size of newlines
                total += Environment.NewLine.Length;
            }
       }