Search code examples
c#exceloledb

Reading multiple excel sheets into a single tab delimited text file


I have multiple xls and xlsx files I want to convert to tab delimited. The code below reads in the first sheet an amount of times equal to the number sheets. i.e. for example if file has three worksheets the first worksheet will be printed to file 3 times vs each worksheet printed once.

any ideas?

                    Microsoft.Office.Interop.Excel.Workbook book = appTwo.Workbooks.Open(strfileName);
                    int numSheet = book.Worksheets.Count;
                    List<Microsoft.Office.Interop.Excel.Worksheet> sheets = new List<Microsoft.Office.Interop.Excel.Worksheet>();
                    foreach (Microsoft.Office.Interop.Excel.Worksheet sheet in book.Worksheets)
                    {
                        int numberOfSheets = 0;
                        System.Data.OleDb.OleDbCommand objCommand = new System.Data.OleDb.OleDbCommand("Select * From [" + dR["TABLE_NAME"].ToString() + "]", Connection1);
                        System.Data.OleDb.OleDbDataAdapter objAdapter = new System.Data.OleDb.OleDbDataAdapter();
                        objAdapter.SelectCommand = objCommand;
                        DataSet ds = new DataSet();
                        objAdapter.Fill(ds);
                        DataRow dataRow = ds.Tables[0].Rows[0];
                        if (dR["TABLE_NAME"].ToString().StartsWith("Sheet"))
                        {
                            numberOfSheets++;
                        }

                        for (int i = 0; i <= ds.Tables[0].Rows.Count - 1; i++)
                        {
                            for (int j = 0; j < ds.Tables[0].Columns.Count; j++)
                            {
                                if (!string.IsNullOrEmpty(ds.Tables[0].Rows[i].ItemArray[j].ToString()))
                                {
                                    swOutput.Write(ds.Tables[0].Rows[i].ItemArray[j].ToString());

                                    if (j != ds.Tables[0].Columns.Count - 1)
                                    {
                                        swOutput.Write('\t');
                                    }

                                }```

Solution

  • For you code i think the problem might be here swOutput.Write(ds.Tables[0].Rows[i].ItemArray[j].ToString()); given that swOutput is defined outside your for each statement, the file name is not changing hence it's writing the worksheets to the same file.

    Personally wouldn't want this to be depended on Microsoft.Office.Interop.Excel as this only works if you have excel installed on the machine running the code. Nugets are your friends, for excel I use EPPLUS and for CSV i use CsvHelper

    Unfortunately Epplus doesn't seem to support .XLS files however there are a variety of tools online to convert XLS to XLSX programmatically,or you can use Microsoft.Office.Interop.Excel.Workbook just for the conversion if it comes to that .So i will leave this to you.

    Assumptions

    • All Files are XLSX
    • The data in the worksheets is the form of a table where the first row is the headers

    So wrote some methods and how you use them,

    How to use

        ExcelPackage.LicenseContext = OfficeOpenXml.LicenseContext.NonCommercial;
        
        var FolderWithFiles = @"D:\Temp\RubbishFolder";
    
        foreach (FileInfo file in Directory.EnumerateFiles(FolderWithFiles,"*.xlsx",SearchOption.TopDirectoryOnly).Select(t=>new FileInfo(t)))
        {           
            ExcelDocToCsv(file);
        }
    
    

    Methods

    private void ExcelDocToCsv(FileInfo filepath){
        
        using (ExcelPackage doc = new ExcelPackage(filepath))
        {
            int NumberOfWorksheets =doc.Workbook.Worksheets.Count();
            var WorkSheetNames= doc.Workbook.Worksheets.Select(w =>w.Name).ToArray();
            foreach (string worksheet in WorkSheetNames)
            {
                var CsvFileName =   Path.Combine(filepath.Directory.FullName,"[File_"+Path.GetFileNameWithoutExtension(filepath.FullName)+"] [WorkSheet_"+worksheet+"].csv");
                
                WriteToCSV(CreateTableFromWorksheet(doc.Workbook.Worksheets[worksheet]),CsvFileName);
            }
    
        }
    }
    
    private DataTable CreateTableFromWorksheet(ExcelWorksheet worksheet)
    {
        DataTable dt = new DataTable();
    
        ExcelCellAddress StartCell = worksheet.Dimension.Start;
        ExcelCellAddress EndCell = worksheet.Dimension.End;
    
        for (int i = StartCell.Column; i <= EndCell.Column; i++)
        {
            dt.Columns.Add(worksheet.Cells[1,i].Value.ToString());
        }
    
        for (int i = StartCell.Row+1; i <= EndCell.Row; i++)
        {
            DataRow row = dt.NewRow();
            int r = 0;  
            
            for (int b = StartCell.Column; b <= EndCell.Column; b++)
            {
                row[r++] = worksheet.Cells[i, b].Value;
            }
            
            dt.Rows.Add(row);
        }
    
        return dt;
        
    }
    
    private void WriteToCSV(DataTable dt,string WriteTo){
    
    // https://github.com/JoshClose/CsvHelper/issues/1399
    
        using (var writer = new StreamWriter(WriteTo))
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            csv.Configuration.Delimiter ="\t";
            foreach (DataColumn dc in dt.Columns)
            {
                csv.WriteField(dc.ColumnName);
            }
            csv.NextRecord();
    
            foreach (DataRow dr in dt.Rows)
            {
                foreach (DataColumn dc in dt.Columns)
                {
                    csv.WriteField(dr[dc]);
                }
                csv.NextRecord();
            }
        }
    
    }
    

    Will leave any error checking and handling to you.

    This will write each worksheet in a document to a csv