Search code examples
c#.netexceloledbepplus

How to match date to row then get the final column value using EPPlus?


So far I can do an easy data grab from the spreadsheet, just getting the ref number row but I currently don't know how to match the row to the data section before getting the correct data out.

I currently have to extract some data from the excel spreadsheet example below:

Start date  Ref number
29/07/2015  2342326
01/07/2016  5697455
02/08/2016  3453787
02/08/2016  5345355
02/08/2015  8364456
03/08/2016  1479789
04/07/2015  9334578

The main question is would it be possible to read in the data from a set date, from the row and get the ref number form the set date e.g. Start date.

For example if i just wanted the data from the date set to the 1st of last month and upwards.

How would this be best implemented.

Example of current code used to get the column, using basic OleDb:

using System;
using System.Data.OleDb;
using System.Text.RegularExpressions;

namespace Number_Cleaner
{
    public class NumberCleanerReport
    {
        public void runExcel_Report()
        {
            Console.ForegroundColor = ConsoleColor.Green;
            Console.WriteLine("[*][START OF: NumberExt.xls, Number Extraction]");
            Console.ResetColor();
            string con =
            @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=NumberExt.xls;" +
            @"Extended Properties='Excel 8.0;HDR=Yes;'";

        string connectionString = ExcelWriter.GetConnectionString();

        using (OleDbConnection conn = new OleDbConnection(connectionString))
        {
            conn.Open();
            OleDbCommand cmd = new OleDbCommand();
            cmd.Connection = conn;

            using (OleDbConnection connection = new OleDbConnection(con))
            {
                connection.Open();
                OleDbCommand command = new OleDbCommand("select * from [Sheet1$]", connection);
                System.IO.StreamWriter files = new System.IO.StreamWriter(Controller.fpath + "NumberExtOutput.txt");
                using (OleDbDataReader dr = command.ExecuteReader())
                {
                    while (dr.Read())
                    {
                        var row1Col0 = dr[0];
                        string ExcelData = row1Col0.ToString();
                        string subStr = "null";

                        try
                        {
                            subStr = ExcelData.Substring(0, 6);
                        }
                        catch
                        {
                            //Console.WriteLine("Found Nulls.");
                        }

                        if (subStr == "00")
                        {
                            string result = Regex.Replace(ExcelData, "^00", "0");
                            Console.WriteLine(result);
                            files.WriteLine(result);
                            cmd.CommandText = "INSERT INTO [table1]('MainNmbers') VALUES(" + result + ");";
                            cmd.ExecuteNonQuery();
                        }
                    }
                    files.Close();
                    conn.Close();
                    Console.ForegroundColor = ConsoleColor.Green;
                    Console.WriteLine("[*][END OF: NumberExt.xls, RefNumber Extraction]");
                    Console.ResetColor();
                }
            }
        }
    }
}
}

Solution

  • You could load it into a DataTable and then filter it with Linq-To-DataTable. Here's a method that reads all as strings. You could modify it so that it parse the first column to DateTime and the second to int with DateTime.Parse or DateTime.ParseExact and int.Parse:

    public static DataTable GetDataTableFromExcel(string path, bool hasHeader = true)
    {
        using (var pck = new OfficeOpenXml.ExcelPackage())
        {
            using (var stream = File.OpenRead(path))
            {
                pck.Load(stream);
            }
            var ws = pck.Workbook.Worksheets.First();  
            DataTable tbl = new DataTable();
            foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
            {
                tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
            }
            var startRow = hasHeader ? 2 : 1;
            for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
            {
                var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
                DataRow row = tbl.Rows.Add();
                foreach (var cell in wsRow)
                {
                    row[cell.Start.Column - 1] = cell.Text;
                }
            }
            return tbl;
        }
    }
    

    Once you have the correct types in the table the query is simple:

    var rowsOfInterest = table.AsEnumerable()
        .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
        .ToList();
    

    If you need it as a DataTable:

    DataTable resultTable = table.Clone();  // empty table with correct columns
    if(rowsOfInterest.Count > 0)
        resultTable = rowsOfInterest.CopyToDataTable();