
ExcelMapper - Skip Row based on Condition


I am reading an Excel file containing financial data and mapping it to an object using ExcelMapper.

    public static async IAsyncEnumerable<TReceipt> ReadBankEntry(string file)
    {
        using (FileStream stream = new FileStream(file, FileMode.Open, FileAccess.ReadWrite))
        {
            ExcelMapper mapper = new ExcelMapper()
            {
                HeaderRowNumber = 0, // zero is the default
                MinRowNumber = 2     // actual data starts from here (skipping row=1)
                // MaxRowNumber = ?  // this is dynamic and can change in every excel file.
            };
            // my mapping table here
            foreach (TReceipt bankEntry in await mapper.FetchAsync<TReceipt>(stream, "Sheet1"))
            {
                yield return bankEntry;
            }
        }
    }

The last 3 rows in the file contain information that I do not need. They also hold string values in columns that should contain only decimal types, which throws an exception during mapping, and some columns are empty from that point on.

Example -

+----+-----------------+--------------+
| SL |      Date       |      P1      |
+----+-----------------+--------------+
|    | Opening Balance | USD 10254.66 |
|  1 | 01-07-2020      | 445.25       |
|  2 | 01-07-2020      | 234.80       |
|  3 | 02-07-2020      | 13.00        |
|    | Total           | USD 10947.71 |
+----+-----------------+--------------+

I would like to stop reading data once this line is reached. How do I do that using ExcelMapper?

Edit: Any other library with similar functionality will do. Note that the files will have an .xls extension (the old binary format).


Solution

  • After I raised the issue with Michael Ganss, the creator of ExcelMapper, he replied (link):

    If the mapping doesn't generate any errors I would probably handle this with a classic .Where() (or .TakeWhile()). Otherwise you can try and use the custom mapping methods to avoid any errors and then filter.

    If you do know the range of rows in advance you can use the data row range properties.

    I was hoping there was a built-in way to handle rows with incorrect types, to avoid the performance cost (over 100,000 rows) of filtering after all data points have been captured, but this is still the fastest library with readable code compared to the others I tried.
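The suggested approach can be sketched as follows. This is a minimal, self-contained example: the `Receipt` class, `FetchedRows` stand-in, and `ReadEntries` helper are hypothetical names, and `FetchedRows` merely simulates the sequence a call like `mapper.Fetch<Receipt>(stream, "Sheet1")` would yield. The idea is to map the error-prone columns (`SL`, `P1`) as strings so the trailer rows never fail numeric conversion, then use `TakeWhile` to stop at the first row whose serial number is not an integer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Receipt
{
    // SL and P1 are strings so trailer rows like "Total" map without throwing
    public string SL { get; set; }
    public string Date { get; set; }
    public string P1 { get; set; }
}

public static class Program
{
    // Hypothetical stand-in for the rows ExcelMapper would fetch,
    // including the "Total" trailer row from the example sheet.
    public static IEnumerable<Receipt> FetchedRows() => new[]
    {
        new Receipt { SL = "1", Date = "01-07-2020", P1 = "445.25" },
        new Receipt { SL = "2", Date = "01-07-2020", P1 = "234.80" },
        new Receipt { SL = "3", Date = "02-07-2020", P1 = "13.00" },
        new Receipt { SL = "",  Date = "Total",      P1 = "USD 10947.71" },
    };

    public static List<(int Sl, decimal Amount)> ReadEntries() =>
        FetchedRows()
            // Stop at the first row whose SL is not an integer, so the
            // trailer values are never converted to decimal.
            .TakeWhile(r => int.TryParse(r.SL, out _))
            .Select(r => (Sl: int.Parse(r.SL),
                          Amount: decimal.Parse(r.P1, CultureInfo.InvariantCulture)))
            .ToList();

    public static void Main()
    {
        var entries = ReadEntries();
        Console.WriteLine(entries.Count);              // 3
        Console.WriteLine(entries.Sum(e => e.Amount)); // 693.05
    }
}
```

With ExcelMapper itself, the same `TakeWhile` would be applied directly to the sequence returned by `Fetch`/`FetchAsync`; if the exact row range happens to be known in advance, setting `MaxRowNumber` would avoid even enumerating the trailer.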