Search code examples
c#excelopenxmlepplusworksheet

When enumerating a range in EPPlus, are cells always retrieved in the following order : sorted by row, then by column?


Let's say I have the following code :

using (ExcelPackage package = new ExcelPackage(...))
{
   foreach(var cell in package.Workbook.Worksheets[1].Cells["A1:E5"])
   {
       //do something with "cell"
   }
}

and the following Excel worksheet :

   A      B      C      D      E
1  foo    bar    .      .      .
2  .      .      .      .      .
3  .      .      hello  .      .      
4  .      .      world  .      .
5  .      .      .      .      .

Will the order of retrieved elements always be :

foo => bar => hello => world ?

From what I have tried so far, it seems this is always true. But that doesn't mean it is always the case.

It seems Excel serialise cells in that order as well in the .xlsx file, even if the cells have been edited in a different order.

The documentation in the ExcelWorksheet class doesn't say too much :

//
// Summary:
//     Provides access to a range of cells
public ExcelRange Cells { get; }

Solution

  • I answer my own question.

    TLTR : in current EPPlus version (v4.5.1), cells of a given range are always enumerated in the following order : sorted by row, then by column.


    I found out this by taking a look at EPPlus implementation.

    Cells of a given worksheet are stored in a RangeCollection object :

    class ExcelWorksheet
    {
        RangeCollection _cells;
    }
    

    RangeCollection contains a list of cells. This list is always sorted by RangeID. RangeID is a combination of row / column / worksheet indexes. This allows EPPlus to quickly find the index of a cell (for a given row and column) by performing a binary search.

    class RangeCollection
    {
        List<IRangeID> _cells;
    
        int IndexOf(ulong rangeID)
        {
            return Array.BinarySearch(...);
        }
    }
    
    class ExcelCell : IRangeID
    {
        ulong RangeID
        {
            get
            {
                return GetCellID(_worksheet.SheetID, Row, Column);
            }
        }
    
        ulong GetCellID(int SheetID, int row, int col)
        {
            return ((ulong)SheetID) + (((ulong)col) << 15) + (((ulong)row) << 29);
        }
    }
    

    When enumerating cells of a given range, EPPlus will use that sorted list to enumerate cells inside the range :

    class ExcelRange 
    {
        public bool MoveNext()
        {
            _index++;
            //...
            if (...) 
            {
               GetStartIndexEnum(_fromRow, _fromCol, _toRow, _toCol);
               //...
               GetNextIndexEnum(_fromRow, _fromCol, _toRow, _toCol);
            }
        }
    
        object IEnumerator.Current
        {
            get
            {
                return /*...*/ _worksheet._cells[_index] as ExcelCell /*...*/
            }
        }
    }
    

    GetStartIndexEnum() and GetNextIndexEnum() are used to quickly skip cells that are outside the range currently enumerated. Cells are inspected and enumerated in the same order as the RangeCollection itself, which is always sorted.