Search code examples
c#.netexceloffice-interop

Fastest way to format multiple rows in an excel in C#


I have a huge dataset that I want to write into the Excel and need to perform conditional formatting of rows based on a business logic. So, for the data insertion part, I am using a data array to populate the Excel and it works pretty fast. However, I see a severe performance degradation when it comes to formatting the rows. It almost takes more than double the time just to do the formatting.

As of now, I am applying formatting to individual rows and loop through a series of rows. However, I am wondering if I can select multiple rows at a time and apply bulk formatting options to those rows: Here is what I have right now:

foreach (int row in rowsToBeFormatted)
{
    Excel.Range range = (Excel.Range)xlsWorksheet.Range[xlsWorksheet.Cells[row + introFormat, 1], xlsWorksheet.Cells[row + introFormat, 27]];
    range.Font.Size = 11;
    range.Interior.ColorIndex = 15;
    range.Font.Bold = true;
}

And here is a demo of how I am trying to select multiple rows to the range and apply the formatting:

string excelrange = "A3:AA3,A83:AA83,A88:AA88,A94:AA94,A102:AA102,A106:AA106,A110:AA110,...." (string with more than 3000 characters)
xlsWorksheet.get_Range(excelrange).Interior.Color = Color.SteelBlue;

However, I get the following error when I execute the code:

Exception from HRESULT: 0x800A03EC

and there is nothing in inner exception. Any ideas how can I achieve the desired result?


Solution

  • As per comments under the question, there's hard-coded limit of 255 characters for a range string, however I wasn't able to find any documentation about it. Another commenter suggested to use semicolon as separator, but the documentation clearly states that comma should be used as union operator in range string:

    The name of the range in A1-style notation in the language of the application. It can include the range operator (a colon), the intersection operator (a space), or the union operator (a comma). It can also include dollar signs, but they are ignored. You can use a local defined name in any part of the range. If you use a name, the name is assumed to be in the language of the application.

    So where do we go from here? Formatting each range individually is indeed inefficient. Application interface provides method Union, but calling it in a loop is as inefficient as individual formatting. So the natural choice is to use the range string limit to the maximum and thus minimizing number of calls to COM interface.

    You can split the full range to format into chunks; each not exceeding 255 characters limit. I would implement it using enumerators:

    static IEnumerable<string> GetChunks(IEnumerable<string> ranges)
    {
        const int MaxChunkLength = 255;
        var sb = new StringBuilder(MaxChunkLength);
        foreach (var range in ranges)
        {
            if (sb.Length > 0)
            {
                if (sb.Length + range.Length + 1 > MaxChunkLength)
                {
                    yield return sb.ToString();
                    sb.Clear();
                }
                else
                {
                    sb.Append(",");
                }
            }
            sb.Append(range);
        }
        if (sb.Length > 0)
        {
            yield return sb.ToString();
        }
    }
    
    var rowsToFormat = new[] { 3, 83, 88, 94, 102, 106, 110/*, ...*/ }
    var rowRanges = rowsToFormat.Select(row => "A" + row + ":" + "AA" + row);
    
    foreach (var chunk in GetChunks(rowRanges))
    {
        var range = xlsWorksheet.Range[chunk];
        // do formatting stuff here
    }
    

    The above is 10-15 times faster than individual formatting:

    foreach (var rangeStr in rowRanges)
    {
        var range = xlsWorksheet.Range[rangeStr];
        // do formatting stuff here
    }
    

    I can also see further space for optimization like grouping contiguous rows, but in case you are formatting discrete rows with subtotals, it won't help.