Search code examples
c#gembox-spreadsheet

Unable to load .xls file using Gembox in C#


I wanted to load .xls file(Type: 97-2003 spreadsheet) using C#. I am using Gembox library.

When I used below command, I encountered "file contains corrupted data." as error.

ExcelFile ef = ExcelFile.Load(filepath, XlsxLoadOptions.XlsxDefault);

When I removed XlsxLoadOptions parameter than I am getting "Reading error: file is not a valid OLE2 Compound File."

I am new to C# and unable to debug the root-cause of the issue. Please help!


Solution

  • UPDATE (2020-03-28)

    In the newer versions of GemBox.Spreadsheet, the ExcelFile.Load(String) will check the file's signature in case of a ".xls" file.
    In other words, there is no more need for that GetLoadOptions method from below.

    Also, there is a new overload method, ExcelFile.Load(Stream).
    This one will always check the file's signature in the provided stream.

    ORIGINAL

    This question was answered in the comments. Unfortunately, it is not well visible there, so here is the answer and also some additional details about it.

    GemBox.Spreadsheet provides few Load overload methods. When using the following:

    ExcelFile ef = ExcelFile.Load("C://temp//book.xls");
    

    It will result in following:

    ExcelFile ef = ExcelFile.Load("C://temp//book.xls", LoadOptions.XlsDefault);
    

    Which is the same as the following:

    ExcelFile ef = ExcelFile.Load("C://temp//book.xls", new XlsLoadOptions());
    

    The load options specify how the input file will be read and when using the ExcelFile.Load(String) method, the options will be based on the file's extension.

    In this case, the file has ".xls" extension, however, it was not of a binary XLS format (BIFF8), but rather it was of an HTML format. This is a trick that is somewhat commonly used, you can have HTML, CSV, even XLSX files with a ".xls" extension and MS Excel will be able to open it. It will detect the right file's format and it will prompt the user with something like the following message:

    File format and extension of 'book.xls' don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it. Do you want to open it anyway?

    Note that this trick only works with ".xls" extension, it does not work with for example ".xlsx". Nevertheless, we could use something like the following to detect the right file format:

    private static LoadOptions GetLoadOptions(string path)
    {
        string extension = Path.GetExtension(path).ToUpperInvariant();
        switch (extension)
        {
            case ".XLSX":
            case ".XLSM":
            case ".XLTX":
            case ".XLTM":
                return LoadOptions.XlsxDefault;
            case ".XLS":
            case ".XLT":
                return GetLoadOptions(path, null);
            case ".ODS":
            case ".OTS":
                return LoadOptions.OdsDefault;
            case ".TAB":
            case ".TSV":
                return new CsvLoadOptions(CsvType.TabDelimited);
            case ".CSV":
                return LoadOptions.CsvDefault;
            default:
                return null;
        }
    }
    
    private static LoadOptions GetLoadOptions(string xlsPath, LoadOptions defaultOptions)
    {
        byte[] signature = new byte[8];
        using (var stream = File.OpenRead(xlsPath))
            stream.Read(signature, 0, 8);
    
        byte[] xlsSignature = new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
        if (signature.SequenceEqual(xlsSignature))
            return LoadOptions.XlsDefault;
    
        byte[] xlsxSignature = new byte[] { 0x50, 0x4B, 0x03, 0x04 };
        if (signature.Take(4).SequenceEqual(xlsxSignature))
            return LoadOptions.XlsxDefault;
    
        string firstLine = File.ReadLines(xlsPath)
            .First(line => !string.IsNullOrWhiteSpace(line)).TrimStart().ToUpperInvariant();
        if (firstLine.StartsWith("<!DOCTYPE") ||
            firstLine.StartsWith("<HTML") ||
            firstLine.StartsWith("<BODY"))
            return LoadOptions.HtmlDefault;
    
        return defaultOptions;
    }
    

    Also here is a small demonstration example on how to use it:

    string filepath = "C://temp//book.xls";
    LoadOptions options = GetLoadOptions(filepath);
    
    if (options == null)
        throw new FileFormatException();
    
    ExcelFile ef = ExcelFile.Load(filepath, options);
    // ...