c# excel apache-poi xlsx npoi

Corruption of .xlsx files with NPOI - Excel cannot open the file 'file.xlsx' because the file format or file extension is not valid


After reading or modifying some user-created .xlsx files with NPOI, I get the following error message when opening them in Excel:

We found a problem with some content in 'test.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

Clicking Yes gets me another message:

Excel cannot open the file 'test.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.

An example of a problem .xlsx file (before being run through NPOI) is here.

The same file, now corrupted after being read and written back with iWorkbook.Write(filestream);, is here.

I have no issues creating a new .xlsx file with the following code:

string newPath = @"C:\MyPath\test.xlsx";

using (FileStream fs = new FileStream(newPath, FileMode.Create, FileAccess.Write))
{
    IWorkbook wb = new XSSFWorkbook();
    wb.CreateSheet();
    ISheet s = wb.GetSheetAt(0);
    IRow r = s.CreateRow(0);
    r.CreateCell(0);
    ICell c = r.GetCell(0);
    c.SetCellValue("test");
    wb.Write(fs);
    fs.Close();
}

That works fine.

Even opening one of the problem child .xlsx files, setting it to an IWorkbook and writing it back to the file works:

string newPath = @"C:\MyPath\test.xlsx";

using (FileStream fs = new FileStream(newPath, FileMode.Open, FileAccess.ReadWrite))
{
    IWorkbook wb = new XSSFWorkbook(fs);
    wb.Write(fs);
    fs.Close();
}

However, once the file goes through code that reads from it (getting ISheets, IRows, ICells, etc.), the .xlsx file ends up corrupted, even though I specifically removed anything that modifies the workbook: no Creates, Sets, Styles, etc. with NPOI.

I can't really include my code because it would just be confusing, but for the sake of completeness I'm really only using the following types and functions from NPOI during this test:

IWorkbook
XSSFWorkbook
ISheet
IRow
ICell
.GetSheetAt
.GetRow
.GetCell
.LastRowNum

So one of those appears to cause the corruption. I would like to eventually set values again and get it working like I have for .xls.

Has anyone experienced this? What are some NPOI functions that could cause corruption? Any input would be appreciated.

Edit: Using NPOI v2.2.1.


Solution

  • I think the problem is that you are reading from, and writing to, the same FileStream. You should be doing the read and write using separate streams. Try it like this:

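    using System.IO;
    using NPOI.SS.UserModel;    // IWorkbook, ISheet, IRow, ICell
    using NPOI.XSSF.UserModel;  // XSSFWorkbook

    string newPath = @"C:\MyPath\test.xlsx";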
    
    // read the workbook
    IWorkbook wb;
    using (FileStream fs = new FileStream(newPath, FileMode.Open, FileAccess.Read))
    {
        wb = new XSSFWorkbook(fs);
    }
    
    // make changes
    ISheet s = wb.GetSheetAt(0);
    IRow r = s.GetRow(0) ?? s.CreateRow(0);
    ICell c = r.GetCell(1) ?? r.CreateCell(1);
    c.SetCellValue("test2");
    
    // overwrite the workbook using a new stream
    using (FileStream fs = new FileStream(newPath, FileMode.Create, FileAccess.Write))
    {
        wb.Write(fs);
    }
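
To illustrate why sharing one stream is risky, here is a minimal sketch that uses only System.IO and no NPOI at all. It assumes, for illustration, that reading a workbook leaves the stream position at the end of the file; whether NPOI does exactly that is not confirmed here. The class and method names (SharedStreamDemo, ReadThenWriteSameStream, WriteWithFreshStream) are hypothetical, and short ASCII strings stand in for the workbook bytes.

```csharp
using System;
using System.IO;
using System.Text;

public static class SharedStreamDemo
{
    // Reads the whole file, then writes newContent through the SAME stream.
    // This mimics the shape of `new XSSFWorkbook(fs)` followed by `wb.Write(fs)`.
    public static string ReadThenWriteSameStream(string path, string newContent)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            var buffer = new byte[fs.Length];
            int read = 0;
            while (read < buffer.Length)
            {
                int n = fs.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }

            // The position is now at the end of the file, so this write
            // appends after the old bytes instead of replacing them.
            byte[] bytes = Encoding.ASCII.GetBytes(newContent);
            fs.Write(bytes, 0, bytes.Length);
        }
        return File.ReadAllText(path, Encoding.ASCII);
    }

    // Writes newContent through a fresh stream opened with FileMode.Create,
    // which truncates the existing file before anything is written.
    public static string WriteWithFreshStream(string path, string newContent)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            byte[] bytes = Encoding.ASCII.GetBytes(newContent);
            fs.Write(bytes, 0, bytes.Length);
        }
        return File.ReadAllText(path, Encoding.ASCII);
    }
}
```

With a temp file that initially contains "OLD-WORKBOOK", ReadThenWriteSameStream(path, "NEW") returns "OLD-WORKBOOKNEW", while WriteWithFreshStream(path, "NEW") returns just "NEW". Rewinding the shared stream to position 0 would not fully help either, since any old bytes beyond the length of the new content would remain unless the file is truncated. That is why the answer above opens a second stream with FileMode.Create.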