c# excel xlsx corrupt npoi

Read corrupt Excel file with NPOI


I asked a similar question recently, but thanks to the people who commented on it I learned that the problem lies more with Excel than with NPOI, so I deleted that question and am rephrasing it here.

My main problem is the same as in that question: I need to read a downloaded .xls file using NPOI. The problem is that the file I downloaded is most likely an HTML table that has been imported into an Excel document. Either that, or the Excel document is really an .xlsx (MIME?) file that has been zipped/unzipped incorrectly.

When I open the document in Excel, I get a warning saying that the file might be corrupt. I press "OK" and everything works fine. So apparently the file is readable by Excel, but not by NPOI.
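One way to check which of the two guesses applies is to look at the file's first bytes. This is a minimal sketch and not part of the original post; `FileSniffer`, `SpreadsheetKind` and the 512-byte header size are hypothetical choices. It relies on well-known file signatures: legacy binary .xls workbooks are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1, .xlsx workbooks are ZIP archives starting with "PK\x03\x04", and an HTML export (or an XML Spreadsheet 2003 file) is plain text whose first non-whitespace character is '<'.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical classification of what a downloaded ".xls" really contains.
public enum SpreadsheetKind { Xls, Xlsx, Markup, Unknown }

public static class FileSniffer
{
    // OLE2 compound document signature used by binary .xls files.
    private static readonly byte[] Ole2Signature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // ZIP local file header signature ("PK\x03\x04") used by .xlsx files.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static SpreadsheetKind Detect(byte[] header)
    {
        if (StartsWith(header, Ole2Signature)) return SpreadsheetKind.Xls;
        if (StartsWith(header, ZipSignature)) return SpreadsheetKind.Xlsx;

        // Text formats: decode as UTF-8, drop a BOM and leading whitespace, then look for '<'.
        // This matches HTML tables and also XML Spreadsheet 2003 files.
        string text = Encoding.UTF8.GetString(header).TrimStart('\uFEFF', ' ', '\t', '\r', '\n');
        if (text.Length > 0 && text[0] == '<') return SpreadsheetKind.Markup;

        return SpreadsheetKind.Unknown;
    }

    public static SpreadsheetKind DetectFile(string path)
    {
        byte[] buffer = new byte[512];
        using (FileStream fs = File.OpenRead(path))
        {
            // Only the first bytes are needed; shrink the buffer to what was actually read.
            int read = fs.Read(buffer, 0, buffer.Length);
            Array.Resize(ref buffer, read);
        }
        return Detect(buffer);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

If `FileSniffer.DetectFile(fileName)` returns `Markup`, the file is text rather than a binary workbook, which would explain why NPOI's `HSSFWorkbook` (which reads the binary OLE2 .xls format) cannot open it.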

Does anyone know what I can do about this? Or is it a lost cause?


Solution

  • I figured it out!

    I opened the .xls file in Notepad and saw that it was really just the HTML source for a table. So all I had to do was write a parser that reads the HTML file into a DataTable and proceed from there.

    Here's a start (I haven't finished the parser yet):

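    // Needs: using System; using System.Data; using System.IO;
    private static void HTMLtoExcel(string fileName) // For now, only reads the first cell value.
    {
        string text = File.ReadAllText(fileName);
        DataTable dt = new DataTable(); // Not filled yet; the finished parser will load rows into it.

        const string openTag = "<td>";
        int start = text.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return; // No plain <td> cell (cells with attributes are not matched).
        start += openTag.Length; // Skip past the opening tag so only the cell content remains.

        int stop = text.IndexOf("</td>", start, StringComparison.OrdinalIgnoreCase);
        if (stop < 0) return; // The cell is never closed.

        string insert = text.Substring(start, stop - start);
        Console.WriteLine(insert);
    }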
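To carry the idea through to the DataTable the answer mentions, here is a minimal sketch; it is not the author's finished parser. `HtmlTableReader` and `Parse` are hypothetical names, and the code assumes a simple table: no nested tables, no `rowspan`/`colspan`, and explicit closing `</td>`/`</th>` tags. It uses regular expressions so that cells with attributes such as `<td class="x">` still match, strips inner markup like `<b>`, decodes entities like `&amp;` with `WebUtility.HtmlDecode`, and stores every value as a string.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: reads the rows of a simple HTML <table> into a DataTable.
public static class HtmlTableReader
{
    // <tr ...> ... </tr>; \b keeps "<track>" or similar tags from matching.
    private static readonly Regex RowRegex =
        new Regex(@"<tr\b[^>]*>(.*?)</tr>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // <td ...> or <th ...> cells, including ones with attributes.
    private static readonly Regex CellRegex =
        new Regex(@"<t[dh]\b[^>]*>(.*?)</t[dh]>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // Any remaining inner tag, e.g. <b>, <span>, <br/>.
    private static readonly Regex TagRegex = new Regex(@"<[^>]+>");

    public static DataTable Parse(string html)
    {
        var dt = new DataTable();
        foreach (Match row in RowRegex.Matches(html))
        {
            // Collect the cleaned cell texts of this row first.
            var values = new List<string>();
            foreach (Match cell in CellRegex.Matches(row.Groups[1].Value))
            {
                string text = TagRegex.Replace(cell.Groups[1].Value, string.Empty);
                values.Add(WebUtility.HtmlDecode(text).Trim());
            }
            if (values.Count == 0) continue;

            // Grow the schema when a row has more cells than any row seen so far.
            while (dt.Columns.Count < values.Count)
                dt.Columns.Add("Column" + (dt.Columns.Count + 1), typeof(string));

            DataRow dataRow = dt.NewRow();
            for (int i = 0; i < values.Count; i++)
                dataRow[i] = values[i];
            dt.Rows.Add(dataRow);
        }
        return dt;
    }
}
```

Usage would be along the lines of `DataTable dt = HtmlTableReader.Parse(File.ReadAllText(fileName));`. Header cells (`<th>`) end up as an ordinary first row. Regular expressions are fragile on messy real-world HTML, so if the export is more complicated than a plain table, a dedicated HTML parser library is the safer choice.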