Search code examples
c#asp.netexcelopenxmlopenxml-sdk

Having trouble reading excel file with the OpenXML sdk


I have a function that reads from an excel file and stores the results in a DataSet. I have another function that writes to an excel file. When I try to read from a regular human-generated excel file, the excel reading function returns a blank DataSet, but when I read from the excel file generated by the writing function, it works perfectly fine. The function then will not work on a regular generated excel file, even when I just copy and paste the contents of the function generated excel file. I finally tracked it down to this, but I have no idea where to go from here. Is there something wrong with my code?

Here is the excel generating function:

public static Boolean writeToExcel(string fileName, DataSet data)
{
    Boolean answer = false;
    using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Create(tempPath + fileName, SpreadsheetDocumentType.Workbook))
    {
        WorkbookPart workbookPart = excelDoc.AddWorkbookPart();
        workbookPart.Workbook = new Workbook();
        WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
        Sheets sheets = excelDoc.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets());
        Sheet sheet = new Sheet()
        {
            Id = excelDoc.WorkbookPart.GetIdOfPart(worksheetPart),
            SheetId = 1,
            Name = "Page1"
        };

        sheets.Append(sheet);

        CreateWorkSheet(worksheetPart, data);
        answer = true;
    }

    return answer;
}

private static void CreateWorkSheet(WorksheetPart worksheetPart, DataSet data)
{
    Worksheet worksheet = new Worksheet();
    SheetData sheetData = new SheetData();
    UInt32Value currRowIndex = 1U;
    int colIndex = 0;
    Row excelRow;
    DataTable table = data.Tables[0];

    for (int rowIndex = -1; rowIndex < table.Rows.Count; rowIndex++)
    {
        excelRow = new Row();
        excelRow.RowIndex = currRowIndex++;
        for (colIndex = 0; colIndex < table.Columns.Count; colIndex++)
        {
            Cell cell = new Cell()
            {
                CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
                DataType = CellValues.String
            };

            CellValue cellValue = new CellValue();

            if (rowIndex == -1)
            {
                cellValue.Text = table.Columns[colIndex].ColumnName.ToString();
            }
            else
            {
                cellValue.Text = (table.Rows[rowIndex].ItemArray[colIndex].ToString() != "") ? table.Rows[rowIndex].ItemArray[colIndex].ToString() : "*";
            }

            cell.Append(cellValue);
            excelRow.Append(cell);
        }

        sheetData.Append(excelRow);
    }

    SheetFormatProperties formattingProps = new SheetFormatProperties()
    {
        DefaultColumnWidth = 20D,
        DefaultRowHeight = 20D
    };

    worksheet.Append(formattingProps);
    worksheet.Append(sheetData);
    worksheetPart.Worksheet = worksheet;
}

while the reading function is as following:

public static void readInventoryExcel(string fileName, ref DataSet set)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
        SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
        int count = -1;
        foreach (Row r in sheetData.Elements<Row>())
        {
            if (count >= 0)
            {
                DataRow row = set.Tables[0].NewRow();
                row["SerialNumber"]         = r.ChildElements[1].InnerXml;
                row["PartNumber"]           = r.ChildElements[2].InnerXml;
                row["EntryDate"]            = r.ChildElements[3].InnerXml;
                row["RetirementDate"]       = r.ChildElements[4].InnerXml;
                row["ReasonForReplacement"] = r.ChildElements[5].InnerXml;
                row["RetirementTech"]       = r.ChildElements[6].InnerXml;
                row["IncludeInMaintenance"] = r.ChildElements[7].InnerXml;
                row["MaintenanceTech"]      = r.ChildElements[8].InnerXml;
                row["Comment"]              = r.ChildElements[9].InnerXml;
                row["Station"]              = r.ChildElements[10].InnerXml;
                row["LocationStatus"]       = r.ChildElements[11].InnerXml;
                row["AssetName"]            = r.ChildElements[12].InnerXml;
                row["InventoryType"]        = r.ChildElements[13].InnerXml;
                row["Description"]          = r.ChildElements[14].InnerXml;
                set.Tables[0].Rows.Add(row);
            }
            count++;
        }
    }
}

Solution

  • I think this is caused by the fact that you have only one sheet whereas Excel has three. I'm not certain but I think the sheets are returned in reverse order so you should change the line:

    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    

    to

    WorksheetPart worksheetPart = workbookPart.WorksheetParts.Last();
    

    It might be safer to search for the WorksheetPart if you can identify it by the sheet name. You need to find the Sheet first then use the Id of that to find the SheetPart:

    private WorksheetPart GetWorksheetPartBySheetName(WorkbookPart workbookPart, string sheetName)
    {
        //find the sheet first.
        IEnumerable<Sheet> sheets = workbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name == sheetName);
    
        if (sheets.Count() > 0)
        {
            string relationshipId = sheets.First().Id.Value;
            WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(relationshipId);
            return worksheetPart;
        }
    
        return null;
    }
    

    You can then use:

    WorksheetPart worksheetPart = GetWorksheetPartBySheetName(workbookPart, "Sheet1");
    

    There are a couple of other things I've noticed whilst looking at your code which you may (or may not!) be interested in:

    In your code you are only reading the InnerXml so it might not matter to you but the way Excel stores strings is different to the way you are writing them so reading an Excel generated file may not give you the values you expect. In your example you are writing the string directly to the cell like this:

    XML of Cell value

    But Excel uses a SharedStrings concept where all strings are written to a separate XML file called sharedStrings.xml. That file contains the strings used in the Excel file with a reference and it's that value that is stored in the cell value in the sheet XML.

    The sharedString.xml looks like this:

    Shared Strings XML

    And the Cell then looks like this:

    Cell value with sharedString

    The 47 in the <v> element is a reference to the 47th shared string. Note that the type (the t attribute) in your generated XML is str but the type in the Excel generated file is s. This denotes yours is an inline string and theirs is a shared string.

    You can read the SharedStrings just as you would any other part:

    var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
    
    if (stringTable != null)
    {
        sharedString = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
    }
    

    Secondly, if you look at the cell reference that your code generates and the cell reference that Excel generates you can see you are only outputting the column and not the row (e.g. you output A instead of A1). To fix this you should change the line:

    CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
    

    to

    CellReference = Convert.ToString(Convert.ToChar(65 + colIndex) + rowIndex.ToString()),
    

    I hope that helps.