Search code examples
c#.netexcelopenxmlopenxml-sdk

Open XML SDK reads consecutive integers instead of the actual values in my Excel spreadsheet


I was following the Microsoft tutorial here on how to read Excel documents with the Open XML SDK. I was following the DOM approach outlined in the tutorial. However, when I try to read rows, I get consecutive numbers instead of the data that's actually written in the cells (see below for examples).

My Excel spreadsheet is as follows:

enter image description here

Here's my reader class:

public sealed class ExcelReader
{
    private readonly string[] headers;

    private SheetData sheetData;

    private Row[] rows;

    private int currRow = 0;

    public ExcelReader(string fileName)
    {
        SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false);

        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
        sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();

        headers = ReadRow();
    }

    public string[] ReadRow()
    {
        if (rows == null)
        {
            rows = sheetData.Elements<Row>().ToArray();
        }

        // If we already read all of the rows, just return null on any subsequent call
        if (currRow == (rows.Length - 1))
        {
            return null;
        }

        Row row = rows[currRow];

        currRow++;

        var list = new List<string>();

        foreach (Cell c in row.Elements<Cell>())
        {
            list.Add(c.CellValue.Text);
        }

        return list.ToArray();
    }
}

And here's my unit test:

    [TestMethod]
    public void ReadExcelMCVETest()
    {
        var reader = new ExcelReader("ExcelReaderTest.xlsx");

        for (int i = 2; i <= 10; i++)
        {
            string[] row = reader.ReadRow();

            switch (i)
            {
                case 2:
                    WhatRowMustBe(row, "d", "e", "f");
                    break;

                case 3:
                    WhatRowMustBe(row, "g", "h", "i");
                    break;

                case 4:
                    WhatRowMustBe(row, "j", "k", "l");
                    break;

                case 5:
                    WhatRowMustBe(row, "m", "n", "o");
                    break;

                case 6:
                    WhatRowMustBe(row, "p", "q", "r");
                    break;

                case 7:
                    WhatRowMustBe(row, "s", "t", "u");
                    break;

                case 8:
                    WhatRowMustBe(row, "v", "w", "x");
                    break;

                case 9:
                    WhatRowMustBe(row, "y", "z", "zz");
                    break;

                case 10:
                    Assert.IsNull(row);
                    break;
            }
        }
    }

    private void WhatRowMustBe(string[] actualRow, params string[] expectedRow)
    {
        Assert.IsNotNull(actualRow);

        Assert.AreEqual(expectedRow.Length, actualRow.Length, "Wrong length row");

        for (int i = 0; i < actualRow.Length; i++)
        {
            Assert.AreEqual(expectedRow[i], actualRow[i], "Wrong value");
        }
    }

I expected that, when I call ReadRow(), it would retrieve an array populated with the values in the next row. For example, the first call (which currently happens in the constructor) should return an array with ["a", "b", "c"] (i.e. the header row), and the next call (which happens in the unit test) would result in ["d", "e", "f"]. Unfortunately, the actual result of the first call is ["0", "1", "2"], and the actual result of the second call is ["3", "4", "5"].

The confusing part of this is that, unless I'm missing something, my code seems to be following the tutorial's code sample quite closely. Can anyone see what my mistake is? Or is there a bug in the tutorial's code?


Solution

  • From @petelids's comment (as Community Wiki since this isn't my content):

    Those numbers are indices into the SharedStrings table - have a look at the bottom part of my answer here which might shed some light. If that doesn't help and I get a chance I'll write up an answer here later.