Task
Import data from excel
to DataTable
Problem
The cell that doesnot contain any data are getting skipped and the very next cell that has data in the row is used as the value of the empty colum. E.g
A1 is empty A2 has a value Tom
then while importing the data A1
get the value of A2 and A2 remains empty
To make it very clear I am providing some screen shots below
This is the excel data
This is the DataTable after importing the data from excel
Code
public class ImportExcelOpenXml
{
public static DataTable Fill_dataTable(string fileName)
{
DataTable dt = new DataTable();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Cell cell in rows.ElementAt(0))
{
dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
}
foreach (Row row in rows) //this will also include your header row...
{
DataRow tempRow = dt.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
dt.Rows.Add(tempRow);
}
}
dt.Rows.RemoveAt(0); //...so i'm taking it out here.
return dt;
}
public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
string value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
}
else
{
return value;
}
}
}
My Thoughts
I think there is some problem with
public IEnumerable<T> Descendants<T>() where T : OpenXmlElement;
In case I want the count of columns using Descendants
IEnumerable<Row> rows = sheetData.Descendants<<Row>();
int colCnt = rows.ElementAt(0).Count();
OR
If I am getting the count of rows using Descendants
IEnumerable<Row> rows = sheetData.Descendants<<Row>();
int rowCnt = rows.Count();`
In both cases Descendants
is skipping the empty cells
Is there any alternative of Descendants
.
Your suggestions are highly appreciated
P.S: I have also thought of getting the cells values by using column names like A1, A2 but in order to do that I will have to get the exact count of columns and rows which is not possible by using Descendants
function.
Had there been some data in all the cells of a row then everything works fine. But if you happen to have even single empty cell in a row then things go haywire.
Why it is happening in first place?
The reason lies in below line of code:
row.Descendants<Cell>().Count()
Count()
function gives you the number of non-empty cells in the row i.e. it will ignore all the empty cells while returning the count. So, when you pass row.Descendants<Cell>().ElementAt(i)
as argument to GetCellValue
method like this:
GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
Then, it will find the content of the next non-empty cell, not necessarily the content of the cell at column index i
e.g. if the first column is empty and we call ElementAt(1)
, it returns the value in the second column instead and our program logic gets messed up.
Solution: We need to deal with the occurrence of empty cells in the row i.e. we need to figure out the actual/effective column index of the target cell in case there were some empty cells before it in the given row. So, you need to substitute your for
loop code below:
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
with
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
Cell cell = row.Descendants<Cell>().ElementAt(i);
int actualCellIndex = CellReferenceToIndex(cell);
tempRow[actualCellIndex] = GetCellValue(spreadSheetDocument, cell);
}
Also, add below method in your code which is used in the above modified code snippet to obtain the actual/effective column index of any cell:
private static int CellReferenceToIndex(Cell cell)
{
int index = 0;
string reference = cell.CellReference.ToString().ToUpper();
foreach (char ch in reference)
{
if (Char.IsLetter(ch))
{
int value = (int)ch - (int)'A';
index = (index == 0) ? value : ((index + 1) * 26) + value;
}
else
{
return index;
}
}
return index;
}
Note: Index in an Excel row start with 1 unlike various programming languages where it starts at 0.