I have a Winform project, created on Microsoft Framework 3.5. The users may have installed Windows 7 or Windows XP, and Office 2007 or above.
I'm working on in a procedure to get the clipboard data and put in on a C# DataTable. I already created a method to get the raw data from the clipboard and upload it in a DataTable.
But in some cases, the Excel data shows a value, but internally have another:
I'm investigating a method to get the raw data from Excel:
string XmlFmt = "XML Spreadsheet";
var clipboard = Clipboard.GetDataObject();
if (clipboard.GetDataPresent(XmlFmt))
{
var clipData = clipboard.GetData(XmlFmt);
StreamReader streamReader = new StreamReader((MemoryStream)clipData);
streamReader.BaseStream.SetLength(streamReader.BaseStream.Length - 1);
string xmlText = streamReader.ReadToEnd();
var stream = new StringReader(xmlText);
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xmlText);
DataSet dsExcelData = new DataSet();
dsExcelData.ReadXml(new XmlNodeReader(xmlDocument));
}
But, this method retrieves me a DataSet with multiples tables with the configuration of each part of the Excel Data:
Basically, I want to convert these structures to a simple DataTable with only the raw data. Someone could help me with a hint how achieve this? ...I don't want to use a third party library in this implementation.
I found a clean and bullet-proof solution. Here the code:
First, a extension to convert a XmlDocument to XElement:
/// <summary> Convert XML Document to XDocument </summary>
/// <param name="xmlDocument">Attached XML Document</param>
public static XDocument fwToXDocument(this XmlDocument xmlDocument)
{
using (XmlNodeReader xmlNodeReader = new XmlNodeReader(xmlDocument))
{
xmlNodeReader.MoveToContent();
return XDocument.Load(xmlNodeReader);
}
}
The complete function:
private DataTable clipboardExcelToDataTable(bool blnFirstRowHasHeader = false)
{
string strTime = "S " + DateTime.Now.ToString("mm:ss:fff");
var clipboard = Clipboard.GetDataObject();
if (!clipboard.GetDataPresent("XML Spreadsheet")) return null;
strTime += "\r\nRead " + DateTime.Now.ToString("mm:ss:fff");
StreamReader streamReader = new StreamReader((MemoryStream)clipboard.GetData("XML Spreadsheet"));
strTime += "\r\nFinish read " + DateTime.Now.ToString("mm:ss:fff");
streamReader.BaseStream.SetLength(streamReader.BaseStream.Length - 1);
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(streamReader.ReadToEnd());
strTime += "\r\nRead XML Document " + DateTime.Now.ToString("mm:ss:fff");
XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";
DataTable dtData = new DataTable();
var linqRows = xmlDocument.fwToXDocument().Descendants(ssNs + "Row").ToList<XElement>();
for (int x = 0; x < linqRows.Max(a => a.Descendants(ssNs + "Cell").Count()); x++)
dtData.Columns.Add("Column " + (x + 1).ToString());
int intCol = 0;
DataRow drCurrent;
linqRows.ForEach(rowElement =>
{
intCol = 0;
drCurrent = dtData.Rows.Add();
rowElement.Descendants(ssNs + "Cell")
.ToList<XElement>()
.ForEach(cell => drCurrent[intCol++] = cell.Value);
});
if (blnFirstRowHasHeader)
{
int x = 0;
foreach (DataColumn dcCurrent in dtData.Columns)
dcCurrent.ColumnName = dtData.Rows[0][x++].ToString();
dtData.Rows.RemoveAt(0);
}
strTime += "\r\nF " + DateTime.Now.ToString("mm:ss:fff");
return dtData;
}
The process takes ~15 seconds to read ~25,000 rows.
Works perfectly for any kind of data. Basically, the method creates a grid with the same structure of the Excel WorkSheet. Merge of rows or columns will fill up the first cell able. All columns will be string DataType by default.