Search code examples
c#exceldatatableclipboardxmldocument

How to use clipboard to copy data from Excel Sheet to DataTable?


I have a Winform project, created on Microsoft Framework 3.5. The users may have installed Windows 7 or Windows XP, and Office 2007 or above.

I'm working on in a procedure to get the clipboard data and put in on a C# DataTable. I already created a method to get the raw data from the clipboard and upload it in a DataTable.

But in some cases, the Excel data shows a value, but internally have another:

enter image description here

I'm investigating a method to get the raw data from Excel:

string XmlFmt = "XML Spreadsheet";
var clipboard = Clipboard.GetDataObject();

if (clipboard.GetDataPresent(XmlFmt))
{
    var clipData = clipboard.GetData(XmlFmt);
    StreamReader streamReader = new StreamReader((MemoryStream)clipData);
    streamReader.BaseStream.SetLength(streamReader.BaseStream.Length - 1);

    string xmlText = streamReader.ReadToEnd();
    var stream = new StringReader(xmlText);

    XmlDocument xmlDocument = new XmlDocument();
    xmlDocument.LoadXml(xmlText);

    DataSet dsExcelData = new DataSet();
    dsExcelData.ReadXml(new XmlNodeReader(xmlDocument));
}

But, this method retrieves me a DataSet with multiples tables with the configuration of each part of the Excel Data: enter image description here enter image description here

Basically, I want to convert these structures to a simple DataTable with only the raw data. Someone could help me with a hint how achieve this? ...I don't want to use a third party library in this implementation.


Solution

  • I found a clean and bullet-proof solution. Here the code:

    First, a extension to convert a XmlDocument to XElement:

    /// <summary> Convert XML Document to XDocument </summary>
    /// <param name="xmlDocument">Attached XML Document</param>
    public static XDocument fwToXDocument(this XmlDocument xmlDocument)
    {
        using (XmlNodeReader xmlNodeReader = new XmlNodeReader(xmlDocument))
        {
            xmlNodeReader.MoveToContent();
            return XDocument.Load(xmlNodeReader);
        }
    }
    

    The complete function:

    private DataTable clipboardExcelToDataTable(bool blnFirstRowHasHeader = false)
    {
        string strTime = "S " + DateTime.Now.ToString("mm:ss:fff");
        var clipboard = Clipboard.GetDataObject();
        if (!clipboard.GetDataPresent("XML Spreadsheet")) return null;
    
        strTime += "\r\nRead " + DateTime.Now.ToString("mm:ss:fff");
        StreamReader streamReader = new StreamReader((MemoryStream)clipboard.GetData("XML Spreadsheet"));
        strTime += "\r\nFinish read " + DateTime.Now.ToString("mm:ss:fff");
        streamReader.BaseStream.SetLength(streamReader.BaseStream.Length - 1);
    
        XmlDocument xmlDocument = new XmlDocument();
        xmlDocument.LoadXml(streamReader.ReadToEnd());
        strTime += "\r\nRead XML Document " + DateTime.Now.ToString("mm:ss:fff");
    
        XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";
        DataTable dtData = new DataTable();
    
        var linqRows = xmlDocument.fwToXDocument().Descendants(ssNs + "Row").ToList<XElement>();
    
        for (int x = 0; x < linqRows.Max(a => a.Descendants(ssNs + "Cell").Count()); x++)
            dtData.Columns.Add("Column " + (x + 1).ToString());
    
        int intCol = 0;
    
        DataRow drCurrent;
    
        linqRows.ForEach(rowElement =>
            {
                intCol = 0;
                drCurrent = dtData.Rows.Add();
                rowElement.Descendants(ssNs + "Cell")
                    .ToList<XElement>()
                    .ForEach(cell => drCurrent[intCol++] = cell.Value);
            });
    
        if (blnFirstRowHasHeader)
        {
            int x = 0;
            foreach (DataColumn dcCurrent in dtData.Columns)
                dcCurrent.ColumnName = dtData.Rows[0][x++].ToString();
    
            dtData.Rows.RemoveAt(0);
        }
    
        strTime += "\r\nF " + DateTime.Now.ToString("mm:ss:fff");
    
        return dtData;
    }
    

    The process takes ~15 seconds to read ~25,000 rows.

    Works perfectly for any kind of data. Basically, the method creates a grid with the same structure of the Excel WorkSheet. Merge of rows or columns will fill up the first cell able. All columns will be string DataType by default.