How to create and assign a string to clipboard so that when pasted MS Word will accept it as a table?


I have to find a way to generate a string so that, when it is set on the clipboard and then pasted into a Word document, it is displayed as a table.

I did some research about this, but none of the approaches I found seem to work. One of the things I tried was to generate a string of HTML, then set it as the clipboard data and pass the format as HTML.

string html = @"<html><body><table>
  <tr>
    <th>Month</th>
    <th>Savings</th>
  </tr>
  <tr>
    <td>January</td>
    <td>$100</td>
  </tr>
  </table></body></html>";

Clipboard.SetData(DataFormats.Html, html);

But it did not work: nothing was pasted into the Word document when I tried. When I set the data format to text (DataFormats.Text) instead, it was pasted, but only as plain text, not as a table.


Solution

  • Basically, everything you need to know is in Microsoft's own article on this very topic: how to use Win32's Clipboard API to copy and paste HTML:

    https://learn.microsoft.com/en-us/windows/win32/dataxchg/html-clipboard-format

    (Note: at the time of writing, there's a small irony in that an article about correctly formatting HTML for the clipboard is itself a mesh of Markdown and broken HTML... so I thought I'd act all civic for once and fix the broken HTML, and I ended up basically rewriting the article; here's the PR).


    To pass HTML through Windows' Clipboard API, you need to do a few things:

    • The entire HTML text must be structurally valid HTML (so you can still hang on to any Netscape-era HTML from when everyone WAS SHOUTING ALL THE TIME BECAUSE HTML TAGS WERE ALL UPPERCASE and few bothered to ever use a </p> or </li>).

      • ...but you can't do things like <script><head></p>.
      • While HTML like <span><div></div></span> is syntactically valid and well-formed, HTML itself doesn't allow <span> to parent a <div> and requires browsers to break up the outer <span> into separate siblings and cousins and bring the single <div> to the top level. The documentation doesn't mention what happens if you attempt to copy (or paste) "HTML" like that. I assume Windows doesn't care and just treats it as a big string, but be wary of having "fun" with other applications that consume pasted HTML (e.g. Word, Excel, etc.), because this is how security vulnerabilities start.
      • It's also completely undocumented how Windows and other applications handle HTML's Custom Elements feature.
    • Then you need to generate a header, with the right parameters, and the right formatting just for it to work. Computers are awful.

    • As you're passing in a single large element (the <table>, I assume?) then the <table>'s outerHTML can be the bounds of the actual HTML fragment that will (or rather, should) be copied by the receiving application.

      • This outer element, the fragment, is marked with HTML comments <!--StartFragment--> and <!--EndFragment-->.
    • Only now can you calculate the values of those header parameters (which also represent an intentionally redundant representation of the <!--Start/EndFragment--> bounds)...

      • ... the problem is that inserting those Start/EndFragment comments into the HTML makes the message longer, which would break any previously calculated absolute offsets.
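    The <table> fragment itself can be generated from data instead of being typed by hand. The following is only a sketch: `TableFragment.Build` is a hypothetical helper (not part of any library) that assumes the first row holds the column headers and that every cell is plain text, which it HTML-encodes with `WebUtility.HtmlEncode`.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class TableFragment
{
    // Hypothetical helper: the first row becomes <th> cells, all later rows <td> cells.
    public static string Build(IReadOnlyList<string[]> rows)
    {
        var sb = new StringBuilder("<table>");
        for (int r = 0; r < rows.Count; r++)
        {
            string cellTag = r == 0 ? "th" : "td";
            sb.Append("<tr>");
            foreach (string cell in rows[r])
            {
                // Encode the cell text so characters such as < and & cannot break the markup.
                sb.Append('<').Append(cellTag).Append('>')
                  .Append(WebUtility.HtmlEncode(cell))
                  .Append("</").Append(cellTag).Append('>');
            }
            sb.Append("</tr>");
        }
        return sb.Append("</table>").ToString();
    }
}
```

    The returned string is a single well-formed element, so it can serve as the fragment that sits between the <!--StartFragment--> and <!--EndFragment--> comments.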

    So, (Doug DeMuro voice) this is the template for text data representing a HTML fragment being placed into the Windows Clipboard:

    Version:1.0
    StartHTML:AAAA
    EndHTML:BBBB
    StartFragment:CCCC
    EndFragment:DDDD
    [StartSelection:CCCC
    EndSelection:CCCC]
    <HTML goes here>
    
    • The headers are separated by simple line-breaks. Curiously, MS's documentation says that all 3 major line-break styles are permissible (\r, \r\n, and \n).

    • The last 2 headers/parameters (StartSelection and EndSelection) are optional, but you must either supply neither or supply both - supplying only one of the two Start/EndSelection parameters will break things in unspecified ways.

    • Now do this:

      1. Copy that template into a new String and stick your completed HTML directly at the end of the header, separated by a single line-break.
      2. Take a look at that header above. See at the top where it says Version:1.0? Yeah? Good. Ignore it.
      3. Now see the StartHTML parameter below it. We will calculate that last. Ignore it for now.
      4. Next is EndHTML, which is the absolute byte offset (from the start of the header, not the HTML) of the end of the HTML text itself - basically it's the message length.
      5. StartFragment is the absolute byte offset (from the start of the header) to the < in a well-formed HTML element representing the fragment itself.
        • The inserted <!--Start/EndFragment--> comments must be considered outside the fragment (i.e. a String representation of your fragment's raw HTML would not have those comments in it anywhere).
      6. EndFragment is the absolute byte offset (from the start of the header) to the end of the fragment. This is an exclusive upper-bound (so if StartFragment=100 and EndFragment=150 then the fragment is 50 bytes long, simple).
        • Important note: Remember the text run represented by Start/EndFragment must be a valid kinda-"self-contained" element. So you cannot have a fragment that abruptly ends inside a tag or its attributes - nor one that starts half-way through a container element's many text nodes...
        • ...however the Start/EndSelection headers, if present, can start and end at arbitrary points in the human-readable text (but conversely, the documentation does not say what happens if Start/EndSelection are inside an element's tags or attributes, for example).
      7. Then substitute those calculated numbers into those BBBB/CCCC/etc placeholders above, but do not delete any unused placeholder characters: instead, replace them with leading '0' digits.
        • Don't worry: in this case (at least) any leading zeroes are not interpreted as forcing octal (base-8) integer parsing, phew.
      8. Oh, almost forgot: go back to the top to calculate StartHTML, which is the absolute byte offset to the start of the HTML text (the opening '<' in <html>, I assume).
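    The steps above can also be automated rather than counted by hand. The following is a minimal sketch, not a definitive implementation: `HtmlClipboardPayload` and its `Build` method are hypothetical names, the surrounding <html>/<body> wrapper and the \r\n line breaks are one possible choice, and it assumes the payload ends up on the clipboard as UTF-8, so every offset is measured in UTF-8 bytes rather than characters. Padding each number to a fixed 10 digits means the header length is known before the values are.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class HtmlClipboardPayload
{
    // D10 pads every offset to 10 digits (enough for any int), so the header
    // has the same byte length whatever values are substituted in.
    private const string HeaderFormat =
        "Version:1.0\r\n" +
        "StartHTML:{0:D10}\r\n" +
        "EndHTML:{1:D10}\r\n" +
        "StartFragment:{2:D10}\r\n" +
        "EndFragment:{3:D10}\r\n";

    // The marker comments sit outside the fragment they delimit.
    private const string Prefix = "<html>\r\n<body>\r\n<!--StartFragment-->";
    private const string Suffix = "<!--EndFragment-->\r\n</body>\r\n</html>";

    public static string Build(string fragmentHtml)
    {
        if (fragmentHtml == null) throw new ArgumentNullException(nameof(fragmentHtml));
        Encoding utf8 = Encoding.UTF8;

        // StartHTML is the header's byte length, measured with dummy zero offsets.
        int startHtml = utf8.GetByteCount(Format(0, 0, 0, 0));
        int startFragment = startHtml + utf8.GetByteCount(Prefix);
        int endFragment = startFragment + utf8.GetByteCount(fragmentHtml);
        int endHtml = endFragment + utf8.GetByteCount(Suffix);

        return Format(startHtml, endHtml, startFragment, endFragment)
             + Prefix + fragmentHtml + Suffix;
    }

    private static string Format(int startHtml, int endHtml, int startFragment, int endFragment)
    {
        return string.Format(CultureInfo.InvariantCulture, HeaderFormat,
            startHtml, endHtml, startFragment, endFragment);
    }
}
```

    Calling `HtmlClipboardPayload.Build("<table>...</table>")` returns a string that can be handed to `Clipboard.SetText` with `TextDataFormat.Html`.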

    Applying those steps by hand to the table from the question (with \r\n line breaks), you'll end up with this:

    // Every offset is a byte count from the very start of the string, assuming \r\n line breaks.
    string htmlInTheClipboardLooksLikeThis =
    @"Version:1.0
    StartHTML:0081
    EndHTML:0263
    StartFragment:0117
    EndFragment:0227
    <html>
    <body>
    <!--StartFragment--><table>
    <tr>
    <th>Month</th>
    <th>Savings</th>
    </tr>
    <tr>
    <td>January</td>
    <td>$100</td>
    </tr>
    </table><!--EndFragment-->
    </body>
    </html>";
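    A payload like this can be sanity-checked before it goes anywhere near the clipboard by slicing out the bytes that the headers point at. This is only a sketch: `HtmlClipboardInspector` is a hypothetical helper, and it assumes the payload is encoded as UTF-8 and that each header sits at the start of its own line.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlClipboardInspector
{
    // Reads a numeric header line such as "StartFragment:0000000141".
    public static int ReadOffset(string payload, string headerName)
    {
        Match m = Regex.Match(payload, "^" + Regex.Escape(headerName) + @":(\d+)", RegexOptions.Multiline);
        if (!m.Success) throw new FormatException("Missing header: " + headerName);
        // Leading zeroes are parsed as plain decimal padding, not as octal.
        return int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
    }

    // Returns the bytes between StartFragment and EndFragment, decoded as UTF-8.
    public static string ExtractFragment(string payload)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(payload);
        int start = ReadOffset(payload, "StartFragment");
        int end = ReadOffset(payload, "EndFragment");
        if (end < start || end > bytes.Length)
            throw new FormatException("Fragment offsets are out of range.");
        return Encoding.UTF8.GetString(bytes, start, end - start);
    }
}
```

    If `ExtractFragment` returns anything other than the bare fragment element (for example, text that starts inside a marker comment or stops part-way through a tag), the offsets need recalculating.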
    

    You then just need to pass it to SetText (not SetData) with TextDataFormat.Html:

    Clipboard.SetText( htmlInTheClipboardLooksLikeThis, TextDataFormat.Html );
    

    Just running that in LINQPad gives us something we can paste directly into Word, and Word does indeed import it as a table.