I need to generate a string that, when placed on the clipboard and then pasted into a Word document, displays as a table.
I did some research on this, but none of the approaches I found seem to work. One of the things I tried was to generate a string of HTML, then set it as the clipboard data and pass the format as HTML.
string html = @"<html><body><table>
<tr>
<th>Month</th>
<th>Savings</th>
</tr>
<tr>
<td>January</td>
<td>$100</td>
</tr>
</table></body></html>";
Clipboard.SetData(DataFormats.Html, html);
but it did not work: nothing was pasted into the Word document when I tried. When I set the data format to text (DataFormats.Text) it was pasted, but only as text, not as a table.
Basically, everything you need to know is in Microsoft's own article on this very topic: how to use Win32's Clipboard API to copy and paste HTML:
https://learn.microsoft.com/en-us/windows/win32/dataxchg/html-clipboard-format
(Note: at the time of writing, there's a small irony in that an article about correctly formatting HTML for the clipboard is itself a mess of Markdown and broken HTML... so I thought I'd act all civic for once and fix the broken HTML and... I ended up basically rewriting the article, here's the PR).
To pass HTML through Windows' Clipboard API, you need to do a few things:
1. The entire HTML text must be structurally valid HTML (so you can still hang on to any Netscape-era HTML from when everyone WAS SHOUTING ALL THE TIME BECAUSE HTML TAGS WERE ALL UPPERCASE and few bothered to ever use a </p> or </li>).
   - So not something like <script><head></p>.
   - While <span><div></div></span> is syntactically valid and well-formed, HTML itself doesn't allow <span> to parent a <div> and requires browsers to break up the outer <span> into separate siblings and cousins and bring the single <div> to the top level. The documentation doesn't mention what happens if you attempt to copy (or paste) "HTML" like that. I assume Windows doesn't care and just treats it as a big string, but there's the possibility of having fun with other applications that consume pasted HTML (e.g. Word, Excel, etc), because this is how security vulnerabilities start.
2. Then you need to generate a header, with the right parameters, and the right formatting, just for it to work. Computers are awful.
3. As you're passing in a single large element (the <table>, I assume?) then the <table>'s outerHTML can be the bounds of the actual HTML fragment that will (or rather, should) be copied by the receiving application.
   - Mark those bounds with <!--StartFragment--> and <!--EndFragment-->.
4. Only now can you calculate the values of those header parameters (which also represent an intentionally redundant representation of the <!--Start/EndFragment--> bounds)...
   - ...because inserting the Start/EndFragment comments into the HTML makes the message longer, which would break any previously calculated absolute offsets.

So, (Doug DeMuro voice) this is the template for text data representing an HTML fragment being placed into the Windows Clipboard:
Version:1.0
StartHTML:AAAA
EndHTML:BBBB
StartFragment:CCCC
EndFragment:DDDD
[StartSelection:CCCC
EndSelection:CCCC]
<HTML goes here>
The headers are separated by simple line-breaks. Curiously, MS's documentation says that all 3 major line-break styles are permissible (\r, \r\n, and \n).
The last 2 headers/parameters (StartSelection and EndSelection) are optional (but such that you must either supply neither or supply both - supplying only one of the two Start/EndSelection parameters will break things in unspecified ways).
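As a rough illustration of the header rules just described, here is a minimal sketch of a reader for that header. The names ClipboardHtmlHeader and Parse are made up for this example (they are not part of any Microsoft API), and it assumes the header ends at the first line that begins with '<' or has no colon. It accepts all three line-break styles and rejects a lone StartSelection or EndSelection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of any Microsoft API.
public static class ClipboardHtmlHeader
{
    // Reads "Key:Value" lines from the top of the payload until the first
    // line that does not look like a header (e.g. the opening "<html>").
    public static Dictionary<string, string> Parse(string payload)
    {
        var fields = new Dictionary<string, string>(StringComparer.Ordinal);

        // Accept all three line-break styles. "\r\n" is listed first so that
        // it is matched as one break rather than as "\r" followed by "\n".
        string[] lines = payload.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);

        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon <= 0 || line.StartsWith("<", StringComparison.Ordinal))
            {
                break; // first non-header line: the HTML starts here
            }

            fields[line.Substring(0, colon)] = line.Substring(colon + 1);
        }

        // StartSelection/EndSelection are optional, but only as a pair.
        if (fields.ContainsKey("StartSelection") != fields.ContainsKey("EndSelection"))
        {
            throw new FormatException("Supply both StartSelection and EndSelection, or neither.");
        }

        return fields;
    }
}
```

Calling ClipboardHtmlHeader.Parse on a payload returns the raw header values as strings, so the caller can still decide how strictly to validate the numbers.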
Now do this:
1. Take a String and stick your completed HTML directly at the end of the header, separated by a single line-break.
2. See Version:1.0? Yeah? Good. Ignore it.
3. See the StartHTML parameter? We will calculate that last. Ignore it for now.
4. Calculate EndHTML, which is the absolute byte offset (from the start of the header, not the HTML) of the end of the HTML text itself - basically it's the message length.
5. StartFragment is the absolute byte offset (from the start of the header) to the < in a well-formed HTML element representing the fragment itself.
   - The <!--Start/EndFragment--> comments must be considered outside the fragment (i.e. a String representation of your fragment's raw HTML would not have those comments in it anywhere).
6. EndFragment is the absolute byte offset (from the start of the header) to the end of the fragment. This is an exclusive upper-bound (so if StartFragment=100 and EndFragment=150 then the fragment is 50 bytes in length, simple).
   - The fragment bounded by Start/EndFragment must be a valid kinda-"self-contained" element. So you cannot have a fragment that abruptly ends inside a tag or its attributes - nor start half-way through a container element's many text nodes...
   - The Start/EndSelection headers, if present, can start and end at arbitrary points in the human-readable text (but conversely, the documentation does not say what happens if Start/EndSelection are inside an element's tags or attributes, for example).
7. Then substitute those calculated numbers into the BBBB/CCCC/etc placeholders in the template above: but do not delete unused placeholder digits: instead replace them with leading '0' characters.
8. Finally, calculate StartHTML, which is the absolute byte offset to the start of the HTML text (and of the opening '<' in <html>, I assume).

So you'll end up with this:
// Offsets are byte counts from the very start of this string and assume \r\n line breaks.
string htmlInTheClipboardLooksLikeThis =
@"Version:1.0
StartHTML:0081
EndHTML:0263
StartFragment:0117
EndFragment:0227
<html>
<body>
<!--StartFragment--><table>
<tr>
<th>Month</th>
<th>Savings</th>
</tr>
<tr>
<td>January</td>
<td>$100</td>
</tr>
</table><!--EndFragment-->
</body>
</html>";
and you just need to pass it into SetText (not SetData) with TextDataFormat.Html:
Clipboard.SetText( htmlInTheClipboardLooksLikeThis, TextDataFormat.Html );
and just running that in LINQPad gives us something we can paste directly into Word... and it indeed imported it as a Word table.
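If you'd rather not count bytes by hand, the steps above can be sketched as a small helper. Everything here is an assumption made for illustration: the names ClipboardHtml and Build are hypothetical, the ten-digit field width is an arbitrary choice (any fixed width with leading zeros keeps the header length constant), the <html>/<body> wrapper simply mirrors the example above, and the offsets are measured on the UTF-8 encoding of the text, which is my understanding of what the clipboard format expects.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper; the class and method names are made up for this sketch.
public static class ClipboardHtml
{
    private const string StartMarker = "<!--StartFragment-->";
    private const string EndMarker = "<!--EndFragment-->";

    // Fixed-width, zero-padded fields keep the header the same length
    // before and after the real numbers are filled in.
    private const string HeaderFormat =
        "Version:1.0\r\n" +
        "StartHTML:{0:D10}\r\n" +
        "EndHTML:{1:D10}\r\n" +
        "StartFragment:{2:D10}\r\n" +
        "EndFragment:{3:D10}\r\n";

    public static string Build(string fragmentHtml)
    {
        string prefix = "<html>\r\n<body>\r\n" + StartMarker;
        string suffix = EndMarker + "\r\n</body>\r\n</html>";

        // Offsets are byte counts, so measure the encoded bytes
        // (assumed to be UTF-8 here) rather than string.Length.
        Encoding utf8 = Encoding.UTF8;
        string emptyHeader = string.Format(CultureInfo.InvariantCulture, HeaderFormat, 0, 0, 0, 0);

        int startHtml = utf8.GetByteCount(emptyHeader);              // first byte of "<html>"
        int startFragment = startHtml + utf8.GetByteCount(prefix);   // just after the start marker
        int endFragment = startFragment + utf8.GetByteCount(fragmentHtml); // exclusive upper bound
        int endHtml = endFragment + utf8.GetByteCount(suffix);       // total message length

        string header = string.Format(
            CultureInfo.InvariantCulture, HeaderFormat,
            startHtml, endHtml, startFragment, endFragment);

        return header + prefix + fragmentHtml + suffix;
    }
}
```

The result would then be handed to the same call as before, e.g. Clipboard.SetText( ClipboardHtml.Build( tableHtml ), TextDataFormat.Html ), where tableHtml holds just the <table>...</table> markup.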