I've got to get a quick and dirty configuration editor up and running. The flow goes something like this:
Configuration objects (POCOs on the server) are serialized to XML.
The XML is well formed at this point. The configuration is sent to the web server in XElements.
On the web server, the XML (yes, ALL OF IT) is dumped into a textarea for editing.
The user edits the XML directly in the webpage and clicks Submit.
In the response, I retrieve the altered text of the XML configuration. At this point, ALL escapes have been reverted by the process of displaying them in a webpage.
I attempt to load the string into an XML object (XmlElement, XElement, whatever).
The problem is that serialization escapes attribute strings, but this is lost in translation along the way.
For example, let's say I have an object that has a regex. Here's the configuration as it comes to the web server:
<Configuration>
<Validator Expression="[^&lt;]" />
</Configuration>
So, I put this into a textarea, where it looks like this to the user:
<Configuration>
<Validator Expression="[^<]" />
</Configuration>
So the user makes a slight modification and submits the changes back. On the web server, the response string looks like:
<Configuration>
<Validator Expression="[^<]" />
<Validator Expression="[^&]" />
</Configuration>
So, the user added another validator, and now BOTH have attributes with illegal characters. If I try to load this into any XML object, it throws an exception because unescaped < and & are not valid inside an attribute value. I CANNOT use any kind of encoding function, as it encodes the entire thing:
var result = Server.HtmlEncode(editedConfig);
results in
&lt;Configuration&gt;
&lt;Validator Expression=&quot;[^&lt;]&quot; /&gt;
&lt;Validator Expression=&quot;[^&amp;]&quot; /&gt;
&lt;/Configuration&gt;
This is NOT valid XML. If I try to load this into an XML element of any kind I will be hit by a falling anvil.
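To make the two failures concrete, here is a minimal console sketch. It assumes System.Net.WebUtility.HtmlEncode as a stand-in for Server.HtmlEncode so that it runs outside ASP.NET; the class name, the TryParse helper, and the sample string are made up for illustration.

```csharp
using System;
using System.Net;
using System.Xml;
using System.Xml.Linq;

static class ParseFailureDemo
{
    // Hypothetical helper: true when the text loads as XML, false when the parser rejects it.
    static bool TryParse(string text)
    {
        try { XElement.Parse(text); return true; }
        catch (XmlException) { return false; }
    }

    static void Main()
    {
        // What comes back from the form: the attribute escapes are gone.
        string edited =
            "<Configuration>" +
            "<Validator Expression=\"[^<]\" />" +
            "<Validator Expression=\"[^&]\" />" +
            "</Configuration>";

        // Raw '<' and '&' inside attribute values: rejected by the parser.
        Console.WriteLine(TryParse(edited));

        // Encoding the whole string escapes the markup as well, so there is
        // no root element left to parse: rejected again.
        Console.WriteLine(TryParse(WebUtility.HtmlEncode(edited)));
    }
}
```

Both calls should report that parsing failed, which matches the two dead ends described above.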
So, the question remains... Is using regex replaces the ONLY way I can get this string ready for parsing into an XML object? Is there any way to "turn off constraints" when I load? How do you get around this?
One last response and then wiki-izing this, as I don't think there is a valid answer.
The XML I place in the textarea IS valid, escaped XML. The process of 1) putting it in the text area 2) sending it to the client 3) displaying it to the client 4) submitting the form it's in 5) sending it back to the server and 6) retrieving the value from the form REMOVES ANY AND ALL ESCAPES.
Let me say this again: I'M not un-escaping ANYTHING. Just displaying it in the browser does this!
Things to mull over: Is there a way to prevent this un-escaping from happening in the first place? Is there a way to take almost-valid XML and "clean" it in a safe manner?
I need to know how to edit VALID XML in a browser window WITHOUT a third party/open source tool that doesn't require me to use regex to escape attribute values manually, that doesn't require users to escape their attributes, and that doesn't fail when round-tripping (&amp;amp;amp;etc;)
Erm … How do you serialize? Usually, the XML serializer should never produce invalid XML.
/EDIT in response to your update: Do not display invalid XML to your user to edit! Instead, display the properly escaped XML in the TextBox. Repairing broken XML isn't fun and I actually see no reason not to display/edit the XML in a valid, escaped form.
Again I could ask: how do you display the XML in the TextBox? You seem to intentionally unescape the XML at some point.
/EDIT in response to your latest comment: Well yes, obviously, since it can contain HTML. You need to escape your XML properly before writing it out into an HTML page. By that, I mean the whole XML. So this:
<foo mean-attribute="&lt;">
becomes this:
&lt;foo mean-attribute="&amp;lt;"&gt;
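The round trip can be sketched roughly like this. It is only an illustration under some assumptions: WebUtility.HtmlEncode stands in for whatever encoder the page uses, WebUtility.HtmlDecode stands in for the browser (which undoes exactly one level of escaping when it displays and posts the textarea), and the class and variable names are invented.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

static class RoundTripDemo
{
    static void Main()
    {
        // Serialized configuration: the serializer escapes '<' inside the attribute value.
        var config = new XElement("Configuration",
            new XElement("Validator", new XAttribute("Expression", "[^<]")));
        string xml = config.ToString(SaveOptions.DisableFormatting);

        // Escape the WHOLE document once more before writing it into the HTML page.
        string encoded = WebUtility.HtmlEncode(xml);
        string textareaHtml = "<textarea name=\"config\">" + encoded + "</textarea>";
        Console.WriteLine(textareaHtml);

        // Stand-in for the browser: it removes one level of escaping, so the
        // posted value is the original, still-escaped XML.
        string posted = WebUtility.HtmlDecode(encoded);

        // The posted text parses, and the attribute value comes back unescaped.
        XElement roundTripped = XElement.Parse(posted);
        Console.WriteLine((string)roundTripped.Element("Validator").Attribute("Expression"));
    }
}
```

With this approach the user sees and edits the escaped form (for example [^&lt;]) in the textarea, so anything they add has to be valid XML too, but nothing is lost on the way back to the server.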