Search code examples
c#xmldatasetado.net

Dataset.readxml doesn't trim whitespace


My goal is to have an XML file read into a DataSet and then put into a DataGridView. All of these steps are done, but the dataset.xml isn't trimming whitespace. This leaves a bunch of indentations and new lines in the cells.

The XML in question:

<?xml version="1.0"?>
<xml>
  <data name="This is a name">
    <Owner>
      Person
    </Owner>
    <ip_address>
      191.1.1.1
    </ip_address>
    <LastOndate>
      04/14/2026
    </LastOndate>
  </data>
</xml>

Most of the data has been omitted, but this is the gist of it. If the data in the XML file seems poorly written or explained, it's because I had to remove data for privacy.

The code in question:

DataSet dataSet = new DataSet();
dataSet.ReadXml(@filePath);
dataGridView1.DataSource = dataSet.Tables[0];

The intended result is that the XML elements, like the IP address and last on date, are loaded in without whitespace, but it doesn't do that. Short of writing a whole function to use ReadXml and read all this in more properly, is there an option to make DataSet.ReadXml() eliminate whitespace?


Solution

  • Assuming your real XML is well-formed, DataSet.ReadXml() is behaving properly because the whitespace around your various element values is significant whitespace. If you don't want that whitespace, you will have to fix the XML or trim it manually.

    As a workaround, if you can't fix your XML not to include inappropriate significant whitespace, you can trim it manually after reading using the following extension methods:

    public static class DataExtensions
    {
        public static DataSet TrimWhiteSpace(this DataSet set)
        {
            foreach (DataTable table in set.Tables)
                table.TrimWhiteSpace();
            return set;
        }
        
        public static DataTable TrimWhiteSpace(this DataTable table)
        {
            var columns = table.Columns.Cast<DataColumn>().Where(c => c.DataType == typeof(string)).ToList();
            if (columns.Count < 1)
                return table;
            foreach (var row in table.AsEnumerable())
                foreach (var col in columns)
                    if (row[col] is string s) // Skip DBNull values.
                        row[col] = s.Trim();
            return table;
        }
    }
    

    And then to load, do:

    DataSet dataSet = new DataSet();
    dataSet.ReadXml(@filePath);
    dataSet.TrimWhiteSpace();
    

    Demo fiddle here.

    An explanation for why whitespace is not trimmed can be found in in Extensible Markup Language (XML) 1.0 (Fifth Edition): 2.10 White Space Handling, which states:

    In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

    But exactly how is significant whitespace distinguished from insignificant whitespace? In the absence of an xml:space attribute, insignificant whitespace nodes will be identified with XmlReader.NodeType == XmlNodeType.Whitespace, which is defined to be:

    White space between markup.

    But the whitespace around the 191.1.1.1 value is not white space between markup, it's whitespace between markup and text content. Thus it is reported as part of the text value. Look at it another way: in the absence of any schema information, XmlReader and DataSet do not know that the value of <ip_address> is an IP address, it might be a formatted Shakespeare sonnet, in which case whitespace would need to be reported to the application.[1]

    Thus your XML should have been serialized as follows:

    <?xml version="1.0"?>
    <xml>
      <data name="This is a name">
        <Owner>Person</Owner>
        <ip_address>191.1.1.1</ip_address>
        <LastOndate>04/14/2026</LastOndate>
      </data>
    </xml>
    

    [1] Liquid Technologies's documentation page XML Whitespace has a nice diagram that shows the difference between significant and insignificant whitespace.