Search code examples
exceldatatables

datatables.net superscript excel export


In my datatables.net I have a superscript tag <td>System Architect<sup>5</sup></td> that, using button/excelHtml5, I need to export in my excel file to be a style in excel. Below is the logic I am using to export to excel is there a way to replace the <sup>5</sup> with the style?

  customize: function(xlsx) {
        var sheet = xlsx.xl['styles.xml'];
        var tagName = sheet.getElementsByTagName('sz');
        for (i = 0; i < tagName.length; i++) {
          tagName[i].setAttribute("superscript", "55")
        }
        console.log(tagName)
      }

Working sample

https://jsfiddle.net/tjmcdevitt/3dvf1x9o/5/


Solution

  • There are 2 related tasks needed here (task 2 will actually need to be performed first):

    Task 1: The Excel file needs to use a "run" - a sequence of text strings, where each string is formatted differently, within one Excel cell. Runs are specified using the <r> tag in Open Office XML formatting. So, you need to replace the original XML generated by DataTables with this new string:

    Original:

    <t xml:space="preserve">SystemArchitect5</t>
    

    Replacement:

    <r><t xml:space="preserve">System Architect</t></r>
    <r><rPr><vertAlign val="superscript"/></rPr><t>5</t></r>
    

    Note the two separate <r>...<\r> tags in the above string - one for the standard text and one for the superscript text.

    Task 2: By default, DataTables will strip the HTML from the content of your DataTables cells - so your content System Architect<sup>5</sup> will become System Architect5 - as you have no doubt already noticed.

    When this happens, you no longer know when to use superscripts - that information has been removed from the data extracted from the DataTable - and therefore the data sent to Excel cannot be formatted correctly.

    So, task 2 needs to be performed first (preserving the superscript instructions), so that we can then perform task 1 (using a "rich text" string in Excel, mixing different styles in one Excel cell).


    Here is task 2:

          exportOptions: {
            //stripHtml: false
            format: {
              body: function ( inner, rowidx, colidx, node ) {
                // strip HTML, except for content containing <sup> tags:
                if ($(node).children("sup").length > 0) {
                  return inner;
                } else {
                  return node.textContent;
                }
              }
            }
          }
    

    This uses a DataTables format option to detect where the exported DataTables cell data contains a <sup> tag. Any such tags are preserved, overriding the default behavior where tags are stripped out. Any other tags which may exist elsewhere in the data are removed, as usual.

    This means the <sup> information is passed to Excel, where we can detect it and then modify that data accordingly...

    Here is task 1:

          customize: function( xlsx ) {
            let sheet = xlsx.xl.worksheets['sheet1.xml'];
            $( 'c t', sheet ).filter(function() {
              // find cell content matching text ending with an HTML superscript:
              const found = this.textContent.match(/(.*)<sup>(.*)<\/sup>/);
              if ( found ) {
                const regular = found[1]; // the regular text content
                const superscript = found[2]; // the superscript content
                // the replacement HTML string for text followed by superscript:
                let excelRun = '<r><t xml:space="preserve">' + regular 
                    + '</t></r><r><rPr><vertAlign val="superscript"/></rPr><t>' 
                    + superscript + '</t></r>';
                // replace the original HTML with the enhanced HTML:
                $(this).replaceWith(excelRun);
              }
            })
          }
    

    This uses a function to customize the data in the Excel file, after we have preserved the <sup> tag data.

    It finds Excel column text data <c><t>...<\t><\c>.

    It then uses a JavaScript regular expression, so that we can extract the found text - such as your System Architect<sup>5</sup> example - and it splits that text into two pieces - the System Architect label and the 5 superscript value.

    Now we can build a new Excel XML string using these two values. This is the excelRun variable. We use this to replace the original content.


    The end result is:

    enter image description here


    For reference, here is the full code (it contains some additional DataTables libraries you probably don't need - e.g. for PDF handling):

    <!doctype html>
    <html>
    <head>
      <meta charset="UTF-8">
      <title>ExcelSuperscript</title>
      <link href="https://cdn.datatables.net/1.13.4/css/jquery.dataTables.css" rel="stylesheet"/>
      <link href="https://cdn.datatables.net/buttons/2.3.6/css/buttons.dataTables.css" rel="stylesheet"/>
     
      <script src="https://code.jquery.com/jquery-3.6.0.js"></script>
      <script src="https://cdnjs.cloudflare.com/ajax/libs/jszip/2.5.0/jszip.js"></script>
      <script src="https://cdnjs.cloudflare.com/ajax/libs/pdfmake/0.1.36/pdfmake.js"></script>
      <script src="https://cdnjs.cloudflare.com/ajax/libs/pdfmake/0.1.36/vfs_fonts.js"></script>
      <script src="https://cdn.datatables.net/1.13.4/js/jquery.dataTables.js"></script>
      <script src="https://cdn.datatables.net/buttons/2.3.6/js/dataTables.buttons.js"></script>
      <script src="https://cdn.datatables.net/buttons/2.3.6/js/buttons.html5.js"></script>
    
    
    </head>
    
    <body>
    
    <div style="margin: 20px;">
    
        <table id="example" class="display" cellspacing="0" width="100%">
          <thead>
            <tr>
              <th>First name</th>
              <th>Last name</th>
              <th>Position</th>
              <th>Office</th>
              <th>Salary</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Tiger</td>
              <td>Nixon</td>
              <td>System Architect<sup>5</sup></td>
              <td>Edinburgh<b>!</b></td>
              <td>$320,800</td>
            </tr>
          </tbody>
        </table>
    
    </div>
    
    <script>
    
    $(document).ready(function() {
    
      var table = $('#example').DataTable( {
        dom: 'Bfrtip',
        buttons: [ {
          extend: 'excel',
          exportOptions: {
            //stripHtml: false
            format: {
              body: function ( inner, rowidx, colidx, node ) {
                // strip HTML, except for content containing <sup> tags:
                if ($(node).children("sup").length > 0) {
                  return inner;
                } else {
                  return node.textContent;
                }
              }
            }
          },
          customize: function( xlsx ) {
            let sheet = xlsx.xl.worksheets['sheet1.xml'];
            $( 'c t', sheet ).filter(function() {
              // find cell content matching text ending with an HTML superscript:
              const found = this.textContent.match(/(.*)<sup>(.*)<\/sup>/);
              if ( found ) {
                const regular = found[1]; // the regular text content
                const superscript = found[2]; // the superscript content
                // the replacement HTML string for text followed by superscript:
                let excelRun = '<r><t xml:space="preserve">' + regular 
                    + '</t></r><r><rPr><vertAlign val="superscript"/></rPr><t>' 
                    + superscript + '</t></r>';
                // replace the original HTML with the enhanced HTML:
                $(this).replaceWith(excelRun);
              }
            })
          }
        } ]
      } );
    
    } );
    
    </script>
    
    </body>
    </html>
    

    Final Notes

    In DataTables there is often more than one way to accomplish a task such as this, so someone else may have a better solution.

    Bear in mind also that this approach makes some attempt to provide a more generalized solution instead of hard-coding your System Architect 5 data contents. But even so, it has its limitations. For example, what if you want to handle other HTML tags in addition to <sup> in some way? For example, see the <b> tag in my test data, which is discarded.

    You can often gain insight into how to build a non-standard Excel XML string by creating a simple Excel file for yourself (using Excel directly, not using DataTables). You can, for example, create a simple Excel file containing only one cell, with the formatting you want. You can then unzip that file to inspect what Excel created, and how it handled the formatting you want. That is how I confirmed the correct way to use the Excel XML <r> tag. In Excel's case, it added this to the shared strings list. In my solution I did not use that list, because DataTables does not use it (by default). A better solution (if you have many repeated identical strings) would be to create and use a shared strings list in your Excel file. It can help to keep the file size smaller, if you have very large files with lots of repeated strings.


    Update for follow-up question:

    How can I view the simple Excel file. I would like to see how it will build the <r> if there is data after the </sup>.

    I may have misunderstood your question, but here is what I think you are asking (sorry if you know this already):

    After you have created your Excel (.xlsx) file, you can unzip it using any suitable tool (I use 7-Zip on Windows). An Excel file is just a zip file containing various nested folders and XML files, which adhere to the Office Open XML standard.

    After unzipping, you can inspect these XML files using any text editor (e.g. Notepad++). Once you have unzipped the Excel file, go to the xl directory and then look in the sharedStrings.xml file (bearing in mind the note in my answer about this file).

    Now you will see three <r>...</r> tags for the three different runs you have in your text in the Excel cell:

    • normal text, followed by
    • superscript text, followed by
    • normal text

    Then you can change the excelRun string accordingly, to accommodate the extra data.

    You will also need to use a different regular expression to capture these three parts in the original DataTables data. So instead of this:

    /(.*)<sup>(.*)<\/sup>/
    

    Probably this:

    /(.*)<sup>(.*)<\/sup>(.*)/
    

    And if you need to process a mixture of DataTable inputs with 2 or 3 (or more) parts, then you will need additional JavaScript logic to handle that - but I think that may be outside the scope of this question.

    I wanted to add: It's even possible that your DataTables HTML data will be too complicated - to the point where trying to use a regex will only make things worse. In that case, you should parse the HTML string using an appropriate tool - and then you can handle the results with more flexibility (and safety).