
Transform Large XML files with XSLT


I have a program that outputs reports in HTML format. On average they are about 5-10 MB, but I have seen extreme cases where they reach 500 MB. These reports are purely client-side; there is no server involved.

The problem is that the browser will hang until all the content is loaded, and sometimes will not even load the content. I am trying to find a solution where someone opening the report can always open it. The people opening the reports should be able to open them using the browser and any technology available in it.

I have come up with a solution that can open a report that was previously 100 MB: our program outputs XML, which is then transformed to HTML via XSLT. However, the user still has to wait for the entire thing to load into memory. All the content is held in `diff` nodes like the one below; each node is rendered as a two-row table, and the order of the nodes does not matter.

XML:

    <diff>
        <parent loc="some string"/>
        <right> content</right>
        <left> content </left>
    </diff>

The XSLT to do this transformation is below:

    <xsl:for-each select="./diff">
        <table align="center" border="1px" width="602">
            <tbody>
                <tr>
                    <td colspan="2"><xsl:value-of select="./parent/@loc"/></td>
                </tr>
                <tr>
                    <td width="50%" align="left">
                        <xsl:if test="./left/text()">
                            <xsl:value-of select="./left/text()"/>
                        </xsl:if>
                        <xsl:if test="not(./left/text())">
                            <xsl:variable name="left">
                                <xsl:apply-templates select="./left/*" mode="serialize"/>
                            </xsl:variable>
                            <xsl:value-of select="$left"/>
                        </xsl:if>
                    </td>
                    <td width="50%" align="right">
                        <xsl:if test="./right/text()">
                            <xsl:value-of select="./right/text()"/>
                        </xsl:if>
                        <xsl:if test="not(./right/text())">
                            <xsl:variable name="right">
                                <xsl:apply-templates select="./right/*" mode="serialize"/>
                            </xsl:variable>
                            <xsl:value-of select="$right"/>
                        </xsl:if>
                    </td>
                </tr>
            </tbody>
        </table>
    </xsl:for-each>

I am wondering if there is a way to either load the file more quickly or to not wait for the whole table to get loaded into memory before displaying the page.

I do not want to load a JavaScript library to do this, as we don't want to worry about connectivity while viewing these reports and do not want to install a bunch of files on everyone's machine. I can, however, use some script within the XSLT.

I know this is an odd scenario and isn't the ideal way to structure the app, but we do not have time to change the way we generate these reports.


Solution

  • My initial thought is to output a directory of HTML files instead of a single file. So rather than producing:

    /supersize500MB.html
    

    you would produce:

    /container
        /first10percent.html
        /second10percent.html
        /third10percent.html
        ...
    

    Then within the HTML you produce you can hardcode navigation links. For example, second10percent.html could contain:

    <a href="first10percent.html">Previous Page</a>
    <a href="third10percent.html">Next Page</a>
    

    XSLT 2.0 can write several output documents from a single input, using the `xsl:result-document` instruction. The XSLT processor will still have to load the entire input XML into memory, but I assume the output HTML will be produced in sequence. The overall effect should be that the browser never has to load a 500 MB source file, only a 50 MB piece of the whole.
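
A minimal sketch of the splitting idea above, not a tested implementation. It assumes an XSLT 2.0 processor (for example Saxon) is run when the report is generated, because the XSLT processors built into browsers implement only XSLT 1.0. The `page-size` parameter, the `container/pageN.html` file names, and the assumption that the `diff` elements are direct children of the root element are all hypothetical. The per-diff table is simplified from the one in the question.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: how many diff nodes go into each page -->
  <xsl:param name="page-size" as="xs:integer" select="1000"/>

  <!-- Assumption: every diff element is a child of the document element -->
  <xsl:template match="/*">
    <xsl:variable name="pageCount"
        select="(count(diff) + $page-size - 1) idiv $page-size"/>

    <!-- Consecutive diff nodes with the same quotient form one page -->
    <xsl:for-each-group select="diff"
        group-adjacent="(position() - 1) idiv $page-size">
      <xsl:variable name="pageNo" select="current-grouping-key() + 1"/>

      <!-- One output file per group; the names are illustrative only -->
      <xsl:result-document href="container/page{$pageNo}.html" method="html">
        <html>
          <body>
            <!-- Links are relative, so they resolve inside container/ -->
            <xsl:if test="$pageNo gt 1">
              <a href="page{$pageNo - 1}.html">Previous Page</a>
            </xsl:if>
            <xsl:if test="$pageNo lt $pageCount">
              <a href="page{$pageNo + 1}.html">Next Page</a>
            </xsl:if>

            <xsl:for-each select="current-group()">
              <!-- Simplified stand-in for the table in the question -->
              <table align="center" border="1" width="602">
                <tbody>
                  <tr>
                    <td colspan="2"><xsl:value-of select="parent/@loc"/></td>
                  </tr>
                  <tr>
                    <td width="50%" align="left"><xsl:value-of select="left"/></td>
                    <td width="50%" align="right"><xsl:value-of select="right"/></td>
                  </tr>
                </tbody>
              </table>
            </xsl:for-each>
          </body>
        </html>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Grouping by node count rather than by a fixed percentage keeps each page a similar size however large the report is. Since the order of the `diff` nodes does not matter, any grouping key would work equally well.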