
?right_pad applied to a capture group inside ?replace ignores the length of the captured text


I'm trying to convert an HTML table to plain text. To align the "columns" correctly, I'd like to pad every cell's content with enough characters to match the length of the longest cell.

The cell content is extracted from the HTML with a regex ?replace that uses a capture group. When I apply ?right_pad to the capture group reference, the padding is based on the 2 characters of the reference itself ("$3" in the template below) rather than on the length of the captured text. Every cell therefore receives the same amount of padding, and the columns of the plain text are shifted instead of aligned.
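
The effect can be reproduced in isolation. The likely cause is evaluation order: the arguments of ?replace are evaluated before the replacement runs, so ?right_pad only ever sees the literal two-character string. A minimal sketch, with a made-up two-cell input and the pad width hard-coded to 10 (both are assumptions for illustration):

```ftl
<#-- The replacement argument is built first: "$1" has length 2, so 8 dashes are appended -->
<#assign replacement = "$1"?right_pad(10, "-")>
${replacement}
<#-- Only now is $1 substituted, so both cells get the same 8 dashes regardless of their length -->
${"<td>abc</td><td>abcdef</td>"?replace("<td>(\\w*)</td>", replacement, "r")}
```

The second interpolation yields abc followed by 8 dashes and abcdef followed by 8 dashes, which is the same misalignment as in the result further down.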

Are there any other approaches? Or, if a FreeMarker contributor/dev is reading: could you register this as a bug, or invite me to the project's Jira so I can register it myself?

Template:

<#-- DETERMINE MAX TABLE CELL CHARACTER LENGTH -->
<#assign tableCells = htmlTable?matches("<td>([\\w\\d\\s]*)</td>") >
<#assign cellSizes = []>
<#list tableCells as t>
 <#assign cellSizes += [t?groups[1]?length]>
</#list>
<#assign maxCellSize = cellSizes?max>


Max Cell Character length: ${maxCellSize}

${htmlTable

<#-- REPLACE HTML TABLE WITH PLAINTEXT -->
<#-- REMOVE OUTER TABLE ELEMENTS -->
?replace("<table.*<tbody>(.*)</tbody></table>", "$1", "rgi")

<#-- REPLACE TABLE HEADERS -->
?replace("<th[\\w\\d\\s=\\\"]*>(<p>)*(<strong>)*([\\w\\d\\s=\\\"]*)(</strong>)*(</p>)*", "<b>" + "$3"?right_pad(maxCellSize, "-") + "</b>", "rgi")
<#-- ADD SPACERS BETWEEN TABLE HEADERS -->
?replace("</th>(?!</tr>)", " ", "rgi")

<#-- REPLACE TABLE CELLS-->
?replace("<td[\\w\\d\\s=\\\"]*>(<p>)*(<strong>)*([\\w\\d\\s=\\\"]*)(</strong>)*", "$3"?right_pad(maxCellSize, "-"), "rgi")

<#-- ADD SPACERS BETWEEN TABLE CELLS -->
?replace("</td>(?!</tr>)", " ", "rgi")

<#-- REPLACE "TABLE LINE BREAKS" (END OF ROW) WITH REGULAR LINE BREAKS-->
?replace("</tr>", "<br>")

<#-- REMOVE REMAINING <tr>|</th>|</td> ELEMENTS -->
?replace("<tr>|</th>|</td>", "", "rgi")

}

Data model:

htmlTable = "<table><tbody><tr><th>col1</th><th>column 2</th><th>very long col header 3</th></tr><tr><td>text</td><td>some text</td><td>last col text</td></tr><tr><td>longer text</td><td>text</td><td>last col text 2</td></tr><tr><td>even longer text</td><td>yet another fairly long text</td><td>last col text 3</td></tr></tbody></table>"

Result:



Max Cell Character length: 28

<b>col1--------------------------</b> <b>column 2--------------------------</b> <b>very long col header 3--------------------------</b><br>text-------------------------- some text-------------------------- last col text--------------------------<br>longer text-------------------------- text-------------------------- last col text 2--------------------------<br>even longer text-------------------------- yet another fairly long text-------------------------- last col text 3--------------------------<br>

Solution

  • I found a solution to my problem; here it is in case someone else can use it:

    TL;DR: "flag" the table header and cell contents with marker strings, collect the unflagged contents in arrays, and afterwards replace each flagged string with its right-padded counterpart.
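
    The idea in miniature, as a rough sketch for a single made-up cell (the input string and the pad width of 10 are assumptions for illustration only):

    ```ftl
    <#-- Step 1: the regex pass only wraps the captured text in markers, no padding yet -->
    <#assign flagged = "<td>abc</td>"?replace("<td>(\\w*)</td>", "###$1###", "r")>
    <#-- Step 2: a plain-string replace swaps the marked text for the padded text; -->
    <#-- here ?right_pad runs on the real content, so its length is taken into account -->
    ${flagged?replace("###abc###", "abc"?right_pad(10, "-"))}
    ```

    This prints abc followed by 7 dashes, i.e. exactly 10 characters.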

    Template:

    <#-- DETERMINE MAX TABLE CELL CHARACTER LENGTH -->
    <#assign tableHeaders = htmlTable?matches("<th[\\w\\d\\s=\\\"]*>(<p>)*(<strong>)*([\\w\\d\\s=\\\"]*)(</strong>)*(</p>)*")>
    <#assign tableCells = htmlTable?matches("<td[\\w\\d\\s=\\\"]*>(<p>)*(<strong>)*([\\w\\d\\s=\\\"]*)(</strong>)*") >
    <#assign cellSizes = []>
    <#assign cellContents = []>
    <#list tableCells as t>
    <#assign cellSizes += [t?groups[3]?length]>
    <#assign cellContents += [t?groups[3]]>
    </#list>
    <#assign headerContents = []>
    <#list tableHeaders as h>
    <#assign cellSizes += [h?groups[3]?length]>
    <#assign headerContents += [h?groups[3]]>
    </#list>
    <#assign maxCellSize = cellSizes?max>
    <#assign flaggedCellContents = [] paddedCellContents = []>
    <#list cellContents as c>
    <#assign flaggedCellContents += ["###"+c+"###"]>
    <#assign paddedCellContents += [c?right_pad(maxCellSize+3, "-")]>
    </#list>
    <#assign flaggedHeaderContents = [] paddedHeaderContents = []>
    <#list headerContents as h>
    <#assign flaggedHeaderContents += ["§§§"+h+"§§§"]>
    <#assign paddedHeaderContents += [h?right_pad(maxCellSize+3, "-")]>
    </#list>
    
    Max Cell Character length: ${maxCellSize}
    
    <#assign convertedTable = htmlTable
    
    <#-- REPLACE HTML TABLE WITH PLAINTEXT -->
    <#-- REMOVE OUTER TABLE ELEMENTS -->
    ?replace("<table.*<tbody>(.*)</tbody></table>", "$1", "rgi")
    
    <#-- REPLACE TABLE HEADERS -->
    ?replace("<th[\\w\\d\\s=\\\"]*>(<p>)*(<strong>)*([\\w\\d\\s=\\\"]*)(</strong>)*(</p>)*", "§§§$3§§§", "rgi")
    
    
    <#-- REPLACE TABLE CELLS-->
    ?replace("<td[\\w\\d\\s=\\\"]*>(<p>)*(<strong>)*([\\w\\d\\s=\\\"]*)(</strong>)*", "###$3###", "rgi")
    
    
    <#-- REPLACE "TABLE LINE BREAKS" (END OF ROW) WITH REGULAR LINE BREAKS-->
    ?replace("</tr>", "\n")
    
    <#-- REMOVE REMAINING <tr>|</th>|</td> ELEMENTS -->
    ?replace("<tr>|</th>|</td>", "", "rgi")
    
    >
    
    <#-- SWAP EACH FLAGGED CELL FOR ITS PADDED VERSION -->
    <#list flaggedCellContents as flagged>
    <#assign convertedTable = convertedTable?replace(flagged, paddedCellContents[flagged?index])>
    </#list>
    
    <#-- SAME FOR THE TABLE HEADERS -->
    <#list flaggedHeaderContents as flagged>
    <#assign convertedTable = convertedTable?replace(flagged, paddedHeaderContents[flagged?index])>
    </#list>
    
    ${convertedTable}
    

    Data Model:

    htmlTable = "<table><tbody><tr><th>col1</th><th>column 2</th><th>very long col header 3</th></tr><tr><td>text</td><td>some text</td><td>last col text</td></tr><tr><td>longer text</td><td>text</td><td>last col text 2</td></tr><tr><td>even longer text</td><td>yet another fairly long text</td><td>last col text 3</td></tr></tbody></table>"
    

    Result:

    
    Max Cell Character length: 28
    
    
    
    
    col1---------------------------column 2-----------------------very long col header 3---------
    text---------------------------some text----------------------last col text------------------
    longer text--------------------text---------------------------last col text 2----------------
    even longer text---------------yet another fairly long text---last col text 3----------------
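
    A possible alternative that skips the marker strings: pad the captured text directly while iterating over ?matches, row by row. This is only a sketch under assumptions: cellPattern is a name introduced here, the data model is the same htmlTable as above, and cells are assumed to contain nothing but text optionally wrapped in <p> and <strong>.

    ```ftl
    <#-- Matches an opening <th> or <td> (with optional attributes) and captures the text that follows -->
    <#assign cellPattern = "<t[hd]\\b[^>]*>(?:<p>)*(?:<strong>)*([^<]*)">
    <#-- Pass 1: find the length of the longest header or cell text -->
    <#assign maxCellSize = 0>
    <#list htmlTable?matches(cellPattern) as m>
    <#if m?groups[1]?length gt maxCellSize><#assign maxCellSize = m?groups[1]?length></#if>
    </#list>
    <#-- Pass 2: split into rows at </tr>, then pad every cell of a row to the same width -->
    <#list htmlTable?split("</tr>") as row>
    <#list row?matches(cellPattern) as m>${m?groups[1]?right_pad(maxCellSize + 3, "-")}</#list>
    </#list>
    ```

    Because ?right_pad is applied to m?groups[1], it works on the actual cell text instead of on a "$3" reference. For the sample data this should give the same aligned rows as the result above, plus one empty line for the remainder after the last </tr>.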