
Error on variable with XML tags when parsing ODT files


I'm implementing TinyButStrong / OpenTBS in a system that needs to process ODT documents and I'm facing an issue with a specific template that has tags inside the variable name.

The situation is as follows:

The template part was shown in a screenshot (not reproduced here).

The relevant part of content.xml:

<table:table-cell table:style-name="Table3.A1" office:value-type="string">
  <text:p text:style-name="P22">Tipo de documento</text:p>
  <text:p text:style-name="P29">
    <text:span text:style-name="T7">
     [b.</text:span>tipoDocumento<text:span text:style-name="T7">]
    </text:span>
  </text:p>
</table:table-cell>

As you can see, the variable name is </text:span>tipoDocumento<text:span text:style-name="T7">. The document was edited in LibreOffice, which, for some unknown reason, inserted these tags.

I thought I could pass the full variable name (with the tags included) and OpenTBS would correctly parse the value, so I tried the following:

$block = 'b'; // the block name used by the template fields ([b.xxx])
$data = ['</text:span>tipoDocumento<text:span text:style-name="T7">' => 'somevalue'];
$tbs = new clsTinyButStrong;
$tbs->Plugin(TBS_INSTALL, OPENTBS_PLUGIN);
$tbs->LoadTemplate($templatePath, OPENTBS_ALREADY_UTF8);
// MergeBlock() expects a list of records, so $data is wrapped in an array
$tbs->MergeBlock($block, 'array', [$data]);

But this results in a TBS error:

TinyButStrong Error in field [b.</text:span>tipoDocumento<text:span text:style-name...]: item '</text:span>tipoDocumento<text:span text:style-name' is not an existing key in the array. This message can be cancelled using parameter 'noerr'.

I've done some debugging and found that in the core file tbs_class.php, line 1177 (in meth_Locator_Replace(), which is where the error is thrown), the content of $Loc->SubLst[$i] is </text:span>tipoDocumento<text:span text:style-name, which doesn't match the key in my array.

So I'm assuming that, for some reason, TBS splits the field name at the equal sign (=), which causes this issue. My questions are:

  1. Is this on purpose?
  2. Can this be fixed (in case of a bug) to allow tags with equal signs?
  3. Is there a better way to keep tags out of variables, or a way to prevent LibreOffice from adding them?
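The suspicion about the equal sign can be checked against the error message: cutting the field text at its first = gives exactly the key TBS reports as missing. This is a minimal sketch; strstr() only mimics the suspected behaviour and is not TBS's actual field parser.

```php
<?php
// Text between "[b." and "]" as LibreOffice saved it (from content.xml above)
$field = '</text:span>tipoDocumento<text:span text:style-name="T7">';

// Mimic the suspected behaviour: keep only what precedes the first '='
$suspectedKey = strstr($field, '=', true);

echo $suspectedKey, PHP_EOL;
// </text:span>tipoDocumento<text:span text:style-name
```

The printed string is identical to the key in the error message, which is consistent with (but does not prove) the = hypothesis.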

Solution

  • @Skrol29's answer is the most reliable solution.

    However, one of the reasons we're using templates is to let end users edit them, and it won't be easy to explain to them why they need to do that, because there are no visual changes in LibreOffice (or Microsoft Office, for that matter).

    So I ended up parsing the template source before saving it, removing all XML tags from inside the variables.

    This is the code I use whenever a new template file is uploaded:

    // Create a temporary file, only to load it with TBS
    // $fileContents is the binary file contents and $extensao is the file extension
    $filePath = intranet_storage_path(sha1($fileContents) . '.' . $extensao, 'tmp');
    // Store the binary contents in the file path
    file_put_contents($filePath, $fileContents);
    
    // Create a new TBS instance and load OpenTBS
    $tbs = new clsTinyButStrong;
    $tbs->Plugin(TBS_INSTALL, OPENTBS_PLUGIN);
    
    // Load the temporary file
    $tbs->LoadTemplate($filePath, OPENTBS_ALREADY_UTF8);
    
    // Find all variables (the only block name is 'b')
    preg_match_all(
        "/(\[b\.  # Start with a literal [ followed by the block name and a dot
        [^.\];]+  # Then take every character up to the first `.` (dot), `]` or `;`
        [\];]     # Stop at the closing `]` or at the `;` that starts the field parameters
        )/ix",
        $tbs->Source,
        $matches
    );
    
    // Loop through all the found variables
    $searched = $replaced = [];
    foreach ($matches[0] as $var) {
        // Fill the $searched and $replaced where $searched is the real variable name 
        // with XML tags (if they exist) and $replaced is the variable without tags
        $searched[] = $var;
        $replaced[] = strip_tags($var);
    }
    
    // Replace the contents of the Source
    $tbs->Source = str_replace($searched, $replaced, $tbs->Source);
    
    // Store the final template file with variables without XML
    $tbs->Show(OPENTBS_FILE, $filePath);
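
The clean-up step itself does not depend on TBS, so it can be isolated in a plain function and tested against a content.xml fragment. A minimal sketch, assuming PHP 7.4+ and a single block named b; stripTagsInsideFields() is a hypothetical name, not part of TBS or OpenTBS. It uses preg_replace_callback() instead of the separate $searched/$replaced arrays, which replaces each match in place with the same outcome.

```php
<?php
/**
 * Hypothetical helper (not part of TBS/OpenTBS): removes the XML tags that
 * an editor inserted inside fields of the given block, e.g. in content.xml.
 */
function stripTagsInsideFields(string $xml, string $block = 'b'): string
{
    // "[b." then everything up to the first ".", "]" or ";", then "]" or ";"
    $pattern = '/\[' . preg_quote($block, '/') . '\.[^.\];]+[\];]/i';

    $result = preg_replace_callback(
        $pattern,
        // strip_tags() drops the tags and keeps the text between them
        static fn (array $m): string => strip_tags($m[0]),
        $xml
    );

    if ($result === null) {
        throw new RuntimeException('preg_replace_callback() failed');
    }
    return $result;
}

$xml = '<text:p text:style-name="P29"><text:span text:style-name="T7">[b.</text:span>'
     . 'tipoDocumento<text:span text:style-name="T7">]</text:span></text:p>';

echo stripTagsInsideFields($xml), PHP_EOL;
// <text:p text:style-name="P29"><text:span text:style-name="T7">[b.tipoDocumento]</text:span></text:p>
```

In the code above it would be used as $tbs->Source = stripTagsInsideFields($tbs->Source); before calling Show().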
    

    I should point out that this solution produces invalid XML when a variable contains only an opening or only a closing tag. The following examples will break the XML (and you won't be able to open or parse the document):

    <text:span text:style-name="T7">[b.tipoDocumento<text:span text:style-name="T7">]</text:span>
    // OR
    <text:span text:style-name="T7">[b.</text:span>tipoDocumento]</text:span>
    

    However, in all the test cases I've had, there was always both a closing and an opening tag (as shown in the question), so stripping them results in valid XML.
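Instead of relying on the test cases, the $replaced[] = strip_tags($var); step could be guarded by a check that the removed tags cancel out: every element the variable closes must be reopened inside it, in reverse order. A minimal sketch, assuming simple tags with no > inside attribute values; tagsCancelOut() is a hypothetical name, not part of TBS or OpenTBS. A variable that fails the check could be left untouched and reported to the user.

```php
<?php
/**
 * Hypothetical check (not part of TBS/OpenTBS): true when removing every tag
 * in $fragment leaves the surrounding XML well-formed.
 */
function tagsCancelOut(string $fragment): bool
{
    preg_match_all('#<(/?)([\w:.-]+)[^>]*?(/?)>#', $fragment, $tags, PREG_SET_ORDER);

    $opened = []; // tags opened inside the fragment and not closed there
    $closed = []; // tags closing elements that were opened before the fragment
    foreach ($tags as [, $slash, $name, $selfClosing]) {
        if ($selfClosing === '/') {
            continue; // self-closing tags such as <text:s/> do not affect nesting
        }
        if ($slash === '/') {
            if ($opened === []) {
                $closed[] = $name;
            } elseif (array_pop($opened) !== $name) {
                return false; // wrong nesting inside the fragment itself
            }
        } else {
            $opened[] = $name;
        }
    }

    // Elements closed by the fragment must be reopened in reverse order
    return array_reverse($closed) === $opened;
}

var_dump(tagsCancelOut('</text:span>tipoDocumento<text:span text:style-name="T7">')); // bool(true)
var_dump(tagsCancelOut('tipoDocumento<text:span text:style-name="T7">'));             // bool(false)
```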