Search code examples
google-sheetsgoogle-apps-scripthtml-parsing

Xml Parse code working fine at my end, does not work at client region over same Html


I have written Apps Script code for Html Parsing using XmlParse. It works fine at my end, my browser and system language both are English as well as my Google Account's. But when I shared the same script with Client in Italy, it gives error

​The markup in the document following the root element must be well-formed!

I am confused if its the same HTML code then why is it giving error at client's end.

I am wondering if ther is any regional or language settings required to be updated for this?

I already tried changing locale of the associated Google Sheet.

Following is the HTML Structure

<div dir=""ltr"">
   <div dir=""ltr"">
      <div align=""center"">
         <div style=""font:20pt Times New Roman""><b>TITLE</b></div>
         <br></br>
         <table cellspacing=""1"" cellpadding=""3"" border=""0"">
         <tbody>
            <tr align=""left"">
               <td colspan=""2""><b>Account: </b></td>
               <td colspan=""5""><b>Name: a</b></td>
               <td colspan=""2""><b>Currency: USD</b></td>
               <td colspan=""2""><b>Leverage: </b></td>
               <td colspan=""3"" align=""right""><b>2024 June 22, 09:14</b></td>
            </tr>
            <tr align=""left""><td colspan=""13""><b>Closed Transactions:</b></td></tr>
            <tr align=""center"" bgcolor=""#C0C0C0"">
            <td>Ticket</td>
            <td>Open Time</td>
            <td>Type</td>
            <td>Size</td>
            <td>Item</td>
            <td>Price</td>
            <td>S / L</td>
            <td>T / P</td>
            <td>Close Time</td>
            <td>Price</td>
            <td>Commission</td>
            <td>Taxes</td>
            <td>Swap</td>
            <td>Profit</td>
            </tr>
         </tbody>
         </table>
      </div>
   </div>
</div>

Below is my appscript function

function sanitizeHtml(htmlContent) {
  return htmlContent.replace(/<meta([^>]*?)\/?>/g, '<meta$1></meta>')
                    .replace(/<img([^>]*?)\/?>/g, '<img$1></img>')
                    .replace(/<br([^>]*?)\/?>/g, '<br$1></br>')
                    .replace(/<hr([^>]*?)\/?>/g, '<hr$1></hr>')
                    .replace(/&nbsp;/g, '&#160;')
                    .replace(/\s+nowrap/gi, '')
                    .replace(/\\(\#|0|,|\.|\\)/g, '$1');
}

function extractDateFromHtml(htmlContent) {
  // Parse the HTML content
  var sanitizedHtml = sanitizeHtml(htmlContent);
  var doc = XmlService.parse(sanitizedHtml);
}

Solution

  • I tried your sample HTML file by creating the file "index.html" in the script editor, after editing your double quotion marks to single and used the below script for getting the date-time value in the table.

    index.html file is;

    <!DOCTYPE html>
    <html> 
    <body>
    <div dir="ltr">
       <div dir="ltr">
          <div align="center">
             <div style="font:20pt Times New Roman"><b>TITLE</b></div>
             <br></br>
             <table cellspacing="1" cellpadding="3" border="0">
             <tbody>
                <tr align="left">
                   <td colspan="2"><b>Account: </b></td>
                   <td colspan="5"><b>Name: a</b></td>
                   <td colspan="2"><b>Currency: USD</b></td>
                   <td colspan="2"><b>Leverage: </b></td>
                   <td colspan="3" align="right"><b>2024 June 22, 09:14</b></td>
                </tr>
                <tr align="left"><td colspan="13"><b>Closed Transactions:</b></td></tr>
                <tr align="center" bgcolor="#C0C0C0">
                <td>Ticket</td>
                <td>Open Time</td>
                <td>Type</td>
                <td>Size</td>
                <td>Item</td>
                <td>Price</td>
                <td>S / L</td>
                <td>T / P</td>
                <td>Close Time</td>
                <td>Price</td>
                <td>Commission</td>
                <td>Taxes</td>
                <td>Swap</td>
                <td>Profit</td>
                </tr>
             </tbody>
             </table>
          </div>
       </div>
    </div> 
    </body>
    </html>
    

    and the script where I used "XmlService" is;

    function getDateData() {   
          var html = HtmlService.createTemplateFromFile('index').evaluate().getContent();
          html = html.replace(/(\r\n|\n|\r)/gm,"");
    
          var tablesMatch = html.match(/<table(.*?)<\/table>/gm);
          if (tablesMatch) {
            var myTable = tablesMatch[0];
    //        Logger.log(myTable);
    
            var doc = XmlService.parse(myTable); 
            var rows = doc.getDescendants().filter(function(c) {
              var element = c.asElement();
              return element && element.getName() == "tr";
            });
           
            var data = rows.slice(0).map(function(row) {
              return row.getChildren("td").map(function(cell) {
                return cell.getValue();
              });
            });
    
            Logger.log(data[0][4]);
      }
    }
    

    The screenshot after rıunning the script is added below;

    enter image description here