
How to convert a paragraph HTML string to plain text without HTML tags in Google Apps Script?


This is a follow-up to my previous question. I'm having trouble converting HTML strings to plain text, without the HTML tags, in Google Apps Script using the approach referenced in that question. This time, however, the HTML is in paragraph format.

This is the script that I use:

function pullDataFromWorkday() {
  // CSV link from the Workday report. Double quotes are needed because the placeholder contains an apostrophe.
  var url = "https://services1.myworkday.com/ccx/service/customreport2/[company name]/[owner's email]/[Report Name]?format=csv";
  var b64 = 'asdfghjklkjhgfdfghj=='; // placeholder for our Workday password in Base64
  var response = UrlFetchApp.fetch(url, {
    headers: {
      Authorization: 'Basic ' + b64
    },
    muteHttpExceptions: true // return non-2xx responses instead of throwing, so the status check below can run
  });

  // Stop if the request did not succeed.
  var code = response.getResponseCode();
  if (code < 200 || code >= 300) {
    return;
  }

  // Parse the CSV.
  var string = response.getBlob().getDataAsString();
  var data = Utilities.parseCsv(string, ",");

  // Convert the HTML columns (indexes 2 to 5) to plain text, skipping the header row.
  for (var i = 1; i < data.length; i++) {
    for (var j = 2; j <= 5; j++) {
      data[i][j] = toStringFromHtml(data[i][j]);
    }
  }

  // Paste the result into the sheet.
  var ss = SpreadsheetApp.getActive();
  var sheet = ss.getSheetByName('Sheet1');
  sheet.clear();
  sheet.getRange(1, 1, data.length, data[0].length).setValues(data);
}



function toStringFromHtml(html) {
  html = '<div>' + html + '</div>';
  html = html.replace(/<br>/g, "");
  var document = XmlService.parse(html);
  var strText = XmlService.getPrettyFormat().format(document);
  strText = strText.replace(/<[^>]*>/g, "");
  return strText.trim();
}

This is a sample of the data that I want:

(image not shown)

Alternatively, you can use this sample spreadsheet.

Is there a step that I missed or did wrong?

Thank you in advance for your answers.


Solution

  • In your situation, how about modifying toStringFromHtml as follows?

    Modified script:

    function toStringFromHtml(html) {
      html = '<div>' + html + '</div>';
      html = html.replace(/<br>/g, "").replace(/<p><\/p><p><\/p>/g, "<p></p>").replace(/<span>|<\/span>/g, "");
      var document = XmlService.parse(html);
      var strText = XmlService.getPrettyFormat().setIndent("").format(document);
      strText = strText.replace(/<[^>]*>/g, "");
      return strText.trim();
    }
    
    • With this modified script, your sample HTML is converted as follows.

      • From

          <p><span>Hi Katy</span></p>
          <p></p>
          <p><span>The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:</span></p>
          <p></p>
          <p></p>
          <p><span>1. Examples of annoying habits people have on the Skytrain.</span></p>
          <p><span>2. Positive habits that you admire in other people. </span></p>
          <p><span>3. Endangered animals in Asia. </span></p>
        
      • To

          <div>
            <p>Hi Katy</p>
            <p></p>
            <p>The illustration (examples) paragraph is useful when we want to explain or clarify something,
              such as an object,
              a person,
              a concept,
              or a situation. Sample Illustration Topics:</p>
            <p></p>
            <p>1. Examples of annoying habits people have on the Skytrain.</p>
            <p>2. Positive habits that you admire in other people. </p>
            <p>3. Endangered animals in Asia. </p>
          </div>
        
      • This conversion produces the following result.

          Hi Katy
        
          The illustration (examples) paragraph is useful when we want to explain or clarify something, such as an object, a person, a concept, or a situation. Sample Illustration Topics:
        
          1. Examples of annoying habits people have on the Skytrain.
          2. Positive habits that you admire in other people.
          3. Endangered animals in Asia.
        

    Note:

    • The modified script achieves your goal for the sample HTML shown in your question. However, I have not seen your other HTML data, so I cannot say whether the script will work for your actual data. Please be careful about this.
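
If your actual data contains HTML that `XmlService.parse` cannot handle or that the replacements above do not cover (for example `<br/>`, `<span>` tags with attributes, entities such as `&nbsp;`, or runs of three or more empty paragraphs), a regex-only variant might be worth trying. The following is a minimal sketch, not the method from the answer above. The function name `htmlParagraphsToText` is hypothetical, the entity list is an assumed subset, and unlike the modified script it treats `<br>` as a line break instead of removing it.

```javascript
// Hypothetical helper, not part of the original answer: a regex-only
// conversion that does not depend on XmlService.
function htmlParagraphsToText(html) {
  return String(html)
    // Ignore whitespace between one paragraph's end and the next one's start.
    .replace(/<\/p>\s+<p/gi, '</p><p')
    // Treat <br>, <br/> and <br /> as line breaks (assumption: differs from the answer).
    .replace(/<br\s*\/?>/gi, '\n')
    // End every paragraph, including empty ones, with a newline.
    .replace(/<\/p\s*>/gi, '\n')
    // Drop all remaining tags, with or without attributes.
    .replace(/<[^>]*>/g, '')
    // Decode a few common entities (assumed subset; &amp; is decoded last).
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&')
    // Remove trailing spaces at the end of each line.
    .replace(/[ \t]+$/gm, '')
    // Collapse any run of blank lines into a single blank line.
    .replace(/\n{3,}/g, '\n\n')
    .trim();
}
```

Because it uses only string methods, this sketch could be called in place of `toStringFromHtml` inside the loop in `pullDataFromWorkday`. It is still a pattern-based approach, so please test it against your actual data as well.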