
Google Apps Script; Docs; convert selected element to HTML


I am just starting with Google Apps Script and am following the Docs add-on quickstart:

https://developers.google.com/apps-script/quickstart/docs

In the quickstart, you create a simple add-on that gets the selection from a document and translates it with the LanguageApp service. The example extracts the selected text with this function:

function getSelectedText() {
  var selection = DocumentApp.getActiveDocument().getSelection();
  if (selection) {
    var text = [];
    var elements = selection.getSelectedElements();
    for (var i = 0; i < elements.length; i++) {
      if (elements[i].isPartial()) {
        var element = elements[i].getElement().asText();
        var startIndex = elements[i].getStartOffset();
        var endIndex = elements[i].getEndOffsetInclusive();

        text.push(element.getText().substring(startIndex, endIndex + 1));
      } else {
        var element = elements[i].getElement();
        // Only translate elements that can be edited as text; skip images and
        // other non-text elements.
        if (element.editAsText) {
          var elementText = element.asText().getText();
          // This check is necessary to exclude images, which return a blank
          // text element.
          if (elementText != '') {
            text.push(elementText);
          }
        }
      }
    }
    if (text.length == 0) {
      throw 'Please select some text.';
    }
    return text;
  } else {
    throw 'Please select some text.';
  }
}

It returns only the plain text (via element.getText()), without any formatting.

I know the underlying object is not HTML, but is there a way to get the selection converted into an HTML string? For example, suppose the selection mixes formatting, with the word "bold" set in bold:

this is a sample with bold text

Is there any method, extension, library, etc. (something like element.getHTML()) that could return this?

this is a sample with <b>bold</b> text

instead of this?

this is a sample with bold text


Solution

  • There is a script GoogleDoc2HTML by Omar AL Zabir. Its purpose is to convert the entire document into HTML. Since you only want to convert rich text within the selected element, the function relevant to your task is processText from the script, shown below.

    The method getTextAttributeIndices returns the starting offset of each run of uniformly formatted text, so a new index appears wherever an attribute changes, for example from normal to bold or back. If it returns only one index (no change at all), the same attributes apply to the entire Text element (typically the text of a whole paragraph); the first branch of the if statement handles this case.

    The second part deals with the general case, looping over the indices and inserting HTML markup corresponding to the attributes.

    The script isn't maintained, so treat it as a starting point for your own code rather than a ready-to-use library. Some unmerged PRs improve the conversion process, in particular for inline links.

    function processText(item, output) {
      var text = item.getText();
      var indices = item.getTextAttributeIndices();

      // Escape characters that would otherwise be parsed as HTML markup.
      function escapeHtml(s) {
        return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
                .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
      }

      if (indices.length <= 1) {
        // A single run: the whole element has the same formatting.
        // Assuming that a whole para fully italic is a quote
        if (item.isBold()) {
          output.push('<b>' + escapeHtml(text) + '</b>');
        }
        else if (item.isItalic()) {
          output.push('<blockquote>' + escapeHtml(text) + '</blockquote>');
        }
        else if (/^https?:\/\//.test(text.trim())) {
          var url = escapeHtml(text.trim());
          output.push('<a href="' + url + '" rel="nofollow">' + url + '</a>');
        }
        else {
          output.push(escapeHtml(text));
        }
      }
      else {

        for (var i = 0; i < indices.length; i++) {
          var partAtts = item.getAttributes(indices[i]);
          var startPos = indices[i];
          var endPos = i + 1 < indices.length ? indices[i + 1] : text.length;
          var partText = text.substring(startPos, endPos);

          Logger.log(partText);  // debug output

          if (partAtts.ITALIC) {
            output.push('<i>');
          }
          if (partAtts.BOLD) {
            output.push('<b>');
          }
          if (partAtts.UNDERLINE) {
            output.push('<u>');
          }

          // If someone has written [xxx] and given it its own formatting, so that
          // it forms a separate run, treat it as a reference and make it superscript.
          // Actual superscript formatting can also be detected with
          // item.getTextAlignment(startPos) == DocumentApp.TextAlignment.SUPERSCRIPT.
          if (partText.indexOf('[') == 0 && partText[partText.length - 1] == ']') {
            output.push('<sup>' + escapeHtml(partText) + '</sup>');
          }
          else if (/^https?:\/\//.test(partText.trim())) {
            var partUrl = escapeHtml(partText.trim());
            output.push('<a href="' + partUrl + '" rel="nofollow">' + partUrl + '</a>');
          }
          else {
            output.push(escapeHtml(partText));
          }

          // Close the tags in reverse order of opening so they nest correctly.
          if (partAtts.UNDERLINE) {
            output.push('</u>');
          }
          if (partAtts.BOLD) {
            output.push('</b>');
          }
          if (partAtts.ITALIC) {
            output.push('</i>');
          }

        }
      }
    }
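processText always converts a whole Text element, while the quickstart's selection can be partial (isPartial(), getStartOffset(), getEndOffsetInclusive()). A minimal sketch of the same run-based idea that also clips each run to the selected range is below. It is not part of GoogleDoc2HTML: runsToHtml and escapeHtml are hypothetical names, and it assumes the object returned by Text.getAttributes(offset) exposes BOLD, ITALIC and UNDERLINE keys (as processText relies on) plus LINK_URL. Because it takes the text, the run indices and an attribute-lookup callback as arguments, it has no Apps Script dependency and can be tried outside the editor.

```javascript
// Hypothetical helper: escape characters that are special in HTML text and
// attribute values.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;');
}

// Hypothetical helper: convert (part of) a Text element to HTML.
//   text      - full text of the element (Text.getText())
//   indices   - run start offsets (Text.getTextAttributeIndices())
//   getAttrs  - callback returning the attribute object for a run start offset
//               (assumed to use the BOLD / ITALIC / UNDERLINE / LINK_URL keys)
//   startOffset, endOffsetInclusive - optional selection bounds, as returned
//               by RangeElement.getStartOffset() / getEndOffsetInclusive()
function runsToHtml(text, indices, getAttrs, startOffset, endOffsetInclusive) {
  var selStart = startOffset === undefined ? 0 : startOffset;
  var selEnd = endOffsetInclusive === undefined ? text.length
                                                : endOffsetInclusive + 1; // exclusive
  // Make sure the first run starts at offset 0.
  var starts = indices.length > 0 && indices[0] === 0 ? indices : [0].concat(indices);
  var out = [];
  for (var i = 0; i < starts.length; i++) {
    var runEnd = i + 1 < starts.length ? starts[i + 1] : text.length;
    // Clip the run to the selected range; skip it if nothing is left.
    var from = Math.max(starts[i], selStart);
    var to = Math.min(runEnd, selEnd);
    if (from >= to) continue;

    var atts = getAttrs(starts[i]);
    var html = escapeHtml(text.substring(from, to));
    // Wrap from the inside out so the tags always nest correctly.
    if (atts.LINK_URL) html = '<a href="' + escapeHtml(atts.LINK_URL) + '">' + html + '</a>';
    if (atts.UNDERLINE) html = '<u>' + html + '</u>';
    if (atts.BOLD) html = '<b>' + html + '</b>';
    if (atts.ITALIC) html = '<i>' + html + '</i>';
    out.push(html);
  }
  return out.join('');
}

// In Apps Script, a selected RangeElement `re` whose element supports
// editAsText() (the same check the quickstart makes) could be converted like this:
//   var t = re.getElement().editAsText();
//   var attrs = function (offset) { return t.getAttributes(offset); };
//   var html = re.isPartial()
//       ? runsToHtml(t.getText(), t.getTextAttributeIndices(), attrs,
//                    re.getStartOffset(), re.getEndOffsetInclusive())
//       : runsToHtml(t.getText(), t.getTextAttributeIndices(), attrs);

// Stand-alone demo with a hand-built run list: "bold" occupies offsets 22-25.
var sample = 'this is a sample with bold text';
var sampleAttrs = function (offset) { return offset === 22 ? { BOLD: true } : {}; };
console.log(runsToHtml(sample, [0, 22, 26], sampleAttrs));
// → this is a sample with <b>bold</b> text
console.log(runsToHtml(sample, [0, 22, 26], sampleAttrs, 17, 24));
// → with <b>bol</b>
```

Wrapping each run from the inside out (link, then underline, bold, italic) is what guarantees proper nesting. Runs that differ only in attributes this sketch ignores, such as font size or colour, produce adjacent, redundant tags like `<b>a</b><b>b</b>`, which is still valid HTML.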