Say you have the following paragraph in a Google Doc and you want to pull out the part of the URL that relates to a car.
Some paragraph with some data in it has a url http://example.com/ford/some/other/data.html. There is also another link: http://example.com/ford/latest.html.
What I am looking for is a way to pull "ford" out of this paragraph so I can use it. For simplicity, assume I know the paragraph number; I will just call it "1" below.
I have tried:
function getData() {
  var paragraphs = DocumentApp.getActiveDocument().getBody().getParagraphs();
  var element = paragraphs[1];
  var re = element.findText('http://example.com/([a-z])+/');
  var data = re.getElement().asText().getText();
  Logger.log(data);
}
The problem is that data contains the entire paragraph text.
Also, is there a way to capture and use the data from a capturing group, i.e. the content in the ()?
I believe your goal is to retrieve ford from values like http://example.com/ford/latest.html and http://example.com/ford/some/other/data.html using Google Apps Script. For this, how about the following modification?
In your script, when element.findText('http://example.com/([a-z])+/') finds a match, re.getElement().asText().getText() returns the whole text of the paragraph, not just the matched part. That does mean the text matching the pattern is contained in element. Using this, how about retrieving values like ford from re.getElement().asText().getText() with a JavaScript regular expression?
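To illustrate the capture-group part of the question: the Apps Script documentation notes that findText() takes its pattern as a string and does not fully support capture groups, so the group is easier to read with a normal JavaScript regex applied to the paragraph text. The following is only a minimal sketch in plain JavaScript. The function name extractFirstSegments is hypothetical, and the sample string is the paragraph from the question.

```javascript
// Hypothetical helper: returns the first path segment of every
// http://example.com/ URL found in `text`.
function extractFirstSegments(text) {
  // The parentheses form capture group 1; matchAll() requires the `g` flag.
  const pattern = /http:\/\/example\.com\/([^\/\s]+)\//g;
  // Each match is array-like: index 0 is the whole match, index 1 is group 1.
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const sample =
  "Some paragraph with some data in it has a url http://example.com/ford/some/other/data.html. " +
  "There is also another link: http://example.com/ford/latest.html.";

console.log(extractFirstSegments(sample)); // ["ford", "ford"]
```

The same regex can be applied to whatever re.getElement().asText().getText() returns in the script.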
From:
var data = re.getElement().asText().getText();
Logger.log(data);
To:
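if (re) {
  // Run a JavaScript regex over the paragraph text; group 1 captures the first path segment.
  var text = re.getElement().asText().getText();
  var data = [...text.matchAll(/http:\/\/example\.com\/([^\/\s]+)\//g)];
  console.log(data.map(([, e]) => e)); // one entry per URL found, e.g. "ford"
} else {
  throw new Error("No match.");
}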
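When the pattern is not found in the paragraph, re is null, so check it before calling re.getElement(). Please be careful about this.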
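Putting the null check and the capture group together, one possible shape for the whole thing is sketched below. This is only an illustration under some assumptions: getUrlSegments is a hypothetical helper name, it returns an empty array instead of throwing when nothing matches, it removes duplicates with a Set, and it assumes the V8 runtime is enabled, since matchAll() and the spread syntax are not available in the older Rhino runtime.

```javascript
// Hypothetical helper: `paragraph` is any object that offers findText(),
// such as a DocumentApp Paragraph.
function getUrlSegments(paragraph) {
  // findText() takes the pattern as a string and returns null when nothing matches.
  const found = paragraph.findText("http://example\\.com/[a-z]+/");
  if (!found) return [];
  // getText() gives the text of the element containing the match, not just the match.
  const text = found.getElement().asText().getText();
  // Capture group 1 holds the first path segment of each URL.
  const pattern = /http:\/\/example\.com\/([a-z]+)\//g;
  const segments = Array.from(text.matchAll(pattern), (m) => m[1]);
  // Drop duplicates while keeping first-seen order.
  return [...new Set(segments)];
}

// Usage inside Apps Script (index 1 is the paragraph from the question).
function getData() {
  const paragraphs = DocumentApp.getActiveDocument().getBody().getParagraphs();
  console.log(getUrlSegments(paragraphs[1]));
}
```

Keeping the document lookup in getData() and the extraction in a separate function makes the extraction easy to try with a stand-in object before running it against a real document.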