I have the following code that should work. I am simply trying to get all the "img" elements from a page into a list in AS so that I can work on that list.
tell application "Safari"
set theWindow to front window
set theTab to current tab of theWindow
set theURL to URL of theTab
set asImages to (do JavaScript "theSearch = document.getElementsByTagName(\"img\");
theImages = [].slice.call(theSearch);
theImages" in theTab)
end tell
If I enter
theSearch = document.getElementsByTagName("img");
theImages = [].slice.call(theSearch);
theImages
into the console in Safari, it works. But when I run the same code from within the "do JavaScript" command, I get nothing back at all; the variable asImages is never defined. I have tried everything that I can think of, to no avail. I am hoping someone with a fresh pair of eyes can quickly spot what I am doing wrong. TIA
TL;DR: To jump straight to the solution, scroll down to the section marked "ADDENDUM" at the bottom.
To summarise what's been discussed in the comments:
getElementsByTagName() returns an HTMLCollection object, which is similar to an array but is not one. AppleScript won't know what it is or what to do with it. You already took care of this by converting it to an array, using a fairly old technique:
theSearch = document.getElementsByTagName("img");
theImages = [].slice.call(theSearch);
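As a rough illustration of what the conversion step does, the sketch below uses a plain array-like object as a stand-in for an HTMLCollection. The object and its contents are made up for the example, since a real collection only exists inside a browser. It shows that [].slice.call() and Array.from() both yield a genuine array:

```javascript
// Hypothetical stand-in for an HTMLCollection: indexed keys plus a length,
// but none of the Array methods.
const fakeCollection = { 0: 'first', 1: 'second', 2: 'third', length: 3 };

// The older technique used in the question.
const viaSlice = [].slice.call(fakeCollection);

// The more modern equivalent.
const viaFrom = Array.from(fakeCollection);

console.log(Array.isArray(fakeCollection)); // false
console.log(Array.isArray(viaSlice));       // true
console.log(Array.isArray(viaFrom));        // true
```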
However, each element of the array is a Node object, which represents an element in the HTML DOM. Again, this is a data type that's completely foreign to AppleScript, so we need to convert every element of the list into something more amenable.
You stated that you want a list of strings. This could mean a list of the src attributes of the img elements, which would be a list of URLs (e.g. "https://..."), or it could mean the literal HTML that defines each img element (e.g. <img src="...">). I'll cover both situations below.
I'm going to use a different method to convert the HTMLCollection into a standard JavaScript array, namely Array.from(). This takes at least one argument, the object you wish to convert to an array, and an optional second argument, a callback function that gets applied to every element in the resulting array. This is very convenient in our case, as we can define this function such that it converts the Node elements into strings for us.
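To make the role of the callback concrete, here is a minimal sketch that runs outside a browser. The fakeImages object and its src/outerHTML values are invented stand-ins for real img elements; only Array.from() itself is the real API:

```javascript
// Hypothetical stand-ins for two <img> elements (not real DOM nodes).
const fakeImages = {
  0: { src: 'https://example.com/a.png', outerHTML: '<img src="https://example.com/a.png">' },
  1: { src: 'https://example.com/b.png', outerHTML: '<img src="https://example.com/b.png">' },
  length: 2,
};

// The second argument is called once per element; its return value
// becomes the corresponding entry of the new array.
const sources = Array.from(fakeImages, img => img.src);

// Every entry is now a plain string rather than an object.
console.log(sources.every(s => typeof s === 'string')); // true
```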
To obtain a list of the src attributes:
Array.from( document.getElementsByTagName('img'),
img => img.src );
Running it on this page returns:
['data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAvAA...',
'https://cdn.sstatic.net/Img/teams/teams-illo-free-...',
'https://www.gravatar.com/avatar/6ecd4d9aedf87d99e7...',
'https://www.gravatar.com/avatar/6ecd4d9aedf87d99e7...',
'https://graph.facebook.com/1261291368/picture?type...',
'https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAA...']
I've truncated the values for easier reading, but the values returned are full URLs.
To obtain a list of HTML <img> declarations:
Array.from( document.getElementsByTagName('img'),
img => img.outerHTML );
which returns:
['<img src="data:image/png;base64,iVBORw0KGgoAAAANSU...',
'<img class="wmx100 mx-auto my8 h-auto d-block" wid...',
'<img src="https://www.gravatar.com/avatar/6ecd4d9a...',
'<img src="https://www.gravatar.com/avatar/6ecd4d9a...',
'<img src="https://graph.facebook.com/1261291368/pi...',
'<img src="https://lh3.googleusercontent.com/-XdUIq...']
Again, the values actually returned are not truncated. To illustrate, here's the last item in the array, formatted over two lines for readability:
'<img src="https://lh3.googleusercontent.com/-XdUIq…er avatar"
width="32" height="32" class="bar-sm">'
Using the second case from above, this can be implemented in AppleScript in a single statement:
tell application id "com.apple.Safari" to tell document 1 ¬
to set images to do JavaScript "Array.from( document
.getElementsByTagName('img'),
i=>i.outerHTML
);"
JavaScript statements can be split over multiple lines, and they should always be terminated with a semicolon. I've also used a different identifier for the callback function's parameter, namely "i", just to demonstrate that the identifier used makes no difference.
ADDENDUM
In macOS 12.6 (Monterey) using Safari 16.0 (17614.1.25.9.10, 17614), the Array.from() method doesn't appear to apply the callback function to the array. It simply returns the list of HTMLImageElement objects, which AppleScript cannot use.
I checked to make sure it was Safari at fault here and not me, and this seems to be the case.
The callback argument of Array.from() is essentially shorthand for calling the map() method on the resulting array, which we can do ourselves pretty easily, so that instead of this:
Array.from(document.getElementsByTagName('img'),
img => img.outerHTML);
we do this:
Array.from(document.getElementsByTagName('img')
).map(img => img.outerHTML);
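The two forms can be compared side by side with a small sketch. Here standInImages is an invented array-like object used in place of the HTMLCollection so that the snippet runs outside a browser; it is not part of the original answer:

```javascript
// Hypothetical stand-ins for <img> elements (not real DOM nodes).
const standInImages = {
  0: { outerHTML: '<img src="a.png" alt="A">' },
  1: { outerHTML: '<img src="b.png" alt="B">' },
  length: 2,
};

// One step: conversion and mapping in a single call.
const oneStep = Array.from(standInImages, img => img.outerHTML);

// Two steps: convert first, then map the resulting array.
const twoSteps = Array.from(standInImages).map(img => img.outerHTML);

// Both approaches produce the same list of strings.
console.log(JSON.stringify(oneStep) === JSON.stringify(twoSteps)); // true
```

The two-step form builds an intermediate array, which should be negligible for a page's worth of images.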
which does work in my version of Safari. Here's the AppleScript:
tell application id ("com.apple.Safari") ¬
to tell the front document to if ¬
it exists then set images to the ¬
do JavaScript "Array.from(document
.getElementsByTagName('img')).map(
i => i.outerHTML);"
which, when run on the Stack Exchange home page, returns for me:
{"<img src=\"/Content/Img/hero/close.png\" alt=\"close\">", ¬
"<img src=\"/Content/Img/hero/bubble.png\" alt=\"Speech bubbles\">", ¬
"<img src=\"/Content/Img/hero/vote.png\" alt=\"Voting arrows\">", ¬
"<img src=\"/Content/Img/hero/check.png\" alt=\"checkmark\">", ¬
"<img src=\"https://...\" alt=\"Mathematics Stack Exchange\">", ¬
"<img src=\"https://...\" alt=\"Code Golf Stack Exchange\">", ¬
"<img src=\"https://...\" alt=\"Retrocomputing Stack Exchange\">", ¬
"<img src=\"https://...\" alt=\"Politics Stack Exchange\">", ¬
"<img src=\"https://...\" alt=\"Mathematica Stack Exchange\">", ...}
Replace ".outerHTML" with ".src" to obtain the list of image URLs by themselves.
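If both variants are needed, the property name can be passed in as a parameter. The helper below is a hypothetical sketch: the name imageStrings and the fakeDocument object are my own, used only so the example runs without a browser. In Safari you would pass the real document object instead:

```javascript
// Hypothetical helper: collect one property, as a string, from every <img>.
function imageStrings(doc, property) {
  return Array.from(doc.getElementsByTagName('img'))
    .map(img => String(img[property]));
}

// Invented stand-in for the browser's document object.
const fakeDocument = {
  getElementsByTagName: () => [
    { src: 'https://example.com/a.png', outerHTML: '<img src="https://example.com/a.png">' },
  ],
};

console.log(imageStrings(fakeDocument, 'src'));
console.log(imageStrings(fakeDocument, 'outerHTML'));
```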