Getting a JavaScript HTMLCollection from Safari into AppleScript


I have the following code that should work. I am simply trying to get all the "img" elements from a page into a list in AS so that I can work on that list.

tell application "Safari"
    set theWindow to front window
    set theTab to current tab of theWindow
    set theURL to URL of theTab
    set asImages to (do JavaScript "theSearch = document.getElementsByTagName(\"img\");
            theImages = [].slice.call(theSearch);
            theImages" in theTab)
end tell

If I enter in

theSearch = document.getElementsByTagName("img");
theImages = [].slice.call(theSearch);
theImages

into the console in Safari, it works. But when I run the same code from within the "do JavaScript" command, as above, I get nothing back: the variable asImages is not created at all. I have tried everything I can think of, to no avail. I am hoping someone with a fresh pair of eyes can spot what I am doing wrong. TIA


Solution

  • TL;DR: To jump straight to the solution, scroll down to the section marked "ADDENDUM" at the bottom.


    To summarise what's been discussed in the comments:

    getElementsByTagName() returns an HTMLCollection object, which is similar to an array, but it’s not an array. AppleScript won't know what it is nor what to do with it. You already took care of this by converting it to an array, using a fairly old technique:

    theSearch = document.getElementsByTagName("img");
    theImages = [].slice.call(theSearch);
    

    However, each element of the array is a Node object, which represents an element in the HTML DOM. Again, this is a data type that's completely foreign to AppleScript, so we need to convert every element of the list into something more amenable.
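
    The distinction can be seen outside the browser with a minimal sketch. The fakeCollection object below is a hypothetical stand-in for an HTMLCollection (it is not a real DOM object, and its URLs are placeholders); it only mimics the indexed entries and length property that [].slice.call() relies on.

```javascript
// Hypothetical stand-in for an HTMLCollection: indexed entries plus a length,
// but none of the Array methods. Real DOM nodes are replaced by plain objects.
const fakeCollection = {
  0: { tagName: 'IMG', src: 'https://example.com/a.png' },
  1: { tagName: 'IMG', src: 'https://example.com/b.png' },
  length: 2,
};

// An array-like object is not an array.
const isArrayBefore = Array.isArray(fakeCollection); // false

// The older technique: borrow Array.prototype.slice to copy the indexed
// entries into a genuine array.
const asArray = [].slice.call(fakeCollection);
const isArrayAfter = Array.isArray(asArray); // true

// The elements are still objects, not strings, so one more step is needed
// to turn each of them into text.
const elementTypes = asArray.map(item => typeof item); // ['object', 'object']

console.log(isArrayBefore, isArrayAfter, elementTypes);
```

    The conversion yields a real array, but its elements are still objects, which is why a further mapping to strings is needed.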

    You stated that you want a list of strings. This could mean a list of the src attributes of the img elements, which would be a list of URLs (e.g. "https://..."), or it could mean the literal HTML that defines each img element (e.g. <img src="...">). I'll cover both situations below.

    I'm going to use a different method to convert the HTMLCollection into a standard JavaScript array, namely Array.from(). This takes at least one argument (the object you wish to convert to an array) and an optional second, which is a callback function that gets applied to every element as the array is built. This is very convenient in our case, as we can define this function such that it converts the Node elements into strings for us.
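
    As a rough sketch of how the two arguments fit together, the snippet below calls Array.from() on a hypothetical stand-in object, fakeImages, invented here so that the code runs outside a browser (the URLs are placeholders, not real data).

```javascript
// Hypothetical stand-in for the HTMLCollection returned by
// document.getElementsByTagName('img'); plain objects replace DOM nodes.
const fakeImages = {
  0: { src: 'https://example.com/one.png' },
  1: { src: 'https://example.com/two.png' },
  length: 2,
};

// One argument: the array-like object is copied into a real array,
// and the elements are left untouched.
const nodes = Array.from(fakeImages);

// Two arguments: the callback is applied to each element while the
// array is being built, so the result holds strings instead of objects.
const urls = Array.from(fakeImages, img => img.src);

console.log(nodes.length, urls);
```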

    To obtain a list of the src attributes:

    Array.from( document.getElementsByTagName('img'),
                img => img.src );
    

    Running it on this page returns:

    ['...',
     'https://cdn.sstatic.net/Img/teams/teams-illo-free-...',
     'https://www.gravatar.com/avatar/6ecd4d9aedf87d99e7...',
     'https://www.gravatar.com/avatar/6ecd4d9aedf87d99e7...',
     'https://graph.facebook.com/1261291368/picture?type...',
     'https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAA...']
    

    I've truncated the values for easier reading, but the values returned are full URLs.

    To obtain a list of HTML <img> declarations:

    Array.from( document.getElementsByTagName('img'),
                img => img.outerHTML );
    

    which returns:

    ['<img src="...',
     '<img class="wmx100 mx-auto my8 h-auto d-block" wid...', 
     '<img src="https://www.gravatar.com/avatar/6ecd4d9a...', 
     '<img src="https://www.gravatar.com/avatar/6ecd4d9a...', 
     '<img src="https://graph.facebook.com/1261291368/pi...', 
     '<img src="https://lh3.googleusercontent.com/-XdUIq...']
    

    Again, I've truncated the values here, but the values returned are complete. To exemplify, here's the last item in the array in full, formatted over two lines for readability:

     '<img src="https://lh3.googleusercontent.com/-XdUIq…er avatar"
           width="32" height="32" class="bar-sm">'
    

    The AppleScript

    Using the second case from above, this can be implemented in AppleScript in a single statement:

    tell application id "com.apple.Safari" to tell document 1 ¬
        to set images to do JavaScript "Array.from( document
                                       .getElementsByTagName('img'),
                                       i => i.outerHTML
                                       );"
    

    JavaScript statements can be split over multiple lines, and they should always be terminated with a semicolon. I've also used a different identifier in the JS code for the callback function's parameter, namely "i", just to demonstrate that the identifier used makes no difference.


    ADDENDUM

    In macOS 12.6 (Monterey) using Safari 16.0 (17614.1.25.9.10, 17614), the Array.from() method doesn't appear to apply the callback function to the elements, and simply returns the list of HTMLImageElement objects, which AppleScript cannot handle.

    I checked to make sure it was Safari at fault here and not me, and this seems to be the case.

    The callback argument of Array.from() is essentially shorthand for calling the map() method (Array.prototype.map) on the resulting array, which we can do ourselves pretty easily, so that instead of this:

    Array.from(document.getElementsByTagName('img'),
               img => img.outerHTML);
    

    we do this:

    Array.from(document.getElementsByTagName('img')
              ).map(img => img.outerHTML);
    

    which does work in my version of Safari. Here's the AppleScript:

    tell application id ("com.apple.Safari") ¬
            to tell the front document to if ¬
            it exists then set images to the ¬
            do JavaScript "Array.from(document
            .getElementsByTagName('img')).map(
            i => i.outerHTML);"
    

    which, when run on the StackExchange home page, returns for me:

    {"<img src=\"/Content/Img/hero/close.png\" alt=\"close\">", ¬
     "<img src=\"/Content/Img/hero/bubble.png\" alt=\"Speech bubbles\">", ¬ 
     "<img src=\"/Content/Img/hero/vote.png\" alt=\"Voting arrows\">", ¬
     "<img src=\"/Content/Img/hero/check.png\" alt=\"checkmark\">", ¬
     "<img src=\"https://...\" alt=\"Mathematics Stack Exchange\">", ¬
     "<img src=\"https://...\" alt=\"Code Golf Stack Exchange\">", ¬
     "<img src=\"https://...\" alt=\"Retrocomputing Stack Exchange\">", ¬
     "<img src=\"https://...\" alt=\"Politics Stack Exchange\">", ¬
     "<img src=\"https://...\" alt=\"Mathematica Stack Exchange\">", ...}
    

    Replace ".outerHTML" with ".src" to obtain the list of image URLs by themselves.
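
    The two forms, and the choice between the two properties, can be compared outside Safari with a small sketch. Here fakeImgs is a hypothetical stand-in for the HTMLCollection (plain objects with made-up src and outerHTML values), and collect is a helper name invented for this illustration. In an engine that follows the specification, both forms should give the same result for a one-parameter callback like this one.

```javascript
// Hypothetical stand-in for document.getElementsByTagName('img');
// the values are placeholders, not output from a real page.
const fakeImgs = {
  0: { src: 'https://example.com/a.png',
       outerHTML: '<img src="https://example.com/a.png">' },
  1: { src: 'https://example.com/b.png',
       outerHTML: '<img src="https://example.com/b.png">' },
  length: 2,
};

// Hypothetical helper: convert first, then map, as in the addendum.
// `prop` selects which string to extract from each element.
function collect(collection, prop) {
  return Array.from(collection).map(el => el[prop]);
}

// Convert-then-map form versus the callback-argument form.
const viaMap = collect(fakeImgs, 'outerHTML');
const viaCallback = Array.from(fakeImgs, el => el.outerHTML);

// Swapping the property name yields the URLs by themselves.
const urlsOnly = collect(fakeImgs, 'src');

console.log(viaMap, viaCallback, urlsOnly);
```

    Either way, the result is an array of plain strings, which is the kind of value that comes back to AppleScript as a list of text items.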