
AppleScript: substring to string or format html


I'm working on an AppleScript and I'm stuck. Take this snippet of HTML as an example:

<body><div>Apple don't behave accordingly <a href = "http://apple.com">apple</a></div></body>

What I need is to return the text without the HTML tags, either by deleting each pair of angle brackets along with everything inside it, or by some other way of converting HTML to plain text.

The result should be:

Apple don't behave accordingly apple


Solution

  • How about using textutil?

    on run -- example (don't forget to escape quotes)
        removeMarkup from "<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>"
    end run
    
    to removeMarkup from someText -- strip HTML using textutil
        set someText to quoted form of ("<!DOCTYPE HTML PUBLIC>" & someText) -- fake a HTML document header
        return (do shell script "echo " & someText & " | /usr/bin/textutil -stdin -convert txt -stdout") -- strip HTML
    end removeMarkup
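
For comparison, the other idea from the question (deleting each bracket with everything in it) can be sketched in plain AppleScript without a shell call. This is only a rough sketch, not part of the answer above: the handler name stripTags is made up for illustration, it does not decode entities such as &amp;, and it will misbehave if a ">" appears inside an attribute value, so textutil is probably the safer choice for real HTML.

```applescript
-- Hypothetical helper (an assumption, not from the answer above):
-- drop every character between "<" and ">", keep everything else.
on stripTags(someText)
	set plainText to ""
	set insideTag to false
	repeat with c in (characters of someText)
		set ch to contents of c -- dereference the loop item to get a plain string
		if ch is "<" then
			set insideTag to true -- a tag starts here, stop copying
		else if ch is ">" then
			set insideTag to false -- the tag ended, resume copying
		else if not insideTag then
			set plainText to plainText & ch
		end if
	end repeat
	return plainText
end stripTags

on run -- example using the snippet from the question
	stripTags("<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>")
end run
```

With the snippet from the question this should return the expected text, Apple don't behave accordingly apple. Building the string one character at a time is slow for long documents, which is another reason to prefer textutil there.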