Problem with text formatting in AppleScript


I have been trying to tackle a problem I am having extracting text from a website and filtering it to get the information I want. I have gotten to the point where I create a TextEdit file from the website that looks like this:

7:00
Name of Meeting: Location Bad
Address
Area
8:00
Name of Meeting: Location Good
Address
Area
Noon
Name of Meeting: Location Good 2
Address
Area
3:00 pm
Name of Meeting: Location Bad 2
Area

My goal is to extract all meetings at certain locations (Location Good and Location Good 2), ideally reduced to just this information: Time @ Location Good, Time @ Location Good 2.

I do not know how to format the text to get this done. I have tried filtering it in Automator, but because the information is spread across separate lines, the filter returns only the line containing the keyword I am filtering on. As a workaround, I filter it by hand and have an AppleScript send me a text message with the result. This works for now, but whenever the information on the website changes, mine will be out of date.

Here is the website: https://loukyaa.org/meetings/?tsml-day=6&tsml-region=louisville

Question is: how do I manipulate the text in order to filter the information that I want? I am interested in filtering all meetings for "Icehouse" and "Token 3 Club." Thank you!


Solution

  • With the incomplete information presented in your question, let me offer a solution for both Safari and Google Chrome. It opens the target URL in a new window, uses JavaScript to get the inner text of the table of meetings, closes the window, and filters the text into the form Time @ Location, e.g. 7:00 am @ Token 3 Club, keeping only the meeting times and locations for Icehouse and Token 3 Club.

    In this use case, the JavaScript returns paragraphs of tab-delimited text, stored in the variable foo. That text is filtered using awk in a do shell script command, and the final output is stored in a variable named bar, which you can then do whatever you'd like with.
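
The awk filtering step can also be tried on its own in a terminal. The following is a minimal sketch, not part of the original scripts: the sample rows and their column layout (time, meeting name, location, address, region) are assumptions made for illustration, and filter_meetings is a hypothetical helper name. Only the awk program itself mirrors the one used in the AppleScript code.

```shell
#!/bin/sh
# Hypothetical tab-separated sample rows standing in for the page text:
# time, meeting name, location, address, region (assumed layout).
sample=$(printf '%s\t%s\t%s\t%s\t%s\n' \
  '7:00 am' 'Morning Group' 'Token 3 Club' '1 Example St' 'Louisville' \
  '8:00 am' 'Early Risers' 'Somewhere Else' '2 Example St' 'Louisville' \
  '10:30 am' 'Midday Group' 'Icehouse' '3 Example St' 'Louisville')

# Same awk program as in the AppleScript code: split on tabs, keep rows
# that mention either location, print field 1 and field 3 joined by " @ ".
filter_meetings() {
  awk 'BEGIN { FS = "\t"; OFS = " @ " } /Icehouse|Token 3 Club/ { print $1, $3 }'
}

result=$(printf '%s\n' "$sample" | filter_meetings)
printf '%s\n' "$result"
```

Note that the pattern is matched against the whole row, so a row whose meeting name or address happens to contain one of the location names would be kept as well. Restricting the match to the third field ($3 ~ /Icehouse|Token 3 Club/) is one way to tighten it, if that matters for your data.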

    The following example AppleScript code is for Safari:

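    set theURL to "https://loukyaa.org/meetings/?tsml-day=6&tsml-region=louisville"
    
    -- Open the meetings page in a new Safari window.
    tell application "Safari" to make new document with properties {URL:theURL}
    
    -- Wait until the toolbar shows the "Reload this page" button,
    -- which indicates that the page has finished loading.
    tell application "System Events"
        repeat until exists ¬
            (buttons of UI elements of groups of toolbar 1 of window 1 of ¬
                application process "Safari" whose name = "Reload this page")
            delay 0.5
        end repeat
    end tell
    
    -- Get the text of the meetings table, then close the window.
    tell application "Safari"
        set foo to do JavaScript ¬
            "document.getElementById('meetings_tbody').innerText;" in document 1
        close its front window
    end tell
    
    -- Keep rows that mention either location; print the time ($1) and location ($3).
    set awkCommand to ¬
        "awk 'BEGIN{FS=\"\t\"; OFS=\" @ \"}/Icehouse|Token 3 Club/{print $1,$3}'"
    
    -- Pass the page text to awk as a here-string; bar receives the filtered lines.
    set bar to do shell script awkCommand & " <<< " & foo's quoted form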
    
    • NOTE: This code was tested under macOS High Sierra. For macOS Mojave and later, remove the words buttons of from the repeat until exists ¬ ... code.

    • NOTE: do JavaScript only works if Allow JavaScript from Apple Events is checked in the Safari > Develop menu. That menu is hidden by default and can be shown by checking [√] Show Develop menu in menu bar in: Safari > Preferences… > Advanced


    The following example AppleScript code is for Google Chrome:

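    set theURL to "https://loukyaa.org/meetings/?tsml-day=6&tsml-region=louisville"
    
    tell application "Google Chrome"
        -- Open the meetings page in a new window and wait for it to load.
        set URL of active tab of (make new window) to theURL
        repeat until (loading of tab 1 of window 1 is false)
            delay 0.5
        end repeat
        -- Get the text of the meetings table, then close the window.
        tell active tab of front window to set foo to ¬
            execute javascript ¬
                "document.getElementById('meetings_tbody').innerText;"
        close its front window
    end tell
    
    -- Keep rows that mention either location; print the time ($1) and location ($3).
    set awkCommand to ¬
        "awk 'BEGIN{FS=\"\t\"; OFS=\" @ \"}/Icehouse|Token 3 Club/{print $1,$3}'"
    
    -- Pass the page text to awk as a here-string; bar receives the filtered lines.
    set bar to do shell script awkCommand & " <<< " & foo's quoted form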
    

    NOTE: This should work by default, as Google Chrome allows execution of JavaScript.


    In either case the variable bar contains e.g.:

    7:00 am @ Token 3 Club
    8:00 am @ Token 3 Club
    8:30 am @ Icehouse
    8:30 am @ Icehouse
    10:30 am @ Icehouse
    2:00 pm @ Token 3 Club
    4:00 pm @ Token 3 Club
    6:00 pm @ Icehouse
    6:00 pm @ Icehouse
    6:00 pm @ Token 3 Club
    8:00 pm @ Icehouse
    8:00 pm @ Token 3 Club
    10:30 pm @ Token 3 Club
    

    You can then do with it as you wish.

    Also note that the \t in the FS=\"\t\"; portion of the awk command is converted to a normal tab character when the script is compiled in, e.g., Script Editor. Writing it as \t is necessary when posting code on this site; otherwise it would display as, e.g., FS=\" \"; and the copied code would no longer contain a tab character once compiled.
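
To see why the tab matters, here is a small sketch that runs the same print statement with a tab and with a single space as the field separator. The sample row is hypothetical. With FS set to a single space, awk falls back to its default splitting on any run of whitespace, so a time such as 7:00 am is split into two fields and $3 no longer points at the location.

```shell
#!/bin/sh
# One hypothetical tab-separated row: time, meeting name, location.
row=$(printf '7:00 am\tMorning Group\tToken 3 Club')

# Tab as field separator: $1 is the full time, $3 is the location.
with_tab=$(printf '%s\n' "$row" |
  awk 'BEGIN { FS = "\t"; OFS = " @ " } { print $1, $3 }')

# Single space as field separator: awk splits on every run of blanks and tabs,
# so $1 is "7:00" and $3 is the first word of the meeting name.
with_space=$(printf '%s\n' "$row" |
  awk 'BEGIN { FS = " "; OFS = " @ " } { print $1, $3 }')

printf 'tab:   %s\nspace: %s\n' "$with_tab" "$with_space"
# tab:   7:00 am @ Token 3 Club
# space: 7:00 @ Morning
```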


    Note: The example AppleScript code is just that and does not contain any additional error handling as may be appropriate. The onus is upon the user to add any error handling as may be appropriate, needed or wanted. Have a look at the try statement and error statement in the AppleScript Language Guide. See also, Working with Errors. Additionally, the use of the delay command may be necessary between events where appropriate, e.g. delay 0.5, with the value of the delay set appropriately.
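
If you would rather keep working from the line-per-field text file shown in the question, the same idea can be sketched with awk alone. This is an illustration under stated assumptions, not something tested against the real page: it assumes the time is always the line directly above the "Name of Meeting: Location" line, it uses the placeholder locations from the question, and extract_meetings is a hypothetical helper name.

```shell
#!/bin/sh
# Sample text copied from the question (one field per line).
sample='7:00
Name of Meeting: Location Bad
Address
Area
8:00
Name of Meeting: Location Good
Address
Area
Noon
Name of Meeting: Location Good 2
Address
Area
3:00 pm
Name of Meeting: Location Bad 2
Area'

# Remember the previous line; when a wanted location line is seen,
# strip everything up to ": " and print "<previous line> @ <location>".
extract_meetings() {
  awk '
    /: Location Good( 2)?$/ {
      location = $0
      sub(/^.*: /, "", location)
      print prev " @ " location
    }
    { prev = $0 }
  '
}

result=$(printf '%s\n' "$sample" | extract_meetings)
printf '%s\n' "$result"
```

For the real data you would replace the Location Good pattern with the actual location names, e.g. Icehouse and Token 3 Club, and check that the saved file really puts the time on the line above the name.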