I have been trying to tackle a problem I am having extracting text from a website and filtering it to get the information I want. I have gotten to the point where I create a TextEdit file from the website that looks like this:
7:00
Name of Meeting: Location Bad
Address
Area
8:00
Name of Meeting: Location Good
Address
Area
Noon
Name of Meeting: Location Good 2
Address
Area
3:00 pm
Name of Meeting: Location Bad 2
Area
My goal is to extract all meetings at certain locations (Location Good and Location Good 2). Ideally, the output would contain just this information: Time @ Location Good, Time @ Location Good 2.
I do not know how to format the text in order to get this done. I have tried filtering it in Automator, but since the information is spread across separate lines, the filter returns only the line containing the keyword. To work around this, I've filtered it by hand and set up an AppleScript to send me a text message with that hand-filtered information. This works for now, but when the information on the website changes, mine will be out of date.
Here is the website: https://loukyaa.org/meetings/?tsml-day=6&tsml-region=louisville
Question is: how do I manipulate the text in order to filter the information that I want? I am interested in filtering all meetings for "Icehouse" and "Token 3 Club." Thank you!
With the incomplete information presented in your question, let me offer a solution for both Safari and Google Chrome. It opens the target URL in a new window, uses JavaScript to get the inner text of the table of meetings, closes the window, and filters the text to the form Time @ Location, e.g. 7:00 am @ Token 3 Club, giving the meeting time and location for Icehouse and Token 3 Club.
In this use case, the JavaScript returns paragraphs of tab-delimited text in the variable foo, which is then filtered using awk in a do shell script command. The final output is stored in a variable named bar, which you can then do whatever you'd like with.
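To see the awk filter in isolation, here is a minimal shell sketch that runs it on a small hand-made sample instead of the live page. The sample rows, the column layout beyond field 1 (time) and field 3 (location), and the function name filter_meetings are assumptions made for illustration only; they are not taken from the website.

```shell
# Hypothetical sample standing in for the tab-delimited text held in foo.
# Assumed columns: time, meeting name, location, address, region.
sample="$(printf '%s\t%s\t%s\t%s\t%s\n' \
  '7:00 am' 'Morning Group' 'Token 3 Club' '123 Example St' 'Louisville' \
  '8:00 am' 'Other Group' 'Somewhere Else' '456 Example Ave' 'Louisville' \
  '8:30 am' 'Early Group' 'Icehouse' '789 Example Rd' 'Louisville')"

# Split input on tabs, join output fields with " @ ", and print the
# time (field 1) and location (field 3) of rows that mention either place.
filter_meetings() {
  awk 'BEGIN { FS = "\t"; OFS = " @ " } /Icehouse|Token 3 Club/ { print $1, $3 }'
}

result="$(printf '%s\n' "$sample" | filter_meetings)"
printf '%s\n' "$result"
```

With this sample, the row for Somewhere Else is dropped and the two remaining rows are printed as Time @ Location.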
The following example AppleScript code is for Safari:
set theURL to "https://loukyaa.org/meetings/?tsml-day=6&tsml-region=louisville"

-- Open the page in a new Safari window.
tell application "Safari" to make new document with properties {URL:theURL}

-- Wait until the Reload button exists, which signals the page has finished loading.
tell application "System Events"
    repeat until exists ¬
        (buttons of UI elements of groups of toolbar 1 of window 1 of ¬
            application process "Safari" whose name = "Reload this page")
        delay 0.5
    end repeat
end tell

-- Get the tab-delimited text of the meetings table, then close the window.
tell application "Safari"
    set foo to do JavaScript ¬
        "document.getElementById('meetings_tbody').innerText;" in document 1
    close its front window
end tell

-- Print time (field 1) and location (field 3) of matching rows as "Time @ Location".
set awkCommand to ¬
    "awk 'BEGIN{FS=\"\t\"; OFS=\" @ \"}/Icehouse|Token 3 Club/{print $1,$3}'"
set bar to do shell script awkCommand & " <<< " & foo's quoted form
NOTE: This code was tested under macOS High Sierra. For macOS Mojave and later, remove the words buttons of from the repeat until exists ¬ ... code.
NOTE: do JavaScript only works if Allow JavaScript from Apple Events is checked on the Safari > Develop menu, which is hidden by default and can be shown by checking [√] Show Develop menu in menu bar in Safari > Preferences… > Advanced.
The following example AppleScript code is for Google Chrome:
set theURL to "https://loukyaa.org/meetings/?tsml-day=6&tsml-region=louisville"

tell application "Google Chrome"
    -- Open the page in a new window and wait for it to finish loading.
    set URL of active tab of (make new window) to theURL
    repeat until (loading of tab 1 of window 1 is false)
        delay 0.5
    end repeat
    -- Get the tab-delimited text of the meetings table, then close the window.
    tell active tab of front window to set foo to ¬
        execute javascript ¬
            "document.getElementById('meetings_tbody').innerText;"
    close its front window
end tell

-- Print time (field 1) and location (field 3) of matching rows as "Time @ Location".
set awkCommand to ¬
    "awk 'BEGIN{FS=\"\t\"; OFS=\" @ \"}/Icehouse|Token 3 Club/{print $1,$3}'"
set bar to do shell script awkCommand & " <<< " & foo's quoted form
NOTE: This should work by default, as Google Chrome allows execution of JavaScript.
In either case, the variable bar contains, e.g.:
7:00 am @ Token 3 Club
8:00 am @ Token 3 Club
8:30 am @ Icehouse
8:30 am @ Icehouse
10:30 am @ Icehouse
2:00 pm @ Token 3 Club
4:00 pm @ Token 3 Club
6:00 pm @ Icehouse
6:00 pm @ Icehouse
6:00 pm @ Token 3 Club
8:00 pm @ Icehouse
8:00 pm @ Token 3 Club
10:30 pm @ Token 3 Club
You can then do with it as you wish.
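The sample output above contains some identical lines, presumably because more than one meeting shares the same time and location. If only one line per Time @ Location pair is wanted, one possible follow-up step is to drop repeats while keeping the original order. This is a sketch of an optional extra, not part of the tested AppleScript; the variable names bar_sample and unique are made up for the example, and the input lines are copied from the sample output.

```shell
# A few lines copied from the sample output, including the repeats.
bar_sample="$(printf '%s\n' \
  '8:30 am @ Icehouse' \
  '8:30 am @ Icehouse' \
  '10:30 am @ Icehouse' \
  '6:00 pm @ Icehouse' \
  '6:00 pm @ Icehouse' \
  '6:00 pm @ Token 3 Club')"

# Print a line only the first time it is seen; the order is preserved.
unique="$(printf '%s\n' "$bar_sample" | awk '!seen[$0]++')"
printf '%s\n' "$unique"
```

Unlike sort -u, this keeps the chronological order of the lines, which matters here because the times are not in a sortable format.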
Also note that the FS=\"\t\"; portion of the awk command will expand to a normal tab character when compiled in, e.g., Script Editor. The use of \t is necessary when posting code on this site; otherwise it would show as, e.g., FS=\" \"; and would no longer be a normal tab character once the code is copied and compiled.
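As a side check on the tab handling, awk itself accepts the field separator either as the two-character escape \t inside its program text or as a literal tab character. The sketch below compares the two forms on a single hypothetical row; the row contents and the variable names are assumptions made for illustration.

```shell
# One tab-delimited row with made-up values: time, meeting name, location.
row="$(printf '7:00 am\tMorning Group\tToken 3 Club')"

# Form 1: the escape sequence \t, which awk interprets as a tab.
with_escape="$(printf '%s\n' "$row" |
  awk 'BEGIN { FS = "\t"; OFS = " @ " } { print $1, $3 }')"

# Form 2: a literal tab character, passed in from the shell.
tab="$(printf '\t')"
with_literal="$(printf '%s\n' "$row" |
  awk -v FS="$tab" -v OFS=' @ ' '{ print $1, $3 }')"

printf '%s\n%s\n' "$with_escape" "$with_literal"
```

Both forms split the row on the tab and print the same Time @ Location line, so the filter behaves the same whichever form reaches awk.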
Note: The example AppleScript code is just that and does not contain any additional error handling as may be appropriate. The onus is upon the user to add any error handling as may be appropriate, needed or wanted. Have a look at the try statement and error statement in the AppleScript Language Guide. See also Working with Errors. Additionally, the use of the delay command may be necessary between events where appropriate, e.g. delay 0.5, with the value of the delay set appropriately.