I'm trying to write an AutoIT script that would automatically navigate to a URL (https://www.macrotrends.net/1476/copper-prices-historical-chart-data) via Internet Explorer and click on a button (red coloured, labeled as 'Download Historical Data') that would download an excel data file. By inspecting this button, I've found that it has an id ("dataDownload"), but my attempts of using the _IEGetObjById function in AutoIT have been unsuccessful. Here is my script:
#include <IE.au3>
Local $oIE = _IECreate("https://www.macrotrends.net/1476/copper-prices-historical-chart-data", 1)
Local $oButton = _IEGetObjById($oIE, "dataDownload")
_IEAction($oButton, "click")
The script is opening Internet Explorer, but I think it's not detecting the button as intended. My suspicion is that it could be because the button is located within a frame or container or whatever you might call it?
Also, my intention is to have this AutoIT script called by a batch file on Windows, to be run on a schedule. Please feel free to enlighten if you have any other recommended methods for such automated web-scraping.
Would appreciate all help I could get here please.
The Download historical data
button is inside an iframe that points to this URL.
https://www.macrotrends.net/assets/php/chart_iframe_comp.php?id=1476&url=copper-prices-historical-chart-data
But it needs Javascript to work, meaning, you have to load the page in a browser (that executes javascript) to click that button [^1].
Luckily, there are tools to automate browser interactions. Playwright is a library available in many languages, but we'll use it with Python.
pip
(Python package manager) to the latest version:
python -m pip install --user --upgrade pip
playright
python -m pip install --user playwright
python -m playwright install chromium
python -m playwright open 'https://www.macrotrends.net/1476/copper-prices-historical-chart-data'
A browser window should pop up. Change the Target to python and click Record button in Playwright inspector window.
Then do what you need to do to click & download the file from the page.
Playwright will generate some code for us.
Stop recording and copy the script.
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
with p.chromium.launch(headless=False) as browser:
page = browser.new_page(accept_downloads=True)
page.goto('https://www.macrotrends.net/1476/copper-prices-historical-chart-data')
# paste the script here
In my runs, playwright generated this script for me:
# Click button:has-text("Download Historical Data")
with page.expect_download() as download_info:
page.frame(name="chart_iframe").click("button:has-text(\"Download Historical Data\")")
Combining both and adding the code to save the file, we have:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
print('launching the browser')
with p.chromium.launch(headless=False) as browser:
page = browser.new_page(accept_downloads=True)
print('visiting the page')
page.goto('https://www.macrotrends.net/1476/copper-prices-historical-chart-data')
print('clicking download button')
# Click button:has-text("Download Historical Data")
with page.expect_download() as download_info:
page.frame(name="chart_iframe").click("button:has-text(\"Download Historical Data\")")
download = download_info.value
print('saving the file')
download.save_as('prices.csv')
One important point you should look out for is that Python is a whitespace-sensitive language, meaning you have to align the code indentation properly to make it run.
Save this file as download_copper_prices.py
, and when we run the script,
python download_copper_prices.py
A browser should pop up, visit the page, download and save the prices as prices.csv
in the working directory.
[^1]: Actually, we can scrape the prices right off the page, as they're serialized as JSON. With a bit of regex, we can extract the JSON from the HTML. But that solution seemed to me a bit too job-specific, so I opted for a more hands-on solution that he can apply for other occasions. Scripting browser interactions is an important tool every power user should learn.
That said, I'm not knowledgeable in AutoIt, but assuming it has capable Regex tools, it's entirely possible to achieve the same result in AutoIt by downloading the HTML of the URL that the iframe points to (https://www.macrotrends.net/assets/php/chart_iframe_comp.php?id=1476&url=copper-prices-historical-chart-data
) then using this regex to extract the JSON:
var originalData = (.*);
Running this code on the browser console returns an array of 15k+ entries:
JSON.parse(document.body.innerHTML.match('var originalData = (.*);')[1])
consisting of date and closing prices:
[{
"date": "1959-07-02",
"close": "0.2930"
}, {...}, ...]
Then it should be easy to convert this JSON to CSV.