I want to extract the table contents from this website. However, the page is built in an unusual way, and my code below only gets the table on the first page.
Since there are only three pages I could copy them manually, but I would still like to write a script that automates the whole process:
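import time

from bs4 import BeautifulSoup as bs
from selenium import webdriver

driver = webdriver.Chrome()
driver.get(url)  # url is the address of the page that shows the table
time.sleep(5)  # crude fixed wait so the JavaScript-rendered table can load
html_str = driver.page_source
soup = bs(html_str, "html.parser")
soup.find("table")  # only returns the table for the first page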
Here is the paginator part of the soup. I have no experience in web development and do not understand what actually happens after clicking Next.
<ha-paginator data-translation-block="false" data-translation-id="1442"><!-- --><nav aria-label="Page navigation" class="text-center" data-translation-block="false" data-translation-id="1443">
<ul class="pagination" data-translation-block="false" data-translation-id="1444">
<!-- -->
<!-- --><li class="active" data-translation-block="false" data-translation-id="1445">
<!-- --><a data-translated="false" data-translation-checksum="57ad7d2ec0e248914c2b0ae7efc17011d1435f99d807e43b172697027ffe46ce500c3ff64f5162eaa059c11a23fa5d8c442ab67bd219d74311601bed517cf477" href="#"> 1
<!-- --><span class="sr-only">(current)</span>
</a>
</li><li data-translation-block="false" data-translation-id="1446">
<!-- --><a data-translated="false" data-translation-checksum="7eece0387dc3c6876397df60e2d7dbe0e2c94ecdc42d7e50d5208a4c84885caa703c487d86900ac97f10ad493893db85144cf7889d8ac8fd008dfd4c8f0e98df" href="#"> 2
<!-- -->
</a>
</li><li data-translation-block="false" data-translation-id="1447">
<!-- --><a data-translated="false" data-translation-checksum="aa08ec665075172d835562b332e78832e7f9d3b7f3df47d5a32b8f3a1682daaed49831faf19eeaca164d8e94e3449ade2a83d83dfaa83878c832f644fea11f95" href="#"> 3
<!-- -->
</a>
</li><!-- --><li data-translated="false" data-translation-block="true" data-translation-checksum="7d03f54e74b11d46eacd33365a0aa16a3ba2857949c7f795c2d9c07b5689fbc4230dc22c45af2303eba21a7d8016f197d9b474d4149db6d0df059ce00416e192" data-translation-id="1448">
<a href="#">
Next
</a>
</li>
</ul>
</nav>
<!-- --></ha-paginator>
<hr class="big" data-translation-block="true" data-translation-id="1449"/>
</div>
</div>
</div>
</ha-table-search>
The data you see on the page is loaded from an external URL via JavaScript, so you can request it directly from there:
import pandas as pd
import requests

# URL the page's JavaScript loads the price list from
url = "https://immi.homeaffairs.gov.au/_layouts/15/api/data.aspx/GetPriceList"
data = requests.post(url, json={"category": "Visa", "onshore": "All"}).json()
df = pd.DataFrame(data["d"]["data"])  # the rows sit under d -> data
df.pop("note")  # drop the "note" column before printing
print(df.head(5))
Prints:
   visaSubclassCode                                           visaSubclassText  streamCode  streamText  onShore    basePrice  over18Price  under18Price  nonInternetPrice  subsequentPrice
0               100  Partner (Provisional and Migrant) visa (subclass 309/100)                               No  AUD8,850.00  AUD4,430.00   AUD2,215.00               N/A              N/A
1               101                                  Child visa (subclass 101)                               No  AUD3,055.00  AUD1,530.00     AUD765.00               N/A              N/A
2               102                               Adoption visa (subclass 102)                               No  AUD3,055.00  AUD1,530.00     AUD765.00               N/A              N/A
3               117                        Orphan Relative visa (subclass 117)                               No  AUD1,870.00    AUD935.00     AUD470.00               N/A              N/A
4               124                   Distinguished Talent visa (subclass 124)                               No  AUD4,110.00  AUD2,055.00   AUD1,030.00               N/A              N/A
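The price columns come back as display strings such as AUD8,850.00, with N/A where no price applies. If you want to sort or sum them, one option is to convert them to numbers first. The following is only a sketch: `parse_aud` and `rows_from_payload` are hypothetical helper names, and `sample` is a hand-written stand-in that mimics the `{"d": {"data": [...]}}` shape used above, with values copied from the printed rows. It sticks to the standard library so it runs without a network connection.

```python
from decimal import Decimal, InvalidOperation

# Column names taken from the printed DataFrame above.
PRICE_KEYS = ("basePrice", "over18Price", "under18Price",
              "nonInternetPrice", "subsequentPrice")


def parse_aud(text):
    """Turn 'AUD8,850.00' into Decimal('8850.00'); return None for 'N/A' or blanks."""
    if not isinstance(text, str):
        return None
    cleaned = text.strip().upper().replace("AUD", "").replace(",", "").strip()
    if not cleaned:
        return None
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None  # e.g. "N/A"
    return value if value.is_finite() else None


def rows_from_payload(payload):
    """Drop the 'note' field and convert price strings in each row of payload['d']['data']."""
    rows = []
    for item in payload["d"]["data"]:
        row = {key: val for key, val in item.items() if key != "note"}
        for key in PRICE_KEYS:
            if key in row:
                row[key] = parse_aud(row[key])
        rows.append(row)
    return rows


# Hand-written sample (an assumption, not a captured response).
sample = {"d": {"data": [{
    "visaSubclassCode": "101",
    "visaSubclassText": "Child visa (subclass 101)",
    "onShore": "No",
    "basePrice": "AUD3,055.00",
    "over18Price": "AUD1,530.00",
    "under18Price": "AUD765.00",
    "nonInternetPrice": "N/A",
    "subsequentPrice": "N/A",
    "note": "",
}]}}

rows = rows_from_payload(sample)
print(rows[0]["basePrice"], rows[0]["nonInternetPrice"])
```

If you keep the pandas DataFrame instead, the same helper could be applied per column, for example `df["basePrice"].map(parse_aud)`.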