Looking at something like this:
https://disclosures-clerk.house.gov/FinancialDisclosure
Using the 'Search' function in the box on the left, I'd like to select a year in the 'Filing Year' dropdown and, in Python, retrieve the PDFs that the results link to.
For instance, for year 2024, I'd like to retrieve the PDFs linked from the 140 entries returned. Ideally, I'd also be able to filter the results by 'Filing'. Is there any way to do this?
Try posting the search form fields directly to the results URL:
import requests
from bs4 import BeautifulSoup

# Form fields submitted by the search box.
data = {
    "LastName": "",
    "FilingYear": "2022",  # <-- change year here
    "State": "",
    "District": "",
}

api_url = (
    "https://disclosures-clerk.house.gov/FinancialDisclosure/ViewMemberSearchResult"
)

# POST the form and parse the HTML that comes back.
soup = BeautifulSoup(requests.post(api_url, data=data).content, "html.parser")

# Print the link text and the href of every PDF link in the results.
for a in soup.select('a[href$=".pdf"]'):
    print(a.text, a["href"])
Prints:
...
Wittman, Hon.. Robert J. public_disc/ptr-pdfs/2022/20021150.pdf
Wittman, Hon.. Robert J. public_disc/ptr-pdfs/2022/20021344.pdf
Wittman, Hon.. Robert J. public_disc/ptr-pdfs/2022/20021515.pdf
Wittman, Hon.. Robert J. public_disc/ptr-pdfs/2022/20021679.pdf
Wittman, Hon.. Robert J. public_disc/ptr-pdfs/2022/20021807.pdf
Wittman, Hon.. Robert J. public_disc/ptr-pdfs/2022/20022101.pdf
Wittman, Hon.. Robert J. public_disc/financial-pdfs/2022/30018513.pdf
Womack, Hon.. Steve public_disc/financial-pdfs/2022/10054531.pdf
Womack, Hon.. Steve public_disc/ptr-pdfs/2022/20022049.pdf
Yakym, Hon.. Rudy III. public_disc/financial-pdfs/2022/10052905.pdf
Yakym, Hon.. Rudy III. public_disc/ptr-pdfs/2022/20022181.pdf
Yakym, Hon.. Rudy III. public_disc/financial-pdfs/2022/30018183.pdf
Zinke, Hon.. Ryan K. public_disc/financial-pdfs/2022/10053424.pdf
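The hrefs printed above are relative, and the directory name in each path (ptr-pdfs vs. financial-pdfs) looks like it tracks the kind of filing. Here is a rough sketch of how you might filter on that and download the files. Everything in it is my own guess rather than anything the site documents: the helper names (absolute_pdf_url, filing_kind, select_hrefs, download_pdf) are made up for illustration, BASE_URL assumes the relative hrefs resolve against the site root, and filtering by directory name is only a stand-in for the 'Filing' column.

```python
from pathlib import Path
from urllib.parse import urljoin

# Assumption: the relative hrefs resolve against the site root.
BASE_URL = "https://disclosures-clerk.house.gov/"


def absolute_pdf_url(href: str) -> str:
    """Turn a relative href from the results page into a full URL."""
    return urljoin(BASE_URL, href)


def filing_kind(href: str) -> str:
    """Return the second path segment, e.g. 'ptr-pdfs' or 'financial-pdfs'.

    Assumption: hrefs look like 'public_disc/<kind>/<year>/<id>.pdf',
    as in the printed output above.
    """
    parts = href.strip("/").split("/")
    return parts[1] if len(parts) > 1 else ""


def select_hrefs(hrefs, kind):
    """Keep only the hrefs whose directory matches `kind`."""
    return [h for h in hrefs if filing_kind(h) == kind]


def download_pdf(href: str, out_dir: str = "pdfs") -> Path:
    """Download one PDF into out_dir and return the local path."""
    import requests  # imported here so the helpers above need only the stdlib

    target = Path(out_dir) / Path(href).name
    target.parent.mkdir(parents=True, exist_ok=True)
    response = requests.get(absolute_pdf_url(href), timeout=30)
    response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    target.write_bytes(response.content)
    return target


# A few hrefs copied from the output above, used to show the filtering only.
sample = [
    "public_disc/ptr-pdfs/2022/20021807.pdf",
    "public_disc/financial-pdfs/2022/30018513.pdf",
    "public_disc/ptr-pdfs/2022/20022049.pdf",
]
ptr_only = select_hrefs(sample, "ptr-pdfs")
print(ptr_only)
print(absolute_pdf_url(ptr_only[0]))
```

To use it with the search code, collect the hrefs with [a["href"] for a in soup.select('a[href$=".pdf"]')], pass them through select_hrefs, and call download_pdf on each one. If the directory name turns out not to match the 'Filing' value you care about, you would need to read that value from the results table instead.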