python web-scraping twitter beautifulsoup href

BeautifulSoup4: How do you extract a usable link from an href when it only provides parameters?


I'm making a Twitter bot for an honors project and have almost completed it. However, when I scrape the website for a specific URL, the href contains a value that looks like this:

?1dmy&urile=wcm%3apath%3a%2Fohio%2Bcontent%2Benglish%2Fcovid-19%2Fresources%2Fnews-releases-news-you-can-use%2Fnew-restartohio-opening-dates

When I inspect the HTML and hover over the href contents above, the browser shows that this value is actually the tail end of the link. Is there any way to take this data and turn it into a usable link? Other links within the same carousel are given as full URLs on the same website, so I'm not sure why this one is different from the others.

I tried searching for answers to this question but came up short; sorry if this is a repeat.


Solution

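  • BeautifulSoup is showing you exactly what the HTML of the page contains, so a relative href comes back as written. If the link is relative, you need the base URL of the page and have to resolve the href against it. That base URL should come back in your request result (for example, the final URL of the response), not in the HTML itself.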
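
As a minimal sketch of that idea, the standard library's urllib.parse.urljoin can resolve the scraped href against the page URL. The page URL below (https://example.com/portal/home) and the helper name absolutize are placeholders made up for illustration, since the question does not give the real address; in an actual scraper the base would typically be the URL reported by your HTTP response.

```python
from urllib.parse import urljoin

# Hypothetical page URL (an assumption for this sketch). In a real scraper,
# use the URL of the response that returned the HTML you parsed.
PAGE_URL = "https://example.com/portal/home"


def absolutize(page_url, href):
    """Resolve an href against the URL of the page it was found on."""
    return urljoin(page_url, href)


# The query-only href from the question, as BeautifulSoup would return it.
relative_href = (
    "?1dmy&urile=wcm%3apath%3a%2Fohio%2Bcontent%2Benglish%2Fcovid-19"
    "%2Fresources%2Fnews-releases-news-you-can-use"
    "%2Fnew-restartohio-opening-dates"
)

# A query-only reference keeps the page's scheme, host, and path.
full_link = absolutize(PAGE_URL, relative_href)
print(full_link)

# An href that is already absolute passes through unchanged, so the same
# call can be applied to every link in the carousel.
print(absolutize(PAGE_URL, "https://example.com/other/page"))
```

Because urljoin leaves absolute URLs untouched, there should be no need to special-case the links that already come back complete.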