Two-part question. First, when I run the code without the final if statement, I don't get all of the href values... I see many more links in Inspector that don't seem to come through.
I'm looking for a fix, but I'd also like to understand the general issue: is there a reason why some links would come through and others would not?
Second, I want to pull only the hrefs that contain "Surf-Report". I've used this code with p.startswith, and it works... but I couldn't find what the function call would be to say "contains".
I'm new to all of this; I've searched but don't fully understand either issue.
import requests
from bs4 import BeautifulSoup
profiles = []
urls = [
    'https://magicseaweed.com/New-Jersey-Monmouth-County-Surfing/277/',
    'https://magicseaweed.com/New-Jersey-Ocean-County-Surfing/278/'
]
for url in urls:
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'html.parser')
    for profile in soup.find_all('a'):
        profile = profile.get('href')
        profiles.append(profile)
# print(profiles)
for p in profiles:
    if p.contains('Surf-Report'):
        print(p)
For context, my overall goal is to go to these different county pages, and get all of the HREF tags there. Once I have those, I want to visit each individual link and pull the wave sizes from each of the links stored there.
I'm looking to build a way to monitor all waves in New Jersey daily... no purpose, just a fun practice project with something I find interesting.
The URLs on that page appear to be loaded dynamically, via one or more XHR calls, so they are not in the raw HTML that requests downloads and BeautifulSoup parses. A brief look at the Network tab in the page's dev tools shows a call to an API (from which I stripped the variables). Scraping that API returns over 8k results:
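import requests
import pandas as pd

# limit=-1 appears to lift the cap on the number of results returned
r = requests.get('https://magicseaweed.com/api/mdkey/spot?&limit=-1')
# load the JSON response into a DataFrame, one row per spot
df = pd.DataFrame(r.json())
print(df)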
Result:
| | _id | _obj | _path | name | description | lat | lon | dataLat | dataLon | surfAreaId | dataSpotId | url | multiplier | optimumSwellAngle | optimumWindAngle | timezone | offset | modelName | isBigWave | ratingType | timeZoneAbbr | hasAdvancedForecast | proteusDataId | proteusResolution | surflineSpotId | defaultModelId | topLevelNav | tidalPort | isDataSpot | favouriteCount | mapImageUrl | breakingWaveModelId | weatherModel | added | hidden | edited | pointOfInterestId | useSDS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Spot | Spot | Newquay - Fistral North | | 50.4184 | -5.0997 | 50.42 | -5.08 | 7 | nan | /Newquay-Fistral-North-Surf-Report/1/ | 0.7 | 290 | 110 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | UK_4m | 584204214e65fad6a7709cec | 42 | True | | True | 0 | https://chart-1.msw.ms/maps/spot/2576f3cfb35dba07a84590141d54d3a5.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | c10396fc-ed41-4771-8e8e-ab8dbff5c67c | True |
| 1 | 2 | Spot | Spot | Porthtowan | | 50.2891 | -5.2461 | 50.27 | -5.3 | 6 | nan | /Porthtowan-Surf-Report/2/ | 0.8 | 290 | 110 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708c98 | 38 | True | | True | 0 | https://chart-3.msw.ms/maps/spot/d278b42dc4a8adc983a24e2c04333665.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 39bca112-f093-4a7b-90eb-b7993920e5c4 | True |
| 2 | 3 | Spot | Spot | Gwithian | | 50.2235 | -5.399 | 50.2 | -5.5 | 6 | nan | /Gwithian-Surf-Report/3/ | 0.5 | 285 | 105 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708c95 | 38 | True | Perranporth | True | 0 | https://chart-5.msw.ms/maps/spot/2a4608d0e793ee20f4566ca85f5ba6cd.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 6b0785be-1efb-413d-a5a9-ba2133c6ef68 | True |
| 3 | 4 | Spot | Spot | Sennen | | 50.0802 | -5.6976 | 50.07 | -5.7 | 6 | nan | /Sennen-Surf-Report/4/ | 0.8 | 270 | 90 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708c97 | 38 | True | | True | 0 | https://chart-4.msw.ms/maps/spot/c1be3fe6871d15e4ea5297193b8b81da.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | a641e633-8692-4d4b-b2d6-c4e1d4132c9b | True |
| 4 | 5 | Spot | Spot | Constantine | | 50.5333 | -5.0221 | 50.5759 | -4.92239 | 8 | nan | /Constantine-Surf-Report/5/ | 1 | 270 | 90 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 584204204e65fad6a77090b3 | 38 | True | | True | 0 | https://chart-3.msw.ms/maps/spot/47b00f609d5e46cda66040d8b811bae6.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 1daacdd5-a92a-4f7c-bc7c-af30a392ef7d | True |
| 5 | 6 | Spot | Spot | Bude - Crooklets | | 50.8358 | -4.5548 | 50.8336 | -4.56057 | 8 | nan | /Bude-Crooklets-Surf-Report/6/ | 1 | 270 | 90 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708ca5 | 38 | True | | True | 0 | https://chart-1.msw.ms/maps/spot/553d3a850372eee8b10d13d23cbdb78e.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 6cb522d3-a781-45ae-83cd-fcc941fd47cb | True |
| 6 | 7 | Spot | Spot | Croyde Beach | | 51.1302 | -4.2435 | 51.1449 | -4.25995 | 9 | nan | /Croyde-Beach-Surf-Report/7/ | 0.8 | 270 | 90 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708ca4 | 38 | True | Ilfracombe, England | True | 0 | https://chart-3.msw.ms/maps/spot/0f967e1e6130e9cb1b2623aafe966b58.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | 2dca4454-5789-4be3-808e-f512fef45dc3 | True |
| 7 | 8 | Spot | Spot | Praa Sands | | 50.103 | -5.391 | 50 | -3.87 | 5 | nan | /Praa-Sands-Surf-Report/8/ | 0.8 | 210 | 30 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | GLOB_30m | 5842041f4e65fad6a7708c9a | 38 | True | | True | 0 | https://chart-4.msw.ms/maps/spot/aea8da3ce8bd22228c07c79db8e9b8de.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | a166dde9-2d1c-4a55-bb34-5a6efce93986 | True |
| 8 | 9 | Spot | Spot | Whitsand Bay | | 50.3387 | -4.2434 | 50.3334 | -4.2433 | 5 | nan | /Whitsand-Bay-Surf-Report/9/ | 0.7 | 225 | 45 | Europe/London | 3600 | glo_30m | False | directional | BST | True | nan | UK_4m | 584204204e65fad6a77090c5 | 42 | True | | True | 0 | https://chart-3.msw.ms/maps/spot/1fe1f342742ba3cf7dd3f8d9943948cc.png | nan | gfs.0p25 | -62169984000 | False | 1617982527 | c6b42c46-7db3-4e53-8f38-b28db957b4e7 | True |
| 9 | 10 | Spot | Spot | Bantham | | 50.2787 | -3.8885 | 50 | -3.87 | 5 | nan | /Bantham-Surf-Report/10/ | 0.8 | 230 | 65 | Europe/London | 3600 | glo_30m | False | directional | BST | True | 2 | UK_4m | 584204204e65fad6a77090c9 | 42 | True | River Yealm | True | 0 | https://chart-1.msw.ms/maps/spot/358c02090c0c31888fee4794b39d397c.png | nan | gfs.0p25 | -62169984000 | False | 1646829186 | d3566d34-b58d-4803-8cf2-3e3dc5fc1a48 | True |
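
Once the spots are in a DataFrame, the "contains" filter from the question can be done on the url column with pandas' `Series.str.contains`. The sketch below uses a small stand-in DataFrame built from three of the rows above so that it runs without a network call. Filtering on `surfAreaId` is only an illustration: I have not verified whether the county IDs in the question's URLs (277, 278) correspond to `surfAreaId` values, so treat that mapping as an assumption.

```python
import pandas as pd

# Stand-in for pd.DataFrame(r.json()): three rows copied from the output above.
df = pd.DataFrame({
    "_id": [1, 2, 3],
    "name": ["Newquay - Fistral North", "Porthtowan", "Gwithian"],
    "surfAreaId": [7, 6, 6],
    "url": [
        "/Newquay-Fistral-North-Surf-Report/1/",
        "/Porthtowan-Surf-Report/2/",
        "/Gwithian-Surf-Report/3/",
    ],
})

BASE = "https://magicseaweed.com"

# Keep rows whose url contains the literal text "Surf-Report".
# na=False treats missing urls as non-matches; regex=False disables regex parsing.
reports = df[df["url"].str.contains("Surf-Report", na=False, regex=False)].copy()

# The url column holds site-relative paths, so prefix the domain to get full links.
reports["full_url"] = BASE + reports["url"]

# Narrow to one area. The value 6 is taken from the sample rows; which ID
# covers a given county is an assumption to check against the real data.
area_6 = reports[reports["surfAreaId"] == 6]
print(area_6[["name", "full_url"]])
```

The `full_url` values can then be looped over to fetch each spot page.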
Is this what you're after?
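On the second part of the question: Python strings have no `.contains()` method. Substring checks are written with the `in` operator. A minimal sketch follows, assuming `profiles` holds whatever `profile.get('href')` returned. The helper name `surf_report_links` is made up for illustration, and the sample values are taken from this page. Note that `.get('href')` returns `None` for an `<a>` tag without an href, so those entries need to be skipped before testing.

```python
def surf_report_links(hrefs):
    """Return the hrefs that contain 'Surf-Report', skipping None or empty values."""
    return [h for h in hrefs if h and "Surf-Report" in h]


# Stand-in for the list built by the scraping loop in the question.
profiles = [
    "/Porthtowan-Surf-Report/2/",
    None,  # an <a> tag with no href attribute
    "https://magicseaweed.com/New-Jersey-Monmouth-County-Surfing/277/",
    "/Gwithian-Surf-Report/3/",
]

for p in surf_report_links(profiles):
    print(p)
```

The `h and ...` guard matters because `"Surf-Report" in None` raises a `TypeError`.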