python-3.x web-scraping beautifulsoup href

Can't get href from div, despite calling the class


I am trying to get all the products' links in this website: https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers

For example, for the Google Home Mini Chalk I should get https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-chalk-sygminiwe

However, I can't even reach the div that contains the link. I've tried several approaches, all with bs4. Here are the two I was sure would work, but didn't:

First attempt:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

url_products = []
url = "https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers"
req = Request(url)
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "lxml")
data = soup.find_all('div', {'class': 'ProductTile__ProductImageWrapper-sc-1dlojg1-2 gRQAGx'})
for div in data:
    links = div.find_all('a')
    for a in links:
        print('https://www.officeworks.com.au/' + a['href'])
        url_products.append('https://www.officeworks.com.au/' + a['href'])

Second attempt:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.officeworks.com.au/shop/officeworks/c/technology/audio-speakers/voice-assistant-speakers')
soup = BeautifulSoup(r.content, 'lxml')
links = [item['href'] for item in soup.select('.gRQAGx > a')]

I believe I am not calling the right class, but I can't manage to figure out what it is. Thanks in advance!


Solution

  • You are not getting the expected output because the page is loaded via JavaScript. The product links are not in the HTML that urlopen or requests receives, so you cannot extract them until the JavaScript has been rendered.
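One quick way to confirm this kind of problem is to check whether the raw HTML contains any product links at all. The following is a minimal sketch using only the standard library; the `HrefCollector` class, the `static_product_links` helper, and the sample `raw_html` string are hypothetical stand-ins for illustration, not the site's real markup.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def static_product_links(html, marker="/shop/officeworks/p/"):
    """Return the hrefs in the raw HTML that contain `marker`."""
    parser = HrefCollector()
    parser.feed(html)
    return [h for h in parser.hrefs if marker in h]


# Hypothetical stand-in for what a JS-driven page returns before rendering:
# an empty mount point plus a script tag, and no product anchors.
raw_html = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(static_product_links(raw_html))  # []
```

Passing `requests.get(url).text` to `static_product_links` instead of the sample string would show whether the real page has the links before rendering. An empty list suggests the links are added later by JavaScript.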

    You could use Selenium, but I don't recommend it, as it will slow down your task.

    Alternatively, you could use HTMLSession from requests_html to render the page on the fly.
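A rough sketch of the HTMLSession route might look like the following. It has not been run against this site. The `product_links` and `rendered_product_links` helpers, the `/shop/officeworks/p/` marker used for filtering, and the 30-second timeout are assumptions made for illustration.

```python
def product_links(links, marker="/shop/officeworks/p/"):
    """Keep only links that look like product pages, sorted for stable output."""
    return sorted(link for link in links if marker in link)


def rendered_product_links(url):
    """Render the page in headless Chromium and return its product links."""
    # Imported here so the helper above works without requests_html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # render() downloads Chromium on first use and executes the page's JavaScript.
    response.html.render(timeout=30)
    # absolute_links is the set of fully qualified URLs found on the rendered page.
    return product_links(response.html.absolute_links)
```

Calling `rendered_product_links(url)` with the category URL from the question should return the product URLs once rendering succeeds. Rendering still starts a browser, so it is slower than calling the API directly.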

    Otherwise, go straight to the source: the API that the JavaScript loads its data from.

    You can find it by tracking the XHR request in the Network tab of the browser developer tools (Ctrl+Shift+E in Firefox, for example).

    The call can then be made directly:

    import requests
    
    json = {"requests": [{"indexName": "prod-product-wc-bestmatch-personal", "params": "query=&hitsPerPage=24&maxValuesPerFacet=10&page=0&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&clickAnalytics=true&optionalFilters=%5B%5D&sumOrFiltersScores=true&filters=(categorySeoPaths%3A%22technology%2Faudio-speakers%2Fvoice-assistant-speakers%22)&facets=%5B%22rangedOnline%22%2C%22forestProductSchemeName%22%2C%22hardDriveType%22%2C%22bagStyle%22%2C%22socketType%22%2C%22fullSizeInnerDimensions%22%2C%22stapleSize%22%2C%22connectivity%22%2C%22smartHomeCompatibility%22%2C%22industryType%22%2C%22sizeCapacity%22%2C%22performancePrintResolution%22%2C%22handsetIncludedHandsets%22%2C%22usbFlashLidType%22%2C%22videoResolution%22%2C%22maximumPunchingCapacity%22%2C%22rangedRetail%22%2C%22protectionType%22%2C%22rulerLength%22%2C%22sizeNumber%22%2C%22deviceConnectivityTechnology%22%2C%22unitsOfMeasure%22%2C%22selfAdhesive%22%2C%22interfaceHardDrive%22%2C%22sharpenerSize%22%2C%22connectivityWifiBands%22%2C%22microphoneType%22%2C%22labellerKeyboardLayout%22%2C%22numberOfUsb30Ports%22%2C%22operatingSystemEdition%22%2C%22ringRingSize%22%2C%22performanceHealthMonitoringFunctions%22%2C%22connectivityTechnology%22%2C%22dualSimCompatible%22%2C%22audioSource%22%2C%22totalNumberOfLabels%22%2C%22brushShape%22%2C%22maxProcessorClockSpeed%22%2C%22operatingHand%22%2C%22powerBatteryTechnology%22%2C%22travelRegion%22%2C%22capacityBinder%22%2C%22licenceValidityPeriod%22%2C%22storageHardDriveCapacity%22%2C%22spineSize%22%2C%22rollLength%22%2C%22numberOfRings%22%2C%22lightBulbType%22%2C%22colour%22%2C%222SidedCopying%22%2C%22automaticDocumentFeederCapacity%22%2C%22automaticPaperFeed%22%2C%22performanceShredderCutType%22%2C%22performanceBrightness%22%2C%22displayResolution%22%2C%22labellingOfficeUseFacet%22%2C%22securityLevel%22%2C%22maxSupportedDocumentSize%22%2C%22bulkbuyOnline%22%2C%22staplingCapacity%22%2C%22storageIncludedFlashMemory%22%2C%22compatibabilityCustomFitAndroid%22%2C%22drawerNumberOfDrawers%22%2C%22storageInternalMemorySize%22%2C%22ramInstalledSize%22%2C%22100RecycledProduct%22%2C%22placementPlacingMounting%22%2C%22earPlacement%22%2C%22foldedDimensions%22%2C%22portsTotalNumberOfNetworkingPorts%22%2C%22powerBatteryChargeAmpHours%22%2C%22noiseCancelling%22%2C%22surfaceShape%22%2C%22labellingHomeUseFacet%22%2C%22sizeDescription%22%2C%22maxLoadWeight%22%2C%22numberOfPowerPorts%22%2C%22compatibabilityCustomFitApple%22%2C%22tsaApproved%22%2C%22chassisType%22%2C%22surgeSuppression%22%2C%22printingTechnologyPrinters%22%2C%22placementVesaMountCompatibility%22%2C%22boardSizeFacet%22%2C%22frameStyle%22%2C%22serviceProvider%22%2C%22bluetoothCompatibility%22%2C%22scannerType%22%2C%22photoCapacityQuantity%22%2C%22numberOfUsb20Ports%22%2C%22rulingType%22%2C%22learningSkillsFocus%22%2C%22licenceType%22%2C%22connectivityDisplayConnections%22%2C%22performanceMaxThickness%22%2C%22performanceResolution%22%2C%22paperWeightGsm%22%2C%22numberOfProcessorCores%22%2C%22fitsDevice%22%2C%22brushhairtype%22%2C%22opticalZoom%22%2C%22processorClockSpeed%22%2C%22labellingIndustrialUseFacet%22%2C%22performanceApproximateNumberOfImpressions%22%2C%222SidedPrinting%22%2C%22powerPowerType%22%2C%22interfaceType%22%2C%22printerConnectivityTechnology%22%2C%22numberOfReamsPerCarton%22%2C%22baseWheels%22%2C%22performanceEstimatedCartridgeYieldSheets%22%2C%22papersize%22%2C%22processorType%22%2C%22wallStrengthThickness%22%2C%22storageHardDriveCapacityComputingDevices%22%2C%22ciewhiteness%22%2C%22runTime%22%2C%22stampInking%22%2C%22switched%22%2C%22processorManufacturer%22%2C%22deviceCaseCompatibility%22%2C%22caseFeaturesNumberOfCompartments%22%2C%22displaySize%22%2C%222sidedScanning%22%2C%22glutenFree%22%2C%22restTime%22%2C%22operatingPlatformCompatibility%22%2C%22powerSource%22%2C%22touchScreen%22%2C%22displayPanelType%22%2C%22secondaryProcessorType%22%2C%22wastebinCapacityRange%22%2C%22softwareDistributionMedia%22%2C%22learningAgeRange%22%2C%22tapeWidth%22%2C%22storageStorageCapacity%22%2C%22cableLength%22%2C%22skillLevel%22%2C%22flightTime%22%2C%22energyRating%22%2C%22maximumRecommendedDailyUsage%22%2C%22contentLayout%22%2C%22deviceLocation%22%2C%22brand%22%2C%22numberOfUsb31Ports%22%2C%22lidIncluded%22%2C%22scannerScanResolution%22%2C%22portsNumberOfUsbChargePorts%22%2C%22envelopeSize%22%2C%22keyboardCompatibility%22%2C%22primaryCameraVideo%22%2C%22supportedMemoryCards%22%2C%22connectivityDisplayConnectionsPanels%22%2C%22up1Category%22%2C%22price%22%2C%22categorySeoPaths%22%2C%22rangedRetail%22%2C%22rangedOnline%22%2C%22price%22%2C%22brand%22%2C%22colour%22%2C%22audioSource%22%2C%22cableLength%22%2C%22up1Category%22%2C%22bulkbuyOnline%22%2C%22microphoneType%22%2C%22noiseCancelling%22%2C%22bluetoothCompatibility%22%2C%22powerBatteryTechnology%22%2C%22smartHomeCompatibility%22%5D&tagFilters=&facetFilters=%5B%5B%22categorySeoPaths%3Atechnology%2Faudio-speakers%2Fvoice-assistant-speakers%22%5D%5D"}, {"indexName": "prod-product-wc-bestmatch-personal", "params": "query=&hitsPerPage=1&maxValuesPerFacet=10&page=0&highlightPreTag=%3Cais-highlight-0000000000%3E&highlightPostTag=%3C%2Fais-highlight-0000000000%3E&clickAnalytics=false&optionalFilters=%5B%5D&sumOrFiltersScores=true&filters=(categorySeoPaths%3A%22technology%2Faudio-speakers%2Fvoice-assistant-speakers%22)&attributesToRetrieve=%5B%5D&attributesToHighlight=%5B%5D&attributesToSnippet=%5B%5D&tagFilters=&analytics=false&facets=categorySeoPaths"}]}
    r = requests.post("https://k535caawve-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20JavaScript%20(3.35.1)%3B%20Browser%20(lite)%3B%20react-instantsearch%205.4.0%3B%20JS%20Helper%202.26.1&x-algolia-application-id=K535CAAWVE&x-algolia-api-key=8a831febe0110932cfa06ff0e2024b4f", json=json).json()
    
    for item in r['results'][0]['hits']:
        print("Name: {:<65}, Url: {}".format(
            item['name'], f"https://www.officeworks.com.au/shop/officeworks/p/{item['urlKeyword']}"))
    

    Output:

    Name: Google Home Mini Chalk                                           , Url: https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-chalk-sygminiwe
    Name: Google Home Mini Charcoal                                        , Url: https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-charcoal-sygminibk
    Name: Google Nest Hub Max Charcoal                                     , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-hub-max-charcoal-sygnhmaxbk
    Name: Google Nest Hub Max Chalk                                        , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-hub-max-chalk-sygnhmaxwe
    Name: Google Home                                                      , Url: https://www.officeworks.com.au/shop/officeworks/p/google-home-sygghome
    Name: Ultimate Ears Megablast Wireless Speaker with Alexa Graphite     , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-megablast-wireless-speaker-with-alexa-graphite-inmblastbk
    Name: Google Nest Mini 2nd Generation Charcoal                         , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-mini-2nd-generation-charcoal-sygnmini2c
    Name: Google Nest Mini 2nd Generation Chalk                            , Url: https://www.officeworks.com.au/shop/officeworks/p/google-nest-mini-2nd-generation-chalk-sygnmini2w
    Name: Ultimate Ears Blast Wireless Speaker with Alexa Graphite         , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-blast-wireless-speaker-with-alexa-graphite-imblastbk
    Name: Amazon 5.5" Echo Show 5 Charcoal                                 , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-5-5-echo-show-5-charcoal-syecosh5cl
    Name: Amazon Echo 3rd Generation Charcoal                              , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-3rd-generation-charcoal-syaedotclc
    Name: JBL Flip Essential Bluetooth Speaker Gun Metal                   , Url: https://www.officeworks.com.au/shop/officeworks/p/jbl-flip-essential-bluetooth-speaker-gun-metal-imjblfless
    Name: Ultimate Ears Megablast Wireless Speaker with Alexa Blue         , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-megablast-wireless-speaker-with-alexa-blue-inmblastbe
    Name: Amazon Echo Dot 3rd Gen With Clock Sandstone                     , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-dot-3rd-gen-with-clock-sandstone-syaedotcls
    Name: Ultimate Ears Megablast Wireless Speaker with Alexa Merlot       , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-megablast-wireless-speaker-with-alexa-merlot-inmblastrd
    Name: Amazon Echo Dot 3rd Gen Heather Grey                             , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-dot-3rd-gen-heather-grey-syamdot3ng
    Name: Lenovo Smart Clock E27 Starter Pack                              , Url: https://www.officeworks.com.au/shop/officeworks/p/lenovo-smart-clock-e27-starter-pack-sylsmcbun2
    Name: Amazon 5.5" Echo Show 5 Sandstone                                , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-5-5-echo-show-5-sandstone-syecosh5ss
    Name: Amazon Echo Studio Black                                         , Url: https://www.officeworks.com.au/shop/officeworks/p/amazon-echo-studio-black-syastudiob
    Name: Lenovo Smart Clock B22 Starter Pack                              , Url: https://www.officeworks.com.au/shop/officeworks/p/lenovo-smart-clock-b22-starter-pack-sylsmcbun1
    Name: JBL Link View Speaker with Google Assistant                      , Url: https://www.officeworks.com.au/shop/officeworks/p/jbl-link-view-speaker-with-google-assistant-injblinkvw
    Name: Ultimate Ears Blast Wireless Speaker with Alexa Blue Steel       , Url: https://www.officeworks.com.au/shop/officeworks/p/ultimate-ears-blast-wireless-speaker-with-alexa-blue-steel-imblastbe
    Name: LG WK7 ThinQ WiFi/Bluetooth Speaker with Google Assistant        , Url: https://www.officeworks.com.au/shop/officeworks/p/lg-wk7-thinq-wifi-bluetooth-speaker-with-google-assistant-inlgthinkq
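
Since the question asks for the links as a list, the response can also be collected into one rather than printed. The sketch below is illustrative: `build_params`, `product_urls`, and the `sample` dictionary are hypothetical. The reduced parameter set in `build_params` (query, hitsPerPage, page, filters) is an untested assumption; the endpoint may still expect some of the other parameters shown in the full request.

```python
from urllib.parse import urlencode

BASE = "https://www.officeworks.com.au/shop/officeworks/p/"
CATEGORY = "technology/audio-speakers/voice-assistant-speakers"


def build_params(page, hits_per_page=24, category=CATEGORY):
    """URL-encode a reduced parameter set for one page of results.

    Assumption: the endpoint accepts the request without the long facets list.
    """
    return urlencode({
        "query": "",
        "hitsPerPage": hits_per_page,
        "page": page,
        "filters": f'(categorySeoPaths:"{category}")',
    })


def product_urls(response):
    """Turn a decoded JSON response into a list of product page URLs."""
    return [BASE + hit["urlKeyword"] for hit in response["results"][0]["hits"]]


# Hand-written sample with the same shape as the response used in the answer.
sample = {"results": [{"hits": [
    {"name": "Google Home Mini Chalk", "urlKeyword": "google-home-mini-chalk-sygminiwe"},
]}]}
url_products = product_urls(sample)
print(url_products)
# ['https://www.officeworks.com.au/shop/officeworks/p/google-home-mini-chalk-sygminiwe']
```

With the real request, `product_urls(r)` would take the decoded JSON from `requests.post(...).json()`. Increasing `page` in `build_params` is one plausible way to fetch further pages when a category has more than 24 products.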