
XPath scrape using Google Sheets


I have been struggling to get any XPath technique to work in Octoparse and similar software. After reading posts here I'm now trying Google Sheets, and I can't get it to work there either.

Input: A SlideShare presentation URL (e.g. https://www.slideshare.net/carologic/ai-and-machine-learning-demystified-by-carol-smith-at-midwest-ux-2017)

Intended output: The SlideShare embed URL (in this case: https://www.slideshare.net/slideshow/embed_code/key/wZudqqTdctjWXA)

I think this would be the way to get the output using Google Sheets: =importxml(A1,"//meta[@itemprop='embedURL']/@content")

It is not working for me: the formula fails with a "failure to fetch URL" error. With Octoparse and similar tools I just got a blank value.

I'm being daft here, no doubt. Any help would be useful.


Solution

  • It doesn't work because SlideShare is owned by LinkedIn, and they have put a lot of effort into making sure the site can't be scraped, including from Google Sheets. It used to be possible, but I believe they eventually caught on to the workaround.
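
If you want to confirm that the XPath logic itself is sound and that the fetch is what fails, you can run the same lookup against a saved copy of the page. Below is a minimal sketch in Python using only the standard library. It does not fetch anything; `SAMPLE_HTML` is a made-up stand-in for the page source (the real markup may differ), and `EmbedUrlParser` and `find_embed_url` are hypothetical helper names. It mirrors what `//meta[@itemprop='embedURL']/@content` selects: the `content` attribute of the first `meta` tag whose `itemprop` is exactly `embedURL`.

```python
from html.parser import HTMLParser


class EmbedUrlParser(HTMLParser):
    """Collects the content attribute of the first <meta itemprop="embedURL"> tag."""

    def __init__(self):
        super().__init__()
        self.embed_url = None

    def handle_starttag(self, tag, attrs):
        # Skip non-meta tags, and stop looking once a match has been found.
        if tag != "meta" or self.embed_url is not None:
            return
        attr_map = dict(attrs)
        # Attribute values are compared case-sensitively, as in XPath.
        if attr_map.get("itemprop") == "embedURL":
            self.embed_url = attr_map.get("content")


def find_embed_url(html):
    """Return the embed URL found in the given HTML string, or None."""
    parser = EmbedUrlParser()
    parser.feed(html)
    return parser.embed_url


# Assumption: a simplified stand-in for the page source, not the real markup.
SAMPLE_HTML = """
<html>
  <head>
    <meta itemprop="name" content="AI and Machine Learning Demystified">
    <meta itemprop="embedURL"
          content="https://www.slideshare.net/slideshow/embed_code/key/wZudqqTdctjWXA">
  </head>
  <body></body>
</html>
"""

if __name__ == "__main__":
    print(find_embed_url(SAMPLE_HTML))
```

If this finds the URL in a copy of the page you saved from your browser, the expression is fine and the problem is the blocked request, which no change to the formula will fix.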