
Issue using Curl to download HTML for parsing


I have been trying to download data from TikTok so that I can graph statistics such as views and likes.

I checked what curl would return by running the following command in the CMD terminal:

mycurl> curl -k https://www.tiktok.com/@liamferrari/video/6816604410496519429

I receive the following output:

{"statusCode":200,"contentType":"application/json","content":""}

However, when I use curl on almost any other web page, I receive the full HTML code as I expect.

Is there an obvious reason that I'm not receiving the HTML code from the web page? When I open the web console, I am able to see the HTML information I am trying to access with curl.

If anyone could give any insights, that would be nice.

Regards

Defender


Solution

  • It looks like TikTok requires a User-Agent header to be specified (a Firefox one is used here):

    curl -A "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0" -k https://www.tiktok.com/@liamferrari/video/681660441049651
    

    Note that TikTok is most likely filtering on the user agent to reduce scraping, and scraping the site may be against its EULA.
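
If you later move from the command line to a script for the parsing step, the same idea (send a browser-like User-Agent header) can be sketched in Python with the standard library. This is only a minimal sketch, not a confirmed description of how TikTok filters requests: the helper names `build_request`, `is_empty_stub` and `fetch_html` are made up for illustration, and the check for an empty reply assumes the server keeps answering with the same JSON shape shown in the question.

```python
import json
import urllib.request

# Firefox user-agent string copied from the curl command above.
FIREFOX_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) "
    "Gecko/20100101 Firefox/59.0"
)


def build_request(url, user_agent=FIREFOX_UA):
    """Return a GET request carrying a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def is_empty_stub(body):
    """Return True if body is a JSON object whose "content" field is empty.

    Assumption: this mirrors the reply quoted in the question,
    {"statusCode":200,"contentType":"application/json","content":""}.
    """
    try:
        data = json.loads(body)
    except ValueError:
        return False  # not JSON at all, so presumably real HTML
    return isinstance(data, dict) and data.get("content") == ""


def fetch_html(url, user_agent=FIREFOX_UA, timeout=10):
    """Download url and return the decoded body (performs a network request)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html(url)` and then `is_empty_stub(body)` on the result tells you whether you got the page or the empty JSON reply again. Unlike the curl command, this sketch does not disable certificate verification (the `-k` flag).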