I'm trying to get the URL of a tiktok video from a shortened URL in order to extract the @username of the poster and the video id of the post. Some examples of shortened URL's I've come across seem to be shared URL's on Facebook/Twitter in the form of "m.tiktok.com" or more specifically, "https://vm.tiktok.com/pF6GGf/". This link ends up redirecting to "https://www.tiktok.com/@blessy2flex/video/6796374554391448838...". Is there any way I could get this URL with only the shortened URL?
I want to be able to get the username (@blessy2flex) and the video id (6796374554391448838) from the shortened URL as it appears in the actual URL. I've tried tracking redirects but the URL I end up with "https://m.tiktok.com/v/6833793010149412101.html..." is this, which evidently is not the same.
I've also tried things like Selenium, which actually ends up giving me the HTML of the original video page, in which I can find the username and the video id by searching through the actual HTML, but this method doesn't seem too scalable as I'm sure tiktok would notice and slow down my processes.
TikTok might be not redirecting you the right URL because it is detecting your User-Agent
. If you update your headers with some 'browser-like' User-Agent
, it should work.
Here's how you can solve your problem.
import re
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
url = 'https://vm.tiktok.com/pF6GGf/'
response = requests.get(url, headers=headers)
print(response.url) # the correct url with the username
Finally, you can find the username and the video id using regex.
re.findall(r'(@[a-zA-z0-9]*)\/.*\/([\d]*)?',response.url)
OUTPUT: [('@blessy2flex', '6796374554391448838')]
Extra: Modern webservices are usually quite smart and may sometimes have different mechanisms to thwart crawling activities. If you plan to do a lot of crawling (I assume valid/legal), you'll have to also take into account your rate of requesting the URL pages (among a lot of other things). If you need to manage more user-agents, you might find this pip package helpful (fake-useragent).