javascript, python, node.js, web-scraping, twitch

Scraping text data with timestamps from a generated chat window


I want to scrape chat data from a Twitch clip. A clip is a saved segment of a livestream, and its chat replay shows how people reacted at that moment. Take this one as an example:

https://clips.twitch.tv/BenevolentPunchyLyrebirdMingLee

I can already pull all the chat messages with query selectors, but only if I watch the video until the end. What I want instead is a scraper that takes a clip's link and outputs the raw chat text with timestamps.

I looked through Twitch's API, but there isn't anything about clips.


Solution

  • In the end, I wrote this small Python script to get the chat data of a given Twitch clip.

    Apparently, you can fetch the chat data of a given video with this API call: https://api.twitch.tv/v5/videos/$VODID/comments?cursor=$NEXT

    Here $VODID is the ID of the video the clip was taken from, and $NEXT is a cursor, which works like this:

    Chat data is returned in chunks, and every chunk carries a cursor value that points to the next chunk. So you keep requesting the next chunk until you reach comments whose offset matches the clip's offset in the video, then keep writing the chat data until the offset passes the end of the clip (the clip's offset plus its duration).

    If anyone needs it:

    https://github.com/OgulcanCelik/twitch-clip-chat
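
The cursor loop described in the solution can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository. The response field names (`comments`, `_next`, `content_offset_seconds`, `commenter.display_name`, `message.body`) and the `Client-ID` header are assumptions about how the v5 endpoint responds, and the helper names `make_fetcher`, `collect_clip_chat`, and `format_offset` are made up for this example. It also assumes you already know the clip's offset in the video and its duration, both in seconds.

```python
import json
import urllib.parse
import urllib.request


def make_fetcher(vod_id, client_id):
    """Return a function that fetches one chunk of comments for a given cursor."""
    base = f"https://api.twitch.tv/v5/videos/{vod_id}/comments"

    def fetch_page(cursor):
        # The first request has no cursor; later requests pass the previous chunk's cursor.
        url = f"{base}?{urllib.parse.urlencode({'cursor': cursor})}" if cursor else base
        # Assumption: the endpoint expects a Client-ID header.
        request = urllib.request.Request(url, headers={"Client-ID": client_id})
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch_page


def format_offset(seconds):
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def collect_clip_chat(fetch_page, clip_offset, clip_duration):
    """Walk the chunks and keep comments that fall inside the clip's time window."""
    clip_end = clip_offset + clip_duration
    lines = []
    cursor = None
    while True:
        page = fetch_page(cursor)
        for comment in page.get("comments", []):
            # Assumed field: the comment's position in the video, in seconds.
            offset = comment["content_offset_seconds"]
            if offset < clip_offset:
                continue  # still before the clip starts
            if offset > clip_end:
                return lines  # past the end of the clip, stop paging
            name = comment["commenter"]["display_name"]
            body = comment["message"]["body"]
            # Timestamps are relative to the start of the clip.
            lines.append(f"[{format_offset(offset - clip_offset)}] {name}: {body}")
        cursor = page.get("_next")  # assumed field: cursor of the next chunk
        if not cursor:
            return lines  # no more chunks
```

With placeholder values, a call might look like `collect_clip_chat(make_fetcher("123456789", "your-client-id"), clip_offset=3600, clip_duration=30)`. Passing the fetch function in as an argument keeps the paging logic separate from the HTTP call, so the loop can be tried against canned responses first.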