I'm trying to scrape very busy Twitch chats for keywords, but sometimes the socket stalls for a split second, and in that split second five messages can go by. I tried adding multithreading, but had no luck with the code below: the threads either all fail to catch a keyword or all succeed. Any help is appreciated. Code below:
import os
import time
from dotenv import load_dotenv
import socket
import logging
from emoji import demojize
import threading
# loading environment variables
load_dotenv()
# variables for socket
server = "irc.chat.twitch.tv"
port = 6667
nickname = "frankied003"
token = os.getenv("TWITCH_TOKEN")
channel = "#xqcow"
# creating the socket and connecting
sock = socket.socket()
sock.connect((server, port))
sock.send(f"PASS {token}\n".encode("utf-8"))
sock.send(f"NICK {nickname}\n".encode("utf-8"))
sock.send(f"JOIN {channel}\n".encode("utf-8"))
while True:
    consoleInput = input(
        "Enter correct answer to the question (use a ',' for multiple answers):"
    )
    # if console input is stop, the code will stop ofcourse lol
    if consoleInput == "stop":
        break
    # make array of all the correct answers
    correctAnswers = consoleInput.split(",")
    correctAnswers = [answer.strip().lower() for answer in correctAnswers]
    def threadingFunction():
        correctAnswerFound = False
        # while the correct answer is not found, the chats will keep on printing
        while correctAnswerFound is not True:
            while True:
                try:
                    resp = sock.recv(2048).decode(
                        "utf-8"
                    )  # sometimes this fails, hence retry until it succeeds
                except:
                    continue
                break
            if resp.startswith("PING"):
                sock.send("PONG\n".encode("utf-8"))
            elif len(resp) > 0:
                username = resp.split(":")[1].split("!")[0]
                message = resp.split(":")[2]
                strippedMessage = " ".join(message.split())
                # once the answer is found, the chats will stop, correct answer is highlighted in green, and onto next question
                if str(strippedMessage).lower() in correctAnswers:
                    print(bcolors.OKGREEN + username + " - " + message + bcolors.ENDC)
                    correctAnswerFound = True
                else:
                    if username == nickname:
                        print(bcolors.OKCYAN + username + " - " + message + bcolors.ENDC)
                    # else:
                    #     print(username + " - " + message)
    t1 = threading.Thread(target=threadingFunction)
    t2 = threading.Thread(target=threadingFunction)
    t3 = threading.Thread(target=threadingFunction)
    t1.start()
    time.sleep(.3)
    t2.start()
    time.sleep(.3)
    t3.start()
    time.sleep(.3)
    t1.join()
    t2.join()
    t3.join()
First, it does not make much sense to let three threads read from the same socket in parallel; it only leads to confusion and race conditions.
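If a second thread is wanted at all, one possible arrangement is a single thread that owns the socket and does nothing but read, handing the raw bytes to a queue so that slower processing elsewhere never delays the next read. The sketch below assumes that design; drain_socket, start_reader and the None sentinel are hypothetical names and conventions, not part of the code in the question.

```python
import queue
import socket
import threading


def drain_socket(sock: socket.socket, chunks: queue.Queue) -> None:
    """Sole reader of the socket: hand every received chunk to the queue."""
    while True:
        data = sock.recv(2048)
        if not data:
            # recv returns empty bytes once the peer has closed the connection
            chunks.put(None)  # sentinel (hypothetical convention): no more data
            return
        chunks.put(data)


def start_reader(sock: socket.socket) -> queue.Queue:
    """Start the single reader thread and return the queue it fills."""
    chunks: queue.Queue = queue.Queue()
    threading.Thread(
        target=drain_socket, args=(sock, chunks), daemon=True
    ).start()
    return chunks
```

The main thread would then call chunks.get() in a loop and do the keyword matching there. Only one thread ever touches the socket, so no chunk can be split between competing readers.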
The main problem, though, is that you are assuming a single recv will always read a single message. This is not how TCP works: TCP has no concept of a message, it is only a byte stream. A message is an application-level concept. A single recv might return a single message, multiple messages, parts of messages, and so on.
So you have to actually parse the data you get according to the semantics defined by the application protocol, i.e.
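A minimal sketch of such parsing, assuming that messages are lines terminated by "\r\n" as in the IRC protocol: keep a byte buffer, append whatever recv returns, cut off every complete line, and leave the unfinished tail in the buffer for the next read. The helper names split_lines and read_messages are hypothetical.

```python
from typing import Iterator, List, Tuple


def split_lines(buffer: bytes) -> Tuple[List[str], bytes]:
    """Return all complete lines in buffer plus the unfinished remainder."""
    # Everything before the last "\r\n" is complete; the final piece is
    # either empty or the start of a message that has not fully arrived.
    *complete, rest = buffer.split(b"\r\n")
    lines = [raw.decode("utf-8", errors="replace") for raw in complete]
    return lines, rest


def read_messages(sock) -> Iterator[str]:
    """Yield one complete line at a time from a connected, blocking socket."""
    buffer = b""
    while True:
        data = sock.recv(2048)
        if not data:  # empty bytes: the server closed the connection
            return
        lines, buffer = split_lines(buffer + data)
        yield from lines
```

With this, the loop in the question could iterate over read_messages(sock) and run the PING check and the keyword comparison once per line, regardless of how many messages a single recv happened to deliver.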
Apart from that, don't blindly throw away errors during recv(..).decode(..). Given that you are using a blocking socket, recv will usually fail only if there is a fatal problem with the connection, in which case a retry will not help. The problem is most likely that you are calling decode on incomplete messages, which may not be valid UTF-8. But since you simply ignore the error, you essentially lose those messages.
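To illustrate the decoding issue, here is a small sketch with made-up data: a chunk boundary that falls inside a multi-byte character makes decode raise UnicodeDecodeError, while the reassembled bytes decode fine. The names decodes_cleanly and recv_chunk are hypothetical; the second function shows one way to surface connection problems instead of retrying on them.

```python
def decodes_cleanly(data: bytes) -> bool:
    """Report whether data is valid UTF-8 on its own."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def recv_chunk(sock) -> bytes:
    """Read once; let real connection errors propagate instead of hiding them."""
    data = sock.recv(2048)  # an OSError here means the connection is broken
    if not data:
        raise ConnectionError("connection closed by the server")
    return data


# Made-up example: "é" is two bytes in UTF-8, and the cut falls between them.
raw = "PogChamp é\r\n".encode("utf-8")
first, second = raw[:10], raw[10:]
```

Here decodes_cleanly(first) is False but decodes_cleanly(first + second) is True, which is why the bytes should be reassembled into complete messages before decoding.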