Tags: python, regex, praw

Using regular expressions to match a word in Python


I am using PRAW to make a reddit bot that finds comments containing the word "alot" and stores each comment author's username in a list. I am having trouble with the regular expression and with getting the match to work. Here is my code.

# Import praw for the reddit API, time for the polling interval, and re for regular expressions

import praw
import time
import re


username = "LewisTheRobot"
password = 



r = praw.Reddit(user_agent = "Counts people who say alot")

word_to_match = ['\balot\b']

storage = []

r.login(username, password)

def run_bot():
    subreddit = r.get_subreddit("test")
    print("Grabbing subreddit")
    comments = subreddit.get_comments(limit=200)
    print("Grabbing comments")
    for comment in comments:
        comment_text = comment.body.lower()
        isMatch = any(string in comment_text for string in word_to_match)
        if comment.id not in storage and isMatch:
            print("Match found! Storing username: " + str(comment.author) + " into list.")
            storage.append(comment.author)


    print("There are currently: " + str(len(storage)) + " people who use 'alot' instead of ' a lot'.")


while True:
    run_bot()
    time.sleep(5)

The regular expression I am using is meant to match "alot" as a standalone word, not "alot" as part of a longer word such as "zealot". Whenever I run this, it does not find a comment that I have made. Any suggestions?


Solution

  • You're checking with string operations, not RE ones, in

    isMatch = any(string in comment_text for string in word_to_match)
    

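    The first "in" here is Python's substring test. It has nothing to do with REs, so the pattern is compared as literal text and never interpreted as a regular expression.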

    Change this to

    isMatch = any(re.search(string, comment_text) for string in word_to_match)
    

    Moreover, you have an error in your initialization:

    word_to_match = ['\balot\b']
    

    In an ordinary string literal, '\b' is the character with code 0x08 (backspace). Always use raw string syntax for RE patterns, to avoid such traps:

    word_to_match = [r'\balot\b']
    

    Now the pattern contains two characters, a backslash followed by b, which the re module interprets as "word boundary".

    There may be other bugs but I try not to look for more than two bugs per question...:-)
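
To see both problems side by side, here is a minimal sketch that runs without PRAW. The sample comments and the names broken_pattern, raw_pattern and matches are made up for illustration and are not part of the original bot. It compares the substring test, re.search with the non-raw pattern, and re.search with the raw pattern.

```python
import re

# Hypothetical sample comments, invented for this illustration.
comments = ["I like this alot", "He is a zealot", "Thanks a lot!"]

# Without the r prefix, each '\b' is a single backspace character (0x08).
broken_pattern = '\balot\b'
# With the r prefix, each \b stays as backslash + b, a regex word boundary.
raw_pattern = r'\balot\b'


def matches(pattern, text):
    """Return True if the regex pattern is found anywhere in the lowercased text."""
    return re.search(pattern, text.lower()) is not None


# Plain substring test: also fires on "zealot".
substring_hits = ['alot' in c.lower() for c in comments]

# Regex search with the non-raw pattern: looks for literal backspace characters.
broken_hits = [matches(broken_pattern, c) for c in comments]

# Regex search with the raw pattern: matches "alot" only as a whole word.
raw_hits = [matches(raw_pattern, c) for c in comments]

print(len(broken_pattern), len(raw_pattern))
print(substring_hits)
print(broken_hits)
print(raw_hits)
```

Only the raw pattern passed to re.search singles out the first comment. The substring test also flags "zealot", and the non-raw pattern matches nothing because none of the comments contain a backspace character.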