
How to check which words from a list are contained in a string?


I want to collect each word from a list that appears in a string in Python. I found some solutions, but so far I get:

data = "Today I gave my dog some carrots to eat in the car"
tweet = data.lower()                             #convert to lower case
split = tweet.split()

matchers = ['dog','car','sushi']
matching = [s for s in split if any(xs in s for xs in matchers)]
print(matching)

The result is

['dog', 'carrots', 'car']

How do I fix it so that the result is only dog and car, without adding spaces to my matchers?

Also, how would I remove any $ signs (as an example) from the data string, but no other special characters like @?


Solution

  • How do I fix it so that the result is only dog and car, without adding spaces to my matchers?
    

    To do this with your current code, replace this line:

    matching = [s for s in split if any(xs in s for xs in matchers)]
    

    With this:

    matching = []
    # iterate over all matcher words
    for word in matchers:
        if word in split:  # check if word is in the split up words
            matching.append(word)  # add word to list
    
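This works because the two checks use `in` differently: `xs in s` with two strings asks whether `xs` is a substring of `s` (so `'car'` matches `'carrots'`), while `word in split` asks whether `word` equals one of the list's elements. A minimal sketch of the same fix as a comprehension, assuming you want the results in the order of `matchers`; the set named `words` is just an illustrative choice for faster lookups:

```python
tweet = "today i gave my dog some carrots to eat in the car"
matchers = ['dog', 'car', 'sushi']

# Substring test between two strings: 'car' is found inside 'carrots'
print('car' in 'carrots')            # True
# Membership test on a list: compares whole elements only
print('car' in ['dog', 'carrots'])   # False

# A set gives fast exact-word lookups; the result keeps the order of matchers
words = set(tweet.split())
matching = [word for word in matchers if word in words]
print(matching)  # ['dog', 'car']
```

Like the loop, this reports each matcher at most once, even if the word appears several times in the tweet.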

    You also mention this:

    Also, how would I remove any $ signs (as an example) from the data string, but no other special characters like @?
    

    To do this, I would create a list that contains characters you want to remove, like so:

    things_to_remove = ['$', '*', '#']  # this can be anything you want to take out
    

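    Then, remove each of those characters from the tweet string before you split it (note that str.strip() would not work here, since it only removes characters from the ends of a string):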

    for remove_me in things_to_remove:
        tweet = tweet.replace(remove_me, "")
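As an alternative to calling `replace` once per character, `str.maketrans` can build a translation table that deletes every listed character in a single `translate` pass. A sketch, assuming the same `things_to_remove` characters as above; `@` survives because it is not in the list:

```python
tweet = "today i@@ gave my dog## some carrots to eat in the$ car"
things_to_remove = ['$', '*', '#']

# The third argument of str.maketrans is a string of characters to delete
table = str.maketrans('', '', ''.join(things_to_remove))
cleaned = tweet.translate(table)
print(cleaned)  # today i@@ gave my dog some carrots to eat in the car
```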
    

    Putting it all together, here is a complete example that demonstrates both parts:

    data = "Today I@@ gave my dog## some carrots to eat in the$ car"
    tweet = data.lower()                             #convert to lower case
    
    things_to_remove = ['$', '*', '#']
    
    for remove_me in things_to_remove:
        tweet = tweet.replace(remove_me, "")
    print("After removing characters I don't want:")
    print(tweet)
    
    split = tweet.split()
    
    matchers = ['dog','car','sushi']
    
    matching = []
    # iterate over all matcher words
    for word in matchers:
        if word in split:  # check if word is in the split up words
            matching.append(word)  # add word to list
    print(matching)
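One limitation of splitting on whitespace is that punctuation stays attached to words, so a sentence ending in `car.` would not match `car` unless `.` is also in the removal list. A minimal sketch using regular-expression word boundaries (`\b`) from the `re` module instead, assuming the matchers are plain words; `re.escape` guards against special characters in them:

```python
import re

data = "Today I gave my dog some carrots to eat in the car."
matchers = ['dog', 'car', 'sushi']

# Build one pattern like \b(?:dog|car|sushi)\b; \b stops 'car' matching inside 'carrots'
pattern = r'\b(?:' + '|'.join(map(re.escape, matchers)) + r')\b'
found = re.findall(pattern, data.lower())
print(found)  # ['dog', 'car']
```

Unlike the loop above, `re.findall` returns matches in the order they occur in the text and includes repeats, so wrap it in `set()` or deduplicate if you only need each word once.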