
Search for matching content with several AND and OR conditions


I have a list of strings, content = [...]. Each string must contain at least one term from each of the lists below (case insensitive). What is the best way to achieve that?

react_terms = ["reactjs", "react.js", "react"] (OR condition)

AND

python_terms = ["python", "django"] (OR condition)

AND

cities_countries = ["london", "UK"] (OR condition)

What I'm trying (not working)

for content_str in content:
    if content_str in any(react_terms) and any(python_terms) and any(cities_countries):
        print(content_str, "match!")

Example with data

content = [
    "Lorem Ipsum reactjs, python in London",
    "Lorem Ipsum reactjs, python in United States",
    "Lorem Ipsum Vue, python in London, UK",
]

Result

  • content[0] matches

content[1] & content[2] do NOT match because:

  • content[1] didn't match as it didn't include any cities_countries terms
  • content[2] didn't match as it didn't include any react_terms

Solution

  • Initial response

    If you want content_str to match exactly any of the items in the three lists, you could use:

    if content_str.lower() in (react_terms + python_terms + cities_countries):
      # Do stuff
    

    The any function will not work the way you used it. It returns a single boolean: True if at least one item in its argument is truthy (and non-empty strings are truthy). So the condition you have written is equivalent to the following, which raises a TypeError because in cannot test membership in a bool:

    if content_str in True and True and True:
      #...
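
To make the point above concrete, here is a small sketch of what any() returns for one of the lists from the question, followed by the form that was probably intended. The variable names flag, error_message and has_react are only illustrative.

```python
react_terms = ["reactjs", "react.js", "react"]

# any() collapses the whole list into a single bool.
flag = any(react_terms)
print(flag)  # True, because non-empty strings are truthy

# Membership cannot be tested against a bool, so Python raises TypeError.
error_message = None
try:
    "reactjs" in flag
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Probably intended: test each term against the (lowercased) text.
content_str = "Lorem Ipsum reactjs, python in London"
has_react = any(term in content_str.lower() for term in react_terms)
print(has_react)  # True
```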
    

    One last comment: if you do not plan on changing the items in the lists dynamically, it is more efficient to build the combined list once:

    ITEMS_TO_MATCH = react_terms + python_terms + cities_countries
    if content_str.lower() in ITEMS_TO_MATCH:
      # Do stuff
    

    Note: I have ignored the and operators you tried to use because, with the data you have provided, no item appears in all three lists. If you do expect items to appear in all three lists, and you want to act only when content_str is in all three, compute ITEMS_TO_MATCH like this instead:

    ITEMS_TO_MATCH = [item for item in react_terms if item in python_terms and item in cities_countries]
    

    Edit

    Now that you have provided some sample data, I can understand more clearly what you are trying to do. Here is a script that meets your requirements:

    from typing import Iterable
    
    CONTENT = [
        "Lorem Ipsum reactjs, python in London",
        "Lorem Ipsum reactjs, python in United States",
        "Lorem Ipsum Vue, python in London, UK",
    ]
    
    CITIES_COUNTRIES = ("london", "uk")  # lowercase, because the text is lowercased before comparison
    PYTHON_TERMS = ("python", "django")
    REACT_TERMS = ("reactjs", "react.js", "react")
    MATCHES = (CITIES_COUNTRIES, PYTHON_TERMS, REACT_TERMS)
    
    
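    def word_in_match(word: str, match: Iterable[str]) -> bool:
        """Return True if any term in `match` occurs inside `word` (case insensitive)."""
        word = word.lower()
        # Lowercase the terms too, so an entry such as "UK" can still match.
        return any(word_to_match.lower() in word for word_to_match in match)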
    
    
    def contains_items_from_all(str_to_match: str, matches: Iterable[Iterable[str]]) -> bool:
        results = [False for _ in matches]
        for word in str_to_match.split():
            for i, match in enumerate(matches):
                if not results[i]:
                    results[i] = word_in_match(word, match)
        return all(results)
    
    
    for str_to_match in CONTENT:
        print(contains_items_from_all(str_to_match, MATCHES))
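
Note that the script above matches substrings, so "react" would also be found inside "reaction". If whole-word matching is wanted, one possible variant uses the re module. This is only a sketch: the names TERM_GROUPS, compile_group, PATTERNS and matches_all_groups are hypothetical, and the terms are the ones from the question.

```python
import re

# Term groups from the question: one term per group must be present.
TERM_GROUPS = (
    ("reactjs", "react.js", "react"),
    ("python", "django"),
    ("london", "uk"),
)


def compile_group(terms):
    # re.escape keeps the "." in "react.js" literal.
    alternatives = "|".join(re.escape(term) for term in terms)
    # (?<!\w) and (?!\w) reject terms embedded in a longer word.
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


# Compile once, reuse for every string.
PATTERNS = [compile_group(group) for group in TERM_GROUPS]


def matches_all_groups(text):
    # AND across groups, OR inside each group (the alternation).
    return all(pattern.search(text) for pattern in PATTERNS)


print(matches_all_groups("Lorem Ipsum reactjs, python in London"))
print(matches_all_groups("reaction python london"))
```

Lookarounds are used here instead of \b so that a term ending in a non-word character would still be handled consistently.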
    

    A more efficient approach

    def contains_item(str_to_match: str, match: Iterable[str]) -> bool:
        # str_to_match is expected to be lowercase already; lowercase the terms too.
        for word_in_match in match:
            if word_in_match.lower() in str_to_match:
                return True
        return False
    
    
    def contains_items_from_all(str_to_match: str, matches: Iterable[Iterable[str]]) -> bool:
        # Lowercase once, then stop at the first group that has no matching term.
        str_to_match = str_to_match.lower()
        for match in matches:
            if not contains_item(str_to_match, match):
                return False
        return True
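
The same AND-of-ORs logic can also be written with the built-in all() and any(), which is close to what the question originally attempted. A minimal sketch using the sample data from the question; the function name matches_every_group is hypothetical.

```python
from typing import Iterable, Sequence

react_terms = ["reactjs", "react.js", "react"]
python_terms = ["python", "django"]
cities_countries = ["london", "UK"]

content = [
    "Lorem Ipsum reactjs, python in London",
    "Lorem Ipsum reactjs, python in United States",
    "Lorem Ipsum Vue, python in London, UK",
]


def matches_every_group(text: str, groups: Iterable[Sequence[str]]) -> bool:
    lowered = text.lower()
    # all() is the AND across groups, any() is the OR inside one group.
    return all(
        any(term.lower() in lowered for term in group)
        for group in groups
    )


groups = [react_terms, python_terms, cities_countries]
matched = [text for text in content if matches_every_group(text, groups)]
print(matched)  # ['Lorem Ipsum reactjs, python in London']
```

Both all() and any() short-circuit, so a string is rejected as soon as one group has no matching term.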