Check if a text string contains text or similar text


I have an interesting problem:

I have a fairly large paragraph of text, and I want to check whether it contains certain phrases. Exact matching isn't enough, because I want to know whether the paragraph contains the phrases OR similar phrases. For example, if I have a privacy policy document and want to check whether it mentions anything about "tracking cookies", how would I go about this?

I am doing it in Python.


Solution

  • You could build a regular expression that matches multiple variants of the string "tracking cookies". For example, a regex that matches:

    tracking cookies
    cookie trackers
    Cookies
    cookie
    tracker cookie
    Tracking Cookies
    .
    .
    .
    etc.
    

    Then, every time you encounter a new variant of your string, you can add it to the regular expression.
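
Here is a minimal sketch of this idea in Python, assuming the standard-library `re` module is all you need. The `VARIANTS` list and the helpers `build_pattern` and `find_mentions` are hypothetical names chosen for illustration. Case variants such as "Cookies" and "Tracking Cookies" are handled by `re.IGNORECASE` instead of being listed separately, and an optional trailing `s` covers simple plurals.

```python
import re

# Hypothetical starting list; append new phrasings as you encounter them.
# Case variants ("Cookies", "Tracking Cookies") are handled by re.IGNORECASE.
VARIANTS = [
    "tracking cookies",
    "cookie trackers",
    "tracker cookie",
    "cookie",
]


def build_pattern(variants):
    """Compile one case-insensitive regex matching any listed variant."""
    alternatives = []
    for phrase in variants:
        # Escape each word and allow any run of whitespace (including
        # line breaks) between words.
        alternatives.append(r"\s+".join(re.escape(w) for w in phrase.split()))
    # Python tries alternatives left to right, so put longer ones first
    # to let the most specific phrase win ("cookie trackers" over "cookie").
    alternatives.sort(key=len, reverse=True)
    # \b stops "cookie" matching inside words like "cookiecutter";
    # the optional "s" accepts simple plurals ("cookies").
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")s?\b", re.IGNORECASE)


def find_mentions(text, pattern):
    """Return every matched phrase in the order it appears in the text."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    policy = (
        "We use tracking cookies and other trackers. "
        "Third-party Cookies may also be set."
    )
    pattern = build_pattern(VARIANTS)
    print(find_mentions(policy, pattern))  # ['tracking cookies', 'Cookies']
```

To add a new variant, extend the list and rebuild the pattern, e.g. `build_pattern(VARIANTS + ["tracking pixel"])`. This still only finds phrasings you have listed; it won't recognize a paraphrase it has never seen.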