Search code examples
pythonnlpprofanity

What’s a good Python profanity filter library?


Like https://stackoverflow.com/questions/1521646/best-profanity-filter, but for Python — and I’m looking for libraries I can run and control myself locally, as opposed to web services.

(And whilst it’s always great to hear your fundamental objections of principle to profanity filtering, I’m not specifically looking for them here. I know profanity filtering can’t pick up every hurtful thing being said. I know swearing, in the grand scheme of things, isn’t a particularly big issue. I know you need some human input to deal with issues of content. I’d just like to find a good library, and see what use I can make of it.)


Solution

  • I didn't found any Python profanity library, so I made one myself.

    Parameters


    filterlist

    A list of regular expressions that match a forbidden word. Please do not use \b, it will be inserted depending on inside_words.

    Example: ['bad', 'un\w+']

    ignore_case

    Default: True

    Self-explanatory.

    replacements

    Default: "$@%-?!"

    A string with characters from which the replacements strings will be randomly generated.

    Examples: "%&$?!" or "-" etc.

    complete

    Default: True

    Controls if the entire string will be replaced or if the first and last chars will be kept.

    inside_words

    Default: False

    Controls if words are searched inside other words too. Disabling this

    Module source


    (examples at the end)

    """
    Module that provides a class that filters profanities
    
    """
    
    __author__ = "leoluk"
    __version__ = '0.0.1'
    
    import random
    import re
    
    class ProfanitiesFilter(object):
        def __init__(self, filterlist, ignore_case=True, replacements="$@%-?!", 
                     complete=True, inside_words=False):
            """
            Inits the profanity filter.
    
            filterlist -- a list of regular expressions that
            matches words that are forbidden
            ignore_case -- ignore capitalization
            replacements -- string with characters to replace the forbidden word
            complete -- completely remove the word or keep the first and last char?
            inside_words -- search inside other words?
    
            """
    
            self.badwords = filterlist
            self.ignore_case = ignore_case
            self.replacements = replacements
            self.complete = complete
            self.inside_words = inside_words
    
        def _make_clean_word(self, length):
            """
            Generates a random replacement string of a given length
            using the chars in self.replacements.
    
            """
            return ''.join([random.choice(self.replacements) for i in
                      range(length)])
    
        def __replacer(self, match):
            value = match.group()
            if self.complete:
                return self._make_clean_word(len(value))
            else:
                return value[0]+self._make_clean_word(len(value)-2)+value[-1]
    
        def clean(self, text):
            """Cleans a string from profanity."""
    
            regexp_insidewords = {
                True: r'(%s)',
                False: r'\b(%s)\b',
                }
    
            regexp = (regexp_insidewords[self.inside_words] % 
                      '|'.join(self.badwords))
    
            r = re.compile(regexp, re.IGNORECASE if self.ignore_case else 0)
    
            return r.sub(self.__replacer, text)
    
    
    if __name__ == '__main__':
    
        f = ProfanitiesFilter(['bad', 'un\w+'], replacements="-")    
        example = "I am doing bad ungood badlike things."
    
        print f.clean(example)
        # Returns "I am doing --- ------ badlike things."
    
        f.inside_words = True    
        print f.clean(example)
        # Returns "I am doing --- ------ ---like things."
    
        f.complete = False    
        print f.clean(example)
        # Returns "I am doing b-d u----d b-dlike things."