
Remove tuple based on character count


I have a dataset in which each row holds a tuple of words. I want to remove the words that contain fewer than 4 characters, but I could not figure out how to iterate over the words in my code.

Here is a sample of my data:

                   content                clean4Char
0         [yes, no, never]                   [never]
1    [to, every, contacts]         [every, contacts]
2 [words, tried, describe]  [words, tried, describe]
3          [word, you, go]                    [word]

Here is the code that I'm working with (it keeps raising an error):

def remove_single_char(text):
    text = [word for word in text]
    return re.sub(r"\b\w{1,3}\b"," ", word)

df['clean4Char'] = df['content'].apply(lambda x: remove_single_char(x))
df.head(3)

Solution

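  • The problem is with your remove_single_char function: it passes word to re.sub after the list comprehension has finished, where word is no longer defined (in Python 3), and re.sub expects a string rather than a list of words. There is also no need for the lambda, since you can pass the function directly to apply.

    This will do the job: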

    def remove(words):
        # Keep only the words with at least 4 characters
        return list(filter(lambda x: len(x) >= 4, words))

    df['clean4Char'] = df['content'].apply(remove)
    df.head(3)
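
To check the filtering logic without a DataFrame, here is a minimal sketch in plain Python. The names keep_long_words and min_len are made up for this illustration, and the rows are copied from the sample content column in the question. It assumes "fewer than 4 characters" means that words of exactly 4 characters, such as "word", are kept, which matches the expected clean4Char column.

```python
def keep_long_words(words, min_len=4):
    """Return the words that have at least min_len characters."""
    # keep_long_words and min_len are hypothetical names, not from the question.
    return [word for word in words if len(word) >= min_len]


# Rows copied from the sample 'content' column in the question.
content = [
    ["yes", "no", "never"],
    ["to", "every", "contacts"],
    ["words", "tried", "describe"],
    ["word", "you", "go"],
]

# Apply the filter to each row, one row at a time.
clean4Char = [keep_long_words(row) for row in content]
for row in clean4Char:
    print(row)
```

The same function can be passed straight to pandas, as in df['clean4Char'] = df['content'].apply(keep_long_words), since apply calls it once per row value.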