python pandas contains

What is the fastest way to check whether a cell contains letters?


I have a dataset with 2.6 million rows in which I have one column called msgText, which contains written messages.

Now, I want to filter out all messages that don't contain any letters. To do so I found the following code:

dataset = dataset[dataset['msgText'].astype(str).str.contains('[A-Za-z]')]

However, after 16 hours the code is still running.

Furthermore, based on "Does Python have a string 'contains' substring method?", I thought about creating a list of the 26 letters of the alphabet and then checking whether each cell contains any of them. But that does not seem efficient either.
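For reference, the per-letter idea could look roughly like the sketch below. The helper name `contains_letter` is made up for illustration, and lowercasing first keeps it at 26 checks for plain ASCII input. It costs up to 26 substring scans per message, which is why a single regex pass is usually the more natural choice.

```python
import string

def contains_letter(text):
    """Return True if text contains at least one ASCII letter (hypothetical helper)."""
    lowered = str(text).lower()
    # Up to 26 substring scans per message; any() stops at the first hit.
    return any(letter in lowered for letter in string.ascii_lowercase)

print(contains_letter("call me at 5pm"))  # True
print(contains_letter("12345 !!"))        # False
```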

Therefore, I am wondering if there is a faster way to find whether a cell contains letters.


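EDIT: The code above actually works fine and runs quickly. Apparently, what I had in my (slow) code was: dataset['msgText'] = dataset[dataset['msgText'].astype(str).str.contains('[A-Za-z]')], which assigns the filtered DataFrame to the msgText column instead of filtering the dataset itself.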
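A minimal sketch of the working pattern on toy data (the four sample messages are made up; only the column name msgText comes from the question). The mask is built once and then used to index the whole frame, rather than being assigned back into a column.

```python
import pandas as pd

# Toy stand-in for the real 2.6-million-row dataset (made-up values).
dataset = pd.DataFrame({"msgText": ["hello", "12345", "ok 42", "!!!"]})

# Boolean mask: True where the message contains at least one ASCII letter.
mask = dataset["msgText"].astype(str).str.contains("[A-Za-z]", regex=True)

# Index the whole frame with the mask to keep only the matching rows.
filtered = dataset[mask]
print(filtered["msgText"].tolist())  # ['hello', 'ok 42']
```

One thing to keep in mind with astype(str): missing values become the strings 'nan' or 'None', which do contain letters, so it may be worth dropping them first if the column has gaps.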


Solution

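  • import re
    import pandas
    
    # str.find only matches literal text, so use a regex search; True where the cell has a letter
    dataset['columnName'].apply(lambda x: re.search('[A-Za-z]', str(x)) is not None)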
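One pitfall worth checking when writing a row-wise test like this: str.find does a literal substring search and returns an index, while re.search interprets its argument as a pattern. The short sketch below, using made-up sample strings, shows the difference.

```python
import re

text = "abc 123"

# str.find looks for the literal two characters backslash + w, not a regex class.
print(text.find("\\w"))  # -1: that literal substring is absent

# str.find returns the match index, so a hit at position 0 fails a "> 0" test.
print("abc".find("a") > 0)  # False, even though "a" is present

# re.search treats the argument as a pattern; compile once and reuse it.
letter = re.compile("[A-Za-z]")
print(letter.search(text) is not None)       # True
print(letter.search("123 456") is not None)  # False
```

Note also that \w matches digits and underscores as well as letters, so [A-Za-z] is the closer fit when only letters should count.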