Pandas filter string data based on its string length using DataFrame.query


This question is very similar to "Python: Pandas filter string data based on its string length", but I want to use pandas.DataFrame.query. Say we have a pandas.DataFrame, and I would like to keep only the rows where the string length of column A is not equal to 3, using pandas.DataFrame.query:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A' : ['hi', 'hello', 'day', np.nan], 'B' : [1, 2, 3, 4]})  
df.query('A.str.len() != 3')

However, I get the following error:

TypeError: unhashable type: 'numpy.ndarray'

Solution

  • Replacing the integer 3 with the string "3" works for me. I'm using pandas 0.23.1.

    df.query('A.str.len() != "3"')
    

    Output:

           A  B
    0     hi  1
    1  hello  2
    3    NaN  4
    

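    Alternatively, if you also want to drop the np.nan row, cast the column to str first. np.nan then becomes the 3-character string "nan" and is filtered out like any other 3-character string: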

    df.query('A.astype("str").str.len() != "3"')
    

    Output:

           A  B
    0     hi  1
    1  hello  2
    

    Hope this helps.
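
Another option that keeps the integer comparison is to pass engine='python' to query, so the expression is evaluated with ordinary pandas operations instead of numexpr. This is a minimal sketch rather than a definitive fix: it assumes a pandas version whose python engine accepts .str accessor calls inside query, and I have not checked it against 0.23.1 specifically. The names kept, mask_kept and no_nan are only for illustration. The last step drops the missing value explicitly with dropna instead of relying on how NaN is rendered as a string.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['hi', 'hello', 'day', np.nan], 'B': [1, 2, 3, 4]})

# engine='python' bypasses numexpr, so the result of the .str accessor call
# can be compared against a plain integer.
kept = df.query('A.str.len() != 3', engine='python')

# Equivalent boolean mask without query, useful as a cross-check.
mask_kept = df[df['A'].str.len() != 3]

# Drop rows where A is missing first, then apply the same length filter.
no_nan = df.dropna(subset=['A']).query('A.str.len() != 3', engine='python')

print(kept)
print(no_nan)
```

The NaN row survives the first filter because the length of a missing value is itself NaN, and NaN != 3 evaluates to True.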