Tags: python, pandas, dataframe, pathlib

pandas dataframe with pathlib Path filtering


I am trying to filter a dataframe like the following by its path column and by properties of those paths:


from pathlib import Path
import pandas as pd

lst = [('100', Path('/root/sub1/nameA.txt'), 'some_type'),
       ('101', Path('/root/sub1/nameB.txt'), 'some_type'),
       ('102', Path('/root/sub2/nameC.txt'), 'other_type')]

df = pd.DataFrame(lst, columns = ['id', 'path', 'category'])
print(df)

At the moment I am looking for all rows whose path has sub1 as its parent (i.e. all files in the directory sub1). More generally, I want to be able to filter the df by certain properties of its paths.

I know of the Path.parent property and have been using it for a while. I am also aware of filter options like df['path'].str.contains(), but that does not work when the column holds Path objects.

Any advice? Thanks for your help!

In response to Manakin's question in the comments, here is an example of the desired output:

df[df['path'].apply(Path.parent == '/root/sub1')] # does of course not work!

# desired output
df
      id     path                    category
0     100    '/root/sub1/nameA.txt'  'some_type'
1     101    '/root/sub1/nameB.txt'  'some_type'

Solution

  • If you want to keep the pathlib objects in the column, you'll have to use apply, because the .str accessor only works on strings:

    df[df['path'].apply(lambda x : x.parent == Path('/root/sub1'))]
    
    
        id                  path   category
    0  100  \root\sub1\nameA.txt  some_type
    1  101  \root\sub1\nameB.txt  some_type
    

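    You could instead convert the objects to strings and use the .str methods, but you'd then be matching against the text of the full path (with OS-specific separators, as in the output above), which may not give the correct result.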
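
    A rough sketch of both options, assuming the example frame from the question (the variable names are only for illustration). The string variant uses as_posix() so the separators are forward slashes on every OS; note that a prefix match like this would also pick up files in nested subdirectories of sub1, which the Path comparison does not.

```python
from pathlib import Path
import pandas as pd

# Example frame from the question.
df = pd.DataFrame(
    [('100', Path('/root/sub1/nameA.txt'), 'some_type'),
     ('101', Path('/root/sub1/nameB.txt'), 'some_type'),
     ('102', Path('/root/sub2/nameC.txt'), 'other_type')],
    columns=['id', 'path', 'category'])

# Compare Path objects directly: rows whose parent directory is /root/sub1.
in_sub1 = df[df['path'].apply(lambda p: p.parent == Path('/root/sub1'))]

# Other Path attributes work the same way, e.g. the parent's name or the suffix.
by_dir_name = df[df['path'].apply(lambda p: p.parent.name == 'sub2')]
txt_files = df[df['path'].apply(lambda p: p.suffix == '.txt')]

# String-based alternative: build a column of POSIX-style strings,
# then filter it with the .str accessor.
posix = df['path'].apply(lambda p: p.as_posix())
in_sub1_str = df[posix.str.startswith('/root/sub1/')]

print(in_sub1)
```

    Comparing Path objects is usually the safer choice here, since it does not depend on how the path happens to be rendered as text.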