I am trying to filter a DataFrame like the following by properties of the paths in its path column:
from pathlib import Path
import pandas as pd
lst = [('100', Path('/root/sub1/nameA.txt'), 'some_type'),
('101', Path('/root/sub1/nameB.txt'), 'some_type'),
('102', Path('/root/sub2/nameC.txt'), 'other_type')]
df = pd.DataFrame(lst, columns = ['id', 'path', 'category'])
print(df)
At the moment I am looking for all elements whose parent is sub1 (i.e. all files in the directory sub1). More generally, I want to be able to filter the df by certain properties of its path.
I know of the Path.parent property and have been using it for a while. I am also aware of filter options like df['path'].str.contains(), which do not work when the column entries are Path objects rather than strings.
Any advice? Thanks for your help!
Answering Manakin's questions from the comments, here is an example of the desired output:
df[df['path'].apply(Path.parent == '/root/sub1')] # does of course not work!
# desired output
df
id path category
0 100 '/root/sub1/nameA.txt' 'some_type'
1 101 '/root/sub1/nameB.txt' 'some_type'
If you want to keep the pathlib objects in the column, then you'll have to use apply:
df[df['path'].apply(lambda x : x.parent == Path('/root/sub1'))]
id path category
0 100 \root\sub1\nameA.txt some_type
1 101 \root\sub1\nameB.txt some_type
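The same apply pattern should extend to other path properties as well. Here is a minimal sketch, assuming the DataFrame from the question; the names in_sub1, is_txt and result are only illustrative:

```python
from pathlib import Path
import pandas as pd

df = pd.DataFrame(
    [('100', Path('/root/sub1/nameA.txt'), 'some_type'),
     ('101', Path('/root/sub1/nameB.txt'), 'some_type'),
     ('102', Path('/root/sub2/nameC.txt'), 'other_type')],
    columns=['id', 'path', 'category'],
)

# Each lambda receives one Path object and returns a bool,
# so apply() yields a boolean mask aligned with df's index.
in_sub1 = df['path'].apply(lambda p: p.parent.name == 'sub1')
is_txt = df['path'].apply(lambda p: p.suffix == '.txt')

# Boolean masks can be combined with & (and) and | (or).
result = df[in_sub1 & is_txt]
print(result['id'].tolist())  # ['100', '101']
```

Note that p.parent.name only compares the name of the directory directly above the file, whereas comparing p.parent against Path('/root/sub1') checks the whole parent path.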
You could instead convert the objects to strings, but then you would be matching against the full path string, which may not give the correct result.
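For completeness, the string-based route might look like the sketch below. It assumes the same DataFrame as in the question and uses Path.as_posix() so the strings contain forward slashes on every platform; path_str and by_prefix are illustrative names only.

```python
from pathlib import Path
import pandas as pd

df = pd.DataFrame(
    [('100', Path('/root/sub1/nameA.txt'), 'some_type'),
     ('101', Path('/root/sub1/nameB.txt'), 'some_type'),
     ('102', Path('/root/sub2/nameC.txt'), 'other_type')],
    columns=['id', 'path', 'category'],
)

# Convert each Path to a string with forward slashes,
# which makes the .str accessor available on the result.
path_str = df['path'].apply(lambda p: p.as_posix())

# Keep the rows whose path string starts with the directory prefix.
by_prefix = df[path_str.str.startswith('/root/sub1/')]
print(by_prefix['id'].tolist())  # ['100', '101']
```

Keep in mind that a prefix match like this would also accept files in nested subdirectories such as /root/sub1/deeper/x.txt, which the comparison against Path.parent would not.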