I am working on a remote file system, where I don't have direct access to the files/directories, so I cannot check if a string represents a file or a directory.
I have the following paths I need to handle, where I have to get a hold of the "partition column":
path1 = "/path/to/2012-01-01/files/2014-01-31/la.parquet"
path2 = "/path/to/2012-01-01/files/2014-01-31/"
path3 = "/path/to/2012-01-01/files/2014-01-31"
In all cases, the deepest directory (the partition column) is "2014-01-31". Is there a way to consistently get this component in a single line of code, or do I have to do all sorts of checks on file names?
I was hoping to do something like:
import os
os.path.dirname(path).split("/")[-1]
But this doesn't work for path3. Does one need to have access to the filesystem to correctly identify the deepest directory, or is there some easy way?
Technically la.parquet is a valid directory name, so there's no way to tell just from the string. You'll need to introduce some manual logic, e.g. check for '.' in the name.
>>> import pathlib
>>> p = pathlib.Path(path1)
>>> p.parent.name if '.' in p.name else p.name
'2014-01-31'
>>> p = pathlib.Path(path2)
>>> p.parent.name if '.' in p.name else p.name
'2014-01-31'
>>> p = pathlib.Path(path3)
>>> p.parent.name if '.' in p.name else p.name
'2014-01-31'
You can be more precise (e.g. check '.parquet' in p.name) if needed.
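If you want the stricter check wrapped up for reuse, here is a minimal sketch. The function name partition_column and the file_suffixes parameter are made up for illustration, and it assumes the only files you will ever see end in a known extension such as .parquet. It uses pathlib.PurePosixPath, which never touches the filesystem and always treats "/" as the separator, so it behaves the same for remote paths regardless of the local OS.

```python
from pathlib import PurePosixPath


def partition_column(path, file_suffixes=(".parquet",)):
    """Return the deepest directory name of a POSIX-style path string.

    Hypothetical helper: the last component is treated as a file only if
    its suffix is in file_suffixes (an assumption about your data).
    """
    p = PurePosixPath(path)  # pure path: string parsing only, no filesystem access
    # A trailing slash is dropped by pathlib, so path2 and path3 parse identically.
    return p.parent.name if p.suffix in file_suffixes else p.name


path1 = "/path/to/2012-01-01/files/2014-01-31/la.parquet"
path2 = "/path/to/2012-01-01/files/2014-01-31/"
path3 = "/path/to/2012-01-01/files/2014-01-31"

for path in (path1, path2, path3):
    print(partition_column(path))
```

Matching on the suffix rather than on any '.' means a directory whose name happens to contain a dot is still returned as the directory, at the cost of having to list the file extensions you expect.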