I am working on a remote file system, where I don't have direct access to the files/directories, so I cannot check if a string represents a file or a directory.
I have the following paths I need to handle, where I have to get a hold of the "partition column":
path1 = "/path/to/2012-01-01/files/2014-01-31/la.parquet"
path2 = "/path/to/2012-01-01/files/2014-01-31/"
path3 = "/path/to/2012-01-01/files/2014-01-31"
In all cases, the deepest directory (the partition column) is "2014-01-31". Is there a way to get this consistently in a single line of code, or do I have to do all sorts of checks on the file names?
I was hoping to do something like:
import os
os.path.dirname(path).split("/")[-1]
But this doesn't work for path3, where it returns "files" instead of "2014-01-31". Does one need access to the filesystem to correctly identify the deepest directory, or is there some easy way?
I expected there to be a function like "basename" to resolve this, but I guess as @Jatinder says, there is no way to differentiate between "directory name" and "filename without extension" without having access to the filesystem.
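To illustrate the ambiguity, here is a minimal sketch using only lexical path functions. The helper name last_component is made up for this example, and it assumes POSIX-style "/" separators. It handles path2 and path3 but has no way to know that "la.parquet" in path1 is a file rather than a directory:

```python
import posixpath

path1 = "/path/to/2012-01-01/files/2014-01-31/la.parquet"
path2 = "/path/to/2012-01-01/files/2014-01-31/"
path3 = "/path/to/2012-01-01/files/2014-01-31"

def last_component(path):
    # Hypothetical helper: purely lexical, never touches the filesystem.
    # Drop any trailing slash, then take the final path component.
    return posixpath.basename(path.rstrip("/"))

for path in [path1, path2, path3]:
    print(last_component(path))
```

For path1 this gives "la.parquet", because nothing in the string itself marks that component as a file.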
I ended up just using regex for my specific use-case, and I guess if example 3 was left out there would be a general solution checking for strings ending with "/".
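A rough sketch of that "trailing slash" rule, assuming every directory path ends with "/" (so path3-style input is out of scope). The function name partition_dir is hypothetical:

```python
import posixpath

def partition_dir(path):
    # Assumption: directories always end with "/", so everything after the
    # last "/" is a file name (or empty) and can be dropped by dirname().
    return posixpath.basename(posixpath.dirname(path))

print(partition_dir("/path/to/2012-01-01/files/2014-01-31/la.parquet"))
print(partition_dir("/path/to/2012-01-01/files/2014-01-31/"))
# Without the trailing slash the assumption is violated and the parent is returned:
print(partition_dir("/path/to/2012-01-01/files/2014-01-31"))
```

The first two calls give "2014-01-31"; the third gives "files", which is exactly the path3 problem.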
Here is my solution:
import re
# Examples
path1 = "/path/to/2012-01-01/files/2014-01-31/la.parquet"
path2 = "/path/to/2012-01-01/files/2014-01-31/"
path3 = "/path/to/2012-01-01/files/2014-01-31"
# Find match: the greedy .* makes the group capture the last date in the path
for path in [path1, path2, path3]:
    print(re.search(r'.*(\d{4}-\d{2}-\d{2})', path).group(1))
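A slightly more defensive variant of the same idea, in case it is useful: it only accepts a date that is a whole path component and returns None instead of raising when nothing matches. The names DATE_SEGMENT and last_date_segment are made up for this sketch, and it assumes the partition value always has the form YYYY-MM-DD:

```python
import re

# Assumption: the partition column is a full path component shaped like YYYY-MM-DD.
# The lookahead leaves the following "/" unconsumed so adjacent date
# components can both be matched.
DATE_SEGMENT = re.compile(r"(?:^|/)(\d{4}-\d{2}-\d{2})(?=/|$)")

def last_date_segment(path):
    """Return the last date-shaped path component, or None if there is none."""
    matches = DATE_SEGMENT.findall(path)
    return matches[-1] if matches else None

for path in [
    "/path/to/2012-01-01/files/2014-01-31/la.parquet",
    "/path/to/2012-01-01/files/2014-01-31/",
    "/path/to/2012-01-01/files/2014-01-31",
    "/path/without/any/date",
]:
    print(last_date_segment(path))
```

Unlike the one-liner above, a path without a date gives None rather than an AttributeError from calling .group on a failed match.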