
os.path.basename(file) vs file.split("/")[-1]


I need to extract seq_00034 from a file path like

    file = "/home/user/workspace/data/seq_00034.pkl"

I know 2 ways to achieve it:

Method A

    import os
    seq_name = os.path.basename(file).split(".")[0]

or

Method B

    seq_name = file.split("/")[-1].split(".")[0]

Which is safer/faster?

(taking the cost of import os into account)

Is there a more elegant way to extract seq_name from a given path?


Solution

  • It turns out that splitting twice (i.e. Method B) is faster than os.path.basename plus split.

    Both are significantly faster than pathlib.

    Speed test:

    import os
    import pathlib
    import time
    
    given_path = "/home/home/user/workspace/data/task_2022_02_xx_xx_xx_xx.pkl"
    
    time1 = time.time()
    for _ in range(10000):
        seq_name = given_path.split("/")[-1].split(".")[0]
    print(time.time()-time1, 'time of split')
    
    
    time2 = time.time()
    for _ in range(10000):
        seq_name = pathlib.Path(given_path).stem
    print(time.time()-time2, 'time of pathlib')
    
    
    time3 = time.time()
    for _ in range(10000):
        seq_name = os.path.basename(given_path).split(".")[0]
    print(time.time()-time3, 'time of os.path')
    

    Result on my PC (seconds for 10000 iterations):

    0.00339508056640625 time of split
    0.0355381965637207 time of pathlib
    0.005405426025390625 time of os.path
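
For comparison, the same measurement can be sketched with the standard-library timeit module, which repeats each run and lets you take the best of several. This is only a sketch: the helper names (by_split, by_os_path, by_pathlib, best_time) are made up for illustration, and the numbers will differ from machine to machine.

```python
import os
import pathlib
import timeit

given_path = "/home/home/user/workspace/data/task_2022_02_xx_xx_xx_xx.pkl"


def by_split(path):
    # Method B: split on "/" and then on "."
    return path.split("/")[-1].split(".")[0]


def by_os_path(path):
    # Method A: os.path.basename, then split on "."
    return os.path.basename(path).split(".")[0]


def by_pathlib(path):
    # pathlib variant used in the speed test
    return pathlib.Path(path).stem


def best_time(func, number=10000, repeat=5):
    # Smallest total (in seconds) over `repeat` runs of `number` calls each
    return min(timeit.repeat(lambda: func(given_path), number=number, repeat=repeat))


for name, func in [("split", by_split), ("os.path", by_os_path), ("pathlib", by_pathlib)]:
    print(f"{best_time(func):.6f} s  {name}")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes. It should not change the ranking, but it makes small differences easier to trust.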
    

    If we also count the time spent on imports, splitting twice (i.e. Method B) is still the fastest (assuming the code is called only once):

    import time

    given_path = "/home/home/user/workspace/data/task_2022_02_xx_xx_xx_xx.pkl"

    time1 = time.time()
    seq_name = given_path.split("/")[-1].split(".")[0]
    print(time.time()-time1, 'time of split')
    
    time2 = time.time()
    import pathlib
    seq_name = pathlib.Path(given_path).stem
    print(time.time()-time2, 'time of pathlib')
    
    time3 = time.time()
    import os  # typically already loaded at CPython startup, so this import is close to free
    seq_name = os.path.basename(given_path).split(".")[0]
    print(time.time()-time3, 'time of os.path')
    
    

    Speed test result (seconds, single call):

    0.000001430511474609375 time of split
    0.003416776657104492 time of pathlib
    0.0000030994415283203125 time of os.path
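
On the "safer" part of the question, speed is not the only difference: the variants give different results for some file names. The sketch below compares them. The helper names are made up for illustration, and every sample path except the first is a hypothetical example, not taken from the question.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def stem_by_split(path):
    # Method B from the question: assumes "/" separators and cuts at the FIRST dot
    return path.split("/")[-1].split(".")[0]


def stem_by_splitext(path):
    # os.path variant that strips only the LAST extension
    return os.path.splitext(os.path.basename(path))[0]


def stem_by_pathlib(path):
    # PurePosixPath parses "/"-separated paths the same way on every platform
    return PurePosixPath(path).stem


samples = [
    "/home/user/workspace/data/seq_00034.pkl",     # the case from the question
    "/home/user/workspace/data/seq_00034.v2.pkl",  # more than one dot
    "/home/user/.bashrc",                          # file name starting with a dot
]
for path in samples:
    print(path)
    print("  split   :", repr(stem_by_split(path)))
    print("  splitext:", repr(stem_by_splitext(path)))
    print("  pathlib :", repr(stem_by_pathlib(path)))

# Backslash-separated path: splitting on "/" finds no separator at all
win_path = r"C:\data\seq_00034.pkl"
print(repr(stem_by_split(win_path)))          # the directory part is kept
print(repr(PureWindowsPath(win_path).stem))   # 'seq_00034'
```

For the path in the question all three agree. For seq_00034.v2.pkl the double split returns seq_00034, while os.path.splitext and pathlib return seq_00034.v2. For .bashrc the double split returns an empty string. Method A (os.path.basename followed by split(".")[0]) cuts at the first dot just like Method B, so the choice depends on which of these behaviours you actually want.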