
Using os.listdir(os.path.join(...)) with a pandas Series to list the files in folders named by a variable


In Excel I have cell A1 = numero PPAS, A2 = 1973.01, A3 = 1975.01, A4 = 1975.02. The values in A2, A3 and A4 are the names of the folders "1973.01", "1975.01" and "1975.02". I use them to access the directories F:/Comune/Breggia_test/1973.01, F:/Comune/Breggia_test/1975.01 and F:/Comune/Breggia_test/1975.02. For every directory I want the list of its files.

import pandas as pd
df = pd.read_excel(r'P:/Breggia_Tresa_ZP_test.xlsx')
y = df['numero PPAS']

The result is the following:

0    1973.01
1    1975.01
2    1975.02
Name: numero PPAS, dtype: float64

In the next step I convert the Series values to strings and remove the unwanted index labels (0, 1, 2) in front of the cell values.

for index, value in y.items():
    z=f" {index} : {value}"

The result is the following, and it is a string (confirmed with type(), not shown):




I know that os.path.join accepts only strings, so it should work now thanks to the for loop with items() above. Now I want to obtain three lists of files: those in 1973.01 (first iteration), 1975.01 (second iteration) and 1975.02 (third iteration).

for item in k:
    item=os.listdir(os.path.join('F:/Comune/Breggia_test', k) )

Unfortunately, the result is the file list of F:\Comune\Breggia_test\1975.02 repeated seven times, which is the number of characters in the string created with k=z[-7:]:

['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']

['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']

['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']

['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']

['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']

['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']

['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']

The desired result is three lists, one from each of the following directories:




Could someone explain what is going wrong?


  • I'm not sure I understood what you're trying to do, but here's how you can take a pandas Series, combine it with a base path, and list the contents of each resulting directory:

    from pathlib import Path
    import pandas as pd
    # Path we'll be using as the common base path.
    base_path = Path(r'/content/sample_data')
    # Our initial dataset, as a `pandas.Series`. The `.astype(str)` call at the
    # end of the next line converts the values to strings.
    y = pd.Series([1973.01, 1975.01, 1975.02], name='PPAS').astype(str)

    Now choose one of the following snippets, depending on what you want to retrieve.

    Option 1: Retrieve only the immediate files and directories


    base_path = Path(r'/content/sample_data')
    list_of_subdirs = y.astype(str).apply(
        lambda value: [
            str(file) for file in base_path.joinpath(value).glob('*')])

    In my case, it returns:


    Option 2: Retrieve only the immediate files

    base_path = Path(r'/content/sample_data')
    list_of_subdirs = y.astype(str).apply(
        lambda value: [str(file) for file in base_path.joinpath(value).glob('*') if file.is_file()]
    )

    In my case, it returns:


    Option 3: Retrieve all files and subdirectories recursively

    base_path = Path(r'/content/sample_data')
    list_of_subdirs = y.astype(str).apply(
        lambda value: [str(file) for file in base_path.joinpath(value).glob('**/*')]
    )

    In my case, it returns:


    Option 4: Retrieve only files from all subdirectories

    base_path = Path(r'/content/sample_data')
    list_of_subdirs = y.astype(str).apply(
        lambda value: [str(file) for file in base_path.joinpath(value).glob('**/*') if file.is_file()]
    )

    In my case, it returns:


    For some additional context, here's a tree view of all the subdirectories:

    ├── 1973.01
    │   └──
    ├── 1975.01
    │   ├── 1975.01
    │   │   └── mnist_test.csv
    │   └── california_housing_test.csv
    ├── 1975.02
    │   └── california_housing_train.csv
    ├── anscombe.json
    └── mnist_train_small.csv
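
As for why the loop in the question prints the same list seven times, here is my reading, offered as a sketch rather than a definitive diagnosis. The first loop overwrites z on every pass, so once it finishes z only describes the last row, and k=z[-7:] is the single string '1975.02'. `for item in k` then walks over the seven characters of that string, and because the body joins the base path with k rather than item, it lists the same folder on every pass. Below is a minimal os.listdir version that loops over the folder names themselves. The helper name list_files_per_folder, the temporary directory and the file names in it are made up for the demonstration, and formatting the floats with two decimals assumes every folder name follows the NNNN.NN pattern.

```python
import os
import tempfile

import pandas as pd


def list_files_per_folder(base_dir, folder_names):
    """Return {folder_name: sorted entries} for each folder under base_dir."""
    result = {}
    for name in folder_names:  # one pass per folder name, not per character
        folder = os.path.join(base_dir, name)  # join with the loop variable
        # Missing folders yield an empty list instead of raising an error.
        result[name] = sorted(os.listdir(folder)) if os.path.isdir(folder) else []
    return result


# Demo on a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as base:
    sample = {"1973.01": ["a.pdf"], "1975.01": ["b.pdf", "c.pdf"], "1975.02": []}
    for folder_name, file_names in sample.items():
        os.makedirs(os.path.join(base, folder_name))
        for file_name in file_names:
            open(os.path.join(base, folder_name, file_name), "w").close()

    y = pd.Series([1973.01, 1975.01, 1975.02], name="numero PPAS")
    # Assumption: two decimals reproduce the folder names exactly.
    names = [f"{value:.2f}" for value in y]
    listing = list_files_per_folder(base, names)

for folder_name, file_names in listing.items():
    print(folder_name, file_names)
```

With the real data, base_dir would be 'F:/Comune/Breggia_test'. Formatting with :.2f instead of astype(str) matters once a value such as 1975.10 appears, because converting that float with str() gives '1975.1', which would not match a folder named 1975.10.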