Task
In excel I have in cell A1=numero PPAS, A2= 1973.01, A3=1975.01, A4=1975.02 I use the cells A2, A3, A4 that are name of are folder "1973.01", "1975.01", "1975.02". I use them to access to the directories F:/Comune/Breggia_test/1973.01, F:/Comune/Breggia_test/1975.01, F:/Comune/Breggia_test/1975.02. For every directory I want the list of the files.
import pandas as pd
df = pd.read_excel (r'P:/Breggia_Tresa_ZP_test.xlsx')
y=df['numero PPAS']
print(y)
the result is the following:
0 1973.01
1 1975.01
2 1975.02
Name: numero PPAS, dtype: float64
Next step I transform the series into string and I remove the disturbing indexes (0, 1, 2) before the cell values.
for index, value in y.items():
z=f" {index} : {value}"
k=z[-7:]
print(k)
The result is the following and it is a string (confirmed by type function not shown):
1973.01
1975.01
1975.02
I know that os.path.join accept only string and now it should be ok because of the above for loop with items function. Now I want to obtain three list of the files in 1973.01 (first iteration), 1975.01 (second iteration), 1975.01 (third iteration).
for item in k:
item=os.listdir(os.path.join('F:/Comune/Breggia_test', k) )
print(item)
but unfortunately the result is the list of F:\Comune\Breggia_test\1975.02 repeated for seven times, the same number of the caracter of the string created with k=z[-7:]:
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']
['apm_19761129.pdf', 'apcst_19780823.pdf', 'apada_19771213.pdf']
The wished result had to be three list that came from the followin directories:
F:\Comune\Breggia_test\1973.01
F:\Comune\Breggia_test\1975.01
F:\Comune\Breggia_test\1975.02
Could someone explain what does not work?
I don't know if I understood what you're trying to do, but here's how you can take a Pandas Series
, combine it with a base path, and list all directories inside:
from pathlib import Path
import pandas as pd
# Path we'll be using as common base path.
base_path = Path(r'/content/sample_data')
# Our initial dataset. We'll be using `Pandas.Series`, and `pandas.DataFrame` common operation called:
# `.asype` at the end of the next code block represents the conversion into strings.
y = pd.Series([1973.01, 1975.01, 1975.02], name='PPAS').astype(str)
Now, choose one of the following code, depending on what you want to retrieve.
Code:
base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
lambda value: [
str(file) for file in base_path.joinpath(value).glob('*')]
).to_list()
In my case, it returns:
[['/content/sample_data/1973.01/README.md'],
['/content/sample_data/1975.01/california_housing_test.csv',
'/content/sample_data/1975.01/1975.01',
'/content/sample_data/1975.01/.ipynb_checkpoints'],
['/content/sample_data/1975.02/california_housing_train.csv']]
base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
lambda value: [str(file) for file in base_path.joinpath(value).glob('*') if file.is_file()]
).to_list()
list_of_subdirs
In my case, it returns:
[['/content/sample_data/1973.01/README.md'],
['/content/sample_data/1975.01/california_housing_test.csv'],
['/content/sample_data/1975.02/california_housing_train.csv']]
base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
lambda value: [str(file) for file in base_path.joinpath(value).glob('**/*')]
).to_list()
list_of_subdirs
In my case, it returns:
[['/content/sample_data/1973.01/README.md'],
['/content/sample_data/1975.01/california_housing_test.csv',
'/content/sample_data/1975.01/1975.01',
'/content/sample_data/1975.01/.ipynb_checkpoints',
'/content/sample_data/1975.01/1975.01/mnist_test.csv'],
['/content/sample_data/1975.02/california_housing_train.csv']]
base_path = Path(r'/content/sample_data')
list_of_subdirs = y.astype(str).apply(
lambda value: [str(file) for file in base_path.joinpath(value).glob('**/*') if file.is_file()]
).to_list()
list_of_subdirs
In my case, it returns:
[
['/content/sample_data/1973.01/README.md'],
[
'/content/sample_data/1975.01/california_housing_test.csv',
'/content/sample_data/1975.01/1975.01/mnist_test.csv'
],
['/content/sample_data/1975.02/california_housing_train.csv']
]
For some additional context, here's a tree view of all the subdirectories:
sample_data
├── 1973.01
│ └── README.md
├── 1975.01
│ ├── 1975.01
│ │ └── mnist_test.csv
│ └── california_housing_test.csv
├── 1975.02
│ └── california_housing_train.csv
├── anscombe.json
└── mnist_train_small.csv