I have about (100 files +) XLS files in a folder with different columns names and data types
File_1.xls:
Id test category
1 ab 4
2 cs 3
3 cs 1
FILE_2.xls:
index remove stocks category
1 dr 4 a
2 as 3 b
3 ae 1 v
File 3: ....
File 100.....
Thats my code:
import pandas as pd
import pathlib
data = []
for filename in pathlib.Path.cwd().iterdir():
if filename.suffix.lower().startswith('.xls'):
data.append(pd.read_excel(filename).melt())
df = pd.concat(data, ignore_index=True)
I would like to have the Dataframe in my output like this:
filename variable value
0 File_1 Id 1
1 File_1 Id 2
2 File_1 Id 3
3 File_1 test ab
4 File_1 test cs
5 File_1 test cs
6 File_1 category 4
7 File_1 category 3
8 File_1 category 1
9 File_1 index 1
10 File_1 index 2
11 File_1 index 3
12 FILE_2 remove dr
13 FILE_2 remove as
14 FILE_2 remove ae
15 FILE_2 stocks 4
16 FILE_2 stocks 3
17 FILE_2 stocks 1
18 FILE_2 category a
19 FILE_2 category b
20 FILE_2 category v
1000 FILE_100 .... ..
How I can melt all mu columns and keep in the column "filename" the name of the source file?
Don't read your data in the loop but rather collect the filenames, then use a dictionary comprehension to add the filenames as concatenation keys:
import pandas as pd
import pathlib
files = []
for filename in pathlib.Path.cwd().iterdir():
if filename.suffix.lower().startswith('.xls'):
files.append(filename)
df = (pd.concat({f: pd.read_excel(f).melt() for f in files})
.rename_axis(['filename', None]).reset_index(0)
.reset_index(drop=True)
)
output:
filename variable value
0 File_1 Id 1
1 File_1 Id 2
2 File_1 Id 3
3 File_1 test ab
4 File_1 test cs
5 File_1 test cs
6 File_1 category 4
7 File_1 category 3
8 File_1 category 1
9 File_2 index 1
10 File_2 index 2
11 File_2 index 3
12 File_2 remove dr
13 File_2 remove as
14 File_2 remove ae
15 File_2 stocks 4
16 File_2 stocks 3
17 File_2 stocks 1
18 File_2 category a
19 File_2 category b
20 File_2 category v