I have 3 folders of Excel data and was asked to create a machine learning model using that data. The problem is that the data does not have headers.
How can I import all those folders of data in Python?
Python won't tell you the names of the columns. What Python can do is help you import and/or concatenate all of the Excel files easily.
To import them in bulk:
import os
import pandas as pd

# Set the source directory
source_directory = "xx"

# List the files in that folder
os.listdir(source_directory)

# Read every file into a variable named "df_" + the file name (without extension).
# Assigning through vars() only works at module level, not inside a function.
for file in os.listdir(source_directory):
    file_name = file.split(".")[0]
    name = "df_" + file_name
    # header=None because the files have no header row; columns are numbered 0, 1, 2, ...
    vars()[name] = pd.read_excel(f"{source_directory}/{file}", header=None)
You could also use another loop to read the data in every directory you need.
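A minimal sketch of that outer loop, assuming the folders contain .xlsx or .xls files and that a dict is an acceptable place to keep the results. The function name `load_folders` and the `reader` parameter are hypothetical, introduced only for illustration; `reader` defaults to `pd.read_excel`.

```python
from pathlib import Path

import pandas as pd


def load_folders(folders, reader=pd.read_excel):
    """Read every Excel file in each folder into a dict of DataFrames.

    Keys are (folder_name, file_stem) tuples, so files with the same
    name in different folders do not overwrite each other.
    """
    frames = {}
    for folder in folders:
        folder = Path(folder)
        for path in sorted(folder.iterdir()):
            if path.suffix.lower() not in {".xlsx", ".xls"}:
                continue  # skip anything that is not an Excel file
            # header=None: the files have no header row, so row 0 is data
            frames[(folder.name, path.stem)] = reader(path, header=None)
    return frames
```

It could be called as `frames = load_folders(["folder_a", "folder_b", "folder_c"])`, where the folder names are placeholders for your three directories. A dict keeps the data frames addressable by name without creating variables dynamically.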
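If you need to concatenate all of the Excel files, supposing they have the same structure, you could collect them in a list and use pandas concat (DataFrame.append was removed in pandas 2.0, and it never modified the frame in place), so it would be something like this:
frames = []
for file in os.listdir(source_directory):
    # header=None because the files have no header row
    frames.append(pd.read_excel(f"{source_directory}/{file}", header=None))
# Stack all files into one frame with a fresh 0..n-1 index
df = pd.concat(frames, ignore_index=True)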
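When stacking many files, it can help to remember which file each row came from. The following is only a sketch under the assumption that the frames are held in a name-to-DataFrame mapping; `concat_with_source` and the `source_file` column are hypothetical names, not part of pandas.

```python
import pandas as pd


def concat_with_source(frames):
    """Stack DataFrames from a {name: DataFrame} mapping, tagging each row's origin."""
    # assign() returns a copy with the extra column, leaving the inputs untouched
    tagged = [frame.assign(source_file=name) for name, frame in frames.items()]
    return pd.concat(tagged, ignore_index=True)


# Small in-memory stand-ins for two header-less Excel files
frames = {
    "jan": pd.DataFrame([[1, 2]]),
    "feb": pd.DataFrame([[3, 4], [5, 6]]),
}
combined = concat_with_source(frames)
```

Note that `pd.concat` raises a `ValueError` when given an empty list, so an empty folder needs to be handled separately.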
Regarding how to add a header row to the files, here is an answer
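For data already loaded with `header=None`, one way to give the columns names is to assign to `df.columns`. This is a sketch with made-up data and hypothetical column names; the real names have to come from whoever produced the files, since nothing in the data itself records them.

```python
import pandas as pd

# Stand-in for a frame read with header=None: columns are labelled 0, 1, 2
df = pd.DataFrame([[5.1, 3.5, "a"], [4.9, 3.0, "b"]])

# Hypothetical names; replace them with the real meaning of each column
column_names = ["feature_1", "feature_2", "label"]

# Guard against a mismatch before renaming
if len(column_names) != df.shape[1]:
    raise ValueError("need exactly one name per column")
df.columns = column_names
```

The same names can also be supplied at read time with `pd.read_excel(path, header=None, names=column_names)`.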