im new using python and for a work purpose im here to ask your help.
I have on a same folder 12 excel files month by month that contains columns like: Product_Name, Quantity and Total_Value
So, what i would like to do but I have no idea how to do are:
For example:
January-21.xls:
Product_Name (type:string) | Quantity (type:float) | Total_Value (type:float) | Month (type:Date) |
---|---|---|---|
Product A | 10 | 250 | "File Name" (January-21) |
Product B | 20 | 500 | "File Name" (January-21) |
Product C | 15 | 400 | "File Name" (January-21) |
February-21.xls:
Product_Name (type:string) | Quantity (type:float) | Total_Value (type:float) | Month (type:Date) |
---|---|---|---|
Product A | 40 | 800 | "File Name" (February-21) |
Product B | 25 | 700 | "File Name" (February-21) |
Product C | 30 | 500 | "File Name" (February-21) |
After merge:
Product_Name (type:string) | Quantity (type:float) | Total_Value (type:float) | Month (type:Date) |
---|---|---|---|
Product A | 10 | 250 | "File Name" (January-21) |
Product B | 20 | 500 | "File Name" (January-21) |
Product C | 15 | 400 | "File Name" (January-21) |
Product A | 40 | 800 | "File Name" (February-21) |
Product B | 25 | 700 | "File Name" (February-21) |
Product C | 30 | 500 | "File Name" (February-21) |
Is it possible? Sorry for my bad English, I'm not a native speaker.
I really appreciate your help!
Edit.1
This is how i merge, create a csv file and convert to a dataframe using pandas:
import pandas as pd
import os
path = "/content/drive/MyDrive/Colab_Notebooks/sq_datas"
files = [file for file in os.listdir(path) if not file.startswith('.')] # Ignore hidden files
all_months_data = pd.DataFrame()
for file in files:
current_data = pd.read_excel(path+"/"+file)
all_months_data = pd.concat([all_months_data, current_data])
all_months_data.to_csv("/content/drive/MyDrive/Colab_Notebooks/sq_datas/all_months.csv", index=False)
So, the main problem to me is create a loop for to add the month column before merge all those files in one.
At a basic level, you first need to read your Excel files, such as with pandas.read_excel:
import pandas as pd
jan21_df = pd.read_excel('January-21.xls')
feb21_df = pd.read_excel('February-21.xls')
You wrote type:Date for the Month column. To add a date column to each dataframe:
jan21_df['Month'] = pd.to_datetime('2021-01-01')
feb21_df['Month'] = pd.to_datetime('2021-02-01')
But if you wanted the file name as string:
jan21_df['Month'] = "File Name (January-21)"
feb21_df['Month'] = "File Name (February-21)"
Then to combine the two dataframes:
combined = pd.concat([jan21_df, feb21_df])
This is a proof of concept. There are ways to automate this further based on the requirements.
EDIT: based on the edit in the OP, minor addition to the loop:
for file in files:
current_data = pd.read_excel(path+"/"+file)
current_data['Month'] = file
all_months_data = pd.concat([all_months_data, current_data])