I have a folder that has some files as .csv and some as .xls. I want to get all the data from them into one single dataframe. The good part is that all the files have the same data fields.
I am using the following code:
import os
import glob
import pandas as pd
os.chdir("/content/")
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
new_extend = 'xls'
all_files = [i for i in glob.glob('*.{}'.format(extension))]
all_filenames.append(all_files)
#combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')
But I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-7-747e8d68cec8> in <module>()
1 #combine all files in the list
----> 2 combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
3 #export to csv
4 combined_csv.to_csv( "combined_csv.csv", index=False, encoding='utf-8-sig')
3 frames
/usr/local/lib/python3.7/dist-packages/pandas/io/common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode, storage_options)
241 if not is_file_like(filepath_or_buffer):
242 msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 243 raise ValueError(msg)
244
245 return filepath_or_buffer, None, compression, False
ValueError: Invalid file path or buffer object type: <class 'list'>
Can someone please tell me what I am doing wrong?
I could not test your method because I had no access to your files, but I've tested it with some sample files you can download here in case you need.
Try to run the code below, changing only "mypath" to your folder where files are located. In the same folder the script will generate the final dataframe ("1.Final_DF.xlsx"):
mypath = r"F:\yourFolder"
import pandas as pd
import glob, os
os.chdir(mypath)
first_df = 0
ext_csv = ".csv"
ext_xls = ".xls"
target_files = []
for file in glob.glob("*xls"):
target_files.append(file)
for file in glob.glob("*csv"):
target_files.append(file)
for file in target_files:
file_path = mypath + "\\" + file
if ext_xls in file:
df_raw = pd.read_excel(file_path)
else:
df_raw = pd.read_csv(file_path)
first_df += 1
if first_df == 1:
df = df_raw.copy()
else:
df = df.append(df_raw, ignore_index = True)
df.to_excel (r'1.Final_DF.xlsx', index = False)
Let me know ;-)