I am having trouble with a for loop. I don't know what happened; this used to work, but now it tells me "No objects to concatenate".
I want to accomplish two things. I want to loop through all Excel files in a folder. For each Excel file, I want to remove 2 lines of header data (which the code below already does).
Then I want to save each edited file with its original file name in a new folder, and also save a new file where the data from all the edited files is appended together.
I thought I had the append correct, but for some reason it is not working anymore.
import os
import pandas as pd
import numpy as np
from pandas import Series, DataFrame
appended_data = []
path = 'C:\Test\TestRawFile'
for fn in os.listdir(path):
    if os.path.isfile(fn):
        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(fn)
        # View the excel files sheet names
        xlsx_file.sheet_names
        # Load the xlsx files Data sheet as a dataframe
        df = xlsx_file.parse('Sheet1',header= None)
        df_NoHeader = df[2:]
        data = df_NoHeader
        appended_data.append(data)
appended_data = pd.concat(appended_data)
Here is the error I currently get.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-46-962ccf280c0b> in <module>()
11 data = df_NoHeader
12 appended_data.append(data)
---> 13 appended_data = pd.concat(appended_data)
C:\Anaconda2\lib\site-packages\pandas\tools\merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
832 keys=keys, levels=levels, names=names,
833 verify_integrity=verify_integrity,
--> 834 copy=copy)
835 return op.get_result()
836
C:\Anaconda2\lib\site-packages\pandas\tools\merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
865
866 if len(objs) == 0:
--> 867 raise ValueError('No objects to concatenate')
868
869 if keys is None:
ValueError: No objects to concatenate
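The script most likely worked before and fails now because it was moved out of (or is now run from outside) the folder that holds the Excel files. os.listdir(path) returns bare file names, so os.path.isfile(fn) checks each name relative to the current working directory. When that is not the Excel folder, every check is False, nothing is appended to the list, and pd.concat raises "No objects to concatenate". Use absolute paths instead by joining the folder path to each file name with os.path.join(), which also helps when saving to the new folder: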
import os
import pandas as pd

dfList = []
path = 'C:\\Test\\TestRawFile'
# Destination folder (assumed to exist already)
newpath = 'C:\\Path\\To\\New\\Folder'
for fn in os.listdir(path):
    # Absolute file path
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(file)
        # View the excel files sheet names
        xlsx_file.sheet_names
        # Load the xlsx files Data sheet as a dataframe
        df = xlsx_file.parse('Sheet1',header= None)
        # Drop the first two rows (the header lines)
        df_NoHeader = df[2:]
        data = df_NoHeader
        # Save individual dataframe under its original file name
        data.to_excel(os.path.join(newpath, fn))
        dfList.append(data)
# Combine all edited dataframes and save the master file
appended_data = pd.concat(dfList)
appended_data.to_excel(os.path.join(newpath, 'master_data.xlsx'))
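To see the path issue in isolation, here is a minimal sketch that uses only the standard library and a throwaway temporary folder instead of real Excel files. The helper name list_files and the file name demo_report_7f3a.xlsx are made up for illustration, and the sketch assumes no file with that name exists in the directory the script is run from.

```python
import os
import tempfile


def list_files(folder):
    """Return full paths of the regular files directly inside `folder`."""
    full_paths = []
    for name in os.listdir(folder):
        # os.listdir gives bare names, so join them back onto the folder
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate):
            full_paths.append(candidate)
    return sorted(full_paths)


# Hypothetical demo: one empty placeholder file in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "demo_report_7f3a.xlsx"), "w").close()

    # Bare names are resolved against the current working directory
    bare_hits = [n for n in os.listdir(folder) if os.path.isfile(n)]

    # Joined paths are resolved against the folder that was listed
    joined_hits = [os.path.basename(p) for p in list_files(folder)]

print("bare names found:", bare_hits)
print("joined paths found:", joined_hits)
```

When the working directory is not the listed folder, the first list should come out empty while the second contains the file, which is the same situation that leaves the list passed to pd.concat empty.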