I'm constructing a pandas DataFrame for some ML. The X DataFrame has a date index made up of all the dates that exist in my various data files:
all_index = set()
for table in data:
    for date in table.index.values:
        all_index.add(date)
Then I construct the variable in which I want to consolidate all the data I have:
temp2= np.empty((len(all_index),1,))
temp2[:]=np.nan
X=pd.DataFrame(temp2, all_index)
And, of course, I now want to fill it with the data (data is a single DataFrame for now; later on it will be a list of DataFrames):
for i in X.index.values:
    for j in data[0].index.values:
        if j == i:
            X.at[i, 0] = data['Column Name'][i]
The error is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-73-73562c8b1e98> in <module>
8 #X[i]=data[0]['BCH-USD'][i]
9 elem = data[0]['BCH-USD'][str(i)]
---> 10 X.at[i, 0] = elem
11 #print(X[0][i])
12 print(data[0]['BCH-USD'][i])
~\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
2157 key = list(self._convert_key(key, is_setter=True))
2158 key.append(value)
-> 2159 self.obj._set_value(*key, takeable=self._takeable)
2160
2161
~\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py in _set_value(self, index, col, value, takeable)
2580 series = self._get_item_cache(col)
2581 engine = self.index._engine
-> 2582 engine.set_value(series._values, index, value)
2583 return self
2584 except (KeyError, TypeError):
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.set_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.set_value()
pandas/_libs/src\util.pxd in util.set_value_at()
pandas/_libs/src\util.pxd in util.set_value_at_unsafe()
ValueError: setting an array element with a sequence.
What I tried:
This error is odd, since set_value is deprecated: the docs say to use .at instead, yet .at uses set_value internally.
I checked the type of the value with type(data['Column Name'][i]): it is float64.
I tried converting with pd.is_numeric and got the same error.
Printing data['Column Name'][i] inside the loop raises no error, and neither does printing X.
Without the loop, X.at['2018-11-24', 0] = data['Column Name'][0] works.
I expect to get:
A DataFrame whose index holds every date found across my CSV files and whose columns hold the values from those files, with NaN wherever a value is not available.
I finally solved my problem with join(), which I had not managed to get working before.
First, I build a temporary DataFrame that holds every possible date:
all_dates=set()
for table in data:
    for ind in table.index.values:
        all_dates.add(table['Date'][ind])
dates_list=list(all_dates)
Data={'Date': dates_list}
temp=pd.DataFrame(Data)
temp.sort_values(by=['Date'], inplace=True, ascending=True)
temp=temp.reset_index(drop=True)
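The temp frame above can probably also be built without explicit loops. The following is only a sketch under assumptions: the two small frames are hypothetical stand-ins for the CSV data, and the 'ETH-USD' column name is made up for illustration ('BCH-USD' is the one from the traceback).

```python
import pandas as pd

# Hypothetical stand-ins for the DataFrames read from the CSV files.
data = [
    pd.DataFrame({"Date": ["2018-11-24", "2018-11-25"], "BCH-USD": [1.0, 2.0]}),
    pd.DataFrame({"Date": ["2018-11-25", "2018-11-26"], "ETH-USD": [3.0, 4.0]}),
]

# Stack every Date column into one Series and drop repeated dates.
all_dates = pd.concat([table["Date"] for table in data]).drop_duplicates()

# Sort, renumber the rows from 0, and turn the Series back into
# a one-column DataFrame named 'Date'.
temp = all_dates.sort_values().reset_index(drop=True).to_frame()
```

This should give the same one-column, sorted frame as the loop version, as long as every table has a 'Date' column.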
Then I inserted that temp DataFrame at the front of my list of DataFrames extracted from the CSVs (it goes first because it has the most index values) and joined them:
data.insert(0,temp)
dfs = [df.set_index('Date') for df in data]
df_final=dfs[0].join(dfs[1:])
So df_final has the dates as its (sorted) index, and its columns are only the columns of the extracted DataFrames.
The advantage of this method is that when one data source is incomplete, df_final has NaN in that cell instead of dropping the whole row and losing the values from the other sources for that date.
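To make the NaN behaviour concrete, here is a small self-contained sketch of the same join. The sample values and the 'ETH-USD' column are hypothetical and only there so the example runs on its own; the real frames come from the CSV files.

```python
import pandas as pd

# Hypothetical sample data standing in for two CSV files.
bch = pd.DataFrame({"Date": ["2018-11-24", "2018-11-25"], "BCH-USD": [1.0, 2.0]})
eth = pd.DataFrame({"Date": ["2018-11-25", "2018-11-26"], "ETH-USD": [3.0, 4.0]})

# Frame holding every date, placed first so the join keeps all of them.
temp = pd.DataFrame({"Date": ["2018-11-24", "2018-11-25", "2018-11-26"]})

data = [temp, bch, eth]
dfs = [df.set_index("Date") for df in data]

# Left join on the index: every date from temp is kept, and any
# source that lacks a date contributes NaN for that row.
df_final = dfs[0].join(dfs[1:])
print(df_final)
```

Here 2018-11-24 keeps its 'BCH-USD' value and gets NaN under 'ETH-USD', and 2018-11-26 is the other way round, so no row is lost.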