I have 2 CSV files, (1) u.Data and (2) prediction_matrix, which I need to read and write into a single DataFrame. Once that is done, the DataFrame is processed for clustering based on the int / float values it contains.
I have combined the 2 CSVs into 1 DataFrame, written to AllData.csv, but the columns holding the values now have a different type (object), as shown below (with a warning):
sys:1: DtypeWarning: Columns (0,1,2) have mixed types. Specify dtype option on import or set low_memory=False.
UDATA -------------
uid int64
iid int64
rat int64
dtype: object
PRED_MATRIX -------
uid int64
iid int64
rat float64
dtype: object
AllDATA -----------
uid object
iid object
rat object
dtype: object
P.S. I know how to use low_memory=False, but that just suppresses the warning.
with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False)
Since I need to write 2 CSVs into a single DataFrame, a handle object is used, and that probably converts all the values to its type. Can anything preserve the data types while applying the same logic?
The problem is that the header of the second DataFrame is written too, so you need the parameter header=False:
with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False, header=False)
Another solution is mode='a' to append the second DataFrame:
f = 'AllData.csv'
udata_df.to_csv(f, index=False)
pred_matrix.to_csv(f, header=False, index=False, mode='a')
Or use concat:
f = 'AllData.csv'
pd.concat([udata_df, pred_matrix]).to_csv(f, index=False)
Sample:
import pandas as pd

udata_df = pd.DataFrame({'uid': [1, 2],
                         'iid': [8, 9],
                         'rat': [0, 3]})
pred_matrix = udata_df * 10
Without header=False, the third row of the file is the header of the second DataFrame:
with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False)
f = 'AllData.csv'
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2  iid  rat  uid
3   80    0   10
4   90   30   20
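To see why that stray header row changes the column types, here is a minimal sketch that reproduces the effect in memory. It assumes the sample frames above and uses io.StringIO in place of a real file. Because every column now contains the text 'iid', 'rat' or 'uid' among the numbers, pandas can no longer parse the columns as numeric:

```python
import io
import pandas as pd

# Sample data mirroring the example above (illustrative values only).
udata_df = pd.DataFrame({'uid': [1, 2], 'iid': [8, 9], 'rat': [0, 3]})
pred_matrix = udata_df * 10

# Write both frames WITH their headers into one in-memory buffer,
# which is what the code in the question does with a file handle.
buf = io.StringIO()
udata_df.to_csv(buf, index=False)
pred_matrix.to_csv(buf, index=False)
buf.seek(0)

df = pd.read_csv(buf)
print(df.dtypes)  # none of the columns is numeric any more

# Rows where 'uid' holds the literal column name are stray header rows.
stray = df[df['uid'] == 'uid']
print(len(stray))
```

The same comparison can be used on an existing AllData.csv to check whether a second header was written into the data.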
With the parameter header=False it works correctly:
with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False, header=False)
f = 'AllData.csv'
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2   80    0   10
3   90   30   20
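The printed frame does not show the column types, so here is a small check of the dtypes after the header=False fix. It is only a sketch: the frames are hypothetical stand-ins with the dtypes from the question (integer ratings in the first frame, float ratings in the second), and a temporary directory is used so no real file is overwritten.

```python
import os
import tempfile
import pandas as pd

# Hypothetical frames with the dtypes reported in the question.
udata_df = pd.DataFrame({'uid': [1, 2], 'iid': [8, 9], 'rat': [0, 3]})
pred_matrix = pd.DataFrame({'uid': [1, 2], 'iid': [8, 9], 'rat': [0.5, 3.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'AllData.csv')
    with open(path, 'w', newline='') as handle:
        udata_df.to_csv(handle, index=False)
        # header=False keeps the second header out of the data rows.
        pred_matrix.to_csv(handle, index=False, header=False)
    df = pd.read_csv(path)

# uid and iid are read back as integers; rat becomes float because
# it mixes integer and float values.
print(df.dtypes)
```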
Solution with mode='a' (append):
f = 'AllData.csv'
udata_df.to_csv(f, index=False)
pred_matrix.to_csv(f, header=False, index=False, mode='a')
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2   80    0   10
3   90   30   20
Solution with concat:
f = 'AllData.csv'
pd.concat([udata_df, pred_matrix]).to_csv(f, index=False)
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2   80    0   10
3   90   30   20
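If the combined file is only an intermediate step before clustering, the CSV round trip may not be needed at all. The sketch below keeps everything in memory with concat. The frames are hypothetical stand-ins with the dtypes from the question, and ignore_index=True is my addition to get a clean 0..n-1 index:

```python
import pandas as pd

# Hypothetical frames with the dtypes reported in the question.
udata_df = pd.DataFrame({'uid': [1, 2], 'iid': [8, 9], 'rat': [0, 3]})
pred_matrix = pd.DataFrame({'uid': [1, 2], 'iid': [8, 9], 'rat': [0.5, 3.5]})

# Stack the rows in memory; no header can leak into the data this way.
all_data = pd.concat([udata_df, pred_matrix], ignore_index=True)

# uid and iid stay integer; rat is upcast to float because the
# second frame holds float ratings.
print(all_data.dtypes)
```

The result can still be written out afterwards with all_data.to_csv('AllData.csv', index=False) if the file itself is needed.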