I tried to merge two dataframes using panda but this is the error code that I get:
ValueError: You are trying to merge on datetime64[ns] and datetime64[ns, UTC] columns. If you wish to proceed you should use pd.concat
I have tried different solutions found online but nothing works!! The code has been provided to me and it seems to work on other PCs but not on my computer.
This is my code:
import sys
import os
from datetime import datetime
import numpy as np
import pandas as pd
# --------------------------------------------------------------------
# -- price, consumption and production --
# --------------------------------------------------------------------
fn = '../data/np_data.csv'
if os.path.isfile(fn):
df_data = pd.read_csv(fn,header=[0],parse_dates=[0])
else:
sys.exit('Could not open data file {}̈́'.format(fn))
# --------------------------------------------------------------------
# -- temp. data --
# --------------------------------------------------------------------
fn = '../data/temp.csv'
if os.path.isfile(fn):
dtemp = pd.read_csv(fn,header=[0],parse_dates=[0])
else:
sys.exit('Could not open data file {}̈́'.format(fn))
# --------------------------------------------------------------------
# -- price data --
# -- first date: 2014-01-13 --
# -- last date: 2020-02-01 --
# --------------------------------------------------------------------
fn = '../data/eprice.csv'
if os.path.isfile(fn):
eprice = pd.read_csv(fn,header=[0])
else:
sys.exit('Could not open data file {}̈́'.format(fn))
# --------------------------------------------------------------------
# -- combine dataframes (and save as CSV file) --
# --------------------------------------------------------------------
#
df= df_data.merge(dtemp, on='time',how='left') ## This is where I get the error.
print(df.info())
print(eprice.info())
#
# add eprice
df = df.merge(eprice, on='date', how='left')
#
# eprice available only available on trading days
# fills in missing values, last observation is used
df = df.fillna(method='ffill')
#
# keep only the relevant time period
df = df[df.date > '2014-01-23']
df = df[df.date < '2020-02-01']
df.to_csv('../data/my_data.csv',index=False)
The datasets that have been imported look normal with expected number of columns and observations. The version that I have in Panda is 1.0.3
this is the output (df) when I first merge df_data and dtemp.
time price_sys price_no1 ... temp_no3 temp_no4 temp_no5
0 2014-01-23 00:00:00+00:00 32.08 32.08 ... NaN NaN NaN
1 2014-01-24 00:00:00+00:00 31.56 31.60 ... -2.5 -8.7 2.5
2 2014-01-24 00:00:00+00:00 30.96 31.02 ... -2.5 -8.7 2.5
3 2014-01-24 00:00:00+00:00 30.84 30.79 ... -2.5 -8.7 2.5
4 2014-01-24 00:00:00+00:00 31.58 31.10 ... -2.5 -8.7 2.5
[5 rows x 25 columns]
This is the output for eprice before I merge:
<bound method NDFrame.head of date gas price oil price coal price carbon price
0 2014-01-24 00:00:00 66.00 107.88 79.42 6.89
1 2014-01-27 00:00:00 64.20 106.69 79.43 7.04
2 2014-01-28 00:00:00 63.75 107.41 79.29 7.20
3 2014-01-29 00:00:00 63.20 107.85 78.52 7.21
4 2014-01-30 00:00:00 62.60 107.95 78.18 7.46
... ... ... ... ...
1608 2020-03-25 00:00:00 22.30 27.39 67.81 17.51
1609 2020-03-26 00:00:00 21.55 26.34 70.35 17.35
1610 2020-03-27 00:00:00 18.90 24.93 72.46 16.39
1611 2020-03-30 00:00:00 19.20 22.76 71.63 17.06
1612 2020-03-31 00:00:00 18.00 22.74 71.13 17.68
[1613 rows x 5 columns]>
This is what happends when I merge df and eprice:
<bound method NDFrame.head of date gas price oil price coal price carbon price
0 2014-01-24 00:00:00 66.00 107.88 79.42 6.89
1 2014-01-27 00:00:00 64.20 106.69 79.43 7.04
2 2014-01-28 00:00:00 63.75 107.41 79.29 7.20
3 2014-01-29 00:00:00 63.20 107.85 78.52 7.21
4 2014-01-30 00:00:00 62.60 107.95 78.18 7.46
... ... ... ... ...
1608 2020-03-25 00:00:00 22.30 27.39 67.81 17.51
1609 2020-03-26 00:00:00 21.55 26.34 70.35 17.35
1610 2020-03-27 00:00:00 18.90 24.93 72.46 16.39
1611 2020-03-30 00:00:00 19.20 22.76 71.63 17.06
1612 2020-03-31 00:00:00 18.00 22.74 71.13 17.68
[1613 rows x 5 columns]>
time price_sys ... coal price carbon price
0 2014-01-23 00:00:00+00:00 32.08 ... NaN NaN
1 2014-01-24 00:00:00+00:00 31.56 ... NaN NaN
2 2014-01-24 00:00:00+00:00 30.96 ... NaN NaN
3 2014-01-24 00:00:00+00:00 30.84 ... NaN NaN
4 2014-01-24 00:00:00+00:00 31.58 ... NaN NaN
[5 rows x 29 columns]
Try doing df['Time'] = pd.to_datetime(df['Time'], utc = True)
on both the time columns before joining (or rather the one without UTC needs to go through this!)