I have a problem changing the timezone of a DataFrame whose frequency is finer than 1 hour. In my case, I get a quarter-hourly DataFrame from a CSV source, and I have to delete the DST hour in March and duplicate the DST hour in October. The function below works well when the frequency is hourly, but fails with finer frequencies.
Does anyone have a solution to this problem?
import pandas as pd
import numpy as np
from pytz import timezone

def DST_Paris(NH, NH_str):
    ## Suppose that I do not create the dataframe here but import one from a CSV file
    df = pd.DataFrame(np.random.randn(NH * 365), index = pd.date_range(start="01/01/2014", freq=NH_str, periods=NH * 365))
    ## I need to delete the hour in March and duplicate the hour in October
    ## If the frequency is finer than 1 hour, I need to duplicate all the data inside the affected hour
    tz = timezone('Europe/Paris')
    change_date = tz._utc_transition_times
    GMT1toGMT2_dates = [datei.date() for datei in list(change_date) if datei.month == 3]
    GMT2toGMT1_dates = [datei.date() for datei in list(change_date) if datei.month == 10]
    ind_March = np.logical_and(np.in1d(df.index.date, GMT1toGMT2_dates),(df.index.hour == 2))
    ind_October = np.logical_and(np.in1d(df.index.date, GMT2toGMT1_dates),(df.index.hour == 2))
    df['ind_March'] = (1-ind_March)
    df['ind_October'] = ind_October * 1
    df = df[df.ind_March == 1]
    df = df.append(df[df.ind_October == 1])
    del df['ind_March']
    del df['ind_October']
    df = df.sort()
    ## Raises an error if the granularity is finer than 1 hour
    df = df.tz_localize('Europe/Paris', ambiguous = 'infer')
    return df

try:
    DST_Paris(24, "1h")
    print "dataframe freq = 1h ==> no pb"
except:
    print "dataframe freq = 1h ==> error"

try:
    DST_Paris(96, "15min")
    print "dataframe freq = 15min ==> no pb"
except:
    print "dataframe freq = 15min ==> error"
The output is:
dataframe freq = 1h ==> no pb
dataframe freq = 15min ==> error
A workaround would be to use
is_dst = False # or True
df = df.tz_localize('Europe/Paris', ambiguous=[is_dst]*len(df))
to state explicitly whether the ambiguous local times should be interpreted as Daylight Saving Time or not.
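Note that a single constant flag puts both copies of each repeated timestamp at the same UTC offset. A possible refinement, sketched here under the assumption that the duplicated rows sit next to each other after sorting (as they do in the question's function), is to build the boolean array from `index.duplicated()`: the first copy is marked as DST and the second as standard time. The one-day index and the `value` column are made up for illustration, and the sketch uses current pandas calls (`pd.concat`, `sort_index`).

```python
import numpy as np
import pandas as pd

# Quarter-hourly naive index covering the October 2014 transition day in Paris.
idx = pd.date_range("2014-10-26 00:00", periods=24 * 4, freq="15min")
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# Duplicate the 02:00-02:59 block, as in the question, then sort.
# After sorting, the two copies of each timestamp are adjacent.
dup_hour = df[df.index.hour == 2]
df = pd.concat([df, dup_hour]).sort_index()

# First copy of each repeated timestamp -> DST (True),
# second copy -> standard time (False).
# The flag is ignored for timestamps that are not ambiguous.
is_dst = ~df.index.duplicated(keep="first")

# Localize with the per-row flags, then sort again so the rows
# are in chronological (UTC) order.
localized = df.tz_localize("Europe/Paris", ambiguous=is_dst).sort_index()
```

After the final sort, the four summer-time quarter hours come first, followed by the four winter-time ones.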
By the way,
df['ind_March'] = (1-ind_March)
df['ind_October'] = ind_October * 1
df = df[df.ind_March == 1]
df = df.append(df[df.ind_October == 1])
del df['ind_March']
del df['ind_October']
df = df.sort()
could be simplified to
df = df.loc[~ind_March].append(df.loc[ind_October])
df = df.sort()
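In recent pandas releases `DataFrame.append` and `DataFrame.sort` are no longer available, so the same idea can be sketched with `pd.concat` and `sort_index`. In this sketch the two transition dates are hard-coded for 2014 rather than read from pytz, and the `value` column is made up for illustration.

```python
import numpy as np
import pandas as pd

# One year of quarter-hourly naive timestamps (stand-in for the CSV data).
idx = pd.date_range("2014-01-01", periods=365 * 96, freq="15min")
df = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# Assumption: Paris DST transition dates for 2014, hard-coded.
dates = idx.normalize()
ind_March = (dates == pd.Timestamp("2014-03-30")) & (idx.hour == 2)
ind_October = (dates == pd.Timestamp("2014-10-26")) & (idx.hour == 2)

# Drop the nonexistent March hour, append a second copy of the
# October hour, and restore index order.
df = pd.concat([df[~ind_March], df[ind_October]]).sort_index()
```

Both masks are computed on the original index, so they have to be applied before `df` is reassigned.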