Search code examples
pythonpandasconcatenationpandas-loc

python dataframe concatenate based on a chosen date


Say I have the following variables and dataframe:

a = '2020-04-23 14:00:00+00:00','2020-04-23 13:00:00+00:00','2020-04-23 12:00:00+00:00','2020-04-23 11:00:00+00:00','2020-04-23 10:00:00+00:00','2020-04-23 09:00:00+00:00','2020-04-23 08:00:00+00:00','2020-04-23 07:00:00+00:00','2020-04-23 06:00:00+00:00','2020-04-23 04:00:00+00:00'
b = '2020-04-23 10:00:00+00:00','2020-04-23 09:00:00+00:00','2020-04-23 08:00:00+00:00','2020-04-23 07:00:00+00:00','2020-04-23 06:00:00+00:00','2020-04-23 05:00:00+00:00','2020-04-23 04:00:00+00:00','2020-04-23 03:00:00+00:00','2020-04-23 02:00:00+00:00','2020-04-23 01:00:00+00:00'

aa = 7105.50,6923.50,6692.50,6523.00,6302.5,6081.5,6262.0,6451.50,6369.50,6110.00
bb = 6386.00,6221.00,6505.00,6534.70,6705.00,6535.00,7156.50,7422.00,7608.50,8098.00

df1 = pd.DataFrame()
df1['timestamp'] = a
df1['price'] = aa

df2 = pd.DataFrame()
df2['timestamp'] = b
df2['price'] = bb


print(df1)
print(df2)

I am trying to concatenate the rows of following:

  1. top row of df1 to '2020-04-23 08:00:00+00:00'

  2. '2020-04-23 07:00:00+00:00' to the last row of df2

for illustration purposes the following is what the dataframe should look like:

c = '2020-04-23 14:00:00+00:00','2020-04-23 13:00:00+00:00','2020-04-23 12:00:00+00:00','2020-04-23 11:00:00+00:00','2020-04-23 10:00:00+00:00','2020-04-23 09:00:00+00:00','2020-04-23 08:00:00+00:00','2020-04-23 07:00:00+00:00','2020-04-23 06:00:00+00:00','2020-04-23 05:00:00+00:00','2020-04-23 04:00:00+00:00','2020-04-23 03:00:00+00:00','2020-04-23 02:00:00+00:00','2020-04-23 01:00:00+00:00'

cc = 7105.50,6923.50,6692.50,6523.00,6302.5,6081.5,6262.0,6534.70,6705.00,6535.00,7156.50,7422.00,7608.50,8098.00


df = pd.DataFrame()
df['timestamp'] = c
df['price'] = cc
print(df)

Any ideas?


Solution

  • You can convert the timestamp columns to pd.date_time objects, and then use boolean indexing and pd.concat to select and merge them:

    df1.timestamp = pd.to_datetime(df1.timestamp)
    df2.timestamp = pd.to_datetime(df2.timestamp)
    
    dfs = [df1.loc[df1.timestamp >= pd.to_datetime("2020-04-23 08:00:00+00:00"),:],
           df2.loc[df2.timestamp <= pd.to_datetime("2020-04-23 07:00:00+00:00"),:]
           ]
    
    df_conc = pd.concat(dfs)