Search code examples
pythonpandasdatedate-range

Repeating dates in pandas DataFrame without hour format


I'm trying to insert a range of date labels in my dataframe, df1. I've managed some part of the way, but I still have som bumps that I want to smooth out.

I'm trying to generate a column with dates from 2017-01-01 to 2020-12-31 with all dates repeated 24 times, i.e., a column with 35,068 rows.

dates = pd.date_range(start="01-01-2017", end="31-12-2020")
num_repeats = 24
repeated_dates = pd.DataFrame(dates.repeat(num_repeats))

df1.insert(0, 'Date', repeated_dates)

However, it only generates some iterations of the last date, meaning that my column will be NaT for the remaining x hours.

output:
           Date  DK1 Up  DK1 Down  DK2 Up  DK2 Down
0     2017-01-01     0.0       0.0     0.0       0.0
1     2017-01-01     0.0       0.0     0.0       0.0
2     2017-01-01     0.0       0.0     0.0       0.0
3     2017-01-01     0.0       0.0     0.0       0.0
4     2017-01-01     0.0       0.0     0.0       0.0
...          ...     ...       ...     ...       ...
35063 2020-12-31     0.0       0.0     0.0       0.0
35064        NaT     0.0       0.0     0.0       0.0
35065        NaT     0.0     -54.1     0.0       0.0
35066        NaT    25.5       0.0     0.0       0.0
35067        NaT     0.0       0.0     0.0       0.0

Furthermore, how can I change the date format from '2017-01-01' to '01-01-2017'?


Solution

  • You set this up perfectly, so here is the dates that you have,

    import pandas as pd
    import numpy as np
    
    dates = pd.date_range(start="01-01-2017", end="31-12-2020")
    num_repeats = 24
    df = pd.DataFrame(dates.repeat(num_repeats),columns=['date'])
    

    and converting the column to the format you want is simple with the strftime function

    df['newFormat'] = df['date'].dt.strftime('%d-%m-%Y')
    

    Which gives

        date        newFormat
    0   2017-01-01  01-01-2017
    1   2017-01-01  01-01-2017
    2   2017-01-01  01-01-2017
    3   2017-01-01  01-01-2017
    4   2017-01-01  01-01-2017
    ... ... ...
    35059   2020-12-31  31-12-2020
    35060   2020-12-31  31-12-2020
    35061   2020-12-31  31-12-2020
    35062   2020-12-31  31-12-2020
    35063   2020-12-31  31-12-2020
    

    now

    dates = pd.date_range(start="01-01-2017", end="31-12-2020")
    

    gives

    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
                   '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
                   '2017-01-09', '2017-01-10',
                   ...
                   '2020-12-22', '2020-12-23', '2020-12-24', '2020-12-25',
                   '2020-12-26', '2020-12-27', '2020-12-28', '2020-12-29',
                   '2020-12-30', '2020-12-31'],
                  dtype='datetime64[ns]', length=1461, freq='D')
    

    and

    1461 * 24 = 35064

    so I am not sure where 35,068 comes from. Are you sure about that number?