Search code examples
pythondataframedaterepeat

dynamically add a sequential date column to a data frame with a starting with each date is repeated N times in consecutive rows


I've got the following data.frame:

import pandas as pd
import random

data = {
    'Column1': [random.randint(1, 100) for _ in range(9)],
    'Column2': [random.uniform(0, 1) for _ in range(9)],
    'Column3': [chr(random.randint(65, 90)) for _ in range(9)],
    'Column4': [random.choice(['A', 'B', 'C']) for _ in range(9)]
}
df = pd.DataFrame(data)

   Column1   Column2 Column3 Column4
0       87  0.208179       M       C
1       85  0.049071       Q       C
2        4  0.474926       X       C
3       35  0.966357       L       B
4       58  0.295134       C       B
5       23  0.633367       R       B
6       87  0.069583       V       B
7       83  0.427594       N       A
8       16  0.592413       R       C

I'd like to add a new sequential DATE column with entries starting from a chosen start_date (= '2022-01-01'), so that each date is repeated N (=2) times, for the whole dataset. I also would like to add another column called SHIFT with n (=2) alternative selected_values=['Day','Night'].

## desired output for N=2 and start_date = '2022-01-01', and n=2 with selected_values = ['Day','Night']

   Column1   Column2 Column3 Column4   DATE         SHIFT
0       87  0.208179       M       C   2022-01-01    Day
1       85  0.049071       Q       C   2022-01-01    Night
2        4  0.474926       X       C   2022-01-02    Day
3       35  0.966357       L       B   2022-01-02    Night
4       58  0.295134       C       B   2022-01-03    Day
5       23  0.633367       R       B   2022-01-03    Night
6       87  0.069583       V       B   2022-01-04    Day
7       83  0.427594       N       A   2022-01-04    Night
8       16  0.592413       R       C   2022-01-05    Day

- N, n, selected_values, and start_date are all dynamic. with n = number of elements in selected_values

to make it more clear, here's the desired output for different parameters:

## desired output for N=2 and start_date = '2022-01-01', and n=3 with selected_values = ['Day','Night','Afternoon']

   Column1   Column2 Column3 Column4   DATE         SHIFT
0       87  0.208179       M       C   2022-01-01    Day
1       85  0.049071       Q       C   2022-01-01    Night
2        4  0.474926       X       C   2022-01-02    Afternoon
3       35  0.966357       L       B   2022-01-02    Day
4       58  0.295134       C       B   2022-01-03    Night
5       23  0.633367       R       B   2022-01-03    Afternoon
6       87  0.069583       V       B   2022-01-04    Day
7       83  0.427594       N       A   2022-01-04    Night
8       16  0.592413       R       C   2022-01-05    Afternoon

Solution

  • from datetime import datetime, timedelta
    
    def add_dates_shifts(df, start_date, N, shifts, n):
        dates = [datetime.fromisoformat(start_date) + timedelta(days=i) for i in df.index for _ in range(N)][:len(df)]
        df["DATE"] = dates
        df["SHIFT"] = [shifts[i] for _ in df.index for i in range(n)][:len(df)]
    
    add_dates_shifts(df, "2022-01-01", 2, ["Day", "Night"], 2)
    
       Column1   Column2 Column3 Column4       DATE  SHIFT
    0       48  0.283085       J       C 2022-01-01    Day
    1       76  0.896891       J       B 2022-01-01  Night
    2       58  0.867856       Z       B 2022-01-02    Day
    3       60  0.175652       H       C 2022-01-02  Night
    4       50  0.222947       S       A 2022-01-03    Day
    5       36  0.753228       A       A 2022-01-03  Night
    6       48  0.032319       W       B 2022-01-04    Day
    7       10  0.412104       T       A 2022-01-04  Night
    8       61  0.266018       Q       A 2022-01-05    Day