I've got the following data.frame:
import pandas as pd
import random
data = {
'Column1': [random.randint(1, 100) for _ in range(9)],
'Column2': [random.uniform(0, 1) for _ in range(9)],
'Column3': [chr(random.randint(65, 90)) for _ in range(9)],
'Column4': [random.choice(['A', 'B', 'C']) for _ in range(9)]
}
df = pd.DataFrame(data)
Column1 Column2 Column3 Column4
0 87 0.208179 M C
1 85 0.049071 Q C
2 4 0.474926 X C
3 35 0.966357 L B
4 58 0.295134 C B
5 23 0.633367 R B
6 87 0.069583 V B
7 83 0.427594 N A
8 16 0.592413 R C
I'd like to add a new sequential DATE column with entries starting from a chosen start_date (= '2022-01-01'), so that each date is repeated N (=2) times, for the whole dataset. I also would like to add another column called SHIFT with n (=2) alternative selected_values=['Day','Night'].
## desired output for N=2 and start_date = '2022-01-01', and n=2 with selected_values = ['Day','Night']
Column1 Column2 Column3 Column4 DATE SHIFT
0 87 0.208179 M C 2022-01-01 Day
1 85 0.049071 Q C 2022-01-01 Night
2 4 0.474926 X C 2022-01-02 Day
3 35 0.966357 L B 2022-01-02 Night
4 58 0.295134 C B 2022-01-03 Day
5 23 0.633367 R B 2022-01-03 Night
6 87 0.069583 V B 2022-01-04 Day
7 83 0.427594 N A 2022-01-04 Night
8 16 0.592413 R C 2022-01-05 Day
- N, n, selected_values, and start_date are all dynamic. with n = number of elements in selected_values
to make it more clear, here's the desired output for different parameters:
## desired output for N=2 and start_date = '2022-01-01', and n=3 with selected_values = ['Day','Night','Afternoon']
Column1 Column2 Column3 Column4 DATE SHIFT
0 87 0.208179 M C 2022-01-01 Day
1 85 0.049071 Q C 2022-01-01 Night
2 4 0.474926 X C 2022-01-02 Afternoon
3 35 0.966357 L B 2022-01-02 Day
4 58 0.295134 C B 2022-01-03 Night
5 23 0.633367 R B 2022-01-03 Afternoon
6 87 0.069583 V B 2022-01-04 Day
7 83 0.427594 N A 2022-01-04 Night
8 16 0.592413 R C 2022-01-05 Afternoon
from datetime import datetime, timedelta
def add_dates_shifts(df, start_date, N, shifts, n):
dates = [datetime.fromisoformat(start_date) + timedelta(days=i) for i in df.index for _ in range(N)][:len(df)]
df["DATE"] = dates
df["SHIFT"] = [shifts[i] for _ in df.index for i in range(n)][:len(df)]
add_dates_shifts(df, "2022-01-01", 2, ["Day", "Night"], 2)
Column1 Column2 Column3 Column4 DATE SHIFT
0 48 0.283085 J C 2022-01-01 Day
1 76 0.896891 J B 2022-01-01 Night
2 58 0.867856 Z B 2022-01-02 Day
3 60 0.175652 H C 2022-01-02 Night
4 50 0.222947 S A 2022-01-03 Day
5 36 0.753228 A A 2022-01-03 Night
6 48 0.032319 W B 2022-01-04 Day
7 10 0.412104 T A 2022-01-04 Night
8 61 0.266018 Q A 2022-01-05 Day