Python Apply function to create new rows in loop

Goal: Here is a sample of a dataset that has "ID", "PHASENAME", "CDAYS", "MULTI_FACTOR", "DAY_COUNTER", and "DAILY_LABOR_PERCENT". The goal is to take every "ID", "PHASENAME", "CDAYS" and increase the "DAY_COUNTER" from whatever the last day is increase it to 100 days. I am also to repeat this for "DAILY_LABOR_PERCENT".

currently

ID	PHASENAME	C_DAYS	Multi_Factor	DAY_COUNTER	DAILY_LABOR_%
BAR	C	14.0	7.142857	1	1.0
BAR	C	14.0	7.142857	2	5.0
BAR	C	14.0	7.142857	13	4.0
BAR	C	14.0	7.142857	14	0.0

need it to be

ID	PHASENAME	C_DAYS	Multi_Factor	DAY_COUNTER	DAILY_LABOR_%
BAR	C	14.0	7.142857	1	.14
BAR	C	14.0	7.142857	2	some number
BAR	C	14.0	7.142857	99	some number
BAR	C	14.0	7.142857	100	0.0

In this sample dataset, this person worked for 14 days and their labor percentage is recorded for each day. Instead of have 14 days of labor, I want to have 100 days of labor ( formulas: 'DAY_COUNTER' * 'Multi_Factor' and "DAILY_LABOR_PERCENT" / 'Multi_Factor ).

Also, I'm worried that once I enlarge this dataset having the range as 100 won't work.

Looking for

The Labor is repeating. So when I apply my logic I am not getting the true labor percentage.
Suggestions on how to approach this with a larger dataset.

code

import pandas as pd
import numpy as np


data={
    "ID": [ "BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR","BAR"],
    "PHASENAME": [ "C","C","C","C","C","C","C","C","C","C","C","C","C","C"],
    "C_DAYS": [ 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0],
    "Multi_Factor": [7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857, 7.142857],
    "DAY_COUNTER": [1,2,3,4,5,6,7,8,9,10,11,12,13,14],
    "DAILY_LABOR_PERCENT": [1.0,5.0,9.0,11.0,10.0,9.0,9.0,9.0,8.0,10.0,8.0,7.0,4.0,0.0],
    }

df=pd.DataFrame(data)
df1=df.copy()


n = 100
first_day = df["DAY_COUNTER"].iloc[0]

rep = np.tile(df.values, (n // len(df) + 1, 1))
out = pd.DataFrame(rep, columns=df.columns).iloc[:n]

out["DAY_COUNTER"] = range(first_day, n + first_day)


def labor_factor(row):
    return row['DAILY_LABOR_PERCENT'] / row['Multi_Factor']

#creating empty list to store rows for each id, phasename, and cdays
new_rows= []

#iterating through each index range and applying function to get day_counter to 100

for i in range(len(out)):
    out['new_labor']= out.apply(labor_factor,axis=1)

Solution

As a general rule, you want to avoid iterating over dataframe rows, as that's very inefficient.

IIUC, you can instead repeat your dataframe using something vectorized like numpy.tile until you have the desired number of rows, then adjust the values accordingly:

n = 100
first_day = df["DAY_COUNTER"].iloc[0]

rep = np.tile(df.values, (n // len(df) + 1, 1))
out = pd.DataFrame(rep, columns=df.columns).iloc[:n]

out["DAY_COUNTER"] = range(first_day, n + first_day)

     ID PHASENAME C_DAYS Multi_Factor  DAY_COUNTER DAILY_LABOR_PERCENT
0   BAR         C   14.0     7.142857            1                 1.0
1   BAR         C   14.0     7.142857            2                 5.0
2   BAR         C   14.0     7.142857            3                 9.0
3   BAR         C   14.0     7.142857            4                11.0
4   BAR         C   14.0     7.142857            5                10.0
..  ...       ...    ...          ...          ...                 ...
95  BAR         C   14.0     7.142857           96                 7.0
96  BAR         C   14.0     7.142857           97                 4.0
97  BAR         C   14.0     7.142857           98                 0.0
98  BAR         C   14.0     7.142857           99                 1.0
99  BAR         C   14.0     7.142857          100                 5.0