Search code examples
pythonpandasdataframelist-comprehension

Split numbers based on dataframe column values


I am a novice python programmer and I have a task at hand which needs a little logic designing. So I have a data frame with the below mentioned values in User, Task and Hours invested by the user for that task:

User Task hours
John T1 8
John T2 32
Alex A1 40
Bob B1 16

Now what I need to do is, generate a new dataframe on a weekly basis which contains the user, task code and the hours divided into 5 (5 working days in a week of 7 days). Note that max hours per week cannot exceed 40 i.e 8 hours per day.

For Alex: The division is simple. 8 hours per day for 5 days will give me 40 hours of worktime.

User Task hours Day
Alex A1 8 1
Alex A1 8 2
Alex A1 8 3
Alex A1 8 4
Alex A1 8 5

Now it gets tricky.

For Bob: The division will be 16/5 = 3.2. But I need to enter whole numbers (4) and adjust the leftover on the last day.

User Task hours Day
Bob B1 3 1
Bob B1 3 2
Bob B1 3 3
Bob B1 3 4
Bob B1 4 5

For John:

This is the most tricky one. I need to split the 2 task's hours (32 and 8) among 5 days.

I need 8 split up into 2-2-2-2 for 4 days or 4-4 for 2 days

and 32 split up into 6-6-6-6-8 for 5 days or 4-4-8-8-8 for 5 days

Considering the 1st cases for both, we would get something like this:

User Task hours Day
John T1 2 1
John T2 6 1
John T1 2 2
John T2 6 2
John T1 2 3
John T2 6 3
John T1 2 4
John T2 6 4
John T2 8 5

I am sorry if my format or language is inappropriate or not adhering the guidelines. My apologies.

I would be extremely grateful of you could help me figure out a logic for this. A big thank you in advance.


Solution

  • First, import numpy as np

    Our dataframe:

    df = pd.DataFrame([
        ["John", "T1",8],
        ["John", "T2", 32],
        ["Alex","A1",40],
        ["Bob","B1",16]],
        columns=["User", "Task", 'Hours'])
    
    # We create a column with the average work hours
    df["avg"] = df["Hours"] / 5
    
    # A column with the leftover hours of the first 4 days of week
    df["leftover"] = (df["avg"] - np.floor(df2["avg"]) ) * 4
    
    # We create a function, in which we will store in a list
    # the working hours for each day, adding the leftovers in the 5th day.
    # The elements of the list are tuples following the format (<working-hours>, <nth-day>)
    def days_list(row):
        days = [(np.floor(row["avg"]), i+1) for i in range(4)]
        days.append((np.floor(row["leftover"] + row["avg"]), 5))
        return days
    
    df["list_of_days"] = df.apply(lambda row: days_list(row), axis=1) # applying the function
    
    # We explode the column 'list_of_days' so as to have a row for each tuple
    df = df.explode("list_of_days")
    
    # Finally, we unzip the values of each tuple in the 2 columns 'Hours' & 'Days' accordingly.
    df['Hours'], df['Days'] = zip(*df["list_of_days"])
    

    Keeping the columns we need and printing the Dataframe :

    final_df = df[["User", "Task", "Hours", "Days"]]
    final_df
    
    User Task Hours Days
    John T1 1.0 1
    John T1 1.0 2
    John T1 1.0 3
    John T1 1.0 4
    John T1 4.0 5
    John T2 6.0 1
    John T2 6.0 2
    John T2 6.0 3
    John T2 6.0 4
    John T2 8.0 5
    Alex A1 8.0 1
    Alex A1 8.0 2
    Alex A1 8.0 3
    Alex A1 8.0 4
    Alex A1 8.0 5
    Bob B1 3.0 1
    Bob B1 3.0 2
    Bob B1 3.0 3
    Bob B1 3.0 4
    Bob B1 4.0 5