Material Tree--Finding Late Items (inside a Pandas Dataframe)

Ok, so I need to help and/or advice on how to tackle a material tree on-time/late problem.

I have a pandas dataframe that contains material trees (['Tree']), different levels inside that tree (['Level']), part numbers (['Part #']), scheduled start dates (['Sched Start']) and scheduled finish dates (['Sched Fin']).

import pandas as pd


data = {'Tree': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
        'Level': [1, 2, 2, 3, 1, 2, 3, 4, 1, 2, 3, 2, 3, 2],
        'Part #': ['11', '12A', '12B', '12B3',
                   '21', '22A', '22A3', '22A4',
                   '31', '32A', '32A3', '32B', '32B3', '32C'],
        'Sched Start': pd.to_datetime(['12/01/2020', '11/01/2020', '11/01/2020', '10/01/2020',
                        '12/01/2020', '11/01/2020', '10/01/2020', '09/01/2020',
                        '12/01/2020', '11/01/2020', '10/01/2020', '11/01/2020', '10/01/2020', '11/01/2020']),
        'Sched Fin': pd.to_datetime(['12/15/2020', '11/15/2020', '12/02/2020', '11/02/2020',
                        '12/15/2020', '11/15/2020', '11/02/2020', '09/15/2020',
                        '12/15/2020', '11/15/2020', '10/15/2020', '11/15/2020', '10/15/2020', '11/15/2020'])        
        }

df = pd. DataFrame(data)

The normal material flow is that items feed the next higher assembly. For example a level 3 item feeds into a level 2 item. A level 2 item feed the level 1 item (level 1 is the top/final assembly that doesn't feed anything). There could be multiple level 2s that feed a single level 1. Multiple level 3s that feed a single level two and etc. So in the example code above (for Tree 1): 12B3 feeds into 12B, 12A and 12B feed into 11.

Anyhow, I need to add another column with comparison data from the finish date of an item with the start date of its next higher assembly. Back to our example above. The level 3 part 12B3 has a finish date of 11/02/2020--it feeds 12B that has a start date of 11/01/2020: 12B3 is LATE. Looking at the dates, 12B would be late, 12A would be on-time.

The lower assemblies will always be located under the highers.

Clear as mud, right?

What I have tried:

Well I made a horrible attempt at a loop that iterates over each row. It grabbed the level value and then went to the next row if previous row level > current row level it would compare current row 'Sched Fin' to previous row 'Sched Start' with a little success. Of course this all blows up when there are items of the same level in sequence (e.g. two level 2s).

Any help would be greatly appreciated.

** edit ** The trees are independent of each other. Not tied together like the levels.

Solution

Like above_c_level mentioned in the comment, it would be easier to keep track of the feeding pathway through a class or two.

For my answer, I modified your data dictionary to have semicolons between the tree level and subsequent production levels so that they could be compared with each other easier (i.e. '1;2B;3').

Firstly, a class for your individual parts would be helpful to keep track of the feeding between different parts.

class Part():
    def __init__(self, part, level, start, finish):
        self.part = part
        self.level = level
        self.start = pd.to_datetime(start)
        self.finish = pd.to_datetime(finish)
        self.feedsl = []
        self.isfedl = []
        self.status = None
    def __str__(self):
        return '({}, {}, {}, {})'.format(self.part, self.level, self.start, self.finish)
    def __repr__(self):
        return self.__str__()
    def feeds(self, parts):
        for p in parts:
            p.isfedl.append(self)
            self.feedsl.append(p)
    def isfed(self, parts):
        for p in parts:
            self.isfedl.append(p)
            p.feedsl.append(self)
    def late(self):
        deltas = []
        for feedp in self.feedsl:
            delta = feedp.start - self.finish
            deltas.append(delta)
        #lates should have self.finish > self.start, so negative delta
        lates = [t for t in deltas if t.days < 0]
        if len(lates) > 0:
            self.status = 'LATE'
            self.LATE = True
        elif len(lates) == 0:
            self.status = 'ONTIME'
            self.LATE = False
        return self.status

Each part tracks whether it's ontime or late based on what it feeds into given the dates you have from. You can either specify a part feeds into another (which updates both the feedsl attribute of the part and the isfedl attribute of the recipient) or specify a part is fed by some number of parts (which, again, updates both attributes). I have the input set to assume a list, so if you specify one by one a lot you could modify or just wrap everything in brackets.

Once you do this, you have to generate the list of parts from your data:

parts = []
LEN = len(data['Tree'])
for i in range(LEN):
    treelev = data['Tree'][i]
    level = data['Level'][i]
    partnum = data['Part #'][i]
    start = data['Sched Start'][i]
    finish = data['Sched Fin'][i]
    parts.append(Part(partnum, level, start, finish))

So with the list of parts you can separate trees by list comprehension using the Part.part names (since the first value of your format is always the tree number).

Now you need a class that accepts a list of parts (assuming they're properly sorted into corresponding trees) that generates the feed pathway by the part names. (This is where I wanted the semicolon).

class MaterialTree():
    def __init__(self, parts):
        self.parts = parts
    def setLevTree(self):
        self.levels = [p.level for p in self.parts]
        for ip in range(len(self.parts)-1):
            p = self.parts[ip]
            #all twos feed one:
            if p.level == 1:
                p2s = [p for p in self.parts if p.level == 2]
                p.isfed(p2s)
                continue
            #for each n >= 2, if adjacent is > n, adjacent feeds current
            for l in range(2, max(self.levels)+1):
                pnext = self.parts[ip+1]
                if p.level == pnext.level:
                    continue
                elif p.level == pnext.level-1:
                    p.isfed([pnext])
    def setTree(self):
        #number of production levels
        self.levels = range(max([p.level for p in self.parts]))
        #part names for each level
        self.levdct = {l+1:[p.part for p in self.parts if int(p.part.split(';')[-1][0]) == l+1] for l in self.levels}
        for ik in self.levels[:-1]: #exclude last level, only feeds second to last
            #get names for current base level
            namebase = self.levdct[ik+1]
            #get names for branches one level up
            namebranch = self.levdct[ik+2]
            #select parts with names in base
            base = [p for p in self.parts if p.part in namebase]
            #select parts with names in branch
            branch = [p for p in self.parts if p.part in namebranch]     
            #begin feed:
            for b in base:
                #if there is no letter in the name, all branches feed this
                if not b.part.upper().isupper():
                    for br in branch:
                        br.feeds([b])
                #if there is a letter in the name,
                if b.part.upper().isupper():
                    #select the letter and use it to compare branches
                    letts = [i for i in b.part if i.upper().isupper()][0]
                    #only branches with this letter feed this base
                    for fbr in [br for br in branch if letts in br.part]:
                        fbr.feeds([b])
    def status(self):
        lates = []
        for p in self.parts:
            lates.append(p.late())
        self.status = lates
        return self.status

The various str.upper().isupper() just test for the presence of any letters in the different part names. You could use this to generate the lists of statuses for your parts to add to the dataframe and export to Excel if you wanted.

Just an example:

T1 = [p for p in parts if p.part[0] == '1']
m1 = MaterialTree(T1)
m1.setTree()
print(m1.status())

returns for me ['ONTIME', 'ONTIME', 'LATE', 'LATE'].

It could get more complicated, of course, if your part names have a structure that's not readily parsed, but should be doable.

** EDIT **: If the feed structure is determined purely by order and level (i.e., adjacent parts with an incremented level feed the current part), then you can use the setLevTree instead. It assumes this kind of order, but doesn't depend on part name. Following the same example for for tree 2, m.setLevTree() gives me ['ONTIME', 'ONTIME', 'LATE', 'ONTIME'].