Search code examples
pythonpandasxlsx

Iterating through dataframe and updating based on dictionary conditions


I have the following xlsx file that I need to work on:

I want to iterate through the dataframe and if the column ITEM CODE contains a dictionary key, I want to check on the same row if contains a dictionary value[0] (first position in the tuple) and if contains I want to insert dictionary value1 (second position in the tuple) into another column named SKU

Dataframe: #df3 = df2.append(df1) enter image description here

catp = {"2755":(('24','002'),('25','003'),('26','003'),('27','004'),('28','005'),('29','006'),('30','007'),('31','008'),
                ('32','009'),('32','010'),('33','011'),('34','012'),('35','013'),('36','014')),
        "2513":(('38','002'),('40','003'),('42','004'),('44','005'),('46','006'),('48','007'),('50','008'),('52','009'),
               ('54','010'))}

for i, row in df3.iterrows():
    if catp.key() in df3['ITEM CODE'][i] and catp.value()[0] in df3['TG'][i]:
            codmarime = catp.value()[1]
            df3['SKU'][i] = '20'+df3['ITEM CODE'][i]+[i]+codmarime

    else:
        df3['SKU'][i] = '20'+df3['ITEM CODE'][i]+'???'

If 2755 and 24 found SKU = '202755638002'

If 2513 and 44 found SKU = '202513123005'

Output xlsx enter image description here


Solution

  • As you failed to provide text data to create at least a fragment of your DataFrame, I copied from your picture 3 rows, creating my test DataFrame:

    df3 = pd.DataFrame(data=[
        [ '1513452', 'AVRO D2', '685', 'BLACK/BLACK/ANTRACITE', '24', 929.95, '8052644627565' ],
        [ '2513452', 'AVRO D2', '685', 'BLACK/BLACK/ANTRACITE', '21', 929.95, '8052644627565' ],
        [ '2755126', 'AMELIA',  'Y17', 'DARK-DENIM',            '24', 179.95, '8052644627565' ]],
        columns=[ 'ITEM CODE', 'ITEM', 'COLOR', 'COLOR CODE', 'TG', 'PRICE', 'EAN' ])
    

    Details:

    • The first row does not contain any of catp keys in ITEM CODE column.
    • The second row: ITEM CODE contains one of your codes (2513) but for TG column no tuple saved under 2513 key contains first element == 21.
    • The third row: ITEM CODE contains one of your codes (2755), TG == 24 and among tuples saved under 2755 there is one == 24.

    Then we have to define a couple of auxiliary functions:

    def findContainedCodeAndVal(dct, str):
        for eachKey in dct.keys():
            if str.find(eachKey) >= 0:
                return (eachKey, dct[eachKey])
        else:
            return (None, None)
    

    This function attempts to find in dct a key contained in str. It returns a 2-tuple containing the key found and associated value from dct.

    def find2ndElem(tuples, str):
        for tpl in tuples:
            if tpl[0] == str:
                return tpl[1]
        else:
            return ''
    

    This function checks each tuple from tuples whether its first element == str and returns the second element from this tuple.

    And the last function to define is a function to be applied to each row from your DataFrame. It returns the value to be saved in SKU column:

    def fn(row):
        ind = row.name  # Read row index
        iCode = row['ITEM CODE']
        k, val = findContainedCodeAndVal(catp, iCode)
        codmarime = ''
        if k:
            tg = row.TG
            codmarime = find2ndElem(val, tg)
        if codmarime == '':
            codmarime = '???'
        return f'20/{iCode}/{ind}/{codmarime}'
    

    Note that it uses your catp dictionary.

    For demonstration purposes, I introduced in the returned value additional slashes, separating adjacent parts. In the target version remove them.

    And the last thing to do is to compute SKU column of your DataFrame, applying fn function to each row of df3 and saving the result under SKU column:

    df3['SKU'] = df3.apply(fn, axis=1)
    

    When you print the DataFrame (containig my test data), SKU column will contain:

    20/1513452/0/???
    20/2513452/1/???
    20/2755126/2/002