
Loop over each item in a row, compare it with each item in the corresponding row of another column, and save the result in a new column (Python)


I want to loop in Python over each item in a row of one column and compare it against the items in the corresponding row of another column. If an item is not present in that row of the second column, it should be appended to a new list that will then be converted into another column. Checking if i not in c before appending should also eliminate duplicates.

The goal is to compare the items in each row of one column against the items in the corresponding row of another column, and to save the values from the first column that are missing from the second (without duplicates) in a new column of the same df.
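For a single row, the comparison comes down to this: keep each item of the first list that is missing from the second list, and skip repeats. A minimal sketch, assuming each cell already holds a Python list of strings (the helper name missing_items and the sample values are hypothetical):

```python
def missing_items(first, second):
    """Return the items of `first` that are not in `second`, without
    duplicates, in the order they first appear in `first`."""
    # A set makes each "not in" check fast instead of scanning the whole list
    exclude = set(second)
    seen = set()
    result = []
    for item in first:
        if item not in exclude and item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(missing_items(['red', 'shirt', 'red', 'cotton'], ['shirt', 'blue']))
# prints ['red', 'cotton']
```

The seen set does the same job as the if i not in c check, but stays fast when a row holds many items.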

[Image: df columns]

This is just an example; I have many more items in each row.

I tried the following code, but nothing happened, and from what I have tested, converting the list into a column does not work correctly:

import pandas as pd

a = df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
c = []
for i in df.values:
    for i in a:
        if i in a:
            if i not in b:
                if i not in c:
                    c.append(i)
                    print(c)
                    df['new'] = pd.Series(c)
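In the attempt above, the inner loop reuses the name i, so the row from df.values is never looked at. Also, i not in b checks the whole column b rather than the matching row, and pd.Series(c) is aligned on df's index instead of on the rows the items came from. A minimal corrected sketch in the same loop style follows. It assumes both columns hold Python lists of strings, and the sample data is made up:

```python
import pandas as pd

# Made-up sample data; the real columns may hold strings instead of lists
df = pd.DataFrame({
    'final_key_concat': [['red', 'shirt', 'red', 'cotton'], ['blue', 'jeans']],
    'attributes_tokenize': [['shirt', 'blue'], ['jeans']],
})

new_column = []
# zip pairs each row of the first column with the same row of the second
for a_row, b_row in zip(df['final_key_concat'], df['attributes_tokenize']):
    c = []  # reset for every row so results do not leak between rows
    for i in a_row:
        if i not in b_row and i not in c:
            c.append(i)
    new_column.append(c)

# Assign once, after the loop, with one list per row and df's own index
df['new'] = pd.Series(new_column, index=df.index)
print(df['new'].tolist())
# prints [['red', 'cotton'], ['blue']]
```

If the cells are strings rather than lists, they would need to be split into words first, as the solution below does.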

Any help is much appreciated. Thanks in advance!


Solution

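# Column names from the question
col_target = 'final_key_concat'
col_restrict = 'attributes_tokenize'

def parse_str_into_list(s):
    # Return the cell as one space-separated string of words
    if isinstance(s, list):
        return ' '.join(map(str, s))
    if not isinstance(s, str):
        return ''  # e.g. NaN in an empty cell
    # A string such as "['a', 'b']" becomes "a b"
    if s.startswith('[') and s.endswith(']'):
        return ' '.join(s.strip('[]').strip("'").split("', '"))
    return s

def filter_restrict_words(row):
    targets = parse_str_into_list(row.iloc[0]).split()
    # A set makes each "not in restricts" check fast
    restricts = set(parse_str_into_list(row.iloc[1]).split())

    # Loop over each word; a list (not a set) keeps the words in their original order
    words_to_keep = []
    for word in targets:
        # Keep words that are not in the restrict column, are 4 to 44 characters long,
        # and have not been kept already (this removes duplicates)
        if word not in restricts and 3 < len(word) < 45 and word not in words_to_keep:
            words_to_keep.append(word)

    return ' '.join(words_to_keep)

df['FINAL_KEYWORDS'] = df[[col_target, col_restrict]].apply(filter_restrict_words, axis=1)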
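The parse_str_into_list helper relies on each cell looking exactly like ['a', 'b'], with single quotes and ', ' separators. If the cells are string representations of Python lists, one alternative (a swapped-in technique, not part of the answer) is ast.literal_eval from the standard library. It parses the literal without executing code and also copes with double quotes, though it raises an exception on malformed strings. A minimal sketch follows. The helper names to_word_list and keep_words are hypothetical, and the 4 to 44 character limits are copied from the answer:

```python
import ast


def to_word_list(cell):
    """Return the cell as a list of words.

    Accepts a real list, a string such as "['a', 'b']" or '["a", "b"]',
    or a plain space-separated string.
    """
    if isinstance(cell, list):
        items = cell
    elif not isinstance(cell, str):
        return []  # e.g. NaN in an empty cell
    elif cell.strip().startswith('[') and cell.strip().endswith(']'):
        # literal_eval only evaluates Python literals, so it is safe on data
        items = ast.literal_eval(cell.strip())
    else:
        return cell.split()
    # Split multi-word items into single words, as the answer does
    words = []
    for item in items:
        words.extend(str(item).split())
    return words


def keep_words(target_cell, restrict_cell, min_len=4, max_len=44):
    """Words from target_cell that are not in restrict_cell, deduplicated, in order."""
    restricts = set(to_word_list(restrict_cell))
    kept = []
    for word in to_word_list(target_cell):
        if word not in restricts and min_len <= len(word) <= max_len and word not in kept:
            kept.append(word)
    return ' '.join(kept)


print(keep_words('["cotton shirt", "red"]', "['shirt', 'blue']"))
# prints cotton
```

With the question's df, it could be applied as df.apply(lambda r: keep_words(r['final_key_concat'], r['attributes_tokenize']), axis=1).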