pandas: assigning values back to unknown column

I have several dataframes whose columns hold string values (sentences). Each dataframe has a column whose name contains the word 'gold' combined with other words (e.g. 'gold_data', 'dataset_gold', etc.), or a column whose name contains the word 'labeled' combined with other words (e.g. 'labeled_data', 'dataset_labeled', etc.), or both.

Here is an example of what a dataframe looks like when both kinds of column exist.

import pandas as pd

df = pd.DataFrame({'gold_data':['hello the weather nice','this is interesting','the weather is good'],
                   'data2':['goodbye','the plant is green','the weather is sunny'],
                   'new_labeled_dataset':['hello','there is no food in the fridge','this weather amazing']})

I am trying to process the strings in one of these columns, depending on which one exists, and return a dataframe containing only the rows of the original dataframe for which the condition is true, as follows.

result = []

for index, entry in df.iterrows():
    if not any(df.columns.str.contains(pat='labeled')):
        text = entry.filter(regex='gold').squeeze()
    else:
        text = entry.filter(regex='labeled').squeeze()


    if len(text.split()) > 2:
        # assignment? = 'new_info:' + text  (this is where I do not know how to assign back to the column that was processed)
        result.append(entry)


print(pd.DataFrame(result))

So I am saying: if no column name contains 'labeled', take the text from the column whose name contains 'gold'; otherwise take the text from the 'labeled' column. But since I do not know the complete name of the column, I am not sure how to assign the processed text back to it. The desired output should be:

                      gold_data                 data2                   augmented_new
0  new_info:this is interesting    the plant is green  there is no food in the fridge
1  new_info:the weather is good  the weather is sunny            this weather amazing

I have tried to get the full name of the column and assign to it, but that is not correct either.

# df[col for col in df if 'gold' or 'labeled' in col] ='new_info:' + text

Solution

If I understood correctly, you want to apply a string transformation to certain elements of a column that is chosen by its name. If so, you do not need to iterate over the rows manually: retrieve the column name once, then use the pandas apply() method on that column. Since you only want to transform strings of at least 3 words, you can first filter the rows with the pandas loc indexer. You can do it with the following code:

# Determine which case applies
if not any(df.columns.str.contains(pat='labeled')):
    # Retrieve the name of the first column containing 'gold'
    chosen_col = next(col for col in df.columns if 'gold' in col)
else:
    # Retrieve the name of the first column containing 'labeled'
    chosen_col = next(col for col in df.columns if 'labeled' in col)
# Keep only the rows whose string has more than 2 words
# (.copy() avoids a SettingWithCopyWarning on the assignment below)
df = df.loc[df[chosen_col].str.split().map(len) > 2].copy()
# Transform all the strings in the retrieved column
df[chosen_col] = df[chosen_col].apply(lambda x: 'new_info:' + x)
print(df)

Since you have provided two different dataframes, here are the results for both. For the one built in your example code (which has a 'labeled' column):

             gold_data                 data2                      new_labeled_dataset
1  this is interesting    the plant is green  new_info:there is no food in the fridge
2  the weather is good  the weather is sunny            new_info:this weather amazing

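And for the second one, whose columns are gold_data, data2 and augmented_new (no 'labeled' column, so the 'gold' column is processed):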

                         gold_data                 data2                   augmented_new
0  new_info:hello the weather nice               goodbye                           hello
1     new_info:this is interesting    the plant is green  there is no food in the fridge
2     new_info:the weather is good  the weather is sunny            this weather amazing
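
If you need to run this on several dataframes, the same steps can be wrapped in a function. The following is only a sketch: the names pick_column and prefix_long_rows and their parameters are made up for illustration, and it assumes that every dataframe has at least one column whose name contains 'labeled' or 'gold' and that the chosen column holds only strings. It returns a new dataframe and leaves the original untouched.

```python
import pandas as pd


def pick_column(df, preferred="labeled", fallback="gold"):
    """Return the first column name containing `preferred`, else `fallback`."""
    for keyword in (preferred, fallback):
        matches = [col for col in df.columns if keyword in col]
        if matches:
            return matches[0]
    raise KeyError(f"no column name contains {preferred!r} or {fallback!r}")


def prefix_long_rows(df, prefix="new_info:", min_words=3):
    """Keep rows whose chosen column has at least `min_words` words and prefix them."""
    col = pick_column(df)
    # Boolean mask: True where the string splits into enough words
    mask = df[col].str.split().str.len() >= min_words
    # Work on a copy so the caller's dataframe is not modified
    out = df.loc[mask].copy()
    out[col] = prefix + out[col]
    return out


df = pd.DataFrame({'gold_data': ['hello the weather nice', 'this is interesting', 'the weather is good'],
                   'data2': ['goodbye', 'the plant is green', 'the weather is sunny'],
                   'new_labeled_dataset': ['hello', 'there is no food in the fridge', 'this weather amazing']})

result = prefix_long_rows(df)
print(result)
```

Calling prefix_long_rows(df.drop(columns='new_labeled_dataset')) would fall back to the 'gold' column instead.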