pandas, replace, np

Replacing values in a DataFrame column based on a condition with np.select and np.where


I have a list, data_terms, and I'd like to replace every value in a column of my DataFrame that matches any item in data_terms with the word "database". I've tried np.select:

discrete_distributed_bar["profile_standardized"] = np.select(
    [discrete_distributed_bar["profile_standardized"].isin(data_terms)],
    ["database"],
    default=discrete_distributed_bar.profile_standardized,
)

and np.where:

for index, row in discrete_distributed_bar["profile_standardized"].items():
    np.where(
        discrete_distributed_bar["profile_standardized"].isin(data_terms),
        "database",
        row,
    )

but the replacement is not actually happening. What am I missing here?

Thanks for the help!


Solution

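  • The np.where loop has no effect because the array it returns is never assigned to anything. The whole replacement can be written more simply with Series.replace: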

    discrete_distributed_bar["profile_standardized"] = discrete_distributed_bar["profile_standardized"].replace(data_terms, 'database')
    

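    However, the np.select version is assigned back and still changes nothing, so I suspect the problem is in the data itself (e.g. leading or trailing whitespace). Check whether any values actually match:

    print(discrete_distributed_bar["profile_standardized"].isin(data_terms))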
    

    If whitespace is the cause, strip it before replacing:

    discrete_distributed_bar["profile_standardized"] = discrete_distributed_bar["profile_standardized"].str.strip().replace(data_terms, 'database')
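
To see how stray whitespace can defeat isin, here is a minimal sketch on made-up data. The contents of data_terms and the sample column values are assumptions for illustration only; the real data may have a different problem (e.g. different casing).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the stray spaces mimic the suspected problem.
data_terms = ["sql", "postgres"]
df = pd.DataFrame(
    {"profile_standardized": ["sql ", " postgres", "analyst", "sql"]}
)
col = df["profile_standardized"]

# Before stripping, only the exact string "sql" matches.
mask_raw = col.isin(data_terms)

# After stripping surrounding whitespace, all three terms match.
stripped = col.str.strip()
mask_clean = stripped.isin(data_terms)

# Series.replace with a list and a scalar maps each listed value to the scalar.
replaced = stripped.replace(data_terms, "database")

# np.where gives the same values, provided its result is assigned
# (no loop is needed, since it works on the whole column at once).
via_where = np.where(mask_clean, "database", stripped)

print(mask_raw.tolist())
print(replaced.tolist())
```

If mask_raw is mostly False on the real column, comparing repr() of a few values against the entries in data_terms is a quick way to spot hidden characters.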