Search code examples
pythondata-science

How can I use Pandas or Openpyxl to classify a database in Excel based on supplier names?


I want to classify a database in excel using Python (Pandas or Openpyxl), where I have the name of the supplier in the 1st column and I want to fill in the type in the 2nd column. But I need to identify the supplier as a string (because the cells of the suppliers may vary). How can I do it?

I thought that this could work, but it's adding "SHOPEE" (example) for all the lines:

x = [table.loc[:,"supplier"].str.contains("SHOPEE")]
for x in table:
    table["TYPE"] = "SHOPEE"
display(table)

Solution

  • try to use this as a template, make changes wherever you need

    import pandas as pd
    
    # Load the Excel file into a DataFrame
    df = pd.read_excel('your_file.xlsx')
    
    # Define a dictionary to map suppliers to their corresponding types
    supplier_types = {
        'SHOPEE': 'Type 1',
        'SUPPLIER2': 'Type 2',
        'SUPPLIER3': 'Type 3',
        # Add more mappings as needed
    }
    
    # Iterate over the rows of the DataFrame
    for index, row in df.iterrows():
        supplier = str(row['supplier'])  # Convert supplier to a string
        for key, value in supplier_types.items():
            if key in supplier:
                df.at[index, 'TYPE'] = value  # Assign the corresponding type
    
    # Display the updated DataFrame
    print(df)