python, pandas, dataframe, wandb

wandb.Table raises error: AssertionError: columns argument expects a `list` object


I'm a complete beginner with wandb, so this is a very basic question. I have a dataframe that holds my x features and y values. I'm trying to follow this tutorial to train a model from my pandas dataframe. However, when I try to create a wandb table from the dataframe, I get an error:


wandb.init(project='my-xgb', config={'lr': 0.01})

# Logging the loss didn't work, so I have left the call below commented out for now
#wandb.log({'loss': loss, ...})


# Create a W&B Table with your pandas dataframe
table = wandb.Table(df1)

AssertionError: columns argument expects a list object

I have no idea why this happens or why it expects a list. In the tutorial, the dataframe does not appear to be a list.

My end goal is to be able to create a wandb table from the dataframe.


Solution

  • Short answer: table = wandb.Table(dataframe=my_df).

    The explanation of your specific case is at the bottom.


    Minimal example of using wandb.Table with a DataFrame:

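    import wandb
    import pandas as pd

    # wandb.log needs an active run, so start one first (project name reused from the question)
    wandb.init(project='my-xgb')

    iris_path = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
    iris = pd.read_csv(iris_path)
    # Pass the DataFrame by keyword so it is not mistaken for the columns argument
    table = wandb.Table(dataframe=iris)
    wandb.log({'dataframe_in_table': table})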
    

    (The dataset used here is the Iris dataset, which consists of "3 different types of irises’ (Setosa, Versicolour, and Virginica) petal and sepal length, stored in a 150x4 numpy.ndarray". In this example it is read from a CSV file into a pandas DataFrame instead.)

    There are two ways of creating W&B Tables according to the official documentation:

    • List of Rows: Log named columns and rows of data. For example: wandb.Table(columns=["a", "b", "c"], data=[["1a", "1b", "1c"], ["2a", "2b", "2c"]]) generates a table with two rows and three columns.
    • Pandas DataFrame: Log a DataFrame using wandb.Table(dataframe=my_df). Column names will be extracted from the DataFrame.
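To make the relationship between the two forms concrete, here is a small sketch that pulls the column names and the rows out of a DataFrame as plain lists, which is the shape the "List of Rows" form expects. The DataFrame contents mirror the example above, and the variable names are only illustrative. The wandb call is left as a comment so the snippet runs without a W&B account; it is assumed, not verified here, that both forms produce a table with the same contents.

```python
import pandas as pd

# Small illustrative DataFrame (same values as the "List of Rows" example)
my_df = pd.DataFrame(
    [["1a", "1b", "1c"], ["2a", "2b", "2c"]],
    columns=["a", "b", "c"],
)

# Column names as a plain list, rows as a list of lists
columns = list(my_df.columns)
data = my_df.values.tolist()

print(columns)
print(data)

# With wandb installed, either of these should then build the table:
#   wandb.Table(columns=columns, data=data)
#   wandb.Table(dataframe=my_df)
```

Note that columns here is a real list, which is exactly what the assertion in the error message checks for.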

    Explanation: why does table = wandb.Table(my_df) raise the error "columns argument expects a list object"? Because the signature of wandb.Table's __init__ function looks like this:

    def __init__(
            self,
            columns=None,
            data=None,
            rows=None,
            dataframe=None,
            dtype=None,
            optional=True,
            allow_mixed_types=False,
        ):
    

    If you pass a DataFrame positionally, without the dataframe= keyword, Python binds it to the first parameter, which is columns. Since a DataFrame is not a list, the assertion on columns fails.
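The binding behaviour can be checked without wandb at all. The sketch below uses a hypothetical stand-in function, table_init_stand_in, that only copies the parameter list shown above (minus self); it is not the library's code. Python's inspect module then reports which parameter a positional argument ends up in.

```python
import inspect


def table_init_stand_in(
    columns=None,
    data=None,
    rows=None,
    dataframe=None,
    dtype=None,
    optional=True,
    allow_mixed_types=False,
):
    """Hypothetical stand-in that mirrors the parameter order shown above."""


signature = inspect.signature(table_init_stand_in)

# Positional call: the single argument is bound to the first parameter
positional = signature.bind("my_df")
# Keyword call: the argument is bound to the parameter that was named
keyword = signature.bind(dataframe="my_df")

print(list(positional.arguments))
print(list(keyword.arguments))
```

The first call binds the value to columns and the second to dataframe, which is why naming the argument fixes the error.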