python, boxplot, seaborn

Seaborn load_dataset


I am trying to get a grouped boxplot working in Seaborn, following the Seaborn grouped boxplot example.

I can get that example working; however, the line:

tips = sns.load_dataset("tips")

is not explained at all. I have located the tips.csv file, but I can't find adequate documentation on what load_dataset actually does. I tried creating my own CSV and loading it, but to no avail. I also renamed the tips file, and it still worked...

My questions are:

Where is load_dataset actually looking for files? Can I use it for my own boxplots?

EDIT: I managed to get boxplots working with my own DataFrame, but I am still wondering whether load_dataset is used for anything more than mysterious tutorial examples.


Solution

  • load_dataset looks for CSV files in the online repository at https://github.com/mwaskom/seaborn-data. Here's the docstring:

    Load a dataset from the online repository (requires internet).

    Parameters

    name : str
        Name of the dataset (name.csv on https://github.com/mwaskom/seaborn-data).
        You can obtain list of available datasets using :func:`get_dataset_names`
    kws : dict, optional
        Passed to pandas.read_csv
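
The docstring above is from an older release. In recent seaborn versions the signature is load_dataset(name, cache=True, data_home=None, **kws): each downloaded file is saved to a local folder and that copy is reused afterwards. The folder is whatever the SEABORN_DATA environment variable points to, otherwise a default that varies by version (~/seaborn-data in older releases, an OS-specific user cache directory in newer ones); sns.get_data_home() reports which one you have. That cache is most likely where the tips.csv from the question lives, and renaming it would simply trigger a fresh download. The sketch below assumes this caching behaviour; it uses a temporary folder and a hypothetical dataset name, my_scores, so it runs offline. Treat it as an illustration of the lookup, not a supported way to load your own data: pd.read_csv remains the proper route.

```python
import tempfile
from pathlib import Path

import pandas as pd
import seaborn as sns

# Folder load_dataset caches into by default (honours SEABORN_DATA;
# the default location varies by seaborn version). Created if missing.
print(sns.get_data_home())

# Assumption: with cache=True (the default), load_dataset first looks for
# <data_home>/<name>.csv and only downloads when that file is missing.
# "my_scores" is a hypothetical name, not one of the official datasets.
with tempfile.TemporaryDirectory() as data_home:
    Path(data_home, "my_scores.csv").write_text(
        "group,score\nA,1.5\nA,2.0\nB,3.5\nB,4.0\n"
    )
    df = sns.load_dataset("my_scores", data_home=data_home)

print(type(df))  # <class 'pandas.core.frame.DataFrame'>
print(df)
```

Because the cached file exists, no request to the online repository is made here; delete or rename it and load_dataset would try to download name.csv instead.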

    If you want to modify that online dataset or bring in your own data, use pandas directly. load_dataset simply returns a pandas DataFrame, which you can confirm with type(tips), so the plotting functions work just as well on a DataFrame you build or load yourself.
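
Since any DataFrame with the same layout will do, a grouped boxplot can be drawn without load_dataset at all. The sketch below uses made-up values; the column names are borrowed from the tips data purely for illustration. The data is in long form (one row per observation), x picks the outer groups, and hue splits each group into side-by-side boxes.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session

import pandas as pd
import seaborn as sns

# Hypothetical data in long form: one row per observation, grouping
# variables stored as ordinary columns (the same shape as the tips data).
mydata = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Thur", "Thur", "Fri", "Fri"],
    "smoker": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "total_bill": [12.5, 18.0, 22.3, 15.1, 10.9, 20.4, 25.0, 13.7],
})

print(type(mydata))  # <class 'pandas.core.frame.DataFrame'>

# x = outer grouping, hue = inner grouping -> a grouped boxplot.
ax = sns.boxplot(data=mydata, x="day", y="total_bill", hue="smoker")
ax.figure.savefig("grouped_boxplot.png")  # or plt.show() interactively
```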

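    If you have already saved your own data in a CSV file called, say, tips2.csv, load it with pandas (already installed, since seaborn depends on it). Note that a bare filename like this is resolved relative to the current working directory, i.e. the directory you run the script from, which is not necessarily where the script itself lives: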

    import pandas as pd
    
    tips2 = pd.read_csv('tips2.csv')