How does "load_dataset" work, as it is not detecting example files?


I'm learning about the seaborn package on DataCamp, and one of the files used in the documentation is "tips.csv", which belongs to the repository [https://github.com/mwaskom/seaborn-data][1], as I found in other posts.

I'm trying to load the dataset "tips.csv" in VS Code, but it raises the following error:

    ValueError: 'tips.csv' is not one of the example datasets.


  [1]: https://github.com/mwaskom/seaborn-data

> How can I solve this, given that the file is available at the link above? I
> suspect seaborn is not reaching the website, but I do have an internet connection.


Solution

  • I suppose you were misled by the docs:

    Parameters

    name : str
    Name of the dataset ({name}.csv on https://github.com/mwaskom/seaborn-data).

    The dataset names don't include a file extension; seaborn appends .csv itself when it looks the file up. So you should pass "tips", not "tips.csv":

    import seaborn as sns
    
    print(sns.get_dataset_names())
    
    [
        'anagrams',
        'anscombe',
        'attention',
        ...
        'taxis',
        'tips', # <-- here is yours
        'titanic'
    ]
    
    df = sns.load_dataset("tips")
    

    Output:

    print(df)
    
         total_bill  tip     sex smoker   day    time  size
    0         16.99 1.01  Female     No   Sun  Dinner     2
    1         10.34 1.66    Male     No   Sun  Dinner     3
    2         21.01 3.50    Male     No   Sun  Dinner     3
    3         23.68 3.31    Male     No   Sun  Dinner     2
    ..          ...  ...     ...    ...   ...     ...   ...
    240       27.18 2.00  Female    Yes   Sat  Dinner     2
    241       22.67 2.00    Male    Yes   Sat  Dinner     2
    242       17.82 1.75    Male     No   Sat  Dinner     2
    243       18.78 3.00  Female     No  Thur  Dinner     2
    
    [244 rows x 7 columns]
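
If file names like "tips.csv" keep slipping into your code, a small wrapper can strip the extension before calling seaborn. The following is only a sketch: `normalize_dataset_name` and `load_example` are hypothetical helpers, not part of seaborn, and `load_example` assumes seaborn is installed and that the dataset can be downloaded or is already cached.

```python
def normalize_dataset_name(name: str) -> str:
    """Return the bare dataset name, dropping a trailing '.csv' if present."""
    name = name.strip()
    if name.lower().endswith(".csv"):
        name = name[: -len(".csv")]
    return name


def load_example(name: str):
    """Load a seaborn example dataset, tolerating a '.csv' suffix.

    Hypothetical helper (not a seaborn function). Requires seaborn and
    network access, or a locally cached copy of the dataset.
    """
    # Imported lazily so normalize_dataset_name works without seaborn.
    import seaborn as sns

    bare = normalize_dataset_name(name)
    available = sns.get_dataset_names()
    if bare not in available:
        raise ValueError(f"{bare!r} is not available; choose from {available}")
    return sns.load_dataset(bare)
```

With this in place, `load_example("tips.csv")` and `load_example("tips")` should both end up calling `sns.load_dataset("tips")`.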