The statsmodels library has a get_rdataset() function that can fetch various datasets. Where is the list of datasets that can be fetched, and how do I use it to load them?
The documentation does not mention which datasets are available. It merely says that dataname ("The name of the dataset you want to download") is a required parameter, but nowhere does it list the possible datanames.
A CSV containing meta information about all datasets can be found at https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/datasets.csv. This URL is defined as a variable inside statsmodels' _get_dataset_meta() function.
This metadata can be loaded with pandas, for example, and its first 5 rows inspected:
import pandas as pd
datasets = pd.read_csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/datasets.csv")
print(datasets.head())  # first 5 rows of the metadata table
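To see which datanames are available for each package, the metadata rows can be grouped by package. The following is only a sketch: items_by_package is a hypothetical helper, the sample rows are typed by hand rather than downloaded, and it assumes the CSV has columns named Package and Item.

```python
from collections import defaultdict


def items_by_package(rows):
    """Group dataset names (Item) under the package they belong to.

    `rows` is any iterable of mappings with "Package" and "Item" keys.
    Hypothetical helper, not part of statsmodels.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Package"]].append(row["Item"])
    # Sort the names so the listing is stable and easy to scan.
    return {pkg: sorted(items) for pkg, items in grouped.items()}


# Hand-typed sample rows standing in for the downloaded metadata.
sample_rows = [
    {"Package": "AER", "Item": "Affairs"},
    {"Package": "datasets", "Item": "mtcars"},
    {"Package": "datasets", "Item": "iris"},
]

available = items_by_package(sample_rows)
print(available["datasets"])  # ['iris', 'mtcars']
```

With the real metadata loaded into the datasets DataFrame, the same helper should accept datasets.to_dict("records"). A plain boolean mask such as datasets[datasets["Package"] == "AER"] gives a similar view directly in pandas.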
As the documentation shows, the first argument of get_rdataset() is the dataname (recorded as Item in the meta dataset) and the second argument is the name of the package the dataset belongs to. For example, the following retrieves the first dataset in the CSV, whose dataname is Affairs and whose package is AER.
import statsmodels.api as sm
df = sm.datasets.get_rdataset('Affairs', 'AER', cache=True).data
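If only the dataname is known, the package can be looked up in the metadata before calling get_rdataset(). This is a rough sketch rather than anything statsmodels provides: packages_for and load_rdataset are hypothetical helpers, the sample rows are typed by hand, and the Package and Item column names are assumed to match the CSV.

```python
def packages_for(rows, dataname):
    """Return the sorted package names whose Item equals `dataname`."""
    return sorted({row["Package"] for row in rows if row["Item"] == dataname})


def load_rdataset(rows, dataname):
    """Fetch `dataname` once its package has been resolved unambiguously.

    Hypothetical wrapper; needs statsmodels and network access when called.
    """
    candidates = packages_for(rows, dataname)
    if len(candidates) != 1:
        raise ValueError(f"{dataname!r} matched packages: {candidates}")
    # Imported lazily so the lookup above works without statsmodels installed.
    import statsmodels.api as sm
    return sm.datasets.get_rdataset(dataname, candidates[0]).data


# Hand-typed sample rows standing in for the downloaded metadata.
sample_rows = [
    {"Package": "AER", "Item": "Affairs"},
    {"Package": "datasets", "Item": "iris"},
]

print(packages_for(sample_rows, "Affairs"))  # ['AER']
```

Raising when a name matches zero or several packages is a design choice made here so that an ambiguous lookup does not silently pick one package.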
Thanks @Vitalizzare for pointing me to this repo.