I want to use the Adult dataset from the UCI Machine Learning Repository.
To do this, I'm following the "Import in Python" option on the dataset page, which gives this code:
# install the package first, in a shell: pip install ucimlrepo
from ucimlrepo import fetch_ucirepo
# fetch dataset
adult = fetch_ucirepo(id=2)
# data (as pandas dataframes)
X = adult.data.features
y = adult.data.targets
# metadata
print(adult.metadata)
# variable information
print(adult.variables)
But that raises the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/Users/.../playground.ipynb Cell 3 line 4
1 from ucimlrepo import fetch_ucirepo
3 # fetch the adult dataset
----> 4 adult = fetch_ucirepo(id=2)
6 # convert the dataset to a Pandas DataFrame
7 df = pd.DataFrame(adult.data.features, columns=adult.variables.names, missing_values=["?"])
File ~/anaconda3/envs/myEnv/lib/python3.11/site-packages/ucimlrepo/fetch.py:148, in fetch_ucirepo(name, id)
142 # alternative usage?:
143 # variables.age.role or variables.slope.description
144 # print(variables) -> json-like dict with keys [name] -> details
145
146 # make nested metadata fields accessible via dot notation
147 metadata['additional_info'] = dotdict(metadata['additional_info'])
--> 148 metadata['intro_paper'] = dotdict(metadata['intro_paper'])
150 # construct result object
151 result = {
152 'data': dotdict(data),
153 'metadata': dotdict(metadata),
154 'variables': variables
155 }
TypeError: 'NoneType' object is not iterable
I know that I can download the dataset and load it into a pandas DataFrame myself, but that is messier because it needs extra parsing: first load adult.data, then add the column headers taken from the relevant lines of adult.names (by splitting each of those lines at the ":").
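For comparison, the manual route described above can be sketched as follows. This is a minimal sketch, assuming adult.names lists each attribute on its own line as `name: values.` and marks its descriptive header lines with a leading `|`; the helper names and the `income` label for the target column are hypothetical, and the demo at the bottom uses toy strings rather than the real files.

```python
import io
import re

import pandas as pd

# Assumption: each attribute in adult.names sits on its own line as
# "name: values." and the descriptive header lines start with "|".
# Check this against your copy of the file before relying on it.
ATTRIBUTE_LINE = re.compile(r"^([A-Za-z][\w-]*):\s")


def column_names_from_names_file(names_text):
    """Return attribute names, in file order, parsed from adult.names text."""
    names = []
    for line in names_text.splitlines():
        if line.startswith("|"):
            continue  # skip comment/description lines
        match = ATTRIBUTE_LINE.match(line)
        if match:
            names.append(match.group(1))  # keep only the part before ":"
    return names


def load_adult_local(data_source, names_text, target_name="income"):
    """Read adult.data (which has no header row) and attach column names.

    `target_name` is a hypothetical label: adult.names lists the class
    values but does not give the target column a name.
    """
    columns = column_names_from_names_file(names_text) + [target_name]
    return pd.read_csv(
        data_source,
        header=None,
        names=columns,
        skipinitialspace=True,  # fields are separated by ", "
        na_values="?",          # "?" marks a missing value
    )


# Toy input standing in for the real files (illustrative, not real rows).
demo_names = (
    "| description line: ignored\n"
    ">50K, <=50K.\n"
    "\n"
    "age: continuous.\n"
    "workclass: Private, State-gov.\n"
)
demo = load_adult_local(io.StringIO("39, State-gov, <=50K\n50, ?, >50K\n"), demo_names)
```

With the real files, something like `with open("adult.names") as f: df = load_adult_local("adult.data", f.read())` would apply the same parsing.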
I found that the simplest way to do it was to use the URL given on the GitHub issues page:
import pandas as pd

url = "https://archive.ics.uci.edu/static/public/2/data.csv"
df = pd.read_csv(url)
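If you would rather keep using `fetch_ucirepo` where it works and only fall back to the direct CSV download, a small wrapper can do both. This is a sketch, not part of ucimlrepo: `load_adult`, `read_adult_csv`, and the `fetch`/`fallback` parameters are hypothetical names (the parameters exist only so the function can be tested offline), and catching `TypeError` assumes the failure looks like the traceback in the question. It would also hide unrelated `TypeError`s raised inside the fetch.

```python
import pandas as pd

# URL taken from the GitHub issues page mentioned above.
ADULT_CSV_URL = "https://archive.ics.uci.edu/static/public/2/data.csv"


def read_adult_csv(url=ADULT_CSV_URL):
    """Read the Adult CSV directly, bypassing ucimlrepo's metadata parsing."""
    return pd.read_csv(url)


def load_adult(fetch=None, fallback=read_adult_csv):
    """Return the Adult data as one DataFrame.

    Tries ucimlrepo first and calls `fallback()` if the package is missing
    or if fetching fails with a TypeError like the one in the traceback.
    """
    if fetch is None:
        try:
            from ucimlrepo import fetch_ucirepo as fetch
        except ImportError:
            return fallback()
    try:
        adult = fetch(id=2)
    except TypeError:
        # Same failure as in the traceback: metadata['intro_paper'] is None.
        return fallback()
    # Put features and target side by side in a single frame.
    return pd.concat([adult.data.features, adult.data.targets], axis=1)
```

Calling `load_adult()` with no arguments uses ucimlrepo when it is installed and working, and the CSV URL otherwise. The two paths may not produce identical frames (for example in how missing values are encoded), so it is worth inspecting the result either way.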