Tags: python, dataset, typeerror, data-analysis

TypeError: 'NoneType' object is not iterable when using ucimlrepo


I want to use the Adult dataset from the UCI ML Repo.

I'm following the "Import in Python" option on the dataset page, which gives this code:

pip install ucimlrepo

from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
adult = fetch_ucirepo(id=2) 
  
# data (as pandas dataframes) 
X = adult.data.features 
y = adult.data.targets 
  
# metadata 
print(adult.metadata) 
  
# variable information 
print(adult.variables) 

But that raises the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/.../playground.ipynb Cell 3 line 4
      1 from ucimlrepo import fetch_ucirepo
      3 # fetch the adult dataset
----> 4 adult = fetch_ucirepo(id=2)
      6 # convert the dataset to a Pandas DataFrame
      7 df = pd.DataFrame(adult.data.features, columns=adult.variables.names, missing_values=["?"])

File ~/anaconda3/envs/myEnv/lib/python3.11/site-packages/ucimlrepo/fetch.py:148, in fetch_ucirepo(name, id)
    142 # alternative usage?: 
    143 # variables.age.role or variables.slope.description
    144 # print(variables) -> json-like dict with keys [name] -> details
    145 
    146 # make nested metadata fields accessible via dot notation
    147 metadata['additional_info'] = dotdict(metadata['additional_info'])
--> 148 metadata['intro_paper'] = dotdict(metadata['intro_paper'])
    150 # construct result object
    151 result = {
    152     'data': dotdict(data),
    153     'metadata': dotdict(metadata),
    154     'variables': variables
    155 }

TypeError: 'NoneType' object is not iterable

I know that I can download the dataset and load it as a pandas DataFrame myself, but that is messier because it needs extra parsing: first load adult.data, then add the headers taken from the relevant lines in adult.names (after dropping everything after the ":" in each line).
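
For reference, the manual route I mean would look roughly like the sketch below. It assumes the column names from adult.names are written out by hand instead of being parsed from the file, and that missing values are encoded as "?". The two sample rows are only illustrative stand-ins in the adult.data layout so the snippet runs offline; the path to the downloaded adult.data would replace the StringIO object. The label column name `income` is my own choice, since adult.data has no header row.

```python
import io

import pandas as pd

# Column names as listed in adult.names; "income" is an assumed name for the label.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Illustrative rows in the adult.data layout (comma + space separated, no header).
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, "
    "Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K\n"
)

df = pd.read_csv(
    sample,
    header=None,            # the raw file has no header row
    names=COLUMNS,          # supply the headers ourselves
    skipinitialspace=True,  # drop the space that follows each comma
    na_values="?",          # treat "?" as a missing value
)

# Split into features and target, mirroring adult.data.features / adult.data.targets.
X = df.drop(columns="income")
y = df["income"]
```

This works, but it means keeping the header list in sync with adult.names by hand, which is the part I would like to avoid.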


Solution

  • thanks.

    I found that the simplest way was to read the CSV directly from the URL given on the project's GitHub issues page:

    import pandas as pd

    # Direct CSV export of the Adult dataset (id=2)
    url = "https://archive.ics.uci.edu/static/public/2/data.csv"
    df = pd.read_csv(url)
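
As for why the original call fails: the traceback ends at `dotdict(metadata['intro_paper'])`, and the message is the one Python raises when a dict is built from `None`. So the likely cause is that the metadata returned for this dataset had no intro paper. A minimal sketch of that failure, assuming `dotdict` is a thin `dict` subclass; the `DotDict` class and `to_dotdict` helper below are hypothetical stand-ins, not ucimlrepo's actual code:

```python
class DotDict(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""
    __getattr__ = dict.get


def to_dotdict(value):
    """Wrap a nested metadata field, passing a missing (None) value through."""
    return DotDict(value) if value is not None else None


# Assumed shape of the metadata: one nested field present, one missing.
metadata = {"additional_info": {"summary": "placeholder"}, "intro_paper": None}

try:
    DotDict(metadata["intro_paper"])  # same shape as the failing line in fetch.py
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# With the guard, a missing field stays None instead of raising.
safe = to_dotdict(metadata["intro_paper"])
info = to_dotdict(metadata["additional_info"])
```

If that is the cause, it has to be fixed inside the library, so upgrading ucimlrepo may help; reading the CSV from the URL avoids the metadata step entirely.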