python pandas scikit-learn modeling

Passing a list of vars to a ridge.fit() -- unhashable type?


I have a dataset called training, which was read in and manipulated using pandas. There are about 150 variables, so I put them in a list that I want to pass to a ridge regression; however, I get an error saying "unhashable type: list".

It's likely that I'm missing something obvious, as this is my first pass at using Python (I'm used to R and Stata).

Here's the code:

# Variables to use (potentially) -- for dummies, one has already been taken out to avoid dummy var trap
continuous_vars = ['VehicleAge','VehOdo', 'MMRAcquisitionAuctionAveragePrice', 'MMRAcquisitionAuctionCleanPrice', 'MMRAcquisitionRetailAveragePrice', 'MMRAcquisitonRetailCleanPrice', 'MMRCurrentAuctionAveragePrice', 'MMRCurrentAuctionCleanPrice', 'MMRCurrentRetailAveragePrice', 'MMRCurrentRetailCleanPrice', 'VehBCost', 'IsOnlineSale', 'WarrantyCost', 'reliability_score', 'num_bought']
make_cats = ['BUICK', 'CADILLAC', 'CHEVROLET', 'CHRYSLER', 'DODGE', 'FORD', 'GMC', 'HONDA', 'HUMMER', 'HYUNDAI', 'INFINITI', 'ISUZU', 'JEEP', 'KIA', 'LEXUS', 'LINCOLN', 'MAZDA', 'MERCURY', 'MINI', 'MITSUBISHI', 'NISSAN', 'OLDSMOBILE', 'PLYMOUTH', 'PONTIAC', 'SATURN', 'SCION', 'SUBARU', 'SUZUKI', 'TOYOTA', 'TOYOTA SCION', 'VOLKSWAGEN', 'VOLVO']
state_cats = ['AR', 'AZ', 'CA', 'CO', 'FL', 'GA', 'IA', 'ID', 'IL', 'IN', 'KY', 'LA', 'MA', 'MD', 'MI', 'MN', 'MO', 'MS', 'NC', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'SC', 'TN', 'TX', 'UT', 'VA', 'WA', 'WV']
auction_cats = ['ADESA', 'MANHEIM', 'OTHER']
trans_cats = ['AUTO']
color_cats = ['BEIGE', 'BLACK', 'BLUE', 'BROWN', 'GOLD', 'GREEN', 'GREY', 'MAROON', 'NOT AVAIL', 'ORANGE', 'OTHER', 'PURPLE', 'RED', 'SILVER', 'WHITE', 'YELLOW']
wheel_cats = ['Alloy', 'Covers', 'Special']
nat_cats = ['AMERICAN', 'OTHER', 'OTHER ASIAN', 'TOP LINE ASIAN']
size_cats = ['COMPACT', 'CROSSOVER', 'LARGE', 'LARGE SUV', 'LARGE TRUCK', 'MEDIUM', 'MEDIUM SUV', 'SMALL SUV', 'SMALL TRUCK', 'SPECIALTY', 'SPORTS', 'VAN']
year_cats = ['2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010_x']

all_vars = continuous_vars + make_cats + state_cats + auction_cats + trans_cats + color_cats + wheel_cats + nat_cats + size_cats + year_cats
hashable = all_vars
## Ridge Regression
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=1)

ridge_reg.fit(training[hashable], training['IsBadBuy'])

Update: I've updated the code to reflect some suggestions. Here is the new error message:

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1068, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['reliability_score' 'num_bought' 'BUICK' 'CADILLAC' 'CHEVROLET' 'CHRYSLER'\n 'DODGE' 'FORD' 'GMC' 'HONDA' 'HUMMER' 'HYUNDAI' 'INFINITI' 'ISUZU' 'JEEP'\n 'KIA' 'LEXUS' 'LINCOLN' 'MAZDA' 'MERCURY' 'MINI' 'MITSUBISHI' 'NISSAN'\n 'OLDSMOBILE' 'PLYMOUTH' 'PONTIAC' 'SATURN' 'SCION' 'SUBARU' 'SUZUKI'\n 'TOYOTA' 'TOYOTA SCION' 'VOLKSWAGEN' 'VOLVO' 'AR' 'AZ' 'CA' 'CO' 'FL' 'GA'\n 'IA' 'ID' 'IL' 'IN' 'KY' 'LA' 'MA' 'MD' 'MI' 'MN' 'MO' 'MS' 'NC' 'NE' 'NH'\n 'NJ' 'NM' 'NV' 'NY' 'OH' 'OK' 'OR' 'PA' 'SC' 'TN' 'TX' 'UT' 'VA' 'WA' 'WV'\n 'ADESA' 'MANHEIM' 'OTHER' 'AUTO' 'BEIGE' 'BLACK' 'BLUE' 'BROWN' 'GOLD'\n 'GREEN' 'GREY' 'MAROON' 'NOT AVAIL' 'ORANGE' 'OTHER' 'PURPLE' 'RED'\n 'SILVER' 'WHITE' 'YELLOW' 'Alloy' 'Covers' 'Special' 'AMERICAN' 'OTHER'\n 'OTHER ASIAN' 'TOP LINE ASIAN' 'COMPACT' 'CROSSOVER' 'LARGE' 'LARGE SUV'\n 'LARGE TRUCK' 'MEDIUM' 'MEDIUM SUV' 'SMALL SUV' 'SMALL TRUCK' 'SPECIALTY'\n 'SPORTS' 'VAN' '2001' '2002' '2003' '2004' '2005' '2006' '2007' '2008'\n '2009' '2010_x'] not in index"

Solution

  • What you did was create a list of lists; what you want is to concatenate them all into one flat list:

    all_vars = continuous_vars + make_cats + state_cats + auction_cats + trans_cats + color_cats + wheel_cats + nat_cats + size_cats + year_cats

    With a flat list of column names, pandas can then select the feature columns correctly.

    Compare the following:

    In [84]: [auction_cats, wheel_cats]
    Out[84]: [['ADESA', 'MANHEIM', 'OTHER'], ['Alloy', 'Covers', 'Special']]

    In [85]: auction_cats + wheel_cats
    Out[85]: ['ADESA', 'MANHEIM', 'OTHER', 'Alloy', 'Covers', 'Special']
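
To make the difference concrete, and to help with the KeyError from the update, here is a minimal sketch. The tiny `training` frame and its values are made up for illustration (they are not the real data), and `nested`, `flat`, `missing` and `present` are hypothetical names. It shows that a list itself cannot be hashed, which is most likely what the original error was complaining about, and that a flat list can be checked against `training.columns` before indexing, so any names that do not exist as columns show up explicitly.

```python
import pandas as pd

# Hypothetical miniature stand-in for the real `training` frame
training = pd.DataFrame({
    "VehicleAge": [3, 5, 2],
    "VehOdo": [60000, 82000, 41000],
    "ADESA": [1, 0, 0],
    "IsBadBuy": [0, 1, 0],
})

continuous_vars = ["VehicleAge", "VehOdo"]
auction_cats = ["ADESA", "MANHEIM"]  # "MANHEIM" is deliberately absent from the frame

nested = [continuous_vars, auction_cats]  # list of lists: each element is itself a list
flat = continuous_vars + auction_cats     # one flat list of column names

# A list is not hashable, so it cannot act as a single column label
try:
    hash(nested[0])
except TypeError as exc:
    print(exc)  # unhashable type: 'list'

# Check which requested names really are columns before indexing
missing = [name for name in flat if name not in training.columns]
present = [name for name in flat if name in training.columns]
print(missing)  # ['MANHEIM']

# Select features with the flat list of names that exist
X = training[present]
y = training["IsBadBuy"]
```

If `missing` is not empty on the real data, the KeyError is expected: those names are in the list but are not columns of `training`, for example because the dummy columns were never created or were named differently.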