I have a dataset called training, which was read in and manipulated using pandas. There are about 150 variables, so I put them in a list and I want to pass them to a ridge regression; however, I get an error saying "unhashable type: list".
It's likely that I'm missing something obvious, as this is my first pass using Python (I'm used to R and Stata).
Here's the code:
# Variables to use (potentially) -- for dummies, one has already been taken out to avoid dummy var trap
continuous_vars = ['VehicleAge','VehOdo', 'MMRAcquisitionAuctionAveragePrice', 'MMRAcquisitionAuctionCleanPrice', 'MMRAcquisitionRetailAveragePrice', 'MMRAcquisitonRetailCleanPrice', 'MMRCurrentAuctionAveragePrice', 'MMRCurrentAuctionCleanPrice', 'MMRCurrentRetailAveragePrice', 'MMRCurrentRetailCleanPrice', 'VehBCost', 'IsOnlineSale', 'WarrantyCost', 'reliability_score', 'num_bought']
make_cats = ['BUICK', 'CADILLAC', 'CHEVROLET', 'CHRYSLER', 'DODGE', 'FORD', 'GMC', 'HONDA', 'HUMMER', 'HYUNDAI', 'INFINITI', 'ISUZU', 'JEEP', 'KIA', 'LEXUS', 'LINCOLN', 'MAZDA', 'MERCURY', 'MINI', 'MITSUBISHI', 'NISSAN', 'OLDSMOBILE', 'PLYMOUTH', 'PONTIAC', 'SATURN', 'SCION', 'SUBARU', 'SUZUKI', 'TOYOTA', 'TOYOTA SCION', 'VOLKSWAGEN', 'VOLVO']
state_cats = ['AR', 'AZ', 'CA', 'CO', 'FL', 'GA', 'IA', 'ID', 'IL', 'IN', 'KY', 'LA', 'MA', 'MD', 'MI', 'MN', 'MO', 'MS', 'NC', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'SC', 'TN', 'TX', 'UT', 'VA', 'WA', 'WV']
auction_cats = ['ADESA', 'MANHEIM', 'OTHER']
trans_cats = ['AUTO']
color_cats = ['BEIGE', 'BLACK', 'BLUE', 'BROWN', 'GOLD', 'GREEN', 'GREY', 'MAROON', 'NOT AVAIL', 'ORANGE', 'OTHER', 'PURPLE', 'RED', 'SILVER', 'WHITE', 'YELLOW']
wheel_cats = ['Alloy', 'Covers', 'Special']
nat_cats = ['AMERICAN', 'OTHER', 'OTHER ASIAN', 'TOP LINE ASIAN']
size_cats = ['COMPACT', 'CROSSOVER', 'LARGE', 'LARGE SUV', 'LARGE TRUCK', 'MEDIUM', 'MEDIUM SUV', 'SMALL SUV', 'SMALL TRUCK', 'SPECIALTY', 'SPORTS', 'VAN']
year_cats = ['2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010_x']
all_vars = continuous_vars + make_cats + state_cats + auction_cats + trans_cats + color_cats + wheel_cats + nat_cats + size_cats + year_cats
hashable = all_vars
## Ridge Regression
from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=1)
ridge_reg.fit(training[hashable], training['IsBadBuy'])
Update: I've updated the code above to reflect some suggestions. Here is the new error message:
File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1068, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: "['reliability_score' 'num_bought' 'BUICK' 'CADILLAC' 'CHEVROLET' 'CHRYSLER'\n 'DODGE' 'FORD' 'GMC' 'HONDA' 'HUMMER' 'HYUNDAI' 'INFINITI' 'ISUZU' 'JEEP'\n 'KIA' 'LEXUS' 'LINCOLN' 'MAZDA' 'MERCURY' 'MINI' 'MITSUBISHI' 'NISSAN'\n 'OLDSMOBILE' 'PLYMOUTH' 'PONTIAC' 'SATURN' 'SCION' 'SUBARU' 'SUZUKI'\n 'TOYOTA' 'TOYOTA SCION' 'VOLKSWAGEN' 'VOLVO' 'AR' 'AZ' 'CA' 'CO' 'FL' 'GA'\n 'IA' 'ID' 'IL' 'IN' 'KY' 'LA' 'MA' 'MD' 'MI' 'MN' 'MO' 'MS' 'NC' 'NE' 'NH'\n 'NJ' 'NM' 'NV' 'NY' 'OH' 'OK' 'OR' 'PA' 'SC' 'TN' 'TX' 'UT' 'VA' 'WA' 'WV'\n 'ADESA' 'MANHEIM' 'OTHER' 'AUTO' 'BEIGE' 'BLACK' 'BLUE' 'BROWN' 'GOLD'\n 'GREEN' 'GREY' 'MAROON' 'NOT AVAIL' 'ORANGE' 'OTHER' 'PURPLE' 'RED'\n 'SILVER' 'WHITE' 'YELLOW' 'Alloy' 'Covers' 'Special' 'AMERICAN' 'OTHER'\n 'OTHER ASIAN' 'TOP LINE ASIAN' 'COMPACT' 'CROSSOVER' 'LARGE' 'LARGE SUV'\n 'LARGE TRUCK' 'MEDIUM' 'MEDIUM SUV' 'SMALL SUV' 'SMALL TRUCK' 'SPECIALTY'\n 'SPORTS' 'VAN' '2001' '2002' '2003' '2004' '2005' '2006' '2007' '2008'\n '2009' '2010_x'] not in index"
What you did was create a list of lists. What you want instead is to concatenate them all into one flat list of column names:
all_vars = continuous_vars + make_cats + state_cats + auction_cats + trans_cats + color_cats + wheel_cats + nat_cats + size_cats + year_cats
pandas can then use that flat list to select the feature columns correctly.
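As a rough illustration of the selection step, here is a minimal sketch on a toy frame. The three-row `training` DataFrame and its values are made up for the example; only the column names and the list names come from the question.

```python
import pandas as pd

# Toy stand-in for the real `training` frame; the values are invented.
training = pd.DataFrame({
    "VehicleAge": [3, 5, 2],
    "ADESA": [1, 0, 0],
    "Alloy": [0, 1, 1],
    "IsBadBuy": [0, 1, 0],
})

continuous_vars = ["VehicleAge"]
auction_cats = ["ADESA"]
wheel_cats = ["Alloy"]

# `+` joins the lists into one flat list of column names.
all_vars = continuous_vars + auction_cats + wheel_cats

X = training[all_vars]      # DataFrame with one column per name
y = training["IsBadBuy"]    # Series holding the target
print(X.shape)
```

`X` and `y` are then in the shape that `ridge_reg.fit(X, y)` expects: a two-dimensional feature table and a one-dimensional target.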
Compare the following:
In [84]:
[auction_cats, wheel_cats]
Out[84]:
[['ADESA', 'MANHEIM', 'OTHER'], ['Alloy', 'Covers', 'Special']]
In [85]:
auction_cats+wheel_cats
Out[85]:
['ADESA', 'MANHEIM', 'OTHER', 'Alloy', 'Covers', 'Special']
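The KeyError in the update says that some of the requested names are not columns of the DataFrame. Why they are absent cannot be determined from the post, but one way to see which ones are missing is to compare the list against `training.columns`. The sketch below uses a made-up two-row frame and a shortened `all_vars`; both are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for `training`; values are invented for the example.
training = pd.DataFrame({
    "VehicleAge": [3, 5],
    "VehOdo": [89046, 93593],
    "IsBadBuy": [0, 1],
})

# Shortened, hypothetical version of the full variable list.
all_vars = ["VehicleAge", "VehOdo", "reliability_score", "num_bought"]

# Names requested but not present as columns (order preserved).
missing = [c for c in all_vars if c not in training.columns]
# Names that can safely be used for selection.
present = [c for c in all_vars if c in training.columns]

print(missing)
X = training[present]
```

If `missing` is non-empty, those columns still have to be created (or the names corrected) before the full list can be used for selection.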