python, scikit-learn, graphlab, sklearn-pandas

Having trouble writing the scikit-learn equivalent of my GraphLab Create code, mainly unable to plot properly


I am having a lot of trouble plotting a graph of crime rate vs. house price. With the GraphLab library it is easy to do, but with scikit-learn I am unable to do it. Here is my scikit-learn code:

import pandas as pd
# Import the submodule explicitly; a bare `import sklearn` may not load linear_model
import sklearn.linear_model


# Load some house value vs. crime rate data
#
# The dataset is from Philadelphia, PA and includes the average house sale price in a number of neighborhoods. The attributes of each neighborhood include the crime rate ('CrimeRate'), miles from Center City ('MilesPhila'), town name ('Name'), and county name ('County').


sales = pd.read_csv('Philadelphia_Crime_Rate_noNA.csv')


sales[:2]


# Exploring the data

# The house price in a town is correlated with the crime rate of that town. Low-crime towns tend to be associated with higher house prices, and vice versa.

import matplotlib.pyplot as plt
import numpy as np

plt.scatter(x=sales['CrimeRate'], y=sales['HousePrice'])

crime_model = sklearn.linear_model.LinearRegression()


# Let's see what our fit looks like

X = sales.drop(['Name', 'County'], axis=1)
X = X.dropna()

crime_rate = X['CrimeRate']
price = X['HousePrice']

# fit expects a 2D feature matrix, so pass a one-column DataFrame
# (Series.reshape has been removed from pandas)
crime_model.fit(X[['CrimeRate']], price)

plt.plot(X['CrimeRate'],X['HousePrice'],'.',
    X['CrimeRate'],crime_model.predict(X),'-')

The output I get with scikit-learn is not correct, while the output I am looking for can be produced in the GraphLab Create environment.

This is the complete code, which runs correctly with GraphLab Create:

import graphlab

sales = graphlab.SFrame.read_csv('Philadelphia_Crime_Rate_noNA.csv')

sales

graphlab.canvas.set_target('ipynb')
sales.show(view="Scatter Plot", x="CrimeRate", y="HousePrice")

crime_model = graphlab.linear_regression.create(sales, target='HousePrice', features=['CrimeRate'],validation_set=None,verbose=False)

import matplotlib.pyplot as plt
%matplotlib inline



plt.plot(sales['CrimeRate'],sales['HousePrice'],'.',
        sales['CrimeRate'],crime_model.predict(sales),'-')


# Above: blue dots are original data, green line is the fit from the simple regression.
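For comparison, here is a minimal sketch of a scikit-learn counterpart to the `graphlab.linear_regression.create(...)` call above, assuming pandas and a recent scikit-learn are installed. The six rows are copied from the dataset in this post so the sketch runs on its own; in practice you would read the full CSV with `pd.read_csv`. The names `target` and `features` are illustrative variables, not part of either library. The key difference: GraphLab's model keeps track of its `features` list (which is why the working code can call `crime_model.predict(sales)` on the whole SFrame), whereas `LinearRegression` does not pick columns for you, so prediction must use the same column selection as fitting. GraphLab's `create()` also has its own regularization options, so the coefficients are not guaranteed to match exactly.

```python
# Sketch: scikit-learn counterpart of
#   graphlab.linear_regression.create(sales, target='HousePrice',
#                                     features=['CrimeRate'])
# Rows copied from the dataset in this post; normally you would use
# pd.read_csv('Philadelphia_Crime_Rate_noNA.csv').
import pandas as pd
from sklearn.linear_model import LinearRegression

sales = pd.DataFrame({
    "HousePrice": [140463, 113033, 124186, 110490, 79124, 92634],
    "CrimeRate": [29.7, 24.1, 19.5, 49.4, 54.1, 48.6],
})

target = "HousePrice"      # illustrative name
features = ["CrimeRate"]   # same list you would pass to GraphLab

crime_model = LinearRegression()
# sales[features] is a 2D DataFrame (n_rows x 1), which is what fit expects
crime_model.fit(sales[features], sales[target])

# Unlike GraphLab, select the same feature columns again when predicting
predictions = crime_model.predict(sales[features])
print(predictions.shape)  # (6,)
```

The `predictions` array can then be plotted against `sales['CrimeRate']` exactly as in the GraphLab version.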

I hope someone can point out my mistake. Thanks.

This is the dataset:

HousePrice  HsPrc CrimeRate MilesPhila  PopChg  Name    County
140463  14.0463 29.7    10  -1  Abington    Montgome
113033  11.3033 24.1    18  4   Ambler  Montgome
124186  12.4186 19.5    25  8   Aston   Delaware
110490  11.049  49.4    25  2.7 Bensalem    Bucks
79124   7.9124  54.1    19  3.9 Bristol B.  Bucks
92634   9.2634  48.6    20  0.6 Bristol T.  Bucks
89246   8.9246  30.8    15  -2.6    Brookhaven  Delaware
195145  19.5145 10.8    20  -3.5    Bryn Athyn  Montgome
297342  29.7342 20.2    14  0.6 Bryn Mawr   Montgome
264298  26.4298 20.4    26  6   Buckingham  Bucks
134342  13.4342 17.3    31  4.2 Chalfont    Bucks
147600  14.76   50.3    9   -1  Cheltenham  Montgome
77370   7.737   34.2    10  -1.2    Clifton Delaware
170822  17.0822 33.7    32  2.4 Collegeville    Montgome
40642   4.0642  45.7    15  0   Darby Bor.  Delaware
71359   7.1359  22.3    8   1.6 Darby Town  Delaware
104923  10.4923 48.1    21  6.9 Downingtown Chester
190317  19.0317 19.4    26  1.9 Doylestown  Bucks
215512  21.5512 71.9    26  5.8 E. Bradford Chester
178105  17.8105 45.1    25  2.3 E. Goshen   Chester
131025  13.1025 31.3    19  -1.8    E. Norriton Montgome
149844  14.9844 24.9    22  6.4 E. Pikeland Chester
170556  17.0556 27.2    30  4.6 E. Whiteland    Chester
280969  28.0969 17.7    14  2.9 Easttown    Chester
114233  11.4233 29  30  1.3 Falls Town  Bucks
74502   7.4502  21.4    15  -3.2    Follcroft   Delaware
475112  47.5112 28.6    12      Gladwyne    Montgome
97167   9.7167  29.3    10  0.2 Glenolden   Delaware
114572  11.4572 17.5    20  5.2 Hatboro Montgome
436348  43.6348 16.5    10  -0.7    Haverford   Delaware
389302  38.9302 17.8    20  1.5 Horsham Montgome
122392  12.2392 17.3    10  1.9 Jenkintown  Montgome
130436  13.0436 31.2    17  -0.4    L Southampton   Delaware
272790  27.279  14.5    20  -5.1    L. Gwynedd  Montgome
194435  19.4435 15.7    32  15  L. Makefield    Bucks
299621  29.9621 28.6    10  1.4 L. Merion   Montgome
210884  21.0884 20.8    20  0.1 L. Moreland Montgome
112471  11.2471 29.3    35  3.4 Lansdale    Montgome
93738   9.3738  19.3    7   -0.4    Lansdown    Delaware
121024  12.1024 39.5    35  26.9    Limerick    Montgome
156035  15.6035 13  23  6.3 Malvern Chester
185404  18.5404 24.1    10  0.9 Marple  Delaware
126160  12.616  38  20  -2.4    Media   Delaware
143072  14.3072 40.1    23  1.6 Middletown  Bucks
96769   9.6769  36.1    15  5.1 Morrisville Bucks
94014   9.4014  26.6    14  0.5 Morton  Delaware
118214  11.8214 25.1    25  5.7 N. Wales    Montgome
157446  15.7446 14.6    15  3.1 Narberth    Montgome
150283  15.0283 18.2    15  0.9 Nether  Delaware
153842  15.3842 15.3    23  8.5 Newtown Bucks
197214  19.7214 15.2    25  2.1 Newtown B.  Bucks
206127  20.6127 17.4    15  2.7 Newtown T.  Delaware
71981   7.1981  73.3    19  4.9 Norristown  Montgome
169401  16.9401 7.1 22  1.5 Northampton Bucks
99843   9.9843  12.5    12  -3.7    Norwood Delaware
60000   6   45.8    18  -1.4    Phila, Far NE   Phila
28000   2.8 44.9    5.5 -8.4    Phila, N    Phila
60000   6   65  9   -4.9    Phila, NE   Phila
61800   6.18    49.9    9   -6.4    Phila, NW   Phila
38000   3.8 54.8    4.5 -5.1    Phila, SW   Phila
38000   3.8 53.5    2   -9.2    Phila, South    Phila
42000   4.2 69.9    4   -5.7    Phila, West Phila
96200   9.62    366.1   0   4.8 Phila,CC    Phila
103087  10.3087 24.6    24  3.9 Phoenixville    Chester
147720  14.772  58.6    25  1.5 Plymouth    Montgome
78175   7.8175  53.2    41  2.2 Pottstown   Montgome
92215   9.2215  17.4    14  7.8 Prospect Park   Delaware
271804  27.1804 15.5    17  1.2 Radnor  Delaware
119566  11.9566 14.5    12  -2.9    Ridley Park Delaware
100231  10.0231 24.1    15  1.9 Ridley Town Delaware
95831   9.5831  21.2    32  3.2 Royersford  Montgome
229711  22.9711 9.8 22  5.3 Schuylkill  Chester
74308   7.4308  29.9    7   1.8 Sharon Hill Delaware
259506  25.9506 7.2 40  17.4    Solebury    Bucks
159573  15.9573 19.4    15  -2.1    Springfield Montgome
147176  14.7176 41.1    12  -1.7    Springfield Delaware
205732  20.5732 11.2    12  -0.2    Swarthmore  Delaware
215783  21.5783 21.2    20  1.1 Tredyffin   Chester
116710  11.671  42.8    20  12.9    U. Chichester   Delaware
359112  35.9112 9.4 36  4   U. Makefield    Bucks
189959  18.9959 61.7    22  -2.1    U. Merion   Montgome
133198  13.3198 19.4    22  -2  U. Moreland Montgome
242821  24.2821 6.6 21  1.6 U. Providence   Delaware
142811  14.2811 15.9    20  -1.6    U. Southampton  Bucks
200498  20.0498 18.8    36  11  U. Uwchlan  Chester
199065  19.9065 13.2    20  7.8 Upper Darby Montgome
93648   9.3648  34.5    8   -0.7    Upper Darby Delaware
163001  16.3001 22.1    50  8   Uwchlan T.  Chester
436348  43.6348 22.1    15  1.3 Villanova   Montgome
124478  12.4478 71.9    22  4.6 W. Chester  Chester
168276  16.8276 31.9    26  5.9 W. Goshen   Chester
114157  11.4157 44.6    38  14.6    W. Whiteland    Chester
130088  13.0088 28.6    19  -0.2    Warminster  Bucks
152624  15.2624 24  19  23.1    Warrington  Bucks
174232  17.4232 13.8    25  4.7 Westtown    Chester
196515  19.6515 29.9    16  1.8 Whitemarsh  Montgome
232714  23.2714 9.9 21  0.2 Willistown  Chester
245920  24.592  22.6    10  0.3 Wynnewood   Montgome
130953  13.0953 13  24  5.2 Yardley Bucks

Solution

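  • plt.plot(X['CrimeRate'], X['HousePrice'], '.', X['CrimeRate'], crime_model.predict(X), '-')

    I made a mistake above: the input to predict should be only the CrimeRate column, the single feature the model was fitted on, but I passed the whole X (which still contains HousePrice, HsPrc, MilesPhila and PopChg). I replaced it with the CrimeRate column and now it works properly. In recent scikit-learn versions predict requires 2D input, so the column has to be passed as a one-column DataFrame, X[['CrimeRate']], rather than the 1D Series X['CrimeRate'].

    The correct version is:

    plt.plot(X['CrimeRate'], X['HousePrice'], '.',
             X['CrimeRate'], crime_model.predict(X[['CrimeRate']]), '-')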
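To see why the original call failed, the following sketch, assuming pandas and a recent scikit-learn, rebuilds a few rows of the question's `X` (copied from the dataset above, with `Name` and `County` dropped as in the question) and compares the shapes involved. `predict` rejects both the full `X`, which has five columns instead of the one the model was fitted on, and the 1D Series `X['CrimeRate']`; the one-column DataFrame `X[['CrimeRate']]` is accepted. The list name `failed` is just for illustration.

```python
# Minimal sketch (assumes pandas and a recent scikit-learn).
# Rows copied from the dataset in this post, columns as in
# X = sales.drop(['Name', 'County'], axis=1).
import pandas as pd
from sklearn.linear_model import LinearRegression

X = pd.DataFrame({
    "HousePrice": [140463, 113033, 124186, 110490, 79124],
    "HsPrc": [14.0463, 11.3033, 12.4186, 11.049, 7.9124],
    "CrimeRate": [29.7, 24.1, 19.5, 49.4, 54.1],
    "MilesPhila": [10, 18, 25, 25, 19],
    "PopChg": [-1, 4, 8, 2.7, 3.9],
})

print(X.shape)                 # (5, 5): all numeric columns, target included
print(X["CrimeRate"].shape)    # (5,): a 1D Series
print(X[["CrimeRate"]].shape)  # (5, 1): a one-column DataFrame

crime_model = LinearRegression()
crime_model.fit(X[["CrimeRate"]], X["HousePrice"])

# Both inputs below are rejected: X has five columns instead of the one
# the model was fitted on, and X["CrimeRate"] is 1D.
failed = []
for bad_input in (X, X["CrimeRate"]):
    try:
        crime_model.predict(bad_input)
    except ValueError:
        failed.append(type(bad_input).__name__)
print(failed)  # ['DataFrame', 'Series']

# The one-column DataFrame matches the fitted feature, so this works
# and can be plotted against X["CrimeRate"] as in the answer.
fitted = crime_model.predict(X[["CrimeRate"]])
```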