Tags: pandas, reshape, sklearn-pandas, numpy-ndarray

How can I select a column from a DataFrame so that it has the shape (n, 1) instead of (n,)?


I am separating two columns of a DataFrame to use as features and labels, respectively:

X = bmi_life_data['BMI']
y = bmi_life_data['Life expectancy']

But I have to reshape the resulting one-dimensional array (shape (n,)) to shape (n, 1) before the regression's fit() method will accept it:

X = X.values.reshape(len(X), 1)
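A minimal sketch of the same reshape, assuming a small made-up Series in place of bmi_life_data['BMI']. Passing -1 as one dimension lets NumPy infer it from the array size, so len(X) is not needed:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for bmi_life_data['BMI']; the values are made up.
X = pd.Series([21.1, 25.3, 19.8, 27.4], name="BMI")
print(X.shape)  # (4,) -- a one-dimensional Series

# -1 tells NumPy to infer this dimension from the total number of elements.
X_2d = X.values.reshape(-1, 1)
print(X_2d.shape)  # (4, 1)

# Identical to the explicit form used in the question.
assert np.array_equal(X_2d, X.values.reshape(len(X), 1))
```

In recent pandas versions, X.to_numpy() is the documented alternative to X.values and works the same way here.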

Otherwise I get the error:

bmi_life_model = LinearRegression()
bmi_life_model.fit(X, y)

laos_life_exp = bmi_life_model.predict([[21.07931]])
 
>>>>
ValueError: Found arrays with inconsistent numbers of samples: [  1 163]
None
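A plausible reading of this message: scikit-learn expects X as a 2-D array of shape (n_samples, n_features). Older versions appear to have promoted a 1-D X to a single row (1 sample with 163 features), so X reported 1 sample while y reported 163; current versions raise an "Expected 2D array, got 1D array instead" error. The NumPy-only sketch below, using made-up data, contrasts the two ways of turning a 1-D array into a 2-D one:

```python
import numpy as np

x = np.arange(163, dtype=float)  # made-up stand-in for 163 BMI values

# Adding a leading axis gives ONE row: 1 sample with 163 features.
as_row = np.atleast_2d(x)
print(as_row.shape)  # (1, 163)

# Reshaping to a column gives 163 samples with 1 feature each,
# the layout a single-feature regression expects for X.
as_column = x.reshape(-1, 1)
print(as_column.shape)  # (163, 1)
```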

I could also reshape on the spot when defining X, but that is essentially the same thing. I find this reshaping step tedious, so I suspect there is a better way. I searched, but all I found were explanations of the difference between matrices and one-dimensional arrays: useful information, but it did not answer my question.
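For reference, two other standard ways to get an (n, 1) result from a single column, sketched with a hypothetical stand-in for bmi_life_data (the numbers are invented): Series.to_frame() returns a one-column DataFrame, and indexing the underlying array with np.newaxis adds a second axis.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for bmi_life_data; the numbers are invented.
bmi_life_data = pd.DataFrame({
    "BMI": [21.1, 25.3, 19.8],
    "Life expectancy": [74.2, 71.5, 76.0],
})

# Series.to_frame() wraps the selected Series in a one-column DataFrame.
X_frame = bmi_life_data["BMI"].to_frame()
print(X_frame.shape)  # (3, 1)

# np.newaxis (an alias for None) inserts a new axis of length 1.
X_array = bmi_life_data["BMI"].to_numpy()[:, np.newaxis]
print(X_array.shape)  # (3, 1)
```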


Solution

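  • This should work. Indexing with a list of column names (double brackets) returns a one-column DataFrame of shape (n, 1) instead of a Series of shape (n,):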

    X = bmi_life_data[['BMI']]
    y = bmi_life_data[['Life expectancy']]
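A minimal end-to-end sketch of this approach, assuming scikit-learn's LinearRegression and a small invented dataset in place of the real BMI data. Only X needs the double brackets; y may stay a Series (if y is also double-bracketed, predict() returns a 2-D array instead):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the question's data; values are invented
# and lie exactly on the line life = 96 - BMI.
bmi_life_data = pd.DataFrame({
    "BMI": [20.0, 22.0, 24.0, 26.0],
    "Life expectancy": [76.0, 74.0, 72.0, 70.0],
})

# A list of column names selects a DataFrame, so X keeps its column axis.
X = bmi_life_data[["BMI"]]
print(type(X).__name__, X.shape)  # DataFrame (4, 1)

# y can remain a 1-D Series.
y = bmi_life_data["Life expectancy"]

bmi_life_model = LinearRegression()
bmi_life_model.fit(X, y)

# Predicting with a DataFrame that uses the same column name avoids the
# "X does not have valid feature names" warning in recent scikit-learn.
laos_life_exp = bmi_life_model.predict(pd.DataFrame({"BMI": [21.07931]}))
print(laos_life_exp)
```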