
Can't convert float to int in python DataFrame/Array


I'm new to both Kaggle and Python and can't figure out how to convert this data set. For anyone familiar, I'm trying to reproduce the gender-based solution for the Titanic tutorial.

I have:

submission = pd.DataFrame({'PassengerId' : test_data.PassengerId, 'Survived' : final_prediction})
print(submission.head())

Which gives me:

   PassengerId  Survived
0          892  0.184130
1          893  0.761143
2          894  0.184130
3          895  0.184130
4          896  0.761143

Which I need to convert to:

   PassengerId  Survived
0          892         0
1          893         1
2          894         0
3          895         0
4          896         1

Again, not really knowing Python, I have tried some solutions like:

for x in np.nditer(final_prediction, op_flags=['readwrite']):
    x[...]=(1 if x[...] >= 0.50 else 0)

Which gives me floating-point values (and they still show up in the CSV file as 0.0, 1.0):

   PassengerId  Survived
0          892        0.
1          893        1.

And:

rounded_prediction = np.rint(final_prediction)

Gives me the same (i.e. 0., 1.)

The following:

int_prediction = final_prediction.astype(int)

Gives me all 0's

Any ideas? Thanks!


Solution

  • First of all, keep in mind that you want to use vectorized operations wherever possible, because they speed up your code. So instead of looping through the values, let pandas convert the whole column at once:

    submission['Survived'] = submission['Survived'].astype(int)
    

    Do note that this will truncate values, so in your case you might want to run:

    submission['Survived'] += 0.5

    before performing the above. That ensures values of 0.5 and above become 1 when you convert to int, while values below 0.5 truncate to 0.

    Changing the dtype of a column (the current column types can be found with df.dtypes) is thus done with the astype() method of a Series or DataFrame.

    There might be a more explicit way of saying that the values should be rounded up or down, but for this simple data manipulation it should work ;)
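
One more explicit way to express the 0.5 cutoff, as a minimal sketch: compare the column against the threshold and cast the resulting booleans to int. The `passenger_ids` list and the sample values in `final_prediction` below are made-up stand-ins for the question's `test_data.PassengerId` and model output, not the real Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for test_data.PassengerId and final_prediction.
passenger_ids = [892, 893, 894, 895, 896]
final_prediction = np.array([0.184130, 0.761143, 0.184130, 0.5, 0.761143])

submission = pd.DataFrame({'PassengerId': passenger_ids,
                           'Survived': final_prediction})

# The comparison yields a boolean Series; astype(int) maps False/True to 0/1.
submission['Survived'] = (submission['Survived'] >= 0.5).astype(int)

print(submission.dtypes)  # Survived should now be an integer dtype
print(submission)
```

Because the column now has an integer dtype, writing it out with `submission.to_csv(...)` should give 0 and 1 rather than 0.0 and 1.0. The comparison also makes the treatment of exactly 0.5 explicit, whereas `np.rint` rounds halves to the nearest even value, so it would turn 0.5 into 0.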