python pandas data-cleaning

Different Data Name Output


I want to find the most frequent ages among the diabetes cases in this dataframe. The expected output looks like this:

age
25    14
31    13
41    13
29    13
43    11
22    11
28    10
33    10
38    10
36    10
Name: age, dtype: int64

However, when I run this command:

(data_clean['age'].where(data_clean['class'] == 'Diabetes')).value_counts().head(10)

The output produced is like this:

age
25.0    14
31.0    13
41.0    13
29.0    13
43.0    11
22.0    11
28.0    10
33.0    10
38.0    10
36.0    10
Name: count, dtype: int64

Here's the csv file I used in this case: CSV file link

The resulting index is float, but I expected it to be integer. The output name is also count, while I expected it to be age. Do you have any suggestions? I appreciate any help you can give me. Thank you.


Solution

  • Don't use where: it keeps every row and replaces the non-Diabetes ages with NaN, which forces the column (and therefore the value_counts index) to float. Instead, use boolean indexing to select only the matching rows:

    out = (data_clean
            .loc[data_clean['class'] == 'Diabetes', 'age']
            .value_counts().head(10)
          )