
How to fill missing values in categorical data?


I have a dataset of 20000 employees with the following three columns that contain missing values:

  1. Passing year of College
  2. College specialization
  3. Name of College

Now I have 10000 employees who never went to college. My final aim is to predict their salary.

How can I fill in the missing values in this case?


Solution

  • Missing values can be dealt with in a number of ways; which one to follow depends on the kind of data you have.

    • Deleting the rows with missing values

      Rows in which a large number of columns are null can be dropped. (What exactly counts as "large" depends on the individual use case.)

    • Imputing the missing values with the mean / median

      For numerical columns, you can try replacing the missing values with the mean or median of the column's values.

    • Most frequent values: applicable to your scenario

      This method is suitable for categorical data, which I assume is your case. You can try replacing the missing values in each of the three columns with the most frequently occurring value in that column.
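
The three options above could be sketched with pandas roughly as follows. This is only an illustration under assumptions: the toy data, the column names (`passing_year`, `specialization`, `college`, `salary`) and the `max_nulls` threshold are made up for the example and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are assumptions.
df = pd.DataFrame({
    "passing_year": [2010, np.nan, 2012, 2010, np.nan, np.nan],
    "specialization": ["CS", None, "EE", "CS", None, None],
    "college": ["A", None, "B", "A", None, "A"],
    "salary": [50, 30, 55, 52, 28, 40],
})
college_cols = ["passing_year", "specialization", "college"]

# Option 1: drop rows with too many null columns.
# max_nulls is an arbitrary threshold chosen for this example.
max_nulls = 2
null_counts = df[college_cols].isna().sum(axis=1)
dropped = df[null_counts <= max_nulls]

# Option 2: fill a numerical column with its median.
median_filled = df.copy()
year_median = median_filled["passing_year"].median()
median_filled["passing_year"] = median_filled["passing_year"].fillna(year_median)

# Option 3: fill categorical columns with their most frequent value.
mode_filled = df.copy()
for col in ["specialization", "college"]:
    # mode() returns a Series; take the first entry if there are ties.
    most_frequent = mode_filled[col].mode().iloc[0]
    mode_filled[col] = mode_filled[col].fillna(most_frequent)
```

Each option works on its own copy here so the results can be compared side by side; in practice you would pick one strategy per column.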