Tags: pandas, multiple-columns, kaggle, data-preprocessing

How do I access the data from one column and make changes against another column in Pandas?


I am preprocessing the 911.csv dataset from Kaggle (911 Calls) and found missing values in the zip code (zip) column. After some preprocessing, I worked out which townships (twp) have missing zip values: 6 of them, to be specific.

The idea is to look at the Township column and, for the towns whose zip value is NaN, assign their respective zip code in the zip column.

It sounds simple, but I've been banging my head against the wall over this for a couple of hours now.

Please help. Thank you in advance!


Solution

  • You can group by the twp column and fill the missing zip values with the first non-null zip found in the same township.

    df["zip"] = df["zip"].fillna(df.groupby("twp")["zip"].transform("first"))
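
To see what that one-liner does, here is a minimal sketch on a toy frame. The township names and zip codes are made up for illustration and are not taken from 911.csv. It also shows one caveat: if a township has no known zip at all, the group has nothing to copy from and the value stays NaN. A hand-written lookup (the hypothetical `manual_zip` dict below) is one way to cover that case.

```python
import numpy as np
import pandas as pd

# Toy stand-in for 911.csv; names and zip codes are invented for this example.
df = pd.DataFrame({
    "twp": ["TWP_A", "TWP_A", "TWP_B", "TWP_B", "TWP_C"],
    "zip": [19401.0, np.nan, np.nan, 19446.0, np.nan],
})

# "first" picks the first non-null zip in each township;
# transform broadcasts it back so the result lines up with df row by row.
first_zip = df.groupby("twp")["zip"].transform("first")
df["zip"] = df["zip"].fillna(first_zip)

# TWP_C never had a zip, so it is still NaN after the group-based fill.
still_missing = df.loc[df["zip"].isna(), "twp"].unique().tolist()

# Fallback: a hypothetical hand-made lookup for townships with no known zip.
manual_zip = {"TWP_C": 19000.0}
df["zip"] = df["zip"].fillna(df["twp"].map(manual_zip))

print(df)
```

Only rows where zip is NaN are touched by `fillna`, so existing zip codes are left as they are. Note that a township can span more than one zip code, so copying the first one is an approximation.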