I have a weird column named null
in a dataframe that contains some missing values from other columns. One column is lat-lon coordinates named location
, the other is an integer representing a target variable named level
. In some but not all of the cases where location
or level
have missing values, the values that should be there are in this null
column. Here's an example df:
pd.DataFrame(
{'null': {0: '43.70477575,-72.28844073', 1: '2', 2: '43.70637091,-72.28704334', 3: '4', 4: '3'},
'location': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
'level': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}
}
)
I need to be able to filter the null
column according to whether the value is an integer or a string, and then based on that fill the missing value in the appropriate column with the appropriate value. I've tried using .apply()
with a lambda funcation as well as .match()
, .contains()
and in
inside of a for loop with no luck so far.
The easiest, if not the simplest approach, is to simply fill all the missing values in df.location
and df.level
with the values in df.null
, then create a boolean filter with regex to return innappropriate/misassigned values in df.location
and df.level
to np.nan
.
df = pd.DataFrame(
{'null': {0: '43.70477575,-72.28844073', 1: '2', 2: '43.70637091,-72.28704334', 3: '4', 4: '3'},
'location': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
'level': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}
}
)
for col in ['location', 'level']:
df[col].fillna(
value = stress.null,
inplace = True
)
Now we'll use string expressions to correct the mis-assigned values.
# Converting columns to type str so string methods work
df = df.astype(str)
# Using regex to change values that don't belong in column to NaN
regex = '[,]'
df.loc[df.level.str.contains(regex), 'level'] = np.nan
regex = '^\d\.?0?$'
df.loc[df.location.str.contains(regex), 'location'] = np.nan
# Returning `df.level` to float datatype (str is the correct
# datatype for `df.location`
df.level.astype(float)
Here's the output:
pd.DataFrame(
{'null': {0: '43.70477575,-72.28844073', 1: '2', 2: '43.70637091,-72.28704334', 3: '4', 4: '3'},
'location': {0: '43.70477575,-72.28844073', 1: nan, 2: '43.70637091,-72.28704334', 3: nan, 4: nan},
'level': {0: nan, 1: '2', 2: nan, 3: '4', 4: '3'}
}
)