Data standardization of features that have lt/gt values mixed with absolute values

One of the datasets I am dealing with has a few features that contain lt/gt (less-than/greater-than) values alongside absolute values. See the example below:

>>> import pandas as pd
>>> df = pd.DataFrame(['<10', '23', '34', '22', '>90', '42'], columns=['foo'])
>>> df
   foo
0  <10
1   23
2   34
3   22
4  >90
5   42

Note: foo is a percentage, i.e. 0 <= foo <= 100.

How should such data be transformed so that regression models can be run on it?

Solution

One thing you could do is replace values <10 with the midpoint of their possible range, 5 (halfway between 0 and 10). Similarly, replace values >90 with 95 (halfway between 90 and 100).

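Then add two extra indicator columns that flag which rows were originally <10 or >90, so the model can tell imputed values apart from measured ones: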

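import pandas as pd

df = pd.DataFrame(['<10', '23', '34', '22', '>90', '42'], columns=['foo'])
# Indicator columns (0/1) flagging the '<10' and '>90' rows
dummies = pd.get_dummies(df, columns=['foo'], dtype=int)[['foo_<10', 'foo_>90']]
# Substitute the midpoints, then convert the string column to numbers
df['foo'] = pd.to_numeric(df['foo'].replace({'<10': '5', '>90': '95'}))
df = pd.concat([df, dummies], axis=1)
df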

This will give you

  foo  foo_<10  foo_>90
0   5        1        0
1  23        0        0
2  34        0        0
3  22        0        0
4  95        0        1
5  42        0        0
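
If the thresholds differ between columns (say '<5' in one feature and '>80' in another), the same idea can be written without hard-coding 5 and 95. The following is only a rough sketch: the helper name expand_censored, the _lt/_gt column suffixes, and the default bounds of 0 and 100 are my own assumptions for illustration, not something pandas provides.

```python
import re

import pandas as pd


def expand_censored(series, lower=0.0, upper=100.0):
    """Turn strings such as '<10', '23', '>90' into a numeric column
    plus two 0/1 indicator columns.

    `lower` and `upper` are the assumed bounds of the variable
    (0 and 100 for a percentage). A '<x' value becomes the midpoint of
    [lower, x]; a '>x' value becomes the midpoint of [x, upper].
    """
    pattern = re.compile(r'^\s*([<>]?)\s*(\d+(?:\.\d+)?)\s*$')
    values, below, above = [], [], []
    for raw in series.astype(str):
        match = pattern.match(raw)
        if match is None:
            raise ValueError(f'unrecognised value: {raw!r}')
        sign, number = match.group(1), float(match.group(2))
        if sign == '<':
            number = (lower + number) / 2
        elif sign == '>':
            number = (number + upper) / 2
        values.append(number)
        below.append(int(sign == '<'))
        above.append(int(sign == '>'))
    name = series.name
    return pd.DataFrame(
        {name: values, f'{name}_lt': below, f'{name}_gt': above},
        index=series.index,
    )


df = pd.DataFrame(['<10', '23', '34', '22', '>90', '42'], columns=['foo'])
expanded = expand_censored(df['foo'])
```

Here expanded has a float foo column (5.0 and 95.0 in place of '<10' and '>90') and the two flag columns foo_lt and foo_gt, so it can be passed straight to a regression model. Raising on unrecognised strings is a deliberate choice, so that unexpected entries are not silently turned into NaN.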