I have a column (Column1) in a DataFrame that contains work-center names, each followed by the values that belong to that work center. I need to fill an empty column (Column2) with the name of the work center each row belongs to. Example:
Input:
Column1: | Column2: |
---|---|
WC1 | NaN |
123 | NaN |
456 | NaN |
789 | NaN |
abc | NaN |
fgh | NaN |
WC2 | NaN |
123 | NaN |
abc | NaN |
123 | NaN |
Output:
Column1: | Column2: |
---|---|
WC1 | WC1 |
123 | WC1 |
456 | WC1 |
789 | WC1 |
abc | WC1 |
fgh | WC1 |
WC2 | WC2 |
123 | WC2 |
abc | WC2 |
123 | WC2 |
I tried many things and read the famous "DON'T" answer (https://stackoverflow.com/a/55557758), but I still can't work out what I should use. I want a loop that goes through every row one by one and assigns WC1 to the values below it, and when the loop reaches WC2 it should stop assigning WC1 and start assigning WC2.
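For reference, the row-by-row loop described in the question could be sketched as below. It assumes that every work-center label starts with "WC" and that the values are stored as strings. It produces the expected output, but on large frames the vectorized where/ffill approach in the answer is usually much faster.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Column1': ['WC1', '123', '456', '789', 'abc', 'fgh',
                               'WC2', '123', 'abc', '123']})

# Remember the most recent work-center name and assign it to every row
current = None
names = []
for value in df['Column1']:
    # Assumption: work-center labels start with "WC"
    if str(value).startswith('WC'):
        current = value
    names.append(current)

df['Column2'] = names
print(df)
```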
Use Series.where to keep only the rows whose value contains WC (tested with Series.str.contains) and set all other rows to NaN, then forward-fill the missing values with Series.ffill:
df['Column2'] = df['Column1'].where(df['Column1'].str.contains('WC')).ffill()
print (df)
Column1 Column2
0 WC1 WC1
1 123 WC1
2 456 WC1
3 789 WC1
4 abc WC1
5 fgh WC1
6 WC2 WC2
7 123 WC2
8 abc WC2
9 123 WC2
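A self-contained version of the answer's code, including the sample data, is sketched below. It swaps str.contains for str.startswith, under the assumption that work-center names always begin with "WC"; this avoids treating a value that merely contains "WC" in the middle as a work center. If that assumption does not hold for your data, keep str.contains as in the answer.

```python
import pandas as pd

# Sample data from the question, with every value stored as a string
df = pd.DataFrame({'Column1': ['WC1', '123', '456', '789', 'abc', 'fgh',
                               'WC2', '123', 'abc', '123']})

# Assumption: work-center names always start with "WC";
# na=False treats missing cells as "not a work center"
is_wc = df['Column1'].str.startswith('WC', na=False)

# Keep work-center names, set other rows to NaN, then forward fill
df['Column2'] = df['Column1'].where(is_wc).ffill()
print(df)
```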
How it works:
print (df['Column1'].str.contains('WC'))
0 True
1 False
2 False
3 False
4 False
5 False
6 True
7 False
8 False
9 False
Name: Column1, dtype: bool
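One caveat for this step: if the numeric codes are stored as real numbers rather than strings (for example, if they were parsed as numbers when the data was loaded), str.contains returns NaN for those non-string cells, so the mask is no longer purely boolean. Passing na=False marks them as False instead. A sketch with a hypothetical mixed-type version of the sample:

```python
import pandas as pd

# Hypothetical variant of the sample where the codes are integers, not strings
df = pd.DataFrame({'Column1': ['WC1', 123, 456, 789, 'abc', 'fgh',
                               'WC2', 123, 'abc', 123]})

# Without na=False the integer cells would produce NaN in the mask;
# with na=False they become False, giving a clean True/False mask
mask = df['Column1'].str.contains('WC', na=False)
df['Column2'] = df['Column1'].where(mask).ffill()
print(mask.tolist())
print(df)
```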
print (df['Column1'].where(df['Column1'].str.contains('WC')))
0 WC1
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 WC2
7 NaN
8 NaN
9 NaN
Name: Column1, dtype: object
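Series.where keeps a value where the condition is True and puts NaN where it is False. Series.mask does the opposite, so where(cond) and mask(~cond) give the same result. The short sketch below, on a small hypothetical Series, checks that equivalence, which may help if one reads more naturally than the other.

```python
import pandas as pd

# Small hypothetical Series in the same shape as Column1
s = pd.Series(['WC1', '123', '456', 'WC2', 'abc'])
is_wc = s.str.contains('WC')

# where: keep rows where is_wc is True, NaN elsewhere
kept_where = s.where(is_wc)
# mask: replace rows where the condition is True, so negate it
kept_mask = s.mask(~is_wc)

# Series.equals treats NaN in the same position as equal
print(kept_where.equals(kept_mask))
```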
print (df['Column1'].where(df['Column1'].str.contains('WC')).ffill())
0 WC1
1 WC1
2 WC1
3 WC1
4 WC1
5 WC1
6 WC2
7 WC2
8 WC2
9 WC2
Name: Column1, dtype: object
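Note that ffill only propagates values downwards, so any rows that appear before the first work-center name stay NaN. The sketch below uses hypothetical data that starts with two such rows; filling them with a placeholder label afterwards is one option, and the label "no work center" is only an illustrative assumption.

```python
import pandas as pd

# Hypothetical data where the first rows come before any work-center name
s = pd.Series(['xyz', '123', 'WC1', '456', 'WC2', 'abc'])

filled = s.where(s.str.contains('WC')).ffill()
# Nothing precedes the first two rows, so ffill leaves them as NaN
print(filled.tolist())

# Hypothetical placeholder for rows without a work center
filled = filled.fillna('no work center')
print(filled.tolist())
```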