Search code examples
pythonpandasexploratory-data-analysis

How to fill string missing values in specific column with condition based on row after in python?


I'd like to fill string missing values in one column with condition based on row after in python. enter image description here

The missing values should have different value with the row after (if we see it with timely perspective), so it should be connect -> disconnect -> connect and so on. So in this case here I would like to have the Monitoring Desc missing values to be filled with "disconnect", as the row after is "connect".

I've tried: df['Monitoring Desc'] = df['Monitoring Desc'].fillna(np.where(df['Monitoring Desc'].shift(1) == "ws_connect", "ws_disconnect", "ws_connect"))

but showed error message: TypeError: "value" parameter must be a scalar, dict or Series, but you passed a "ndarray"

I'm not do this kind of analysis in python, so I can't think of anything else. Maybe there is something to do with loop or function to fill these missing values? Any help would be appreciated.


Solution

  • Code

    plz provide input as text table not image.

    mapping reverse and ffill

    m = {'ws_connect':'ws_disconnect', 'ws_disconnect':'ws_connect'}
    df['Monitoring Desc'] = df['Monitoring Desc'].fillna(df['Monitoring Desc'].map(m).ffill())
    

    df

            Monitoring Desc datetime
    3956    ws_disconnect   2023-11-01 21:00
    3955    ws_disconnect   2023-11-01 20:00
    1336    ws_disconnect   2023-11-01 19:58
    3954    ws_connect      2023-11-01 19:00
    1363    ws_connect      2023-11-01 18:00
    1364    ws_disconnect   2023-11-01 18:00
    1365    ws_connect      2023-11-01 18:00
    3953    ws_disconnect   2023-11-01 18:00
    3952    ws_disconnect   2023-11-01 17:00
    3951    ws_disconnect   2023-11-01 16:00
    3950    ws_disconnect   2023-11-01 15:00
    3949    ws_disconnect   2023-11-01 14:00
    

    Example Code

    import pandas as pd
    data1 = {'Monitoring Desc': {3956: 'ws_disconnect', 3955: 'ws_disconnect', 1336: 'ws_disconnect', 
                                 3954: 'ws_connect', 1363: 'ws_connect', 1364: 'ws_disconnect', 
                                 1365: 'ws_connect', 3953: 'ws_disconnect', 3952: 'ws_disconnect', 
                                 3951: 'ws_disconnect', 3950: 'ws_disconnect', 3949: 'ws_disconnect'}, 
             'datetime': {3956: '2023-11-01 21:00', 3955: '2023-11-01 20:00', 1336: '2023-11-01 19:58', 
                          3954: '2023-11-01 19:00', 1363: '2023-11-01 18:00', 1364: '2023-11-01 18:00', 
                          1365: '2023-11-01 18:00', 3953: '2023-11-01 18:00', 3952: '2023-11-01 17:00', 
                          3951: '2023-11-01 16:00', 3950: '2023-11-01 15:00', 3949: '2023-11-01 14:00'}}
    df = pd.DataFrame(data1)