python pandas time-series dataframe nan

Pandas: Remove NaN only at beginning and end of dataframe


I've got a pandas DataFrame that looks like this:

       sum
1948   NaN
1949   NaN
1950     5
1951     3
1952   NaN
1953     4
1954     8
1955   NaN

and I would like to cut off the NaNs at the beginning and at the end ONLY, i.e. only the rows from 1950 to 1954 should remain, including the NaN at 1952. I already tried .isnull() and dropna(), but I couldn't find a proper solution. Can anyone help?


Solution

  • Use the built-in first_valid_index and last_valid_index methods, which are designed specifically for this, and slice your df with the labels they return:

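    In [5]:

    # labels of the first and last rows that hold a non-NaN value
    first_idx = df.first_valid_index()
    last_idx = df.last_valid_index()
    print(first_idx, last_idx)
    # .loc slices by label and includes both endpoints
    df.loc[first_idx:last_idx]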
    1950 1954
    Out[5]:
          sum
    1950    5
    1951    3
    1952  NaN
    1953    4
    1954    8
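
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question and wraps the slicing in a small helper. The name trim_nan_edges is hypothetical (it is not part of pandas), and the empty-frame return for the all-NaN case is an assumption about what you would want there.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (values copied from the post).
df = pd.DataFrame(
    {"sum": [np.nan, np.nan, 5, 3, np.nan, 4, 8, np.nan]},
    index=range(1948, 1956),
)


def trim_nan_edges(frame):
    """Hypothetical helper: drop leading/trailing NaN rows, keep interior NaNs."""
    first = frame.first_valid_index()
    last = frame.last_valid_index()
    if first is None:
        # No valid value anywhere. Slicing with None:None would return the
        # whole frame, so return an empty frame instead (an assumption).
        return frame.iloc[0:0]
    # .loc slicing is label-based and includes both endpoints.
    return frame.loc[first:last]


trimmed = trim_nan_edges(df)
print(trimmed)
```

Note that both methods return None when there is no valid value at all, which is why the helper checks for it before slicing. The interior NaN at 1952 is kept because only the two boundary labels are used.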