python pandas jupyter-notebook user-warning

UserWarning: Boolean Series key will be reindexed to match DataFrame index


This single statement produces several warnings at once:

    Internaldfdeny=pd.DataFrame({'Count':Internaldf[Internaldf['Status']=='deny'][Internaldf['SrcIP']!="NA"][Internaldf['DstIP']!="NA"][Internaldf['TimeStamp']-Internaldf['TimeStamp'].iloc[0]<pd.tslib.Timedelta(minutes=30)].groupby(['DstPort','SrcIP']).size()}).reset_index().pivot_table('Count',['DstPort'],'SrcIP').fillna(0).to_sparse(fill_value=0)

The warnings are:

    /home/lubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
      """Entry point for launching an IPython kernel.
    /home/lubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
      """Entry point for launching an IPython kernel.
    /home/lubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: pandas.tslib is deprecated and will be removed in a future version. You can access Timedelta as pandas.Timedelta
      """Entry point for launching an IPython kernel.
    /home/lubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
      """Entry point for launching an IPython kernel.

I couldn't find any other way to pivot the table.

I also tried it without to_sparse(0), but the warnings still appear. Is this warning important? I have been ignoring it so far. I'm using Jupyter Notebook with Python 3.6, installed through Anaconda, in case that is relevant.
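The UserWarning can be reproduced on a tiny frame. A minimal sketch, assuming a hypothetical DataFrame df with made-up values (not the question's data): the chained version filters an already-reduced frame with a mask built on the full index and triggers the warning, while combining the masks with & and filtering once does not.

```python
import warnings

import pandas as pd

# hypothetical toy data with a default RangeIndex 0..3
df = pd.DataFrame({'Status': ['deny', 'close', 'deny', 'deny'],
                   'SrcIP': ['10.0.0.1', '10.0.0.2', 'NA', '10.0.0.4']})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # after the first filter only rows 0, 2, 3 remain, but the second mask
    # still has labels 0..3, so pandas reindexes it and emits the UserWarning
    chained = df[df['Status'] == 'deny'][df['SrcIP'] != 'NA']

with warnings.catch_warnings(record=True) as caught_combined:
    warnings.simplefilter('always')
    # both masks share df's index, so no reindexing is needed
    combined = df[(df['Status'] == 'deny') & (df['SrcIP'] != 'NA')]

print([str(w.message) for w in caught])
print(chained.equals(combined))
```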

Edit:

Internaldf.head() shows:

                   TimeStamp          SrcIP          DstIP  DstPort Status
0 2018-03-31 03:48:13.731929  192.168.52.43  166.62.28.228       80  close
1 2018-03-31 03:48:13.749007  10.208.23.136    96.45.33.73     8888   deny
2 2018-03-31 03:48:13.799235    10.208.2.56   14.142.64.16     8081   deny
3 2018-03-31 03:48:13.799235  10.208.35.193  13.75.119.102      443  close
4 2018-03-31 03:48:13.799235    10.208.2.70   10.208.3.255      137   deny
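To try the solution without the full data set, a frame holding just these five rows can be built by hand. A minimal sketch, assuming TimeStamp is datetime64, the IP columns are strings and DstPort is an integer (the question does not show the dtypes):

```python
import pandas as pd

# the five rows printed by Internaldf.head(); the real frame is larger
Internaldf = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2018-03-31 03:48:13.731929',
                                 '2018-03-31 03:48:13.749007',
                                 '2018-03-31 03:48:13.799235',
                                 '2018-03-31 03:48:13.799235',
                                 '2018-03-31 03:48:13.799235']),
    'SrcIP': ['192.168.52.43', '10.208.23.136', '10.208.2.56',
              '10.208.35.193', '10.208.2.70'],
    'DstIP': ['166.62.28.228', '96.45.33.73', '14.142.64.16',
              '13.75.119.102', '10.208.3.255'],
    'DstPort': [80, 8888, 8081, 443, 137],
    'Status': ['close', 'deny', 'deny', 'close', 'deny'],
})
print(Internaldf.dtypes)
```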

Solution

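  • The warning comes from chaining boolean indexers: each [...] after the first filters an already-reduced DataFrame with a mask built on the full Internaldf, so pandas has to reindex the mask to the smaller index (the selected rows are still correct, because alignment is by label) and warns about it. Build all masks on Internaldf, combine them with & and filter once; also use pd.Timedelta instead of the deprecated pd.tslib.Timedelta. I believe you need: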

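    import pandas as pd

    # boolean masks, all aligned to the full index of Internaldf
    m1 = Internaldf['Status'] == 'deny'
    m2 = Internaldf['SrcIP'] != "NA"
    # to exclude real missing values (NaN) instead of the string "NA":
    #m2 = Internaldf['SrcIP'].notnull()
    m3 = Internaldf['DstIP'] != "NA"
    # to exclude real missing values (NaN) instead of the string "NA":
    #m3 = Internaldf['DstIP'].notnull()
    m4 = (Internaldf['TimeStamp'] - Internaldf['TimeStamp'].iloc[0]) < pd.Timedelta(minutes=30)

    # combine conditions with & (AND) or | (OR) and filter once;
    # reset_index(name='Count') turns the size() Series into a 'Count' column
    df = Internaldf[m1 & m2 & m3 & m4].groupby(['DstPort', 'SrcIP']).size().reset_index(name='Count')

    # note: DataFrame.to_sparse was removed in pandas 1.0; there, use
    # .astype(pd.SparseDtype('float', 0)) instead
    Internaldfdeny = df.pivot_table('Count', 'DstPort', 'SrcIP').fillna(0).to_sparse(fill_value=0)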
    print (Internaldfdeny)
    SrcIP    10.208.2.56  10.208.2.70  10.208.23.136
    DstPort                                         
    137              0.0          1.0            0.0
    8081             1.0          0.0            0.0
    8888             0.0          0.0            1.0
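Since the question asks for another way to pivot: pd.crosstab counts each (DstPort, SrcIP) pair directly and fills absent pairs with 0, which replaces the groupby/size/pivot_table/fillna chain. A minimal sketch, assuming the five sample rows from Internaldf.head() (rebuilt here so the block runs on its own); the counts come out as integers rather than floats:

```python
import pandas as pd

# sample rows from Internaldf.head() in the question
Internaldf = pd.DataFrame({
    'TimeStamp': pd.to_datetime(['2018-03-31 03:48:13.731929',
                                 '2018-03-31 03:48:13.749007',
                                 '2018-03-31 03:48:13.799235',
                                 '2018-03-31 03:48:13.799235',
                                 '2018-03-31 03:48:13.799235']),
    'SrcIP': ['192.168.52.43', '10.208.23.136', '10.208.2.56',
              '10.208.35.193', '10.208.2.70'],
    'DstIP': ['166.62.28.228', '96.45.33.73', '14.142.64.16',
              '13.75.119.102', '10.208.3.255'],
    'DstPort': [80, 8888, 8081, 443, 137],
    'Status': ['close', 'deny', 'deny', 'close', 'deny'],
})

# same masks as above, combined once so no reindexing warning occurs
m1 = Internaldf['Status'] == 'deny'
m2 = Internaldf['SrcIP'] != 'NA'
m3 = Internaldf['DstIP'] != 'NA'
m4 = (Internaldf['TimeStamp'] - Internaldf['TimeStamp'].iloc[0]) < pd.Timedelta(minutes=30)
denied = Internaldf[m1 & m2 & m3 & m4]

# frequency table: rows = DstPort, columns = SrcIP, missing pairs = 0
counts = pd.crosstab(denied['DstPort'], denied['SrcIP'])
print(counts)
```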