Search code examples
pythonpandasdataframereindex

Error in slicing pandas DataFrame based on Missing column names


I have a pandas dataframe with multiple indexes and columns I want to slice this dataframe based on some column names, but sometimes the given column names are not in the dataframe. Pandas raises a warning to use .reindex instead of .loc But I found strange results. To clarify, let us load the dataFrame

import pandas as pd
d2 = pd.read_csv('https://docs.google.com/uc?id=1Ufx6pvnSC6zQdTAj05ObmV027fA4-Mr3&export=download', index_col=[0,1])
d2.head(3)

the result is:

..............................................
:          :      : ind475 : ind476 : ind456 :
:..........:......:........:........:........:
: Country  : Year :        :        :        :
: Argentin : 1966 :   6.15 :   7.77 : NaN    :
:          : 1967 :   8.33 :   9.81 : NaN    :
:          : 1968 :   9.19 :   10.2 : NaN    :
:..........:......:........:........:........:

If we slice using existing columns, then no problem:

indicators_list = ['ind475', 'ind456']
idx = pd.IndexSlice
d3 = d2.loc[idx[:,:], idx[indicators_list]]
d3.dropna(axis=0, how='all').dropna(axis=1, how='all').shape

Out>> (10006,2)

But if we slice with one or more missing columns, an Error is raised, but it works

indicators_list = ['ind475', 'ind179']
d4 = d2.loc[idx[:,:], idx[indicators_list]]
d4.dropna(axis=0, how='all').dropna(axis=1, how='all').shape

Out>> (2672, 1) with a red Warning

FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_nested_tuple(tup)

I tried using reindex as suggested by the warning and as shown in this post, but the result is none!!

indicators_list = ['ind475', 'ind179']
d5 = d2.reindex(columns=[indicators_list])
d5.dropna(axis=0, how='all').dropna(axis=1, how='all').shape

Out:>> (0, 0)

How can I slice and get the proper size without warnings or errors?


Solution

  • I believe you need filter columns names with isin (and then if necessary remove NaNs columns):

    indicators_list = ['ind475', 'ind179']
    print (df2.loc[:, df2.columns.isin(indicators_list)])
    

    Or:

    print (df2[df2.columns[df2.columns.isin(indicators_list)]])
    

    If working with multiindex use get_level_values:

    print (df2.loc[:, df2.columns.get_level_values(0).isin(indicators_list)])