Tags: python, pandas, dataframe, subset, indices

Subset a pandas DataFrame using a concatenation of column index slices


I have a large dataframe that I am trying to subset using only column indices. I am using the following code:

df = df.ix[:, [3,21:28,30:34,36:57,61:64,67:]]

The code is fairly self-explanatory: I am trying to subset the df by keeping column 3, columns 21 through 28, and so on. However, I am getting the following error:

  File "<ipython-input-44-3108b602b220>", line 1
  df = df.ix[:, [3,21:28,30:34,36:57,61:64,67:]]
                     ^
  SyntaxError: invalid syntax

What am I missing?


Solution

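  • Slice syntax such as 21:28 is only valid directly inside an indexing expression, not inside a list literal, which is why the original line raises a SyntaxError. Use numpy.r_[...], which accepts slices and concatenates them into a single array of integer positions, and pass the result to .iloc (the .ix indexer is deprecated and was removed in pandas 1.0). The open-ended slice 67: needs an explicit stop, here df.shape[1]: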

    import numpy as np

    # Select columns by integer position; each slice excludes its stop value.
    df = df.iloc[:, np.r_[3, 21:28, 30:34, 36:57, 61:64, 67:df.shape[1]]]
    

    Demo:

    In [39]: df = pd.DataFrame(np.random.randint(5, size=(2, 100)))
    
    In [40]: df
    Out[40]:
       0   1   2   3   4   5   6   7   8   9  ...  90  91  92  93  94  95  96  97  98  99
    0   3   1   0   3   2   4   1   2   1   3 ...   2   1   4   2   1   2   1   3   3   4
    1   0   2   4   1   1   1   0   0   3   4 ...   4   4   0   3   2   3   0   2   0   1
    
    [2 rows x 100 columns]
    
    In [41]: df.iloc[:, np.r_[3,21:28,30:34,36:57,61:64,67:df.shape[1]]]
    Out[41]:
       3   21  22  23  24  25  26  27  30  31 ...  90  91  92  93  94  95  96  97  98  99
    0   3   4   1   2   0   3   0   3   2   2 ...   2   1   4   2   1   2   1   3   3   4
    1   1   1   0   2   1   4   4   4   1   3 ...   4   4   0   3   2   3   0   2   0   1
    
    [2 rows x 69 columns]
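
To make the mechanics of the approach above easier to check, here is a minimal, self-contained sketch. The frame contents (np.arange reshaped to 2 x 100) and the names n_cols, cols, cols_plain and subset are assumptions made for illustration only; they are not part of the original answer. The sketch also shows an equivalent selection built from plain Python ranges, for cases where np.r_ is not wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 100 frame with deterministic values; the default column
# labels are 0..99, so labels and integer positions coincide here.
df = pd.DataFrame(np.arange(200).reshape(2, 100))

n_cols = df.shape[1]

# np.r_ expands every slice into its integers and concatenates the pieces
# into one flat array of positions. Each slice excludes its stop value.
cols = np.r_[3, 21:28, 30:34, 36:57, 61:64, 67:n_cols]

# .iloc selects purely by integer position.
subset = df.iloc[:, cols]

# The same positions written with plain ranges and list unpacking.
cols_plain = [3, *range(21, 28), *range(30, 34), *range(36, 57),
              *range(61, 64), *range(67, n_cols)]

print(subset.shape)                 # (2, 69)
print(cols.tolist() == cols_plain)  # True
```

Both forms produce the same 69 positions (1 + 7 + 4 + 21 + 3 + 33), which matches the column count in the demo output above.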