python, pandas, numpy-slicing

Slice Non-Contiguous and Contiguous Columns in Pandas to the Last Column in DataFrame


I am fairly new to Python and want to select non-contiguous columns in pandas, but can't seem to figure it out. I know that in R it could be done with something like df[:, c(1, 3:)] to select column 1 and columns 3 through the end when indexing. I just want to know how that is done in Python, using a general approach that works across datasets with differing numbers of columns.

Say I generate some data like below:

import numpy as np
import pandas as pd

# generate integer and category/hierarchy data
dataset = pd.DataFrame({"Group": np.random.choice(range(1, 5), 100, replace=True),
                        "y": np.random.choice(range(1, 6), 100, replace=True),
                        "X1": np.random.choice(range(1, 6), 100, replace=True),
                        "X2": np.random.choice(range(1, 6), 100, replace=True),
                        "X3": np.random.choice(range(1, 6), 100, replace=True),
                        "X4": np.random.choice(range(1, 6), 100, replace=True),
                        "X5": np.random.choice(range(1, 6), 100, replace=True)
                        })
dataset.head()

I know I can select columns 0 and 1 (Group and y) with dataset.iloc[:, np.r_[0, 1]], and I can also select Group together with X1 through X5 with dataset.iloc[:, np.r_[0, 2:7]]:

     Group        X1        X2        X3   X4   X5
0        2  3.000000  4.000000  5.000000  4.0  2.0
1        2  4.000000  2.000000  2.000000  5.0  3.0
2        1  5.000000  1.000000  3.000000  5.0  1.0
3        4  5.000000  2.986855  2.000000  3.0  4.0
4        1  1.000000  3.000000  5.000000  4.0  1.0
...    ...       ...       ...       ...  ...  ...
95       1  3.000000  3.000000  2.000000  5.0  3.0
96       4  2.964054  4.000000  5.000000  1.0  5.0
97       2  4.000000  3.000000  2.863587  2.0  5.0
98       1  3.000000  3.000000  4.000000  3.0  2.0
99       4  5.000000  2.692210  3.000000  3.0  1.0
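
For reference, here is a small sketch of what np.r_ hands to iloc in the two calls above: it expands scalars and slices into one flat array of integer positions. The variable names idx_first_two and idx_skip_one are only illustrative.

```python
import numpy as np

# np.r_ concatenates scalars and slices into a single flat integer array,
# which iloc then treats as a list of column positions.
idx_first_two = np.r_[0, 1]
idx_skip_one = np.r_[0, 2:7]

print(idx_first_two)  # [0 1]
print(idx_skip_one)   # [0 2 3 4 5 6]
```

The stop value 7 is hard-coded here, which is exactly the part that does not carry over to a dataset with a different number of columns.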

My question is: is there a more general way to select everything from column 2 through the last column using np.r_, as can be done in R with df[:, c(1, 3:)]?


Solution

  • With numpy:

    dataset.iloc[:, np.r_[0, 2:dataset.shape[1]]]

  • With pandas:

    dataset[[dataset.columns[0], *dataset.columns[2:]]]
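
As a quick check, the sketch below runs both forms on a frame shaped like the one in the question and confirms they pick the same columns. It assumes a seeded generator (rng) in place of np.random.choice so the run is repeatable, and the names n_cols, by_position and by_label are illustrative rather than part of either library.

```python
import numpy as np
import pandas as pd

# Rebuild a frame with the same column layout as in the question.
rng = np.random.default_rng(0)  # seeded only to make the example repeatable
cols = ["Group", "y", "X1", "X2", "X3", "X4", "X5"]
dataset = pd.DataFrame(rng.integers(1, 6, size=(100, len(cols))), columns=cols)

# Positional form: np.r_ cannot see the frame, so the slice stop
# is taken from the frame's own column count.
n_cols = dataset.shape[1]
by_position = dataset.iloc[:, np.r_[0, 2:n_cols]]

# Label form: slice the column Index, then select by name.
by_label = dataset[[dataset.columns[0], *dataset.columns[2:]]]

print(list(by_position.columns))  # ['Group', 'X1', 'X2', 'X3', 'X4', 'X5']
print(by_position.equals(by_label))  # True
```

Because the stop comes from dataset.shape[1] (or from the open-ended slice of dataset.columns), neither line needs changing when the number of columns differs. The label form assumes column names are unique; with duplicate names, the positional form is the safer choice.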