Tags: python, pandas, loops, subset, levels

Create subsets in a loop according to a column's values in a pandas DataFrame


I have a dataframe from which I want to create subsets in a loop, according to the values of one column.

Here is an example df:

c1        c2      c3
A          1       2
A          2       2
B          0       2
B          1       1

I would like to create subsets in a loop, like so:

First iteration: select all rows in which c1 is A, and only columns c2 and c3. Second iteration: all rows in which c1 is B, and again only columns c2 and c3.

I've tried the following code:

for level in enumerate(df.loc[:,"c1"].unique()):

    df_s = df.loc[df["c1"]==level].iloc[:, 1:len(df.columns)]
    #other actions on the subsetted dataframe

but the subsetting doesn't happen. How can I iterate through the levels of a column?

For instance, in R it would be:

for (le in levels(df$c1)) {
  dfs <- df[df$c1==le, 2:ncol(df)]
}

Thanks


Solution

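  • There is no need for enumerate here: it yields (index, value) tuples, so level ends up as a tuple such as (0, 'A') rather than the value 'A', and the comparison does not select the rows you want. Just loop through the unique values of the c1 column directly: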

    for level in df.c1.unique():
        # keep the rows for this level, then drop the grouping column
        df_s = df.loc[df.c1 == level].drop(columns='c1')
        print(level + ":\n", df_s)
    
    #A:
    #    c2  c3
    #0   1   2
    #1   2   2
    #B:
    #    c2  c3
    #2   0   2
    #3   1   1
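
To make the point about enumerate concrete, here is a minimal sketch. The DataFrame is rebuilt from the question's example, and `pairs` is only an illustrative name. If the position really is needed, unpacking the tuple in the loop header is one way to keep both pieces.

```python
import pandas as pd

# Example data rebuilt from the question
df = pd.DataFrame({"c1": ["A", "A", "B", "B"],
                   "c2": [1, 2, 0, 1],
                   "c3": [2, 2, 2, 1]})

# enumerate yields (position, value) tuples, not the bare values
pairs = list(enumerate(df["c1"].unique()))
print(pairs)

# Unpack the tuple when the position is wanted alongside the value
for i, level in enumerate(df["c1"].unique()):
    df_s = df.loc[df["c1"] == level, ["c2", "c3"]]
    print(i, level, len(df_s))
```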
    

    Most likely, what you actually need is df.groupby('c1').apply(lambda g: ...), which should be more efficient than filtering the whole frame once per level. Here g is the sub-DataFrame containing the rows for a single c1 value.
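
If the per-group work is easier to write as a loop body than as a lambda, iterating over the groupby object directly is another option. This is a minimal sketch, assuming the question's example data; the `subsets` dict is a hypothetical name used only for illustration.

```python
import pandas as pd

# Example data rebuilt from the question
df = pd.DataFrame({"c1": ["A", "A", "B", "B"],
                   "c2": [1, 2, 0, 1],
                   "c3": [2, 2, 2, 1]})

subsets = {}  # hypothetical container: one sub-DataFrame per c1 value

# Grouping by a single column name yields (key, sub-DataFrame) pairs
for level, group in df.groupby("c1"):
    # drop the grouping column so only c2 and c3 remain
    subsets[level] = group.drop(columns="c1")
    print(level + ":\n", subsets[level])
```

Each iteration receives the rows for one c1 value, so any further processing of the subset can go inside the loop body.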