python python-3.x pandas dataframe sklearn-pandas

Choose random rows in a pandas dataframe

I have a dataframe like this:

ID          code
333_c_132   x
333_c_132   n06
333_c_132   n36
333_c_132   n60
333_c_132   n72
333_c_132   n84
333_c_132   n96
333_c_132   n108
333_c_132   n120
999_c_133   x
999_c_133   n06
999_c_133   n12
999_c_133   n24
998_c_134   x
998_c_134   n06
998_c_134   n12
998_c_134   n18
998_c_134   n36
997_c_135   x
997_c_135   n06
997_c_135   n12
997_c_135   n24
997_c_135   n36
996_c_136   x
996_c_136   n06
996_c_136   n12
996_c_136   n18
996_c_136   n24
996_c_136   n36
995_c_137   x

I have to choose one random row from each block of rows that sits between two x entries in the code column (the x rows themselves are not candidates). For example, one possible result is:

333_c_132   n06
999_c_133   n12
998_c_134   n18
997_c_135   n36
996_c_136   n18

How can I achieve this in pandas?

Solution

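We can take the cumsum of the rows where code equals 'x' to create a group key for groupby (each x starts a new group), drop the x rows, and then use sample to draw one row per group: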

s = df[df.code.ne('x')].groupby(df.code.eq('x').cumsum()).apply(lambda x: x.sample(1))
s = s.reset_index(level=0, drop=True)
s
           ID code
1   333_c_132  n06
12  999_c_133  n24
17  998_c_134  n36
20  997_c_135  n12
27  996_c_136  n24
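
For reference, here is a minimal self-contained sketch of the same idea, assuming pandas 1.1 or later, where GroupBy objects have their own sample method, so the apply and reset_index steps are not needed. The shortened dataframe, the names mask, block and picked, and the random_state value are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Shortened stand-in for the question's dataframe (values taken from the post).
df = pd.DataFrame({
    "ID": ["333_c_132"] * 4 + ["999_c_133"] * 3 + ["995_c_137"],
    "code": ["x", "n06", "n36", "n60", "x", "n06", "n12", "x"],
})

# Each 'x' row opens a new block; the running count of 'x' rows labels the block.
block = df["code"].eq("x").cumsum()

# Keep only the non-'x' rows, since the marker rows are not candidates.
mask = df["code"].ne("x")

# Draw one row per block. random_state is set only to make this sketch
# reproducible; omit it to get a different draw on every run.
picked = df[mask].groupby(block[mask]).sample(n=1, random_state=0)
print(picked)
```

The result keeps the original row index, and a block that contains only an x row (such as the trailing 995_c_137 row) simply produces no output row.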