Choose random rows in a pandas DataFrame


I have a dataframe like this:

ID          code
333_c_132   x
333_c_132   n06
333_c_132   n36
333_c_132   n60
333_c_132   n72
333_c_132   n84
333_c_132   n96
333_c_132   n108
333_c_132   n120
999_c_133   x
999_c_133   n06
999_c_133   n12
999_c_133   n24
998_c_134   x
998_c_134   n06
998_c_134   n12
998_c_134   n18
998_c_134   n36
997_c_135   x
997_c_135   n06
997_c_135   n12
997_c_135   n24
997_c_135   n36
996_c_136   x
996_c_136   n06
996_c_136   n12
996_c_136   n18
996_c_136   n24
996_c_136   n36
995_c_137   x

I have to choose one random row from each run of rows that lies between two consecutive x values in the code column. For example, one possible result is:

333_c_132   n06
999_c_133   n12
998_c_134   n18
997_c_135   n36
996_c_136   n18

How can I achieve this in pandas?


Solution

  • We can use cumsum on the x markers to create a subkey for groupby, then use sample to draw one row from each group

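    # Label each block by counting the 'x' markers seen so far, drop the markers, then draw one row per block
    s = df[df.code.ne('x')].groupby(df.code.eq('x').cumsum()).apply(lambda x: x.sample(1))
    # Drop the group-key index level so only the original row labels remain
    s = s.reset_index(level=0, drop=True)
    s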
               ID code
    1   333_c_132  n06
    12  999_c_133  n24
    17  998_c_134  n36
    20  997_c_135  n12
    27  996_c_136  n24
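
The same idea can be sketched as a self-contained script. This is only an illustration: the small frame below is a shortened copy of the question's data, the names is_marker, block and picked are made up for the example, and it assumes pandas 1.1 or later, where groupby objects have their own sample method. The random_state argument is added so the draw is repeatable; leave it out to get a different row on each run.

```python
import pandas as pd

# Shortened copy of the question's data (assumption: same layout, fewer rows)
df = pd.DataFrame({
    "ID": ["333_c_132"] * 4 + ["999_c_133"] * 3 + ["995_c_137"],
    "code": ["x", "n06", "n36", "n60", "x", "n06", "n12", "x"],
})

# True on the 'x' marker rows that open each block
is_marker = df["code"].eq("x")

# Running count of markers: every row gets the number of its block
block = is_marker.cumsum().rename("block")

# Drop the markers, then draw one row per block
picked = df[~is_marker].groupby(block[~is_marker]).sample(n=1, random_state=0)
print(picked)
```

The trailing x row (995_c_137) starts a block with no other rows, so it contributes nothing after the markers are filtered out. Because sample is called on the groupby object directly, the result keeps the original row labels and no reset_index step is needed.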