pandas dataframe random conditional-statements

Create pandas dataframe column with random conditional numbers

I have created the following pandas dataframe.

import pandas as pd
import numpy as np

ds = {'col1' : [1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
      'col2' : [0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1]}

data = pd.DataFrame(data=ds)

which looks like this:

print(data)

    col1  col2
0      1     0
1      1     0
2      1     0
3      1     0
4      1     0
5      1     0
6      1     0
7      1     0
8      1     0
9      1     0
10     1     0
11     1     0
12     1     0
13     1     0
14     2     1
15     2     1
16     2     1
17     2     1
18     2     1
19     2     1
20     2     1
21     2     1
22     2     1
23     2     1
24     2     1
25     2     1
26     2     1
27     2     1

I need to create a new column (called col3) subject to the following conditions:

when col1 = 1, there are 14 records for which col2 = 0. The new column (i.e. col3), needs to have 50% (of exactly those 14 records) of the values equal to col2 (randomly distributed across the 14 records) and the remaining 50% equal to 1.
when col1 = 2, there are 14 records for which col2 = 1. The new column (i.e. col3), needs to have 50% (of exactly those 14 records) of the values equal to col2 (randomly distributed across the 14 records) and the remaining 50% equal to 0.

So, the resulting dataset would look like this (bear in mind that the location - or record - of the values in col3 is randomly assigned):

Does anyone know the python code to produce such dataframe?

Solution

`groupby` + `sample`

# take a sample of 50% from col2 per unique value in col1
data['col3'] = data.groupby('col1')['col2'].sample(frac=.5)

# fill the remaining 50% using a predefined mapping of col1 value
data['col3'] = data['col3'].fillna(data['col1'].map({1: 1, 2: 0}), downcast='infer')

Result

    col1  col2  col3
0      1     0     1
1      1     0     0
2      1     0     0
3      1     0     0
4      1     0     0
5      1     0     0
6      1     0     0
7      1     0     1
8      1     0     0
9      1     0     1
10     1     0     1
11     1     0     1
12     1     0     1
13     1     0     1
14     2     1     1
15     2     1     0
16     2     1     1
17     2     1     0
18     2     1     0
19     2     1     1
20     2     1     1
21     2     1     0
22     2     1     1
23     2     1     0
24     2     1     0
25     2     1     1
26     2     1     1
27     2     1     0

Create pandas dataframe column with random conditional numbers

groupby + sample

`groupby` + `sample`