Pick random values from a second table based on join in Python / Pandas

Suppose I have a Python dataframe:

A
B
C
A
B

...and a second dataframe

A  3
A  2
A  4
B  5
B  2
B  8
B  7
C  1
C  5

I want to join the second dataframe to the first - but for each value in the first frame, the join should be a random selection from the second row of the second dataframe picking only from where the first column is the same value.

So, for example, for the first value A in the first dataframe, I'd look in the second table and it would pick randomly from the values in the 2nd row whose first row value is an A - i.e. randomly select one of 3, 2 or 4. For the second value B, I'd pick randomly from 5,2,8 or 7. The end result I'd simply want a dataframe like:

A  2
B  8
C  1
B  7
A  4

Solution

Let's say this is what you're starting with:

df1

  label
0     A
1     B
2     C
3     A
4     B

df2

  label  value
0     C      5
1     B      8
2     C      1
3     B      2
4     A      3
5     A      4
6     B      5
7     A      2
8     B      7

Option 1: Merge on cumcounted key
One easy way to do this is to shuffle df2, add an incremental key to both dataFrames and then merge:

df3 = df1.assign(key=df1.groupby('label').cumcount())
df4 = (df2.sample(frac=1)
          .reset_index(drop=True)
          .assign(key=lambda d: d.groupby('label').cumcount()))

df3.merge(df4, how='left', on=['label', 'key']).drop('key', 1)

  label  value
0     A      2
1     B      5
2     C      1
3     A      3
4     B      8

note: set np.random.seed for deterministic shuffles

Option 2: Sample groups and concat
Another option is to groupby df2, sample groups and concat

counts = df1['label'].value_counts() 
pd.concat([g.sample(n=counts[k]) for k, g in df2.groupby('label')])

  label  value
7     A      2
5     A      4
3     B      2
6     B      5
2     C      1

caveat here is that ordering isn't preserved.