Populating an even distribution of values across multiple axes?

Basic Example:

# Given params such as:
params = {
    'cols': 8,
    'rows': 4,
    'n': 4
}
# I'd like to produce (or equivalent):
       col0  col1  col2  col3  col4  col5  col6  col7
row_0     0     1     2     3     0     1     2     3
row_1     1     2     3     0     1     2     3     0
row_2     2     3     0     1     2     3     0     1
row_3     3     0     1     2     3     0     1     2

Axis Value Counts:

  • Both axes should have an equal distribution of values:
df.apply(lambda x: x.value_counts(), axis=1)

       0  1  2  3
row_0  2  2  2  2
row_1  2  2  2  2
row_2  2  2  2  2
row_3  2  2  2  2
df.apply(lambda x: x.value_counts())

   col0  col1  col2  col3  col4  col5  col6  col7
0     1     1     1     1     1     1     1     1
1     1     1     1     1     1     1     1     1
2     1     1     1     1     1     1     1     1
3     1     1     1     1     1     1     1     1

My attempt thus far:

import itertools
import numpy as np
import pandas as pd

def create_df(cols, rows, n):
    x = itertools.cycle(list(itertools.permutations(range(n))))
    df = pd.DataFrame(index=range(rows), columns=range(cols))
    df[:] = np.reshape([next(x) for _ in range((rows*cols)//n)], (rows, cols))
    #df = df.T.add_prefix('row_').T
    #df = df.add_prefix('col_')
    return df 

params = {
    'cols': 8,
    'rows': 4,
    'n': 4
}
df = create_df(**params)


   0  1  2  3  4  5  6  7
0  0  1  2  3  0  1  3  2
1  0  2  1  3  0  2  3  1
2  0  3  1  2  0  3  2  1
3  1  0  2  3  1  0  3  2

# Correct on this Axis:
>>> df.apply(lambda x: x.value_counts(), axis=1)
   0  1  2  3
0  2  2  2  2
1  2  2  2  2
2  2  2  2  2
3  2  2  2  2

# Incorrect on this Axis:
>>> df.apply(lambda x: x.value_counts())
     0  1    2    3    4  5    6    7
0  3.0  1  NaN  NaN  3.0  1  NaN  NaN
1  1.0  1  2.0  NaN  1.0  1  NaN  2.0
2  NaN  1  2.0  1.0  NaN  1  1.0  2.0
3  NaN  1  NaN  3.0  NaN  1  3.0  NaN

So, I have the conditions I need on one axis, but not on the other.

How can I update my method/create a method to meet both conditions?
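
The two conditions can also be checked in one call instead of reading the value_counts tables by eye. A minimal sketch, assuming the values are the integers 0..n-1; the helper name is_balanced is hypothetical and not part of pandas:

```python
import numpy as np
import pandas as pd

def is_balanced(df, n):
    """Return True if each of 0..n-1 appears equally often in every row and column."""
    values = df.to_numpy()
    rows, cols = values.shape
    # Per-row and per-column occurrence counts, one column of counts per value.
    row_counts = np.stack([(values == v).sum(axis=1) for v in range(n)], axis=1)
    col_counts = np.stack([(values == v).sum(axis=0) for v in range(n)], axis=1)
    # Every value must appear cols // n times per row and rows // n times per column.
    return bool((row_counts == cols // n).all() and (col_counts == rows // n).all())

# The desired 4 x 8 layout from the question: cell (i, j) holds (i + j) % 4.
target = pd.DataFrame((np.arange(4)[:, None] + np.arange(8)) % 4)
print(is_balanced(target, 4))  # True
```

Applied to the output of create_df above, the same check returns False, because column 0 holds the value 0 three times.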


  • You can tile your input and use a custom roll to shift each row independently:

    c = params['cols']
    r = params['rows']
    n = params['n']
    a = np.arange(n) # or any input
    b = np.tile(a, (r, c//n))  # assumes c is a multiple of n
    # array([[0, 1, 2, 3, 0, 1, 2, 3],
    #        [0, 1, 2, 3, 0, 1, 2, 3],
    #        [0, 1, 2, 3, 0, 1, 2, 3],
    #        [0, 1, 2, 3, 0, 1, 2, 3]])
    idx = np.arange(r)[:, None]
    shift = (np.tile(np.arange(c), (r, 1)) - np.arange(r)[:, None]) % c  # wrap so indices stay in range
    df = pd.DataFrame(b[idx, shift])


       0  1  2  3  4  5  6  7
    0  0  1  2  3  0  1  2  3
    1  3  0  1  2  3  0  1  2
    2  2  3  0  1  2  3  0  1
    3  1  2  3  0  1  2  3  0

    Alternative order:

    idx = np.arange(r)[:, None]
    shift = (np.tile(np.arange(c), (r, 1)) + np.arange(r)[:, None]) % c
    df = pd.DataFrame(b[idx, shift])


       0  1  2  3  4  5  6  7
    0  0  1  2  3  0  1  2  3
    1  1  2  3  0  1  2  3  0
    2  2  3  0  1  2  3  0  1
    3  3  0  1  2  3  0  1  2

    Other alternative: use a custom strided_indexing_roll function.
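
The strided_indexing_roll function is not defined in this answer. One possible shape for it is sketched below. It assumes NumPy 1.20 or later (for sliding_window_view), and the signature (a, shifts) is an assumption, not necessarily that of the original helper:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def strided_indexing_roll(a, shifts):
    """Roll row i of the 2-D array `a` to the right by shifts[i] (negative = left)."""
    a = np.asarray(a)
    shifts = np.asarray(shifts)
    n = a.shape[1]
    # Append all but the last column so every cyclic rotation is a contiguous window.
    a_ext = np.concatenate((a, a[:, :-1]), axis=1)
    # windows[i, j] is row i read cyclically, starting at original column j.
    windows = sliding_window_view(a_ext, n, axis=1)
    # A right roll by s starts reading at column (n - s) % n.
    return windows[np.arange(a.shape[0]), (n - shifts) % n]

b = np.tile(np.arange(4), (4, 2))
# Shift row i left by i, which reproduces the "alternative order" layout.
result = strided_indexing_roll(b, -np.arange(4))
```

The windows are views into a_ext, so only the final fancy-indexing step copies data. Whether this is faster than the index-based approach above depends on the array sizes.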