Tags: python, pandas, python-itertools

pandas: creating toy data using itertools and flattened lists


I am writing a function that should produce toy data for pandas (examples for grouping and MultiIndex). My goal is to generate groups (e.g. representing conditions during experiments) that might be repeated several times. My attempt:

import itertools as it
import numpy as np
import pandas as pd

p = it.product([[4,5,6],[7,8,9]],[1,2,3])
p = list(p)
p

[([4, 5, 6], 1),
 ([4, 5, 6], 2),
 ([4, 5, 6], 3),
 ([7, 8, 9], 1),
 ([7, 8, 9], 2),
 ([7, 8, 9], 3)]

I would like to flatten only the inner list but preserve the structure of the outer list (and get rid of the tuples). My solution is based on this SO post:

from collections.abc import Iterable

def flatten(l):
    for el in l:
        # Recurse into any iterable except strings and bytes
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

lf = list(flatten(p))
np.reshape(lf, (len(p), 4))

array([[4, 5, 6, 1],
       [4, 5, 6, 2],
       [4, 5, 6, 3],
       [7, 8, 9, 1],
       [7, 8, 9, 2],
       [7, 8, 9, 3]])

I have two questions. First, is there a simpler solution? Second, do I need to do all of this if I want to end up with a pandas DataFrame? The DataFrame should look like this:

pd.DataFrame(np.reshape(lf, (len(p), 4)))

    0   1   2   3
0   4   5   6   1
1   4   5   6   2
2   4   5   6   3
3   7   8   9   1
4   7   8   9   2
5   7   8   9   3

Solution

  • Option 1:

    In [249]: pd.DataFrame([np.concatenate(t)
                            for t in it.product([[4,5,6],[7,8,9]],[[1],[2],[3]])])
    Out[249]:
       0  1  2  3
    0  4  5  6  1
    1  4  5  6  2
    2  4  5  6  3
    3  7  8  9  1
    4  7  8  9  2
    5  7  8  9  3
    

  • Option 2: pure pandas solution:

    In [261]: a = pd.DataFrame([[4,5,6],[7,8,9]], columns=list('abc'))
    
    In [262]: b = pd.DataFrame([[1],[2],[3]], columns=['d'])
    
    In [263]: a
    Out[263]:
       a  b  c
    0  4  5  6
    1  7  8  9
    
    In [264]: b
    Out[264]:
       d
    0  1
    1  2
    2  3
    
    In [265]: a.assign(k=0).merge(b.assign(k=0), on='k').drop(columns='k')
    Out[265]:
       a  b  c  d
    0  4  5  6  1
    1  4  5  6  2
    2  4  5  6  3
    3  7  8  9  1
    4  7  8  9  2
    5  7  8  9  3
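
Both options can possibly be shortened. The sketch below assumes Python 3.5 or later (for unpacking inside a list literal) and pandas 1.2 or later (for `how='cross'` in `merge`). The names `groups`, `reps`, `rows`, `df_a` and `df_b` are illustrative and do not come from the question or the answer above.

```python
import itertools as it
import pandas as pd

groups = [[4, 5, 6], [7, 8, 9]]   # one inner list per condition (illustrative name)
reps = [1, 2, 3]                  # repetition labels (illustrative name)

# Variant of Option 1: unpack each inner list next to the scalar,
# so no recursive flatten, np.concatenate or reshape is needed.
rows = [[*g, r] for g, r in it.product(groups, reps)]
df_a = pd.DataFrame(rows)

# Variant of Option 2: a cross join without the helper key column.
# Assumes pandas >= 1.2, where merge accepts how='cross'.
a = pd.DataFrame(groups, columns=list('abc'))
b = pd.DataFrame({'d': reps})
df_b = a.merge(b, how='cross')

print(df_a)
print(df_b)
```

The first variant leaves the default integer column labels, as in the question; the second keeps the named columns `a` to `d`, which may be more convenient for the grouping examples.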