Repeat a list of cells in a column over an interval in python


I have a dataframe as follows:

Code_1 Code_2
A C1
B C2
C C3
D C4
E C5
NaN C6
NaN C7
NaN C8
NaN C9
NaN C10

I then modified the dataframe because I wanted each Code_1 value repeated across the whole Code_2 column. First, I split the columns:

dfa = pd.DataFrame()
dfb = pd.DataFrame()
dfa['Code_1'] = df['Code_1']
dfb['Code_2'] = df['Code_2']
dfa = dfa.dropna()
dfa['times'] = len(dfa)

dfa = dfa.loc[dfa.index.repeat(dfa.times)].reset_index(drop=True)

(Here df is the original dataframe.)

The output then looks something like this (ignoring the "times" column):

Code_1 Code_2
A C1
A C2
A C3
A C4
A C5
A C6
A C7
A C8
A C9
A C10
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN

(and so forth)

But I'd like to have C1 to C10 repeated for each interval of Code_1, like this:

Code_1 Code_2
A C1
A C2
A C3
A C4
A C5
A C6
A C7
A C8
A C9
A C10
B C1
B C2
B C3
B C4
B C5
B C6
B C7
B C8
B C9
B C10

(and so forth)

But I don't know how to repeat the Code_2 sequence for each repeated Code_1 value. Can you help me?

Also, if there is an easier way to do the first part of this code, please let me know.

Thank you in advance!


Solution

  • You can use product from Python's standard-library itertools module to pair every non-null Code_1 value with every Code_2 value.

    from itertools import product
    
    import pandas as pd
    
    # Pair every non-null Code_1 value with every Code_2 value
    new_df = pd.DataFrame(
        product(df['Code_1'].dropna(), df['Code_2']),
        columns=['Code_1', 'Code_2'],
    )
    
       Code_1 Code_2
    0       A     C1
    1       A     C2
    2       A     C3
    3       A     C4
    4       A     C5
    5       A     C6
    6       A     C7
    7       A     C8
    8       A     C9
    9       A    C10
    10      B     C1
    11      B     C2
    12      B     C3
    13      B     C4
    14      B     C5
    15      B     C6
    16      B     C7
    17      B     C8
    18      B     C9
    19      B    C10
    .       .     .
    .       .     .
    .       .     .
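
As an alternative to itertools, the same result can likely be obtained with a cross join in pandas itself. The sketch below is not part of the original answer: it rebuilds the sample dataframe from the question (an assumption about how df was created) and relies on merge with how="cross", which requires pandas 1.2 or newer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question: 5 Code_1 values, 10 Code_2 values.
df = pd.DataFrame({
    "Code_1": ["A", "B", "C", "D", "E"] + [np.nan] * 5,
    "Code_2": [f"C{i}" for i in range(1, 11)],
})

# Keep only the non-null Code_1 values, one per row.
left = df[["Code_1"]].dropna()
# Keep the full Code_2 column.
right = df[["Code_2"]]

# Cross join: each row of `left` is paired with every row of `right`.
# how="cross" is available in pandas 1.2 and later.
new_df = left.merge(right, how="cross")

print(new_df.head(12))
```

This also replaces the manual split, "times" column and index.repeat step from the question, since the cross join produces the repetition of both columns in one call.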