I have a dataframe as follows:
Code_1 | Code_2 |
---|---|
A | C1 |
B | C2 |
C | C3 |
D | C4 |
E | C5 |
NaN | C6 |
NaN | C7 |
NaN | C8 |
NaN | C9 |
NaN | C10 |
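To make the example reproducible, the input table can be built like this. It is a sketch: the values come from the table above, and the empty Code_1 cells are assumed to be `np.nan`.

```python
import numpy as np
import pandas as pd

# 5 Code_1 values padded with NaN, next to 10 Code_2 values
df = pd.DataFrame({
    "Code_1": ["A", "B", "C", "D", "E"] + [np.nan] * 5,
    "Code_2": [f"C{i}" for i in range(1, 11)],
})
print(df)
```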
I then modified the dataframe because I wanted each Code_1 value repeated across the whole Code_2 column. Here is the code. First, I split the columns:
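```python
import pandas as pd

# df is the original dataframe
# Split the two columns into separate frames
dfa = pd.DataFrame()
dfb = pd.DataFrame()
dfa['Code_1'] = df['Code_1']
dfb['Code_2'] = df['Code_2']

# Drop the NaN padding, then repeat each Code_1 row once per Code_2 value
dfa = dfa.dropna()
dfa['times'] = len(dfb)
dfa = dfa.loc[dfa.index.repeat(dfa.times)].reset_index(drop=True)
```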
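A shorter sketch of this first part uses `Series.dropna()` and `Series.repeat()`. The question does not show how the two parts are recombined, so the `pd.concat(..., axis=1)` step below is an assumption. It illustrates where the NaN values come from: the frames are aligned on the index, and `Code_2` only has the labels 0 to 9.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Code_1": ["A", "B", "C", "D", "E"] + [np.nan] * 5,
    "Code_2": [f"C{i}" for i in range(1, 11)],
})

# Drop the NaN padding and repeat each Code_1 value once per Code_2 value,
# then renumber the index 0..49
code_1 = df["Code_1"].dropna().repeat(len(df["Code_2"])).reset_index(drop=True)

# Assumed recombination by index: only rows 0..9 find a matching Code_2 label,
# so rows 10..49 (B, C, D, E) get NaN
out = pd.concat([code_1, df["Code_2"]], axis=1)
print(out.head(12))
```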
The output then looks like this (ignoring the "times" column):
Code_1 | Code_2 |
---|---|
A | C1 |
A | C2 |
A | C3 |
A | C4 |
A | C5 |
A | C6 |
A | C7 |
A | C8 |
A | C9 |
A | C10 |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
(and so forth)
But I'd like C1 to C10 repeated for each value of Code_1, like this:
Code_1 | Code_2 |
---|---|
A | C1 |
A | C2 |
A | C3 |
A | C4 |
A | C5 |
A | C6 |
A | C7 |
A | C8 |
A | C9 |
A | C10 |
B | C1 |
B | C2 |
B | C3 |
B | C4 |
B | C5 |
B | C6 |
B | C7 |
B | C8 |
B | C9 |
B | C10 |
(and so forth)
However, I don't know how to repeat the Code_2 sequence for each repeated Code_1 value. Can you help me?
Also, if there is an easier way to do the first part of this code, please let me know.
Thank you in advance!
You can use `product` from Python's standard-library `itertools` module, which builds every pairing of its inputs (the Cartesian product):
```python
from itertools import product
import pandas as pd
# Pair every non-NaN Code_1 value with every Code_2 value
new_df = pd.DataFrame(product(df.Code_1.dropna(), df.Code_2), columns=['Code_1', 'Code_2'])
```
```
   Code_1 Code_2
0       A     C1
1       A     C2
2       A     C3
3       A     C4
4       A     C5
5       A     C6
6       A     C7
7       A     C8
8       A     C9
9       A    C10
10      B     C1
11      B     C2
12      B     C3
13      B     C4
14      B     C5
15      B     C6
16      B     C7
17      B     C8
18      B     C9
19      B    C10
...
```
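A self-contained check of the `product` approach, next to a vectorized alternative that extends the `repeat` idea from the question: repeat each Code_1 value with `np.repeat` and tile the whole Code_2 sequence with `np.tile`. This is a sketch assuming the example dataframe from the question; the variable names are illustrative.

```python
from itertools import product

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Code_1": ["A", "B", "C", "D", "E"] + [np.nan] * 5,
    "Code_2": [f"C{i}" for i in range(1, 11)],
})
code_1 = df["Code_1"].dropna().to_numpy()
code_2 = df["Code_2"].to_numpy()

# itertools.product: the first argument varies slowest, so all of A's
# pairs come first, then B's, and so on
via_product = pd.DataFrame(list(product(code_1, code_2)),
                           columns=["Code_1", "Code_2"])

# Vectorized alternative: A,A,...,B,B,... (each value len(code_2) times)
# lined up with C1..C10,C1..C10,... (whole sequence len(code_1) times)
via_numpy = pd.DataFrame({
    "Code_1": np.repeat(code_1, len(code_2)),
    "Code_2": np.tile(code_2, len(code_1)),
})

print(via_product.head(12))
print(via_numpy.head(12))
```

On pandas 1.2 or later, `DataFrame.merge(..., how="cross")` is another way to build the same Cartesian product.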