I have searched for similar questions, such as 'equivalent of the R function rep in Python'.
In R, rep can be applied to an array or a dataframe, and you can set the parameter each
to specify whether to repeat every element or to repeat the whole list/dataframe.
But in Python, you have to distinguish between arrays and dataframes.
For an array, np.repeat repeats each element and np.tile repeats the whole array.
import numpy as np
x = ['a', 'b']
np.repeat(x, 2)  # repeat each element twice
Out[85]: array(['a', 'a', 'b', 'b'], dtype='<U1')
np.tile(x, 2)  # repeat the whole array twice
Out[86]: array(['a', 'b', 'a', 'b'], dtype='<U1')
For a pandas dataframe, pd.concat can be used to repeat the whole dataframe:
import pandas as pd
d = pd.DataFrame({'x': ['a', 'b'], 'y': ['c', 'd']})
d
Out[94]:
x y
0 a c
1 b d
pd.concat([d]*2)
Out[93]:
x y
0 a c
1 b d
0 a c
1 b d
My question is how to repeat each row of a pandas dataframe rather than repeating the dataframe as a whole. The result I want is:
x y
a c
a c
b d
b d
Ideally, I would like a function in Python like R's rep that works on both lists and dataframes and lets you choose between repeating the whole object and repeating each element.
In pandas you can use reindex with np.repeat (this relies on the index of d being unique):
d.reindex(np.repeat(d.index.values, 2))
x y
0 a c
0 a c
1 b d
1 b d
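For reference, here is a self-contained sketch of the reindex approach, assuming the small frame d from the question and a unique index. The reset_index call at the end is an optional extra, useful only if you do not want the duplicated labels 0, 0, 1, 1; the names repeated and repeated_reset are illustrative.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'x': ['a', 'b'], 'y': ['c', 'd']})

# Repeat each index label twice, then pull the rows in that order.
# This assumes the original index has no duplicate labels.
repeated = d.reindex(np.repeat(d.index.values, 2))

# Optional: swap the duplicated labels 0, 0, 1, 1 for a fresh 0..3 index.
repeated_reset = repeated.reset_index(drop=True)
```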
Or rebuild the dataframe from its values (note that this gives a fresh 0..3 index):
pd.DataFrame(np.repeat(d.values, 2, axis=0), columns=d.columns)
x y
0 a c
1 a c
2 b d
3 b d
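One caveat with rebuilding from d.values: when the columns have mixed dtypes, the intermediate NumPy array is of object dtype, so numeric columns lose their dtype. The sketch below uses a hypothetical frame d2 with a numeric column n (not from the question) to show the effect and one possible way to restore the dtypes with astype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one string column and one integer column.
d2 = pd.DataFrame({'x': ['a', 'b'], 'n': [1, 2]})

# Mixed dtypes make d2.values an object array, so 'n' comes back as object.
rebuilt = pd.DataFrame(np.repeat(d2.values, 2, axis=0), columns=d2.columns)

# Cast each column back to its original dtype.
restored = rebuilt.astype(d2.dtypes.to_dict())
```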
You can also use concat with sort_index:
pd.concat([d]*2).sort_index()
x y
0 a c
0 a c
1 b d
1 b d
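If you want a single entry point closer to R's rep, the pieces above can be wrapped in a small helper. This is only a sketch: the function name rep and its parameters times and each are made up here to mirror R and are not part of NumPy or pandas. It selects rows by position, so it should also work when the index has duplicate labels.

```python
import numpy as np
import pandas as pd

def rep(obj, times=1, each=1):
    """Hypothetical helper loosely modelled on R's rep().

    `each` repeats every element/row in place; `times` then repeats
    the whole result. Works on DataFrames, Series and list-likes.
    """
    if isinstance(obj, (pd.DataFrame, pd.Series)):
        # Build row positions, e.g. each=2 -> [0, 0, 1, 1], then tile them.
        positions = np.tile(np.repeat(np.arange(len(obj)), each), times)
        return obj.iloc[positions]
    # Lists and arrays: the result is always a NumPy array.
    return np.tile(np.repeat(np.asarray(obj), each), times)

d = pd.DataFrame({'x': ['a', 'b'], 'y': ['c', 'd']})
rows_each = rep(d, each=2)            # rows a c, a c, b d, b d
rows_whole = rep(d, times=2)          # rows a c, b d, a c, b d
list_each = rep(['a', 'b'], each=2)
list_whole = rep(['a', 'b'], times=2)
```

The original index labels are kept (0, 0, 1, 1 for each=2); chain .reset_index(drop=True) if you prefer a fresh index.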