
Equivalent of R's 'rep' for a pandas DataFrame


I have searched for similar questions, such as 'equivalent R function rep in Python'.

In R, rep can be applied to a vector or a dataframe, and the each parameter controls whether every element is repeated or the whole vector/dataframe is repeated.

But in Python, you have to distinguish between arrays and dataframes.

For an array, np.repeat repeats each element and np.tile repeats the whole array.

import numpy as np

x = ['a', 'b']

np.repeat(x, 2)  # repeat each element twice
Out[85]: array(['a', 'a', 'b', 'b'], dtype='<U1')

np.tile(x, 2)  # repeat the whole array twice
Out[86]: array(['a', 'b', 'a', 'b'], dtype='<U1')

For a pandas dataframe, pd.concat can be used to repeat the whole dataframe:

import pandas as pd

d = pd.DataFrame({'x': ['a', 'b'], 'y': ['c', 'd']})
d
Out[94]: 
   x  y
0  a  c
1  b  d


pd.concat([d]*2)
Out[93]: 
   x  y
0  a  c
1  b  d
0  a  c
1  b  d
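
Note that the output above keeps the original index labels (0, 1, 0, 1). As a small sketch, assuming a fresh 0..n-1 index is wanted instead, pd.concat accepts ignore_index=True; the variable name repeated is only illustrative:

```python
import pandas as pd

d = pd.DataFrame({'x': ['a', 'b'], 'y': ['c', 'd']})

# Repeat the whole dataframe twice and renumber the rows 0..3
# instead of keeping the duplicated labels 0, 1, 0, 1.
repeated = pd.concat([d] * 2, ignore_index=True)
print(repeated)
```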

My question is how to repeat each row of a pandas dataframe, rather than repeating the dataframe as a whole. The result I want is:

x y
a c
a c
b d 
b d

Ideally, I would like a single Python function like 'rep' that works on both lists and dataframes and lets you choose between repeating the whole object and repeating each element.


Solution

  • In pandas you can use reindex with np.repeat (this assumes the index labels are unique, as they are here):

    d.reindex(np.repeat(d.index.values, 2))
       x  y
    0  a  c
    0  a  c
    1  b  d
    1  b  d
    

    Or rebuild your dataframe from the repeated values:

    pd.DataFrame(np.repeat(d.values, 2, axis=0), columns=d.columns)
       x  y
    0  a  c
    1  a  c
    2  b  d
    3  b  d
    

    Also concat with sort_index:

    pd.concat([d]*2).sort_index()
       x  y
    0  a  c
    0  a  c
    1  b  d
    1  b  d
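
The question also asks for one function that handles both lists and dataframes. Neither NumPy nor pandas ships such a function as far as the answer above shows, so what follows is only a minimal sketch of a hypothetical helper named rep. The names rep, times and each are borrowed from R for illustration. It repeats each element first and then tiles the result, which mirrors how R combines each and times, and it resets the dataframe index, which is a design assumption rather than a requirement:

```python
import numpy as np
import pandas as pd


def rep(obj, times=1, each=1):
    """Hypothetical helper that mimics R's rep(x, times=, each=)."""
    if isinstance(obj, pd.DataFrame):
        # Work with positional row numbers so that duplicate or
        # non-default index labels do not matter.
        pos = np.tile(np.repeat(np.arange(len(obj)), each), times)
        return obj.iloc[pos].reset_index(drop=True)
    # Lists and arrays: repeat each element, then tile the result.
    return np.tile(np.repeat(np.asarray(obj), each), times)


d = pd.DataFrame({'x': ['a', 'b'], 'y': ['c', 'd']})

each_twice = rep(d, each=2)          # rows: a c, a c, b d, b d
whole_twice = rep(d, times=2)        # rows: a c, b d, a c, b d
letters = rep(['a', 'b'], each=2)    # elements: a, a, b, b

print(each_twice)
print(whole_twice)
print(letters)
```

Because the dataframe branch selects rows with iloc, column dtypes are preserved, unlike rebuilding the frame from d.values.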