
Vectorizing a Function to Replicate Rows with Pandas

CONTEXT:

I have a DataFrame with a "Count" column and a function that duplicates each row as many times as the number in that column. My current method is very slow on larger datasets:

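import pandas as pd

def replicate_row(df):
    # full_df must exist before the loop; start from a copy so each row appears once
    full_df = df.copy()
    for i in range(len(df)):
        row = df.iloc[i]
        if row['Count'] > 0:
            rep = int(row['Count']) - 1  # extra copies needed beyond the original row
            if rep != 0:
                # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
                full_df = pd.concat([full_df, pd.DataFrame([row] * rep)], ignore_index=True)
    return full_df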

I'm trying to figure out how to vectorize this function so it runs faster, and this is what I have so far:

def vector_function(
    pandas_series: pd.Series) -> pd.Series:
    scaled_series = pandas_series['Count'] - 1
    # *** vectorized replication code here? ***
    return scaled_series

SAMPLE DATA

Name    Age    Gender    Count
Jen     25     F         3
Paul    30     M         2
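To make the example reproducible, the sample data can be built like this. It is a minimal sketch: the values are copied from the table above, and the column is assumed to be named Count (capital C).

```python
import pandas as pd

# Sample data from the table above; the repeat column is named 'Count'
df = pd.DataFrame({
    'Name': ['Jen', 'Paul'],
    'Age': [25, 30],
    'Gender': ['F', 'M'],
    'Count': [3, 2],
})
print(df)
```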

The expected resulting DataFrame would be:

Name    Age    Gender    
Jen     25     F         
Jen     25     F         
Jen     25     F         
Paul    30     M         
Paul    30     M

Solution

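Try using pd.Index.repeat to repeat each index label Count times, select those rows with .loc, then reset the index and drop the Count column: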

df = df.loc[df.index.repeat(df['Count'])].reset_index(drop=True).drop('Count', axis=1)
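Broken into steps, the one-liner does the following. This sketch rebuilds the sample data inline so it runs on its own; the intermediate names (repeated_index, expanded, result) are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jen', 'Paul'],
    'Age': [25, 30],
    'Gender': ['F', 'M'],
    'Count': [3, 2],
})

# Step 1: repeat each index label 'Count' times -> labels 0, 0, 0, 1, 1
repeated_index = df.index.repeat(df['Count'])

# Step 2: .loc with repeated labels returns one copy of the row per label
expanded = df.loc[repeated_index]

# Step 3: renumber the rows 0..n-1 and drop the helper column
result = expanded.reset_index(drop=True).drop('Count', axis=1)
print(result)
```

No Python-level loop runs over the rows: the repetition happens on the index array, and a single .loc call builds the new frame.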

Output:

>>> df
   Name  Age Gender
0   Jen   25      F
1   Jen   25      F
2   Jen   25      F
3  Paul   30      M
4  Paul   30      M
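One caveat: Index.repeat needs non-negative integer counts, so missing (NaN) or negative values in Count will raise an error, and .loc on a DataFrame whose index has duplicate labels would pick up every row sharing a label. Below is a hedged sketch of a more defensive version. replicate_rows is a hypothetical helper name, treating NaN and negative counts as 0 is an assumption about what you want, and it swaps the label-based .loc for position-based .iloc driven by numpy.repeat. The third sample row is made up for illustration.

```python
import numpy as np
import pandas as pd

def replicate_rows(df: pd.DataFrame, count_col: str = 'Count') -> pd.DataFrame:
    """Hypothetical helper: repeat each row count_col times, then drop count_col."""
    # Assumption: NaN and negative counts mean "no copies", so those rows are dropped
    counts = df[count_col].fillna(0).clip(lower=0).astype(int).to_numpy()
    # Positions 0..n-1, each repeated counts[i] times; works even if df.index has duplicates
    positions = np.repeat(np.arange(len(df)), counts)
    return (
        df.iloc[positions]
        .drop(columns=count_col)
        .reset_index(drop=True)
    )

# Hypothetical input: a float Count column with a gap, as can happen after reading a CSV
df = pd.DataFrame({
    'Name': ['Jen', 'Paul', 'Ann'],
    'Age': [25, 30, 41],
    'Gender': ['F', 'M', 'F'],
    'Count': [3.0, 2.0, np.nan],
})
print(replicate_rows(df))
```

If the index is already a plain RangeIndex and Count holds clean non-negative integers, the original .loc one-liner gives the same result.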