How to add multiple rows to a dataframe in Python


I have a dataframe (df) like the one below (there are actually more rows).

number
0 21
1 35
2 467
3 965
4 2754
5 34r
6 5743
7 841
8 8934
9 275

I want to insert 6 rows between each pair of consecutive rows. For example, I want to get 6 random values within the range of the values at index 0 and 1 and add them as 6 rows between index 0 and 1. The same goes for index 1 and 2, 2 and 3, and so forth until the end.

np.linspace(df["number"][0], df["number"][1],8)

Is there a function or any other method to generate 6 additional rows between each pair of the 10 existing rows, so that the final number of rows is not 10 but 64 (after adding 9 × 6 = 54 rows)?


Solution

  • You could try the following:

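    from random import uniform

    import pandas as pd

    def rng_numbers(row):
        # row holds the current value and the next value (NaN in the last row)
        left, right = row.iat[0], row.iat[1]
        n = left
        if pd.isna(right):
            # Last row: there is no following value, so keep only the number
            return [n]
        if right < left:
            left, right = right, left
        # Original value followed by 6 random values inside the interval
        return [n] + [uniform(left, right) for _ in range(6)]

    df["number"] = (
        pd.concat([df["number"], df["number"].shift(-1)], axis=1)
        .apply(rng_numbers, axis=1)
    )
    df = df.explode("number", ignore_index=True)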
    
    • First create a dataframe with 2 columns that form the interval boundaries: the number column and the number column shifted up by one row (.shift(-1)), so that each row holds a value and its successor.
    • Then .apply the function rng_numbers to the rows of the new dataframe: rng_numbers first sorts the interval boundaries and then returns a list that starts with the respective item from column number, followed by 6 random numbers in the interval. In the last row the right boundary is NaN (due to the .shift(-1)): in this case the function returns the list without the random numbers.
    • Then .explode df on the new column number, so that every list item becomes its own row.
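
    To make the first step concrete, here is a minimal sketch of the intermediate frame, assuming a small sample with the first four values from the question. The column labels left and right are hypothetical and only added for readability; the code above works with the unlabelled columns by position.

```python
import pandas as pd

# Small sample frame; the values are the first four rows of the question.
df = pd.DataFrame({"number": [21, 35, 467, 965]})

# Pair each value with the one that follows it: the first column is the
# current row, the second column is the next row (NaN for the last row).
bounds = pd.concat([df["number"], df["number"].shift(-1)], axis=1)
bounds.columns = ["left", "right"]  # hypothetical labels, for readability only

print(bounds)
```

    Each row of bounds is what rng_numbers receives, which is why the function has to handle a missing right boundary in the final row.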

    You could do something similar with NumPy, which is probably faster:

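    import numpy as np
    import pandas as pd

    rng = np.random.default_rng()

    # Interval boundaries: each value paired with the next one
    limits = pd.concat([df["number"], df["number"].shift(-1)], axis=1)
    left = limits.min(axis=1).values.reshape(-1, 1)
    right = limits.max(axis=1).values.reshape(-1, 1)
    # Per row: [original value] + [6 random values] (list concatenation)
    numbers = (
        pd.Series(df["number"].values.reshape(len(df), 1).tolist())
        + pd.Series(rng.uniform(left, right, size=(len(df), 6)).tolist())
    )
    # The last row has no successor, so keep only its original value.
    # Editing the standalone Series avoids chained assignment on df.
    numbers.iat[-1] = numbers.iat[-1][:1]
    df["number"] = numbers
    df = df.explode("number", ignore_index=True)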
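
    If you want to check the row count from the question, the same idea can be wrapped in a function. This is only a sketch under a few assumptions: insert_random_rows, its parameters, and the seed are hypothetical names, the column is numeric, the frame has at least one row, and the sixth sample value (340) is a placeholder because the value shown in the question is not numeric. It builds the result with plain NumPy arrays instead of .explode.

```python
import numpy as np
import pandas as pd


def insert_random_rows(df, column="number", n=6, seed=None):
    """Return a new frame with n random values after every row but the last.

    Hypothetical helper: the random values are drawn uniformly between each
    value and its successor.
    """
    rng = np.random.default_rng(seed)
    values = df[column].to_numpy(dtype=float)
    # Lower and upper boundary for each pair of consecutive values
    lo = np.minimum(values[:-1], values[1:]).reshape(-1, 1)
    hi = np.maximum(values[:-1], values[1:]).reshape(-1, 1)
    fillers = rng.uniform(lo, hi, size=(len(values) - 1, n))
    # Each block is [original value, n random values]; flatten row by row
    blocks = np.hstack([values[:-1].reshape(-1, 1), fillers]).ravel()
    # The last original value has no successor and is appended unchanged
    return pd.DataFrame({column: np.append(blocks, values[-1])})


# Sample data standing in for the question's frame (340 is a placeholder).
df = pd.DataFrame(
    {"number": [21, 35, 467, 965, 2754, 340, 5743, 841, 8934, 275]}
)
result = insert_random_rows(df, seed=0)
print(len(result))
```

    With 10 input rows there are 9 gaps, so the result has 10 + 9 * 6 = 64 rows, and every 7th row (starting at index 0) is one of the original values.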