Tags: python, pandas, numpy, vectorization, sampling

Numpy array with different standard deviation and mean per row


I have a pandas DataFrame with two columns. They hold the mean and the standard deviation of a normal distribution, one pair per row.

How can I perform vectorized sampling? I want to sample 1 observation per row.

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)

#n_points = 4_000_000
n_points = 10
d_dimensions = 2

X = rng.random_sample((n_points, d_dimensions))

df = pd.DataFrame(X)
display(df.head())

df['randomized'] = df.apply(lambda x: np.random.normal(x[0], x[1], 1), axis = 1)
df.head()

This becomes very slow as the number of rows increases, because apply calls np.random.normal once per row.

The related question "Numpy array with different standard deviation per row" contains this example:

np.random.seed(444)
arr = np.random.normal(loc=0., scale=[1., 2., 3.], size=(1000, 3)).T

print(arr.mean(axis=1))
# [-0.06678394 -0.12606733 -0.04992722]
print(arr.std(axis=1))
# [0.99080274 2.03563299 3.01426507]

That example shows how to perform vectorized sampling with equal means. How can it be changed to support a different mean per row, just like my naive version using apply, but faster?

A:

np.random.normal(df[0], df[1], 1)

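does not return one sample per row, even though multiple means/standard deviations are specified. The trailing 1 is the size argument and requests a single value, which conflicts with the length of the inputs (recent NumPy versions raise a ValueError in this case).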


Solution

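  • df['randomized'] = np.random.normal(df[0], df[1])

    It is important not to pass the size argument. When size is omitted, NumPy broadcasts the means and standard deviations against each other and draws one sample per element, that is, one per row.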
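
As a minimal sketch of the vectorized approach, the snippet below draws one sample per row in a single call and also shows an equivalent shift-and-scale formulation. It assumes column 0 holds the means and column 1 the standard deviations, as in the question. The use of np.random.default_rng instead of the legacy np.random.normal, the seed, and the names means, stds, samples and samples_alt are choices made for this illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Newer Generator API; the seed is arbitrary and only makes the run repeatable.
rng = np.random.default_rng(0)

n_points = 100_000
df = pd.DataFrame(rng.random((n_points, 2)))  # column 0: mean, column 1: std

means = df[0].to_numpy()
stds = df[1].to_numpy()

# No size argument: loc and scale are broadcast, giving one draw per row.
samples = rng.normal(means, stds)

# Equivalent idea: scale and shift standard normal draws.
samples_alt = means + stds * rng.standard_normal(n_points)

print(samples.shape, samples_alt.shape)
```

Either array can be assigned back to the DataFrame as a new column. Both avoid the per-row Python call that makes apply slow.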