How to create a "weight" field when sampling a population in Python?


I am sampling a population and I'd like to know if there is a straightforward way to generate a column called "weight" that indicates the sample weight in the sampled data.

Here is my code.

First, I create the population to be sampled:

import pandas as pd
df = pd.DataFrame({'Age': [18, 20, 20, 56, 56, 57, 60]})

print(df)
   Age
0   18
1   20
2   20
3   56
4   56
5   57
6   60

Then I take a 30% random sample of that population:

sampleData = df.sample(frac=0.3)
print(sampleData)

   Age
6   60
5   57

What I would like to know is whether it is possible to generate a field called "Weight" that holds the sample weight, without having to calculate it manually. I would like the sampled data to look like this:

   Age  Weight
6   60   3.333
5   57   3.333

Solution

  • Keep the sampling fraction in a variable, then use the assign() method to add a Weight column equal to 1/frac, rounded with the built-in round() function:

    frac = 0.3
    sampleData = df.sample(frac=frac).assign(Weight=round(1 / frac, 3))
    

    Printing sampleData now gives the desired output (the rows drawn will vary between runs, because the sample is random):

       Age  Weight
    4   56   3.333
    2   20   3.333
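
One caveat about the snippet above: 1/frac is only the nominal weight. df.sample(frac=...) has to draw a whole number of rows, so with 7 rows and frac=0.3 it draws 2 rows (as the outputs above show), and the realized weight is 7/2 = 3.5 rather than 3.333. If you want the weights to add up to the population size, a minimal sketch is to divide the population size by the number of rows actually drawn. The random_state=0 argument is my own addition, used only to make the example reproducible:

```python
import pandas as pd

df = pd.DataFrame({'Age': [18, 20, 20, 56, 56, 57, 60]})

frac = 0.3
# random_state is an assumption added for reproducibility; drop it for a fresh draw each run
sampleData = df.sample(frac=frac, random_state=0)

# Realized weight: population size divided by the number of rows actually sampled
sampleData = sampleData.assign(Weight=round(len(df) / len(sampleData), 3))
print(sampleData)
```

For large populations the two values are nearly identical, so the simpler 1/frac version is usually fine; the difference only matters for small data like this example.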