Hi, I have downsampled my dataset and need help upweighting (adding a weight to) the downsampled examples. See the code below:
import pandas as pd
from sklearn.utils import resample

# Separate majority and minority classes
df_majority = data[data.Collected_ind == 1]
df_minority = data[data.Collected_ind == 0]
# Downsample majority class
df_majority_downsampled = resample(df_majority,
                                   replace=False,     # sample without replacement
                                   n_samples=152664,  # to match minority class
                                   random_state=1)    # reproducible results
# Combining minority class with downsampled majority class
df_downsampled = pd.concat([df_majority_downsampled, df_minority])
# Display new class counts, then class proportions
df_downsampled['Collected_ind'].value_counts()
df_downsampled['Collected_ind'].value_counts(normalize=True)
# Randomly shuffle the rows (random_state keeps the shuffle reproducible)
df_downsampled = df_downsampled.sample(frac=1, random_state=1)
df_downsampled.to_csv("Sampled_Data.csv", index=False)
# Generate a train and test dataset (80/20 split; drop() assumes the index is unique)
train = df_downsampled.sample(frac=0.8, random_state=1)
test = df_downsampled.drop(train.index)
train.to_csv("trainNew.csv", index=False)
test.to_csv("testNew.csv", index=False)
Your question actually helped me answer my own, because I was looking for this syntax. Since I'm here anyway, I'll show you what I'm doing. I don't know if your definition of weight is the same as mine, but here's what we use:
class_weight = (original_class_count/original_row_count) / (new_class_count/new_row_count)
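To make the formula concrete, here is a small sketch with made-up counts. The function name and the numbers (900 majority rows, 100 minority rows, majority downsampled to 100) are assumptions for illustration only, not your real data.

```python
def class_weight(original_class_count, original_row_count,
                 new_class_count, new_row_count):
    """Original class share divided by the class share after resampling."""
    return (original_class_count / original_row_count) / (new_class_count / new_row_count)


# Hypothetical counts: 900 majority + 100 minority rows originally,
# majority downsampled to 100 so the new dataset has 200 rows.
majority_weight = class_weight(900, 1000, 100, 200)
minority_weight = class_weight(100, 1000, 100, 200)
print(majority_weight, minority_weight)
```

With these numbers the majority weight is nine times the minority weight, which matches the original 9:1 class ratio that downsampling removed.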
So to reformat your code, I would replace the hard-coded n_samples value with len(df_minority), and then add the formula above as a weight column in your dataframe, computing it dynamically from the lengths of the various dataframes.
Perhaps something like this (with numpy imported as np):
df_downsampled['weight'] = np.where(
    df_downsampled['Collected_ind'] == 1,
    (len(df_majority) / len(data)) / (len(df_minority) / (len(df_minority) * 2)),
    (len(df_minority) / len(data)) / (len(df_minority) / (len(df_minority) * 2)))
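If you would rather not hard-code the class sizes, the same weights can be computed from the class shares before and after downsampling. This is only a sketch on toy data: the 900/100 split and the `feature` column are made up, and `DataFrame.sample` stands in for `resample` so the snippet needs only pandas and numpy.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): 900 majority rows (1) and 100 minority rows (0).
data = pd.DataFrame({
    "Collected_ind": [1] * 900 + [0] * 100,
    "feature": np.arange(1000),
})

df_majority = data[data.Collected_ind == 1]
df_minority = data[data.Collected_ind == 0]

# Downsample the majority class to the minority size, without replacement.
df_majority_downsampled = df_majority.sample(n=len(df_minority), random_state=1)
df_downsampled = pd.concat([df_majority_downsampled, df_minority])

# Share of each class before and after downsampling.
original_share = data["Collected_ind"].value_counts(normalize=True)
new_share = df_downsampled["Collected_ind"].value_counts(normalize=True)

# weight = original share / new share, looked up per row by class label.
class_weight = original_share / new_share
df_downsampled["weight"] = df_downsampled["Collected_ind"].map(class_weight)

print(class_weight.sort_index())
```

A handy sanity check is that the weights sum to the number of rows in the downsampled frame, so the weighted data keeps the same overall size while restoring the original class balance.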