I have cosine similarity metric data (a dictionary) stored in the variable 'similarity'. How can I split this data into two parts, 70% and 30%, and store each part in its own variable? A 7:3 division is preferred.
The reason I am asking is that I have an accuracy algorithm that computes the accuracy on this data. The problem is that, as you can see in the code, I used the same data for both training and testing, so I obviously get 100% accuracy. I want to split the data 70/30, with 70% for training and 30% for testing.
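import numpy as np
from numpy import dot  # dot/norm imports assumed; the original snippet did not show them
from numpy.linalg import norm

print(similarity)
# Both arrays are built from the same data, which is why the score is always 100%
train_r = np.array(similarity)
test_r = np.array(similarity)
# Take column 10 of every row
train_c = train_r[:,10]
test_c = test_r[:,10]
a = train_c
b = test_c
# Cosine similarity between the two columns, expressed as a percentage
cos_sim = (dot(a, b)/(norm(a)*norm(b))) * 100
print(cos_sim)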
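To see why the score is always 100%: the cosine similarity of a vector with itself is exactly 1, so code that compares a column with itself always prints 100 (up to floating-point rounding). A minimal sketch with made-up column values; the cosine_percent helper and the numbers are hypothetical, not taken from the question's data:

```python
import numpy as np
from numpy import dot
from numpy.linalg import norm


def cosine_percent(a, b):
    """Cosine similarity of two equal-length 1-D vectors, scaled to a percentage."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return dot(a, b) / (norm(a) * norm(b)) * 100


# Hypothetical values standing in for one column of 'similarity'
column = np.array([0.2, 0.5, 0.9, 0.4])
other = np.array([0.9, 0.1, 0.3, 0.7])

# Same vector on both sides, as when training and testing data are identical
print(cosine_percent(column, column))
# Two different vectors give a score below 100
print(cosine_percent(column, other))
```

Only once the two sides come from different data does the score say anything about how well they match.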
I would be really grateful for an answer. Thanks so much!
This should do it:
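import numpy as np

split_rate = 0.7  # fraction of rows used for training
split_idx = int(len(similarity) * split_rate)  # index where the test part begins
data = np.array(similarity)
train_r = data[:split_idx]  # first 70% of rows
test_r = data[split_idx:]   # remaining 30% of rows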
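The slicing above keeps the original row order, so if the rows are sorted or grouped in some way, the test part will not be representative. A minimal sketch of a shuffled 70/30 split, assuming 'similarity' is either a dict or something np.array can turn into a 2-D array; the split_train_test helper, its seed parameter, and the sample data are hypothetical:

```python
import numpy as np


def split_train_test(data, train_frac=0.7, seed=0):
    """Shuffle, then split into (train, test) with about train_frac of items in train.

    Dicts are split by their keys and returned as two dicts; anything else is
    converted with np.array and split along its first axis (rows).
    """
    rng = np.random.default_rng(seed)  # fixed seed makes the split reproducible
    if isinstance(data, dict):
        keys = list(data.keys())
        order = rng.permutation(len(keys))  # shuffled key positions
        split_idx = int(len(keys) * train_frac)
        train = {keys[i]: data[keys[i]] for i in order[:split_idx]}
        test = {keys[i]: data[keys[i]] for i in order[split_idx:]}
        return train, test
    arr = np.array(data)
    order = rng.permutation(len(arr))  # shuffled row indices
    split_idx = int(len(arr) * train_frac)
    return arr[order[:split_idx]], arr[order[split_idx:]]


# Hypothetical 10x12 matrix standing in for 'similarity'
similarity = np.arange(120, dtype=float).reshape(10, 12)
train_r, test_r = split_train_test(similarity)
print(train_r.shape, test_r.shape)  # (7, 12) (3, 12)
```

Note that after any 70/30 split, train_r[:,10] and test_r[:,10] have different lengths (7 and 3 rows here), so the question's dot(a, b) line will raise a ValueError. What the accuracy step should compare instead depends on what the accuracy algorithm is meant to measure.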