I have one dataframe with shape (4237, 19) and another with shape (4237, 6). I need to combine them column-wise, so the resulting dataframe should have shape (4237, 25), but I am getting (5524, 25) instead. I cannot work out why.
Here is the code I used:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

social_media_vectorizer = CountVectorizer(lowercase=True)
train_social_media_vector = social_media_vectorizer.fit_transform(x_train["social_media"].values.astype("U"))
test_social_media_vector = social_media_vectorizer.transform(x_test["social_media"].values.astype('U'))
print(x_train.shape)
print(x_test.shape)
train_social_media_df = pd.DataFrame(train_social_media_vector.todense(), columns=social_media_vectorizer.get_feature_names_out())
test_social_media_df = pd.DataFrame(test_social_media_vector.todense(), columns=social_media_vectorizer.get_feature_names_out())
x_train = pd.concat([x_train, train_social_media_df], axis=1)
x_test = pd.concat([x_test, test_social_media_df], axis=1)
print("="*100)
print(x_train.shape)
print(x_test.shape)
print("="*100)
print(social_media_vectorizer.vocabulary_)
Result:
(4237, 19)
(1816, 19)
====================================================================================================
(5524, 25)
(3058, 25)
====================================================================================================
{'facebook': 0, 'linkedin': 2, 'twitter': 4, 'instagram': 1, 'youtube': 5, 'producthunt': 3}
Are you sure the shape of train_social_media_vector.todense() is (4237, 6)? It seems to be (1287, 6).
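One way to check this is to compare both the shapes and the row indexes of the two frames before concatenating. The sketch below uses made-up frames: `left` and `right` are hypothetical stand-ins for x_train and train_social_media_df, assuming x_train kept non-contiguous row labels from an earlier split.

```python
import pandas as pd

# Hypothetical stand-ins: `left` plays the role of x_train with
# non-contiguous row labels, `right` the freshly built vector frame.
left = pd.DataFrame({"a": [10, 20, 30]}, index=[5, 0, 7])
right = pd.DataFrame({"b": [1, 2, 3]})  # default RangeIndex 0, 1, 2

# Same number of rows does not imply the same row labels.
same_rows = left.shape[0] == right.shape[0]
same_index = left.index.equals(right.index)

# Labels present in both frames, and the size of the label union.
# pd.concat(..., axis=1) produces one row per label in the union.
shared_labels = left.index.intersection(right.index)
union_size = len(left.index.union(right.index))

print(same_rows, same_index)
print(list(shared_labels))
print(union_size)
```

If the row counts match but the indexes differ, the extra rows in the concatenated result come from index alignment rather than from the vectorizer output.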
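pd.concat with axis=1 aligns rows by index label. x_train presumably still carries its original row labels, while train_social_media_df has a fresh 0..n-1 index, so the result contains the union of both sets of labels. Note that ignore_index=True does not help here: with axis=1 it only renumbers the column labels. Try resetting the row index before concatenating:
x_train = pd.concat([x_train.reset_index(drop=True), train_social_media_df], axis=1)
x_test = pd.concat([x_test.reset_index(drop=True), test_social_media_df], axis=1)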
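A small self-contained sketch of the row-count behaviour may help here. The frames `base` and `extra` are made-up stand-ins (not the data from the question), and the sketch assumes the original frame has non-contiguous row labels. It compares a plain column-wise concat, the same call with ignore_index=True, and two ways of aligning the row index first.

```python
import pandas as pd

# Hypothetical data: `base` has shuffled row labels, `extra` has 0, 1, 2.
base = pd.DataFrame({"feature": [1.0, 2.0, 3.0]}, index=[4, 0, 9])
extra = pd.DataFrame({"facebook": [1, 0, 1], "twitter": [0, 1, 0]})

# Plain concat aligns on row labels: union of {4, 0, 9} and {0, 1, 2}.
misaligned = pd.concat([base, extra], axis=1)

# With axis=1, ignore_index=True renumbers the columns, not the rows.
relabeled = pd.concat([base, extra], axis=1, ignore_index=True)

# Option 1: drop the old row labels so both frames use 0..n-1.
fixed = pd.concat([base.reset_index(drop=True), extra], axis=1)

# Option 2: keep the original labels by assigning them to the new frame.
kept = pd.concat([base, extra.set_axis(base.index, axis=0)], axis=1)

print(misaligned.shape, relabeled.shape, fixed.shape, kept.shape)
```

Option 2 is the one to prefer if the original row labels are still needed later, for example to join back to the target column.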