Word2vec: disentangling semantics from syntax

I want to use pre-trained word vectors (e.g., fastText trained on Wikipedia) to cluster a set of words. However, my list contains words like 'kindness', 'kind', and 'kindly', and they fall into different clusters. That is, words with the same part of speech are sometimes clustered together. How can I get word vectors that capture only meaning?

Solution

You can lemmatize or stem the words before looking them up in word2vec, so that variants of a word are reduced to a common base form and share one vector.

The stemming library has several such algorithms implemented.
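
As a rough illustration of the idea, here is a minimal sketch that groups words by stem before any vector lookup. The `toy_stem` function and its suffix list are hypothetical stand-ins written for this example, not the API of any stemming library; in practice you would pass a real stemmer or lemmatizer as the `stem` argument.

```python
from collections import defaultdict

# Toy suffix list for illustration only (an assumption of this sketch);
# a real stemmer or lemmatizer handles far more cases.
SUFFIXES = ("ness", "ly")


def toy_stem(word):
    """Strip one known suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words, stem=toy_stem):
    """Map each stem to the original words that reduce to it."""
    groups = defaultdict(list)
    for w in words:
        groups[stem(w)].append(w)
    return dict(groups)


words = ["kindness", "kind", "kindly", "quick", "quickly"]
groups = group_by_stem(words)
print(groups)
```

You would then look up one vector per stem (the keys of `groups`) and cluster those, so that 'kindness', 'kind', and 'kindly' cannot land in different clusters. Note that a stem is not always a real word, so it may be missing from a pre-trained vocabulary; fastText can still build a vector for it from subword n-grams.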
