python nlp

Keyword assignment (not keyword extraction) with machine learning in Python: where to start?


I want to do keyword assignment (not keyword extraction) on a collection of articles using machine learning in Python, i.e. classify each text with keywords from a predefined list. Googling mostly returns results on keyword extraction instead. Could you please point me to any blogs or articles that walk through the steps of keyword assignment (ideally with library recommendations)?

As shown in the screenshot (please advise how to share the CSV file), ten existing questions have already been tagged manually, and a new eleventh question is waiting to be tagged based on the patterns in those ten.



Solution

  • You can try several approaches and compare them to find out which works best for your data:

    • Extract keywords from the article, compare them with the target tags, and assign the tags that match or are similar. You can use word2vec embeddings with a distance metric to measure term similarity.
    • Compute the similarity between the article and each tag, then assign the tags whose similarity is above a chosen threshold, or the top n tags. You can use BERT to embed the article and the tags and compare them with cosine similarity.
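
As a rough illustration of the second approach, here is a minimal sketch using only the standard library. Note the assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model (it is not BERT or word2vec), and `assign_tags`, `TAGS`, `ARTICLE`, and the threshold value are hypothetical names and data made up for this example. In practice you would swap `embed` for vectors from whichever model you choose and tune the threshold on your manually tagged examples.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def assign_tags(article, tags, threshold=0.0, top_n=None):
    """Return (tag, score) pairs scoring above `threshold`, best first.

    If `top_n` is given, keep only the `top_n` highest-scoring tags.
    """
    article_vec = embed(article)
    scored = [(tag, cosine(article_vec, embed(tag))) for tag in tags]
    kept = [(tag, score) for tag, score in scored if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n] if top_n is not None else kept


# Hypothetical predefined tag list and article text, for illustration only.
TAGS = ["machine learning", "web development", "database"]
ARTICLE = "A short introduction to machine learning: learning from labelled data."

result = assign_tags(ARTICLE, TAGS, threshold=0.1, top_n=2)
print(result)  # only "machine learning" shares any words with the article
```

The threshold and top n options mirror the two selection rules mentioned above; with real embeddings, tags can score well even when they share no literal words with the article, which this toy `embed` cannot capture.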