python, nlp, stanford-nlp

How can I tag or give a document of text a topic?


I have a set of documents and a corresponding set of tags for those documents.

For example:

Document - "Learned Counsel appearing for the Appellants however points out that in the..etc etc"

Tags - "Compensation, Fundamental Right"

I have multiple documents with their corresponding tags, and another test set of documents without any tags. Which NLP techniques should I use to assign tags to the untagged documents? Should I use text classification or topic modeling? Can someone please guide me or suggest some ideas?


Solution

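  • You can use two approaches:

    1- Rule based: extract the words that are most common in the documents of each tag, then assign a tag to a new document when it contains enough of those words.

    2- Machine learning: train a text classifier on the tagged documents.

    Since you already have tagged documents, this is a supervised text classification problem (multi-label, because one document can carry several tags) rather than topic modeling, which is unsupervised and would not use your existing tags.

    If you have a large amount of training data, you can use machine learning to classify the documents, for example with these approaches: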

    https://arxiv.org/abs/1904.08398

    https://medium.com/@armandj.olivares/using-bert-for-classifying-documents-with-long-texts-5c3e7b04573d
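
    As a rough illustration of the rule-based option above, here is a minimal sketch using only the Python standard library. The function names (`tokenize`, `build_tag_keywords`, `predict_tags`), the tiny stopword list, the `top_k` and `min_hits` thresholds, and the toy documents are all made up for this example and are not from any library or from the asker's data. It simply keeps the most frequent words per tag and assigns every tag whose keyword list overlaps enough with the new document.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; use a fuller list in practice).
STOPWORDS = {"the", "a", "an", "of", "for", "to", "in", "and",
             "that", "is", "was", "by", "on"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_tag_keywords(docs, tags, top_k=5):
    """For each tag, keep the top_k words that occur in the most documents with that tag."""
    counts = defaultdict(Counter)
    for text, doc_tags in zip(docs, tags):
        words = set(tokenize(text))  # count each word once per document
        for tag in doc_tags:
            counts[tag].update(words)
    keywords = {}
    for tag, counter in counts.items():
        # Sort by descending count, then alphabetically, so ties are deterministic.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords[tag] = {word for word, _ in ranked[:top_k]}
    return keywords


def predict_tags(text, tag_keywords, min_hits=2):
    """Return every tag whose keyword set shares at least min_hits words with the text."""
    words = set(tokenize(text))
    return sorted(tag for tag, kw in tag_keywords.items() if len(words & kw) >= min_hits)


# Toy training data, invented for illustration only.
train_docs = [
    "The appellants claim compensation for the loss caused by the accident",
    "Compensation was awarded to the injured worker for medical loss",
    "The petition alleges violation of a fundamental right under the constitution",
    "Freedom of speech is a fundamental right protected by the constitution",
]
train_tags = [
    ["Compensation"],
    ["Compensation"],
    ["Fundamental Right"],
    ["Fundamental Right"],
]

tag_keywords = build_tag_keywords(train_docs, train_tags)
print(predict_tags("The court held that compensation is payable for the loss", tag_keywords))
```

    A document can receive several tags or none, which matches the multi-tag example in the question. Words that are frequent under every tag are not filtered out in this sketch, so on real data you would probably want to weight words by how specific they are to a tag, or move to the machine learning option once you have enough tagged documents.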