I am a beginner in natural language processing. I have to work with several languages, and Tamil is one of them. Is there any Tamil tokenizer code (Java, C, Python, etc.) or part-of-speech tagger code that I could use for my research?
I would really appreciate an expert opinion here. Any help is welcome.
I have found one tool for tokenization: the Indic NLP Library. It supports Tamil.
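If you only need a rough baseline before setting up a library, Tamil text can be split with a regular expression over the Tamil Unicode block (U+0B80 to U+0BFF). The sketch below uses only the Python standard library; it is not the Indic NLP Library's tokenizer, and the function name `tokenize_tamil` is just an illustrative choice. It splits on whitespace and punctuation only, so it does no morphological analysis.

```python
import re

# Assumption: a "token" is a run of Tamil characters (U+0B80..U+0BFF,
# which covers letters, vowel signs, the pulli and Tamil digits),
# a run of ASCII/other digits, a run of letters from another script,
# or a single punctuation character. Whitespace is discarded.
TOKEN_RE = re.compile(r"[\u0B80-\u0BFF]+|\d+|[^\W\d_]+|[^\w\s]")


def tokenize_tamil(text):
    """Return a list of surface tokens for a Tamil (or mixed) string."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    sentence = "நான் தமிழ் பேசுகிறேன்."
    for token in tokenize_tamil(sentence):
        print(token)
```

The explicit Tamil range matters because Python's `\w` does not match combining vowel signs, so a plain `\w+` pattern would break Tamil words apart at those characters. For research use, a dedicated tokenizer such as the one in the Indic NLP Library is likely the better choice.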
I did not find any POS tagger tools available on the internet, but I did find some papers:
2008: Morpheme based Language Model for Tamil Part-of-Speech Tagging
2009: CRF Models for Tamil Part of Speech Tagging and Chunking
You could contact the authors and ask whether their code or data is available.
Alternatively, if you can read Tamil, try searching the internet in Tamil (especially university websites); you may find additional resources and tools that way.