How to make a lightweight stanford-nlp.jar

I've noticed that the whole library is quite large (~300 MB), but I'm only using the tokenize, ssplit, and pos annotators. How can I build a lighter version of the library? Many thanks.

Best, Huang

Solution

If you only need part-of-speech tags, you can include just the part-of-speech tagger models, for example the ones downloadable from nlp.stanford.edu/software/tagger.shtml. It is also safe to remove the models you don't use from the models jar to make it smaller.
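
Removing unwanted models from the models jar could be scripted roughly as follows. This is a minimal sketch using only java.util.zip, not an official Stanford tool: the class name JarSlimmer and its slim method are made up for illustration, and the prefix edu/stanford/nlp/models/pos-tagger/ is an assumption about where the tagger models live inside the jar. List the jar's contents first (for example with `jar tf`) and adjust the prefix to match what you actually see.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: copies a jar, keeping only entries under the given prefixes.
public class JarSlimmer {

    /** Writes the kept entries of source into target and returns how many were kept. */
    public static int slim(Path source, Path target, List<String> keepPrefixes) throws IOException {
        int kept = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                String name = entry.getName();
                // Skip directory entries and anything outside the prefixes we want.
                if (entry.isDirectory() || keepPrefixes.stream().noneMatch(name::startsWith)) {
                    continue;
                }
                out.putNextEntry(new ZipEntry(name));
                int read;
                // read() returns -1 at the end of the current entry, not the whole archive.
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                out.closeEntry();
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the POS tagger models inside the models jar; verify before use.
        List<String> keep = Arrays.asList("edu/stanford/nlp/models/pos-tagger/");
        int kept = slim(Paths.get(args[0]), Paths.get(args[1]), keep);
        System.out.println("Kept " + kept + " entries");
    }
}
```

It would be run as `java JarSlimmer full-models.jar slim-models.jar`, where both file names are placeholders. Keeping a whitelist of prefixes rather than deleting specific files means the result stays small even if the full jar gains new models in a later release. If the pipeline then fails to load a resource at startup, add that resource's prefix to the list and rebuild.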
