Tags: java, stanford-nlp, tokenize, named-entity-recognition

Training NER model in stanford-nlp


I have been playing around with Stanford CoreNLP and I would like to train my own NER model. The forums on SO and the official website describe how to do this with a properties file. How would I do it via the API?

import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.*;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentClass;
import edu.stanford.nlp.util.CoreMap;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner");
props.setProperty("regexner.mapping", "resources/customRegexNER.txt");

StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

String processedQuestion = "Who is the prime minister of Australia?";

// Annotation annotation = pipeline.process(processedQuestion);
Annotation document = new Annotation(processedQuestion);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {

    // To get the tokens of the parsed sentence
    for (CoreMap token : sentence.get(TokensAnnotation.class)) {
        String text = token.get(TextAnnotation.class);
        String pos = token.get(PartOfSpeechAnnotation.class);
        String ner = token.get(NamedEntityTagAnnotation.class);
        String sentiment = token.get(SentimentClass.class); // sentiment is normally read per sentence
        String lemma = token.get(LemmaAnnotation.class);
    }
}

  1. How and where do I add the properties file?
  2. N-gram tokenization: how can a phrase such as "prime minister" be treated as a single token, so that one token is passed on to POS and NER instead of two ("prime" and "minister")? (See the mapping-file sketch below.)
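
For reference, the mapping file wired in above as resources/customRegexNER.txt is tab-separated: each line holds a token-level pattern, the NER class to assign, and optionally the classes it may overwrite plus a priority. The entries below are only an illustrative sketch, not part of the original question:

    prime minister	TITLE
    prime minister of Australia	TITLE	MISC	1.0

Note that regexner only relabels the existing tokens; it does not merge "prime" and "minister" into one token. If multi-token entities are needed as single units, one option is to add the entitymentions annotator and read document.get(CoreAnnotations.MentionsAnnotation.class) instead of iterating the raw tokens.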

Solution

  • I think it could work with code along these lines:

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
    props.setProperty("ner.model", "/your/path/ner-model.ser.gz");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
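
    To actually produce a file like /your/path/ner-model.ser.gz from the API rather than from a .prop file, you can pass the same keys to a CRFClassifier as a Properties object. The sketch below is only a rough illustration: the training-file path is hypothetical, and the feature flags are the usual ones from the documented NER property-file examples, so adjust them to your data.

      import java.util.Properties;
      import edu.stanford.nlp.ie.crf.CRFClassifier;
      import edu.stanford.nlp.ling.CoreLabel;

      Properties trainProps = new Properties();
      trainProps.setProperty("trainFile", "resources/ner-train.tsv");  // hypothetical path; one "token<TAB>label" pair per line
      trainProps.setProperty("map", "word=0,answer=1");                // column layout of the training file

      // Feature flags, as in the standard NER .prop examples
      trainProps.setProperty("useClassFeature", "true");
      trainProps.setProperty("useWord", "true");
      trainProps.setProperty("useNGrams", "true");
      trainProps.setProperty("noMidNGrams", "true");
      trainProps.setProperty("maxNGramLeng", "6");
      trainProps.setProperty("usePrev", "true");
      trainProps.setProperty("useNext", "true");
      trainProps.setProperty("useSequences", "true");
      trainProps.setProperty("usePrevSequences", "true");
      trainProps.setProperty("useTypeSeqs", "true");
      trainProps.setProperty("useTypeSeqs2", "true");
      trainProps.setProperty("useTypeySequences", "true");
      trainProps.setProperty("wordShape", "chris2useLC");

      CRFClassifier<CoreLabel> crf = new CRFClassifier<>(trainProps);
      crf.train();                                   // reads trainFile from the properties
      crf.serializeClassifier("/your/path/ner-model.ser.gz");

    The serialized model can then be loaded by the pipeline through the ner.model property shown above.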