
ClearNLP (NLP4J) parser error on execution


I'm trying to train with ClearParser and I get this error. Before executing the command I set the classpath with export CLASSPATH=nlp4j-1.1.0.jar:. (the jar plus the current directory), and running java edu.emory.mathcs.nlp.bin.Version prints the version info, so it's installed correctly.

Command line:

    java -Xmx5g -XX:+UseConcMarkSweepGC edu.emory.mathcs.nlp.bin.NLPTrain \
        -mode dep -c config-train-dep.xml \
        -t /home/iago/Escritorio/idiomasClearParser/UD_English/en-ud-train.conllu \
        -d /home/iago/Escritorio/idiomasClearParser/UD_English/en-ud-dev.conllu \
        -m bestModel-dep.xz

I'm using this config file: https://github.com/emorynlp/nlp4j/blob/master/src/main/resources/edu/emory/mathcs/nlp/configuration/config-train-dep.xml

Error:

    log4j:WARN No appenders could be found for logger (edu.emory.mathcs.nlp.common.util.BinUtils).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    java.io.FileNotFoundException: edu/emory/mathcs/nlp/lexica/en-brown-clusters-simplified-lowercase.xz (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at edu.emory.mathcs.nlp.common.util.IOUtils.createFileInputStream(IOUtils.java:147)
        at edu.emory.mathcs.nlp.common.util.IOUtils.getInputStream(IOUtils.java:316)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.getLexiconFieldPair(GlobalLexica.java:82)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.getLexiconFieldPair(GlobalLexica.java:72)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.<init>(GlobalLexica.java:64)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.<init>(GlobalLexica.java:55)
        at edu.emory.mathcs.nlp.bin.NLPTrain$1.createGlobalLexica(NLPTrain.java:108)
        at edu.emory.mathcs.nlp.component.template.train.OnlineTrainer.train(OnlineTrainer.java:193)
        at edu.emory.mathcs.nlp.component.template.train.OnlineTrainer.train(OnlineTrainer.java:187)
        at edu.emory.mathcs.nlp.bin.NLPTrain.train(NLPTrain.java:76)
        at edu.emory.mathcs.nlp.bin.NLPTrain.main(NLPTrain.java:115)
    java.io.IOException: Stream closed
        at java.io.BufferedInputStream.getInIfOpen(BufferedInputStream.java:159)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at org.tukaani.xz.SingleXZInputStream.initialize(Unknown Source)
        at org.tukaani.xz.SingleXZInputStream.<init>(Unknown Source)
        at org.tukaani.xz.XZInputStream.<init>(Unknown Source)
        at org.tukaani.xz.XZInputStream.<init>(Unknown Source)
        at edu.emory.mathcs.nlp.common.util.IOUtils.createXZBufferedInputStream(IOUtils.java:220)
        at edu.emory.mathcs.nlp.common.util.IOUtils.createObjectXZBufferedInputStream(IOUtils.java:259)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.getLexiconFieldPair(GlobalLexica.java:82)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.getLexiconFieldPair(GlobalLexica.java:72)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.<init>(GlobalLexica.java:64)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.<init>(GlobalLexica.java:55)
        at edu.emory.mathcs.nlp.bin.NLPTrain$1.createGlobalLexica(NLPTrain.java:108)
        at edu.emory.mathcs.nlp.component.template.train.OnlineTrainer.train(OnlineTrainer.java:193)
        at edu.emory.mathcs.nlp.component.template.train.OnlineTrainer.train(OnlineTrainer.java:187)
        at edu.emory.mathcs.nlp.bin.NLPTrain.train(NLPTrain.java:76)
        at edu.emory.mathcs.nlp.bin.NLPTrain.main(NLPTrain.java:115)
    Exception in thread "main" java.lang.NullPointerException
        at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2338)
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2351)
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
        at edu.emory.mathcs.nlp.common.util.IOUtils.createObjectXZBufferedInputStream(IOUtils.java:259)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.getLexiconFieldPair(GlobalLexica.java:82)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.getLexiconFieldPair(GlobalLexica.java:72)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.<init>(GlobalLexica.java:64)
        at edu.emory.mathcs.nlp.component.template.util.GlobalLexica.<init>(GlobalLexica.java:55)
        at edu.emory.mathcs.nlp.bin.NLPTrain$1.createGlobalLexica(NLPTrain.java:108)
        at edu.emory.mathcs.nlp.component.template.train.OnlineTrainer.train(OnlineTrainer.java:193)
        at edu.emory.mathcs.nlp.component.template.train.OnlineTrainer.train(OnlineTrainer.java:187)
        at edu.emory.mathcs.nlp.bin.NLPTrain.train(NLPTrain.java:76)
        at edu.emory.mathcs.nlp.bin.NLPTrain.main(NLPTrain.java:115)

Why am I getting this error? I unpacked the .jar and there is neither a "lexica" folder nor an "en-brown-clusters-simplified-lowercase.xz" file. Where can I find it?
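
For anyone debugging the same thing: the trace shows IOUtils.getInputStream ending up in IOUtils.createFileInputStream, i.e. the lexicon path was finally opened as a plain file relative to the working directory. A quick way to see whether that path is visible at all, either on the classpath or on disk, is a small standalone check like this. It is only a sketch and not part of NLP4J: the class name LexicaCheck is made up, and the only NLP4J-specific detail is the default path, copied from the error message.

```java
import java.io.File;
import java.net.URL;

/**
 * Hypothetical diagnostic (not part of NLP4J): reports whether a lexicon path
 * can be found on the classpath (e.g. inside a jar listed in CLASSPATH) or as
 * a regular file relative to the current working directory.
 */
public class LexicaCheck {

    static String locate(String path) {
        // Classpath lookup: succeeds if a jar or directory on the classpath contains the entry
        URL resource = ClassLoader.getSystemResource(path);
        if (resource != null) {
            return "classpath: " + resource;
        }
        // File-system lookup: relative paths resolve against the working directory
        File file = new File(path);
        if (file.isFile()) {
            return "file: " + file.getAbsolutePath();
        }
        return "NOT FOUND";
    }

    public static void main(String[] args) {
        // Default path taken from the FileNotFoundException in the question
        String path = args.length > 0
                ? args[0]
                : "edu/emory/mathcs/nlp/lexica/en-brown-clusters-simplified-lowercase.xz";
        System.out.println(path + " -> " + locate(path));
    }
}
```

Compile it with javac LexicaCheck.java and run java LexicaCheck with the same CLASSPATH you use for NLPTrain. With only nlp4j-1.1.0.jar:. on the classpath and no such file under the current directory, it should report NOT FOUND, which is consistent with the exception above.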

Regards


Solution

  • I found the solution. The log4j:WARN messages appear because there is no "log4j.properties" file that log4j (used by "nlp4j") can find. To fix them, just create the file in the same folder as the .jar (this works when you run the command from that folder, since "." is on the CLASSPATH) with this simple configuration, and adapt it if you need more detail:

    # Root logger option 
    log4j.rootLogger=INFO, file, stdout
    # Direct log messages to a log file
    log4j.appender.file=org.apache.log4j.RollingFileAppender
    log4j.appender.file.File= /path/to/file
    # Set the maximum file size you need
    log4j.appender.file.MaxFileSize=5MB
    log4j.appender.file.MaxBackupIndex=10
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
    
    # Direct log messages to stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.Target=System.out
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
    

    Moreover, to solve the missing lexica problem (the FileNotFoundException), go to this URL and download the lexica jars: lexica url

    Then, in the config XML, set the correct paths to the downloaded lexica.

    And now it should work. Hope it helps someone.
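
To check the lexicon paths in the config before starting a long training run, something like the following can list them. It assumes, based on the linked config-train-dep.xml and the path in the error, that each lexicon file is given as the text of a child element of a <lexica> element. ListLexica is a hypothetical helper, not an NLP4J class, and its existence check only looks at the file system, so a lexicon meant to be loaded from a jar on the classpath will still be reported as "not found as a file".

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical helper (not part of NLP4J): lists the entries of the <lexica>
 * element of a training config as {element name, path} pairs.
 * Assumption: each lexicon path is the text content of a child of <lexica>.
 */
public class ListLexica {

    static List<String[]> lexiconEntries(InputSource source) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read config: " + e.getMessage(), e);
        }
        List<String[]> entries = new ArrayList<>();
        NodeList lexica = doc.getElementsByTagName("lexica");
        if (lexica.getLength() == 0) {
            return entries; // the config has no <lexica> section
        }
        NodeList children = lexica.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Keep element children only; skip whitespace text nodes and comments
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                entries.add(new String[] {child.getNodeName(), child.getTextContent().trim()});
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String config = args.length > 0 ? args[0] : "config-train-dep.xml";
        InputSource source = new InputSource(new File(config).toURI().toString());
        for (String[] entry : lexiconEntries(source)) {
            // Checks the file system only, relative to the current working directory
            String status = new File(entry[1]).isFile() ? "found" : "not found as a file";
            System.out.println(entry[0] + ": " + entry[1] + " (" + status + ")");
        }
    }
}
```

Run it as java ListLexica config-train-dep.xml from the folder where you start NLPTrain, so relative paths are resolved the same way.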
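
One detail worth knowing when editing log4j.properties: log4j 1.2 reads it with java.util.Properties, where # only starts a comment at the beginning of a line. A note written after a value, such as 5MB #Set what you need, is therefore kept as part of the value, so comments belong on their own line. A minimal stdlib-only demonstration (PropertiesCommentCheck is just an illustrative name):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Shows how java.util.Properties treats '#' that appears after a value. */
public class PropertiesCommentCheck {

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader does no real I/O, but load() declares IOException
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties inline = load("log4j.appender.file.MaxFileSize=5MB #Set what you need\n");
        Properties ownLine = load("# Set what you need\nlog4j.appender.file.MaxFileSize=5MB\n");
        // The trailing text after the value is kept as part of the value
        System.out.println("[" + inline.getProperty("log4j.appender.file.MaxFileSize") + "]");
        // A comment on its own line is ignored, leaving a clean value
        System.out.println("[" + ownLine.getProperty("log4j.appender.file.MaxFileSize") + "]");
    }
}
```

Running it prints [5MB #Set what you need] for the inline version and [5MB] for the version with the comment on its own line.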