java maven stanford-nlp dl4j

Error on non-English (nonsense) sentence with DL4J and NLP


I am trying to run the sample program from the DL4J examples. Here is the program: https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment/Word2VecSentimentRNN.java
I have made only one simple tweak, so that it reads input continuously from the command line.
When I enter a proper English sentence, it outputs the sentiment. But when I type something that is not made of real words, it throws an exception.
Here is an example:

eweweerfsd dfddfdr
Exception in thread "main" org.nd4j.linalg.exception.ND4JIllegalStateException: Invalid shape: Requested INDArray shape [1, 300, 0] contains dimension size values < 1 (all dimensions must be 1 or more)
    at org.nd4j.linalg.factory.Nd4j.checkShapeValues(Nd4j.java:4654)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4644)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3810)
    at sf.sentiment.analyzer.core.SentimentAnalyser.getDataSet(SentimentAnalyser.java:77)
    at sf.sentiment.analyzer.core.SentimentAnalyser.predict(SentimentAnalyser.java:46)
    at sf.sentiment.analyzer.SentimentAnalysis.main(SentimentAnalysis.java:59)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)

How can I avoid this type of problem? How can I decide whether a sentence should be passed to the program, or whether I should just report that the sentence is not valid? How can I tell that it contains no spelling mistakes? In short, how do I judge a sentence before giving it to the program as input?
Kindly suggest a solution.


Solution

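  • My guess is that you are submitting words that aren't in the word2vec vocabulary, so no word vector can be found for eweweerfsd. The shape [1, 300, 0] in the exception fits that explanation: 300 is presumably the word vector size, and the last dimension, presumably the number of usable tokens in the sentence, is zero. Simple solutions would be to skip sentences that contain unknown vocabulary words, to remove the unknown words (and skip the sentence if nothing is left), or to replace the unknown words with a rare word that is in the word2vec vocabulary.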
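
The "remove unknown words, and skip the sentence if nothing is left" option might look roughly like the sketch below. It is not taken from the asker's SentimentAnalyser class, which is not shown. The class name KnownTokenFilter, the method knownTokens, and the small stand-in vocabulary are all hypothetical, and splitting on whitespace with lowercasing is a simplification of whatever tokenizer the real program uses. With DL4J, the vocabulary check could be wordVectors::hasWord, assuming the model is loaded as a WordVectors instance as in the linked example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class KnownTokenFilter {

    /** Returns only the tokens that the vocabulary check accepts (possibly an empty list). */
    static List<String> knownTokens(String sentence, Predicate<String> inVocabulary) {
        List<String> kept = new ArrayList<>();
        // Simplified tokenization: lowercase and split on whitespace.
        for (String token : sentence.trim().toLowerCase().split("\\s+")) {
            if (!token.isEmpty() && inVocabulary.test(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in vocabulary; with DL4J, pass wordVectors::hasWord instead.
        Set<String> vocab = new HashSet<>(Arrays.asList("the", "movie", "was", "great"));

        for (String input : new String[] {"The movie was great", "eweweerfsd dfddfdr"}) {
            List<String> tokens = knownTokens(input, vocab::contains);
            if (tokens.isEmpty()) {
                // Nothing to build a feature array from: report it instead of calling Nd4j.create.
                System.out.println("No known words in: " + input);
            } else {
                System.out.println("Would classify using tokens: " + tokens);
            }
        }
    }
}
```

The point of the guard is that the feature array is only created when at least one token survives, so a zero-length dimension is never requested. The check does not detect spelling mistakes as such; it only tells you whether the model has a vector for each word.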