Tags: java, nlp, grammar, text-processing, languagetool

Detecting meaningless and/or grammatically incorrect sentences with LanguageTool


I need to check spelling and grammar in texts, so I started using the LanguageTool API. I am running the start-up code they provide, as follows:

JLanguageTool langTool = new JLanguageTool(Language.ENGLISH);
langTool.activateDefaultPatternRules();
List<RuleMatch> matches = langTool.check("Eat I rice " +
    "every day and go school to good as a boy");
for (RuleMatch match : matches) {
  System.out.println("Potential error at line " +
      match.getEndLine() + ", column " +
      match.getColumn() + ": " + match.getMessage());
  System.out.println("Suggested correction: " +
      match.getSuggestedReplacements());
}

I don't get any errors. Sorry if I am wrong, but is "Eat I rice every day and go school to good as a boy" a grammatically correct sentence? Either way, is there any way to detect such sentences (meaningless and/or grammatically incorrect) with the tool?


Solution

  • LanguageTool is rule-based. Evidently the sentence "Eat I rice every day and go school to good as a boy" is not caught by any of its rules yet.

    http://wiki.languagetool.org/tips-and-tricks explains how to add user-defined rules to LanguageTool.

    Here is an example of such a rule:

    <rule>
      <pattern>
        <token>
          <exception regexp="yes">(that|ha[ds]|will|must|could|can|should|would|does|did|may|might|t|let)</exception>
          <exception inflected="yes" regexp="yes">feel|hear|see|watch|prevent|help|stop|be</exception>
          <exception postag="C[CD]|IN|DT|MD|NNP|\." postag_regexp="yes"></exception>
          <exception scope="previous" postag="PRP$"/>
        </token>
        <token postag="NNP" regexp="yes">.{2,}<exception postag="JJ|CC|RP|DT|PRP\$?|NNPS|NNS|IN|RB|WRB|VBN" postag_regexp="yes"></exception></token>
        <marker>
          <token postag="VB|VBP" postag_regexp="yes" regexp="yes">\p{Lower}+<exception postag="VBN|VBD|JJ|IN|MD" postag_regexp="yes"></exception></token>
        </marker>
        <token postag="IN|DT" postag_regexp="yes"></token>
      </pattern>
      <message>The proper name in singular (<match no="2"></match>) must be used with a third-person verb: <suggestion><match no="3" postag="VBZ"></match></suggestion>.</message>
      <short>Grammatical problem</short>
      <example correction="walks" type="incorrect">Ann <marker>walk</marker> to the building.</example>
      <example type="correct">Bill <marker>walks</marker> to the building.</example>
      <example type="correct">Guinness <marker>walked</marker> to the building.</example>
      <example type="correct">Roosevelt and Hoover speak each other's lines.</example>
      <example type="correct">Boys are at higher risk for autism than girls.</example>
      <example type="correct">In reply, he said he was too old for this.</example>
      <example type="correct">I can see Bill looking through the window.</example>
      <example type="correct">Richard J. Hughes made his Morris County debut in his bid for the Democratic gubernatorial elections.</example>
      <example type="correct">... last night got its seven-concert Beethoven cycle at Carnegie Hall off to a good start.</example>
      <example type="correct">... and through knowing Him better to become happier and more effective people.</example>
      <!-- TODO: Fix false-positive: The library and Medical Center are to the north.-->
      <!-- The present Federal program of vocational education began in 1917. -->
    </rule>
    

    There is an online rule editor available at

    http://community.languagetool.org/ruleEditor2/

    A simple rule that catches the start of the sentence in question would be:

    <!-- English rule, 2014-09-19 --> 
    <rule id="ID" name="EatI"> 
       <pattern> <token>Eat</token> <token>i</token> </pattern> 
       <message>Instead of <match no="1"/> <match no="2"/> it should be <match no="2"/> <match no="1"/></message> 
      <url>http://stackoverflow.com/questions/13016469/detecting-meaningless-and-or-grammatically-incorrect-sentence-with-languagetool/25933907#25933907</url> 
      <short>wrong order of verb and noun</short> 
      <example type='incorrect'><marker>Eat i</marker> rice</example> <example type='correct'>I eat rice</example> 
    </rule>
    

    Of course, this rule only covers the verb "Eat", but I hope it gives you a picture of how the rules work.
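
To make the rule-based point concrete, here is a minimal Java sketch of what a token-sequence rule does. It is a toy illustration, not LanguageTool's implementation: `ToyTokenRule` and `findMatches` are hypothetical names, tokenization is assumed to be plain whitespace splitting, and tokens are compared case-insensitively. It reports the token positions where a fixed sequence such as "eat" followed by "i" occurs.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyTokenRule {

    // Returns the token index of every place where the pattern tokens
    // occur consecutively in the sentence (case-insensitive comparison).
    static List<Integer> findMatches(String sentence, String... pattern) {
        // Assumption: whitespace splitting stands in for a real tokenizer.
        String[] tokens = sentence.trim().split("\\s+");
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + pattern.length <= tokens.length; i++) {
            boolean allEqual = true;
            for (int j = 0; j < pattern.length; j++) {
                if (!tokens[i + j].equalsIgnoreCase(pattern[j])) {
                    allEqual = false;
                    break;
                }
            }
            if (allEqual) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Eat I rice every day and go school to good as a boy";
        // The "Eat I" pattern is found at the first token.
        System.out.println(findMatches(text, "eat", "i"));   // prints [0]
        // No pattern describes "go school to good", so nothing is reported for it.
        System.out.println(findMatches(text, "go", "i"));    // prints []
    }
}
```

The second call shows the limitation discussed above: a rule-based checker only flags what some rule describes, so the rest of the sentence goes unreported until further rules are written for it.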