
IllegalArgumentException while highlighting a PDF using PDFClown


I've been working on highlighting text in PDFs using PDFClown, and it mostly works fine. In a few cases, however, it throws the exception shown in the stack trace below:

Exception in thread "main" java.lang.IllegalArgumentException: Comparison method violates its general contract!
    at java.util.TimSort.mergeLo(Unknown Source)
    at java.util.TimSort.mergeAt(Unknown Source)
    at java.util.TimSort.mergeCollapse(Unknown Source)
    at java.util.TimSort.sort(Unknown Source)
    at java.util.TimSort.sort(Unknown Source)
    at java.util.Arrays.sort(Unknown Source)
    at java.util.Collections.sort(Unknown Source)
    at org.pdfclown.tools.TextExtractor.sort(TextExtractor.java:633)
    at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:284)
    at org.pdfclown.samples.cli.TextHighlightSample.run(TextHighlightSample.java:60)
    at com.dhawan.poc.Highlight.main(Highlight.java:9)

Link to PDF File

Any idea how I can resolve this?


Solution

  • Which version of PDFClown are you using? Your stack trace does not match the current code at http://svn.code.sf.net/p/clown/code/trunk/java/pdfclown.lib, which instead contains the following comparison used for sorting:

    public int compare(
      ITextString textString1,
      ITextString textString2
      )
    {
      Rectangle2D box1 = textString1.getBox();
      Rectangle2D box2 = textString2.getBox();
      if(isOnTheSameLine(box1,box2))
      {
        /*
          [FIX:55:0.1.3] In order not to violate the transitive condition, equivalence on x-axis
          MUST fall back on y-axis comparison.
        */
        int xCompare = Double.compare(box1.getX(), box2.getX());
        if(xCompare != 0)
          return xCompare;
      }
      return Double.compare(box1.getY(), box2.getY());
    }
    

    (http://svn.code.sf.net/p/clown/code/trunk/java/pdfclown.lib/src/org/pdfclown/tools/TextExtractor.java at revision 121)

    This fix was introduced on May 5th, 2014. If you have a PDFClown version older than 0.1.3, or a 0.1.3 build from before that date, you should update PDFClown.
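
    To see why the y-axis fallback matters, here is a minimal sketch of the kind of tie that makes TimSort complain. The pre-fix comparator shown here is an assumption about what the older code looked like (same-line boxes compared by x only), and `isOnTheSameLine` is a hypothetical stand-in based on vertical overlap, not PDFClown's actual implementation. The class name and the three sample boxes are made up for illustration.

```java
import java.awt.geom.Rectangle2D;
import java.util.Comparator;

public class ComparatorContractDemo {
  // Hypothetical stand-in for PDFClown's line test:
  // two boxes count as "on the same line" if they overlap vertically.
  static boolean isOnTheSameLine(Rectangle2D a, Rectangle2D b) {
    return a.getMinY() < b.getMaxY() && b.getMinY() < a.getMaxY();
  }

  // Assumed pre-fix shape: same-line boxes are compared by x only,
  // so boxes with equal x on the same line compare as 0.
  static final Comparator<Rectangle2D> BEFORE_FIX = (a, b) ->
      isOnTheSameLine(a, b)
          ? Double.compare(a.getX(), b.getX())
          : Double.compare(a.getY(), b.getY());

  // Same logic as the comparator quoted above: x ties fall back on y.
  static final Comparator<Rectangle2D> AFTER_FIX = (a, b) -> {
    if (isOnTheSameLine(a, b)) {
      int xCompare = Double.compare(a.getX(), b.getX());
      if (xCompare != 0) {
        return xCompare;
      }
    }
    return Double.compare(a.getY(), b.getY());
  };

  // Three boxes with the same x. A overlaps B and B overlaps C vertically,
  // but A and C do not overlap.
  static final Rectangle2D A = new Rectangle2D.Double(10, 0, 50, 10);
  static final Rectangle2D B = new Rectangle2D.Double(10, 6, 50, 10);
  static final Rectangle2D C = new Rectangle2D.Double(10, 12, 50, 10);

  // Comparator contract: if compare(A, B) == 0, then A and B must
  // compare the same way against any third element C.
  static boolean tiesAreConsistent(Comparator<Rectangle2D> c) {
    return c.compare(A, B) != 0
        || Integer.signum(c.compare(A, C)) == Integer.signum(c.compare(B, C));
  }

  public static void main(String[] args) {
    System.out.println("before fix: " + tiesAreConsistent(BEFORE_FIX));
    System.out.println("after fix: " + tiesAreConsistent(AFTER_FIX));
  }
}
```

    With the assumed pre-fix comparator, A equals B and B equals C, yet A sorts before C, which is exactly the sort of inconsistency behind "Comparison method violates its general contract!". With the y-axis fallback the three boxes are ordered consistently. This only checks one triple; it is an illustration of the failure mode, not a proof that the fixed comparator is consistent for every input.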