
IKVM C# Tika Implementation - NoClassDefFoundError - sun.java2d.Disposer


I have a small library that utilizes IKVM to run Tika (1.2) for the purposes of extracting text and metadata for use within Lucene. I grab document and image paths from a CMS we are using, and pass them through here:

public TextExtractionResult Extract(string filePath)
{
    var parser = new AutoDetectParser();
    var metadata = new Metadata();
    var parseContext = new ParseContext();
    Class parserClass = parser.GetType();
    parseContext.set(parserClass, parser);

    try
    {
        // Attempt to fix ImageParser "NoClassDefFoundError"
        java.lang.System.setProperty("java.awt.headless", "true");

        var file = new File(filePath);
        var url = file.toURI().toURL();
        // The using block disposes the stream, so no explicit close() is needed.
        using (InputStream inputStream = TikaInputStream.get(url, metadata))
        {
            parser.parse(inputStream, getTransformerHandler(), metadata, parseContext);
        }

        return AssembleExtractionResult(_outputWriter.toString(), metadata);
    }
    catch (Exception ex)
    {
        throw new ApplicationException("Extraction of text from the file '{0}' failed.".ToFormat(filePath), ex);
    }
}

It fails only when the file is a .png, throwing a NoClassDefFoundError for sun.java2d.Disposer.

It most likely comes from Tika's ImageParser.
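
One way to check that guess is to log the full exception chain, since Extract wraps the original failure in an ApplicationException. The following is a minimal sketch, assuming the underlying error is reachable through InnerException; the ExceptionChain helper is hypothetical and not part of Tika or IKVM:

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper; the name is an assumption for illustration.
public static class ExceptionChain
{
    // Walks the InnerException links and returns one line per exception,
    // outermost first, in the form "Full.Type.Name: message".
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        for (var current = ex; current != null; current = current.InnerException)
        {
            sb.AppendLine(current.GetType().FullName + ": " + current.Message);
        }
        return sb.ToString();
    }
}
```

Calling ExceptionChain.Describe(ex) where the ApplicationException is caught lists each exception type and message, which should show whether the failure really originates in the image parsing path.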

For those who are interested, here is getTransformerHandler():

private TransformerHandler getTransformerHandler()
{
    var factory = TransformerFactory.newInstance() as SAXTransformerFactory;
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "text");
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
    // The encoding belongs under ENCODING; setting INDENT a second time overwrote "yes".
    handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");

    _outputWriter = new StringWriter();
    handler.setResult(new StreamResult(_outputWriter));
    return handler;
}

I have looked around and keep being pointed in the direction of running headless, so I already tried that, with no luck. Because this is a C# implementation in IKVM, is something missing? It works on all other documents as far as I can tell (.jpeg, .docx, .pdf, etc.).

Thanks to those who know more about Tika + IKVM implementations than I do.


Solution

  • Apache Tika 1.2 was released back on 17 July 2012, and there have been a lot of fixes and improvements since then.

    You should upgrade to the most recent version of Apache Tika (1.12 as of writing), which should solve your issue.