python, apache-tika

How to manually install the .jar file for tika?


I am using tika to extract text from PDFs in Python, but it downloads the .jar file on every run, which is time-consuming.

[MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar to /tmp/tika-server.jar.

This happens every time I run the code. Is there a way to install the .jar manually once and stop tika from downloading it every time?


Solution

  • I know it's been a while and you have probably figured something out already, but for others like me who are still looking for a solution, I would like to suggest another topic in which the person asking the question presents his own working approach.

    Moreover, I noticed that tika needs internet access only on the very first run, so if you manage to deny it internet access after everything is set up, it won't waste time downloading new files.
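
    The "set it up once" idea can also be sketched in code. This is a minimal sketch, assuming the tika-python package reads the TIKA_SERVER_JAR and TIKA_PATH environment variables when it is imported (its README documents both), and assuming you have already saved the jar somewhere permanent. The helper names use_local_tika_jar and extract_text and the paths in the usage note are made up for illustration. Depending on the library version, it may also look for a matching tika-server.jar.md5 file next to the jar, so treat this as a starting point rather than a guaranteed fix.

```python
import os
from pathlib import Path


def use_local_tika_jar(jar_path):
    """Point tika-python at an already-downloaded server jar.

    Call this BEFORE importing tika: the library is assumed to read
    these environment variables at import time.
    """
    jar = Path(jar_path).expanduser().resolve()
    # A file:// URL tells the library to take the jar from disk
    # instead of fetching it from Maven.
    os.environ["TIKA_SERVER_JAR"] = jar.as_uri()
    # Directory where the library keeps its working copy of the jar
    # (it defaults to the system temp directory, which may get cleaned).
    os.environ["TIKA_PATH"] = str(jar.parent)
    return jar


def extract_text(pdf_path, jar_path):
    """Hypothetical helper: configure the jar, then parse one file."""
    use_local_tika_jar(jar_path)
    from tika import parser  # imported only after the environment is set
    return parser.from_file(pdf_path)["content"]
```

    For example, extract_text("report.pdf", "~/tika/tika-server.jar") would parse report.pdf using the jar stored under ~/tika, provided both files exist and Java is installed. Keeping the jar outside /tmp matters because the log above shows the default download target is /tmp/tika-server.jar, which many systems clear on reboot.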