I want to read the source code of jar files and extract word frequencies. I know that it is possible to read the contents of jar files with Java editors, but I want to do this automatically with a Python script.
Do you require a Python library specifically? Krakatau is a command-line tool written in Python for decompiling .jar files; you can perhaps import it and use the relevant functions from inside your script.
Alternatively, you can call it, or any other command-line .jar decompiler such as Procyon, using Python's subprocess module.
In the second case, you will most likely want to redirect and capture stdout and/or stderr. A basic call may look something like:
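from subprocess import Popen, PIPE

# ...

# 'jar_decompiler' and its parameters are placeholders; add further arguments as needed.
# universal_newlines=True makes communicate() return text (str) rather than bytes.
process = Popen(['jar_decompiler', '1stparam', '2ndparam'], stdout=PIPE, universal_newlines=True)
jar_decompiler_output = process.communicate()[0].splitlines()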
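Note that communicate() returns a tuple (stdout_data, stderr_data), which is why the result is indexed with [0]; stderr_data is None unless you also pass stderr=PIPE.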
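
Once you have the decompiled source as a list of lines, counting word frequencies could look roughly like the sketch below. It assumes that a "word" means a Java-style identifier or keyword; the regex is my own assumption, so adjust it if you want to split camelCase names or skip keywords. The names word_frequencies and sample_output are made up for illustration, with sample_output standing in for the decompiler output.

```python
import re
from collections import Counter

# Assumption: a "word" is a Java-style identifier or keyword.
# Numeric literals and punctuation are not counted.
WORD_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def word_frequencies(lines):
    """Count how often each identifier-like word occurs in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_PATTERN.findall(line))
    return counts


# Hypothetical sample standing in for the captured decompiler output.
sample_output = [
    "public class Foo {",
    "    public int bar() { return 1; }",
    "}",
]

frequencies = word_frequencies(sample_output)
# Print the three most frequent words with their counts.
print(frequencies.most_common(3))
```

In a real script you would pass the list of lines captured from the decompiler instead of sample_output. Counter.most_common(n) gives you the n most frequent words, which is usually what you want for a frequency report.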