
2015: Markup Language for Analyzing Java Source Code (JavaML not working)


Question: Do you know of a tool that can mark up Java source code without too many compatibility issues while keeping most of the program's structure intact? Alternatively, do you know how to make JavaML or JavaML 2.0 work?
The tool should ideally be able to either process many projects or be scripted into working over many projects.
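
To make the batch requirement concrete, here is a minimal sketch of the input side, assuming all projects are checked out as directories under one common root. The class and method names (`JavaSourceFinder`, `findJavaSources`) are made up for illustration; whatever markup tool is used would then be run once per returned path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects the .java files of many projects under one root.
final class JavaSourceFinder {

    /** Returns every regular file ending in .java below root, sorted for a stable order. */
    static List<Path> findJavaSources(Path root) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".java"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the directory holding all projects.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path source : findJavaSources(root)) {
            System.out.println(source);
        }
    }
}
```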

Explanation: I am doing research on a large set of Java source code (about 20,000 projects). For the research to produce any results, I need to be able to identify comments and the different parts of the code. For example, I need to differentiate between function declarations, function calls, variable declarations, variable usages, if-blocks and so forth. At its core, this is what JavaML (Java Markup Language) does.

Example:

import java.applet.*;   // do not forget this import statement!
import java.awt.*;      // Or this one for the graphics!


public class FirstApplet extends Applet {
  // this method displays the applet.
  // the Graphics class is how you do all the drawing in Java
  public void paint(Graphics g) {
    g.drawString("FirstApplet", 25, 50);
  }
}

Becomes:

<java-source-program>
    <java-class-file name="FirstApplet.java">
        <import module="java.applet.*"/>
        <import module="java.awt.*"/>
        <class name="FirstApplet" visibility="public" line="5" col="0" end-line="11" end-col="0" comment="// do not forget this import statement!// Or this one for the graphics!">
            <superclass name="Applet"/>
            <method name="paint" visibility="public" id="FirstApplet:mth-15" line="8" col="2" end-line="10" end-col="2" comment="// this method displays the applet.// the Graphics class is how you do all the drawing in Java">
                <type name="void" primitive="true"/>
                <formal-arguments>
                    <formal-argument name="g" id="FirstApplet:frm-13">
                        <type name="Graphics"/>
                    </formal-argument>
                </formal-arguments>
                <block line="8" col="32" end-line="10" end-col="2" comment="// do not forget this import statement!// Or this one for the graphics!// this method displays the applet.// the Graphics class is how you do all the drawing in Java">
                    <send message="drawString">
                        <target>
                            <var-ref name="g" idref="FirstApplet:frm-13"/>
                        </target>
                        <arguments>
                            <literal-string value="FirstApplet"/>
                            <literal-number kind="integer" value="25"/>
                            <literal-number kind="integer" value="50"/>
                        </arguments>
                    </send>
                </block>
            </method>
        </class>
    </java-class-file>
</java-source-program>

But here comes the problem. I have been trying to make JavaML and JavaML 2.0 work, but there are some quite clear compatibility problems. For JavaML, I tried virtual machines running older and newer Ubuntu releases (10.04, 12.04 and 14.04) in an attempt to compile the source code as instructed on JavaML's website. On every release I get errors while configuring: the version of Jikes used by JavaML seems to trigger issues with the g++ compiler. Using a newer version of Jikes renders the patch from JavaML worthless, and thereby makes compiling JavaML impossible.

JavaML 2.0 comes with an .exe file that can be run on Windows. You just have to set it up with the correct path to your Java installation (see below for instructions). However, this also gives me problems. With the newest Java (1.8.0_40) it tells me: 'chaos: CODE "15" is an invalid tag !!!' When I set it up with Java versions 1.5.0_14, 1.5.0_12, 1.5.0, 1.4.2_19 and 1.3.1_28, the .exe file crashes, but first produces a .tok file and an empty .xml file.

Instructions for JavaML 2.0

  1. Download the JavaML 2.0 project
  2. Extract to somewhere
  3. Start your cmd (Command prompt)
  4. Navigate to the folder you placed your JavaML 2.0 project in
  5. Find your Java installation (typically stored at: C:\Program Files (x86)\Java)
  6. Find your rt.jar file (Typically stored at: C:\Program Files (x86)\Java\jre1.8.0_40\lib\rt.jar)
  7. Write the following lines in your cmd

set CLASSPATH=.;C:\Program Files (x86)\Java\jre1.8.0_40\lib\rt.jar 
jikes +B +L +c +T=3 +ulx FirstTest.java

Solution

  • If anyone is still looking this problem up, I wanted to make sure there was some sort of answer.

    In my research I couldn't find a tool that acts like JavaML, and I couldn't make JavaML work on any newer system. Instead I created my own tool in Java, which has given me quite some headaches, and it is most certainly not worth publishing. Creating such a tool by hand took me about 30 man-hours.

    If you really need a tool that acts like JavaML I suggest that you customize a parser, as also suggested by immibis.

    I was told by a friend to take a look at the OpenJDK compiler and customize it. There is a guide to customization of the compiler found here. This is, however, a task for people who understand languages, syntax and compilers at a deep level.

    Good luck.
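
For anyone going the compiler route, below is a minimal sketch of the idea. It is not the tool I wrote and not JavaML's output format. It assumes a full JDK (not a bare JRE) and uses the JDK's Compiler Tree API (`com.sun.source`) to parse a file without resolving types, then prints one flat element per declaration, call, identifier and if-statement. The class name `JavaMarkupSketch`, the method `toMarkup` and the element names are made up for illustration. Two limitations to be aware of: because nothing is resolved, an `identifier` may be a variable or a type name, and as far as I know ordinary `//` comments are not part of this tree, so comments would need separate handling.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.IfTree;
import com.sun.source.tree.ImportTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical sketch: flat, JavaML-like markup from javac's parse tree.
final class JavaMarkupSketch {

    /** Parses one source text (no type resolution); fileName is only a label. */
    static List<String> toMarkup(String fileName, String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler; run on a JDK, not a JRE");
        }
        // In-memory file object so javac reads the string instead of the disk.
        SimpleJavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<String> out = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
                private void emit(String element, Object name, Tree node) {
                    long start = positions.getStartPosition(unit, node);
                    long line = unit.getLineMap().getLineNumber(start);
                    String attr = name == null ? "" : " name=\"" + name + "\"";
                    out.add("<" + element + attr + " line=\"" + line + "\"/>");
                }
                @Override public Void visitImport(ImportTree node, Void p) {
                    emit("import", node.getQualifiedIdentifier(), node);
                    return null; // do not report the package parts as identifiers
                }
                @Override public Void visitClass(ClassTree node, Void p) {
                    emit("class", node.getSimpleName(), node);
                    return scan(node.getMembers(), p); // skip extends/implements types
                }
                @Override public Void visitMethod(MethodTree node, Void p) {
                    emit("method", node.getName(), node);
                    scan(node.getParameters(), p);
                    return scan(node.getBody(), p); // body is null for abstract methods
                }
                @Override public Void visitVariable(VariableTree node, Void p) {
                    emit("variable-declaration", node.getName(), node);
                    return scan(node.getInitializer(), p); // skip the declared type
                }
                @Override public Void visitMethodInvocation(MethodInvocationTree node, Void p) {
                    Tree select = node.getMethodSelect();
                    if (select instanceof MemberSelectTree) {
                        MemberSelectTree member = (MemberSelectTree) select;
                        emit("method-call", member.getIdentifier(), node);
                        scan(member.getExpression(), p); // receiver, e.g. g in g.drawString(...)
                    } else if (select instanceof IdentifierTree) {
                        emit("method-call", ((IdentifierTree) select).getName(), node);
                    }
                    return scan(node.getArguments(), p);
                }
                @Override public Void visitIdentifier(IdentifierTree node, Void p) {
                    emit("identifier", node.getName(), node);
                    return null;
                }
                @Override public Void visitIf(IfTree node, Void p) {
                    emit("if", null, node);
                    return super.visitIf(node, p);
                }
            };
            scanner.scan(unit.getImports(), null);
            scanner.scan(unit.getTypeDecls(), null);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the path of a .java file to mark up.
        String source = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        toMarkup("Input.java", source).forEach(System.out::println);
    }
}
```

Running it on a file prints lines such as `<method name="paint" line="8"/>`. Getting from this to nested, JavaML-style output with cross-referenced ids would need the attribution phase of the compiler as well, which is where the real work starts.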