
Reading an SPSS file in Java


  SPSSReader reader = new SPSSReader(args[0], null);
  Iterator it = reader.getVariables().iterator();
  while (it.hasNext()) {
      System.out.println(it.next());
  }

I am using this SPSSReader to read an SPSS file. Every variable name is printed with some junk characters appended to it.

Obtained Result :

StringVariable: nameogr(nulltpc{)(10)
NumericVariable: weightppuo(nullf{nd)
DateVariable: datexsgzj(nulllanck)
DateVariable: timeppzb(null|wt{l)
DateVariable: datetimegulj{(null|ns)
NumericVariable: commissionyrqh(nullohzx)
NumericVariable: priceeub{av(nullvlpl)

Expected Result :

 StringVariable: name (10)
 NumericVariable: weight
 DateVariable: date
 DateVariable: time
 DateVariable: datetime
 NumericVariable: commission
 NumericVariable: price

Thanks in advance :)


Solution

  • I tried to reproduce the issue and got the same result.
    The library requires a license (see here), and the regular download contains only a demo version for evaluation (see the licensing note before the download). I would assume that the junk characters are the developers' way of ensuring that a license is bought.

    As that library is rather old (the website's copyright is 2003-2008, it requires Java 1.2, and it uses Vectors and no generics), I would recommend a different library, as long as you are not limited to the one used in your question.

    After a quick search, I found an open source SPSS reader here, which is also available through Maven here.

    Using the example on the github page, I put this together:

    import com.bedatadriven.spss.SpssDataFileReader;
    import com.bedatadriven.spss.SpssVariable;
    
    public class SPSSDemo {
    
        public static void main(String[] args) {
            try {
                SpssDataFileReader reader = new SpssDataFileReader(args[0]);
    
                for (SpssVariable var : reader.getVariables()) {
                    System.out.println(var.getVariableName());
                }
    
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }
    

    I wasn't able to find anything that prints NumericVariable or similar, but since those were class names of the library you were using in the question, I assume they are not standardized by SPSS. If they are, you will either find something like that in the library or you can open an issue on the GitHub page.

    Using the employees.sav file from here I got this output from the code above using the open source library:

    resp_id
    gender
    first_name
    last_name
    date_of_birth
    education_type
    education_years
    job_type
    experience_years
    monthly_income
    job_satisfaction
    

    No more additional characters!

    Edit regarding the comment:

    That is correct. I read up on SPSS, though, and from my understanding there are only string and numeric variables, which are then formatted in different ways. The version published on Maven only gives you access to the type code of a variable (to be honest, I have no idea what that is), but the GitHub version also exposes the formats, since write and print formats have been introduced there. Unfortunately, that version (1.3-SNAPSHOT) does not appear to be published on Maven.

    You can clone or download the library, run mvn clean package (assuming you have Maven installed), and use the generated library (found under target\spss-reader-1.3-SNAPSHOT.jar) in your project to have the methods SpssVariable#getPrintFormat and SpssVariable#getWriteFormat available.

    Those return an SpssVariableFormat, from which you can get more information. As I don't know the details of these formats, the best I can do is link you to the source here, where the references to what was implemented should help you further. I assume that this link, which is referenced in the documentation of SpssVariableFormat#getType, is probably the most helpful for determining what kind of format you have.
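
    To illustrate what "determining the kind of format" could look like, here is a minimal, library-independent sketch. It assumes the layout described in the PSPP documentation of the system file format as I understand it: a print or write format is packed into one 32-bit integer (decimals in the lowest byte, width in the next, format type in the third), with type codes 1 and 2 being string formats (A, AHEX) and 20-30 plus 38-39 being date and time formats. The class SpssFormatSketch and its methods are hypothetical and not part of any SPSS library, and the code list may be incomplete, so please check it against the documentation before relying on it.

    ```java
    // Hypothetical helper for illustration only; not part of spss-reader or any other library.
    final class SpssFormatSketch {

        // Assumed layout of a packed format: byte 2 = type, byte 1 = width, byte 0 = decimals.
        static int type(int packed)     { return (packed >> 16) & 0xFF; }
        static int width(int packed)    { return (packed >> 8) & 0xFF; }
        static int decimals(int packed) { return packed & 0xFF; }

        // Rough classification by type code (assumption based on the PSPP format list).
        static String category(int typeCode) {
            if (typeCode == 1 || typeCode == 2) {
                return "string";     // A, AHEX
            }
            if ((typeCode >= 20 && typeCode <= 30) || typeCode == 38 || typeCode == 39) {
                return "date/time";  // DATE, TIME, DATETIME, ADATE, ..., EDATE, SDATE
            }
            return "numeric";        // everything else, e.g. F (5), COMMA (3), E (17)
        }

        public static void main(String[] args) {
            int packed = (5 << 16) | (8 << 8) | 2; // F8.2: type 5, width 8, 2 decimals
            System.out.println(category(type(packed))
                    + " width=" + width(packed)
                    + " decimals=" + decimals(packed));
            // prints "numeric width=8 decimals=2"
        }
    }
    ```

    If SpssVariableFormat#getType returns one of these type codes, passing it to category would give you roughly the String/Numeric/Date distinction from the question, but I have not verified that against the library.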

    If absolutely nothing works with that, I guess you could also use the demo version of the library from the question and determine the type through it.next().getClass().getSimpleName(), but I would resort to that only if there is no other way to determine the format.
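
    A minimal sketch of that last-resort idea, so it is clear what getClass().getSimpleName() would give you. The nested classes below are hypothetical stand-ins named after the class names printed in the question; with the actual demo library you would iterate over reader.getVariables() instead of the hand-made list.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    final class SimpleNameSketch {

        // Hypothetical stand-ins for the variable classes of the library in the question.
        static class StringVariable {}
        static class NumericVariable {}
        static class DateVariable {}

        // Collects the simple class name of every element, ignoring its toString() output.
        static List<String> kinds(Iterable<?> variables) {
            List<String> result = new ArrayList<>();
            for (Object variable : variables) {
                result.add(variable.getClass().getSimpleName());
            }
            return result;
        }

        public static void main(String[] args) {
            List<Object> variables = Arrays.asList(
                    new StringVariable(), new NumericVariable(), new DateVariable());
            System.out.println(kinds(variables));
            // prints "[StringVariable, NumericVariable, DateVariable]"
        }
    }
    ```

    Since only the class name is used, the junk characters that the demo version adds to the printed variable descriptions do not matter here.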