
WEKA Preprocess and Predicting student grade


I'm working on an application that predicts a student's grade for each year based on the student's grades. Grades range from 70 to 97. My dataset contains student grades from 1st year to 5th year: GWA1 is the 1st Year General Weighted Average, GWA2 is the 2nd Year General Weighted Average, and so on, up to GWA5.

I'm planning to use J48 for the prediction.

I have several issues:

  1. My dataset contains students at different year levels. If a student is currently in 4th year, then GWA4 and GWA5 are zero. Should I only use graduated students (students who have GWA1-GWA5)?
  2. Some courses run for only 4 years. For those students, GWA5 is 0 in the dataset.
  3. The program also needs to predict the grade for the student's current year. For example, for a 2nd year student, the program predicts GWA2. For a 4th year student, the program predicts GWA4.
  4. How do I preprocess the data? Should I map the grades to labels such as Excellent, Average, Poor?

Sample Dataset:

GWA1     GWA2     GWA3     GWA4     GWA5
83.6     87.5     90.2     89.1     91.2
76.4     78.2     77.6     80.9     79.4
93.6     91.5     92.7     91.1     92.7

Solution

  • Most importantly, your dataset is not tidy. Columns need to be converted to rows, e.g. into this table layout:

    student  year  gpa   passed
    s1       1     83.6  yes
    s1       2     76.4  no
    ...

    I made the "passed" column up. But this design is easier for J48 to work with in its default configuration, which classifies categorical or binary variables: J48 can't handle a numeric "class" attribute (the attribute to be predicted).

    I'm not sure whether J48 can also predict numeric values. I think Weka and the dataset would need to be tweaked quite a bit for that.

    Check the customization dialogs, read the documentation, and google "classification by regression". Or better yet, use the LinearRegression classifier instead of J48.

    In any case, in the "Filter" panel, you need to add the "AddClassification" filter (a supervised attribute filter) and then set its "outputClassification" option to True.
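
As a rough illustration of the reshaping step, here is a minimal Python sketch (not Weka code) that turns the wide GWA1..GWA5 layout into one row per student and year, skips the 0 placeholders, adds a nominal label, and emits ARFF text that Weka can load. The student ids, the fourth row (a student with no GWA4/GWA5 yet), the label cutoffs, and the helper names `grade_label`, `to_long` and `to_arff` are all assumptions made up for the example, in the same way the "passed" column above was.

```python
# Wide rows from the question's sample: one row per student, GWA1..GWA5.
# "s4" is a hypothetical student with no GWA4/GWA5 yet (stored as 0).
WIDE = {
    "s1": [83.6, 87.5, 90.2, 89.1, 91.2],
    "s2": [76.4, 78.2, 77.6, 80.9, 79.4],
    "s3": [93.6, 91.5, 92.7, 91.1, 92.7],
    "s4": [81.0, 84.5, 79.9, 0.0, 0.0],
}


def grade_label(gwa):
    """Map a numeric GWA to a nominal label (cutoffs are assumed)."""
    if gwa >= 90:
        return "Excellent"
    if gwa >= 80:
        return "Average"
    return "Poor"


def to_long(wide):
    """One row per (student, year); 0 means 'no grade' and is skipped."""
    rows = []
    for student, grades in wide.items():
        for year, gwa in enumerate(grades, start=1):
            if gwa > 0:
                rows.append((student, year, gwa, grade_label(gwa)))
    return rows


def to_arff(rows):
    """Render the long rows as ARFF text with a nominal class attribute."""
    students = sorted({r[0] for r in rows})
    lines = [
        "@relation student_gwa",
        "@attribute student {" + ",".join(students) + "}",
        "@attribute year numeric",
        "@attribute gwa numeric",
        "@attribute label {Excellent,Average,Poor}",
        "@data",
    ]
    lines += [f"{s},{y},{g},{lab}" for s, y, g, lab in rows]
    return "\n".join(lines)


long_rows = to_long(WIDE)
arff_text = to_arff(long_rows)
print(arff_text)
```

Skipping the zeros rather than keeping them means students who have not finished, or whose course has only 4 years, still contribute the years they do have. Since `label` here is derived directly from `gwa`, a real model would need predictors other than the grade being labelled; the sketch only shows the layout.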