I'm developing a Weka app like Akinator by using the j48 method.
Sample: http://jbossews-vdoctor.rhcloud.com/doctor
The following is the app's table definition and sample data qa means question id(Please refer the master which can be set by user) + answer(1:Yes, 2: I don't know, 3: No). 1 line per 1 question & answer.
id,qa,class
A,13,1
A,23,1
B,13,2
B,21,2
The point is to find a way to select the question which can maximize the entropy. Currently this app is regarding first node id of decision tree as the best question. And then it narrows down the options by this elimination way.
But the accuracy was too bad to run correctly so I'd like to improve it. I noticed that the qa column was identified as numeric so it could not build the correct decision tree.
I am confused what I should do for improvement. Dataset? Table definition? Logic?
This is quite a broad question that you are asking, and without code or a clear understanding of the problem it is quite difficult to answer, but I'll give some tips for improvement:
Table Definition
What may have made more sense here is to have an attribute for each question, instead of using a single instance per question. For Example, instead of id, qa and class, you could have A, B, C, D, E, F and Disease. (I believe there were six questions, and naming each attribute would be recommended instead of A-F)
Dataset
You will need at least as many cases as there are diseases, if not more for defining multiple subsets of the problem space for the same disease. There are likely cases where some questions are irrelevant or missing, and the model may need to handle such situations.
Logic
In such a case, you might be able to do the questionnaire by starting with the root node and asking questions until you reach the estimated class. This way, you can ask from node to node until a class is reached.
I hope this helps in improving your existing model.
NOTE: I tried your questionnaire and answered No to all of your questions, and I strangely ended up with Trichomoniasis. Perhaps there could be a 'No Disease' category for your training data also.