Combining an image classifier and an expert system

Would it be accurate to include an expert system in an image classifying application? (I am working with Matlab, have some experience with image processing and no experience with expert systems.)

What I'm planning on doing is adding an extra feature vector that is actually an answer to a question. Is this fine?

For example: Assume I have two questions that I want the answers to : Question 1 and Question 2. Knowing the answers to these 2 questions should help classify the test image more accurately. I understand expert systems are coded differently from an image classifier but my question is would it be wrong to include the answers to these 2 questions, in a numerical form (1 can be yes, and 0 can be no) and pass this information along with the other feature vectors into a classifier.

If it matters, my current classifier is an SVM.

Regarding training images: yes, they too will be trained with the 2 extra feature vectors.

Solution

Converting a set of comments to an answer:

A similar question in cross-validated already explains that it can be done as long as data is properly preprocessed.

In short: you can combine them as long as training (and testing) data is properly preprocessed (e.g. standardized). Standardization improves the performance of most linear classifiers because it scales the variables so they have the similar weight in the learning process and improves the numerical stability (and performance) when variables are sampled from gaussian-like distributions (which is achieved by standarization).

With that, if continuous variables are standardized and categorical variables are encoded as (-1, +1) the SVM should work well. Whether it will improve or not the performance of the classifier depends on the quality of those cathegorical variables.

Answering the other question in the comment.. while using kernel SVM with for example a chi square kernel, the rows of the training data are suppose to behave like histograms (all positive and usually l1-normalized) and therefore introducing a (-1, +1) feature breaks the kernel. Using a RBF kernel the rows of the data are suppose to be L2 normalized, and again, introducing (-1, +1) features might introduce unexpected behaviour (I'm not very sure what exactly the effect would be..).