
What classification algorithm can handle both numeric and nominal data?

This is probably a newbie question about possible classification algorithms, so please bear with me. I have a dataset that comprises both nominal and numeric attributes, which may look like the example below (not the actual dataset). What kind of algorithm would be best to predict the class and get the accuracy (preferably in Python/Java)?

Classes: classA, classB, classC

attribute1: Recurrence <Yes, No>
attribute2: Subject <Math, Science, Geography>
attribute3: ProbabilityA <0.0 - 1.0>
attribute4: ProbabilityB <0.0 - 1.0>
attribute5: ProbabilityC <0.0 - 1.0>

The nominal data can contain the numeric values [1, -1], where 1 represents present and -1 represents not present, or it can be a set of string values such as ['YES', 'NO'] or ['Type1', 'Type2', 'Type3']. The numeric values express the likelihood of an attribute: for a value in [0, 1], the closer the value is to 1, the more likely the attribute evaluates to true.
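Many classifiers (KNN, SVMs, neural nets) expect every attribute to be a number, so a common first step is to unify the different flag conventions and one-hot encode the nominal attributes. Below is a minimal plain-Python sketch, assuming the attribute names and value sets from the example above; the schema dicts and the helper names `normalize_flag` and `encode_record` are hypothetical.

```python
# Hypothetical schema mirroring the example attributes (an assumption,
# not the real dataset): allowed values for each nominal attribute, plus
# the numeric attributes that already lie in [0, 1].
NOMINAL_VALUES = {
    "Recurrence": ["Yes", "No"],
    "Subject": ["Math", "Science", "Geography"],
}
NUMERIC_ATTRS = ["ProbabilityA", "ProbabilityB", "ProbabilityC"]


def normalize_flag(value):
    """Map the 1/-1 and YES/NO conventions onto one 'Yes'/'No' vocabulary."""
    if value in (1, "1", "YES", "Yes", "yes"):
        return "Yes"
    if value in (-1, "-1", "NO", "No", "no"):
        return "No"
    return value  # other nominal values (e.g. 'Math') pass through unchanged


def encode_record(record):
    """Turn one mixed-type record (a dict) into a flat list of floats.

    Each nominal attribute becomes one-hot indicators, in the order listed in
    NOMINAL_VALUES; numeric attributes are copied through as floats.
    """
    features = []
    for attr, values in NOMINAL_VALUES.items():
        value = normalize_flag(record[attr])
        if value not in values:
            raise ValueError(f"unexpected value {value!r} for {attr}")
        features.extend(1.0 if value == v else 0.0 for v in values)
    features.extend(float(record[attr]) for attr in NUMERIC_ATTRS)
    return features
```

For example, a record with Recurrence = -1 and Subject = 'Science' encodes to `[0.0, 1.0, 0.0, 1.0, 0.0]` followed by its three probabilities. Libraries offer the same idea ready-made (scikit-learn's `OneHotEncoder`, for instance), but the sketch shows what such an encoder produces.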

Solution

Well, this is by no means a "newbie question", and is in fact quite complicated. While Inti's suggestion is certainly a good start, the best choice depends on so many factors that there is no easy "right answer".

Some things to consider:

Speed vs. accuracy
Memory constraints
Training set (how large a data set you can use to "learn" how to classify)
Test data set (how much of the data set you'll keep "in reserve" to verify / measure the quality of your algo)
Implementation: e.g., will this run in "batch mode", or will you need to classify each new observation as it arrives?
etc.
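The training/test split is also how you "get the accuracy" the question asks about: fit on one part of the data, then score predictions on the part held in reserve. Here is a stdlib-only sketch; the function names and the 25% default are illustrative choices, not a standard (scikit-learn ships equivalents as `train_test_split` and `accuracy_score`).

```python
import random


def holdout_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle once, then keep test_fraction of the examples in reserve.

    Returns (train_rows, train_labels, test_rows, test_labels) with each
    row still paired with its own label.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx], [labels[i] for i in train_idx],
        [rows[i] for i in test_idx], [labels[i] for i in test_idx],
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With 8 examples and the default fraction, 6 go to training and 2 to testing; `accuracy(["classA", "classB", "classC", "classA"], ["classA", "classB", "classA", "classA"])` is 0.75. A single holdout split can be noisy on small datasets, which is one reason cross-validation is often preferred.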

Until some more info like this is known, it's tough to give very precise details. (In general, on this forum, the more effort you put into the question, the more effort others put into their answers.)

That being said, here are some buzzwords to start looking up, to get your head around the possibilities:

random forest / CART / decision tree (different algos, but similar in concept)
Naive Bayes
SVM (less directly suited to your nominal parameters, which would first need to be encoded as numbers)
Neural Net
Clustering
KNN, as Inti suggests
many more...
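To make the KNN option concrete for the mixed attributes in the question, here is a minimal sketch that skips one-hot encoding and instead uses a Gower-style distance: a 0/1 mismatch for nominal attributes and an absolute difference for numeric ones, averaged over all attributes. The function names, the equal weighting of attributes, and the toy rows are assumptions for illustration only. It relies on the numeric attributes already sharing the [0, 1] range; other numeric data would usually need rescaling first.

```python
from collections import Counter


def mixed_distance(a, b, nominal_idx):
    """Gower-style distance between two mixed-type records.

    Nominal positions add 0 (same value) or 1 (different value); numeric
    positions add |a - b|, which stays within [0, 1] here because the
    example's numeric attributes are probabilities. Returns the mean.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in nominal_idx:
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y)
    return total / len(a)


def knn_predict(train_rows, train_labels, query, k=3, nominal_idx=frozenset()):
    """Predict by majority vote among the k nearest training rows.

    Counter.most_common keeps first-encountered order for equal counts,
    so a tied vote goes to the label of the nearer neighbour.
    """
    ranked = sorted(
        range(len(train_rows)),
        key=lambda i: mixed_distance(train_rows[i], query, nominal_idx),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy rows: [Recurrence, Subject, ProbabilityA, ProbabilityB, ProbabilityC]
train_rows = [
    ["Yes", "Math", 0.9, 0.1, 0.0],
    ["Yes", "Math", 0.8, 0.2, 0.1],
    ["No", "Science", 0.1, 0.9, 0.2],
    ["No", "Science", 0.2, 0.8, 0.1],
    ["No", "Geography", 0.1, 0.1, 0.9],
    ["Yes", "Geography", 0.0, 0.2, 0.8],
]
train_labels = ["classA", "classA", "classB", "classB", "classC", "classC"]

query = ["Yes", "Math", 0.85, 0.15, 0.05]
print(knn_predict(train_rows, train_labels, query, k=3, nominal_idx={0, 1}))  # classA
```

The equal weighting is the simplest design choice; in practice you might weight attributes differently or pick `k` by measuring accuracy on held-out data.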

The world of potential options in machine learning algos is pretty huge, nothing works perfectly, and nothing works equally well in all situations. This wiki page is not so great, but it's a decent start on finding algos.

Once you've decided which algo you think will work for your case, look up a library / implementation in Python, Java, or what-have-you. With SciPy and NumPy, Python has a pretty large library of possibilities. I suspect Java also has a huge library, but I personally know Python far better.