c#, data-mining, neural-network, data-analysis

Segmenting a set of data with discrete and continuous values into one of two groups without using Analysis Services?


Say I have a table with the following schema (note: this example is hypothetical, though the real use case is similar).

Type      | Name         | Notes
=====================================================================================
Gender    | Gender       | Either Male or Female (not null)
GeoCoord  | Location     | Latitude and longitude coordinates
string    | FullName     | 
Date      | BirthDate    | 
bool?     | LikesToParty | Data from a survey (null for people who didn't answer)

From looking at the data manually, I know there is a strong correlation between LikesToParty and certain specific configurations of the other values. For example, men who have Wells as their middle name, who are between 15 and 30 years old, and who come from the LA area almost certainly have true in LikesToParty. I would like to predict the value of LikesToParty for users who didn't answer the survey.

How do I mine this data using C# without having to buy an expensive package like Analysis Services? Are there any free libraries for C#?

I've already built a neural network that is capable of most of what I describe in the example above, but it is extremely slow to train and I'm not sure whether this is the right way to go. Maybe there is a better, more efficient way to segment the data?


Solution

  • Because you are using both discrete and continuous data, you might use a decision tree (C4.5, CART). There are existing library implementations of these; don't be wary of Java libraries, since you can run them on .NET through IKVM, a Java implementation for .NET. For example, I have used the Weka API from C#.
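
To illustrate why a decision tree copes with mixed data, here is a minimal sketch in plain C# of the split search at the heart of such a tree. It is not the Weka API and not a full C4.5 or CART implementation: it scores candidate splits with Gini impurity (the measure CART uses; C4.5 uses gain ratio instead) and only finds the best single split per attribute. The names Person, SplitFinder, and the reduction of BirthDate to an Age value are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type covering part of the table in the question.
public record Person(string Gender, double Age, bool? LikesToParty);

public static class SplitFinder
{
    // Keep only the people who answered the survey; these are the training rows.
    public static List<Person> Labeled(IEnumerable<Person> people) =>
        people.Where(p => p.LikesToParty.HasValue).ToList();

    // Gini impurity for two classes: 2p(1 - p), where p is the share of true labels.
    public static double Gini(IReadOnlyCollection<bool> labels)
    {
        if (labels.Count == 0) return 0.0;
        double p = labels.Count(l => l) / (double)labels.Count;
        return 2.0 * p * (1.0 - p);
    }

    // Size-weighted impurity of the two groups produced by a yes/no test.
    // Expects labeled rows only (see Labeled).
    public static double SplitImpurity(IReadOnlyList<Person> rows, Func<Person, bool> test)
    {
        var yes = rows.Where(test).Select(r => r.LikesToParty == true).ToList();
        var no = rows.Where(r => !test(r)).Select(r => r.LikesToParty == true).ToList();
        return (yes.Count * Gini(yes) + no.Count * Gini(no)) / rows.Count;
    }

    // Continuous attribute: try the midpoint between each pair of adjacent distinct values.
    public static (double Threshold, double Impurity) BestAgeSplit(IReadOnlyList<Person> rows)
    {
        var ages = rows.Select(r => r.Age).Distinct().OrderBy(a => a).ToList();
        var best = (Threshold: double.NaN, Impurity: double.MaxValue);
        for (int i = 0; i + 1 < ages.Count; i++)
        {
            double t = (ages[i] + ages[i + 1]) / 2.0;
            double impurity = SplitImpurity(rows, r => r.Age <= t);
            if (impurity < best.Impurity) best = (t, impurity);
        }
        return best;
    }

    // Discrete attribute: try one "equals this value" test per distinct value.
    public static (string Value, double Impurity) BestGenderSplit(IReadOnlyList<Person> rows)
    {
        var best = (Value: "", Impurity: double.MaxValue);
        foreach (var value in rows.Select(r => r.Gender).Distinct())
        {
            double impurity = SplitImpurity(rows, r => r.Gender == value);
            if (impurity < best.Impurity) best = (value, impurity);
        }
        return best;
    }
}
```

A real tree would pick the attribute with the lowest impurity, partition the rows on it, and repeat on each partition until the groups are pure enough; people with a null LikesToParty are then routed down the tree and receive the majority answer of the leaf they land in. Because each step is only a sort and a count, training is typically much cheaper than training a neural network on the same table.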