
Difference between parameters, features and class in Machine Learning


I am a newbie in machine learning and natural language processing.

I am always confused about what these three terms mean.

From my understanding:

class: The various categories our model can output. For example, given a person's name, identify whether the person is male or female.

Let's say I am using a Naive Bayes classifier.

What would be my features and parameters?

Also, what are some aliases for these terms that are used interchangeably?


Solution

  • Let's use the example of classifying the gender of a person. Your understanding about class is correct! Given an input observation, our Naive Bayes Classifier should output a category. The class is that category.

    Features: Features in a Naive Bayes classifier, or any general ML classification algorithm, are the measurable properties we choose to describe our input. For the example of a person, we can't possibly input every fact about them; instead, we pick a few features to represent the person (say "Height", "Weight", and "Foot Size"). Specifically, in a Naive Bayes classifier, the key assumption is that these features are independent of one another given the class: once we know the person's gender, their height tells us nothing more about their weight or foot size. This assumption may or may not be true, but Naive Bayes assumes that it is. In the particular case of your example, where the input is just the name, features might be the frequency of letters, the number of vowels, the length of the name, or its prefixes and suffixes.

    Parameters: Parameters in Naive Bayes are the estimates, learned from training data, of the true distribution of whatever we're trying to classify. For example, we could say that roughly 50% of people are male, and that male height follows a Gaussian distribution with mean 5' 7" and standard deviation 3". The parameters would be the 50% estimate (the class prior), the 5' 7" mean estimate, and the 3" standard deviation estimate.

    Aliases: Features are also referred to as attributes. I'm not aware of any common replacements for 'parameters'.
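
To make the three terms concrete, here is a minimal sketch of a Gaussian Naive Bayes classifier in plain Python. Everything in it is an assumption for illustration: the training rows are invented toy numbers, and the names `TRAIN`, `name_features`, `fit`, and `predict` are hypothetical helpers, not part of any library. The features are the numbers we feed in, the parameters are the priors, means, and standard deviations that `fit` estimates, and the class is the label that `predict` returns.

```python
import math
from statistics import mean, stdev

# Invented toy data, for illustration only.
# Each row: (height in inches, weight in pounds, foot size) -> class label
TRAIN = [
    ((72.0, 180.0, 12.0), "male"),
    ((70.0, 190.0, 11.0), "male"),
    ((67.0, 170.0, 12.0), "male"),
    ((71.0, 165.0, 10.0), "male"),
    ((60.0, 100.0, 6.0), "female"),
    ((66.0, 150.0, 8.0), "female"),
    ((65.0, 130.0, 7.0), "female"),
    ((69.0, 150.0, 9.0), "female"),
]


def name_features(name):
    """Possible features for the name-only case from the question."""
    lowered = name.lower()
    return {
        "length": len(lowered),
        "vowels": sum(ch in "aeiou" for ch in lowered),
        "suffix": lowered[-2:],
    }


def fit(rows):
    """Estimate the parameters: a prior per class, and a (mean, std) per class and feature."""
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append(features)
    params = {}
    for label, examples in by_class.items():
        columns = list(zip(*examples))  # one tuple of values per feature
        params[label] = {
            "prior": len(examples) / len(rows),
            # stdev is the sample standard deviation (divides by n - 1)
            "gaussians": [(mean(col), stdev(col)) for col in columns],
        }
    return params


def log_gaussian(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def predict(params, features):
    """Return the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label, p in params.items():
        score = math.log(p["prior"])
        # Summing per-feature terms is where the independence assumption is used.
        for x, (mu, sigma) in zip(features, p["gaussians"]):
            score += log_gaussian(x, mu, sigma)
        scores[label] = score
    return max(scores, key=scores.get)


params = fit(TRAIN)
prediction = predict(params, (61.0, 110.0, 7.0))
print(params["male"]["prior"], params["male"]["gaussians"][0])
print(prediction)
print(name_features("Maria"))
```

With a name as input you would swap the Gaussian model for one suited to categorical or count features, but the roles stay the same: `name_features` produces the features, training estimates the parameters, and the output is the class.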