Why is naive Bayesian classification called “naive”?
Why is naive Bayesian classification called “naive”? Briefly outline the major ideas of naive Bayesian classification.
Naive Bayesian classification is called “naive” because it assumes class conditional independence. That is, the effect of an attribute value on a given class is independent of the values of the other attributes.
This assumption is made to reduce computational cost, and hence the method is considered “naive”. The major idea behind naive Bayesian classification is to classify a tuple X by maximizing P(X|Ci)P(Ci) (where i indexes the classes), which follows from Bayes’ theorem for the posterior probability.
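To make the decision rule concrete, here is a minimal Python sketch; the priors and likelihoods are made-up numbers purely for illustration:

```python
# Hypothetical numbers purely for illustration; priors[c] stands in for
# P(Ci) and likelihoods[c] for P(X|Ci) evaluated at some fixed tuple X.
priors = {"yes": 0.6, "no": 0.4}
likelihoods = {"yes": 0.044, "no": 0.019}

# Pick the class maximizing P(X|Ci)P(Ci); the constant P(X) is dropped.
predicted = max(priors, key=lambda c: likelihoods[c] * priors[c])
print(predicted)  # -> yes  (0.044 * 0.6 = 0.0264 beats 0.019 * 0.4 = 0.0076)
```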
In general:
- We are given a set of data tuples whose class labels are unknown, where each tuple is represented by an n-dimensional vector, X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes, respectively A1, A2, ..., An. We are also given a set of m classes, C1, C2, ..., Cm.
- Using Bayes’ theorem, the naive Bayesian classifier calculates the posterior probability of each class conditioned on X. X is assigned the class label of the class with the maximum posterior probability conditioned on X. Therefore, we try to maximize P(Ci|X) = P(X|Ci)P(Ci)/P(X). However, since P(X) is constant for all classes, only P(X|Ci)P(Ci) need be maximized. If the class prior probabilities are not known, then it is commonly assumed that the classes are equally likely, i.e., P(C1) = P(C2) = ... = P(Cm), and we would therefore maximize P(X|Ci) alone. Otherwise, we maximize P(X|Ci)P(Ci). The class prior probabilities may be estimated by P(Ci) = si/s, where si is the number of training tuples of class Ci and s is the total number of training tuples.
- In order to reduce computation in evaluating P(X|Ci), the naive assumption of class conditional independence is made. This presumes that the values of the attributes are conditionally independent of one another given the class label of the tuple, i.e., that there are no dependence relationships among the attributes. P(X|Ci) then factors into the product P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci), where each factor is estimated as follows (see the sketch after this list):
  - If Ak is a categorical attribute, then P(xk|Ci) is equal to the number of training tuples in Ci that have xk as the value for that attribute, divided by the total number of training tuples in Ci.
  - If Ak is a continuous attribute, then P(xk|Ci) can be estimated using a Gaussian density function fitted to the values of Ak among the training tuples in Ci.
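Putting the pieces together, below is a minimal self-contained sketch in Python of the training and classification steps just outlined. The function names (train, classify, gaussian) and the toy weather data are hypothetical, not from any standard library; string-valued attributes are treated as categorical and numeric ones as continuous, per the two cases above:

```python
import math
from collections import Counter

def gaussian(x, mean, std):
    """Gaussian density, used to estimate P(xk|Ci) for a continuous Ak."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

def train(tuples, labels):
    """Estimate P(Ci) and the per-attribute statistics needed for P(xk|Ci)."""
    model = {}
    s = len(labels)                          # total number of training tuples
    for ci in set(labels):
        rows = [x for x, y in zip(tuples, labels) if y == ci]
        si = len(rows)                       # number of training tuples in Ci
        attrs = []
        for k in range(len(rows[0])):
            col = [row[k] for row in rows]
            if isinstance(col[0], str):      # categorical Ak: store value counts
                attrs.append(("cat", Counter(col), si))
            else:                            # continuous Ak: store Gaussian parameters
                mean = sum(col) / si
                var = sum((v - mean) ** 2 for v in col) / si
                attrs.append(("num", mean, math.sqrt(var) or 1e-9))
        model[ci] = (si / s, attrs)          # (P(Ci), per-attribute statistics)
    return model

def classify(model, x):
    """Assign x the label of the class maximizing P(X|Ci)P(Ci)."""
    best, best_score = None, -1.0
    for ci, (prior, attrs) in model.items():
        score = prior                        # start from P(Ci)
        for xk, stat in zip(x, attrs):
            if stat[0] == "cat":
                _, counts, si = stat
                score *= counts[xk] / si     # relative frequency of xk within Ci
            else:
                _, mean, std = stat
                score *= gaussian(xk, mean, std)
        if score > best_score:
            best, best_score = ci, score
    return best

# Toy data (hypothetical): one categorical and one continuous attribute.
X_train = [("sunny", 30.0), ("rain", 18.0), ("sunny", 27.0), ("rain", 15.0)]
y_train = ["no", "yes", "no", "yes"]
model = train(X_train, y_train)
print(classify(model, ("sunny", 28.0)))      # -> no
```

Note that with pure relative-frequency estimates, a single attribute value never observed in a class drives the whole product P(X|Ci) to zero; in practice this is commonly handled with a Laplacian correction, adding one to each count.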