Computer Science Tutorials

Posts

Showing posts from 2018

Why is naive Bayesian classification called “naive”?

March 14, 2018

Why is naive Bayesian classification called “naive”? Briefly outline the major ideas of naive Bayesian classification.

Naive Bayesian classification is called naive because it assumes class conditional independence: the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption greatly reduces computational cost, but attributes in real data are often correlated, so it is a simplification, and hence the classifier is considered “naive”.

The major idea behind naive Bayesian classification is to classify a tuple X by finding the class Ci that maximizes P(X|Ci)P(Ci), where i indexes the classes, using Bayes’ theorem of posterior probability. In general: we are given a set of unknown data tuples, where each tuple is represented by an n-dimensional vector, X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes, A1, A2, ..., An, respectively. We are also given a set of m classes, C1, C2, ..., Cm. Using Bayes’ theorem, the naive Bayesian classifier calculates the posterior pr...
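The outline above can be sketched in Python for categorical attributes. This is a minimal illustration, not the author's method: the function names `train` and `classify`, the toy weather tuples, and the choice to estimate P(Ci) and P(xk|Ci) as plain relative frequencies (no smoothing) are all assumptions made for this example. Class conditional independence shows up as the product of the per-attribute probabilities.

```python
from collections import Counter, defaultdict

def train(tuples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[c][k][v]: number of class-c tuples whose attribute k has value v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(tuples, labels):
        for k, v in enumerate(x):
            value_counts[c][k][v] += 1
    return class_counts, value_counts, len(labels)

def classify(x, model):
    """Return the class Ci that maximizes P(X|Ci)P(Ci)."""
    class_counts, value_counts, n = model
    best_class, best_score = None, -1.0
    for c, count in class_counts.items():
        score = count / n  # P(Ci), estimated as a relative frequency
        for k, v in enumerate(x):
            # Class conditional independence: P(X|Ci) is the product of the P(xk|Ci).
            score *= value_counts[c][k][v] / count
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: attributes (outlook, windy); class = play or not.
tuples = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
          ("rainy", "no"), ("overcast", "no")]
labels = ["yes", "no", "no", "yes", "yes"]
model = train(tuples, labels)
print(classify(("sunny", "no"), model))  # prints yes
```

For X = ("sunny", "no") the score for class "yes" is 3/5 × 1/3 × 3/3 = 0.2, while class "no" scores 0 because no "no" tuple has windy = "no". Without smoothing, a single unseen attribute value zeroes out a class in this way; a Laplacian correction is a common remedy.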

Why is tree pruning useful in decision tree induction?

March 14, 2018

Why is tree pruning useful in decision tree induction? What is a drawback of using a separate set of tuples to evaluate pruning?

The decision tree built may overfit the training data. There could be too many branches, some of which may reflect anomalies in the training data due to noise or outliers. Tree pruning addresses this overfitting by removing the least reliable branches (using statistical measures). This generally results in a smaller, less complex decision tree that classifies faster and is often more accurate on previously unseen tuples.

The drawback of using a separate set of tuples to evaluate pruning is that it may not be representative of the training tuples used to create the original decision tree. If the separate set of tuples is skewed, then using it to evaluate the pruned tree would not be a good indicator of the pruned tree’s classification accuracy. Furthermore, using a separate set of tuples to evaluate pruning means there are fewer tuples to use for ...
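To make pruning with a separate set of tuples concrete, here is a Python sketch of reduced-error pruning, one specific post-pruning technique; it is swapped in for illustration and is not necessarily the statistical method the post refers to. The `Node` class, the helper names, and the hand-built toy tree are hypothetical. Working bottom-up, each subtree is replaced by a leaf labeled with its majority class whenever that does not increase the number of errors on the pruning tuples that reach it.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attr: Optional[int] = None  # index of the attribute tested here; None marks a leaf
    children: dict = field(default_factory=dict)  # attribute value -> child Node
    label: str = ""  # majority class of the training tuples that reached this node

def predict(node, x):
    """Follow branches to a leaf; an unseen value falls back to the node's majority class."""
    while node.attr is not None:
        child = node.children.get(x[node.attr])
        if child is None:
            return node.label
        node = child
    return node.label

def errors(node, data):
    """Number of (tuple, class) pairs in data that the (sub)tree misclassifies."""
    return sum(predict(node, x) != y for x, y in data)

def reduced_error_prune(node, prune_set):
    """Prune bottom-up using a separate set of pruning tuples (modifies the tree in place)."""
    if node.attr is None:
        return node
    for value, child in list(node.children.items()):
        # Only the pruning tuples that follow this branch are used to judge the child.
        reaching = [(x, y) for x, y in prune_set if x[node.attr] == value]
        node.children[value] = reduced_error_prune(child, reaching)
    leaf = Node(label=node.label)
    # Replace the subtree with a leaf if that does not increase the error count.
    if errors(leaf, prune_set) <= errors(node, prune_set):
        return leaf
    return node

# Hypothetical tree: attribute 0 = outlook, attribute 1 = windy.
tree = Node(attr=0, label="yes", children={
    "sunny": Node(attr=1, label="yes", children={
        "no": Node(label="yes"),
        "yes": Node(label="no"),
    }),
    "rainy": Node(label="no"),
    "overcast": Node(label="yes"),
})
prune_set = [(("sunny", "yes"), "yes"), (("sunny", "no"), "yes"),
             (("rainy", "no"), "no"), (("overcast", "yes"), "yes")]
pruned = reduced_error_prune(tree, prune_set)
print(pruned.children["sunny"])  # prints Node(attr=None, children={}, label='yes')
```

Here the "windy" test under "sunny" is collapsed into a leaf, while the root test on outlook survives because removing it would add an error. Note that a subtree reached by no pruning tuples is always collapsed, since both versions make zero errors on an empty set; this is one concrete way a small or skewed pruning set can lead to poor pruning decisions.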

Assignment Questions

January 09, 2018