Notes for Prof. Hung-Yi Lee's ML Lecture: Ensemble
Ensemble: Bagging
In bagging, we create multiple training sets from the original training set (by sampling with replacement), train one model on each, and combine all the trained models at inference time. For classification, we can combine by voting; for regression, by averaging.
Bagging is helpful when the model is complex and prone to overfitting; in other words, bagging reduces the error that comes from the variance of the model. Bagging is for strong models.
Random forest is bagging applied to decision trees, with the additional trick of randomly restricting the features considered at each split.
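A minimal sketch of the idea for classification, assuming scikit-learn decision trees as the base model (the helper names are my own):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=10, seed=0):
    """Train each model on a bootstrap resample (sampling with replacement) of (X, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # resample with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Combine the trained models by majority vote (use the mean instead for regression)."""
    votes = np.stack([m.predict(X) for m in models])   # shape: (n_models, n_samples)
    # assumes integer class labels 0..K-1
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```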
Ensemble: Boosting
Boosting is for weak models.
Guarantee: if your learning algorithm can produce a classifier with an error rate smaller than 50% on the training data, then boosting can combine such classifiers into one with 0% error rate on the training data.
Framework of boosting:
- Obtain the first classifier $f_1(x)$.
- Find another function $f_2(x)$ that is complementary to $f_1(x)$.
- …… Find another classifier $f_n(x)$
- Combine all the classifiers.
To obtain different classifiers, we can create new training sets by re-sampling or re-weighting the training data. In a real implementation, you only have to change the cost/objective function: for example, we multiply each data point's loss by a weight $\mu_n$,
\[L(f) = \sum_n l(f(x^n), \hat{y}^n) \to L(f) = \sum_n \mu_n\, l(f(x^n), \hat{y}^n)\]
In boosting, the classifiers are learned sequentially, unlike bagging, where the classifiers can be learned independently (and in parallel).
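For concreteness, a minimal sketch of the re-weighted objective (the function name is my own); many scikit-learn estimators also accept the same weights directly via the `sample_weight` argument of `fit`:

```python
import numpy as np

def weighted_objective(per_example_losses, weights):
    # L(f) = sum_n mu_n * l(f(x^n), y_hat^n)
    return np.dot(weights, per_example_losses)

# Equivalently, when using a library learner:
#   model.fit(X, y, sample_weight=weights)   # supported by many scikit-learn estimators
```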
AdaBoost
Idea: train $f_n(x)$ on a re-weighted training set on which $f_{n-1}(x)$ fails, i.e. the weights are chosen so that $f_{n-1}(x)$ has exactly 50% error rate on the re-weighted data.
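A minimal AdaBoost sketch, using decision stumps (scikit-learn trees with `max_depth=1`) as the weak learner; labels are assumed to be in $\{-1, +1\}$ and the helper names are my own:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=20):
    """X, y are numpy arrays; labels y are assumed to be in {-1, +1}."""
    u = np.ones(len(X))                      # per-example weights, initially uniform
    models, alphas = [], []
    for _ in range(T):
        f = DecisionTreeClassifier(max_depth=1)      # weak learner: a decision stump
        f.fit(X, y, sample_weight=u)
        pred = f.predict(X)
        eps = u[pred != y].sum() / u.sum()           # weighted error rate of f_t
        if eps == 0 or eps >= 0.5:                   # perfect, or weak-learner guarantee broken
            break
        d = np.sqrt((1 - eps) / eps)
        alpha = np.log(d)
        # Multiply weights of misclassified points by d and divide the rest by d,
        # so f_t ends up with exactly 50% weighted error on the new weights.
        u = u * np.exp(-alpha * y * pred)
        models.append(f)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    """H(x) = sign(sum_t alpha_t * f_t(x))"""
    return np.sign(sum(a * m.predict(X) for a, m in zip(alphas, models)))
```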
General Formulation of Boosting
Finding the $f_t(x)$'s this way is gradient boosting: each new $f_t(x)$ is chosen as (approximately) a gradient-descent step on the ensemble's loss, taken in function space. AdaBoost is gradient boosting with the exponential loss function.
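Spelled out a bit (the additive-model view from the lecture; the exact symbols are my own choice): the ensemble after $t$ rounds is
\[g_t(x) = g_{t-1}(x) + \alpha_t f_t(x), \qquad L(g) = \sum_n l(g(x^n), \hat{y}^n).\]
Choosing $f_t(x)$ so that adding it moves $g_{t-1}$ along the negative gradient of $L$ (treating the function values $g(x^n)$ as the variables being optimized) is gradient descent in function space; with the exponential loss $l(g(x^n), \hat{y}^n) = \exp(-\hat{y}^n g(x^n))$, this recovers AdaBoost's example weights and its $\alpha_t$.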
Stacking
Stacking trains a simple final classifier (e.g. logistic regression) that takes the outputs of the trained classifiers as input and makes the final decision. By learning this extra classifier instead of using simple voting, we can handle the case where some of the trained classifiers are very poor. The final classifier should be trained on data held out from the data used to train the base classifiers.
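A minimal stacking sketch, assuming scikit-learn base classifiers and a simple 50/50 held-out split for training the final classifier (the split ratio and helper names are my own choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

def stacking_fit(X, y, seed=0):
    # Split so the final classifier is trained on data the base models never saw.
    X_base, X_final, y_base, y_final = train_test_split(
        X, y, test_size=0.5, random_state=seed)
    base_models = [DecisionTreeClassifier().fit(X_base, y_base),
                   SVC().fit(X_base, y_base)]
    # Features for the final classifier = the base models' predictions.
    Z = np.column_stack([m.predict(X_final) for m in base_models])
    final = LogisticRegression().fit(Z, y_final)
    return base_models, final

def stacking_predict(base_models, final, X):
    Z = np.column_stack([m.predict(X) for m in base_models])
    return final.predict(Z)
```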