Tuesday, September 22, 2020

All about XGBoost

XGBoost is an open-source library that provides a high-performance implementation of gradient-boosted decision trees. An underlying C++ code base combined with a Python interface on top makes the package both powerful and easy to use. Gradient boosting is a method in which each new model is fitted to predict the residuals (i.e. errors) of the prior models.

Tianqi Chen, one of the co-creators of XGBoost, announced (in 2016) that the innovative system features and algorithmic optimizations in XGBoost make it 10 times faster than the most sought-after machine learning solutions of the time. A truly amazing technique!

Did you know that CERN recognized it as the best approach to classify signals from the Large Hadron Collider?

  • XGBoost is an ensemble learning method. 
  • Ensemble learning is a systematic solution to combine the predictive power of multiple learners.
  • The result is a single model that gives the aggregated output of several models.
  • The models that form the ensemble, also known as base learners, could be either from the same learning algorithm or different learning algorithms. 
  • Bagging and boosting are two widely used ensemble techniques.
  • Though these two techniques can be used with several statistical models, they are most commonly used with decision trees.
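
To make the idea of an aggregated output concrete, here is a minimal sketch in plain Python. The three base learners are hypothetical threshold rules invented for illustration (they are not part of XGBoost), and majority voting is just one simple way to combine them.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most of the base learners."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base learners: simple threshold rules on one feature.
def learner_a(x):
    return 1 if x > 0.3 else 0

def learner_b(x):
    return 1 if x > 0.5 else 0

def learner_c(x):
    return 1 if x > 0.7 else 0

def ensemble_predict(x):
    """Aggregate the base learners into a single prediction."""
    votes = [learner(x) for learner in (learner_a, learner_b, learner_c)]
    return majority_vote(votes)

print(ensemble_predict(0.4), ensemble_predict(0.6))
```

For an input of 0.6, two of the three rules vote for class 1, so the ensemble returns 1 even though one learner disagrees.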


Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. 
It reduces variance and helps to avoid overfitting.
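
A rough sketch of bagging, using only the Python standard library: each base learner is trained on a bootstrap sample (drawn with replacement) and the predictions are averaged. The one-split regression stump, the toy data, and the names `fit_stump`, `bagging_fit`, and `bagging_predict` are assumptions made for illustration, not code from any library.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimising squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x values identical: predict the mean
        mean = statistics.fmean(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Average the predictions of all base learners."""
    return statistics.fmean([m(x) for m in models])

# Toy data: a step from 0 to 1 at x = 1.0.
xs = [i / 10 for i in range(20)]
ys = [0.0 if i < 10 else 1.0 for i in range(20)]
models = bagging_fit(xs, ys)
low, high = bagging_predict(models, 0.2), bagging_predict(models, 1.8)
```

Because every stump sees a slightly different resample, their individual splits differ a little, and averaging smooths out those differences. That averaging is where the variance reduction comes from.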


  • In boosting, the trees are built sequentially such that each subsequent tree aims to reduce the errors of the previous tree. 
  • Each tree learns from its predecessors and updates the residual errors.
  • Hence, the tree that grows next in the sequence will learn from an updated version of the residuals.
  • The base learners in boosting are weak learners in which the bias is high, and the predictive power is just a tad better than random guessing. 
  • Each of these weak learners contributes some vital information for prediction, enabling the boosting technique to produce a strong learner by effectively combining these weak learners.
  • The final strong learner brings down both the bias and the variance.
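
The sequential fitting described above can be sketched in a few lines of plain Python. This is a simplified illustration for squared-error loss, where the residuals are the quantity each new tree is trained on. It is not XGBoost's actual implementation, which also uses second-order gradient information and regularization. The stump learner, the `learning_rate` shrinkage factor, and the toy data are assumptions made for this example.

```python
def fit_stump(xs, ys):
    """Weak learner: a one-split regression stump minimising squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost_fit(xs, ys, n_rounds, learning_rate=0.3):
    """Build stumps sequentially, each one fitted to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean target
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # errors so far
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Update the predictions, so the next stump sees updated residuals.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def boost_predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((y - boost_predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy regression problem: y = x^2 on a small grid.
xs = [i / 10 for i in range(30)]
ys = [x ** 2 for x in xs]
mse_0 = mse(boost_fit(xs, ys, n_rounds=0), xs, ys)    # mean-only baseline
mse_1 = mse(boost_fit(xs, ys, n_rounds=1), xs, ys)    # one weak learner
mse_50 = mse(boost_fit(xs, ys, n_rounds=50), xs, ys)  # fifty weak learners
```

Each stump on its own is a very coarse model, yet the training error falls as more of them are added, which is the sense in which many weak learners combine into a strong one.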
