
WTF is the Difference Between GBM and XGBoost?

December 6, 2023
in Data Science & ML


Image by upklyak on Freepik
 

I am sure everyone knows the algorithms GBM and XGBoost. They are go-to algorithms for many real-world use cases and competitions because their metric results often beat those of other models.

For those who don’t know, GBM (Gradient Boosting Machine) and XGBoost (eXtreme Gradient Boosting) are ensemble learning methods. Ensemble learning is a machine learning technique in which multiple “weak” models (often decision trees) are trained and combined into a stronger model.

As their names suggest, both algorithms are based on the boosting technique of ensemble learning. Boosting combines multiple weak learners sequentially, with each one correcting its predecessor: each learner learns from the previous models’ mistakes and corrects their errors.

That’s the fundamental similarity between GBM and XGBoost, but what about the differences? We will discuss that in this article, so let’s get into it.

Gradient Boosting Machine (GBM)
As mentioned above, GBM is based on boosting: it iterates over weak learners sequentially so that each one learns from the errors of the last, building up a robust model. At each iteration, GBM improves the model by minimizing the loss function using gradient descent. Gradient descent is an optimization method that steps toward the minimum of a function, here the loss function. The iterations continue until a stopping criterion is reached.

You can see the GBM concept illustrated in the image below.

GBM Model Concept (Chhetri et al., 2022)
 

You can see in the image above that at each iteration, the model tries to minimize the loss function and learn from the previous mistakes. The final model is the sum of the predictions of all the weak learners.
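To make the loop concrete, here is a minimal sketch of gradient boosting for squared-error loss, using scikit-learn’s DecisionTreeRegressor as the weak learner. The synthetic dataset, learning rate, and tree depth are illustrative assumptions, not values from any particular library.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data; any regression dataset would do.
X, y = make_regression(n_samples=500, n_features=5, random_state=42)

learning_rate = 0.1   # shrinks each learner's contribution
n_iterations = 100    # stopping criterion: a fixed number of rounds

# Start from a constant prediction (the mean of the targets).
prediction = np.full(len(y), y.mean())
trees = []
for _ in range(n_iterations):
    # For squared-error loss, the negative gradient is the residual.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)  # each weak learner corrects its predecessor
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

# The final model is the sum of all the weak learners' predictions.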

 

 

eXtreme Gradient Boosting (XGBoost)

XGBoost, or eXtreme Gradient Boosting, is a machine learning algorithm based on gradient boosting, developed by Tianqi Chen and Carlos Guestrin in 2016. At a basic level, the algorithm still follows a sequential strategy of improving the next model using gradient descent. However, a few differences push XGBoost ahead as one of the best models in terms of performance and speed.

 

1. Regularization

 

Regularization is a technique in machine learning to avoid overfitting: a collection of methods that constrain the model from becoming overcomplicated and losing generalization power. It has become an important technique, as many models fit the training data too well.

GBM doesn’t implement regularization in its algorithm, so it focuses only on minimizing the loss function. XGBoost, in contrast, implements regularization methods to penalize an overfitting model.

There are two kinds of regularization that XGBoost can apply: L1 regularization (Lasso) and L2 regularization (Ridge). L1 regularization tries to shrink the feature weights or coefficients to zero (effectively acting as feature selection), while L2 regularization shrinks the coefficients evenly (which helps deal with multicollinearity). By implementing both regularizations, XGBoost can avoid overfitting better than GBM.
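As a rough sketch, both penalties are exposed as hyperparameters in XGBoost’s scikit-learn API: reg_alpha controls the L1 term and reg_lambda the L2 term. The values below are illustrative, not recommendations.

import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, random_state=42)

model = xgb.XGBRegressor(
    n_estimators=200,
    reg_alpha=0.1,   # L1 (Lasso): pushes some leaf weights to zero
    reg_lambda=1.0,  # L2 (Ridge): shrinks leaf weights evenly
)
model.fit(X, y)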

 

2. Parallelization

 

GBM tends to have a slower training time than XGBoost because the latter implements parallelization during the training process. The boosting technique itself is sequential, but parallelization can still happen within the XGBoost process.

The parallelization aims to speed up the tree-building process, mainly during split finding. By utilizing all the available processing cores, XGBoost’s training time can be shortened.

Speaking of speeding up the XGBoost process, the developers also preprocess the data into their own data format, DMatrix, for memory efficiency and improved training speed.
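Here is a minimal sketch with XGBoost’s native API: the data is packed into a DMatrix, and the nthread parameter controls how many cores are used. The synthetic data and parameter values are assumptions for illustration.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.random((1000, 10))
y = rng.random(1000)

# DMatrix is XGBoost's internal data format, optimized for
# memory efficiency and training speed.
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "reg:squarederror",
    "nthread": 4,  # number of cores used during training
}
booster = xgb.train(params, dtrain, num_boost_round=100)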

 

3. Missing Data Handling

 

Our training dataset can contain missing data, which we normally must handle explicitly before passing it to an algorithm. XGBoost, however, has its own built-in missing data handler, whereas GBM doesn’t.

XGBoost implements a technique for handling missing data called sparsity-aware split finding. For any sparse data XGBoost encounters (missing values, dense zeros, one-hot encoding), the model learns from the data and finds the optimal split. During splitting, the model decides which direction the missing data should go and picks the one that minimizes the loss.
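For illustration, a sketch with synthetic data: XGBoost treats np.nan as missing by default, so the model can be fit directly without imputation. The injected missing rate and model settings here are assumptions.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.random((200, 5))
X[rng.random(X.shape) < 0.2] = np.nan   # inject ~20% missing values
y = (rng.random(200) > 0.5).astype(int)

# No imputation needed: sparsity-aware split finding learns a
# default direction for missing values at every split.
model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
model.fit(X, y)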

 

4. Tree Pruning

 

GBM’s growth strategy is to stop splitting a node once the algorithm hits a negative loss in the split. This strategy can lead to suboptimal results because it relies only on local optimization and might neglect the overall picture.

XGBoost avoids this strategy: it grows the tree until the max depth parameter is reached and then prunes backward. Splits with negative loss are pruned, but there is a case in which a negative-loss split is kept: when a split has negative loss but a further split below it is positive, it is retained as long as the overall gain is positive.
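Both behaviors are controlled by hyperparameters: max_depth caps how far each tree grows before backward pruning, and gamma (also called min_split_loss) sets the minimum loss reduction a split must contribute to survive pruning. The values below are purely illustrative.

import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, random_state=42)

model = xgb.XGBRegressor(
    max_depth=6,  # grow trees to this depth, then prune backward
    gamma=1.0,    # minimum loss reduction required to keep a split
)
model.fit(X, y)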

 

5. In-Built Cross-Validation

 

Cross-validation is a technique for assessing a model’s generalization ability and robustness by systematically splitting the data over several iterations. Collectively, the results show whether the model is overfitting.

Normally, a machine learning algorithm requires external tooling to implement cross-validation, but XGBoost has built-in cross-validation that can be used during the training session. The cross-validation is performed at each boosting iteration, which helps ensure the produced trees are robust.
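A minimal sketch of the built-in routine: xgb.cv runs k-fold cross-validation at every boosting round and returns the per-round train/test metrics. The synthetic data and parameter choices are assumptions for illustration.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.random((500, 8))
y = rng.random(500)
dtrain = xgb.DMatrix(X, label=y)

# 5-fold cross-validation performed at each boosting iteration;
# training stops early if the test metric stops improving.
results = xgb.cv(
    params={"objective": "reg:squarederror", "max_depth": 4},
    dtrain=dtrain,
    num_boost_round=100,
    nfold=5,
    early_stopping_rounds=10,
)
print(results.tail())  # per-round train/test RMSE (mean and std)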

 

 

Conclusion

GBM and XGBoost are popular algorithms in many real-world cases and competitions. Conceptually, both are boosting algorithms that combine weak learners to achieve a better model. However, they differ in a few points of implementation: XGBoost enhances the algorithm with embedded regularization, parallelization, better missing-data handling, a different tree-pruning strategy, and a built-in cross-validation technique.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media.


