Google DeepMind Unveils ‘SALT’: Optimizing Large Language Model Training with SLMs through Machine Learning

December 19, 2024 | Artificial Intelligence

Understanding Large Language Models and Their Challenges

Large Language Models (LLMs) are crucial for a variety of applications, including chatbots, automated content creation, and tasks that require understanding natural language. These models are powerful because they can learn and predict complex language patterns from large sets of data. However, creating LLMs is challenging due to the high computational costs involved. Training these models requires optimizing billions of parameters using vast amounts of data, which demands significant hardware resources and time. As a result, there is a pressing need for new training methods that can overcome these challenges while maintaining or improving the quality of LLMs.

Limitations of Traditional Training Methods

Traditional methods for training LLMs are often inefficient because they treat all data equally, regardless of its complexity. These approaches do not prioritize specific data subsets that could speed up learning and fail to use existing models to aid training. This results in unnecessary computational effort, as simple data is processed alongside complex data without distinction. Additionally, standard self-supervised learning, which involves predicting the next word in a sequence, does not fully utilize smaller, less resource-intensive models that could guide and inform the training of larger models.

The Role of Knowledge Distillation

Knowledge distillation (KD) is a technique commonly used to transfer knowledge from larger, well-trained models to smaller, more efficient ones. However, KD is rarely used in reverse, where smaller models help in training larger models. This represents a missed opportunity, as smaller models, despite their limited capacity, can offer valuable insights into specific data patterns. They can efficiently identify “easy” and “hard” instances, which can significantly impact the training dynamics of LLMs.
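
For concreteness, here is a minimal PyTorch sketch of the standard soft-label distillation objective described above; the function name and temperature value are illustrative choices, not taken from the paper.

```python
import torch.nn.functional as F

def soft_label_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard soft-label knowledge distillation: match the student's
    next-token distribution to the teacher's temperature-softened one."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # batchmean matches the definition of KL divergence; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```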

Introducing Small model Aided Large model Training (SALT)

Researchers from Google Research and Google DeepMind have developed a new approach called Small model Aided Large model Training (SALT) to address these challenges. SALT uses smaller language models (SLMs) to enhance the efficiency of LLM training. It employs SLMs in two ways: providing additional supervision through soft labels during the initial training phase and selecting valuable data subsets for learning. This method ensures that LLMs focus on informative and challenging data sequences, reducing computational demands while improving the overall quality of the trained model.

How SALT Works

SALT operates in two phases:

Phase One: Leveraging Smaller Models

In the first phase, smaller models act as teachers, transferring their predictive insights to the larger models through knowledge distillation. This process helps align the predictions of the LLMs with the areas where the smaller models excel. Additionally, SLMs identify challenging yet learnable data subsets, allowing LLMs to focus on these critical examples early in training.
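
A rough sketch of how such SLM-driven selection might look, assuming a Hugging Face-style causal language model that returns a mean next-token loss; the quantile band used to define "challenging yet learnable" is an assumed heuristic, not the paper's exact criterion.

```python
import torch

@torch.no_grad()
def select_training_subset(slm, sequences, low_q=0.2, high_q=0.8):
    """Keep sequences the small model finds moderately hard: informative
    enough to teach the large model, but still learnable. The quantile
    band (low_q, high_q) is illustrative only."""
    losses = []
    for input_ids, labels in sequences:
        # Assumes a Hugging Face-style causal LM that returns the mean
        # next-token cross-entropy loss for the sequence.
        losses.append(slm(input_ids=input_ids, labels=labels).loss)
    losses = torch.stack(losses)
    lo, hi = torch.quantile(losses, low_q), torch.quantile(losses, high_q)
    return [seq for seq, loss in zip(sequences, losses) if lo <= loss <= hi]
```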

Phase Two: Traditional Self-Supervised Learning

The second phase transitions to traditional self-supervised learning, enabling the LLM to independently refine its understanding of more complex data distributions.
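
One plausible way to wire the two phases together is to anneal the small model's soft-label weight to zero over phase one and then continue with plain cross-entropy; the linear schedule below is an assumption, as the paper's exact transition may differ.

```python
def salt_loss(step, ce_loss, kd_loss, phase_one_steps, kd_weight=0.5):
    """Two-stage objective: blend the SLM's soft-label (KD) signal with
    next-token cross-entropy early in training, then hand over to pure
    self-supervised learning."""
    if step < phase_one_steps:
        w = kd_weight * (1.0 - step / phase_one_steps)  # anneal KD influence
        return (1.0 - w) * ce_loss + w * kd_loss
    return ce_loss  # phase two: standard self-supervised training
```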

Benefits and Results of SALT

Experiments show that a 2.8-billion-parameter LLM trained with SALT on the Pile dataset outperformed a baseline model trained using conventional methods. Notably, the SALT-trained model excelled in reading comprehension, commonsense reasoning, and natural language inference benchmarks, using only 70% of the training steps. This resulted in a 28% reduction in training time. The SALT-trained LLM also achieved a 58.99% accuracy in next-token prediction, compared to 57.7% for the baseline, and had a lower log-perplexity of 1.868 versus 1.951, indicating better model quality.
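
Assuming "log-perplexity" here means the natural-log cross-entropy loss, as is conventional, exponentiating makes the quality gap easier to read:

```python
import math

# Perplexity recovered from the reported log-perplexities above.
ppl_salt = math.exp(1.868)      # ~6.48
ppl_baseline = math.exp(1.951)  # ~7.04
print(f"SALT: {ppl_salt:.2f} vs. baseline: {ppl_baseline:.2f}")
```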

Key Insights from SALT Research

  • SALT reduced the computational requirements for training LLMs by almost 28%, primarily by using smaller models to guide initial training phases.
  • The method consistently led to better-performing LLMs across various tasks, including summarization, arithmetic reasoning, and natural language inference.
  • By enabling smaller models to select challenging yet learnable data, SALT ensured that LLMs focused on high-value data points, expediting learning without compromising quality.
  • The approach is particularly beneficial for institutions with limited computational resources, as it leverages smaller, less costly models to aid in developing large-scale LLMs.
  • After supervised fine-tuning, SALT-trained models demonstrated better generalization capabilities in few-shot evaluations and downstream tasks.

Conclusion

SALT reframes LLM training by turning smaller models into valuable training aids. Its two-stage process balances efficiency and effectiveness, and it could help overcome resource constraints, improve model quality, and broaden access to large-scale model development. More broadly, the work underscores the value of rethinking standard training practices and using existing tools to achieve more with less.

Check out the Paper. All credit for this research goes to the researchers of this project.

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
