
Efficient Text Compression for Reducing LLM Expenses

December 20, 2024


LLMs Are Great… If They Can Handle Your Data

Originally published at https://blog.developer.bazaarvoice.com on October 28, 2024.

Large language models (LLMs) are powerful tools for handling unstructured text. However, they face a challenge when the text exceeds their context window. Bazaarvoice encountered this issue while developing its AI Review Summaries feature. With millions of user reviews, fitting them all into the context window of even the latest LLMs is impractical, and doing so would be prohibitively expensive.

In this post, I’ll explain how Bazaarvoice addressed this problem by compressing input text without losing meaning. We implemented a multi-pass hierarchical clustering approach that allows us to adjust the level of detail for compression, regardless of the chosen embedding model. This technique made our Review Summaries feature financially feasible and prepared us to scale our business in the future.

Bazaarvoice has been collecting user-generated product reviews for nearly 20 years, resulting in a large volume of unstructured data. These reviews vary in length and content. LLMs are excellent for processing unstructured text, as they can identify relevant information among distractions.

However, LLMs have limitations, such as the context window, which caps how many tokens (roughly, how many words) can be processed at once. State-of-the-art models like Anthropic’s Claude 3 have large context windows of up to 200,000 tokens, enough to fit small novels. Yet the internet is vast, and our user-generated reviews are no exception.

We first hit the context window limit while building our Review Summaries feature, which summarizes all reviews for a specific product on a client’s website. Over time, many products accumulated thousands of reviews, quickly exceeding the LLM context window. Some products even have millions of reviews, which would require significant re-engineering just to process in a single prompt.

Even if technically feasible, the costs would be prohibitive. LLM providers charge based on the number of input and output tokens, and approaching context window limits for millions of products can lead to cloud hosting bills exceeding six figures.

To overcome these technical and financial limitations, we focused on a simple insight: many reviews convey the same message. Review summaries capture recurring insights, themes, and sentiments. By exploiting this duplication, we reduced the amount of text sent to the LLM, staying within context window limits and lowering operating costs.

To achieve this, we needed to identify text segments conveying the same message. This task is challenging because people often use different words or phrases to express the same idea.

Fortunately, identifying semantically similar text has been an active research area in natural language processing. Agirre et al.’s 2013 work provided human-labeled pairs of semantically similar sentences, known as the STS Benchmark, in which annotators rate the semantic similarity of sentence pairs on a scale of 0–5.

The STS Benchmark is used to evaluate how well a text embedding model places semantically similar sentences close together in its high-dimensional space. We use Pearson’s correlation between the model’s similarity scores and the human ratings to measure how well the embedding model reproduces human judgments.
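This evaluation is straightforward to reproduce. Below is a minimal sketch in Python, assuming the STS sentence pairs and their human ratings are already loaded; `embed` is a hypothetical stand-in for whichever embedding model is under test:

```python
import numpy as np
from scipy.stats import pearsonr

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sts_pearson(pairs, human_scores, embed):
    """Pearson correlation between model similarity and human STS ratings.

    pairs:        list of (sentence_a, sentence_b) tuples from the benchmark
    human_scores: corresponding human ratings on the 0-5 scale
    embed:        hypothetical callable mapping a sentence to a vector
    """
    model_scores = [cosine_similarity(embed(a), embed(b)) for a, b in pairs]
    r, _ = pearsonr(model_scores, human_scores)
    return r
```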

Thus, we use an embedding model to identify semantically similar phrases from product reviews, removing repeated phrases before sending them to the LLM.

Our approach is as follows (a code sketch of steps 1–4 appears after the list):

  1. Segment product reviews into sentences.
  2. Compute an embedding vector for each sentence using a network that performs well on the STS benchmark.
  3. Use agglomerative clustering on all embedding vectors for each product.
  4. Retain an example sentence — the one closest to the cluster centroid — from each cluster to send to the LLM, discarding other sentences in the cluster.
  5. Consider small clusters as outliers and randomly sample them for inclusion in the LLM prompt.
  6. Include the number of sentences each cluster represents in the LLM prompt to ensure the weight of each sentiment is considered.
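As a concrete illustration of steps 1–4 (it also returns the cluster sizes needed for step 6), here is a minimal sketch. It is not our production code: `embed_sentences` is a placeholder for the chosen embedding model, NLTK’s sentence tokenizer stands in for whatever segmenter is used, and cosine distance with average linkage is one reasonable configuration:

```python
import numpy as np
from nltk.tokenize import sent_tokenize  # assumes nltk's "punkt" data is installed
from sklearn.cluster import AgglomerativeClustering

def representative_sentences(reviews, embed_sentences, distance_threshold):
    """Compress reviews to one representative sentence per cluster."""
    # 1. Segment product reviews into sentences.
    sentences = [s for review in reviews for s in sent_tokenize(review)]

    # 2. Compute an embedding vector for each sentence.
    vectors = np.asarray(embed_sentences(sentences))

    # 3. Agglomerative clustering with a distance cutoff rather than a
    #    fixed cluster count.
    labels = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=distance_threshold,
        metric="cosine",
        linkage="average",
    ).fit_predict(vectors)

    # 4. Keep the sentence nearest each cluster centroid, together with the
    #    cluster size so the prompt can weight each sentiment (step 6).
    representatives = []
    for label in np.unique(labels):
        members = np.where(labels == label)[0]
        centroid = vectors[members].mean(axis=0)
        nearest = members[np.argmin(
            np.linalg.norm(vectors[members] - centroid, axis=1))]
        representatives.append((sentences[nearest], len(members)))
    return representatives
```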

This method may seem straightforward, but we had to address several challenges before we could trust it.

First, we ensured that the model embeds text in a space where semantically similar sentences land close together and dissimilar ones far apart. We used the STS benchmark dataset and computed the Pearson correlation for each model we evaluated. Since AWS is our cloud provider, we assessed their Titan Text Embedding models.

AWS’s embedding models performed well at placing semantically similar sentences close together, which was convenient: we could use them off the shelf at low cost.

The next challenge was enforcing semantic similarity during clustering. Ideally, no cluster would contain two sentences whose semantic similarity falls below what humans accept: a score of 4. However, STS scores don’t translate directly into the embedding distances that clustering thresholds require.

To bridge this gap, we computed the embedding distance for every pair in the STS training set and fit a polynomial mapping similarity scores to distance thresholds.

This polynomial lets us compute the distance threshold needed to meet any semantic similarity target. For Review Summaries, we selected a score of 3.5, ensuring every cluster contains sentences that are “roughly” to “mostly” equivalent or better.
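A sketch of how such a fit might look (the polynomial degree is an assumption, and `train_scores`/`train_distances` denote the human scores and embedding distances of the STS training pairs):

```python
import numpy as np

def fit_score_to_distance(train_scores, train_distances, degree=3):
    """Fit a polynomial mapping an STS similarity score to an embedding
    distance usable as a clustering threshold."""
    return np.polynomial.Polynomial.fit(train_scores, train_distances, degree)

# Usage: derive the clustering threshold for any target similarity score.
# score_to_distance = fit_score_to_distance(train_scores, train_distances)
# threshold = float(score_to_distance(3.5))  # "roughly" to "mostly" equivalent
```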

This procedure works for any embedding model, allowing us to experiment with different models as they become available and to swap them in quickly without worrying that clusters will contain semantically dissimilar sentences.

We knew our semantic compression was reliable, but it was unclear how much compression we could achieve. The compression varied across products, clients, and industries.

Without semantic information loss (a hard threshold of 4), we achieved a compression ratio of 1.18 (a space savings of 15%).

Clearly, lossless compression wasn’t sufficient for financial viability.

Our distance-selection method, however, offered an interesting possibility: we could gradually increase information loss by repeatedly running clustering at lower thresholds on the remaining data.

The approach is as follows (sketched in code after the list):

  1. Run clustering with a threshold selected from score = 4 (lossless).
  2. Select the outlying clusters, i.e., those with fewer than 10 vectors, and carry their sentences into the next phase.
  3. Rerun clustering on the carried-over vectors with a threshold selected from score = 3 (no longer lossless, but acceptable).
  4. Again select the clusters with fewer than 10 vectors.
  5. Repeat as desired, lowering the score threshold on each pass.
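Here is a minimal sketch of the whole multi-pass scheme. The cutoff of 10 vectors mirrors the steps above, the score schedule is illustrative, the helper reuses the clustering configuration from the earlier sketch, and `score_to_distance` is the fitted polynomial:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_pass(sentences, vectors, threshold, min_cluster_size=10):
    """One pass: keep representatives of large-enough clusters; return the
    indices of sentences in small clusters for the next, lossier pass."""
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=threshold,
        metric="cosine", linkage="average").fit_predict(vectors)
    kept, carry = [], []
    for label in np.unique(labels):
        members = np.where(labels == label)[0]
        if len(members) < min_cluster_size:
            carry.extend(members)        # outliers: retry at a lower score
            continue
        centroid = vectors[members].mean(axis=0)
        nearest = members[np.argmin(
            np.linalg.norm(vectors[members] - centroid, axis=1))]
        kept.append((sentences[nearest], len(members)))
    return kept, carry

def multi_pass_compress(sentences, vectors, score_to_distance,
                        scores=(4.0, 3.0, 2.0)):
    """Run clustering passes at decreasing score thresholds."""
    vectors = np.asarray(vectors)
    kept, idx = [], np.arange(len(sentences))
    for score in scores:                 # lossless first, then lossier
        if len(idx) < 2:                 # nothing left to cluster
            break
        reps, carry = cluster_pass(
            [sentences[i] for i in idx], vectors[idx],
            float(score_to_distance(score)))
        kept.extend(reps)                # earlier passes are never revisited
        idx = idx[carry]                 # map subset indices back to global
    return kept, [sentences[i] for i in idx]   # leftovers are the outliers
```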

At each clustering pass, we accept more information loss but gain more compression, without affecting the lossless representative phrases selected in the first pass.

This flexibility is useful not only for Review Summaries, where high semantic fidelity is desired, but also for other use cases where more semantic information loss is acceptable and prompt input costs are the main concern.

Despite this, many clusters still contained only a single vector even after lowering the score threshold. These are considered outliers and are randomly sampled into the final prompt, which we cap at 25,000 tokens.
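A minimal sketch of that budgeted sampling (assumptions: `count_tokens` wraps a tokenizer matched to the target LLM, and `base_prompt_tokens` counts everything already in the prompt):

```python
import random

def sample_outliers(outliers, base_prompt_tokens, count_tokens,
                    budget=25_000, seed=0):
    """Randomly add outlier sentences until the token budget is reached."""
    pool = outliers[:]                   # don't mutate the caller's list
    random.Random(seed).shuffle(pool)
    chosen, used = [], base_prompt_tokens
    for sentence in pool:
        cost = count_tokens(sentence)
        if used + cost > budget:
            break
        chosen.append(sentence)
        used += cost
    return chosen
```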

The multi-pass clustering and random outlier sampling let us trade semantic information loss for a smaller context window to send to the LLM. This raises the question: how accurate are our summaries?

At Bazaarvoice, authenticity is crucial for consumer trust, and our Review Summaries must authentically represent all voices in the reviews. Any lossy compression approach risks misrepresenting or excluding consumers who contributed reviews.

To validate our compression technique, we measured it directly. For each product, we sampled a number of reviews, then used LLM evaluations to judge whether the summary was representative of and relevant to each sampled review. This gives us a quantitative metric to evaluate our compression.
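A minimal sketch of that evaluation loop; `ask_llm` is a hypothetical client call, and the judging prompt is illustrative rather than the exact prompt used in production:

```python
def summary_representativeness(summary, sampled_reviews, ask_llm):
    """Fraction of sampled reviews an LLM judge deems represented."""
    hits = 0
    for review in sampled_reviews:
        verdict = ask_llm(
            "Does the following summary represent and remain relevant to "
            "this review? Answer YES or NO.\n\n"
            f"Summary:\n{summary}\n\nReview:\n{review}")
        hits += verdict.strip().upper().startswith("YES")
    return hits / len(sampled_reviews)
```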

Over 20 years, we’ve collected nearly a billion user-generated reviews and needed to generate summaries for tens of millions of products. Many products have thousands of reviews, and some have millions, which would exhaust LLM context windows and run up enormous costs.

Using our approach, we reduced input text size by 97.7% (a compression ratio of 42), allowing us to scale this solution to all products and any review volume. Additionally, the cost of generating summaries for our billion-scale dataset decreased by 82.4%, including the cost of embedding the sentence data and storing the embeddings in a database.


