Remove tokenizer elbow mechanism for accurate token estimation #1668


Conversation

sharoneyal (Contributor) commented Apr 2, 2025

User description

To support cases where the model is neither OpenAI nor Claude.


PR Type

Enhancement, Configuration changes


Description

  • Removed the tokenizer elbow mechanism for token estimation.

  • Simplified token counting logic for non-OpenAI/Claude models.

  • Deleted the model_token_count_estimate_factor configuration option.


Changes walkthrough 📝

Relevant files

Enhancement
  token_handler.py (pr_agent/algo/token_handler.py): Simplified token counting logic and removed elbow mechanism
  • Removed the elbow mechanism for token estimation.
  • Simplified logic for token counting in non-OpenAI/Claude models.
  • Eliminated unnecessary warnings and fallback calculations.
  +1/-9

Configuration changes
  configuration.toml (pr_agent/settings/configuration.toml): Removed token count estimate factor configuration
  • Deleted the model_token_count_estimate_factor configuration option.
  • Updated configuration to reflect removal of the elbow mechanism.
  +0/-1
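For context, the deleted setting was presumably a multiplier applied to raw tokenizer estimates for models without a known tokenizer. A hypothetical reconstruction of the removed entry in pr_agent/settings/configuration.toml (the section placement and value shown are assumptions, not the actual deleted line):

```toml
[config]
# Removed in this PR: padding factor applied to raw token estimates for
# models that are neither OpenAI nor Claude (value is illustrative only)
model_token_count_estimate_factor = 0.3
```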

    Need help?
  • Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
  • Check out the documentation for more information.

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Potential Regression

    The PR removes the elbow mechanism and fallback calculation for non-OpenAI/Claude models without providing an alternative for accurate token counting. This might lead to underestimation of tokens for these models.

    return encoder_estimate
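To make the regression concern concrete, the removed elbow mechanism can be sketched as follows. This is a hypothetical reconstruction from the PR description and the deleted model_token_count_estimate_factor option, not the actual deleted code; the function name and default factor are assumptions:

```python
# Hypothetical sketch of the removed "elbow" fallback: for models whose
# tokenizer is unknown (neither OpenAI nor Claude), pad the raw tiktoken
# estimate by a configurable safety factor to reduce undercounting.

def estimate_tokens(encoder_estimate: int, model: str, factor: float = 0.1) -> int:
    """Return a token estimate, padding it for models with unknown tokenizers."""
    model = (model or "").lower()
    if "gpt" in model or "claude" in model:
        # Tokenizer is known to be accurate enough; use the raw estimate.
        return encoder_estimate
    # Elbow: inflate the estimate by (1 + factor) as a safety margin.
    return int(encoder_estimate * (1 + factor))
```

After this PR, the raw encoder_estimate is returned unconditionally, which is the under-estimation risk flagged above.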


    PR Code Suggestions ✨

    Explore these optional code suggestions:

Category: Learned best practice
    Add null safety checks before performing operations on potentially None values to prevent runtime errors

    Add null safety checks for model before performing the lower() operation. If
    get_settings().config.model returns None, the code will raise an AttributeError.
    Also, add a check for encoder_estimate to ensure it's not None before returning
    it.

pr_agent/algo/token_handler.py [125-128]

```diff
-model = get_settings().config.model.lower()
-if force_accurate and 'claude' in model and get_settings(use_context=False).get('anthropic.key'):
+model = get_settings().config.model
+model = model.lower() if model else ""
+if force_accurate and model and 'claude' in model and get_settings(use_context=False).get('anthropic.key'):
     return self.calc_claude_tokens(patch) # API call to Anthropic for accurate token counting for Claude models
 #else: Non Anthropic provided model

-return encoder_estimate
+return encoder_estimate or 0
```

    [To ensure code accuracy, apply this suggestion manually]
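The null-safety pattern in the suggestion can be isolated as a small standalone sketch. The function below is illustrative only; get_settings() and the surrounding TokenHandler class are not reproduced, and the Claude branch is stubbed out:

```python
# Standalone illustration of the suggested null-safety pattern: tolerate a
# missing model name and a missing encoder estimate instead of raising.

def safe_token_count(model, encoder_estimate):
    """Return a token estimate even when model or estimate is None."""
    model = model.lower() if model else ""
    if model and "claude" in model:
        # In the real code this branch would call Anthropic's counting API.
        pass
    return encoder_estimate or 0
```

Calling model.lower() directly would raise AttributeError when the configured model is None; the guarded form degrades to the encoder estimate (or 0) instead.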

    Suggestion importance[1-10]: 6
Impact: Low
    • Author self-review: I have reviewed the PR code suggestions, and addressed the relevant ones.

    @sharoneyal sharoneyal closed this Apr 2, 2025
    @sharoneyal sharoneyal deleted the es/remove_token_elbow_mechanism_in_case_of_force_accurate_token_calc branch April 3, 2025 09:54