
feat(google-genai): instrumentation for google-genai #2828


Open
wants to merge 3 commits into base: main

Conversation

@dinmukhamedm (Contributor) commented Apr 21, 2025

Fixes #2675

[Screenshot: 2025-04-21 at 18:45:37]
  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly. (I checked, and the README already claims we instrument Google Gemini.)

Important

Introduces Google GenAI instrumentation with tracing capabilities, configuration, utilities, and comprehensive tests, along with project setup and version update.

  • New Instrumentation:
    • Adds GoogleGenAiInstrumentor in __init__.py for tracing Google GenAI API calls (see the usage sketch after this description).
    • Supports synchronous and asynchronous methods for generate_content and generate_content_stream.
    • Attributes like model, temperature, and token usage are captured.
  • Configuration and Utilities:
    • Adds Config class in config.py for exception logging.
    • Utility functions in utils.py for handling span attributes and exceptions.
  • Testing:
    • Adds unit tests in test_generate_content.py for various scenarios including async and streaming.
    • Uses VCR cassettes for HTTP request recording in tests/cassettes/.
  • Project Setup:
    • Adds pyproject.toml, poetry.toml, and project.json for project configuration.
    • Includes .flake8 for linting configuration.
  • Version Update:
    • Updates version in version.py to 0.39.2.

This description was created by Ellipsis for f9c0b0e.
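For orientation, here is a minimal usage sketch of the new instrumentor. Only the class name GoogleGenAiInstrumentor and the instrumented methods are taken from this PR; the tracer-provider wiring, the console exporter, and the google-genai client call are illustrative assumptions, not code from the change itself.

```python
# Hypothetical sketch: wire up a console exporter and instrument google-genai.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

from opentelemetry.instrumentation.google_genai import GoogleGenAiInstrumentor
from google import genai

# Minimal tracer setup so spans are printed to stdout (assumption, not part of the PR).
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Patch the client's generate_content / generate_content_stream methods.
GoogleGenAiInstrumentor().instrument()

client = genai.Client()  # reads the Gemini API key from the environment
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Write a one-line haiku about tracing.",
)
print(response.text)
```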

@ellipsis-dev (bot) left a comment

Caution

Changes requested ❌

Reviewed everything up to f9c0b0e in 2 minutes and 45 seconds.
  • Reviewed 1773 lines of code in 20 files
  • Skipped 1 file when reviewing
  • Skipped posting 12 draft comments; they are listed below
1. packages/opentelemetry-instrumentation-google-genai/opentelemetry/instrumentation/google_genai/utils.py:15
  • Draft comment:
    This is a duplicate utility function; consider reusing the existing implementation of set_span_attribute (utils.py).
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). Code duplication across packages is a valid concern. However, these are separate instrumentation packages for different services (Google GenAI vs Anthropic). Having some utility code duplication between separate instrumentation packages may be acceptable or even preferred over creating dependencies between them. The function is also quite simple. I might be underestimating the value of consistency across instrumentation packages. Perhaps there should be a shared utilities package. While consistency is valuable, the cost of maintaining separate simple utility functions is likely lower than the complexity of managing cross-package dependencies for such basic functionality. The comment should be removed. While technically correct about the duplication, addressing it would add more complexity than it's worth given these are separate instrumentation packages.
2. packages/opentelemetry-instrumentation-google-genai/opentelemetry/instrumentation/google_genai/__init__.py:163
  • Draft comment:
    Avoid using print() for logging in instrumentation code; use logger.debug instead for consistency.
  • Reason this comment was not posted:
    Marked as duplicate.
3. packages/opentelemetry-instrumentation-google-genai/opentelemetry/instrumentation/google_genai/utils.py:37
  • Draft comment:
    Typo in log message: 'OpenLLMetry' should likely be 'OpenTelemetry'.
  • Reason this comment was not posted:
    Marked as duplicate.
4. packages/opentelemetry-instrumentation-google-genai/project.json:14
  • Draft comment:
    Repository URL appears to contain a typo ('openllmetry' instead of 'opentelemetry').
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/version.py:1
  • Draft comment:
    Ensure naming consistency across packages: verify if 'google_generativeai' should match the 'google_genai' naming used elsewhere.
  • Reason this comment was not posted:
    Comment was on unchanged code.
6. packages/opentelemetry-instrumentation-google-genai/README.md:7
  • Draft comment:
    There appears to be a typographical inconsistency on line 7. The text reads 'tracing Google Gemini prompts and completions', but given the context of the instrumentation for Google GenAI (and the link pointing to the Google GenAI library), it's likely that 'Gemini' should be changed to 'GenAI'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). The comment appears to be incorrect. Gemini is the actual name of Google's LLM product, while GenAI is the name of the Python library. The current text is accurate - it's saying that this library traces Gemini prompts using the GenAI library. Changing it would make it less clear what's being traced. Maybe there's some official naming convention I'm not aware of? Maybe Google prefers to use GenAI consistently in documentation? Even if there is some naming preference, the current text is more precise and clearer about what's being traced (Gemini model prompts) versus how it's being traced (GenAI library). The comment should be deleted as it suggests a change that would make the documentation less clear and accurate.
7. packages/opentelemetry-instrumentation-google-genai/opentelemetry/instrumentation/google_genai/utils.py:37
  • Draft comment:
    Typo detected in the log message: 'OpenLLMetry' should be corrected to 'OpenTelemetry' for consistency.
  • Reason this comment was not posted:
    Marked as duplicate.
8. packages/opentelemetry-instrumentation-google-genai/pyproject.toml:14
  • Draft comment:
    Typographical error: The repository URL on line 14 contains 'openllmetry', which is likely a typo. Consider correcting it to 'opentelemetry' to ensure consistency.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). This appears to be an intentional choice of repository name, not a typo. The repository is likely named 'openllmetry' as a play on words combining 'OpenTelemetry' and 'LLM'. Looking at the GitHub URL, it's probably the actual repository name. We shouldn't assume it's a typo without more context. I could be wrong - maybe it really is a typo and should be fixed. The package name does use 'opentelemetry' consistently. Even if there's some inconsistency, we can't be certain this is a typo without checking the actual GitHub repository. The URL likely points to the correct location. We should delete this comment as it's making assumptions about the repository name without strong evidence that it's actually incorrect.
9. packages/opentelemetry-instrumentation-google-genai/tests/cassettes/test_generate_content/test_gemini_generate_content_streaming.yaml:105
  • Draft comment:
    There appear to be stray backslash characters (e.g., before 'AI-Powered Chatbots:' and similar bullet items) in the response body string. These seem unintended and should be removed for clarity.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
10. packages/opentelemetry-instrumentation-google-genai/tests/cassettes/test_generate_content/test_gemini_generate_content_streaming_async.yaml:55
  • Draft comment:
    Typo: There appears to be an extra backslash before 'State-of-the-Art Performance:'. Please remove the stray backslash so that the text reads correctly.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
11. packages/opentelemetry-instrumentation-google-genai/tests/cassettes/test_generate_content/test_gemini_generate_content_streaming_async.yaml:87
  • Draft comment:
    Typo: The contraction 'It's' is split across lines as "It'" on one line and "s available" on the next. Please combine these to correctly display "It's available...".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 0% vs. threshold = 50%). This is a test fixture file that contains recorded API responses. The line wrapping is likely due to YAML formatting and doesn't affect functionality. The split contraction appears in the raw response data that was captured from an actual API call. Modifying it would make the test data less accurate to the real response. Maybe the split contraction could cause issues when parsing the response? Maybe it's important for readability of the test file? No, this is just captured response data in a test fixture. The JSON parser will handle the line breaks correctly. Modifying it would make the test data less representative of real API responses. Delete this comment. The split contraction is part of recorded API response data in a test fixture and should be preserved exactly as received.
12. packages/opentelemetry-instrumentation-google-genai/tests/cassettes/test_generate_content/test_gemini_generate_content_streaming_async.yaml:147
  • Draft comment:
    Typo: The text block starting with ":** Assisting with tasks..." contains an unnecessary leading colon. Please remove the colon so that the text starts with '**Assisting with tasks...'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 0% vs. threshold = 50%). This is a test fixture file that records actual API responses. The "typo" is part of the real API response from Gemini, not something the PR author wrote. Modifying the recorded response would make the test fixture inaccurate. We want to preserve the exact API response, even if it contains minor formatting issues. Maybe the typo indicates a bug in the API that should be reported? Maybe the test should be updated to use a different prompt that generates cleaner output? No - this is just a test fixture recording an actual API response. We should preserve the exact response as-is, warts and all. Changing it would defeat the purpose of having recorded test fixtures. Delete this comment. The "typo" is part of a recorded API response that should be preserved exactly as-is.

Workflow ID: wflow_uW522JyY7dyDfZAL


@dinmukhamedm (Contributor, Author) left a comment

@nirga I'd love your review on this. I left a couple of comments in places where I'd like your input.

Also, I didn't change the SDK as part of this PR, but happy to add that too.

Comment on lines +197 to +211
set_span_attribute(
    span,
    gen_ai_attributes.GEN_AI_USAGE_INPUT_TOKENS,
    usage_dict.get("prompt_token_count")
)
set_span_attribute(
    span,
    gen_ai_attributes.GEN_AI_USAGE_OUTPUT_TOKENS,
    usage_dict.get("candidates_token_count")
)
set_span_attribute(
    span,
    SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
    usage_dict.get("total_token_count")
)
@dinmukhamedm (Contributor, Author) commented:

@nirga here I chose to go with the adopted convention of input_tokens and output_tokens, not prompt_tokens and completion_tokens. I don't have a strong opinion, and our (Laminar) backend can ingest both. Let me know your thoughts.

@nirga (Member) replied:
It's fine. I only hate it that we have conflicting standards across the codebase :/
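For readers outside the project, the "conflicting standards" are roughly the following attribute-key families. The exact strings below are my reading of the newer OpenTelemetry GenAI semantic conventions versus the older OpenLLMetry-style names; treat them as an assumption rather than something defined in this PR.

```python
# Newer OpenTelemetry GenAI semantic-convention keys (the ones this PR uses):
OTEL_GENAI_USAGE_KEYS = {
    "input": "gen_ai.usage.input_tokens",
    "output": "gen_ai.usage.output_tokens",
}

# Older prompt/completion-style keys still found elsewhere in the codebase
# (assumed names; the total-token key has no gen_ai.* equivalent, hence llm.*):
LEGACY_USAGE_KEYS = {
    "input": "gen_ai.usage.prompt_tokens",
    "output": "gen_ai.usage.completion_tokens",
    "total": "llm.usage.total_tokens",
}
```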

Comment on lines +251 to +261
if chunk.usage_metadata:
    usage_dict = to_dict(chunk.usage_metadata)
    # prompt token count is sent in every chunk
    # (and is less by 1 in the last chunk, so we set it once);
    # total token count in every chunk is greater by prompt token count than it should be,
    # thus this awkward logic here
    if aggregated_usage_metadata.get("prompt_token_count") is None:
        aggregated_usage_metadata["prompt_token_count"] = usage_dict.get("prompt_token_count") or 0
        aggregated_usage_metadata["total_token_count"] = usage_dict.get("total_token_count") or 0
    aggregated_usage_metadata["candidates_token_count"] += usage_dict.get("candidates_token_count") or 0
    aggregated_usage_metadata["total_token_count"] += usage_dict.get("candidates_token_count") or 0
@dinmukhamedm (Contributor, Author) commented:

This part may look weird; let me know if you need a better explanation.

@nirga (Member) replied:

I think I understood it from the comments :)
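For context, a minimal sketch of the stream this aggregation consumes. The generate_content_stream call is the public google-genai API; the claim that every chunk repeats the prompt token count comes from the code comments in this PR, not from API documentation.

```python
# Sketch: iterate a streaming Gemini response and inspect the per-chunk usage
# metadata that the aggregation logic above folds into one set of span attributes.
from google import genai

client = genai.Client()  # assumes the Gemini API key is set in the environment
stream = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Name three planets, one per line.",
)
for chunk in stream:
    # chunk.text is the incremental completion text; chunk.usage_metadata carries
    # the token counts that the instrumentation aggregates across chunks.
    print(repr(chunk.text), chunk.usage_metadata)
```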

from typing import Any, Optional, Union


JOIN_PARTS_SEPARATOR = " "
@dinmukhamedm (Contributor, Author) commented:

Also, I wasn't sure whether we need whitespace between content parts; happy to change this.
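For illustration, this is what joining parts with the proposed separator looks like; only JOIN_PARTS_SEPARATOR comes from the PR, the part texts are made up.

```python
# Made-up multi-part user content; the instrumentation flattens it into one string.
JOIN_PARTS_SEPARATOR = " "

parts = ["Describe this image.", "Focus on the colors."]
prompt_text = JOIN_PARTS_SEPARATOR.join(parts)
print(prompt_text)  # -> "Describe this image. Focus on the colors."
```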

@nirga (Member) replied:

I think it makes sense, no?

@@ -1 +1 @@
-__version__ = "0.21.5"
+__version__ = "0.39.2"
@dinmukhamedm (Contributor, Author) commented:

this was outdated for some reason, so I changed that too

@michaelsafyan (Contributor) commented:

@dinmukhamedm (Contributor, Author) replied:

@michaelsafyan No, but I guess it should.

My PR does instrument google_genai, but it is certainly worse (in both code quality and functionality) than the contrib library mentioned. However, it sends the prompt/completion contents as span attributes, while the new library emits them as log events (the accepted semconv).

I don't think it's easy to mix up two instrumentors for the same library. @nirga What are your thoughts on this? I remember you mentioned a potential mechanism that would convert log events to span attributes; is there a prototype? Feel free to close this PR in favor of the contrib library.
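To make the attributes-versus-events distinction concrete, here is a rough sketch of the two shapes being compared. The span, attribute, and event names are illustrative only and are not the exact identifiers used by either instrumentation.

```python
# Sketch of "content as span attributes" vs. "content as events" on the same span.
# Uses only the OpenTelemetry API; without an SDK configured these calls are no-ops.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("gemini.generate_content") as span:
    # OpenLLMetry-style: prompt/completion text stored directly as span attributes.
    span.set_attribute("gen_ai.prompt.0.content", "What is Gemini?")
    span.set_attribute("gen_ai.completion.0.content", "Gemini is Google's LLM family.")

    # Contrib-style: the same content emitted as event records instead
    # (the contrib instrumentation actually uses the logs/events API; add_event
    # is a simplified stand-in here).
    span.add_event("gen_ai.content.prompt", {"content": "What is Gemini?"})
```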

@michaelsafyan (Contributor) replied:

I think it would be better to have Traceloop wrap/invoke the OTel Contrib instrumentation. That way, more (joint) effort can go into improving the (shared) instrumentation for the Google GenAI library rather than into competing efforts. That work could include adding options to support multiple different kinds of output formats.

@nirga (Member) commented Apr 22, 2025

I completely agree, @michaelsafyan! The main blocker right now is aligning on the attribute format, since we can't use the current events format that's implemented. Once we finalize that, we can move forward with switching to the contrib implementation.

@traceloop traceloop deleted a comment from ellipsis-dev bot Apr 22, 2025
@nirga (Member) left a comment

Amazing work @dinmukhamedm 🥳
One last thing that is missing is using this in the SDK and a sample app. Mind adding them?
(Also replied separately to @michaelsafyan's comment)
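For reference, a sketch of what the requested SDK usage and sample app might look like. Traceloop.init and the google-genai calls are real public APIs, but whether the Traceloop SDK picks up this new instrumentor automatically is an assumption about the follow-up work, not something included in this PR.

```python
# Hypothetical sample app: initialize the Traceloop SDK, then call Gemini so that
# the google-genai instrumentation (once wired into the SDK) emits spans.
from traceloop.sdk import Traceloop
from google import genai

Traceloop.init(app_name="google_genai_sample")

client = genai.Client()  # assumes the Gemini API key is set in the environment
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="In one sentence, what does OpenTelemetry do?",
)
print(response.text)
```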


@dinmukhamedm (Contributor, Author) replied:

> Mind adding them?

Sure, will do.

Development

Successfully merging this pull request may close these issues.

🚀 Feature: Instrument the new google genai library
3 participants