feat(google-genai): instrumentation for google-genai #2828
base: main
Conversation
Caution
Changes requested ❌
Reviewed everything up to f9c0b0e in 2 minutes and 45 seconds.
- Reviewed 1773 lines of code in 20 files.
- Skipped 1 file when reviewing.
- Skipped posting 12 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-google-genai/opentelemetry/instrumentation/google_genai/utils.py:15
- Draft comment: This is a duplicate utility function: `set_span_attribute` (utils.py). Consider reusing the existing implementation. (A sketch of the helper in question appears after this list.)
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. Code duplication across packages is a valid concern. However, these are separate instrumentation packages for different services (Google GenAI vs. Anthropic). Having some utility code duplication between separate instrumentation packages may be acceptable or even preferred over creating dependencies between them. The function is also quite simple. I might be underestimating the value of consistency across instrumentation packages; perhaps there should be a shared utilities package. While consistency is valuable, the cost of maintaining separate simple utility functions is likely lower than the complexity of managing cross-package dependencies for such basic functionality. The comment should be removed: while technically correct about the duplication, addressing it would add more complexity than it's worth given these are separate instrumentation packages.
2. packages/opentelemetry-instrumentation-google-genai/opentelemetry/instrumentation/google_genai/__init__.py:163
- Draft comment: Avoid using print() for logging in instrumentation code; use logger.debug instead for consistency.
- Reason this comment was not posted: Marked as duplicate.
3. packages/opentelemetry-instrumentation-google-genai/opentelemetry/instrumentation/google_genai/utils.py:37
- Draft comment: Typo in log message: 'OpenLLMetry' should likely be 'OpenTelemetry'.
- Reason this comment was not posted: Marked as duplicate.
4. packages/opentelemetry-instrumentation-google-genai/project.json:14
- Draft comment: Repository URL appears to contain a typo ('openllmetry' instead of 'opentelemetry').
- Reason this comment was not posted: Comment was on unchanged code.
5. packages/opentelemetry-instrumentation-google-generativeai/opentelemetry/instrumentation/google_generativeai/version.py:1
- Draft comment: Ensure naming consistency across packages: verify whether 'google_generativeai' should match the 'google_genai' naming used elsewhere.
- Reason this comment was not posted: Comment was on unchanged code.
6. packages/opentelemetry-instrumentation-google-genai/README.md:7
- Draft comment: There appears to be a typographical inconsistency on line 7. The text reads 'tracing Google Gemini prompts and completions', but given the context of the instrumentation for Google GenAI (and the link pointing to the Google GenAI library), it's likely that 'Gemini' should be changed to 'GenAI'.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. The comment appears to be incorrect. Gemini is the actual name of Google's LLM product, while GenAI is the name of the Python library. The current text is accurate: it says that this library traces Gemini prompts using the GenAI library, and changing it would make it less clear what's being traced. Maybe there's some official naming convention I'm not aware of, or maybe Google prefers to use GenAI consistently in documentation? Even if there is some naming preference, the current text is more precise and clearer about what's being traced (Gemini model prompts) versus how it's being traced (the GenAI library). The comment should be deleted, as it suggests a change that would make the documentation less clear and accurate.
7. packages/opentelemetry-instrumentation-google-genai/opentelemetry/instrumentation/google_genai/utils.py:37
- Draft comment: Typo detected in the log message: 'OpenLLMetry' should be corrected to 'OpenTelemetry' for consistency.
- Reason this comment was not posted: Marked as duplicate.
8. packages/opentelemetry-instrumentation-google-genai/pyproject.toml:14
- Draft comment: Typographical error: the repository URL on line 14 contains 'openllmetry', which is likely a typo. Consider correcting it to 'opentelemetry' to ensure consistency.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This appears to be an intentional choice of repository name, not a typo: the repository is likely named 'openllmetry' as a play on words combining 'OpenTelemetry' and 'LLM'. Looking at the GitHub URL, it's probably the actual repository name, and we shouldn't assume it's a typo without more context. I could be wrong; maybe it really is a typo and should be fixed, and the package name does use 'opentelemetry' consistently. Even if there's some inconsistency, we can't be certain this is a typo without checking the actual GitHub repository, and the URL likely points to the correct location. We should delete this comment, as it makes assumptions about the repository name without strong evidence that it's actually incorrect.
9. packages/opentelemetry-instrumentation-google-genai/tests/cassettes/test_generate_content/test_gemini_generate_content_streaming.yaml:105
- Draft comment: There appear to be stray backslash characters (e.g., before 'AI-Powered Chatbots:' and similar bullet items) in the response body string. These seem unintended and should be removed for clarity.
- Reason this comment was not posted: Comment looked like it was already resolved.
10. packages/opentelemetry-instrumentation-google-genai/tests/cassettes/test_generate_content/test_gemini_generate_content_streaming_async.yaml:55
- Draft comment: Typo: there appears to be an extra backslash before 'State-of-the-Art Performance:'. Please remove the stray '\' so that the text reads correctly.
- Reason this comment was not posted: Comment looked like it was already resolved.
11. packages/opentelemetry-instrumentation-google-genai/tests/cassettes/test_generate_content/test_gemini_generate_content_streaming_async.yaml:87
- Draft comment: Typo: the contraction 'It's' is split across lines as "It'" on one line and "s available" on the next. Please combine these to correctly display "It's available...".
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. This is a test fixture file that contains recorded API responses. The line wrapping is likely due to YAML formatting and doesn't affect functionality; the split contraction appears in the raw response data that was captured from an actual API call, and modifying it would make the test data less accurate to the real response. Maybe the split contraction could cause issues when parsing the response, or maybe it's important for readability of the test file? No: this is just captured response data in a test fixture, the JSON parser will handle the line breaks correctly, and modifying it would make the test data less representative of real API responses. Delete this comment; the split contraction is part of recorded API response data in a test fixture and should be preserved exactly as received.
12. packages/opentelemetry-instrumentation-google-genai/tests/cassettes/test_generate_content/test_gemini_generate_content_streaming_async.yaml:147
- Draft comment: Typo: the text block starting with ":** Assisting with tasks..." contains an unnecessary leading colon. Please remove the colon so that the text starts with '**Assisting with tasks...'.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. This is a test fixture file that records actual API responses. The "typo" is part of the real API response from Gemini, not something the PR author wrote; modifying the recorded response would make the test fixture inaccurate, and we want to preserve the exact API response even if it contains minor formatting issues. Maybe the typo indicates a bug in the API that should be reported, or maybe the test should be updated to use a different prompt that generates cleaner output? No: this is just a test fixture recording an actual API response, and we should preserve the exact response as-is, warts and all; changing it would defeat the purpose of having recorded test fixtures. Delete this comment; the "typo" is part of a recorded API response that should be preserved exactly as-is.
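For context on draft comment 1 above, the `set_span_attribute` helper it refers to is, in the other instrumentation packages in this repo, typically a small guard along these lines. This is a minimal sketch assuming the skip-empty-values convention; the exact body in this PR may differ:

```python
from opentelemetry.trace import Span


def set_span_attribute(span: Span, name: str, value) -> None:
    # Skip None and empty values so spans only carry meaningful attributes;
    # this is the tiny helper that ends up duplicated across packages.
    if value is not None and value != "":
        span.set_attribute(name, value)
```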
Workflow ID: wflow_uW522JyY7dyDfZAL
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
@nirga I'd love your review on this. I left a couple comments in places where I'd like your inputs.
Also, I didn't change the SDK as part of this PR, but happy to add that too.
```python
set_span_attribute(
    span,
    gen_ai_attributes.GEN_AI_USAGE_INPUT_TOKENS,
    usage_dict.get("prompt_token_count")
)
set_span_attribute(
    span,
    gen_ai_attributes.GEN_AI_USAGE_OUTPUT_TOKENS,
    usage_dict.get("candidates_token_count")
)
set_span_attribute(
    span,
    SpanAttributes.LLM_USAGE_TOTAL_TOKENS,
    usage_dict.get("total_token_count")
)
```
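To make the mapping concrete, here is a minimal sketch with made-up token counts; the attribute names in the comments are what these constants resolve to in the semconv versions I'm assuming, and may change:

```python
# Hypothetical usage_metadata from a single (non-streaming) response,
# already converted to a dict; the comment on each line names the span
# attribute the count is recorded under in the snippet above.
usage_dict = {
    "prompt_token_count": 10,      # gen_ai.usage.input_tokens
    "candidates_token_count": 25,  # gen_ai.usage.output_tokens
    "total_token_count": 35,       # llm.usage.total_tokens
}
```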
@nirga here I chose to go with the adopted conventions with `input_tokens` and `output_tokens`, not `prompt_tokens` and `completion_tokens`. I don't have a strong opinion, and our (Laminar) backend can ingest both types. Let me know your thoughts.
It's fine. I only hate it that we have conflicting standards across the codebase :/
```python
if chunk.usage_metadata:
    usage_dict = to_dict(chunk.usage_metadata)
    # prompt token count is sent in every chunk
    # (and is less by 1 in the last chunk, so we set it once);
    # total token count in every chunk is greater by prompt token count
    # than it should be, thus this awkward logic here
    if aggregated_usage_metadata.get("prompt_token_count") is None:
        aggregated_usage_metadata["prompt_token_count"] = usage_dict.get("prompt_token_count") or 0
        aggregated_usage_metadata["total_token_count"] = usage_dict.get("total_token_count") or 0
    aggregated_usage_metadata["candidates_token_count"] += usage_dict.get("candidates_token_count") or 0
    aggregated_usage_metadata["total_token_count"] += usage_dict.get("candidates_token_count") or 0
```
this part here may look weird, let me know if you need a better explanation
I think I understood it from the comments :)
```python
from typing import Any, Optional, Union

JOIN_PARTS_SEPARATOR = " "
```
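For illustration, the separator presumably comes into play when flattening multi-part content into a single prompt/completion string. A minimal sketch with made-up parts:

```python
JOIN_PARTS_SEPARATOR = " "

# Two hypothetical text parts of one message.
parts = ["What is the capital of", "France?"]

# With " ":  "What is the capital of France?"
# With "":   "What is the capital ofFrance?"  (words run together)
print(JOIN_PARTS_SEPARATOR.join(parts))
```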
also, wasn't sure if we need a whitespace between content parts – happy to change this
I think it makes sense, no?
```diff
@@ -1 +1 @@
-__version__ = "0.21.5"
+__version__ = "0.39.2"
```
this was outdated for some reason, so I changed that too
Does this share / reuse the instrumentation in the contrib repo?

@michaelsafyan No, but I guess it should. My PR does instrument google_genai, but is certainly worse (in both code quality and functionality) than the contrib library mentioned. However, it sends the prompt/completion contents as attributes, while the new library does so as log events (the accepted semconv). I don't think it's easy to mix two instrumentors for the same library. @nirga What are your thoughts on this? I remember you mentioned a potential mechanism that would convert log events to span attributes; is there a prototype? Feel free to close this PR in favor of the contrib library.

I think it would be better to have Traceloop wrap/invoke the OTel Contrib instrumentation. That way, more (joint) effort can be put into improving the (shared) instrumentation for the Google Gen AI library rather than into competing efforts. That work could include adding options to support multiple different kinds of output formats.

I completely agree, @michaelsafyan! The main blocker right now is aligning on the attribute format, since we can't use the current events format that's implemented. Once we finalize that, we can move forward with switching to the contrib implementation.
Amazing work @dinmukhamedm 🥳
One last thing that is missing is using this in the SDK and a sample app. Mind adding them?
(Also replied separately to @michaelsafyan's comment)
Sure will do
Fixes #2675

feat(instrumentation): ... or fix(instrumentation): ...

Important

Introduces Google GenAI instrumentation with tracing capabilities, configuration, utilities, and comprehensive tests, along with project setup and version update.

- Adds `GoogleGenAiInstrumentor` in `__init__.py` for tracing Google GenAI API calls, including `generate_content` and `generate_content_stream`.
- Adds a `Config` class in `config.py` for exception logging, and `utils.py` for handling span attributes and exceptions.
- Adds `test_generate_content.py` for various scenarios, including async and streaming, with recorded responses in `tests/cassettes/`.
- Adds `pyproject.toml`, `poetry.toml`, and `project.json` for project configuration, and `.flake8` for linting configuration.
- Updates `version.py` to `0.39.2`.

This description was created by Ellipsis for f9c0b0e. You can customize this summary. It will automatically update as commits are pushed.
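For completeness, enabling the new instrumentation should look roughly like any other OpenLLMetry instrumentor. This is a minimal sketch, assuming the standard `BaseInstrumentor` interface and an already-configured tracer provider:

```python
from opentelemetry.instrumentation.google_genai import GoogleGenAiInstrumentor

# Instrument once at startup; subsequent google-genai calls
# (e.g. client.models.generate_content(...)) are then traced.
GoogleGenAiInstrumentor().instrument()
```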