System Info

vllm distribution with no telemetry configuration.

🐛 Describe the bug
While running an inference request, _count_tokens fails with the error in the subject.
The error is in InferenceRouter.__init__: the self.formatter field is only initialized when a self.telemetry value is given:

```python
if self.telemetry:
    self.tokenizer = Tokenizer.get_instance()
    self.formatter = ChatFormat(self.tokenizer)
```
(1) Why does this depend on the availability of self.telemetry?
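For illustration, one way to decouple the two (a sketch of a possible change, not the project's actual implementation) would be to build the tokenizer and formatter unconditionally and keep only the telemetry-specific setup behind the flag:

```python
# Hypothetical variant of the __init__ excerpt above: create the tokenizer and
# formatter unconditionally so _count_tokens always has a formatter available.
self.tokenizer = Tokenizer.get_instance()
self.formatter = ChatFormat(self.tokenizer)
if self.telemetry:
    ...  # only telemetry-specific initialization stays conditional
```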
Then the _count_tokens method uses the field without any check:

```python
if isinstance(messages, list):
    encoded = self.formatter.encode_dialog_prompt(messages, tool_prompt_format)
else:
    encoded = self.formatter.encode_content(messages)
```
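Alternatively, _count_tokens itself could guard against the missing attribute; since the traceback below shows its return type is already Optional[int], returning None when no formatter exists would be a conservative fallback. Again, this is only a sketch, not the upstream fix:

```python
# Hypothetical defensive version of _count_tokens: skip token counting and
# return None instead of raising AttributeError when telemetry is disabled
# and the formatter was therefore never created.
if getattr(self, "formatter", None) is None:
    return None
if isinstance(messages, list):
    encoded = self.formatter.encode_dialog_prompt(messages, tool_prompt_format)
else:
    encoded = self.formatter.encode_content(messages)
return len(encoded.tokens) if encoded and encoded.tokens else 0
```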
Workaround: configure the telemetry
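For reference, a minimal sketch of what "configure the telemetry" can look like in the distribution's run.yaml. The block below is an assumption based on the inline meta-reference telemetry provider; the exact provider type and config keys vary between llama-stack versions, so copy the telemetry section from your distribution's shipped template rather than this sketch:

```yaml
# Hedged sketch only: assumed provider block, verify against your
# distribution's template run.yaml before using.
providers:
  telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config: {}
```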
Error logs

The stack trace refers to an eval execution, but it is almost the same:

```
│ 193 │ │ if candidate.type == "agent":
│ 194 │ │ │ generations = await self._run_agent_generation(input_rows, benchmark_config)
│ 195 │ │ elif candidate.type == "model":
│ ❱ 196 │ │ │ generations = await self._run_model_generation(input_rows, benchmark_config)
│ 197 │ │ else:
│ 198 │ │ │ raise ValueError(f"Invalid candidate type: {candidate.type}")
│ 199 │
│
│ /usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/eval/meta_reference/eval.py:174 in
│ _run_model_generation
│
│ 171 │ │ │ │ │ messages.append(candidate.system_message)
│ 172 │ │ │ │ messages += [SystemMessage(**x) for x in chat_completion_input_json if
│ x["role"] == "system"]
│ 173 │ │ │ │ messages += input_messages
│ ❱ 174 │ │ │ │ response = await self.inference_api.chat_completion(
│ 175 │ │ │ │ │ model_id=candidate.model,
│ 176 │ │ │ │ │ messages=messages,
│ 177 │ │ │ │ │ sampling_params=candidate.sampling_params,
│
│ /usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py:102 in
│ async_wrapper
│
│ 99 │ │ │
│ 100 │ │ │ with tracing.span(f"{class_name}.{method_name}", span_attributes) as span:
│ 101 │ │ │ │ try:
│ ❱ 102 │ │ │ │ │ result = await method(self, *args, **kwargs)
│ 103 │ │ │ │ │ span.set_attribute("output", serialize_value(result))
│ 104 │ │ │ │ │ return result
│ 105 │ │ │ │ except Exception as e:
│
│ /usr/local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py:288 in chat_completion
│
│ 285 │ │ │ tool_config=tool_config,
│ 286 │ │ )
│ 287 │ │ provider = self.routing_table.get_provider_impl(model_id)
│ ❱ 288 │ │ prompt_tokens = await self._count_tokens(messages,
│ tool_config.tool_prompt_format)
│ 289 │
│ 290 │ │ if stream:
│ 291 │
│
│ /usr/local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py:221 in _count_tokens
│
│ 218 │ │ tool_prompt_format: Optional[ToolPromptFormat] = None,
│ 219 │ ) -> Optional[int]:
│ 220 │ │ if isinstance(messages, list):
│ ❱ 221 │ │ │ encoded = self.formatter.encode_dialog_prompt(messages, tool_prompt_format)
│ 222 │ │ else:
│ 223 │ │ │ encoded = self.formatter.encode_content(messages)
│ 224 │ │ return len(encoded.tokens) if encoded and encoded.tokens else 0
╰─────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'InferenceRouter' object has no attribute 'formatter'
```
Expected behavior

No errors.
In my case, looking at this line https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distribution/routers/routers.py#L133, I found out that I had to enable telemetry; once it was enabled, everything started working.