-
Notifications
You must be signed in to change notification settings - Fork 9
configurable rollouts actor #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
"output_tokens": sum([llm_call.output_length_tokens for llm_call in llm_calls]), | ||
"overflow": 0, # TODO: should we treat max_loops stop as overflow? | ||
} | ||
return samples, metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also return the resulting tape
? It could be useful if one has a different implementation for the metric "has_error". For instance, I am using this definition of tape error, which catches more cases:
def tape_contains_an_error(tape: WebTape) -> bool:
"""
Returns true if the tape ends with an error, ie if one of the following is true:
- the last step is an LLMOutputParsingFailureAction
- the tape metadata has an error
- the last step is a PageObservation with an error
"""
return isinstance(tape.steps[-1], LLMOutputParsingFailureAction) or \
tape.metadata.result.get("error") is not None or \
(isinstance(tape.steps[-1], PageObservation) and tape.steps[-1].error)
I could also be useful to compute additional stats on the tape without modifying this function for each statistic we would like to add.
For instance I am also computing the n_llm_calls
, n_error_steps
, n_page_observations
, and n_steps
in the tape.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. Also this function signature is a contract that should be universal enough and not bound to the tapeagents. So I would propose to return the object of a new class RolloutResult
with the fields samples
, metrics
and rollout_artifacts: dict
. This way we can put tape in the result.rollout_artifacts["tape"]
assert tape is not None, "No tape generated" | ||
has_errors = any([1 for s in tape.steps if s.llm_dict().get("error")]) | ||
has_answer = any([isinstance(s, StopStep) for s in tape.steps]) | ||
_, llm_calls = agent.reuse(tape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this better than just running this?
llm_calls = []
for i, step in enumerate(tape.steps):
if "llm_call" not in step.metadata.other or step.metadata.other["llm_call"] is None:
continue
llm_call = step.metadata.other["llm_call"]
if isinstance(llm_call, dict):
llm_call = LLMCall(**llm_call)
llm_calls.append(llm_call)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied our approach to generating training data from the tapeagent's agent.make_training_data()
https://github.com/ServiceNow/TapeAgents/blob/main/tapeagents/agent.py#L862. We can discuss if we really need it. This double-check that the tape was produced by the same agent is not necessary here imo.
No description provided.