
configurable rollouts actor #23


Merged

merged 22 commits into main on May 29, 2025

Conversation

@ollmer (Collaborator) commented May 14, 2025

No description provided.

@ollmer changed the base branch from oleh_exps to main on May 15, 2025, 09:15
"output_tokens": sum([llm_call.output_length_tokens for llm_call in llm_calls]),
"overflow": 0, # TODO: should we treat max_loops stop as overflow?
}
return samples, metrics

Could we also return the resulting tape? That could be useful if one has a different implementation of the "has_error" metric. For instance, I am using this definition of a tape error, which catches more cases:

def tape_contains_an_error(tape: WebTape) -> bool:
    """
    Returns true if the tape ends with an error, i.e. if one of the following is true:
    - the last step is an LLMOutputParsingFailureAction
    - the tape metadata has an error
    - the last step is a PageObservation with an error
    """
    last_step = tape.steps[-1]
    return (
        isinstance(last_step, LLMOutputParsingFailureAction)
        or tape.metadata.result.get("error") is not None
        or (isinstance(last_step, PageObservation) and last_step.error)
    )

It could also be useful to compute additional stats on the tape without modifying this function for each statistic we would like to add.
For instance, I am also computing n_llm_calls, n_error_steps, n_page_observations, and n_steps for the tape.
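
As an illustration, with the tape returned alongside the metrics, such counters could live in a small helper outside the rollout function. This is a rough sketch only: it reuses the step types from the snippet above, and the assumption that LLM calls are stored under step.metadata.other["llm_call"] comes from the extraction loop shown later in this thread.

def extra_tape_stats(tape: WebTape) -> dict[str, int]:
    """Per-tape counters computed outside the rollout function (illustrative sketch)."""
    return {
        "n_steps": len(tape.steps),
        # steps that recorded an LLM call in their metadata (assumed storage convention)
        "n_llm_calls": sum(1 for s in tape.steps if s.metadata.other.get("llm_call")),
        # here an "error step" is approximated as a parsing failure; adjust to taste
        "n_error_steps": sum(1 for s in tape.steps if isinstance(s, LLMOutputParsingFailureAction)),
        "n_page_observations": sum(1 for s in tape.steps if isinstance(s, PageObservation)),
    }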

@ollmer (Collaborator, Author) replied:

Good point. Also, this function signature is a contract that should be universal enough and not bound to TapeAgents. So I would propose returning an object of a new class RolloutResult with the fields samples, metrics, and rollout_artifacts: dict. This way we can put the tape in result.rollout_artifacts["tape"].
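
A minimal sketch of what that container could look like; the class and field names follow the proposal above, while the concrete types and defaults are assumptions:

from dataclasses import dataclass, field
from typing import Any


@dataclass
class RolloutResult:
    """Rollout output container not tied to TapeAgents internals (illustrative sketch)."""

    samples: list[dict[str, Any]]  # training samples, as returned today
    metrics: dict[str, Any]  # e.g. success, output_tokens, overflow
    rollout_artifacts: dict[str, Any] = field(default_factory=dict)  # framework-specific extras

Framework-specific objects then travel alongside the generic fields:

result = RolloutResult(samples=samples, metrics=metrics, rollout_artifacts={"tape": tape})
tape = result.rollout_artifacts["tape"]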

assert tape is not None, "No tape generated"
has_errors = any([1 for s in tape.steps if s.llm_dict().get("error")])
has_answer = any([isinstance(s, StopStep) for s in tape.steps])
_, llm_calls = agent.reuse(tape)


is this better than just running this?

    llm_calls = []
    for step in tape.steps:
        if "llm_call" not in step.metadata.other or step.metadata.other["llm_call"] is None:
            continue
        llm_call = step.metadata.other["llm_call"]
        if isinstance(llm_call, dict):
            llm_call = LLMCall(**llm_call)
        llm_calls.append(llm_call)

@ollmer (Collaborator, Author) replied:

I just copied our approach to generating training data from TapeAgents' agent.make_training_data(): https://github.com/ServiceNow/TapeAgents/blob/main/tapeagents/agent.py#L862. We can discuss whether we really need it. The double-check that the tape was produced by the same agent is not necessary here, imo.

@rizar rizar merged commit 21d3072 into main May 29, 2025