
configurable actor, environment, data loader #30


Merged: 37 commits into main, May 29, 2025

Conversation

@rizar (Collaborator) commented on May 28, 2025

Includes #23.

@rizar requested a review from @ollmer on May 28, 2025 at 15:48
@rizar changed the base branch from configurable_rollouts to main on May 29, 2025 at 18:42
@rizar changed the title from "configurable environment endpoint" to "configurable actor, environment, data loader" on May 29, 2025
@rizar requested a review from @AlexPiche on May 29, 2025 at 19:32
@AlexPiche (Collaborator) left a comment

I am very happy with the refactoring! It makes PipelineRL much more approachable. I left some minor comments. Also, counting/ and math/ could live in an examples/ folder instead of directly in pipelinerl/.

@@ -405,8 +402,10 @@ def run_preprocessing_loop(
        stats = {
            "preprocessor/published_samples": published_samples,
            "preprocessor/published_model_version": max_model_version,
            "preprocessor/samples_in_input_queue": raw_chunk_queue.qsize() * cfg.preprocess.chunk_size,
            "preprocessor/samples_in_output_queue": samples_in_queue,
            "processossor/queue/raw_samples": raw_chunk_queue.qsize() * cfg.preprocess.chunk_size,
@AlexPiche (Collaborator):

Typo in the stats key ("processossor").

@rizar (Collaborator, Author):

Yeah, it would make a lot of sense to have the examples someplace separate, but in practice right now it creates no pain, so let's do this when it does create pain.

    reward = 0  # TODO: implement verifier usage and reward calculation
    metrics = {
        "reward": reward,
        "success": reward > 0,
@AlexPiche (Collaborator):

Not sure we want to hard-code success as reward greater than 0.

@rizar (Collaborator, Author):

This is not finished anyway.
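One way to address the reviewer's concern would be to make the success cutoff configurable rather than baked in. A minimal sketch, assuming a `success_threshold` parameter; the name and default are illustrative, not from the PR:

```python
def compute_metrics(reward: float, success_threshold: float = 0.0) -> dict:
    """Build rollout metrics with a configurable success cutoff
    instead of hard-coding `reward > 0`."""
    return {
        "reward": reward,
        "success": reward > success_threshold,
    }
```

The threshold could then come from the Hydra config alongside the other rollout settings, so experiments with shaped or negative rewards can define success differently.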

@@ -238,6 +240,24 @@ def wait_for_inference_servers(urls: list[str]):
logger.info("All inference servers are up")


def wait_for_environments(cfg: DictConfig):
@AlexPiche (Collaborator):

Can we re-use wait_for_inference_servers?

@rizar (Collaborator, Author):

True, we could reuse some code here... too lazy to fix this right now.
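The reuse the reviewer suggests could look like a single generic polling helper shared by both wait functions. A sketch under assumptions: the `/health` endpoint, function name, and timeouts are hypothetical, not taken from the PR:

```python
import time
import urllib.request


def wait_for_servers(urls: list[str], timeout: float = 300.0, poll_interval: float = 5.0) -> None:
    """Poll each server's /health endpoint until all respond with 200,
    raising TimeoutError if any is still unreachable after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    pending = set(urls)
    while pending:
        for url in list(pending):
            try:
                with urllib.request.urlopen(f"{url}/health", timeout=2) as resp:
                    if resp.status == 200:
                        pending.discard(url)
            except OSError:
                pass  # server not up yet; keep polling
        if pending:
            if time.monotonic() > deadline:
                raise TimeoutError(f"servers not ready: {sorted(pending)}")
            time.sleep(poll_interval)
```

Both `wait_for_inference_servers` and `wait_for_environments` could then be thin wrappers that build their URL lists (from the server list or from `cfg`) and call this helper.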

@rizar merged commit 3bf08b6 into main on May 29, 2025