configurable actor, environment, data loader #30
I am very happy with the refactoring! It makes PipelineRL much more approachable. I left some minor comments. Also, counting/ and math/ could live in an examples folder instead of directly in pipelinerl/.
pipelinerl/run_preprocess.py
@@ -405,8 +402,10 @@ def run_preprocessing_loop(
stats = {
    "preprocessor/published_samples": published_samples,
    "preprocessor/published_model_version": max_model_version,
    "preprocessor/samples_in_input_queue": raw_chunk_queue.qsize() * cfg.preprocess.chunk_size,
    "preprocessor/samples_in_output_queue": samples_in_queue,
    "processossor/queue/raw_samples": raw_chunk_queue.qsize() * cfg.preprocess.chunk_size,
Typo in the stats key: "processossor" should be "preprocessor".
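One way to make this kind of typo less likely is to build every key from a single prefix constant instead of spelling the namespace out on each line. This is a minimal sketch, assuming the stats are a flat dict of string keys as in the hunk; STATS_PREFIX and make_preprocessor_stats are hypothetical names, not part of PipelineRL.

```python
from queue import Queue

# Hypothetical: one source of truth for the stats namespace, so a
# misspelling like "processossor" cannot creep into a single key.
STATS_PREFIX = "preprocessor"


def make_preprocessor_stats(
    published_samples: int,
    max_model_version: int,
    raw_chunk_queue: Queue,
    chunk_size: int,
    samples_in_queue: int,
) -> dict:
    """Build the preprocessor stats dict with consistently prefixed keys."""
    # qsize() counts chunks, so multiply by chunk_size to estimate samples
    # (assumes every queued chunk is full; qsize() is itself approximate).
    raw_samples = raw_chunk_queue.qsize() * chunk_size
    stats = {
        "published_samples": published_samples,
        "published_model_version": max_model_version,
        "queue/raw_samples": raw_samples,
        "samples_in_output_queue": samples_in_queue,
    }
    # Prefix every key in one place.
    return {f"{STATS_PREFIX}/{key}": value for key, value in stats.items()}
```

With this shape, renaming the namespace or adding a key touches only one spot, and a misspelled prefix would be wrong everywhere at once, which is much easier to notice in a dashboard.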
Yeah, it would make a lot of sense to have the examples somewhere separate, but in practice it causes no pain right now, so let's do it when it does.
reward = 0  # TODO: implement verifier usage and reward calculation
metrics = {
    "reward": reward,
    "success": reward > 0,
Not sure we want to hard-code success as reward > 0.
This is not finished anyway.
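If the hard-coded criterion stays once the verifier is wired in, one option is to make the threshold a parameter. This is a minimal sketch, assuming the rollout code can receive such a value from its config; compute_metrics and success_threshold are hypothetical names for illustration.

```python
from __future__ import annotations


def compute_metrics(reward: float, success_threshold: float = 0.0) -> dict[str, float | bool]:
    """Return rollout metrics; success means reward strictly above a threshold."""
    # Hypothetical: the default of 0.0 reproduces the current `reward > 0`
    # behaviour, while tasks with other reward scales can override it.
    return {
        "reward": reward,
        "success": reward > success_threshold,
    }
```

Keeping the default equal to the current behaviour means existing tasks would be unaffected; only tasks that set the threshold explicitly would change.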
@@ -238,6 +240,24 @@ def wait_for_inference_servers(urls: list[str]):
    logger.info("All inference servers are up")


def wait_for_environments(cfg: DictConfig):
Can we reuse wait_for_inference_servers?
True, we could reuse some code here... too lazy to fix it right now.
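The reuse suggested for wait_for_inference_servers could take the form of a generic polling helper that both functions delegate to. This is a minimal sketch, assuming (the diff does not confirm it) that the existing function polls each URL over HTTP until it answers. The names wait_for_servers and http_probe and the poll_interval/timeout parameters are hypothetical. The probe is injectable so the loop can be tested without a network.

```python
from __future__ import annotations

import logging
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def http_probe(url: str) -> bool:
    """Return True if something at `url` answers an HTTP request."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        # The server responded (just not with 2xx), so treat it as up.
        return True
    except OSError:
        # Connection refused, DNS failure, timeout, etc.: not up yet.
        return False


def wait_for_servers(
    urls: Iterable[str],
    name: str,
    probe: Callable[[str], bool] = http_probe,
    poll_interval: float = 1.0,
    timeout: float | None = None,
) -> None:
    """Block until every URL passes `probe`; raise TimeoutError after `timeout` seconds."""
    pending = list(urls)
    deadline = None if timeout is None else time.monotonic() + timeout
    while pending:
        # Re-probe only the servers that have not come up yet.
        pending = [url for url in pending if not probe(url)]
        if not pending:
            break
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"{name} not up: {pending}")
        logger.info("Waiting for %d %s: %s", len(pending), name, pending)
        time.sleep(poll_interval)
    logger.info("All %s are up", name)


# Hypothetical: the existing function could then become a thin wrapper.
def wait_for_inference_servers(urls: list[str]) -> None:
    wait_for_servers(urls, "inference servers")
```

wait_for_environments would call the same helper with its own URL list; how that list is derived from cfg is not shown in the diff, so it is left out here.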
includes #23