Frontier Benchmarking (#453) #881
base: master
.github/workflows/bench.yml
Outdated
frontier:
  name: Oak Ridge | Frontier (AMD ROCm)
  if: github.repository == 'MFlowCode/MFC' && needs.file-changes.outputs.checkall == 'true'
This almost entirely duplicates the code from Phoenix. Can you use a CI matrix instead to avoid the extra lines? Also, there is no need to measure performance for CPU cases on Frontier; that will just add to the number of queued cases and slow things down. I only have one Frontier runner because that is the maximum number of debug jobs one can submit.
Yeah sure, I will apply those changes shortly.
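For readers following the thread, the matrix the reviewer asks for might look roughly like the sketch below. It is not the change that was merged. The keys `cluster`, `name`, `device`, and `build_script` and the `if:` condition come from the diff snippets quoted in this conversation; the job id `bench`, the Phoenix display names, the runner labels, the `needs:` line, and the step bodies are assumptions made for illustration.

```yaml
# Sketch only. Assumed, not taken from the PR: job id, Phoenix display names,
# runner labels, the needs: line, and the placeholder steps.
jobs:
  bench:
    name: ${{ matrix.name }}
    # Condition copied from the quoted diff; it implies a file-changes job exists.
    needs: file-changes
    if: github.repository == 'MFlowCode/MFC' && needs.file-changes.outputs.checkall == 'true'
    strategy:
      fail-fast: false
      matrix:
        include:
          - cluster: phoenix
            name: Phoenix (CPU)        # hypothetical display name
            device: cpu
            build_script: ""
          - cluster: phoenix
            name: Phoenix (GPU)        # hypothetical display name
            device: gpu
            build_script: ""
          - cluster: frontier
            name: Oak Ridge | Frontier (CCE)
            device: gpu                # GPU only on Frontier, per the review
            build_script: ""           # left empty here; the real value is not shown
    runs-on:
      - self-hosted
      - ${{ matrix.cluster }}          # assumes runners are labeled by cluster name
    steps:
      - uses: actions/checkout@v4
      - name: Bench
        # Placeholder command; the actual benchmark invocation is not shown in this thread.
        run: echo "bench ${{ matrix.device }} on ${{ matrix.cluster }}"
```

Listing entries under `include` rather than crossing `cluster` with `device` keeps the CPU case off Frontier, so the single Frontier runner only ever picks up one job per workflow run.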
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #881   +/-   ##
=======================================
  Coverage   45.76%   45.76%
=======================================
  Files          68       68
  Lines       18668    18668
  Branches     2251     2251
=======================================
  Hits         8543     8543
  Misses       8767     8767
  Partials     1358     1358
.github/workflows/bench.yml
Outdated
  device: gpu
  build_script: ""
- cluster: frontier
  name: Oak Ridge | Frontier (AMD CCE)
Just "CCE" here, not "AMD CCE".
Reduced the job duration to 3 hours to see whether it would yield the same error regardless of duration.
Description
Added two benchmarking cases that run as SLURM jobs on Frontier, duplicating the existing Phoenix implementation (#453).