Frontier Benchmarking (#453) #881

Open
wants to merge 20 commits into
base: master
from

Conversation

Malmahrouqi3
Collaborator

Description

Added two benchmarking cases that submit SLURM jobs on Frontier, duplicating the existing Phoenix implementation. (#453)


frontier:
  name: Oak Ridge | Frontier (AMD ROCm)
  if: github.repository == 'MFlowCode/MFC' && needs.file-changes.outputs.checkall == 'true'
Member

This almost entirely duplicates the code from Phoenix... can you make a CI matrix instead to avoid the extra lines? Also, there is no need to measure performance for CPU cases on Frontier; that will just slow down the queue. I only have 1 Frontier runner because that's the max number of debug jobs one can submit.
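
For illustration, the matrix suggested above might look roughly like the sketch below. Only the `cluster`, `name`, `device`, and `build_script` keys, the `if:` condition, and the `file-changes` job name come from this PR. The job id `bench`, the `runner` key and its labels, the Phoenix entries and their display names, and the submit-script path are hypothetical placeholders, not the repository's actual layout.

```yaml
jobs:
  bench:                      # hypothetical job id
    name: ${{ matrix.name }}
    needs: file-changes
    if: github.repository == 'MFlowCode/MFC' && needs.file-changes.outputs.checkall == 'true'
    strategy:
      fail-fast: false        # one cluster failing should not cancel the other
      matrix:
        include:
          # Phoenix entries are assumed; names and runner labels are placeholders.
          - cluster: phoenix
            name: Phoenix (CPU)
            runner: phoenix-runner
            device: cpu
            build_script: ""
          - cluster: phoenix
            name: Phoenix (GPU)
            runner: phoenix-runner
            device: gpu
            build_script: ""
          # Frontier runs GPU cases only, per the review comment.
          - cluster: frontier
            name: Oak Ridge | Frontier (AMD ROCm)
            runner: frontier-runner
            device: gpu
            build_script: ""
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - name: Submit benchmark job
        # Hypothetical script path; substitute the repository's real submit script.
        run: bash .github/workflows/${{ matrix.cluster }}/submit-bench.sh ${{ matrix.device }}
```

With a layout like this, adding or dropping a cluster is a change to one `include` entry rather than a copy of the whole job, and leaving out a Frontier CPU entry keeps the single Frontier runner free for GPU cases.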

Collaborator Author

Yeah sure, I will apply those changes shortly.

codecov bot commented Jun 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.76%. Comparing base (2aad1d4) to head (bde0c17).
Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #881   +/-   ##
=======================================
  Coverage   45.76%   45.76%           
=======================================
  Files          68       68           
  Lines       18668    18668           
  Branches     2251     2251           
=======================================
  Hits         8543     8543           
  Misses       8767     8767           
  Partials     1358     1358           


    device: gpu
    build_script: ""
  - cluster: frontier
    name: Oak Ridge | Frontier (AMD CCE)
Member

just CCE

@Malmahrouqi3
Collaborator Author

Reduced the job duration to 3 hrs to see whether it would yield the same error regardless of duration.

2 participants