Commit 41e0f08

Add a script to gather runner info when uploading benchmark results (#6425)
Implement the logic to gather runner info for GPU. I adopted this logic from https://github.com/pytorch/pytorch-integration-testing/blob/master/vllm-benchmarks/upload_benchmark_results.py#L102. This also cleans up the v2 logic, which is not used anymore.

cc @yangw-dev Please let me know if you have a better approach in mind from the utilization monitoring project. Essentially, I want to get the device name, i.e. CUDA or ROCm, and the device type, i.e. H100 or MI300X, so that they can be displayed on the dashboard. Before this change, these fields were set by the caller; now they can be set automatically by the GHA.
1 parent 46df44c commit 41e0f08
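The device detection described in the commit message leans on `torch.version.hip` (set on ROCm builds) versus `torch.version.cuda` (set on CUDA builds). A minimal standalone sketch of that check, written here for illustration; it degrades to empty strings when torch is unavailable or no GPU is present:

```python
from typing import Tuple


def detect_device() -> Tuple[str, str]:
    """Return (device_name, device_type), e.g. ("cuda", "NVIDIA H100")."""
    device_name, device_type = "", ""
    try:
        import torch  # optional dependency; absent on plain CPU runners

        if torch.cuda.is_available():
            # ROCm builds of torch set torch.version.hip, CUDA builds set
            # torch.version.cuda, so checking hip first tells the two apart.
            if torch.version.hip:
                device_name = "rocm"
            elif torch.version.cuda:
                device_name = "cuda"
            device_type = torch.cuda.get_device_name()
    except ImportError:
        pass  # no torch: report no accelerator
    return device_name, device_type
```

On a runner without torch or without a GPU, this simply returns `("", "")`, which mirrors how the committed script leaves the GPU fields unset in that case.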

File tree

6 files changed: +81 −12 lines changed
.github/actions/upload-benchmark-results/action.yml

Lines changed: 2 additions & 3 deletions

@@ -19,7 +19,7 @@ runs:
       shell: bash
       run: |
         set -eux
-        python3 -mpip install boto3==1.35.33
+        python3 -mpip install boto3==1.35.33 psutil==7.0.0 pynvml==12.0.0

   - name: Check that GITHUB_TOKEN is defined
     if: ${{ inputs.schema-version != 'v2' }}
@@ -72,8 +72,7 @@ runs:
     run: |
       set -eux
-      # TODO (huydhn): Implement this part
-      echo "runners=[]" >> "${GITHUB_OUTPUT}"
+      python3 "${GITHUB_ACTION_PATH}/../../scripts/benchmarks/gather_runners_info.py"

   - name: Gather the dependencies information
     id: gather-dependencies

.github/scripts/benchmark-results-dir-for-testing/v2/android-artifacts-31017223108.json

Lines changed: 0 additions & 1 deletion
This file was deleted.

.github/scripts/benchmark-results-dir-for-testing/v2/android-artifacts-31017223431.json

Lines changed: 0 additions & 1 deletion
This file was deleted.
Lines changed: 1 addition & 0 deletions

+{"benchmark": {"name": "ExecuTorch", "mode": "inference", "extra_info": {"app_type": "IOS_APP", "benchmark_config": "{\"model\": \"edsr\", \"config\": \"xnnpack_q8\", \"device_name\": \"apple_iphone_15\", \"device_arn\": \"arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/3b5acd2e-92e2-4778-b651-7726bafe129d\"}"}}, "model": {"name": "edsr", "type": "OSS model", "backend": "xnnpack_q8"}, "metric": {"name": "peak_inference_mem_usage(mb)", "benchmark_values": [333.2014794921875], "target_value": 0, "extra_info": {"method": "forward"}}, "runners": [{"name": "Apple iPhone 15", "type": "iOS 18.0", "avail_mem_in_gb": 0, "total_mem_in_gb": 0}]}
.github/scripts/benchmarks/gather_runners_info.py

Lines changed: 78 additions & 0 deletions

#!/usr/bin/env python3
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

import json
import logging
import os
import platform
import socket
from logging import info
from typing import Any, Dict

import psutil


logging.basicConfig(level=logging.INFO)


def set_output(name: str, val: Any) -> None:
    if os.getenv("GITHUB_OUTPUT"):
        with open(str(os.getenv("GITHUB_OUTPUT")), "a") as env:
            print(f"{name}={val}", file=env)
    else:
        print(f"::set-output name={name}::{val}")


def get_runner_info() -> Dict[str, Any]:
    device_name = ""
    device_type = ""

    try:
        import torch

        if torch.cuda.is_available():
            # TODO (huydhn): only support CUDA and ROCm for now
            if torch.version.hip:
                device_name = "rocm"
            elif torch.version.cuda:
                device_name = "cuda"

            device_type = torch.cuda.get_device_name()

    except ImportError:
        info("Fail to import torch to get the device name")

    runner_info = {
        "cpu_info": platform.processor(),
        "cpu_count": psutil.cpu_count(),
        "avail_mem_in_gb": int(psutil.virtual_memory().total / (1024 * 1024 * 1024)),
        "extra_info": {
            "hostname": socket.gethostname(),
        },
    }

    # TODO (huydhn): only support CUDA and ROCm for now
    if device_name and device_type:
        runner_info["name"] = device_name
        runner_info["type"] = device_type
        runner_info["gpu_count"] = torch.cuda.device_count()
        runner_info["avail_gpu_mem_in_gb"] = int(
            torch.cuda.get_device_properties(0).total_memory
            * torch.cuda.device_count()
            / (1024 * 1024 * 1024)
        )

    return runner_info


def main() -> None:
    runner_info = get_runner_info()
    set_output("runners", json.dumps([runner_info]))


if __name__ == "__main__":
    main()
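The script's `set_output` helper appends `name=value` pairs to whatever file `GITHUB_OUTPUT` points at, which is how the action step above picks up the `runners` output. A standalone sketch of that behavior (a re-implementation for illustration, using a temp file in place of the file GitHub Actions provides):

```python
import json
import os
import tempfile


def set_output(name, val):
    # Mirrors the helper above: append to the $GITHUB_OUTPUT file when set,
    # otherwise fall back to the deprecated ::set-output workflow command.
    if os.getenv("GITHUB_OUTPUT"):
        with open(str(os.getenv("GITHUB_OUTPUT")), "a") as env:
            print(f"{name}={val}", file=env)
    else:
        print(f"::set-output name={name}::{val}")


# Simulate a GitHub Actions run: a temp file stands in for GITHUB_OUTPUT.
tmp = tempfile.NamedTemporaryFile(mode="w", delete=False)
tmp.close()
os.environ["GITHUB_OUTPUT"] = tmp.name

# A trimmed-down, hypothetical runner record for demonstration only.
set_output("runners", json.dumps([{"cpu_count": 8}]))

with open(tmp.name) as f:
    line = f.read().strip()
os.unlink(tmp.name)
# line == 'runners=[{"cpu_count": 8}]'
```

Downstream steps then read `steps.<id>.outputs.runners` and `json.loads` it back into a list, which is why the script serializes the single runner record inside a list.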

.github/workflows/test_upload_benchmark_results.yml

Lines changed: 0 additions & 7 deletions
@@ -13,13 +13,6 @@ jobs:
     steps:
       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

-      - name: Test upload the benchmark results (v2)
-        uses: ./.github/actions/upload-benchmark-results
-        with:
-          benchmark-results-dir: .github/scripts/benchmark-results-dir-for-testing/v2
-          schema-version: v2
-          dry-run: true
-
       - name: Test upload the benchmark results (v3)
         uses: ./.github/actions/upload-benchmark-results
         with:
0 commit comments