[Issue]: Product-name information missing for MI300 GPUs in AKS #112
Comments
Looking further at the code:
"product-name": func(gpus map[string]map[string]int) map[string]string {
	counts := map[string]int{}
	replacer := strings.NewReplacer(" ", "_", "(", "", ")", "")
	for _, v := range gpus {
		prodnamePath := fmt.Sprintf("/sys/class/drm/card%d/device/product_name", v["card"])
		b, err := ioutil.ReadFile(prodnamePath)
		if err != nil {
			log.Error(err, prodnamePath)
			continue
		}
		prodName := replacer.Replace(strings.TrimSpace(string(b)))
		if prodName == "" {
			continue
		}
		counts[prodName]++
	}
	return createLabels("product-name", counts)
},
I see that it looks for the product_name file at a specific path.
However, it is present on MI200.
Is there some other path on MI300 where the file ...
Problem Description
We are using AMD MI300X GPUs in AKS. We use the node-labeller DaemonSet (amdgpu-labeller) to discover the product-name; however, on this SKU the product-name label is missing.
Args passed to the container.
SKU of the node
Standard_ND96isr_MI300X_v5
Label added by AMD on the node.
This is from a different SKU in Azure:
Standard_nd96asr_mi200_v4
When I use
rocm-smi --showproductname
it works on both GPUs (MI200 and MI300X).
Output on MI300:
Output on MI200:
The expectation is that product-name should be available on all GPUs, both MI200 and MI300.
Operating System
NAME="Ubuntu" VERSION="22.04.5 LTS (Jammy Jellyfish)"
CPU
model name : Intel(R) Xeon(R) Platinum 8480C
GPU
Name: Intel(R) Xeon(R) Platinum 8480C
Marketing Name: Intel(R) Xeon(R) Platinum 8480C
Name: Intel(R) Xeon(R) Platinum 8480C
Marketing Name: Intel(R) Xeon(R) Platinum 8480C
Name: gfx942
Marketing Name: AMD Instinct MI300X VF
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Instinct MI300X VF
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Instinct MI300X VF
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Instinct MI300X VF
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Instinct MI300X VF
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Instinct MI300X VF
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Instinct MI300X VF
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
Name: gfx942
Marketing Name: AMD Instinct MI300X VF
Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
ROCm Version
6.2.60204-1
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response