Commit 970fceb

Device Plugin Resource Naming Strategy
The resource_naming_strategy is a new flag which can be passed to the device plugin daemonset. The supported values for the flag are "single" and "mixed".

Terms to understand before viewing the changes in this commit:
Homogeneous Node: If all GPUs in a node follow the same compute and memory partition style, the node is considered homogeneous.
Heterogeneous Node: If the GPUs on a node have different compute and memory partition styles, the node is considered heterogeneous (put simply, the node is not homogeneous).

Behaviour of Resource Naming Strategy in different node types:

Homogeneous Node:
-> If the node is homogeneous and the resource naming strategy is "single", one plugin is started using the DevicePluginManager with the last name "gpu". If the node is homogeneous and the resource naming strategy is "mixed", one plugin is started using the DevicePluginManager with the partition style present on the node as the last name.
-> The ListAndWatch function remains almost the same as before. It reports resources under a single resource name: either "gpu" or the partition style present on the node (for example cpx_nps1), depending on the strategy.

Heterogeneous Node:
-> If the node is heterogeneous and the resource naming strategy is "mixed", we invoke the DevicePluginManager to start multiple plugins for the different partitionTypes under the names "spx_nps1", "cpx_nps1", etc. We use the deviceCountMap to start plugins only for the partitionTypes that are present in the map.
-> ListAndWatch sends each device to the plugin for its resource type, depending on its partitionType. Each device has computePartition and memoryPartition fields in its object, which are used to identify which plugin to report the resource under (amd.com/spx_nps1, amd.com/cpx_nps1, etc.).
Note:
-> If the node is heterogeneous, the "single" strategy is not supported, because reporting multiple resource types under a single resource name would not accurately reflect how many GPUs of each type are present.
-> For nodes where partitioning is not supported (MI200), the resources are reported under "amd.com/gpu" irrespective of the strategy.
-> If the flag is not set by the user, the default value is "single". This maintains backwards compatibility with the resource name used before the strategy was introduced (amd.com/gpu).
1 parent 36f2deb commit 970fceb
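
The strategy rules in the commit message can be summarized as a small decision function. The Go sketch below is illustrative only: `resourceNames` and its `partitionCounts` argument are hypothetical names, not code from this commit. It assumes the node has at least one GPU and that an empty map stands for GPUs without partition support.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// resourceNames is a hypothetical helper (not part of the commit). It maps a
// strategy and the partition styles found on a node to the resource last
// names that would be advertised under the amd.com/ namespace.
// partitionCounts maps a partition style such as "cpx_nps1" to a device count.
func resourceNames(strategy string, partitionCounts map[string]int) ([]string, error) {
	if strategy != "single" && strategy != "mixed" {
		return nil, fmt.Errorf("invalid resource naming strategy: %s", strategy)
	}

	// Collect the partition styles that actually have devices.
	var styles []string
	for style, n := range partitionCounts {
		if n > 0 {
			styles = append(styles, style)
		}
	}
	sort.Strings(styles) // deterministic order for display and tests

	switch {
	case len(styles) == 0:
		// Assumption: no partition styles means partitioning is unsupported.
		return []string{"gpu"}, nil
	case strategy == "single" && len(styles) > 1:
		// Heterogeneous node: "single" cannot represent several styles.
		return nil, errors.New("heterogeneous node requires the mixed strategy")
	case strategy == "single":
		return []string{"gpu"}, nil
	default:
		return styles, nil
	}
}

func main() {
	hetero := map[string]int{"spx_nps1": 5, "cpx_nps1": 24}
	fmt.Println(resourceNames("mixed", hetero))
	fmt.Println(resourceNames("single", hetero))
}
```

In this sketch the heterogeneous case under "single" is the only combination that returns an error, which mirrors the note above.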

3 files changed, +121 -11 lines changed

cmd/k8s-device-plugin/main.go

Lines changed: 62 additions & 6 deletions
@@ -32,25 +32,68 @@ import (

 var gitDescribe string

-func getResourceList() []string {
+type ResourceNamingStrategy string
+
+const (
+	StrategySingle ResourceNamingStrategy = "single"
+	StrategyMixed ResourceNamingStrategy = "mixed"
+)
+
+func ParseStrategy(s string) (ResourceNamingStrategy, error) {
+	switch s {
+	case string(StrategySingle):
+		return StrategySingle, nil
+	case string(StrategyMixed):
+		return StrategyMixed, nil
+	default:
+		return "", fmt.Errorf("invalid resource naming strategy: %s", s)
+	}
+}
+
+func getResourceList(resourceNamingStrategy ResourceNamingStrategy) ([]string, error) {
 	var resources []string

 	// Check if the node is homogeneous
 	isHomogeneous := amdgpu.IsHomogeneous()
+	devices, deviceCountMap := amdgpu.GetAMDGPUs()
+	if len(devices) == 0 {
+		return resources, nil
+	}
 	if isHomogeneous {
-		// Homogeneous node will report only "gpu" resource
-		resources = []string{"gpu"}
+		// Homogeneous node will report only "gpu" resource if strategy is single. If strategy is mixed, it will report resources under the partition type name
+		if resourceNamingStrategy == StrategySingle {
+			resources = []string{"gpu"}
+		} else if resourceNamingStrategy == StrategyMixed {
+			if len(deviceCountMap) == 0 {
+				// If partitioning is not supported on the node, we should report resources under "gpu" regardless of the strategy
+				resources = []string{"gpu"}
+			} else {
+				for partitionType, count := range deviceCountMap {
+					if count > 0 {
+						resources = append(resources, partitionType)
+					}
+				}
+			}
+		}
 	} else {
 		// Heterogeneous node reports resources based on partition types
 		gpus := amdgpu.GetAMDGPUs()
 		deviceCountMap := amdgpu.GetAMDDeviceCountMap(gpus)
 		for partitionType, count := range deviceCountMap {
 			if count > 0 {
 				resources = append(resources, partitionType)
+		// Heterogeneous node reports resources based on partition types if strategy is mixed. Heterogeneous is not allowed if Strategy is single
+		if resourceNamingStrategy == StrategySingle {
+			return resources, fmt.Errorf("Partitions of different styles across GPUs in a node is not supported with single strategy. Please start device plugin with mixed strategy")
+		} else if resourceNamingStrategy == StrategyMixed {
+			for partitionType, count := range deviceCountMap {
+				if count > 0 {
+					resources = append(resources, partitionType)
+				}
 			}
 		}
 	}
-	return resources
+	return resources, nil
 }

 func main() {
@@ -68,9 +111,16 @@ func main() {
 		flag.PrintDefaults()
 	}
 	var pulse int
+	var resourceNamingStrategy string
 	flag.IntVar(&pulse, "pulse", 0, "time between health check polling in seconds. Set to 0 to disable.")
+	flag.StringVar(&resourceNamingStrategy, "resource_naming_strategy", "single", "Resource strategy to be used: single or mixed")
 	// this is also needed to enable glog usage in dpm
 	flag.Parse()
+	strategy, err := ParseStrategy(resourceNamingStrategy)
+	if err != nil {
+		glog.Errorf("%v", err)
+		os.Exit(1)
+	}

 	for _, v := range versions {
 		glog.Infof("%s", v)
@@ -96,8 +146,14 @@ func main() {
 		// /sys/class/kfd only exists if ROCm kernel/driver is installed
 		var path = "/sys/class/kfd"
 		if _, err := os.Stat(path); err == nil {
-			resources := getResourceList()
-			l.ResUpdateChan <- resources
+			resources, err := getResourceList(strategy)
+			if err != nil {
+				glog.Errorf("Error occured: %v", err)
+				os.Exit(1)
+			}
+			if len(resources) > 0 {
+				l.ResUpdateChan <- resources
+			}
 		}
 	}()
 	manager.Run()
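
The flag handling added to `main()` can be exercised in isolation. The following is a minimal sketch, assuming a standalone `flag.FlagSet` from the Go standard library instead of the global flag set the plugin uses; `parseStrategyFlag` is a hypothetical helper name, while the flag name, default, and error text are taken from the diff above.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseStrategyFlag is a hypothetical helper (not part of the commit). It
// parses the resource_naming_strategy flag from args and validates the value
// the same way ParseStrategy does in the diff.
func parseStrategyFlag(args []string) (string, error) {
	fs := flag.NewFlagSet("k8s-device-plugin", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	strategy := fs.String("resource_naming_strategy", "single",
		"Resource strategy to be used: single or mixed")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *strategy {
	case "single", "mixed":
		return *strategy, nil
	default:
		return "", fmt.Errorf("invalid resource naming strategy: %s", *strategy)
	}
}

func main() {
	// No flag given: falls back to the backwards-compatible default.
	fmt.Println(parseStrategyFlag(nil))
	// Explicit mixed strategy.
	fmt.Println(parseStrategyFlag([]string{"-resource_naming_strategy=mixed"}))
	// Any other value is rejected before the plugin starts.
	fmt.Println(parseStrategyFlag([]string{"-resource_naming_strategy=double"}))
}
```

Validating the value right after parsing means an unsupported strategy stops the process before any plugin is registered, which matches the `os.Exit(1)` path in the diff.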

docs/user-guide/configuration.md

Lines changed: 58 additions & 2 deletions
@@ -29,6 +29,7 @@ The device plugin supports the following command-line flags:
 |-----|------|-------------|
 | `--kubelet-url` | `http://localhost:10250` | The URL of the kubelet for device plugin registration |
 | `--pulse` | `0` | Time between health check polling in seconds. Set to 0 to disable. |
+| `--resource_naming_strategy` | `single` | Resource Naming strategy chosen for k8s resource reporting. |

 ## Configuration File

@@ -139,16 +140,71 @@ The node labeller can expose labels such as:

 [Download link](https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-labeller.yaml)

-## Resource Naming
+## Resource Naming Strategy

-The device plugin advertises AMD GPUs as the `amd.com/gpu` resource type. Pods can request this resource in their specifications to access AMD GPUs:
+To customize the way device plugin reports gpu resources to kubernetes as allocatable k8s resources, use the `single` or `mixed` resource naming strategy flag mentioned above (--resource_naming_strategy)
+
+Before understanding each strategy, please note the definition of homogeneous and heterogeneous nodes
+
+Homogeneous node: A node whose gpu's follow the same compute-memory partition style
+-> Example: A node of 8 GPU's where all 8 GPU's are following CPX-NPS4 partition style
+
+Heterogeneous node: A node whose gpu's follow different compute-memory partition styles
+-> Example: A node of 8 GPU's where 5 GPU's are following SPX-NPS1 and 3 GPU's are following CPX-NPS1
+
+### Single
+
+In `single` mode, the device plugin reports all gpu's (regardless of whether they are whole gpu's or partitions of a gpu) under the resource name `amd.com/gpu`
+This mode is supported for homogeneous nodes but not supported for heterogeneous nodes
+
+A node which has 8 GPUs where all GPUs are not partitioned will report its resources as:
+
+```bash
+amd.com/gpu: 8
+```
+
+A node which has 8 GPUs where all GPUs are partitioned using CPX-NPS4 style will report its resources as:
+
+```bash
+amd.com/gpu: 64
+```
+
+### Mixed
+
+In `mixed` mode, the device plugin reports all gpu's under a name which matches its partition style.
+This mode is supported for both homogeneous nodes and heterogeneous nodes
+
+A node which has 8 GPUs which are all partitioned using CPX-NPS4 style will report its resources as:
+
+```bash
+amd.com/cpx_nps4: 64
+```
+
+A node which has 8 GPUs where 5 GPU's are following SPX-NPS1 and 3 GPU's are following CPX-NPS1 will report its resources as:
+
+```bash
+amd.com/spx_nps1: 5
+amd.com/cpx_nps1: 24
+```
+
+- If `resource_naming_strategy` is not passed using the flag, then device plugin will internally default to `single` resource naming strategy. This maintains backwards compatibility with earlier release of device plugin with reported resource name of `amd.com/gpu`
+
+- If a node has GPUs which do not support partitioning, such as MI210, then the GPUs are reported under resource name `amd.com/gpu` regardless of the resource naming strategy
+
+Pods can request the resource as per the naming style in their specifications to access AMD GPUs:

 ```yaml
 resources:
   limits:
     amd.com/gpu: 1
 ```

+```yaml
+resources:
+  limits:
+    amd.com/cpx_nps4: 1
+```
+
 ## Security and Access Control

 ### Non-Privileged GPU Access
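
The resource counts in the documentation examples above follow from simple arithmetic. The Go sketch below reproduces them under one assumption that is inferred from the documented figures rather than stated in the commit: a CPX GPU contributes 8 allocatable devices (8 GPUs give 64) and an SPX GPU contributes 1. `gpuGroup`, `partitionsPerGPU`, and `reportedCounts` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// partitionsPerGPU is an assumption inferred from the documented examples
// (8 CPX GPUs -> 64 devices, 5 SPX GPUs -> 5 devices); it is not taken
// from the plugin source.
var partitionsPerGPU = map[string]int{"spx": 1, "cpx": 8}

// gpuGroup describes a set of physical GPUs sharing one partition style.
type gpuGroup struct {
	compute string // compute partition mode, e.g. "spx" or "cpx"
	memory  string // memory partition mode, e.g. "nps1" or "nps4"
	gpus    int    // number of physical GPUs in this group
}

// reportedCounts returns the allocatable count per resource name that the
// mixed strategy would advertise for the given groups.
func reportedCounts(groups []gpuGroup) map[string]int {
	counts := make(map[string]int)
	for _, g := range groups {
		name := "amd.com/" + g.compute + "_" + g.memory
		counts[name] += g.gpus * partitionsPerGPU[g.compute]
	}
	return counts
}

func main() {
	// The heterogeneous example from the docs: 5 SPX-NPS1 and 3 CPX-NPS1 GPUs.
	node := []gpuGroup{{"spx", "nps1", 5}, {"cpx", "nps1", 3}}
	fmt.Println(reportedCounts(node))
}
```

Under the `single` strategy on a homogeneous node the same total would be reported under `amd.com/gpu` instead of the partition-style name.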

internal/pkg/plugin/plugin.go

Lines changed: 1 addition & 3 deletions
@@ -307,9 +307,7 @@ loop:
 			// update with per device GPU health status
 			if isHomogeneous {
 				exporter.PopulatePerGPUDHealth(devs, health)
-				if p.Resource == "gpu" {
-					s.Send(&pluginapi.ListAndWatchResponse{Devices: devs})
-				}
+				s.Send(&pluginapi.ListAndWatchResponse{Devices: devs})
 			} else {
 				if devList, exists := resourceTypeDevs[p.Resource]; exists {
 					exporter.PopulatePerGPUDHealth(devList, health)
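
The heterogeneous branch above looks devices up by `p.Resource` in `resourceTypeDevs`. How that map is built is not shown in this diff; the Go sketch below is one plausible way to group devices by partition style, assuming a lowercase `compute_memory` key such as `cpx_nps1`. The `device` type and `groupByResource` are hypothetical stand-ins, not the plugin's own types.

```go
package main

import (
	"fmt"
	"strings"
)

// device is a hypothetical stand-in for the plugin's device object; the
// commit message only states that each device carries computePartition and
// memoryPartition fields.
type device struct {
	ID               string
	ComputePartition string
	MemoryPartition  string
}

// groupByResource buckets device IDs by a resource last name derived from
// the two partition fields. The "compute_memory" key format is an assumption
// based on the names used in the docs (for example cpx_nps1).
func groupByResource(devs []device) map[string][]string {
	out := make(map[string][]string)
	for _, d := range devs {
		key := strings.ToLower(d.ComputePartition + "_" + d.MemoryPartition)
		out[key] = append(out[key], d.ID)
	}
	return out
}

func main() {
	devs := []device{
		{ID: "gpu0", ComputePartition: "SPX", MemoryPartition: "NPS1"},
		{ID: "gpu1-p0", ComputePartition: "CPX", MemoryPartition: "NPS1"},
		{ID: "gpu1-p1", ComputePartition: "CPX", MemoryPartition: "NPS1"},
	}
	for name, ids := range groupByResource(devs) {
		fmt.Printf("amd.com/%s: %d devices\n", name, len(ids))
	}
}
```

With a grouping like this, each plugin instance only needs its own resource name to find the devices it should send in its ListAndWatch response.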

0 commit comments
