Dynamic Resource Allocation (DRA) recently reached GA in Kubernetes v1.35, and I believe many of us are eager to give it a try. Adding to the momentum, NVIDIA has moved dra-driver-nvidia-gpu into Kubernetes SIGs, with the documentation dropping the Beta label — a sign that the technology and its standards are gradually maturing.
For this post, I borrowed all the NVIDIA GPUs currently available at CNTUG Infra Labs to learn how to elegantly allocate devices and resources with DRA.
Friendly Plug: CNTUG Infra Labs#
CNTUG Infra Labs was founded to nurture the next generation of students and engineers in Taiwan’s software infrastructure field. The lab is hosted in Equinix’s Tokyo data center and is jointly funded by several CNTUG community members. Building the environment leverages a stack of open source projects, including OpenStack, Ceph, and Ansible.
Since infrastructure software has a steep learning curve and requires substantial compute, storage, and network resources, CNTUG Infra Labs aims to provide a cloud platform where students and community members can experiment with and host related services. We hope this will attract more students to engage with — and grow in — the infrastructure space. Spare capacity is also offered to the open source community for hosting services such as websites, Mattermost, and Jitsi Meet, or for workshop events.
Many users are already actively making open source contributions through this platform — see the use cases for details.
With that, let’s get back to introducing the lab environment.
Lab Environment#
We’ll use a Kubernetes cluster built with Cluster API + OpenStack. For brevity, the setup process is omitted here — feel free to refer to other blog posts for the details, or wait for a future post once I finish writing it up.
- OS: Ubuntu 24.04
- Kubernetes v1.35.3
- Containerd 2.2.2
- Node:
  - 1 Control Plane + etcd
  - 3 Workers:
    - 1 with no GPU
    - 1 with 2 × T10
    - 1 with 1 × A5000
- NVIDIA GPU Operator v26.3.1
- NVIDIA DRA Driver GPU v25.12.0
Running kubectl get node should return something like:
NAME STATUS ROLES AGE VERSION
capi-dralabs-control-plane-xtcth Ready control-plane 8m7s v1.35.3
capi-dralabs-md-0-p4xkh-rpfxc Ready <none> 6m55s v1.35.3
capi-dralabs-md-gpua5000-jw4mx-d64jz Ready <none> 2m37s v1.35.3
capi-dralabs-md-gput10-gzl84-f2m2d Ready <none> 6m49s v1.35.3
Installing NVIDIA GPU Operator#
Before installing the GPU Operator, label the Nodes that have GPUs. For my environment, this looks like:
kubectl label node capi-dralabs-md-gpua5000-jw4mx-d64jz nvidia.com/dra-kubelet-plugin=true
kubectl label node capi-dralabs-md-gput10-gzl84-f2m2d nvidia.com/dra-kubelet-plugin=true
Add the NVIDIA Chart Repo:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
Create a values-gpu-operator.yaml file that we’ll use during installation:
# version: v26.3.1
devicePlugin:
  enabled: false

driver:
  manager:
    env:
    - name: NODE_LABEL_FOR_GPU_POD_EVICTION
      value: "nvidia.com/dra-kubelet-plugin"
If you’re using a different Kubernetes distribution (e.g., Rancher or K3s), the default Containerd installation path may differ — remember to add the following settings to values-gpu-operator.yaml:
toolkit:
  env:
  - name: CONTAINERD_SOCKET
    value: /run/k3s/containerd/containerd.sock
Install NVIDIA GPU Operator:
helm upgrade --install gpu-operator nvidia/gpu-operator \
--version=v26.3.1 \
--create-namespace \
--namespace gpu-operator -f values-gpu-operator.yaml
Wait for the GPU Operator to come up. It will install the NVIDIA GPU Driver and tweak the Container Runtime configuration. For specific tuning needs, refer to the NVIDIA official documentation.
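A simple way to watch the rollout (the exact Pod names vary by GPU Operator version) is to follow the namespace until everything reports Running or Completed:
kubectl get pod -n gpu-operator -w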
Installing NVIDIA DRA Driver GPU#
Create a values-nvidia-dra-driver-gpu.yaml file that we’ll use during installation:
# version: 25.12.0
nvidiaDriverRoot: /run/nvidia/driver
gpuResourcesEnabledOverride: true
image:
  pullPolicy: IfNotPresent
kubeletPlugin:
  nodeSelector:
    nvidia.com/dra-kubelet-plugin: "true"
resources:
  gpus:
    enabled: true
  computeDomains:
    enabled: false # No NVLink here
# featureGates:
#   TimeSlicingSettings: true
If you’d like to try out Scenario IV’s GPU Time Slicing later, you can enable the TimeSlicingSettings Feature Gate now; otherwise, leave it commented out and helm upgrade later when needed.
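For reference, one way to flip that Feature Gate on later without rewriting the whole values file (a sketch that assumes the release name below and the featureGates key shown above):
helm upgrade nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
  --version="25.12.0" \
  --namespace nvidia-dra-driver-gpu \
  --reuse-values \
  --set featureGates.TimeSlicingSettings=true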
Install NVIDIA DRA Driver GPU:
helm upgrade -i nvidia-dra-driver-gpu nvidia/nvidia-dra-driver-gpu \
--version="25.12.0" \
--namespace nvidia-dra-driver-gpu \
--create-namespace -f values-nvidia-dra-driver-gpu.yaml
Use kubectl get pod to confirm the NVIDIA DRA Driver GPU is up:
kubectl get pod -n nvidia-dra-driver-gpu
NAME READY STATUS RESTARTS AGE
nvidia-dra-driver-gpu-kubelet-plugin-6skhp 1/1 Running 0 10m
nvidia-dra-driver-gpu-kubelet-plugin-jswk6 1/1 Running 0 10m
A First Look at DRA#
DeviceClass#
Once installed, you’ll find that DeviceClass and ResourceSlice have been set up by NVIDIA DRA Driver GPU. As the name suggests, DeviceClass represents categories of devices — opening it up reveals regular GPUs, MIG, and VFIO. (If ComputeDomains isn’t disabled, you’ll also see ComputeDomains information.)
kubectl get deviceclass
NAME AGE
gpu.nvidia.com 44m
mig.nvidia.com 44m
vfio.gpu.nvidia.com 44m
ResourceSlice#
ResourceSlice is automatically updated by the DRA driver on each node, recording all devices the driver manages on that node.
Devices on the same node managed by the same driver belong to the same Pool. When the device count exceeds what fits in a single object (up to 128 entries, or 64 if any device uses taints or counters), the driver splits the Pool across multiple ResourceSlices.
.spec.pool.generation and .spec.pool.resourceSliceCount let the scheduler determine whether it has the complete and latest device list for a given node.
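To check that bookkeeping directly, one option (a sketch that assumes jq is installed) is to print each slice’s pool name, generation, and slice count:
kubectl get resourceslices -o json | jq -r '.items[] | "\(.spec.pool.name) generation=\(.spec.pool.generation) slices=\(.spec.pool.resourceSliceCount)"'
The plain listing also shows which pool each ResourceSlice belongs to: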
kubectl get resourcesliceNAME NODE DRIVER POOL AGE
capi-dralabs-md-gpua5000-jw4mx-d64jz-gpu.nvidia.com-w9fnv capi-dralabs-md-gpua5000-jw4mx-d64jz gpu.nvidia.com capi-dralabs-md-gpua5000-jw4mx-d64jz 5m13s
capi-dralabs-md-gput10-gzl84-f2m2d-gpu.nvidia.com-dtgtc capi-dralabs-md-gput10-gzl84-f2m2d gpu.nvidia.com capi-dralabs-md-gput10-gzl84-f2m2d 23m
You can expand the full content with -o yaml:
kubectl get resourceslices -o yaml
The full output is shown below. Each ResourceSlice records its node in .metadata.ownerReferences and the devices in .spec.devices. Every device carries .attributes including (but not limited to) architecture, product name, and driver version.
Since each node in this lab has at most 2 GPUs — far below a single ResourceSlice’s 128-entry limit — every node only shows one ResourceSlice.
Full ResourceSlice output
apiVersion: v1
items:
- apiVersion: resource.k8s.io/v1
  kind: ResourceSlice
  metadata:
    creationTimestamp: "2026-05-04T14:40:17Z"
    generateName: capi-dralabs-md-gpua5000-jw4mx-d64jz-gpu.nvidia.com-
    generation: 1
    name: capi-dralabs-md-gpua5000-jw4mx-d64jz-gpu.nvidia.com-w9fnv
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: capi-dralabs-md-gpua5000-jw4mx-d64jz
      uid: 83aafab6-7eb3-42d0-9faf-6118f78341ef
    resourceVersion: "11490"
    uid: d03fd27e-f6cb-4386-ae61-80aa84309e77
  spec:
    devices:
    - attributes:
        addressingMode:
          string: HMM
        architecture:
          string: Ampere
        brand:
          string: NvidiaRTX
        cudaComputeCapability:
          version: 8.6.0
        cudaDriverVersion:
          version: 13.0.0
        driverVersion:
          version: 580.126.20
        productName:
          string: NVIDIA RTX A5000
        resource.kubernetes.io/pciBusID:
          string: "0000:00:06.0"
        resource.kubernetes.io/pcieRoot:
          string: pci0000:00
        type:
          string: gpu
        uuid:
          string: GPU-e13ce856-7474-797f-d143-16e99b65c0c3
      capacity:
        memory:
          value: 23028Mi
      name: gpu-0
    driver: gpu.nvidia.com
    nodeName: capi-dralabs-md-gpua5000-jw4mx-d64jz
    pool:
      generation: 1
      name: capi-dralabs-md-gpua5000-jw4mx-d64jz
      resourceSliceCount: 1
- apiVersion: resource.k8s.io/v1
  kind: ResourceSlice
  metadata:
    creationTimestamp: "2026-05-04T14:21:53Z"
    generateName: capi-dralabs-md-gput10-gzl84-f2m2d-gpu.nvidia.com-
    generation: 1
    name: capi-dralabs-md-gput10-gzl84-f2m2d-gpu.nvidia.com-dtgtc
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: capi-dralabs-md-gput10-gzl84-f2m2d
      uid: d7ecdc93-1d6c-4868-8503-4251bcf8cf3b
    resourceVersion: "7408"
    uid: 66f32713-c547-4369-84de-97f86430d18d
  spec:
    devices:
    - attributes:
        addressingMode:
          string: HMM
        architecture:
          string: Turing
        brand:
          string: Nvidia
        cudaComputeCapability:
          version: 7.5.0
        cudaDriverVersion:
          version: 13.0.0
        driverVersion:
          version: 580.126.20
        productName:
          string: Tesla T10
        resource.kubernetes.io/pciBusID:
          string: "0000:00:06.0"
        resource.kubernetes.io/pcieRoot:
          string: pci0000:00
        type:
          string: gpu
        uuid:
          string: GPU-dae084a2-974c-00e2-6dec-4ba1999b8652
      capacity:
        memory:
          value: 16Gi
      name: gpu-0
    - attributes:
        addressingMode:
          string: HMM
        architecture:
          string: Turing
        brand:
          string: Nvidia
        cudaComputeCapability:
          version: 7.5.0
        cudaDriverVersion:
          version: 13.0.0
        driverVersion:
          version: 580.126.20
        productName:
          string: Tesla T10
        resource.kubernetes.io/pciBusID:
          string: "0000:00:07.0"
        resource.kubernetes.io/pcieRoot:
          string: pci0000:00
        type:
          string: gpu
        uuid:
          string: GPU-d1bf2033-42f6-096c-b0c6-470575bc08df
      capacity:
        memory:
          value: 16Gi
      name: gpu-1
    driver: gpu.nvidia.com
    nodeName: capi-dralabs-md-gput10-gzl84-f2m2d
    pool:
      generation: 1
      name: capi-dralabs-md-gput10-gzl84-f2m2d
      resourceSliceCount: 1
kind: List
metadata:
  resourceVersion: ""
With this information, how does a Pod tell Kubernetes which devices it wants? That’s where ResourceClaim and ResourceClaimTemplate come in!
ResourceClaim & ResourceClaimTemplate#
If you’d like multiple Pods to share the same device, you can manually create a ResourceClaim. Its lifecycle is fully independent of Pod creation and deletion.

What if you want each Pod to have its own dedicated device? ResourceClaimTemplate lets you predefine a ResourceClaim. Once a Deployment references the template by name, every new Pod automatically gets a corresponding ResourceClaim; conversely, deleting the Pod removes its claim.

Do these concepts feel familiar? DRA is modeled on Kubernetes storage: ResourceClaim maps to PersistentVolumeClaim, ResourceClaimTemplate to the volumeClaimTemplates that exist only inside a StatefulSet, and DeviceClass plays roughly the role of StorageClass.
Hands-On with DRA#
Scenario I: Two Containers Sharing One GPU#
Use a ResourceClaim to declare that we need one NVIDIA GPU, then run a Pod with two containers that share it.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: must-nvidia-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.nvidia.com
        count: 1
Apply the resource:
kubectl apply -f lab01-rc.yaml
Use get to inspect the ResourceClaim:
kubectl get resourceclaim
The status is pending because no Pod is consuming it yet.
NAME STATE AGE
must-nvidia-gpu pending 10s
Now define a Pod with two containers, both referencing the must-nvidia-gpu ResourceClaim we just created.
apiVersion: v1
kind: Pod
metadata:
  name: must-nvidia-gpu-pod
spec:
  restartPolicy: Never
  containers:
  - name: ctr0
    image: ubuntu:24.04
    command: ["bash", "-c"]
    args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: gpu
  - name: ctr1
    image: ubuntu:24.04
    command: ["bash", "-c"]
    args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: must-nvidia-gpu
Apply the Pod:
kubectl apply -f lab01-pod.yaml
Check the ResourceClaim again:
kubectl get resourceclaim
The status changes to allocated and reserved because a Pod is now using the resource.
NAME STATE AGE
must-nvidia-gpu allocated,reserved 16s
Now we can use logs to print the output:
kubectl logs pod/must-nvidia-gpu-pod --all-containers --prefix
[pod/must-nvidia-gpu-pod/ctr0] GPU 0: Tesla T10 (UUID: GPU-dae084a2-974c-00e2-6dec-4ba1999b8652)
[pod/must-nvidia-gpu-pod/ctr1] GPU 0: Tesla T10 (UUID: GPU-dae084a2-974c-00e2-6dec-4ba1999b8652)
In practice, it might not be a T10 — it could just as easily be an A5000.
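You can also see which physical device a claim ended up with straight from its status; the allocation results live under .status.allocation.devices.results (a quick jsonpath sketch):
kubectl get resourceclaim must-nvidia-gpu -o jsonpath='{range .status.allocation.devices.results[*]}{.pool}/{.device}{"\n"}{end}'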
Now delete the Pod:
kubectl delete -f lab01-pod.yaml
Check the ResourceClaim once more:
kubectl get resourceclaim
The status returns to pending because no Pod is using the resource anymore.
NAME STATE AGE
must-nvidia-gpu pending 3m39s
Delete the ResourceClaim:
kubectl delete -f lab01-rc.yaml
The example above only asked for one GPU but didn’t tell us which one we’d get.
This scenario isn’t all that different from the original Device Plugin, right? The next scenarios are where DRA truly shines!
Scenario II: ResourceClaimTemplate — Prefer A5000 in a Deployment#
Today, an engineer asks me to deploy an inference model that prefers the A5000, but since A5000s are scarce, they’re fine falling back to a T10 when scaling up.
Beyond exactly, ResourceClaim also supports firstAvailable for ranked preferences. Going back to the full ResourceSlice output, we can target GPUs by name using .attributes.productName.
The configuration looks like this:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: first-a5000
spec:
  spec:
    devices:
      requests:
      - name: gpu
        firstAvailable:
        - name: a5000
          deviceClassName: gpu.nvidia.com
          selectors:
          - cel:
              expression: device.attributes["gpu.nvidia.com"].productName == "NVIDIA RTX A5000"
        - name: fallback-t10
          deviceClassName: gpu.nvidia.com
          selectors:
          - cel:
              expression: device.attributes["gpu.nvidia.com"].productName == "Tesla T10"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: first-a5000-deploy
  labels:
    app: first-a5000-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: first-a5000-deploy
  template:
    metadata:
      labels:
        app: first-a5000-deploy
    spec:
      containers:
      - name: ctr0
        image: ubuntu:24.04
        command: ["bash", "-c"]
        args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        resourceClaimTemplateName: first-a5000
Save the above as lab02.yaml and apply it:
kubectl apply -f lab02.yaml
Confirm the Pod is Running and use the nvidia-smi -L output to verify it got the A5000:
kubectl get pod
kubectl logs deployments/first-a5000-deploy --all-pods
NAME READY STATUS RESTARTS AGE
first-a5000-deploy-8c6cf4568-2lsv9 1/1 Running 0 9s
[pod/first-a5000-deploy-8c6cf4568-2lsv9/ctr0] GPU 0: NVIDIA RTX A5000 (UUID: GPU-e13ce856-7474-797f-d143-16e99b65c0c3)
Now scale up to 2 replicas to see which GPU the new Pod gets:
kubectl scale deployment first-a5000-deploy --replicas 2
kubectl logs deployments/first-a5000-deploy --all-pods
[pod/first-a5000-deploy-8c6cf4568-2lsv9/ctr0] GPU 0: NVIDIA RTX A5000 (UUID: GPU-e13ce856-7474-797f-d143-16e99b65c0c3)
[pod/first-a5000-deploy-8c6cf4568-865jj/ctr0] GPU 0: Tesla T10 (UUID: GPU-dae084a2-974c-00e2-6dec-4ba1999b8652)
The first Pod has taken the only A5000, so the second Pod falls back to T10 — exactly the expected behavior of firstAvailable when the top choice is unavailable.
We can also see the corresponding ResourceClaims:
kubectl get resourceclaim
NAME STATE AGE
first-a5000-deploy-8c6cf4568-2lsv9-gpu-bdz9j allocated,reserved 4m29s
first-a5000-deploy-8c6cf4568-865jj-gpu-mqfcx allocated,reserved 3m29s
If we delete the A5000 Pod, will the rebuilt Pod return to A5000?
With the configuration above, no, it won’t return to the A5000. The replacement Pod is created immediately, while the old Pod is still Terminating and its ResourceClaim hasn’t been released yet.
A new ResourceClaim is generated from the ResourceClaimTemplate for the replacement Pod; since the A5000 is still held by the old Pod’s claim, the new claim falls back to the T10.
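If you do want the replacement back on the A5000, one simple workaround (a sketch, at the cost of a brief gap in availability) is to scale the Deployment to zero so the old Pod and its generated ResourceClaim are gone before the new Pod is created:
kubectl scale deployment first-a5000-deploy --replicas 0
kubectl wait --for=delete pod -l app=first-a5000-deploy --timeout=120s
kubectl scale deployment first-a5000-deploy --replicas 1
kubectl logs deployments/first-a5000-deploy --all-pods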
Finally, clean up:
kubectl delete -f lab02.yaml
Scenario III: GPUs With at Least 20 GiB of Memory#
Today an engineer wants to deploy an LLM that needs a single GPU with at least 20 GiB of memory. Since it’s still in testing, compute requirements are flexible — any available GPU meeting the memory threshold will do.
Beyond .attributes, we can also use .capacity.memory. How do we express comparison rules? Take a look at the CEL expression in the selector below:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: gt20g
spec:
  spec:
    devices:
      requests:
      - name: gpu
        firstAvailable:
        - name: gt20g
          deviceClassName: gpu.nvidia.com
          selectors:
          - cel:
              expression: device.capacity["gpu.nvidia.com"].memory.isGreaterThan(quantity("20Gi"))
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gt20g-deploy
  labels:
    app: gt20g-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gt20g-deploy
  template:
    metadata:
      labels:
        app: gt20g-deploy
    spec:
      containers:
      - name: ctr0
        image: ubuntu:24.04
        command: ["bash", "-c"]
        args: ["nvidia-smi -L; trap 'exit 0' TERM; sleep 9999 & wait"]
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        resourceClaimTemplateName: gt20g
We use CEL’s isGreaterThan(quantity("20Gi")) to require more than 20 GiB.
Apply the YAML:
kubectl apply -f lab03.yaml
Confirm the Pod is Running and that we got the A5000:
kubectl get pod
kubectl logs deployments/gt20g-deploy --all-pods
NAME READY STATUS RESTARTS AGE
gt20g-deploy-5ff576476-hdz8f 1/1 Running 0 5m16s
[pod/gt20g-deploy-5ff576476-hdz8f/ctr0] GPU 0: NVIDIA RTX A5000 (UUID: GPU-e13ce856-7474-797f-d143-16e99b65c0c3)
Now let’s see what happens when we scale up:
kubectl scale deployment gt20g-deploy --replicas 2
After scaling, check whether a new Pod was added:
kubectl get pod
NAME READY STATUS RESTARTS AGE
gt20g-deploy-5ff576476-hdz8f 1/1 Running 0 8m9s
gt20g-deploy-5ff576476-vjss8 0/1 Pending 0 26s
Run describe on the gt20g-deploy-5ff576476-vjss8 Pod:
kubectl describe pod gt20g-deploy-5ff576476-vjss8
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 98s default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint(s), 3 cannot allocate all claims. still not schedulable, preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Since there’s no other GPU with at least 20 GiB of memory in the cluster — the T10 has only 16 GiB — the new Pod is stuck in Pending.
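The generated ResourceClaims tell the same story: listing them again should show the running Pod’s claim as allocated,reserved while the Pending Pod’s claim stays pending (the hash suffixes will differ in your cluster):
kubectl get resourceclaim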
Clean up the resources:
kubectl delete -f lab03.yaml
For more sophisticated matching logic, see CEL in Kubernetes.
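As a taste of what selectors can express, here are two illustrative expressions built only from the attribute and capacity fields we saw in the ResourceSlice output (sketches; adjust the names to your driver version):
- cel:
    expression: device.attributes["gpu.nvidia.com"].architecture == "Turing"
- cel:
    expression: device.attributes["gpu.nvidia.com"].architecture == "Ampere" && device.capacity["gpu.nvidia.com"].memory.isGreaterThan(quantity("20Gi"))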
Scenario IV: GPU Time Slicing in DRA#
As of 2026/5/10, neither the NVIDIA official documentation nor the NVIDIA DRA Driver GPU wiki contains any tutorial on Time Slicing.
The configuration below is adapted from demo/specs/quickstart/v1/gpu-test5.yaml, supplemented by reading parts of the source code; the Feature Gate part draws from third-party articles.
The setup may change in future releases — keep that in mind!
Today, an engineer comes back to me asking: “I know DRA is great for resource allocation, but is there a way to fall back to Time Slicing mode?”
Want to go back to the old mode? Nooo… problem at all!
Just specify the device under .spec.devices.config and switch the sharing strategy to TimeSlicing. Here’s an example:
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: time-slicing-manual
spec:
  devices:
    requests:
    - name: ts-gpu
      exactly:
        deviceClassName: gpu.nvidia.com
    config:
    - requests: ["ts-gpu"]
      opaque:
        driver: gpu.nvidia.com
        parameters:
          apiVersion: resource.nvidia.com/v1beta1
          kind: GpuConfig
          sharing:
            strategy: TimeSlicing
            timeSlicingConfig:
              interval: Long
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ts-gpu-deployment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: ts-gpu
  template:
    metadata:
      labels:
        app: ts-gpu
    spec:
      containers:
      - name: ctr
        image: nvcr.io/nvidia/k8s/cuda-sample:nbody-cuda11.6.0-ubuntu18.04
        command: ["bash", "-c"]
        args: ["trap 'exit 0' TERM; /tmp/sample --benchmark --numbodies=4226000 & wait"]
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        resourceClaimName: time-slicing-manual
Save the above as lab04.yaml and apply it:
kubectl apply -f lab04.yaml
Verify all the Pods are running:
kubectl get pod
NAME READY STATUS RESTARTS AGE
ts-gpu-deployment-549c945798-6t2dx 1/1 Running 0 4s
ts-gpu-deployment-549c945798-tlgp4 1/1 Running 0 4s
ts-gpu-deployment-549c945798-x2gbv 1/1 Running 0 4s
ts-gpu-deployment-549c945798-xbv22 1/1 Running 0 4s
Since all 4 Pods share the same ResourceClaim, kubectl get resourceclaim returns only a single entry — which itself is evidence that they’re sharing:
kubectl get resourceclaim
NAME STATE AGE
time-slicing-manual allocated,reserved 30s
Drill in with describe and you’ll see all 4 Pods listed under Reserved For:
kubectl describe resourceclaim time-slicing-manual
Status:
Allocation:
Devices:
Config:
Opaque:
Driver: gpu.nvidia.com
Parameters:
API Version: resource.nvidia.com/v1beta1
Kind: GpuConfig
Sharing:
Strategy: TimeSlicing
Time Slicing Config:
Interval: Long
Requests:
ts-gpu
Source: FromClaim
Results:
Device: gpu-0
Driver: gpu.nvidia.com
Pool: capi-dralabs-md-gpua5000-jw4mx-d64jz
Request: ts-gpu
Node Selector:
Node Selector Terms:
Match Fields:
Key: metadata.name
Operator: In
Values:
capi-dralabs-md-gpua5000-jw4mx-d64jz
Reserved For:
Name: ts-gpu-deployment-549c945798-x2gbv
Resource: pods
UID: be21eecf-2d58-4891-9cac-ac3674f4ff09
Name: ts-gpu-deployment-549c945798-6t2dx
Resource: pods
UID: 37d504a0-966e-4263-9fcd-b713d73c0e77
Name: ts-gpu-deployment-549c945798-xbv22
Resource: pods
UID: 2b5ec297-9889-4cb9-8079-b626a854b292
Name: ts-gpu-deployment-549c945798-tlgp4
Resource: pods
UID: ee836856-8c6b-4161-9582-4bc4ff63a606That’s how Time Slicing is enabled under DRA — the effect is similar to the legacy Device Plugin’s Time Slicing mode, except we don’t need to specify how many slices to divide into. We just configure timeSlicingConfig and its interval.
Finally, clean up the resources:
kubectl delete -f lab04.yaml
Summary#
Compared to the Device Plugin, DRA now offers a much cleaner usage model that lets developers and cluster admins allocate devices more precisely. There’s no longer a need to colocate the same kind of device on the same node, nor to write complex rules in nodeSelector or Affinity.
Starting with K8s v1.36, device health reporting is also available, so Pods no longer simply show Error — we can tell whether the failure stems from the device or from the application.
Previously, when a K8s cluster ran low on CPU or memory, Cluster Autoscaler could spin up new machines. In the future, the same may apply to GPU shortages — Cluster Autoscaler may provision GPU nodes on demand, enabling more efficient resource allocation.
Closing Thoughts#
A couple of years ago I bought an A5000 for some testing needs and parked it at Infra Labs, but it felt like a waste: I was too busy to run further experiments, and the card just sat in the machine room collecting dust and drawing power.
When dra-driver-nvidia-gpu made it into Kubernetes SIGs, I remembered that the lab also has two T10 cards available, which became the catalyst for this lab.
I’ve also been watching the rise of various AI Gateways, plus the Gateway API extension — Gateway API Inference Extension. I’ll write about those some other time!