README.md (+43 −0)

@@ -383,6 +383,49 @@ Finally, you can run the following to cleanup your environment and delete the

```bash
./demo/delete-cluster.sh
```

## Installing the example driver on a GKE cluster

> **Reviewer comment (Contributor):** It would be great to also run the e2e test on GKE to make sure we don't accidentally break this. I'm not sure if there's an existing pattern to do that in Prow or if GitHub Actions makes that easy. We can definitely address that later.

It is also possible to run the example driver on a GKE cluster. For this, we
will use the pre-built image for the kubelet plugin, so there is no need
to build anything. All that is needed is a Google Cloud Platform account,
the gcloud CLI, and Helm.

To keep things simple and identical to the Kind example, we will use a
single-node GKE cluster.

CDI must be enabled in containerd for the DRA driver to work. CDI is
enabled by default in GKE since 1.32.1-gke.1489001, so we will create
a cluster in the rapid channel to make sure we get a recent version.
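
GKE patch versions can be compared with a plain version sort. The sketch
below (a hypothetical check, using a hard-coded version string in place of
one read from `kubectl get nodes`) tests whether a node runs at least the
release where CDI became the default:

```bash
# Minimum GKE version where CDI is enabled by default in containerd.
min="1.32.1-gke.1489001"
# In a real check this would come from, e.g.:
#   kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}'
node_version="1.32.2-gke.1182001"

# sort -V orders version strings; if the minimum sorts first, we are new enough.
if [ "$(printf '%s\n' "$min" "$node_version" | sort -V | head -n1)" = "$min" ]; then
  echo "CDI enabled by default"
else
  echo "GKE version too old for default CDI"
fi
```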

Since DRA is still a beta feature, we need to explicitly enable it
when the cluster is created.

First, create a GKE cluster with gcloud.
```bash
gcloud container clusters create dra-example-driver-cluster \
--location=us-central1-c \
--release-channel=rapid \
--num-nodes=1 \
--enable-kubernetes-unstable-apis=resource.k8s.io/v1beta1/deviceclasses,resource.k8s.io/v1beta1/resourceclaims,resource.k8s.io/v1beta1/resourceclaimtemplates,resource.k8s.io/v1beta1/resourceslices
```
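
Once the create command finishes, point kubectl and Helm at the new
cluster by fetching credentials (a sketch; it assumes the same cluster
name and location as above):

```bash
# Fetch credentials so kubectl and helm target the new cluster.
gcloud container clusters get-credentials dra-example-driver-cluster \
  --location=us-central1-c

# Sanity check: the single node should be Ready.
kubectl get nodes
```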

Once the cluster is ready, we can install the DRA driver using Helm.

The kubelet plugin in the example driver is set up to run with priority class
`system-node-critical`. On GKE, pods are by default restricted from running
with this priority class, so we need to use a ResourceQuota to allow it. The
Helm chart supports this; we just have to enable it.
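
With the quota enabled and the chart's default values, the rendered
ResourceQuota looks roughly like this (an illustrative rendering; the
actual name comes from the chart's fullname template):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dra-example-driver-resourcequota
  namespace: dra-example-driver
spec:
  hard:
    pods: 10
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - system-node-critical
          - system-cluster-critical
```

Because the quota is scoped to the two system priority classes, it only
limits pods that request them, which is what lifts GKE's default
restriction for the kubelet plugin.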

```bash
helm upgrade -i \
--create-namespace \
--namespace dra-example-driver \
--set=resourcequota.enabled=true \
dra-example-driver \
deployments/helm/dra-example-driver
```

> **Reviewer comment (Contributor):** Using the chart from the main branch might not always be compatible with the latest released version of the container image. Ideally we'd stick to either building the image from source for the checked out version of the chart (like the e2e CI tests do), or use compatible published release versions of each. I see published releases won't work here immediately without these unreleased changes to the chart, but we could merge the chart changes, cut a chart release, then merge the docs.
>
> Seamless upgrade support for 1.33 is one thing I'm anticipating which will involve changes to both the image and the chart. Installing a chart that is set up for seamless upgrades and an image that isn't will likely cause issues, though that might only be in marginal cases users walking through the demo wouldn't normally hit. I haven't thought that particular scenario all the way through. For sensitive changes like that I'm confident we can work things out such that a little bit of skew is probably fine, but zero skew is of course preferred.

The examples in `demo/gpu-test{1,2,3,4,5}.yaml` work just like with Kind.

## Anatomy of a DRA resource driver

TBD

deployments/helm/dra-example-driver/Chart.yaml (+3 −1)

@@ -25,4 +25,6 @@ version: 0.0.0-dev
# It is recommended to use it with quotes.
appVersion: "v0.1.0"

kubeVersion: "1.32.x"
# The "-0" suffix is to make sure the chart works on GKE clusters, which use versions of
# the form 1.32.1-gke.1234567.
kubeVersion: "1.32.x-0"
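
The reason the `-0` is needed: under SemVer, everything after the first
`-` (like `-gke.1489001`) is a pre-release tag, and pre-release versions
are excluded from a plain range such as `1.32.x` unless the range itself
names a pre-release bound; `-0` is the lowest possible one. A small
sketch (a hypothetical `has_prerelease` helper, not part of the chart) of
how such versions are classified:

```bash
# SemVer treats everything after the first "-" (before any "+") as a
# pre-release tag, so GKE's 1.32.1-gke.1489001 is a "pre-release" of
# 1.32.1 and would not match the constraint "1.32.x" on its own.
has_prerelease() {
  core="${1%%+*}"           # drop build metadata, if any
  case "$core" in
    *-*) echo "pre-release" ;;
    *)   echo "release" ;;
  esac
}

has_prerelease "1.32.1-gke.1489001"   # pre-release: needs the "-0" bound
has_prerelease "1.32.1"               # release: matched by plain "1.32.x"
```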

deployments/helm/dra-example-driver/templates/resourcequota.yaml (+15 −0)

@@ -0,0 +1,15 @@
{{- if .Values.resourcequota.enabled }}
apiVersion: v1
kind: ResourceQuota
metadata:
  name: {{ include "dra-example-driver.fullname" . }}-resourcequota
  namespace: {{ include "dra-example-driver.namespace" . }}
spec:
  hard:
    pods: {{ .Values.resourcequota.pods }}
  {{- with .Values.resourcequota.scopeSelector.matchExpressions }}
  scopeSelector:
    matchExpressions:
      {{- toYaml . | nindent 4 }}
  {{- end }}
{{- end }}

deployments/helm/dra-example-driver/values.yaml (+11 −0)

@@ -87,3 +87,14 @@ webhook:
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
name: ""

resourcequota:
  enabled: false
  pods: 10
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - system-node-critical
          - system-cluster-critical

> **Reviewer comment (Contributor):** I'd vote to enable this for e2e tests for that extra bit of coverage like we do for the webhook which is disabled by default:
>
>     --set webhook.enabled=true \