Skip to content

Conversation

@mortent
Copy link
Member

@mortent mortent commented Apr 9, 2025

We currently only provide instructions for running the example driver on Kind. We should provide instructions for how to run the driver on other Kubernetes distributions, as they will include some different steps. This PR adds instructions for how to get the example driver running on GKE.

For now this just adds a new section with the new content. Ideally we should restructure the README file, so that we can provide separate instructions for getting the driver running on a cluster, and then a shared section for running the examples. I decided not to do that in this PR, as I want to make sure we have agreement on a structure first. Created an issue for this: #94

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mortent
Once this PR has been reviewed and has the lgtm label, please assign elezar for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from bart0sh and klueska April 9, 2025 00:21
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 9, 2025
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Apr 9, 2025
Copy link
Contributor

@nojnhuh nojnhuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a GCP account where I could test this myself, but the docs look good overall, thanks! I left a few small comments inline.

enabled by default in GKE since 1.32.1-gke.1489001, so we will create
a cluster in the rapid channel to make sure we get a recent version.

Since DRA is still a beta feature, we need to explicitely enable it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Since DRA is still a beta feature, we need to explicitely enable it
Since DRA is still a beta feature, we need to explicitly enable it

deployments/helm/dra-example-driver
```

The examples in `demo/gpu-test{1,2,3,4,5}.yaml` works just like with Kind.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The examples in `demo/gpu-test{1,2,3,4,5}.yaml` works just like with Kind.
The examples in `demo/gpu-test{1,2,3,4,5}.yaml` work just like with Kind.

./demo/delete-cluster.sh
```

## Installing the example driver on a GKE cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to also run the e2e test on GKE to make sure we don't accidentally break this. I'm not sure if there's an existing pattern to do that in Prow or if GitHub Actions makes that easy. We can definitely address that later.

--namespace dra-example-driver \
--set=resourcequota.enabled=true \
dra-example-driver \
deployments/helm/dra-example-driver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the chart from the main branch might not always be compatible with the latest released version of the container image. Ideally we'd stick to either building the image from source for the checked out version of the chart (like the e2e CI tests do), or use compatible published release versions of each. I see published releases won't work here immediately without these unreleased changes to the chart, but we could merge the chart changes, cut a chart release, then merge the docs.

Seamless upgrade support for 1.33 is one thing I'm anticipating which will involve changes to both the image and the chart. Installing a chart that is set up for seamless upgrades and an image that isn't will likely cause issues, though that might only be in marginal cases users walking through the demo wouldn't normally hit. I haven't thought that particular scenario all the way through. For sensitive changes like that I'm confident we can work things out such that a little bit of skew is probably fine, but zero skew is of course preferred.

name: ""

resourcequota:
enabled: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote to enable this for e2e tests for that extra bit of coverage like we do for the webhook which is disabled by default:

--set webhook.enabled=true \

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@pohly
Copy link
Contributor

pohly commented May 13, 2025

@mortent: please address the comments and rebase.

@pohly
Copy link
Contributor

pohly commented Jul 21, 2025

/retitle WIP: Add instructions for running the example driver on GKE

@k8s-ci-robot k8s-ci-robot changed the title Add instructions for running the example driver on GKE WIP: Add instructions for running the example driver on GKE Jul 21, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 21, 2025
@pohly pohly moved this from 👀 In review to 🏗 In progress in Dynamic Resource Allocation Sep 15, 2025
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants