Add vLLM #221
Conversation
Reviewer's Guide
This PR adds comprehensive vLLM support across the codebase: a complete vLLM backend implementation with configuration and tests, platform detection and scheduler integration for Safetensors routing, updates to the Backend interface and CLI/docs for vLLM, a new Dockerfile variant and CI steps to build and publish CUDA-enabled vLLM images, and a cleanup of obsolete distribution utilities with a new log sanitization helper.
Sequence diagram for scheduler routing Safetensors models to vLLM backend
```mermaid
sequenceDiagram
participant User
participant Scheduler
participant ModelManager
participant vLLMBackend
User->>Scheduler: POST /v1/completions (Safetensors model)
Scheduler->>ModelManager: Get model config
ModelManager-->>Scheduler: Return config (Format=Safetensors)
Scheduler->>vLLMBackend: Route request to vLLM backend
vLLMBackend->>Scheduler: Process request
Scheduler-->>User: Return response
```
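As a companion to the diagram, here is a minimal, self-contained Go sketch of the routing decision it describes; the types and map layout are simplified stand-ins, not the scheduler code from this PR.

```go
package main

import "fmt"

// Format is a simplified stand-in for the model format field the
// scheduler consults; only the value relevant to this PR is shown.
type Format string

const FormatSafetensors Format = "safetensors"

// pickBackend sketches the routing step from the diagram: Safetensors
// models go to the vLLM backend when it is registered, otherwise the
// previously selected backend is kept.
func pickBackend(format Format, backends map[string]bool, current string) string {
	if format == FormatSafetensors && backends["vllm"] {
		return "vllm"
	}
	return current
}

func main() {
	backends := map[string]bool{"llama.cpp": true, "vllm": true}
	fmt.Println(pickBackend(FormatSafetensors, backends, "llama.cpp")) // prints "vllm"
}
```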
Class diagram for new and updated vLLM backend types
```mermaid
classDiagram
class vLLM {
- log logging.Logger
- modelManager *models.Manager
- serverLog logging.Logger
- config *Config
- status string
+ Name() string
+ UsesExternalModelManagement() bool
+ Install(ctx context.Context, httpClient *http.Client) error
+ Run(ctx context.Context, socket string, model string, modelRef string, mode inference.BackendMode, backendConfig *inference.BackendConfiguration) error
+ Status() string
+ GetDiskUsage() (int64, error)
+ GetRequiredMemoryForModel(ctx context.Context, model string, config *inference.BackendConfiguration) (inference.RequiredMemory, error)
- binaryPath() string
}
class Config {
+ Args []string
+ GetArgs(bundle types.ModelBundle, socket string, mode inference.BackendMode, config *inference.BackendConfiguration) ([]string, error)
+ NewDefaultVLLMConfig() *Config
}
vLLM --> Config
class inference.Backend {
<<interface>>
+ Run(ctx context.Context, socket string, model string, modelRef string, mode BackendMode, config *BackendConfiguration) error
+ Status() string
+ GetDiskUsage() (int64, error)
+ GetRequiredMemoryForModel(ctx context.Context, model string, config *BackendConfiguration) (RequiredMemory, error)
}
vLLM ..|> inference.Backend
```
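To make the realization edge (`vLLM ..|> inference.Backend`) concrete, here is a small self-contained Go sketch. The interface and struct are trimmed mirrors of the types in the diagram (the real signatures also take contexts and configuration structs); the compile-time assertion is the idiomatic way to enforce that relationship.

```go
package main

import "fmt"

// Backend is a trimmed mirror of the inference.Backend interface above;
// the real interface also includes Run and GetRequiredMemoryForModel.
type Backend interface {
	Name() string
	Status() string
	GetDiskUsage() (int64, error)
}

// vLLM is a stub for the backend type in this PR; the real struct also
// holds loggers, a model manager, and a *Config.
type vLLM struct {
	status string
}

func (v *vLLM) Name() string                 { return "vllm" }
func (v *vLLM) Status() string               { return v.status }
func (v *vLLM) GetDiskUsage() (int64, error) { return 0, nil }

// Compile-time check that vLLM satisfies Backend, mirroring the
// realization edge in the class diagram.
var _ Backend = (*vLLM)(nil)

func main() {
	var b Backend = &vLLM{status: "not running"}
	fmt.Println(b.Name(), b.Status())
}
```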
File-Level Changes
Possibly linked issues
Pull Request Overview
This PR adds a new vLLM CUDA Docker image variant to support vLLM (a fast LLM inference engine) alongside the existing model runner functionality.
- Adds a new Docker build stage for vLLM with CUDA support
- Introduces a new GitHub Actions workflow job to build and push the vLLM image
- Includes configuration for vLLM version parameterization
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
| --- | --- |
| Dockerfile | Adds new vLLM build stage that installs vLLM using uv package manager |
| .github/workflows/release.yml | Adds workflow input parameter and build job for vLLM CUDA image |
New security issues found
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
echo "vllm-cuda<<EOF" >> "$GITHUB_OUTPUT" | ||
echo "docker/model-runner:${{ inputs.releaseTag }}-vllm-cuda" >> "$GITHUB_OUTPUT" | ||
if [ "${{ inputs.pushLatest }}" == "true" ]; then | ||
echo "docker/model-runner:latest-vllm-cuda" >> "$GITHUB_OUTPUT" | ||
fi | ||
echo 'EOF' >> "$GITHUB_OUTPUT" |
Copilot AI (Oct 14, 2025)
The tag generation logic for vllm-cuda duplicates the pattern used for the cuda tags above. Consider extracting this into a reusable function or template to reduce code duplication and improve maintainability.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Pull Request Overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Pull Request Overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Pull Request Overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
pkg/inference/backends/vllm/vllm.go:1
- The log message uses 'modelID' in the format string but should use 'model' to be consistent with the variable name pattern used elsewhere in the codebase.
package vllm
Pull Request Overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
pkg/inference/backends/vllm/vllm.go:1
- The log message refers to 'modelID' but should be consistent with the parameter name 'model' used elsewhere in the codebase for clarity.
package vllm
Reorganize Dockerfile stages to avoid unnecessary layer copying and improve build caching. Split the final stages into final-llamacpp and final-vllm variants, copying the model-runner binary only in the final stages rather than in intermediate ones, so the build cache is reused when only the model-runner binary has changed. Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Add modelRef parameter to Backend.Run() interface to support serving models with vLLM under their reference names. Update vLLM backend to use modelRef with --served-model-name flag. Signed-off-by: Dorin Geman <dorin.geman@docker.com>
…n Linux This reverts commit 1ab1ead. Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Pull Request Overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
log.Warnf("Backend %s running modelID %s exited with error: %v", | ||
backend.Name(), modelID, err, |
Copilot AI (Oct 21, 2025)
[nitpick] The log message says 'modelID', but the context suggests it should say 'model' to match the previous naming convention (or the variable should stay 'model', since modelID is the internal identifier). The original message 'running model' was clearer.
log.Warnf("Backend %s running modelID %s exited with error: %v", | |
backend.Name(), modelID, err, | |
log.Warnf("Backend %s running model %s exited with error: %v", | |
backend.Name(), modelRef, err, |
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
```dockerfile
# Install uv and vLLM as modelrunner user
RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
    && ~/.local/bin/uv venv --python /usr/bin/python3 /opt/vllm-env \
```
If we change this to copy from the vllm/vllm-openai:v0.11.0 container we get DGX Spark support (I know I suggested doing it this less hacky way; apologies, I didn't realize that container had aarch64 support and this approach doesn't appear to).
Could be a follow-on PR too.
Or, even better, install the wheels from here:
https://wheels.vllm.ai/b8b302cde434df8c9289a2b465406b47ebab1c2d/vllm/
That commit sha is the 0.11.0 one.
They tipped me off in the vLLM Slack that they build CUDA x86_64 and aarch64 wheels for every commit. So this is the same thing, but it also has an aarch64 version.
That would be better than the hacky container copy, which is prone to errors: missing files, OS mismatches, library version mismatches.
This is a way to get that programmatically:
```sh
$ git rev-list -n 1 v0.11.0
b8b302cde434df8c9289a2b465406b47ebab1c2d
```
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Pull Request Overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
```go
} else {
	if config.Format == types.FormatSafetensors {
		if vllmBackend, ok := s.backends[vllm.Name]; ok {
			backend = vllmBackend
```
Copilot AI (Oct 21, 2025)
The automatic backend selection logic should handle the case where vLLM is not installed. Currently, if vLLM is not in the backends map, the original backend assignment remains, but this could lead to runtime errors if that backend doesn't support safetensors. Consider adding a check to verify the selected backend supports the model format, or log a warning if vLLM is unavailable for a safetensors model.
```diff
 backend = vllmBackend
+} else {
+	s.log.Warnf("vLLM backend is not available for safetensors model '%s'; current backend '%s' may not support safetensors format", model.Name(), backend.Name())
```
@sourcery-ai review
Hey there - I've reviewed your changes and they look great!
Blocking issues:
- An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload. (link)
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `.github/workflows/release.yml:117` </location>
<code_context>
uses: docker/build-push-action@v5
</code_context>
<issue_to_address>
**security (yaml.github-actions.security.third-party-action-not-pinned-to-commit-sha):** An action sourced from a third-party repository on GitHub is not pinned to a full length commit SHA. Pinning an action to a full length commit SHA is currently the only way to use an action as an immutable release. Pinning to a particular SHA helps mitigate the risk of a bad actor adding a backdoor to the action's repository, as they would need to generate a SHA-1 collision for a valid Git object payload.
*Source: opengrep*
</issue_to_address>
@sourcery-ai review
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `pkg/inference/backends/vllm/vllm.go:97` </location>
<code_context>
- // TODO: Implement.
- v.log.Warn("vLLM backend is not yet supported")
- return errors.New("not implemented")
+func (v *vLLM) Run(ctx context.Context, socket, model string, modelRef string, mode inference.BackendMode, backendConfig *inference.BackendConfiguration) error {
+ if !platform.SupportsVLLM() {
+ v.log.Warn("vLLM backend is not yet supported")
</code_context>
<issue_to_address>
**issue (bug_risk):** The Run method appends both model and modelRef to --served-model-name, which may not match expected CLI usage.
If the CLI expects only one value for --served-model-name, passing both may cause errors. Please confirm the correct usage and update the argument construction if necessary.
</issue_to_address>
### Comment 2
<location> `pkg/inference/backends/vllm/vllm.go:204` </location>
<code_context>
+ return size, nil
+}
+
+func (v *vLLM) GetRequiredMemoryForModel(_ context.Context, _ string, _ *inference.BackendConfiguration) (inference.RequiredMemory, error) {
+ if !platform.SupportsVLLM() {
+ return inference.RequiredMemory{}, errors.New("not implemented")
</code_context>
<issue_to_address>
**suggestion:** GetRequiredMemoryForModel returns hardcoded values for RAM and VRAM.
Static memory values may cause incorrect resource allocation. Please add logic to estimate requirements from model metadata or configuration.
Suggested implementation:
```golang
func (v *vLLM) GetRequiredMemoryForModel(ctx context.Context, modelPath string, config *inference.BackendConfiguration) (inference.RequiredMemory, error) {
if !platform.SupportsVLLM() {
return inference.RequiredMemory{}, errors.New("not implemented")
}
// Example: Estimate memory based on model metadata
metadata, err := inference.GetModelMetadata(modelPath)
if err != nil {
return inference.RequiredMemory{}, fmt.Errorf("failed to get model metadata: %w", err)
}
// Assume float32 weights, 4 bytes per parameter
const bytesPerParam = 4
paramCount := metadata.NumParameters
ramEstimate := int64(paramCount * bytesPerParam)
// VRAM estimate: typically similar to RAM, but may be higher for large batch sizes
// Here we use a simple heuristic, can be improved with more info
vramEstimate := ramEstimate
// If config specifies batch size or precision, adjust estimates
if config != nil {
if config.Precision == "float16" {
vramEstimate = int64(paramCount * 2)
ramEstimate = int64(paramCount * 2)
}
if config.BatchSize > 0 {
// Increase VRAM estimate for larger batch sizes
vramEstimate += int64(config.BatchSize * 1024 * 1024) // 1MB per batch as a rough estimate
}
}
return inference.RequiredMemory{
RAM: ramEstimate,
VRAM: vramEstimate,
}, nil
}
```
- You must implement or ensure the existence of `inference.GetModelMetadata(modelPath)` which should return a struct with at least `NumParameters` (int64).
- If your model metadata includes more details (e.g., data type, layers), you can refine the estimation logic.
- Adjust the estimation heuristics as needed for your specific models and hardware.
</issue_to_address>
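Regarding Comment 1 above (the --served-model-name construction), here is a minimal self-contained Go sketch of one way the arguments could be assembled so that only the reference name is passed as the served model name. buildServeArgs is a hypothetical helper, the model path and reference values are placeholders, and the socket/transport flags are deliberately omitted since that wiring depends on the backend's Config.

```go
package main

import "fmt"

// buildServeArgs is a hypothetical helper: the model path is passed
// positionally to the vllm CLI's serve subcommand, and only modelRef is
// handed to --served-model-name. Whether the internal ID should also be
// exposed as a served name is the open question in Comment 1.
func buildServeArgs(modelPath, modelRef string) []string {
	return []string{
		"serve", modelPath,
		"--served-model-name", modelRef,
	}
}

func main() {
	fmt.Println(buildServeArgs("/models/blobs/sha256-abc", "example/my-model"))
}
```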
```go
// Name is the backend name.
Name    = "vllm"
vllmDir = "/opt/vllm-env/bin"
```
I am sometimes curious why we care about the directory in the golang code. We could just add this directory to the PATH in the Dockerfile.
It could be useful if we start to use artifacts from other containers, for example for ROCm, which expects things to be in a different path:
https://hub.docker.com/r/rocm/vllm
We can worry about this in a follow-on PR though.
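Following up on the PATH remark, a self-contained Go sketch of how the lookup could prefer PATH and only fall back to the hard-coded directory. resolveVLLMBinary is a hypothetical helper for illustration, not the PR's binaryPath implementation.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// defaultVLLMDir mirrors the vllmDir constant quoted above and is used
// only as a fallback in this sketch.
const defaultVLLMDir = "/opt/vllm-env/bin"

// resolveVLLMBinary prefers whatever `vllm` is on PATH (e.g. exported in
// the Dockerfile) and falls back to the hard-coded virtualenv bin dir.
func resolveVLLMBinary() string {
	if p, err := exec.LookPath("vllm"); err == nil {
		return p
	}
	return filepath.Join(defaultVLLMDir, "vllm")
}

func main() {
	fmt.Println("using vLLM binary at:", resolveVLLMBinary())
}
```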
Building v0.0.14-rc1. Make sure you're using a Docker Engine context with the nvidia runtime.
Build:
Run:
Test:
Loading the model is expected to take ~30s on Docker Offload. E.g.,
Summary by Sourcery
Add support for building and publishing a vLLM-enabled CUDA Docker image by extending the Dockerfile and GitHub Actions release workflow.
New Features:
Summary by Sourcery
Add vLLM backend support to model-runner, including backend implementation, scheduler integration, Docker build for vLLM CUDA images, CLI updates, and associated tests.
New Features:
Enhancements:
Build:
CI:
Documentation:
Tests: