
[WebGPU] fp16 nanochat produces NaNs (CPU works fine) #26367

@xenova

Description


Describe the issue

When running the fp16 or q4f16 variants of onnx-community/nanochat-d32-ONNX, the model produces NaNs instead of valid output. The issue is present on both the JSEP and native WebGPU EPs; the WASM/CPU EP works fine. Since only the fp16-based variants are affected, this most likely points to an fp16 overflow somewhere, but it would be nice to get to the bottom of it.

To reproduce

  1. Load model from https://huggingface.co/onnx-community/nanochat-d32-ONNX/blob/main/onnx/model_q4f16.onnx (weights stored at https://huggingface.co/onnx-community/nanochat-d32-ONNX/blob/main/onnx/model_q4f16.onnx_data)
  2. Run with Transformers.js or onnxruntime-web
  3. Observe invalid output
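The steps above can be sketched with Transformers.js v3, which exposes `dtype` and `device` options for selecting the quantization variant and execution provider. This is a hedged repro sketch, not a verified script: it assumes the model is served as a text-generation pipeline and that the prompt shown is arbitrary.

```javascript
// Repro sketch: load the q4f16 variant on WebGPU via Transformers.js v3.
// Assumes a browser (or environment) with WebGPU support.
import { pipeline } from '@huggingface/transformers';

// Selecting dtype 'q4f16' + device 'webgpu' triggers the NaN output;
// switching dtype to 'q4', or device to 'wasm', works fine.
const generator = await pipeline(
  'text-generation',
  'onnx-community/nanochat-d32-ONNX',
  { dtype: 'q4f16', device: 'webgpu' },
);

const output = await generator(
  [{ role: 'user', content: 'Hello, who are you?' }],
  { max_new_tokens: 64 },
);

// On WebGPU the generated text is garbage (NaN logits upstream);
// on WASM/CPU the same call produces coherent output.
console.log(output);
```

Swapping `dtype: 'q4f16'` for `dtype: 'q4'` in the same snippet is the workaround mentioned under Urgency below.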

Urgency

This blocks nanochat from running with the q4f16 variant on WebGPU. The q4 variant works fine, so I'll use that in the meantime.

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

latest

Execution Provider

'webgpu' (WebGPU)


Labels

ep:WebGPU (ort-web webgpu provider), platform:web (issues related to ONNX Runtime web; typically submitted using template)
