Labels
ep:WebGPU (ort-web webgpu provider), platform:web (issues related to ONNX Runtime web; typically submitted using template)
Description
Describe the issue
When running the fp16 or q4f16 variants of onnx-community/nanochat-d32-ONNX, the model fails to produce valid output. The issue occurs on both the JSEP and native WebGPU EPs; WASM/CPU works fine. This probably points to an fp16 overflow somewhere, but it would be nice to get to the bottom of it.
To reproduce
- Load model from https://huggingface.co/onnx-community/nanochat-d32-ONNX/blob/main/onnx/model_q4f16.onnx (weights stored at https://huggingface.co/onnx-community/nanochat-d32-ONNX/blob/main/onnx/model_q4f16.onnx_data)
- Run with Transformers.js or onnxruntime-web
- Observe invalid output
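For reference, a minimal Transformers.js repro might look like the sketch below (a browser/WebGPU-only snippet; the prompt and generation options are placeholders, not taken from the original report):

```javascript
import { pipeline } from "@huggingface/transformers";

// Load the q4f16 variant on the WebGPU EP (fails); dtype: "q4" works.
const generator = await pipeline(
  "text-generation",
  "onnx-community/nanochat-d32-ONNX",
  { dtype: "q4f16", device: "webgpu" },
);

// Placeholder prompt; any input reproduces the invalid output.
const output = await generator("What is the capital of France?", {
  max_new_tokens: 32,
});
console.log(output);
```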
Urgency
Blocks nanochat from running on q4f16. q4 works fine, so I'll use that in the meantime.
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
latest
Execution Provider
'webgpu' (WebGPU)