
[WebGPU] fp16 nanochat produces NaNs (CPU works fine) #26367

@xenova

Description


Describe the issue

When running the fp16 or q4f16 variants of onnx-community/nanochat-d32-ONNX, the model produces NaNs instead of valid output. The issue is present on both the JSEP and native WebGPU EPs; the WASM/CPU EP works fine. Since only the fp16-based variants are affected, this most likely points to an fp16 overflow somewhere, but it would be nice to get to the bottom of it.

To reproduce

  1. Load model from https://huggingface.co/onnx-community/nanochat-d32-ONNX/blob/main/onnx/model_q4f16.onnx (weights stored at https://huggingface.co/onnx-community/nanochat-d32-ONNX/blob/main/onnx/model_q4f16.onnx_data)
  2. Run with Transformers.js or onnxruntime-web
  3. Observe invalid output
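The steps above can be sketched with Transformers.js v3, which exposes `dtype` and `device` options for selecting the quantization variant and execution provider. This is a hedged repro sketch, not a verified script: it assumes the model is served as a text-generation pipeline and that the prompt shown is arbitrary.

```javascript
// Repro sketch: load the q4f16 variant on WebGPU via Transformers.js v3.
// Assumes a browser (or environment) with WebGPU support.
import { pipeline } from '@huggingface/transformers';

// Selecting dtype 'q4f16' + device 'webgpu' triggers the NaN output;
// switching dtype to 'q4', or device to 'wasm', works fine.
const generator = await pipeline(
  'text-generation',
  'onnx-community/nanochat-d32-ONNX',
  { dtype: 'q4f16', device: 'webgpu' },
);

const output = await generator(
  [{ role: 'user', content: 'Hello, who are you?' }],
  { max_new_tokens: 64 },
);

// On WebGPU the generated text is garbage (NaN logits upstream);
// on WASM/CPU the same call produces coherent output.
console.log(output);
```

Swapping `dtype: 'q4f16'` for `dtype: 'q4'` in the same snippet is the workaround mentioned under Urgency below.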

Urgency

This blocks nanochat from running with the q4f16 variant on WebGPU. The q4 variant works fine, so I'll use that in the meantime.

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

latest

Execution Provider

'webgpu' (WebGPU)


Labels

ep:WebGPU (ort-web webgpu provider), platform:web (issues related to ONNX Runtime web; typically submitted using template)
