18 changes: 9 additions & 9 deletions tests/pipelines/test_pipelines_common.py
@@ -1438,19 +1438,19 @@ def test_save_load_float16(self, expected_max_diff=1e-2):
with tempfile.TemporaryDirectory() as tmpdir:
pipe.save_pretrained(tmpdir)
pipe_loaded = self.pipeline_class.from_pretrained(tmpdir, torch_dtype=torch.float16)
for component in pipe_loaded.components.values():
for name, component in pipe_loaded.components.items():
if hasattr(component, "set_default_attn_processor"):
component.set_default_attn_processor()
pipe_loaded.to(torch_device)
if hasattr(component, "dtype"):
self.assertTrue(
component.dtype == torch.float16,
f"`{name}.dtype` switched from `float16` to {component.dtype} after loading.",
)
if hasattr(component, "half"):
# Although all components of pipe_loaded should be float16 now, some submodules are still kept in fp32, like in https://github.com/huggingface/transformers/blob/v4.57.1/src/transformers/models/t5/modeling_t5.py#L783, so we need to do the conversion again manually to exactly match the dtype we use in pipe
component = component.to(torch_device).half()
Comment on lines +1449 to +1451
sayakpaul (Member):
This doesn't seem right at all. torch_dtype should be able to take care of it. I just ran it on my GPU for SD and it worked fine.

kaixuanliu (Contributor Author), Oct 28, 2025:
Hi @sayakpaul, I tested on an A100, and when I print pipe_loaded.text_encoder.encoder.block[0].layer[1].DenseReluDense.wo.weight.dtype at L1455, it returns torch.float32, not torch.float16, and the max_diff at L1456 is np.float16(0.0004883). When we apply this PR to align exactly with the behavior in pipe, the max_diff is 0. I think it is better to adjust the test case so that the output comparison between pipe and pipe_loaded is apples to apples. WDYT?
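A minimal sketch of the dtype check described in the comment above, under stated assumptions: the pipeline uses a T5 text encoder and was already saved with save_pretrained; the tmpdir path and the generic DiffusionPipeline loader are illustrative stand-ins for the test's self.pipeline_class.

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical path: the directory previously produced by pipe.save_pretrained(tmpdir).
tmpdir = "path/to/saved/pipeline"

pipe_loaded = DiffusionPipeline.from_pretrained(tmpdir, torch_dtype=torch.float16)

# transformers keeps T5's feed-forward `wo` projection in fp32 when loading in fp16
# (see the modeling_t5.py link in the diff above), so this can print torch.float32
# even though torch_dtype=torch.float16 was requested.
print(pipe_loaded.text_encoder.encoder.block[0].layer[1].DenseReluDense.wo.weight.dtype)
```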

sayakpaul (Member):
My point is that torch_dtype in from_pretrained() should be enough for the model to end up in fp16. Calling half() on it after loading the model with the fp16 torch_dtype seems erroneous to me.

I also ran the test on an A100, and it wasn't a problem. So, I am not sure if this test fix is correct at all.

kaixuanliu (Contributor Author):
I printed pipe_loaded.text_encoder.encoder.block[0].layer[1].DenseReluDense.wo.weight.dtype after pipe_loaded = self.pipeline_class.from_pretrained(tmpdir, torch_dtype=torch.float16), and it returns torch.float32; the root cause is at L783 of modeling_t5.py (linked above), so I manually add .half() to pipe_loaded, although it looks a bit weird... On an A100 the tolerance value is OK, but from a fundamentals perspective the output of a pipeline reloaded from a saved copy should be exactly the same as the original, that is, the max_diff should be 0, right?
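A small sketch of the manual alignment the PR adds, plus one way to verify it. It assumes pipe_loaded and torch_device from the diff above, and uses text_encoder only as an example of a component whose submodules transformers kept in fp32.

```python
import torch

# Sketch of the manual cast the PR adds: nn.Module.half() converts every floating-point
# parameter and buffer in place, including submodules that transformers kept in fp32
# via _keep_in_fp32_modules.
for name, component in pipe_loaded.components.items():
    if hasattr(component, "half"):
        component = component.to(torch_device).half()

# Verify that no parameter of the reloaded text encoder is left in fp32; after .half()
# this list should be empty, so pipe and pipe_loaded run with identical weight dtypes.
leftover_fp32 = [
    n for n, p in pipe_loaded.text_encoder.named_parameters() if p.dtype != torch.float16
]
print(leftover_fp32)
```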

pipe_loaded.set_progress_bar_config(disable=None)

for name, component in pipe_loaded.components.items():
if hasattr(component, "dtype"):
self.assertTrue(
component.dtype == torch.float16,
f"`{name}.dtype` switched from `float16` to {component.dtype} after loading.",
)

inputs = self.get_dummy_inputs(torch_device)
output_loaded = pipe_loaded(**inputs)[0]
max_diff = np.abs(to_np(output) - to_np(output_loaded)).max()