@Synray Synray commented Aug 15, 2023

Instead of averaging every parameter's gradient at the end, average the output gradient once at the start of the backward pass, which reduces the number of divisions. The two are equivalent because the backward pass is linear in the output gradient, so the `1/n` factor propagates backwards to every parameter gradient.
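A minimal sketch of the equivalence, not the code changed in this pull request: it assumes a hypothetical scalar network `y = w2 * tanh(w1 * x + b1) + b2` with a squared-error loss, and every name in it (`backward`, `xs`, `ts`, the parameter values) is made up for illustration. It compares scaling the accumulated parameter gradients by `1/n` at the end against scaling the output gradient by `1/n` before backpropagating.

```python
import math

def backward(xs, hs, w2, dys):
    """Accumulate parameter gradients for y = w2 * tanh(w1 * x + b1) + b2 over a batch.

    dys holds the gradient of the loss with respect to each output y.
    """
    g = {"w1": 0.0, "b1": 0.0, "w2": 0.0, "b2": 0.0}
    for x, h, dy in zip(xs, hs, dys):
        g["w2"] += h * dy
        g["b2"] += dy
        dz = dy * w2 * (1.0 - h * h)  # chain rule through tanh
        g["w1"] += x * dz
        g["b1"] += dz
    return g

# Hypothetical batch and parameters, chosen only for the demonstration.
xs = [0.5, -1.0, 2.0, 0.25]
ts = [1.0, 0.0, -1.0, 0.5]
w1, b1, w2, b2 = 0.7, -0.1, 1.3, 0.2
n = len(xs)

# Forward pass.
hs = [math.tanh(w1 * x + b1) for x in xs]
ys = [w2 * h + b2 for h in hs]

# Output gradient of the per-sample loss 0.5 * (y - t)^2.
dys = [y - t for y, t in zip(ys, ts)]

# Variant 1: backpropagate, then divide every parameter gradient by n.
avg_at_end = {k: v / n for k, v in backward(xs, hs, w2, dys).items()}

# Variant 2: divide the output gradient by n, then backpropagate.
avg_at_start = backward(xs, hs, w2, [dy / n for dy in dys])

# The two agree up to floating-point rounding.
equivalent = all(
    math.isclose(avg_at_end[k], avg_at_start[k], rel_tol=1e-9, abs_tol=1e-12)
    for k in avg_at_end
)
print(equivalent)
```

The sketch only checks that the two orderings agree. It does not measure the saving in divisions, which depends on how many parameters the real network has relative to the size of its output gradient.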
