SeanTUT commented Oct 28, 2025

  • Applied Zig-style naming conventions to functions and values
    • Remove redundant namespacing
      • std.hash.autoHash -> std.hash.auto
      • std.hash.autoHashStrat -> std.hash.autoStrat
      • std.hash.HashStrategy -> std.hash.Strategy
    • Correct capitalization
      • std.hash.HashStrategy.Shallow -> std.hash.Strategy.shallow
      • std.hash.HashStrategy.Deep -> std.hash.Strategy.deep
      • std.hash.HashStrategy.DeepRecursive -> std.hash.Strategy.deep_recursive
    • All of the old identifiers are still available as deprecated aliases.
  • Bug fix: Slices are now detected when nested within error unions, optionals, and arrays. As a consequence of this, std.hash.auto may result in a compile error in places where it previously did not. The previous behavior was a bug, but this change is still technically breaking.
  • Optimization: In general, auto_hash.zig avoids copying large values, preferring to hash them in place. Moreover, auto_hash.zig is generally smarter about directly calling hasher.update on values which have a unique representation. For instance, slices and arrays of values with unique representations will undergo a direct @ptrCast into a slice of bytes and hash all elements at once rather than doing this individually for every element in the span.
  • Cleaned up the implementation and tests, applying more current style and language features where appropriate
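
The bulk-hashing optimization described above can be sketched with calls that already existed before this PR (`std.hash.Wyhash`, `std.hash.autoHash`, `std.mem.sliceAsBytes`). The helper names `hashEach` and `hashBulk` are hypothetical and exist only for illustration; this is not the code in `auto_hash.zig`, and unlike the real slice path neither helper mixes in the slice length.

```zig
const std = @import("std");

/// Element-by-element: one hasher update per value.
fn hashEach(values: []const u32) u64 {
    var hasher = std.hash.Wyhash.init(0);
    for (values) |v| std.hash.autoHash(&hasher, v);
    return hasher.final();
}

/// Bulk: reinterpret the whole span as bytes and update the hasher once.
/// This is only sound because u32 has a unique representation
/// (no padding bits, one byte pattern per value).
fn hashBulk(values: []const u32) u64 {
    var hasher = std.hash.Wyhash.init(0);
    hasher.update(std.mem.sliceAsBytes(values));
    return hasher.final();
}
```

Both helpers feed the same byte sequence to the hasher; the bulk version simply does so in a single `update` call instead of one call per element.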

SeanTUT and others added 4 commits October 27, 2025 19:53
* Applied Zig-style naming conventions to functions and values
  * Remove redundant namespacing
    * `std.hash.autoHash` -> `std.hash.auto`
    * `std.hash.autoHashStrat` -> `std.hash.autoStrat`
    * `std.hash.HashStrategy` -> `std.hash.Strategy`
  * Correct capitalization
    * `std.hash.HashStrategy.Shallow` -> `std.hash.Strategy.shallow`
    * `std.hash.HashStrategy.Deep` -> `std.hash.Strategy.deep`
    * `std.hash.HashStrategy.DeepRecursive` -> `std.hash.Strategy.deep_recursive`
  * All of the old identifiers are still available as deprecated aliases.
* Bug fix: Slices are now detected when nested within error unions, optionals,
  and arrays. As a consequence of this, `std.hash.auto` may result in
  a compile error in places where it previously did not. The previous
  behavior was a bug, but this change is still technically breaking.
* Optimization: In general, `auto_hash.zig` avoids copying large values,
  preferring to hash them in place. Moreover, `auto_hash.zig` is
  generally smarter about directly calling `hasher.update` on
  values which have a unique representation. For instance, slices and
  arrays of values with unique representations will undergo a direct
  `@ptrCast` into a slice of bytes and hash all elements at once rather
  than doing this individually for every element in the span.
* Cleaned up the implementation and tests, applying more current style
  and language features where appropriate
This file made use of an auto hash map of SemanticVersions. Due to the
recent fixes in `auto_hash.zig`, this is now a compile error, as
`SemanticVersion` contains slices. This oversight previously went
undetected, and simply hashed the slices by value if they were
present. With this fix, the slices are explicitly hashed deeply.

SeanTUT commented Oct 28, 2025

As a consequence of this, std.hash.auto may result in a compile error in places where it previously did not. The previous behavior was a bug, but this change is still technically breaking.

For example, in src/Builtin.zig, the hash function performs an auto hash on an OS version range. This includes SemanticVersions, which would be silently hashed by value rather than alerting the user of the ambiguity. Additionally, src/lib/glibc.zig included use of an AutoArrayHashMap of SemanticVersions, with the same issue taking effect.
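
One way a call site can resolve the ambiguity is to supply its own hash map context that hashes the slices deeply. The following is a minimal sketch under assumptions: `VersionContext` and `optEql` are hypothetical names, not the code that landed in `src/lib/glibc.zig`, and the sketch uses the pre-existing `std.hash.autoHashStrat` with `.Deep`, which this PR keeps as a deprecated alias.

```zig
const std = @import("std");
const SemanticVersion = std.SemanticVersion;

/// Hypothetical array-hash-map context for SemanticVersion keys.
const VersionContext = struct {
    /// Hash the `pre` and `build` slices by contents, not by pointer.
    pub fn hash(_: VersionContext, key: SemanticVersion) u32 {
        var hasher = std.hash.Wyhash.init(0);
        std.hash.autoHashStrat(&hasher, key, .Deep);
        return @truncate(hasher.final());
    }

    /// Equality must agree with the hash: compare slice contents too.
    pub fn eql(_: VersionContext, a: SemanticVersion, b: SemanticVersion, _: usize) bool {
        return a.major == b.major and a.minor == b.minor and a.patch == b.patch and
            optEql(a.pre, b.pre) and optEql(a.build, b.build);
    }

    fn optEql(a: ?[]const u8, b: ?[]const u8) bool {
        if (a == null or b == null) return a == null and b == null;
        return std.mem.eql(u8, a.?, b.?);
    }
};
```

A context of this shape could be passed to an array hash map in place of the automatic one, so that two versions with equal text but different backing buffers land in the same bucket.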

Apologies for not testing these changes locally, I have been unable
to build Zig with LLVM, and thus have to rely on the CI to test building
the compiler with LLVM enabled.

SeanTUT commented Oct 28, 2025

Upon closer inspection, it looks like the CI is failing due to the bug which will be addressed in #25713. A SemanticVersion with a stale reference to a stack buffer for its pre and build fields is hashed as part of the key for the builtin_modules map.
EDIT: Now that it is merged, this should just work. All bugs/CI failures past this point are my fault!


SeanTUT commented Oct 29, 2025

I kept the logic for how values are hashed mostly the same, but there are a few details that we may want to address in follow up issues:

  • When an optional is null, it doesn't update the hasher at all. This is usually fine, but in the case of optional optionals, there would be no difference when hashing @as(??T, null) and @as(??T, @as(?T, null)).
    • Furthermore, types like void or [0]T do not update the hasher, meaning that @as(?void, {}) and @as(?void, null) will result in the same hash.
    • The same issue applies to error unions. @as(anyerror!anyerror!void, anyerror.Foo) hashes the same as @as(anyerror!anyerror!void, @as(anyerror!void, error.Foo)), although this is admittedly a more contrived example.
    • For these use cases, it might be a good idea to add key == null or isError(key) to the hash, which would distinguish all of these values.
  • Comptime struct fields are not included in hashes. This is consistent with the behavior of hashing structs by their bytes when possible, and it also doesn't make much sense to include data which never changes in the hash (before the refactor, this behavior was inconsistent; structs that had a unique representation omitted comptime fields, and structs that did not included them). This does present one problem: anonymous tuples. In some places in the compiler, I noticed that when hashing multiple values, rather than make multiple calls to std.hash.auto / std.hash.autoStrat, a single call would be made on a tuple literal containing all of the data to be hashed. When these are all runtime values, that works fine (in fact, when the tuple ends up having a unique representation, it's even able to hash everything at once with a single call to hasher.update). However, since comptime-known fields of anonymous struct literals, including tuple literals, are represented with comptime fields, this means that any comptime-known elements in an anonymous tuple literal are excluded from the hash.
    • I'm not sure what the best way to handle this is. On one hand, we could include comptime fields in hashes, but this is almost always going to be redundant, and if we want anonymous struct literals to hash the same regardless of whether their fields are comptime, then we can pretty much never optimize the hashing of a struct into a single hasher.update call (since a struct with comptime fields will be hashed field-by-field, we would have to always hash structs that way to maintain parity with a struct that does have comptime fields, even if the struct we're hashing has no comptime fields). On the other hand, we could leave the behavior as it is now, change all of the call sites to not hash tuple literals, and discourage this usage, but that would be inconvenient and leave a footgun in place.
    • My current proposal is this: do not hash comptime fields, but introduce a comptime T: type parameter to std.hash.auto / std.hash.autoStrat such that when you're hashing a type with comptime fields, you have to explicitly specify that type, with no room to accidentally omit data by making it a comptime field. Since this change would be breaking, I have not included it.
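
Two of the points above can be sketched with the pre-existing `std.hash.autoHash`. The helpers `hashPlain`, `hashTagged`, and `hashSeparately` are hypothetical names used only for illustration. `hashPlain` shows the collision between the two null values of `??u32`, assuming the behavior described above (a null optional does not update the hasher). `hashTagged` mixes in a `key == null` discriminator at each level, as suggested. `hashSeparately` avoids the tuple-literal problem by hashing a comptime-known value with its own call rather than as a tuple field.

```zig
const std = @import("std");
const Wyhash = std.hash.Wyhash;

/// Plain auto hash: a null optional contributes nothing to the hasher.
fn hashPlain(key: ??u32) u64 {
    var hasher = Wyhash.init(0);
    std.hash.autoHash(&hasher, key);
    return hasher.final();
}

/// Sketch of the suggested fix: hash a null discriminator at each level.
fn hashTagged(key: ??u32) u64 {
    var hasher = Wyhash.init(0);
    std.hash.autoHash(&hasher, key == null);
    if (key) |inner| {
        std.hash.autoHash(&hasher, inner == null);
        if (inner) |value| std.hash.autoHash(&hasher, value);
    }
    return hasher.final();
}

/// One call per value, so a comptime-known `tag` is still hashed.
fn hashSeparately(comptime tag: u32, value: u32) u64 {
    var hasher = Wyhash.init(0);
    std.hash.autoHash(&hasher, tag);
    std.hash.autoHash(&hasher, value);
    return hasher.final();
}
```

With the discriminator, the outer null feeds one byte to the hasher while the inner null feeds two, so the two values no longer produce identical input. The cost is one extra update per optional level.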
