AlotOfReading 8 days ago

Unrelated to the LLM discussion, but a hash function function is the wrong construction for the accumulator solution. The hashing part increases the probability that A and B have a collision that leads to a false negative here. Instead, you want a random invertible mapping, which guarantees that no two pointers will "hash" to the same value, while distributing the bits. Splitmix64 is a nice one, and I believe the murmurhash3 finalizer is invertible, as well as some of the xorshift RNGs if you avoid the degenerate zero cycle.

1
antirez 8 days ago

Any Feistel Network has the property you stated actually, and this was one of the approaches I was thinking using as I can have the seed as part of the non linear transformation of the Feistel Network. However I'm not sure that this actually decreases the probability of A xor B xor C xor D being accidentally zero, bacause the problem with pointers is that they may change only for a small part. When you using hashing because of avalanche effect this is going a lot harder since you are no longer xoring the pointer structure.

What I mean is that you are right assuming we use a transformation that still while revertible has avalanche effect. Btw in practical terms I doubt there are practical differences.

AlotOfReading 8 days ago

You can guarantee that the probability is the theoretical minimum with a bijection. I think that would be 2^-N since it's just the case where everything's on a maximum length cycle, but I haven't thought about it hard enough to be completely certain.

A good hash function intentionally won't hit that level, but it should be close enough not to matter with 64 bit pointers. 32 bits is small enough that I'd have concerns at scale.