🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

Best hashing functions for bool, char, int8_t, int16_t and all other simple types?

Started by
7 comments, last by LorenzoGatti 3 years, 1 month ago

Hi everybody,
I'm looking more in depth about best hash functions for learning purpose mostly.

For bool type, I saw STL is returning 0 or 1 but Java returns 1231 or 1237. The Java approach looks more solid since 1231 and 1237 are prime numbers and also large numbers so the collision would be reduced and the combination of hash would be better. Would you recommend to use the Java approach for boolean?

For the other types, char, int8, uint8, int16, uint16, int32, uint32, int64 and uint64 what would be the best approach of hash to have the best result avoiding the maximum of collison and having proper combination result?

For string the best known is ELF still nowadays if I'm correct.

Thanks a lot!

Advertisement

First, you should define quantitatively what you are looking for in a hash function. Only then will you be able to compare alternatives.

If you just want to avoid collisions, CRC functions are pretty fast and remarkably good.

FNV32b is used in common game engines like Unreal, together with CRC and FastHash. I prefer FNV for everything. Since every type, if it is bool or something multibyte, is just an array of bytes, FNV works if you pass the data type into it as byte array of certain size

All the mentioned types are numbers. Returning the number itself (0 or 1 for boolean values) as its hash if it is fits in the desired hash length is perfectly correct; in particular, it is trivially collision-free and the result fits in the minimum possible number of bits (e.g. 1 bit for booleans, not 11 if their hashes are 1231 or 1237) without unnecessary folding.

The interesting part of hashing is combining hashes: here prime and/or large numbers could become useful, but there are still completely incompatible potential design goals.

For instance, usually flipping any bit of the input should flip half the bits of the hash because the application is a hash table and equidistribution of hash values over an unknown distribution of inputs is needed to make it perform well, but sometimes you want an hash that depends on the “latest” part of the input only (and which can be computed cheaply by updating the previous hash for new inputs or by accessing a limited portion of the input).

Then there are cryptographic properties: do you want to spend more complication to handle keys, nonces, padding and other requirements, and more computation to make searches for collisions and preimages difficult?

Omae Wa Mou Shindeiru

Thanks a lot for the answers.
This is mostly for learning purpose about what would be the best default hash functions for all default types.

If you want “best” you have to define what that means to you first. There is no universal accepted, global “best” in almost everything. In general, all existing solutions to a problem are “best” in some way (and less good in another way), while each one is different.

Did you read the wikipedia page? gives a nice glimpse of what hashing is about and what things can be considered. https://en.wikipedia.org/wiki/Hash_function​​

For a more thorough discussion, I'd start with “The art of computer programming” by D. Knuth, vol 3, according to wikipedia.

Alundra said:
what would be the best default hash functions

Like I already mentioned, common game engines use FNV32b, CRC and FastHash so they could be considered “best”, at least for game development ¯\_(ツ)_/¯

Another aspect of hash functions that varies drastically by application is the size of the hash function output:

  • 32 bits (occasionally 16 or 64) for a general purpose hashtable with a reasonable number of elements; more would be redundant, less would be insufficient. The typical application of popular inexpensive and high quality constructions, like the mentioned FNV family, that are finely tuned to their state and output size,
  • Limited but significantly larger sizes to meet the demands of hardness in cryptography or low collision probability in luck-reliant schemes (with very different constructions in the two cases).
  • Very large sizes, particularly to initialize or perturb pseudorandom number generators.

Omae Wa Mou Shindeiru

This topic is closed to new replies.

Advertisement