LibUCW contains implementations of several hash algorithms.

Cryptographic hashes

MD5


typedef struct {
        u32 buf[4];
        u32 bits[2];
        byte in[64];
} md5_context;

Internal MD5 hash state. Treat it as an opaque handle.


void md5_init(md5_context *context);

Initialize the MD5 hashing algorithm in context.


void md5_update(md5_context *context, const byte *buf, uint len);

Push another len bytes of data from buf to the MD5 hash represented by context. You can call it multiple times on the same context without reinitializing it; the result is the same as if you had concatenated all the data and fed it in at once.


byte *md5_final(md5_context *context);

Call this after the last md5_update(). It will terminate the algorithm and return a pointer to the result.

Note that the data it points to is stored inside the context, so the pointer becomes invalid if you reuse the context to compute another hash or if the context ceases to exist.

To convert the hash to its usual hexadecimal representation, see mem_to_hex().


void md5_transform(u32 buf[4], const u32 in[16]);

This is the core routine of the MD5 algorithm. It takes 16 longwords of data from the in array and transforms the hash in buf according to them.

You probably do not want to call this one directly.


void md5_hash_buffer(byte *outbuf, const byte *buffer, uint length);

MD5 one-shot convenience function. It takes length bytes from buffer, computes their hash and stores it in outbuf.

It is equivalent to this code:

md5_context c;
md5_init(&c);
md5_update(&c, buffer, length);
memcpy(outbuf, md5_final(&c), MD5_SIZE);

#define MD5_HEX_SIZE 33

How many bytes a string buffer for MD5 in hexadecimal format should have.


#define MD5_SIZE 16

Number of bytes the MD5 hash takes in the binary form.
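
The two sizes are related: each binary byte becomes two hexadecimal digits, and the string needs one more byte for the terminating zero, so 2 * 16 + 1 = 33. The following is a minimal sketch of such a conversion. The helper sketch_to_hex() and the SKETCH_ constants are hypothetical stand-ins written for this illustration; they are not mem_to_hex() and do not claim to match its signature or output options.

```c
#include <stddef.h>

#define SKETCH_MD5_SIZE 16                              /* mirrors MD5_SIZE */
#define SKETCH_MD5_HEX_SIZE (2 * SKETCH_MD5_SIZE + 1)   /* mirrors MD5_HEX_SIZE (33) */

/*
 * Hypothetical stand-in for a hex encoder: writes 2*len lowercase
 * hexadecimal digits followed by a terminating zero byte.
 * dest must have room for 2*len + 1 bytes.
 */
static void sketch_to_hex(char *dest, const unsigned char *src, size_t len)
{
  static const char digits[] = "0123456789abcdef";
  for (size_t i = 0; i < len; i++) {
    dest[2*i]     = digits[src[i] >> 4];     /* high nibble */
    dest[2*i + 1] = digits[src[i] & 0x0f];   /* low nibble */
  }
  dest[2*len] = '\0';
}
```

A caller would declare char hex[SKETCH_MD5_HEX_SIZE] and pass the 16 bytes of the binary hash as src.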

SHA1


typedef struct {
  u32 h0,h1,h2,h3,h4;
  u32 nblocks;
  byte buf[64];
  int count;
} sha1_context;

Internal SHA1 state. Treat it as an opaque handle.


void sha1_init(sha1_context *hd);

Initialize a new algorithm run in the hd context.


void sha1_update(sha1_context *hd, const byte *inbuf, uint inlen);

Push another inlen bytes of data pointed to by inbuf onto the SHA1 hash currently in hd. You can call this as many times as you want on the same hash, without reinitializing it by sha1_init(). It has the same effect as concatenating all the data and passing it in at once.


byte *sha1_final(sha1_context *hd);

Call this when no more sha1_update() calls will follow. It terminates the hash and returns a pointer to the result.

Note that the pointer points into data in the hd context. If the context ceases to exist, the pointer becomes invalid.

To convert the hash to its usual hexadecimal representation, see mem_to_hex().


void sha1_hash_buffer(byte *outbuf, const byte *buffer, uint length);

A convenience one-shot function for SHA1 hash. It is equivalent to this snippet of code:

sha1_context hd;
sha1_init(&hd);
sha1_update(&hd, buffer, length);
memcpy(outbuf, sha1_final(&hd), SHA1_SIZE);

void sha1_hmac(byte *outbuf, const byte *key, uint keylen, const byte *data, uint datalen);

SHA1 HMAC message authentication. Computes the HMAC of datalen bytes at data under the key of keylen bytes at key and stores the result in outbuf.


typedef struct {
  sha1_context ictx;
  sha1_context octx;
} sha1_hmac_context;

The HMAC also exists in a streaming version analogous to plain SHA1. Pass this structure as its context.


#define SHA1_HEX_SIZE 41

How many bytes a string buffer for SHA1 in hexadecimal format should have.

#define SHA1_BLOCK_SIZE 64

SHA1 splits input into blocks of this size.
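
The block size also matters for HMAC. In the standard HMAC construction (RFC 2104), the key is padded with zeros to one full block and XORed with two fixed bytes, giving an inner and an outer key block; the inner hash starts with the first block and the outer hash with the second. The sketch below shows only that key preparation step. The name sketch_hmac_pads() and the SKETCH_BLOCK_SIZE constant are hypothetical, and the assumption that the two resulting blocks correspond to the ictx and octx members of sha1_hmac_context is just that, an assumption.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 64   /* mirrors SHA1_BLOCK_SIZE */

/*
 * Derive the inner and outer key blocks of HMAC (RFC 2104).
 * This sketch handles only keys of at most one block; the standard
 * replaces longer keys by their hash first, which is left out here.
 * Returns 0 on success, -1 if the key is too long for this sketch.
 */
static int sketch_hmac_pads(const unsigned char *key, size_t keylen,
                            unsigned char ipad[SKETCH_BLOCK_SIZE],
                            unsigned char opad[SKETCH_BLOCK_SIZE])
{
  if (keylen > SKETCH_BLOCK_SIZE)
    return -1;
  memset(ipad, 0, SKETCH_BLOCK_SIZE);      /* zero padding up to one block */
  memcpy(ipad, key, keylen);
  memcpy(opad, ipad, SKETCH_BLOCK_SIZE);
  for (size_t i = 0; i < SKETCH_BLOCK_SIZE; i++) {
    ipad[i] ^= 0x36;                       /* inner pad byte */
    opad[i] ^= 0x5c;                       /* outer pad byte */
  }
  return 0;
}
```

The final value is then hash(opad, hash(ipad, message)), where the comma stands for concatenation.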

Common usage

There are two ways you can use the hashing routines.

  • Single-shot interface. If you have an in-memory buffer of the whole message you want to hash, you can use this.

    const char *message = "Hello world";
    byte output[MD5_SIZE];
    md5_hash_buffer(output, (const byte *) message, strlen(message));

  • Multi-shot interface. If you have the message scattered in many buffers or you get it by parts, you do not need to concatenate the parts together.

    byte buffer[MAX_BUFFER];
    uint buffer_len;
    md5_context c;
    md5_init(&c);
    /* get_chunk() stands for whatever supplies your data; it returns 0 when done */
    while ((buffer_len = get_chunk(buffer, MAX_BUFFER)) != 0) {
      md5_update(&c, buffer, buffer_len);
    }
    byte output[MD5_SIZE];
    memcpy(output, md5_final(&c), MD5_SIZE);

SHA1 has the same interface, so the same two ways apply.

See also mem_to_hex().

Checksums

Their purpose is to detect random data changes, hardware failures and the like. They offer no protection against deliberate attacks.

Adler-32

The Adler-32 checksum is documented in the compression chapter.

CRC-32

32-bit Cyclic Redundancy Check with the polynomial suggested by Castagnoli et al.: "Optimization of Cyclic Redundancy-Check Codes with 24 and 32 Parity Bits", IEEE Trans. on Communications, Vol. 41, No. 6, 1993.

The interface is similar to the one we use for the cryptographic hashes.


typedef struct crc32_context {
  u32 state;
  void (*update_func)(struct crc32_context *ctx, const byte *buf, uint len);
} crc32_context;

Internal CRC calculator context. Treat it as an opaque handle.


void crc32_init(crc32_context *ctx, uint crc_mode);

Initialize a new CRC calculation in the given context. crc_mode selects which algorithm should be used.


enum crc_mode {
  CRC_MODE_DEFAULT,             /* Default algorithm (4K table) */
  CRC_MODE_SMALL,               /* Optimize for small data (1K table) */
  CRC_MODE_BIG,                 /* Optimize for large data (8K table) */
  CRC_MODE_MAX,
};

Algorithm used for CRC calculation. The algorithms differ by the amount of precomputed tables they use. Bigger tables imply faster calculation at the cost of an increased cache footprint.
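
To make the table trade-off concrete, here is a minimal sketch of a table-driven CRC with the Castagnoli polynomial. It uses 256 entries of 32 bits, which is 1 KB; that this is how the 1K mode is organized is an assumption, and so are the initial value and final XOR of 0xFFFFFFFF (the usual CRC-32C parameters). All sketch_ names are hypothetical and this is not LibUCW's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CRC32C_POLY 0x82F63B78u   /* Castagnoli polynomial, bit-reflected form */

static uint32_t sketch_table[256];       /* 256 * 4 bytes = 1 KB */

/* Precompute the CRC contribution of every possible byte value. */
static void sketch_build_table(void)
{
  for (uint32_t i = 0; i < 256; i++) {
    uint32_t c = i;
    for (int k = 0; k < 8; k++)
      c = (c & 1) ? (c >> 1) ^ SKETCH_CRC32C_POLY : c >> 1;
    sketch_table[i] = c;
  }
}

/* One table lookup per input byte; call sketch_build_table() first. */
static uint32_t sketch_crc32c_table(const unsigned char *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;            /* assumed initial value */
  for (size_t i = 0; i < len; i++)
    crc = sketch_table[(crc ^ buf[i]) & 0xFF] ^ (crc >> 8);
  return crc ^ 0xFFFFFFFFu;              /* assumed final XOR */
}
```

Larger tables let an implementation consume several bytes per step at the price of more cache pressure, which is the trade-off described above.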


static inline void crc32_update(crc32_context *ctx, const byte *buf, uint len);

Feed len bytes starting at buf to the CRC calculator.


static inline u32 crc32_final(crc32_context *ctx);

Finish calculation and return the CRC value.


u32 crc32_hash_buffer(const byte *buf, uint len);

A convenience one-shot function for CRC. It is equivalent to this snippet of code:

crc32_context ctx;
crc32_init(&ctx, CRC_MODE_DEFAULT);
crc32_update(&ctx, buf, len);
return crc32_final(&ctx);
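
The init/update/final pattern can be illustrated with a self-contained, bit-at-a-time sketch that needs no tables. It shows that feeding the data in several pieces gives the same value as the one-shot call. The sketch_crc_ names and the context type are hypothetical, the 0xFFFFFFFF initial value and final XOR are the usual CRC-32C parameters assumed here, and the code is much slower than any table-driven variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context for this sketch; not crc32_context. */
typedef struct { uint32_t state; } sketch_crc_ctx;

static void sketch_crc_init(sketch_crc_ctx *ctx)
{
  ctx->state = 0xFFFFFFFFu;
}

/* Process len bytes one bit at a time, keeping the running state in ctx. */
static void sketch_crc_update(sketch_crc_ctx *ctx, const unsigned char *buf, size_t len)
{
  uint32_t crc = ctx->state;
  for (size_t i = 0; i < len; i++) {
    crc ^= buf[i];
    for (int k = 0; k < 8; k++)
      crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
  }
  ctx->state = crc;
}

static uint32_t sketch_crc_final(const sketch_crc_ctx *ctx)
{
  return ctx->state ^ 0xFFFFFFFFu;
}

/* One-shot wrapper built from the three calls above. */
static uint32_t sketch_crc_hash_buffer(const unsigned char *buf, size_t len)
{
  sketch_crc_ctx ctx;
  sketch_crc_init(&ctx);
  sketch_crc_update(&ctx, buf, len);
  return sketch_crc_final(&ctx);
}
```

Calling sketch_crc_update() on the first four bytes of a message and then on the rest leaves the context in the same state as a single call on the whole message.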

Non-cryptographic hashes

They are usually used to identify values in hash tables.

The results of all these functions are meant to be reduced modulo the size of the hash table. The size should be a prime number, which gives a better distribution.
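
As a minimal sketch of that reduction step, the code below maps a string to a bucket index in a table with a prime number of buckets. The string hash used here is a simple hypothetical stand-in written for this example, not the algorithm behind hash_string(), and the table size of 1021 is an arbitrary prime chosen for illustration.

```c
#define SKETCH_TABLE_SIZE 1021u   /* a prime number of buckets */

/* Hypothetical stand-in for a string hash; not LibUCW's algorithm. */
static unsigned int sketch_hash_string(const char *str)
{
  unsigned int h = 0;
  while (*str)
    h = h * 31u + (unsigned char) *str++;
  return h;
}

/* Reduce the full-width hash to a bucket index in [0, SKETCH_TABLE_SIZE). */
static unsigned int sketch_bucket(const char *key)
{
  return sketch_hash_string(key) % SKETCH_TABLE_SIZE;
}
```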

String hashes


uint str_len_aligned(const char *str) PURE;

Get the string length (not really a useful hash function, but there is no better place for it). The string must be aligned to sizeof(uint). For unaligned strings, see str_len().
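
One common reason an alignment requirement can speed up such functions is that the string can be examined a whole word at a time instead of byte by byte. The sketch below illustrates that idea for 32-bit words. Whether LibUCW does exactly this is an assumption; the sketch_ names are hypothetical, and the sketch additionally requires the buffer to be padded to a multiple of four bytes so that every word it reads lies inside the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if and only if at least one of the four bytes of w is zero. */
static int sketch_has_zero_byte(uint32_t w)
{
  return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/*
 * Length of a zero-terminated string stored in a buffer whose size
 * is a multiple of 4 bytes. Skips four bytes per step until the word
 * containing the terminator is found, then finishes byte by byte.
 */
static size_t sketch_len_aligned(const char *str)
{
  const char *p = str;
  for (;;) {
    uint32_t w;
    memcpy(&w, p, sizeof(w));        /* portable way to load one word */
    if (sketch_has_zero_byte(w))
      break;
    p += sizeof(w);
  }
  while (*p)
    p++;
  return (size_t)(p - str);
}
```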


uint hash_string_aligned(const char *str) PURE;

Hash the string. The string must be aligned to sizeof(uint). For unaligned strings, see hash_string().


uint hash_block_aligned(const byte *buf, uint len) PURE;

Hash arbitrary data. The data must be aligned to sizeof(uint). For unaligned data, see hash_block().


uint str_len(const char *str) PURE;

Get the string length. If you know the string is aligned to sizeof(uint), you can use the faster str_len_aligned().


uint hash_string(const char *str) PURE;

Hash the string. If it is aligned to sizeof(uint), you can use the faster hash_string_aligned().


uint hash_block(const byte *buf, uint len) PURE;

Hash arbitrary data. If the data is aligned to sizeof(uint), use the faster hash_block_aligned().


uint hash_string_nocase(const char *str) PURE;

Hash the string in a case-insensitive way. Works only with ASCII characters.
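
The ASCII restriction can be illustrated with a small sketch that folds only the letters A to Z before hashing, so bytes outside that range (for example the bytes of UTF-8 encoded accented letters) are hashed unchanged. Both functions are hypothetical stand-ins for this illustration and do not reproduce the algorithm of hash_string_nocase().

```c
/* Fold ASCII 'A'..'Z' to lowercase; every other byte passes through unchanged. */
static unsigned char sketch_ascii_lower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical case-insensitive string hash built on a simple stand-in hash. */
static unsigned int sketch_hash_nocase(const char *str)
{
  unsigned int h = 0;
  while (*str)
    h = h * 31u + sketch_ascii_lower((unsigned char) *str++);
  return h;
}
```

With this sketch, "Hello" and "hELLO" hash to the same value, while the UTF-8 encodings of an uppercase and a lowercase accented letter do not.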

Integer hashes

We hash integers by multiplying by a reasonably large prime with few ones in its binary form (to give the compiler the possibility of using shifts and adds on architectures where multiplication instructions are slow).
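
The remark about shifts and adds can be made concrete with a small sketch. The multiplier used here is the 32-bit FNV prime 16777619 (0x01000193), which has only six bits set; it was picked for this illustration and is not claimed to be the constant LibUCW uses. The sketch_ names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_MULT 16777619u   /* 0x01000193: bits 24, 8, 7, 4, 1 and 0 set */

/* Multiplicative hash: the product wraps around modulo 2^32. */
static uint32_t sketch_hash_u32(uint32_t x)
{
  return x * SKETCH_MULT;
}

/* The same value computed with one shift per set bit of the multiplier. */
static uint32_t sketch_hash_u32_shifts(uint32_t x)
{
  return (x << 24) + (x << 8) + (x << 7) + (x << 4) + (x << 1) + x;
}
```

Both functions return identical results for every input; a compiler is free to choose whichever form is cheaper on the target architecture.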


static inline uint CONST hash_u32(uint x);

Hash a 32-bit unsigned integer.


static inline uint CONST hash_u64(u64 x);

Hash a 64-bit unsigned integer.


static inline uint CONST hash_pointer(void *x);

Hash a pointer.