<<< Previous chapter

6. Hashing

 

Storing a tree structure in computer memory can be done in various ways:

 

For use with the CTW algorithm, the hashing method is best suitable. The CTW hashing implementation is simple and fast:

 

                       offset = hash(symbol) xor ((phase+1) · 2) , with 0 ≤ phase ≤ 7 .                     (6.1)


Because we actually store different trees for each phase in the tree array, the phase is also used in calculating this key because this decreases the probability that two trees conflict with each other. This is explained in more detail in paragraph 7.2. The value of hash(symbol) is retrieved from the hash table, which is just a permutation table with 256 pseudo-random chosen values.

 

Using hashing appeared to be a good choice. It decreased the amount of required memory enormously. However a drawback still is that it is difficult to estimate how large the array in which the values are stored should be for a certain file. This size mostly depends on the characteristic of the input file and the maximum tree depth. The total array has to be allocated before the actual encoding/decoding process, and at that time the characteristic of the input file is not known yet.

 

 

More information can be found in [EIDMA ch. 4 par. 3]. Hashing is implemented in ctwtree.c.


Next chapter >>>