<<< Previous chapter

2. Probability estimation

 

In every node along the path in a context tree, the probability on the next symbol is estimated based on the counts of zeros and ones in this node. This is an estimation of the probability that a memoryless source (with an unknown parameter θ) generates Xt as its next symbol. In this implementation of CTW, there are two different estimators to choose from: the Krichevski-Trofimov (KT) estimator and the zero-redundancy estimator (default).

 

2.1. Krichevski-Trofimov estimator

 

To estimate the probability on the next symbol for a memoryless source with an unknown parameter θ (i.e. the probability of generating a 1), the Krichevski-Trofimov estimator can be used. The KT-distribution is defined by the following conditional probability:

 

                                ,                      (2.1)

 

where a is the number of zeroes and b is the number of ones in the sequence , which was generated by this source before. In this CTW implementation, the estimated conditional probabilities are represented by their logarithms for easier multiplying and dividing (see chapter 3 about weighting arithmetic). The logarithmic representations of the estimated conditional probabilities are (with all logarithms base 2):

 

                        (2.2)

                         (2.3)

 

In the actual implementation, these logarithms can be found in a log-table, which is explained in chapter 3 about weighting arithmetic.

 

To analyze how well this estimator performs; the "parameter redundancy" can be calculated, which is the part of the total redundancy that results from not knowing the parameter θ of the memoryless source. This redundancy can be calculated by first deriving an expression for the estimated block probability Pe(a,b), which is the estimated probability of a certain sequence with a ones and b zeroes. For  a + b 1, this estimated block probability satisfies:

 

                                      ,                             (2.4)

 

which is proved in [EIDMA ch. 3 par. 10]. Using this lowerbound on Pe(a,b), the parameter redundancy can be bounded by:

 

                          (2.5)

 

When a source generates zeroes or ones only, the sequences generated by this source do not contain any information and therefore encoding such a sequence wouldn’t take any bits in the ideal case. However, the parameter redundancy introduced by the KT-estimator (equation 2.5) grows logarithmic with the number of zeroes (or ones) in this case. This means the parameter redundancy gets quite large if the number of generated zeroes (or ones) gets large.

 

2.2. Zero-redundancy estimator

 

Instead of the KT-estimator, the zero-redundancy (ZR) estimator could be used to reduce the parameter redundancy for a source that generates zeroes or ones only. The ZR-estimator is defined as:

 

                                                       (2.6)

 

where Pe(a,b) are the block probabilities estimated by the KT-estimator. Now, the parameter redundancy for a sequence generated by a source that generates zeroes or ones only, can be bounded by:

 

                                                                          (2.7)

 

This means that the parameter redundancy for such a source is never larger than two bits (for any number of zeroes or ones), which is far better than for the KT-estimator. The parameter redundancy for a source which generates zeroes as well as ones, can now be bounded by:

 

                                          (2.8)

 

So for a sequence containing zeroes as well as ones, the parameter redundancy is only one bit larger than for the KT-estimator (equation 2.5). Since a lot of byte-values never occur in normal text, many of the estimators in CTW only estimate probabilities on sequences which contain zeroes or ones only. Therefore the zero-redundancy estimator usually gives a better result than the KT-estimator, because of these non-occurring symbols.

 

This CTW implementation uses conditional probabilities. The conditional probability on the next symbol for a > 0 and b > 0 (i.e. zeroes as well as ones have been generated until now), is the same as for the Krichevski-Trofimov estimator. If only zeroes have been generated so far, the conditional probability that the next symbol will be "one" can be written as:

 

                        (2.9)

 

Similarly, the conditional probability that the next symbol will be "zero" is:

 

                                              (2.10)

 

If only ones have been generated so far, the same equations hold (of course with the 0’s and 1’s and the a’s and b’s interchanged). These conditional probabilities will also be represented by their logarithms in this implementation. To be able to calculate these conditional probabilities, the estimated block probabilities from the Krichevski-Trofimov estimator Pe(a,0) are required. These can be calculated sequentially (for a = 1, 2, ...), using:

 

                                                                                      (2.11)

 

The same goes for Pe(0,b) for b = 1, 2, .... In the actual implementation the conditional probabilities (from equations 2.9 and 2.10) are stored in two tables for all possible values of a and b, which is further explained in chapter 3 about weighting arithmetic.

 

 

The probability estimators are implemented in ctwmath.c. The log-tables that are used for these estimations can be found in ctwmath-tables.h and these are generated using ctw_gentables.c. More information on the Krichevski-Trofimov estimator can be found in [EIDMA ch. 3 par. 5] and [EIDMA ch. 3 par. 10]. More information on the zero-redundancy estimator can be found in [EIDMA ch. 5 par. 5].


Next chapter >>>