Natural Thoughts: Optimized Huffman Coding

Introduction :
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable length code table for encoding a source symbol (such as a character in a file) where the variable -length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman. Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a pre Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix-free code (that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common characters using shorter strings of bits than are used for less common source symbols [1].

There are various ways to decode the Huffman code either by arithmetic operation or look up table. Each method has various advantages over to one like arithmetic operation takes less memory space, less memory reference but arithmetic processor must be capable to handle that mathematical operation efficiently while the look up table method is fast but require more space in memory and need many memory references.Here optimized ways of decoding the Huffman code by look up table method are explained.

The basic properties of Huffman code is prefix free code and the more frequent symbols has less codelength as compared the other symbols which is used less frequently. In this manner Huffman code optimized the memory requirement but this advantage become problem while decoding by look up table method. As the symbols having codelength less than the bits read to be decoded the symbols have multiple entries in the table that we have to take care while decoding. The example is taken for Huffman decoding is shown in table1 [2].

Table 1 Symbols and codewords
Char(Symbols) Codes(in Binary)
A ----> 010
B ----> 0000
C ----> 0001
D ----> 011
E ----> 10
F ----> 0010
G ----> 0011
H ----> 11

While decoding by the table look up method there will be 16 entries of symbols. The table itself will tell you what symbol you decoded and how many bits you used. The table will be as follows:-

index 0000 would contain B,4
index 0001 C,4
...
index 1000 to 1011 would all contain E,2
index 1100 to 1111 would all contain H,2

Table 2 Table representation of codewords
---------------------------------------------------------
| 0000(B,4) | 0001(C,4) | 0010(F,4) | 0011(G,4)|
| 0100(A,3) | 0101(A,3) | 0110(D,3) | 0111(D,3)|
| 1000(E,2) | 1001(E,2) | 1010(E,2) | 1011(E,2) |
| 1100(H,2) | 1101(H,2) | 1110(H,2) | 1111(H,2) |
----------------------------------------------------------

And the input bitstream to be decoded will be 010 0000 0001 011 10 0010 0011 11 0000 10,so the output symbols should be ‘ABCDEFGHBE’. There are various other practical ways of huffman coding implementation just refer the links given below. But I wish to discuss only the efficient way of implementation.The various ways of Huffman decoding are as follows:-

1)Single level Table Decoding
When we carefully observe the code we find that we can decode the codes by single table.

Table 4.5 Single level table
Table Index Symbol Codelength
0 ----> B ----> 4
1 ----> C ----> 4
2 ----> F ----> 4
3 ----> G ----> 4
4 ----> A ----> 3
5 ----> A ----> 3
6 ----> D ----> 3
7 ----> D ----> 3
8 ----> E ----> 2
9 ----> E ----> 2
10 ----> E ----> 2
11 ----> E ----> 2
12 ----> H ----> 2
13 ----> H ----> 2
14 ----> H ----> 2
15 ----> H ----> 2

Here we have made a table having two values the symbol and codelength of that symbol with each table index. Now the algorithm is as follows:
1.Each time we will read the fixed number of bits from the input bitstream; that is equal to maximum codelength of all the codewords and the bit pointer is at location 0.
2.Now the bits read will be used as the index for the table, which will directly give the output symbol for that code and corresponding codelength.
3.Now the bit pointer will be updated as bit pointer + = codelength.
For the given input bitstream the steps will be
1.As per table 1 the maximum codelength will be 4, so each time we will read 4 bits. Say for start it will be 0100. The bit pointer is at location 0.
2.So by using it as table index the output symbol will be ‘A’ and the corresponding codelength will be 3.
3.Now the new bit pointer location will be bit pointer +=3.
4.Now the next 4 bits will be 0000 and process will continue so on till the end of bitstream.

The basic drawback of this method is the table size as large memory space is required. It is suitable only for small codelength codewords not for the big codelength like CAVLC codes of H.264/AVC.

2)Efficient Automated Merged Table:
This is the efficient/optimized way of implementation.Actually before this , I tried various other algorithms but finally come to this implementation, and implemented in my project work on that huffman decoding was taking around 30% of overall computational complexity.

Successive number of bits to read can be automated i.e. bits to be read is decided by the algorithm and the table is arranged accordingly. For decoding we need the value of how many maximum numbers of bits to be read as Max Bit Read. The table contents the symbols and the corresponding codelengths but with some modifications. Now the algorithm will be as follows:

1.Firstly read the n number of bits is equal to Max Bit Read so Bit Read = Max Bit Read.
2.Use the value as table index to get symbol and the corresponding codelength. If the codelength is not equal to -1(Flag) means the symbol is valid, and the bit pointer is modified accordingly.
3.If the codelength is equal to -1 then more bits are required to be read. The corresponding symbol will be considered as table offset value for table index. And next number of bits to be read is calculated as Bit Read = (Bit Read+1) >>1, now the Bit Read number of bits will be read. And the value is used as table index as table index = table offset + value, to decode the codelength for corresponding table index.
4.If the codelength is not equal to -1 then we will precede as step 2 otherwise step 3 until we get proper codelength of particular symbol.

The example for this method for our input bitstream will be explained here. For that we can use table 3 with the Max Bit Read equal to 2.

Table 3 Automated Merged table for Max Bit Read =2
Table Index Symbol Codelength
0 ----> 4 ----> -1
1 ----> 6 ----> -1
2 ----> E ----> 2
3 ----> H ----> 2
4 ----> 8 ----> -1
5 ---->10 ----> -1
6 ----> A ----> 3
7 ----> D ----> 3
8 ----> B ----> 4
9 ----> C ----> 4
10 ----> F ----> 4
11 ----> G ----> 4

1.In the very first read the Max Bit Rate = Bit Read number of bits. So here Max Bit Read is 2, we will read first two bits that are 1 (01).
2.Use the value as table index; we get the symbol as 6 and codelength -1. As the codelength is -1 hence we have to read some more bits to decode and 6 will be table offset.
3.Calculate the next Bit Read as Bit Read = (Bit Read+1)>>1 = (2+1)>>1 =1.Now next 1 bit will be read and the value is 0 (0).
4.Now again calculate the table index as table index = table offset + value =6+0 =6.
5.Again table is read with new table index (6) and corresponding codelength is read that is 3 i.e. the codelength and corresponding symbol ‘A’ are valid now no need to read more bits. Modify the bit pointer accordingly.
6.Now read next 2 (Max Bit Read) bits to decode the next codeword that are 0 (00).
7.When we use this as table index; we get the symbol as 4 and the codelength equal to -1 hence we need to read more bits to decode the next codeword.
8.Calculate the next Bit Read = (2+1)>>1 = 1. Hence the next 1 bit value is 0 (0).
9.Now again calculate the table index as table index = 4+0 = 4.
10.Table is read with table index the corresponding codelength is again -1 and the symbol is 8. So we need more bits to read and now table offset is equal to 8.
11.Calculate the next Bit Read = (1+1)>>1 =1. Hence the next 1 bit value is 0(0).
12.Again calculate the table index as table index = 8+0 =8.
13.Use table index, we get codelength is 4 and the symbol is ‘B’ which is now valid symbol. Modify the bit pointer accordingly.
14.Now follow from the step 6 and so on until the end of bitstream.

This method is most efficient method is and independent of Max Bit Read i.e. this method is applicable for our example of Max Bit Read = 1, 2, 3, 4.

Note:In the all table look up methods we have to take care while decoding the last bits whenever Max Bit Read is greater than the last remaining bits to read. So insert 0’s towards LSB and then read the bitstream.

Conclusion:
The algorithm discussed here is the generalized efficient/optimized huffman decoding as it takes less memory area as compared to single level table decoding method (just apply the algo for max codelength of 16/24/32 bits, you will get huge memory saving) with keeping the accessing the code property of single level table. In below I am putting the quantitative analysis for the memory and complexity.

Table 4 Performance Analysis
1)Single level table 2^N (Memory consumption) , 2^N search for Worst case.

2)Automated merged table 2k + A.2k/2+ B. 2k/4 +… (Memory consumption), 2^k for Good case and 2^(k+k/2+k/4…) for Worst case(search complexity).

Where N is codelength in bits and k + m + n +… = N. And A, B,…….= Unidentified codes in the previous search.

Useful Links:
1:Huffman coding http://en.wikipedia.org/wiki/Huffman_code
2:Practical Huffman coding http://www.compressconsult.com/huffman/

Enjoy this techie discussion.....
Wait for new post in this catogory.
Till then bbye n think naturally.

3 comments:

jpapNovember 5, 2011 at 12:49 PM
How did you construct Table 3? Do you have any references that further describe the automated merged table approach?

jpap
AnonymousJanuary 30, 2013 at 11:04 PM
Hey There. I founԁ your blog using msn. This iѕ аn
extremеlу well wгitten агtiсle.
I'll make sure to bookmark it and return to read more of your useful information. Thanks for the post. I'll dеfіnіtelу return.

Feel free to visіt my blog post - www.bucket--truck.com
Also visit my blog post bucket trucks
AnonymousMarch 7, 2013 at 9:45 PM
An outstanding share! I've just forwarded this onto a coworker who has been doing a little homework on this. And he actually bought me breakfast due to the fact that I found it for him... lol. So let me reword this.... Thank YOU for the meal!! But yeah, thanx for spending some time to discuss this topic here on your blog.

Also visit my blog: online graduate certificate

Thursday, December 27, 2007

Optimized Huffman Coding

3 comments: