Saturday, 22 February 2014

Week 4

This week, we finished implementing the Huffman encoder and decoder.

Demonstration of the encoder:

This is the function used to test the encoder:


After running the code, the result is shown below:
In this case, the coding efficiency (entropy / average code length × 100%) is 100%, which is very high. In practical situations, however, the efficiency cannot usually reach 100%.
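The efficiency figure can be checked by hand. As a quick sketch (not our MATLAB test code), here is the same entropy / average-length computation in Python for a hypothetical dyadic distribution, the special case in which a Huffman code does reach 100%:

```python
from math import log2

# Hypothetical example: four symbols with dyadic probabilities, for which
# a Huffman code reaches 100% efficiency (each code length equals -log2 p).
probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
code_len = {'a': 1, 'b': 2, 'c': 3, 'd': 3}  # one valid Huffman code: 0, 10, 110, 111

entropy = -sum(p * log2(p) for p in probs.values())   # bits per symbol
avg_len = sum(probs[s] * code_len[s] for s in probs)  # bits per symbol
efficiency = entropy / avg_len * 100

print(entropy, avg_len, efficiency)  # 1.75 1.75 100.0
```

For any distribution whose probabilities are not exact powers of 1/2, the average length exceeds the entropy and the efficiency drops below 100%.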


This is the sequence to be encoded.


The dictionary was established by the function written last week.


The output code is shown in the following chart:





The encoded sequence will be transmitted through the channel. At the receiver end, the code is received and must be decoded according to the dictionary.

The following is a test of the decoder:

Add the decoder.
Since actualsig is generated randomly, running the code again produces an actualsig different from the previously encoded one. We therefore use the 'isequal' function: we only need to make sure that the sequence before encoding and the sequence after decoding are the same. 'isequal' returns 1 if decode and actualsig are identical.
After running the program, it returns 1, which means the function works.
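The round-trip check the test performs can be sketched in Python as follows. The dictionary, alphabet and sequence length here are hypothetical stand-ins, not our MATLAB values; the final comparison plays the role of 'isequal':

```python
import random

# Hypothetical prefix-free dictionary standing in for our MATLAB one.
dictionary = {1: '0', 2: '10', 3: '110', 4: '111'}

def encode(seq, d):
    # Concatenate the codeword of each symbol into one bit string.
    return ''.join(d[s] for s in seq)

def decode(bits, d):
    # Scan left to right; because the code is prefix-free, the first
    # codeword matched is always the correct one.
    inverse = {code: sym for sym, code in d.items()}
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ''
    return out

actualsig = [random.choice(list(dictionary)) for _ in range(200)]
decoded = decode(encode(actualsig, dictionary), dictionary)
print(decoded == actualsig)  # True, the analogue of isequal returning 1
```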


Saturday, 15 February 2014

Week 3

This week, we are still focusing on establishing the dictionary. Last week, the function passed a test with a very small amount of data. However, when we tried to test it with a large amount of data (a 2-second recording at a 44100 Hz sampling rate), it took more than 5 minutes to generate the dictionary, which would leave the program with no practical value. We therefore tried to improve the algorithm.

The algorithm we used last week is a direct application of the basic idea of Huffman coding using nested cells. A cell is a data type in MATLAB that can contain many other types of data. In the nested cell used in last week's function, the top level of the cell may have a length of 88200, each element of this cell contains another cell of length 88200, and each element of the second-level cell contains a vector. This means the program has to process on the order of 88200^2 vectors, which costs a huge amount of computing resources.

However, after further study of the Huffman code, we realised that the symbols themselves are not important during coding; only the indices of the symbols need to be considered. This solved the nested-cell problem.
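A minimal Python sketch of this index-based idea (an illustration, not our MATLAB implementation): the merge loop only ever touches probabilities and symbol indices, so the symbols themselves never need to be copied around.

```python
import heapq
from itertools import count

def huffman_dict(probs):
    """Build codewords while tracking only symbol indices, never symbols."""
    tiebreak = count()  # keeps heap comparisons away from the lists
    # Heap entries: (group probability, tiebreaker, [(index, code), ...])
    heap = [(p, next(tiebreak), [(i, '')]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, lo = heapq.heappop(heap)  # two least-probable groups
        p1, _, hi = heapq.heappop(heap)
        # Prepend a branch bit to every codeword in each merged group.
        merged = ([(i, '0' + c) for i, c in lo] +
                  [(i, '1' + c) for i, c in hi])
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return dict(heap[0][2])  # index -> codeword

codes = huffman_dict([0.4, 0.3, 0.2, 0.1])
print(codes)  # codeword lengths 1, 2, 3, 3 bits for indices 0..3
```

The dictionary maps indices to codewords, so looking up the symbols is deferred to a single final step instead of being repeated inside every merge.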

After editing and improving the code, 2 seconds of voice data now take only about 0.5 seconds to process. If this coding method were combined with high-performance hardware, it would have considerable practical value.

Besides establishing the dictionary, we also continued designing the encoding and decoding methods, but they are not finished yet, so there are no results to show here.

Friday, 14 February 2014

Huffman coding

The history of Huffman coding


David A. Huffman devised an algorithm that constructs an optimal code for a given set of probabilities.

Huffman coding is an entropy-encoding algorithm used for lossless data compression. The method was introduced in his 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".



Procedure of binary Huffman coding

1. Sort the probabilities of all source symbols in descending order.

2. Merge the last two symbols into a new symbol and add their probabilities.

3. Repeat the above steps until only one root is left.

4. Code assignment: traverse the tree from the root to each leaf node, assigning 0 to the top branch and 1 to the bottom branch.


(The Huffman algorithm works from leaves to the root of the code tree)
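The steps above can be traced on a small hypothetical example (the probabilities 0.4, 0.3, 0.2, 0.1 and the symbol names A-D are made up for illustration):

```python
# A small trace of the merge procedure: sort descending, merge the two
# smallest, and repeat until one root remains.
nodes = sorted([('A', 0.4), ('B', 0.3), ('C', 0.2), ('D', 0.1)],
               key=lambda t: -t[1])                 # step 1: sort descending
while len(nodes) > 1:
    *rest, (s1, p1), (s2, p2) = nodes               # step 2: take the last two
    nodes = sorted(rest + [(s1 + s2, round(p1 + p2, 10))],
                   key=lambda t: -t[1])             # merge and re-sort
    print(nodes)                                    # step 3: repeat to the root
```

The three passes merge C+D, then B+CD, then the root; assigning branch bits as in step 4 then gives codeword lengths of 1, 2, 3 and 3 bits for A, B, C and D respectively.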


Limitations of Huffman Codes:

- A fixed source probability distribution is required.

- For online computation, one needs extra bits to encode the table.


Applications of Huffman codes:

Huffman codes are often used as a “back-end” to other compression methods. For example, MP3, JPEG and MPEG have a “front-end” model (lossy compression) and quantisation followed by Huffman coding.









*D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the I.R.E., September 1952, pp. 1098–1102. Huffman's original article.

Saturday, 8 February 2014

Week 2

The basic core code was designed in the middle of the second week; when we came to the lab on Friday, we hoped to complete the rest of the code.

The primary algorithm of the Huffman code ran successfully, but some difficulties appeared in the decoding part once the encoding part had been designed during the half working day on Friday.
The difficulty is this: the encoded signal is a sequence of binary code, and according to the dictionary each symbol has its own binary codeword of a different length; these variable-length codewords are concatenated in order into one final binary sequence. The problem is how to separate each symbol's codeword from the final binary sequence and then refer to the dictionary to decode it.
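What makes this separation possible is that a Huffman dictionary is prefix-free: no codeword is a prefix of another, so when scanning the final binary sequence from left to right, the first dictionary match always ends exactly at a symbol boundary. A small Python sketch of this property (the example dictionaries are hypothetical):

```python
# A Huffman dictionary is prefix-free: no codeword is a prefix of another.
# After sorting, any violation appears between lexicographic neighbours,
# so one linear scan over adjacent pairs is enough to check the property.
def is_prefix_free(codewords):
    codewords = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(codewords, codewords[1:]))

print(is_prefix_free(['0', '10', '110', '111']))  # True: decoding is unambiguous
print(is_prefix_free(['0', '01', '11']))          # False: '0' prefixes '01'
```

With a prefix-free dictionary, the decoder can simply accumulate bits until the buffer matches a codeword, emit that symbol, and start again, with no separators needed in the transmitted stream.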

The test of establishing the dictionary:

This is part of the code of the function huffmandictionary:

This is the code used to test the function:
1. Generate symbols 6 to 11 and assume a vector representing the number of times each may appear in the sequence to be coded.
2. Establish the dictionary.

The dictionary has been established successfully:



Other values:



Saturday, 1 February 2014

Week 1
The work on the project in week one was all about background and theoretical knowledge.

Source coding is one way of dealing with data compression; its aim is to allow efficient transmission with less redundancy. The two methods in our project are Huffman coding and arithmetic coding. Both are lossless data compression, which means the data can be reconstructed exactly.


To divide the work among the group members, a general plan was made:

- Two group members work on the Huffman coding.
- Two group members work on the arithmetic coding.