Few people today think about how file compression works. Using a personal computer has become far easier than it once was, and almost everyone who works with files uses archives, yet few stop to ask by what principle compression actually happens. Huffman codes were among the first practical compression schemes, and they are still used in popular archivers. In this article we will look at how compression occurs, what details speed up and simplify the encoding process, and how the coding tree is built.
Algorithm History
One of the very first algorithms for the efficient coding of electronic information was the code proposed by David Huffman back in the mid-twentieth century, namely in 1952. To this day it remains a basic building block of most programs designed to compress information. Among the most popular formats using this code are ZIP, ARJ, RAR and many others.
The Huffman algorithm is also used to compress JPEG images and other graphic objects, and fax machines rely on a variant of the same coding invented in 1952. Although so much time has passed since the code was created, it is still used both in the newest software and in equipment old and new.
The Principle of Efficient Coding
The Huffman algorithm is based on a scheme that replaces the most frequent characters with short binary codes, while rarer characters receive longer ones. Longer codes come into play only once the shorter combinations have been used up, and no code is ever the prefix of another, so the encoded stream can be decoded unambiguously. This minimizes the average code length per character and therefore the length of the message as a whole.
An important point is that the probabilities (or frequencies) of the letters must be known before encoding begins, since the final message is compiled from them. From these values the Huffman code tree is constructed, and the tree in turn determines the code assigned to each letter in the archive.
Huffman Code Example
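To make the idea concrete, here is a small hand-worked example (the alphabet and probabilities are invented purely for illustration). Suppose a message uses four letters with probabilities A = 0.5, B = 0.25, C = 0.15 and D = 0.1. The two rarest letters, C and D, are merged first into a parent of weight 0.25; that parent is then merged with B into a node of weight 0.5, which is finally merged with A. Reading the arc labels from the root gives the codes A = 0, B = 10, C = 110, D = 111. The average code length is 0.5·1 + 0.25·2 + 0.15·3 + 0.1·3 = 1.75 bits per letter, against 2 bits for a fixed-length code over four symbols.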
Huffman codes also rely on the notion of a leaf of a tree: a node from which no arcs exit. If two nodes are connected by an arc, one of them is the parent and the other the child, depending on which node the arc leaves and which it enters. Two nodes with the same parent are called sibling nodes. If exactly two arcs leave every non-leaf node, the tree is called binary, and the Huffman tree is just such a tree. A distinctive feature of its nodes is that the weight of each parent equals the sum of the weights of all its children.
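These definitions map naturally onto a tiny data structure. Below is a minimal sketch in Python; the class and field names are our own choices for illustration, not part of any library:

    class Node:
        """One node of a Huffman tree."""
        def __init__(self, weight, left=None, right=None):
            self.weight = weight   # letter frequency, or sum of the children's weights
            self.left = left       # child reached by the arc labelled 0
            self.right = right     # child reached by the arc labelled 1

        def is_leaf(self):
            # A leaf is a node from which no arcs exit.
            return self.left is None and self.right is None

    # The parent's weight equals the sum of the weights of its children:
    parent = Node(weight=3 + 2, left=Node(3), right=Node(2))
    assert parent.weight == parent.left.weight + parent.right.weight
    assert Node(1).is_leaf() and not parent.is_leaf()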
Huffman Tree Algorithm
The tree is built from a list of free nodes. At the start, every letter of the alphabet becomes a leaf in this list, with a weight equal to that letter's frequency in the message. At each step, the two free nodes with the smallest weights are taken from the list.
A parent node is then created whose weight equals the sum of the weights of this pair of nodes. The parent is placed back in the list of free nodes, and the two children are removed from it; the arcs leading to the children are labelled with a one and a zero. This process is repeated until only one node, the root, remains. The binary digits of each letter's code are then read along the arcs from the top of the tree down to that letter's leaf.
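This loop translates almost directly into code. Here is a minimal sketch in Python (the function and variable names are our own, not a standard API); it uses the standard heapq module to keep the list of free nodes ordered by weight:

    import heapq
    from collections import Counter

    def build_codes(message):
        # Count letter frequencies; each letter starts as a free leaf node.
        freqs = Counter(message)
        # Heap entries: (weight, tie_breaker, node). A node is either a
        # letter (a leaf) or a pair (left, right) for an internal node.
        heap = [(w, i, letter) for i, (letter, w) in enumerate(freqs.items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:
            # Take the two free nodes with the smallest weights...
            w1, _, left = heapq.heappop(heap)
            w2, _, right = heapq.heappop(heap)
            # ...and replace them with a parent weighing their sum.
            heapq.heappush(heap, (w1 + w2, next_id, (left, right)))
            next_id += 1
        # Walk the finished tree from the root, writing 0/1 along the arcs.
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):       # internal node
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                             # leaf: a single letter
                codes[node] = prefix or "0"   # one-letter alphabet edge case
        _, _, root = heap[0]
        walk(root, "")
        return codes

    print(build_codes("abracadabra"))

Frequent letters end up near the root and receive short codes; rare letters sink to the bottom of the tree and receive long ones.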
Compression Efficiency
The gain from Huffman coding depends on how uneven the letter frequencies are: the more strongly a few letters dominate the message, the shorter the average code becomes compared with a fixed-length encoding. In practice, the exact probabilities of the letters are rarely known in advance, so the encoder simply counts how often each letter occurs in the file itself and uses these frequencies as the node weights.
In addition, working in this mode requires no change to the dynamic Huffman code, or rather to the algorithm itself, because the probabilities are directly proportional to the frequencies. It is worth paying special attention to a useful consequence: the weight of the root node will equal the total number of letters in the file being processed.
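A quick sanity check of that last claim, using Python's standard Counter and an arbitrary example message:

    from collections import Counter

    message = "abracadabra"
    freqs = Counter(message)
    # Leaf weights are the letter frequencies; merging nodes never creates
    # or destroys weight, so the root's weight is the sum of all leaf
    # weights, i.e. the total number of letters in the message.
    assert sum(freqs.values()) == len(message) == 11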
Conclusion
Huffman codes are a simple and long-established algorithm that is still used by many well-known programs and formats. Its simplicity and clarity make it possible to compress files of any size effectively and to noticeably reduce the space they occupy on disk. In other words, the Huffman algorithm is a thoroughly studied and well-developed scheme whose relevance has not diminished to this day.
And because it reduces the size of files, transferring them over a network or by other means becomes easier, faster and more convenient. The algorithm is lossless: the compressed information is restored exactly, with no harm to its structure or quality. In short, Huffman coding has been and remains one of the most popular and relevant methods of reducing file size.