/\ As builder said five months ago (earlier in this topic), the only way to know for sure whether there will be any size improvement is to test it out.
As for ralphdspam's question:
I'm not entirely sure what it is you're asking (and I'm sure that somebody else will likely have a question about other things at some point), so I'll just go through a quick explanation of Huffman.
Simple Huffman compression is very easy to read and write. First, go through the data you wish to compress and count the number of times each byte value appears, then put the values in order from most to least frequent (this can be done with values of two or more bytes as well, I suppose, but you aren't likely to gain any space from it). Once you have these values, store them in that order in a lookup table, which will be accessed by your compressor/decompressor later on.
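Here's a rough sketch of that counting step in Python. The name `build_lookup_table` and the tie-breaking rule (lower byte value first) are just my own choices for illustration, not anything fixed by the method:

```python
from collections import Counter

def build_lookup_table(data: bytes) -> list:
    """Return the distinct byte values in `data`, most frequent first."""
    counts = Counter(data)
    # Sort by descending count; ties are broken by byte value (an arbitrary
    # choice) so that the table comes out the same on every run.
    return sorted(counts, key=lambda value: (-counts[value], value))

table = build_lookup_table(b"aaaabbbc")
print(table)  # [97, 98, 99]
```

The decompressor needs the same table, so it has to be stored alongside the compressed data (or agreed on in advance).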
Reading and writing are then very simple, if you choose to use the quick and easy method:
Each of your byte values will be stored in your compressed data not as fixed-length chunks, but as variable-length, null-terminated strings of bits. This means that, to your decompressor,
10 will read as byte value number one in your look-up table,
110 will read as the second,
1110 as the third, and so on.
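A minimal sketch of the reading and writing, assuming the code for table entry number n is n one-bits followed by a terminating zero (so the first entry costs 2 bits, the second 3, and so on). Strictly speaking that is a unary code ordered by frequency rather than a tree-built Huffman code. The bits are kept in a text string of '0' and '1' characters to make it easy to follow; a real compressor would pack them into bytes. The function names are made up for the example:

```python
def encode(data: bytes, table: list) -> str:
    """Write each byte as (its 1-based position in the table) ones, then a zero."""
    rank = {value: i + 1 for i, value in enumerate(table)}
    return "".join("1" * rank[b] + "0" for b in data)

def decode(bits: str, table: list) -> bytes:
    """Count ones until a zero shows up, then look the count up in the table."""
    out = bytearray()
    ones = 0
    for bit in bits:
        if bit == "1":
            ones += 1
        elif ones > 0:
            out.append(table[ones - 1])
            ones = 0
        # A zero with no ones before it is not a valid code here, so this
        # sketch treats it as padding and skips it (an assumption of mine).
    return bytes(out)

table = [97, 98, 99]           # 'a', 'b', 'c', most frequent first
bits = encode(b"abca", table)
print(bits)                    # 10110111010
print(decode(bits, table))     # b'abca'
```

Since no valid code starts with a zero, the last byte of the output can be padded with zeros without confusing the decoder.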
Now, this works very well for data which isn't very diverse in values: chunk number one takes only ¼ of the original space, number two takes 3/8, number three ½, and so on. However, value number 7 only breaks even at a full 8 bits, and any value past number 7 in the lookup table will take up more space than the original, meaning that more complex data sets will, very possibly, be increased in size when run through the compressor. A way to remedy this is to use a more complex pattern for squishing data into sub-byte chunks, like the one below with double null-terminated strings:
which continues saving space up through value number 10 and is on par for 4 more after that. Even more complex patterns can be used, but keep in mind that the more complexity one adds, the more difficult defining rules for the compressor/decompressor becomes, and the more code/time it will take to carry out its tasks.
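Before reaching for a fancier pattern, it's cheap to check whether the quick and easy one would even help on your data. Here's a small sketch that adds up the bits, assuming table entry number n costs n + 1 bits (n ones plus the terminating zero). The name `estimated_bits` is hypothetical, and the estimate ignores the space taken by the lookup table itself:

```python
from collections import Counter

def estimated_bits(data: bytes) -> int:
    """Total bits the simple null-terminated scheme would need for `data`."""
    # Most frequent value gets rank 1, the next gets rank 2, and so on.
    counts = sorted(Counter(data).values(), reverse=True)
    return sum(count * (rank + 1) for rank, count in enumerate(counts, start=1))

sample = b"aaaabbbc"
print(estimated_bits(sample), "bits vs", 8 * len(sample))  # 21 bits vs 64
```

If the estimate comes out larger than 8 bits per byte of input, the data is too diverse for this pattern.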
If you're really looking to save some space on redundancies in your data, then it's possible to use these methods alongside Run Length Encoding (like builder hinted at, also five months ago), which appends after each value a second value that tells how many times in a row that value is repeated. Therefore, using full bytes:
00000001 11111111, or 01 FF
would be read by a decompressor as a run of 255 bytes with the value 01.
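A quick sketch of byte-level RLE along those lines, assuming each pair is (value, run length) and that runs longer than 255 are simply split into several pairs. The function names are my own:

```python
def rle_encode(data: bytes) -> bytes:
    """Turn each run of equal bytes into a (value, count) pair."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # The count has to fit in one byte, so a run is capped at 255.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes([value, run])
        i += run
    return bytes(out)

def rle_decode(packed: bytes) -> bytes:
    """Expand (value, count) pairs back into the original bytes."""
    out = bytearray()
    for i in range(0, len(packed), 2):
        value, run = packed[i], packed[i + 1]
        out += bytes([value]) * run
    return bytes(out)

packed = rle_encode(b"\x01" * 255)
print(packed.hex())  # 01ff
```

Keep in mind that data without runs doubles in size this way, since every lone byte still gets a count byte after it.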
In order to use this concept alongside Huffman compression, one can apply the same idea but, instead of byte-sized chunks, use Huffman-style variable-length strings:
for example, when using the quick and easy pattern from above, would be translated as 3 consecutive 'value # 2's from your look-up table.
would be 2 'value # 4's,
would be 6 'value #5's, and so on...
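Here's one way the combination could look in code. Big assumption on my part: the run length is written with the same ones-then-zero pattern as the value itself, and it comes right after the value. Other layouts are possible, so treat this as a sketch rather than the one true format. Bits are again kept as a text string, and all the names are made up:

```python
from itertools import groupby

def unary(n: int) -> str:
    """n ones followed by a terminating zero."""
    return "1" * n + "0"

def encode_runs(data: bytes, table: list) -> str:
    """Write each run as the value's code followed by the run length's code."""
    rank = {value: i + 1 for i, value in enumerate(table)}
    return "".join(
        unary(rank[value]) + unary(len(list(group)))
        for value, group in groupby(data)
    )

def decode_runs(bits: str, table: list) -> bytes:
    """Read the codes back in (value, run length) pairs."""
    # Every code ends in a zero, so splitting on "0" yields the runs of ones;
    # the last piece is the empty remainder after the final zero.
    lengths = [len(code) for code in bits.split("0")[:-1]]
    out = bytearray()
    for rank, count in zip(lengths[0::2], lengths[1::2]):
        out += bytes([table[rank - 1]]) * count
    return bytes(out)

table = [97, 98]                       # 'a', 'b'
bits = encode_runs(b"bbb", table)
print(bits)                            # 1101110
print(decode_runs(bits, table))        # b'bbb'
```

With this layout a value that appears only once costs 2 extra bits for its count, so it only pays off when the data really does contain runs.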
Have fun playing around with data!