Omnimaga

Calculator Community => TI Calculators => General Calculator Help => Topic started by: shmibs on September 21, 2010, 11:04:28 pm

Title: map compression
Post by: shmibs on September 21, 2010, 11:04:28 pm
ok, it's looking like unit 533D is going to have some hella-massive map sizes, so i was wondering...
what would be a compression algorithm well suited to a map with one byte per tile, 8 bytes tall by n bytes wide (where n >= 13), and quite a bit of variance in tile types (so RLE wouldn't work all that well)?
Title: Re: map compression
Post by: Builderboy on September 21, 2010, 11:09:22 pm
well for one, i wouldn't rule out RLE just yet.  You might want to try it just to see if you like it.  How many tiles are you planning?
Title: Re: map compression
Post by: shmibs on September 21, 2010, 11:19:10 pm
i'm not sure just yet (as it's still being worked out), but a mega ton.
maybe 100
Title: Re: map compression
Post by: Builderboy on September 21, 2010, 11:24:43 pm
Ah, i see, so no half-byte compression for you.  In that case i can only suggest RLE or Huffman.  Unfortunately, neither can be decompressed in real time, so you must decompress them to a location before you start displaying the map.
Title: Re: map compression
Post by: shmibs on September 22, 2010, 01:09:01 am
beforehand is what i was planning on anyways. i think i might have an idea for a different algorithm, though(running it past kerm)
thanks, builder!
Title: Re: map compression
Post by: Madskillz on September 22, 2010, 01:16:51 am
RLE can provide wonderful results on the right kind of data. I know we used it in Sonic; at least, I ran some tests with it and it had pretty decent results.
Title: Re: map compression
Post by: Builderboy on September 22, 2010, 01:37:36 am
Alright sounds good :) Although i am curious as to what it is ;D

And why do you need so many tiles out of curiosity?
Title: Re: map compression
Post by: DJ Omnimaga on September 22, 2010, 11:14:46 am
I really need to try compression at some point x.x

Builderboy, some games really require more tiles than usual. See E:SoR and Reuben Quest, for example. It can't be helped.
Title: Re: map compression
Post by: Builderboy on September 22, 2010, 11:16:58 am
Yeah I guess so. I wonder what kind of compression he dreamt up, though o.O
Title: Re: map compression
Post by: DJ Omnimaga on September 22, 2010, 11:19:15 am
Not sure either. I know in Reuben Quest maps there are often several identical tiles in a row, though. In such a game, I wonder if RLE would be applicable?
Title: Re: map compression
Post by: Builderboy on September 22, 2010, 11:21:44 am
I don't know; really, the only way to know for sure is to test.  I had an idea to combine RLE with Huffman, but I don't know how complicated that would be.
Title: Re: map compression
Post by: DJ Omnimaga on September 22, 2010, 11:32:43 am
Yeah, my concern is that if some maps are complex, they may end up a bit larger x.x
Title: Re: map compression
Post by: shmibs on September 22, 2010, 10:06:03 pm
Quote from: Builderboy
Yeah I guess so. I wonder what kind of compression he dreamt up, though o.O
it's nothing all that special. basically, it detects when the parser (which is running through the code and decompressing it to an appvar) encounters a certain tile. all the tiles for which this applies are predefined and will be clumped together in the tile data, so instead of checking for each tile individually it merely checks whether the tile number is greater than a set number and less than another (or just greater, as i'll probably be putting these at the end of the tile data). if so, the following byte is read as the length over which to stretch a predefined pattern of tiles, like this messy ground:
(http://img.removedfromgame.com/imgs/1285207098-New Bitmap Image.bmp)
additionally, there will be tiles that are always next to one another, forming objects too large to fit in an 8*8. these can be expressed by a single tile value, after which the one or two others that make up the object width-wise will be automatically appended (again by predefined patterns). in rough terms, the idea looks like the sketch below.
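in C just for illustration (the threshold value, the pattern table, and the tile numbers here are all made-up placeholders, not the real unit 533D data; the multi-tile objects would work the same way, with the marker tile's "pattern" simply being its partner tiles):

Code: [Select]
#define PATTERN_BASE 0xF0   /* hypothetical: tile values >= this mark a pattern run */

static const unsigned char patterns[2][4] = {
    { 0x21, 0x22, 0x23, 0x22 },   /* e.g. a 4-tile "messy ground" cycle */
    { 0x30, 0x31, 0x31, 0x30 },   /* only two example patterns defined here */
};

/* expand the compressed map at src into the appvar buffer at dst */
void decompress_map(const unsigned char *src, int srclen, unsigned char *dst)
{
    int i = 0;
    while (i < srclen) {
        unsigned char t = src[i++];
        if (t >= PATTERN_BASE) {                  /* pattern-run marker tile */
            unsigned char which = t - PATTERN_BASE;
            unsigned char run = src[i++];         /* following byte = run length */
            for (unsigned char j = 0; j < run; j++)
                *dst++ = patterns[which][j & 3];  /* stretch the pattern */
        } else {
            *dst++ = t;                           /* ordinary tile, copied as-is */
        }
    }
}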

all told, this will make the code required for decompression rather large, but with the map sizes i'm dealing with it would definitely be worth it.

as for the number of tiles, i like things that look pretty
Title: Re: map compression
Post by: DJ Omnimaga on September 22, 2010, 11:06:57 pm
I agree, in certain games it's sometimes better to have more tiles, so maps can be complex and won't look too simple. Look at the Reuben Quest forests and compare them with ROL3, for example. IMHO the ones in Reuben look better, because they look less like a grid of trees, but I needed 9 tree tiles.
Title: Re: map compression
Post by: ralphdspam on February 14, 2011, 02:32:50 pm
Can someone explain how to save a tree for Huffman compression? That would be helpful.
Title: Re: map compression
Post by: AngelFish on February 14, 2011, 02:40:52 pm
Just a random unrelated note, but all of these compression topics seem to neglect one fundamental fact of compression: no compression algorithm that does not change the form of the data (e.g. binary to hexadecimal) can achieve compression under all circumstances.

Title: Re: map compression
Post by: shmibs on February 14, 2011, 08:10:33 pm
/\ as builder said five months ago (earlier in this topic), the only way to know for sure whether there will be any size improvements is to test it out.

as for ralphdspam's question:
i'm not entirely sure what it is you're asking (and somebody else will likely have a question about other things at some point), so ima just go through a quick explanation of huffman.



   Simple Huffman compression is very easy to read and write. First, go through the data you wish to compress and count the number of times each byte value appears (this could be done with two-or-more-byte values as well, I suppose, but you aren't likely to gain any space from it). Once you have these counts, store the values, ordered from most to least common, in a lookup table which will be accessed by your compressor/decompressor later on.
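as a quick sketch in C (nothing calculator-specific, just for illustration), counting and ordering the values could look like:

Code: [Select]
#include <stddef.h>

/* Count how often each byte value appears, then fill the lookup table
 * with the values sorted most-common-first. Minimal sketch; a real
 * version would use a proper sort instead of 256 scans. */
void build_table(const unsigned char *data, size_t len, unsigned char table[256])
{
    unsigned long count[256] = {0};
    int used[256] = {0};

    for (size_t i = 0; i < len; i++)
        count[data[i]]++;

    for (int slot = 0; slot < 256; slot++) {   /* selection-sort by count */
        int best = -1;
        for (int v = 0; v < 256; v++)
            if (!used[v] && (best < 0 || count[v] > count[best]))
                best = v;
        table[slot] = (unsigned char)best;     /* slot 0 = most common value */
        used[best] = 1;                        /* don't pick it again */
    }
}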

Reading and writing are then very simple, if you choose to use the quick and easy method:

each of your byte values will be stored in your compressed data, not as fixed-length chunks, but as variable-length, null-terminated strings of bits (runs of 1s ended by a 0). This means that, to your decompressor,

10

will read as byte value number one in your look-up table,

110

will read as the second,

1110

as the third, and so on.
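reading one value back out is then about this simple; a rough C sketch, where get_bit() stands in for however you pull the next bit out of your compressed data (note that a lone 0 bit would select entry 0, which the numbering above skips; the sketch just indexes by the count of 1s):

Code: [Select]
extern int get_bit(void);       /* hypothetical: next bit of compressed stream */
extern unsigned char table[];   /* lookup table, most common value first */

unsigned char read_value(void)
{
    int ones = 0;
    while (get_bit() == 1)      /* count 1s until the terminating 0 */
        ones++;
    return table[ones];         /* "10" -> table[1], "110" -> table[2], ... */
}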



   Now, this works very well for data which isn't very diverse in values, value number one taking only 1/4 of the original space, number two 3/8, number three 1/2, and so on. However, any value past number 7 in the lookup table will begin to take up more space than the original, meaning that more complex data sets may very possibly be increased in size when run through the compressor. A way to remedy this is to use a more complex pattern for squishing data into sub-byte chunks, like the one below with double null-terminated strings:

100
1100
10100
11100
101100
110100
111100
1011100
1101100
1110100
10111100
11011100
etc...

which continues saving space up through value number 10 and is on par for 4 more after that. even more complex patterns can be used, but keep in mind that the more complexity one adds, the more difficult defining rules for the compressor/decompressor becomes, and the more code/time it will take to carry out its tasks.



   If you're really looking to save some space on redundancies in your data, then it's possible to use these methods alongside Run Length Encoding (like builder hinted at [also five months ago XD]), which appends after each value a second value telling how many of the values that follow are the same. Therefore, using full bytes:
00000001 11111111, or 01 FF
would be read by a decompressor as a run of 01's: the value itself, plus the 255 identical values that follow it.
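a quick C sketch of that byte-pair scheme (no bounds checking, just for illustration):

Code: [Select]
#include <stddef.h>

/* Byte-pair RLE as described above: each output pair is (value, extra),
 * where 'extra' is how many identical values follow the first one. */
size_t rle_encode(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t out = 0, i = 0;
    while (i < len) {
        unsigned char value = src[i];
        size_t run = 1;
        while (i + run < len && src[i + run] == value && run < 256)
            run++;                               /* count identical bytes */
        dst[out++] = value;
        dst[out++] = (unsigned char)(run - 1);   /* 'extra' repeats */
        i += run;
    }
    return out;                                  /* compressed size */
}

size_t rle_decode(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        for (unsigned j = 0; j <= src[i + 1]; j++)   /* first copy + extras */
            dst[out++] = src[i];
    return out;                                  /* decompressed size */
}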

In order to use this concept alongside Huffman compression, one can apply the same idea but, instead of using byte-sized chunks, use Huffman-style varying-length strings:

110 1110

for example, when using the quick and easy pattern from above, would be translated as 3 consecutive 'value # 2's from your look-up table.

11110 110

would be 2 'value # 4's,

111110 1111110

would be 6 'value #5's, and so on...
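reading one of these (value, run-length) pairs could then look like this rough C sketch, reusing the made-up get_bit() and table from before:

Code: [Select]
extern int get_bit(void);       /* hypothetical bit reader, as before */
extern unsigned char table[];   /* lookup table, most common value first */

static int read_unary(void)
{
    int ones = 0;
    while (get_bit() == 1)      /* count 1s until the terminating 0 */
        ones++;
    return ones;
}

/* decode one run into dst, returning how many values were written */
int read_run(unsigned char *dst)
{
    unsigned char value = table[read_unary()];  /* e.g. "110" -> entry 2 */
    int run = read_unary();                     /* e.g. "1110" -> 3 copies */
    for (int i = 0; i < run; i++)
        dst[i] = value;
    return run;
}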

Have fun playing around with data!
Title: Re: map compression
Post by: ralphdspam on February 14, 2011, 08:53:49 pm
I meant to ask for the best way to format the lookup table.  (Sorry my question was too vague :P)
Title: Re: map compression
Post by: Runer112 on February 15, 2011, 12:30:36 am
shmibs, are you thinking of a different compression algorithm, maybe? Because I'm pretty sure what you just described isn't Huffman. Huffman compression doesn't employ any tactic involving null-terminated strings of 1s. It relies on an algorithm that simply encodes more common values with smaller strings of bits and less common values with larger strings of bits, with 1s and 0s throughout. It does so by counting the number of occurrences of each value present in a set of data to give each value a "weight." Then, the two units with the smallest "weights" are paired together, becoming a new unit with the combined weight. The two units with the smallest "weights" are continually paired together until all units have been joined into a tree with one root element. This results in values with a higher weight sitting further up the tree and being assigned shorter bit strings. Here's an example of a finished Huffman tree that I shamelessly stole from Wikipedia, based on the string: j'aime aller sur le bord de l'eau les jeudis ou les jours impairs


(http://img254.imageshack.us/img254/6914/huffman.png)
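In C, the pairing process could be sketched roughly like this (the Node layout and the slow two-smallest search are just my illustrative choices, not part of the algorithm itself):

Code: [Select]
#include <stdlib.h>

/* Minimal sketch of Huffman tree construction: repeatedly merge the two
 * lowest-weight units until one root remains. Assumes at least one value
 * is present; mallocs unchecked for brevity. */
typedef struct Node {
    unsigned long weight;
    int value;                   /* byte value for leaves, -1 for nodes */
    struct Node *zero, *one;     /* branches; NULL for leaves */
} Node;

Node *build_tree(const unsigned long count[256])
{
    Node *pool[256];
    int n = 0;

    for (int v = 0; v < 256; v++) {          /* one leaf per value present */
        if (count[v] == 0) continue;
        Node *leaf = malloc(sizeof *leaf);
        leaf->weight = count[v]; leaf->value = v;
        leaf->zero = leaf->one = NULL;
        pool[n++] = leaf;
    }

    while (n > 1) {                          /* merge the two lightest units */
        int a = 0, b = 1;
        if (pool[b]->weight < pool[a]->weight) { a = 1; b = 0; }
        for (int i = 2; i < n; i++) {        /* find the two smallest */
            if (pool[i]->weight < pool[a]->weight)      { b = a; a = i; }
            else if (pool[i]->weight < pool[b]->weight) { b = i; }
        }
        Node *parent = malloc(sizeof *parent);
        parent->weight = pool[a]->weight + pool[b]->weight;
        parent->value = -1;
        parent->zero = pool[a]; parent->one = pool[b];
        pool[a] = parent;                    /* parent takes slot a */
        pool[b] = pool[--n];                 /* drop slot b, shrink pool */
    }
    return pool[0];                          /* the root */
}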



Anyways, back to ralphdspam's question.

Quote from: ralphdspam
Can someone explain how to save a tree for Huffman compression? That would be helpful.

Seeing as you're asking how to save a tree, I'll go ahead and assume you already know how to make a tree like the one above. If not, feel free to ask how that's done, or you can read the Wikipedia article (http://en.wikipedia.org/wiki/Huffman_coding), which is how I learned how Huffman coding works. :P

There may be better ways than the method I'm about to describe, but I'm not very familiar with binary heap/tree storage so I'll just explain the simplest way I could imagine!

I would code the tree starting from the top down as an array consisting of nodes and leaves. Just to clarify some terms before we continue: a node is a point on the tree that branches into two other elements, and a leaf is a terminal element that doesn't branch further. So nodes would be the circles in the above diagram, and leaves would be the ends that represent pieces of data.

I would build the tree into the array by following node branches down to leaves and, after reaching each leaf, backtracking until you reach an unvisited branch and exploring that branch. Every time you visit a node or leaf that hasn't yet been reached, assign it the next element in the array. The above tree could be navigated like so, always taking 0-branches with priority:

Spoiler: a pretty big step-by-step table explaining how one would determine the order of elements in the array:
Step   Current location   Entry appended to array
0      null (root)        null
1      0                  0
2      00                 00
3      000                000
4      0000               0000
5      00000              00000
6      000000             000000
7      00000              (backtrack)
8      000001             000001
9      00000              (backtrack)
10     0000               (backtrack)
11     00001              00001
12     0000               (backtrack)
13     000                (backtrack)
14     0001               0001
15     000                (backtrack)
16     00                 (backtrack)
17     001                001
18     00                 (backtrack)
19     0                  (backtrack)
20     01                 01
21     010                010
22     0100               0100
23     01000              01000
24     0100               (backtrack)
25     01001              01001
26     0100               (backtrack)
27     010                (backtrack)
28     0101               0101
29     010                (backtrack)
30     01                 (backtrack)
31     011                011
32     0110               0110
33     011                (backtrack)
34     0111               0111
35     011                (backtrack)
36     01                 (backtrack)
37     0                  (backtrack)
38     null               (backtrack)
39     1                  1
40     10                 10
41     100                100
42     1000               1000
43     100                (backtrack)
44     1001               1001
45     100                (backtrack)
46     10                 (backtrack)
47     101                101
48     10                 (backtrack)
49     1                  (backtrack)
50     11                 11
51     110                110
52     11                 (backtrack)
53     111                111
54     11                 (backtrack)
55     1                  (backtrack)
56     null               (done)

Final array, in order:
null   0   00   000   0000   00000   000000   000001   00001   0001   001   01   010   0100   01000   01001   0101   011   0110   0111   1   10   100   1000   1001   101   11   110   111
Man that took forever to format right...
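In C, that whole walk is just a preorder traversal that takes 0-branches first; a minimal sketch reusing the Node type from the tree-building sketch above (flat[] and next_index are made-up names):

Code: [Select]
/* Assign each element its slot in the array: visit a node, then explore
 * its 0-branch all the way down, then backtrack into the 1-branch. */
static int next_index = 0;

void assign_slots(Node *n, Node *flat[])
{
    if (n == NULL)
        return;
    flat[next_index++] = n;        /* first visit = next element in array */
    assign_slots(n->zero, flat);   /* follow the 0-branch first */
    assign_slots(n->one, flat);    /* then backtrack and take the 1-branch */
}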

Anyway. Once you have finished that process, the final table will tell you the placement in which to fill in the real data in the real table! Yay, we're almost done! Now go through the list you came up with from the above process element by element, looking at what each bit string corresponds to in the Huffman tree. If the element is a node, encode it in the array with the first bit being a 0 and the following bits representing the distance, minus one, between this element and its 1-branch (the element whose bit string is the current one with an additional 1 appended; I'll explain this later). If the element is a leaf, encode it with the first bit being a 1 and the following bits being the uncompressed data represented by that bit string. The example I've been using throughout this post would generate a table like so:

Data is represented as 8 bits in the form of [Type bit]:[Data], so the actual data is 7 bits. A number for [Data] means these 7 bits are the binary representation of that number, which gets added to the table pointer if the current table entry is of this type and the bit read is a 1; this, along with the mandated pointer+1 every iteration, simulates a jump to the 1-branch of the node. A symbol for [Data] means the 7 bits are the ASCII value of the symbol (disregarding the high bit, because none of our ASCII characters have this bit set anyway).

Offset   Data ([Type bit]:[Data])
0        0:19
1        0:9
2        0:7
3        0:5
4        0:3
5        0:1
6        1:b
7        1:p
8        1:j
9        1:u
10       1:(space)
11       0:5
12       0:3
13       0:1
14       1:'
15       1:m
16       1:r
17       0:1
18       1:a
19       1:i
20       0:5
21       0:3
22       0:1
23       1:o
24       1:d
25       1:e
26       0:1
27       1:l
28       1:s

The decompressor can then decompress the data stream by repeating the following steps for each compressed value:

1. Start with the table pointer at entry 0 (the root).
2. If the entry at the pointer has a type bit of 1, it's a leaf: output its 7 data bits as the decompressed value and return to step 1 for the next value.
3. If the entry is a node (type bit 0), read the next bit of the compressed data and add 1 to the table pointer. If the bit read was a 1, also add the node's 7 data bits to the pointer; this jumps to the node's 1-branch.
4. Go back to step 2.

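In C, the loop could look roughly like this, where get_bit() is a stand-in for your bit reader and the table is the 29-entry example from above with the type bit stored as the high bit of each byte:

Code: [Select]
extern int get_bit(void);   /* hypothetical: next bit of the compressed stream */

static const unsigned char table[29] = {
    19, 9, 7, 5, 3, 1,                            /* nodes: distance-1 values */
    0x80|'b', 0x80|'p', 0x80|'j', 0x80|'u', 0x80|' ',
    5, 3, 1, 0x80|'\'', 0x80|'m', 0x80|'r',
    1, 0x80|'a', 0x80|'i',
    5, 3, 1, 0x80|'o', 0x80|'d', 0x80|'e',
    1, 0x80|'l', 0x80|'s',                        /* leaves: 0x80 | ASCII */
};

char decode_one(void)
{
    int p = 0;                          /* start at the root entry */
    while (!(table[p] & 0x80)) {        /* type bit 0 = node */
        p++;                            /* the mandated pointer+1 */
        if (get_bit())
            p += table[p - 1] & 0x7F;   /* bit 1: jump to the 1-branch */
    }
    return (char)(table[p] & 0x7F);     /* leaf: the 7 data bits */
}

Feeding it the bits 0110, for example, walks from the root through the nodes for 0, 01, and 011 to the leaf at offset 18 and returns 'a'.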
And that's it! Wasn't that simple? ;D

But I'm sure that was confusing as hell, so please ask me what I mean by something if you get confused by any of this.
Title: Re: map compression
Post by: ralphdspam on February 15, 2011, 12:57:14 am
Thank you very much! :)
Would it be too much to ask for a code example in Axe or BASIC?
Title: Re: map compression
Post by: Runer112 on February 15, 2011, 01:37:11 am
A compressor or a decompressor? A decompressor shouldn't be too complex, although a Huffman compressor would probably be a decent amount harder to write for a calculator. I guess it's possible, though.
Title: Re: map compression
Post by: shmibs on February 15, 2011, 07:54:56 pm
oh wow, you meant a tree for the actual Huffman algorithm used by computers and such o.0?
sorry; usually when i've heard people talking about Huffman around this community, it was in reference to "an algorithm that simply encodes more common values with smaller strings of bits and less common values with larger strings of bits," without the full-on complex method, simply because the size of the code required for compressing/decompressing such a thing (relative to the data to be compressed, as well as the variety present in the source data) didn't merit it.

i am now very excited to see if you can pull this off =D!