Flames

Submitted By: Xeda112358 Date: July 05, 2013, 10:57:34 pm Views: 892
Summary: This is a tutorial on how to create flame graphics in Z80 Assembly for the TI-83+ and 84+ calcs.

Fire Animation

Fire animations are always neat to watch, and luckily, creating such an animation is relatively easy.
  • The Algorithm
  • If a pixel is ON (or OFF, depending on which state you choose), give it a 1/n chance of dying.
  • If the pixel dies, toggle it OFF (or ON).
  • Move the pixel up one row.
  • Randomly set pixels on the bottom row once you have scanned the whole image

You will want to scan from top to bottom, and it will be easier to read pixels from left to right. Let's create some pseudo-code:
Code: [Select]

Set Y = 1
While Y<64
  Set X = 0
  While X<96
    Set pxl = pxl-test(Y,X)
    If (pxl=1) and (0=randInt(0,N-1))
      Set pxl = 0
    pxl-set(Y-1,X,pxl)
    Set X = X+1
  EndWhile
  Set Y = Y+1
EndWhile
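
For readers who would like something they can run on a computer, here is a minimal Python sketch of the same per-pixel loop. The grid layout (64 rows of 96 ints), the name burn_step and the use of Python's random module are assumptions made for illustration; none of this is calculator code.

```python
import random

ROWS, COLS = 64, 96  # screen size in pixels

def burn_step(grid, n, rng):
    """Move every pixel up one row; each ON pixel dies with probability 1/n."""
    for y in range(1, ROWS):              # scan from top to bottom
        for x in range(COLS):             # and from left to right
            pxl = grid[y][x]
            if pxl == 1 and rng.randrange(n) == 0:
                pxl = 0                   # the pixel dies
            grid[y - 1][x] = pxl          # write it one row higher
    return grid

# Usage: a solid bottom row acts as the fuel for the flames.
rng = random.Random(1)
grid = [[0] * COLS for _ in range(ROWS)]
grid[ROWS - 1] = [1] * COLS
for _ in range(10):
    burn_step(grid, 8, rng)
```

Because the scan runs from top to bottom, each row is read before it is overwritten, so one call shifts the whole picture up by exactly one row.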

On our calculators, the corresponding assembly code is rather ugly (and by ugly, I mean inelegant). Plotting and testing individual pixels on the B/W models takes several hundred t-states each, and we will be checking 96*63 = 6048 pixels. On a 6MHz calc, we would only be getting about 1FPS, and I am sure you aren't learning assembly to achieve BASIC speeds.
We can modify 8 pixels at a time if we can somehow easily compute the 1/n chance of dying and store those chances as bits in a byte.
  • Platform Specific Optimisation
  • Use a 1/8 chance of dying.
  • Create the following LUT:
Code: [Select]

     .db %11111110
     .db %11111101
     .db %11111011
     .db %11110111
     .db %11101111
     .db %11011111
     .db %10111111
     .db %01111111

  • Now we select a random integer from 0 to 7 and locate the corresponding byte in the LUT. Use this value to turn one of the 8 pixels off with a simple logic instruction.
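
The byte-at-a-time idea above can be sketched in Python as follows. This is only a model under stated assumptions: the buffer is a 768-byte bytearray laid out like the graph buffer (12 bytes per row), and the names MASKS and burn_step_bytes are invented for this example.

```python
import random

ROW_BYTES = 12   # 96 pixels / 8 pixels per byte
ROWS = 64
MASKS = [0xFF ^ (1 << bit) for bit in range(8)]   # same eight values as the LUT

def burn_step_bytes(buf, rng):
    """Copy rows 1..63 up one row, clearing one random bit in every byte."""
    for i in range(ROW_BYTES, ROW_BYTES * ROWS):
        mask = MASKS[rng.randrange(8)]            # pick one of the 8 masks
        buf[i - ROW_BYTES] = buf[i] & mask        # AND it in, store one row up
    return buf
```

Note how this differs from the per-pixel version: exactly one pixel position in every byte is killed, rather than each pixel getting its own independent 1/8 roll.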


It is true that we slightly modified the algorithm, but it is much, much faster, allowing us to evolve the whole screen in about 1/70th of a second (at 6MHz). So now to implement this scheme...

We will start with a straightforward approach and then we will tweak the algorithm to get more speed from it.
Note: In the following examples, ld a,r will be used to obtain "pseudo-random numbers", but unfortunately the effect is quite boring and predictable. The reader is encouraged to use their own routine to generate a better value, though it can be a significant hit to speed. Here is an example that works rather nicely and is used in the screenshots:
Code: [Select]

     ld hl,seed
     ld a,r        ;9     ;get a pseudo-random number in A
     add a,(hl)
     ld h,a
     add a,(hl)
     ld l,a
     add a,(hl)
     ld (seed),a

As well, the screenshots include code for initially drawing to the bottom row of pixels, or for loading in images.
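
To make the data flow of that little PRNG easier to follow, here is a rough Python model of it. It assumes memory is a 65536-byte bytearray, that seed lives at 8008h (the address used in the final listings), and that the value read from R is simply passed in as r. The name next_random is made up for this sketch.

```python
SEED_ADDR = 0x8008  # assumption: same address as in the final listings

def next_random(mem, r, seed_addr=SEED_ADDR):
    """Model of the routine above; mem is a 65536-byte bytearray."""
    h, l = seed_addr >> 8, seed_addr & 0xFF   # ld hl,seed
    a = (r + mem[(h << 8) | l]) & 0xFF        # ld a,r / add a,(hl)
    h = a                                     # ld h,a
    a = (a + mem[(h << 8) | l]) & 0xFF        # add a,(hl)
    l = a                                     # ld l,a
    a = (a + mem[(h << 8) | l]) & 0xFF        # add a,(hl)
    mem[seed_addr] = a                        # ld (seed),a
    return a
```

The trick is that the running sum is reused as an address, so the result depends on whatever bytes happen to sit in memory as well as on R and the previous seed.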


Attempt 1 : 'Beginner'
Note: The numbers to the right of the instructions are the number of clock cycles required for the instruction. At 6MHz, the calculator runs 6 million cycles worth of code per second.
Code: [Select]

     ei            ;4     ;we are going to need the interrupts for reading the keyboard
Main:
     ld bc,756     ;10    ;we are going to read through 756 bytes worth of the screen at a time
     ld ix,plotSScreen+12 ;14 ;IX points to row 1 of the graph buffer
FireLoop:
     ld hl,LUT     ;10    ;This is our LUT for the pixel mask.
     ld a,r        ;9     ;get a pseudo-random number in A
     and 7         ;7     ;mask it with %00000111 to get it in the range of 000 to 111 (0 to 7)
     add a,l       ;4     ;add this value to HL
     ld l,a        ;4     ;
     jr nc,$+3     ;12|11 ;check if there was overflow to correct
     inc h         ;--
     ld a,(hl)     ;7     ;get the value of the byte in the LUT
     and (ix)      ;19    ;mask it with the byte of pixels at IX
     ld (ix-12),a  ;19    ;load the updated value into the previous row of pixels
     inc ix        ;10    ;IX is now incremented
     dec bc        ;6     ;decrement BC which is our counter
     ld a,b        ;4     ;we need to check if BC = 0
     or c          ;4     ;returns z when B and C are both 0 (so when BC = 0)
     jr nz,FireLoop;12|7  ;loop until all 756 bytes are done

     bcall(_GrBufCpy) ;?  ;copy the graph buf to the LCD
     ld a,(kbdScanCode) ;13 ;read the last read keyboard value by the OS interrupt
     cp 15         ;7   ;test if CLEAR is pressed
     jr nz,Main    ;12|7
     ret           ;10
LUT:
.db %11111110
.db %11111101
.db %11111011
.db %11110111
.db %11101111
.db %11011111
.db %10111111
.db %01111111

As you may see, we did not update the bottom row of pixels with new pixels. Any ON pixels will remain ON and flames will rise from them.
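
If you want to check the cycle annotations yourself, the arithmetic can be written down in a few lines of Python. This sketch assumes the worst case for the carry fix-up (12 t-states, the "at most" figure) and counts everything from ld bc,756 up to, but not including, the bcall. The function name is invented for this example.

```python
ITERATIONS = 756  # 63 rows * 12 bytes per row

def attempt1_tstates():
    setup = 10 + 14                                     # ld bc,756 / ld ix,plotSScreen+12
    body = 10 + 9 + 7 + 4 + 4 + 12 + 7 + 19 + 19 + 10   # ld hl,LUT ... inc ix
    loop_control = 6 + 4 + 4 + 12                       # dec bc / ld a,b / or c / jr nz (taken)
    total = setup + ITERATIONS * (body + loop_control)
    return total - 5                                    # the last jr nz falls through: 7, not 12

print(attempt1_tstates())
```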



Attempt 2
The previous code, excluding everything starting at the first bcall(), takes 96031 t-states, which isn't bad for an animation like this. However, let's introduce a trick that will save us a little on t-states.
First note that dec [reg8] \ jr nz,label is 3 bytes and 16 cycles if it jumps, 11 if it doesn't, whereas djnz label is 2 bytes and 13 cycles if it jumps, 8 if it doesn't. Of course, djnz is limited to just B, but we can work around that.
The optimisation here will essentially let B be the counter in the main loop and C be the counter in the outside loop. If B is 0 initially, djnz label will loop 256 times. Preliminary calculations!
  • Let C = 1
  • Let B = 10
Code: [Select]

     ld hl,0
Loop:
     inc hl
     djnz Loop   ;loops 10 times
     dec c       ;Here, B is now 0, C gets decremented to 0
     jr nz,Loop  ;we don't jump back again

So now HL = 10. If you let C = 2, though, then the DJNZ loop is executed 10 times, then another 256, and if C = 3, it is 10 times and another 512. Now we move on to the code at hand:
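
The counting behaviour of that B/C pair can be modelled in Python. This is just a sketch of the two counters (the function name is invented for this example), not of any particular loop body.

```python
def nested_djnz_iterations(b, c):
    """Count how many times the loop body runs for starting values of B and C."""
    count = 0
    while True:
        count += 1                # the loop body (inc hl in the example)
        b = (b - 1) & 0xFF        # djnz: decrement B...
        if b != 0:
            continue              # ...and jump back while B is not 0
        c = (c - 1) & 0xFF        # dec c
        if c == 0:
            return count          # jr nz falls through, we are done

print(nested_djnz_iterations(10, 1))
print(nested_djnz_iterations(244, 3))
```

Only the first pass uses the starting value of B; every later pass starts from B = 0 and so runs the full 256 times.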
(note that 756 = 2*256 + 244)
Code: [Select]

     ei            ;4
Main:
     ld bc,$F403   ;10    ;B = 244, C = 3
     ld ix,plotSScreen+12 ;14 ;IX points to row 1 of the graph buffer
FireLoop:
     ld hl,LUT     ;10    ;This is our LUT for the pixel mask.
     ld a,r        ;9     ;get a pseudo-random number in A
     and 7         ;7     ;mask it with %00000111 to get it in the range of 000 to 111 (0 to 7)
     add a,l       ;4     ;add this value to HL
     ld l,a        ;4     ;
     jr nc,$+3     ;12|11 ;check if there was overflow to correct
     inc h         ;--
     ld a,(hl)     ;7     ;get the value of the byte in the LUT
     and (ix)      ;19    ;mask it with the byte of pixels at IX
     ld (ix-12),a  ;19    ;load the updated value into the previous row of pixels
     inc ix        ;10    ;IX is now incremented
     djnz FireLoop ;13|8
     dec c         ;4
     jr nz,FireLoop;12|7

     bcall(_GrBufCpy) ;?  ;copy the graph buf to the LCD
     ld a,(kbdScanCode) ;13 ;read the last read keyboard value by the OS interrupt
     cp 15         ;7   ;test if CLEAR is pressed
     jr nz,Main    ;12|7
     ret           ;10
LUT:
.db %11111110
.db %11111101
.db %11111011
.db %11110111
.db %11101111
.db %11011111
.db %10111111
.db %01111111

So we removed 26 t-states of loop control from the main loop (dec bc \ ld a,b \ or c \ jr nz) and replaced them with a 13 t-state djnz, while the dec c \ jr nz,FireLoop pair executes only 3 times in total, all for the same size. We saved 756*(26-13) = 9828 t-states in the main loop and added 33 t-states of extra overhead, at no extra cost in size. The total speed savings were 9795 t-states, for a total of 86236 t-states for one fire loop.
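
The same bookkeeping can be sketched in Python for any loop built this way. The helper below is hypothetical (its name and parameters are invented for this example); body is the sum of the annotations inside the loop excluding the djnz, again taking 12 t-states for the carry fix-up, and setup is the cost of the loads before the loop.

```python
def nested_loop_tstates(body, setup, iterations=756, outer_passes=3):
    """T-states for a djnz inner loop wrapped in a dec c / jr nz outer loop."""
    inner = iterations * (body + 13)       # djnz costs 13 when it jumps...
    inner -= outer_passes * 5              # ...and 8 on each of its fall-throughs
    outer = (outer_passes - 1) * 16 + 11   # dec c / jr nz: 16 when taken, 11 at the end
    return setup + inner + outer

attempt2 = nested_loop_tstates(body=101, setup=10 + 14)
print(attempt2, 96031 - attempt2)
```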


Attempt 3
Every cycle that we optimise out of the main loop translates to 756 cycles saved in all.
Our current routine is straightforward in how IX is being used. On these calculators, the graph buffer is made up of 12-byte rows, so to access the previous row, you use ix-12. However, the IX register is slow, and in our uses so far, very slow. The simple ld a,(ix+const) takes 19 t-states and 3 bytes, compared to 7 t-states for ld a,(hl) or ld a,(bc) or ld a,(de), which are 1 byte.
Now we are going to obfuscate our code a little, but hopefully you catch the trick:
Code: [Select]

     ei            ;4
Main:
     ld bc,$F403       ;10
     ld de,plotSScreen ;10
Loop:
     ld hl,LUT     ;10
     ld a,r        ;9
     and 7         ;7
     add a,l       ;4
     ld l,a        ;4
     jr nc,$+3     ;12|11
     inc h         ;--
     ld a,(hl)     ;7
     ld hl,12      ;10
     add hl,de     ;11     ;Now HL points to the byte that we want to read, and DE is HL-12... the row above!
     and (hl)      ;7
     ld (de),a     ;7
     inc de        ;6
     djnz Loop     ;13|8
     dec c         ;4
     jr nz,Loop    ;12|7

     bcall(_GrBufCpy) ;?  ;copy the graph buf to the LCD
     ld a,(kbdScanCode) ;13 ;read the last read keyboard value by the OS interrupt
     cp 15         ;7   ;test if CLEAR is pressed
     jr nz,Main    ;12|7
     ret           ;10
LUT:
.db %11111110
.db %11111101
.db %11111011
.db %11110111
.db %11101111
.db %11011111
.db %10111111
.db %01111111

In all we saved 7 cycles from the main loop and 4 from overhead, saving a total of 5296 more t-states, bringing the speed of the fire code to 80940 t-states at most.
But can we save more?


Attempt 4 : Intermediate
Of course we can make it faster, but now we have to add more code. The part that I am not liking about the code in the main loop is this part:
Code: [Select]

     jr nc,$+3     ;12|11
     inc h         ;--

If we can get rid of that, then we can knock off 12*756 t-states from our calculation and have it precise. Another 9072 t-states sounds worth it, but how can we get rid of it? If we can make sure that our LUT is never within 8 bytes of a 256-byte boundary, we won't need to worry about incrementing H. Luckily, in our current code, we actually don't need to worry about that since the code starts at 9D95h, but what if this was part of a bigger program and we weren't sure precisely where the LUT would end up, or what if it is run from a non-standard location? What we can do, then, is create our LUT elsewhere and if we choose wisely, we can make an even better optimisation! If we make sure L = 0, then add a,l is just A. Our LUT is only 8 bytes, so let's put it in saveSScreen at 8700h:
Code: [Select]

     ld a,7Fh      ;7     ;just have 1 bit reset
     ld hl,8708h   ;10
       rlca        ;4    *8
       dec l       ;4    *8
       ld (hl),a   ;7    *8
       jr nz,$-3   ;12|7 *8 (12*8-5 in all)
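
To see what that short loop leaves in memory, here is a Python model of it. Memory is represented as a plain dict from address to byte, and the name build_lut is invented for this sketch.

```python
def build_lut(a):
    """Model of rlca / dec l / ld (hl),a / jr nz,$-3 starting from HL = 8708h."""
    mem = {}
    h, l = 0x87, 0x08                      # ld hl,8708h
    while True:
        a = ((a << 1) | (a >> 7)) & 0xFF   # rlca: rotate A left by one bit
        l = (l - 1) & 0xFF                 # dec l (sets Z when L reaches 0)
        mem[(h << 8) | l] = a              # ld (hl),a leaves the flags alone
        if l == 0:                         # so jr nz still sees the result of dec l
            return mem

lut = build_lut(0x7F)
print([hex(lut[0x8700 + i]) for i in range(8)])
```

The masks come out in the opposite order to the .db table, which does not matter because the index into the table is random anyway.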

There were some tricks there, so to be sure you understand it: not only did we use dec l to adjust HL, we also used it as a counter so that we knew when HL = 8700h. We also used rlca to make sure that all 8 of our masks were generated. Now the optimisations will start to get more difficult to follow:
Code: [Select]

     ld a,7Fh      ;7     ;just have 1 bit reset
     ld hl,8708h   ;10
       rlca        ;4    *8
       dec l       ;4    *8
       ld (hl),a   ;7    *8
       jr nz,$-3   ;12|7 *8 (12*8-5 in all)
     ei            ;4
Main:
     ld bc,$F403       ;10
     ld de,plotSScreen ;10
Loop:
     ld h,87h      ;7
     ld a,r        ;9
     and 7         ;7
 ;    add a,l       ;4
     ld l,a        ;4
 ;    jr nc,$+3     ;12|11
 ;    inc h         ;--
     ld a,(hl)     ;7
     ld hl,12      ;10
     add hl,de     ;11     ;Now HL points to the byte that we want to read, and DE is HL-12... the row above!
     and (hl)      ;7
     ld (de),a     ;7
     inc de        ;6
     djnz Loop     ;13|8
     dec c         ;4
     jr nz,Loop    ;12|7

     bcall(_GrBufCpy) ;?  ;copy the graph buf to the LCD
     ld a,(kbdScanCode) ;13 ;read the last read keyboard value by the OS interrupt
     cp 15         ;7
     jr nz,Main    ;12|7
     ret           ;10

I commented out the code that we got rid of, and ld hl,LUT turned into an 8-bit load, so we saved 19 t-states from the main loop, but we added in more overhead. Was it worth it?
The overhead code takes 228 t-states but only needs to be executed once, while the 19 cycles saved from the main loop save 14364 t-states every frame. Speed-wise, it was very much worth it. Memory-wise, we got rid of 5 bytes in the main loop and 8 bytes for the built-in LUT, and added 10 bytes in overhead, so in all 3 more bytes were saved. It was definitely worth it.
So we currently have a routine that is 66576 t-states, plus a bcall and 32 more t-states to test if CLEAR is pressed. The total size of this program is a tiny 47 bytes and you can expect around 20FPS at 6MHz, since the bcall() takes just under 200 000 t-states. That is pretty good by most standards and will provide a nice, smooth visual graphic. All you should really need to modify now is possibly making a better pseudo-random number generator (PRNG), or, if you are ready to really make it fast, you can move on to the advanced level.
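
The frame rate estimate is simple division, sketched here in Python. The 200 000 t-state figure for the bcall is the rough upper bound quoted above, not a measurement, so treat the result as a ballpark; the function name is invented for this example.

```python
CLOCK_HZ = 6_000_000   # 6MHz

def estimated_fps(fire_tstates, copy_tstates, key_tstates=32):
    """Frames per second if one frame costs the three parts added together."""
    return CLOCK_HZ / (fire_tstates + copy_tstates + key_tstates)

print(round(estimated_fps(66576, 200_000), 1))
```

Notice that the screen copy is now roughly three times as expensive as the fire code itself, which is why the later attempts go after the LCD.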



Attempt 5 : Advanced
We will start off with a little modification that will add bytes to the code, but possibly speed things up a little more by getting rid of interrupts. However, this means that our use of kbdScanCode will no longer work, since the OS interrupt routine is what updates that. We will need to directly poll the keyboard hardware, but this only adds 18 t-states to the overhead once, and saves 2 t-states every frame:
Code: [Select]

     di            ;4     ;disable interrupts
     ld a,0FDh     ;7     ;This is the key group that includes [CLEAR] and [ENTER]
     out (1),a     ;11    ;tell the keyboard to poll that keygroup
;another optimisation-- since A has only 1 bit reset, why not reuse it?
     ld hl,8708h   ;10
       rlca        ;4    *8
       dec l       ;4    *8
       ld (hl),a   ;7    *8
       jr nz,$-3   ;12|7 *8 (12*8-5 in all)
Main:
     ld bc,$F403       ;10
     ld de,plotSScreen ;10
Loop:
     ld h,87h      ;7
     ld a,r        ;9
     and 7         ;7
     ld l,a        ;4
     ld a,(hl)     ;7
     ld hl,12      ;10
     add hl,de     ;11     ;Now HL points to the byte that we want to read, and DE is HL-12... the row above!
     and (hl)      ;7
     ld (de),a     ;7
     inc de        ;6
     djnz Loop     ;13|8
     dec c         ;4
     jr nz,Loop    ;12|7

     bcall(_GrBufCpy) ;?  ;copy the graph buf to the LCD
     in a,(1)      ;11    ;check the keyboard
     and 40h       ;7     ;check if Clear was pressed
     jr nz,Main    ;12|7
     ret           ;10

I cheated, actually. By reusing A in the beginning, I removed 2 bytes and 7 t-states of the dent that we put into the program. The total cost was 2 bytes and 11 t-states (18 - 7) of one-time overhead, minus 2 t-states for every iteration of fire.
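
If you are wondering whether starting the rotation from FDh instead of 7Fh still produces a valid LUT, this small Python check shows that it does: the same eight masks come out, just in a different order. The helper names are invented for this sketch, and clear_pressed assumes the usual behaviour of the key port, where a pressed key reads as a 0 bit.

```python
def rlca(a):
    """Rotate an 8-bit value left by one bit, like the Z80 rlca."""
    return ((a << 1) | (a >> 7)) & 0xFF

def lut_from(a):
    """Masks written to 8707h down to 8700h, returned in address order."""
    table = [0] * 8
    for l in range(7, -1, -1):
        a = rlca(a)
        table[l] = a
    return table

def clear_pressed(port_value):
    """CLEAR is bit 6 of key group FDh; the bit reads 0 while the key is held."""
    return port_value & 0x40 == 0

print([hex(m) for m in lut_from(0xFD)])
```

This is also why the loop condition is jr nz,Main: the and 40h result stays non-zero for as long as CLEAR is not pressed.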


Attempt 6 (long intro)
The LCD can be a monster to work with for beginners, so don't worry if you don't quite follow this! What you should understand, though, is that we are working with a particularly slow-responding LCD. Many games and graphics programs created for these calculators experience a huge bottleneck when dealing with the LCD, and that is because you have to wait for some time between each read or write to the LCD. The worst part is that the waiting time is not constant and it is independent of processor speed. This means that regardless of how fast your calc is clocked (6MHz or 15MHz), the LCD will still take a given amount of time to respond, and some LCDs will naturally take longer than others (for example, I have two calculators that respond very quickly to most writes, but another that responds about 4 times slower).

Unfortunately, the OS bcall is not very fast, even compared to 'safecopy' routines written by the community that work on all LCDs. Fortunately, for a routine like this, we get two perks. The first is that, because of the LCD bottleneck, running at 15MHz will only slightly boost performance and will just drain more battery power (15MHz is better if you don't have any bottlenecks, like when you are just running computations). So we can leave our routine in 6MHz mode, and TI-83+ users can enjoy our program just as well as TI-83+SE and TI-84+ users. As well, we can write a much faster LCD updating routine to push the limits of the LCD's ability.

  • Updating the LCD
  • To update the whole LCD, we will typically start by writing 80h and 20h to the LCD instruction port to set the coordinates.
  • We will also need to make sure that the LCD is in the proper increment mode, so we write 5 to the LCD instruction port. This will cause the LCD's data pointer to increment downward for every read/write to the LCD data port (for us, the Y direction, but the official documents call this the X direction).
  • Optimisations with code organisation
  • The wait time between LCD writes is usually around 60 t-states, so the fewer writes that need to be done, the less time will be wasted (and the faster the code will be).
  • We only need to write 5 to the instruction port once to set the mode. So instead of keeping this inside our LCD updating routine, we can keep it outside of the main loop, at the beginning of the program. As well, we usually don't need an LCD delay at the beginning of our program because program loading code usually takes much longer than any necessary delay between LCD writes. However, to be safe...
Code: [Select]

     di            ;4     ;disable interrupts
     ld a,0FDh     ;7     ;This is the key group that includes [CLEAR] and [ENTER]
     out (1),a     ;11    ;tell the keyboard to poll that keygroup
;another optimisation-- since A has only 1 bit reset, why not reuse it?
     ld hl,8708h   ;10
       rlca        ;4    *8
       dec l       ;4    *8
       ld (hl),a   ;7    *8
       jr nz,$-3   ;12|7 *8 (12*8-5 in all)
     ld a,5        ;7
     out (10h),a   ;11    ;write A to the LCD Instruction port (port 16)
Main:
     ld bc,$F403       ;10
     ld de,plotSScreen ;10
Loop:
     ld h,87h      ;7
     ld a,r        ;9
     and 7         ;7
     ld l,a        ;4
     ld a,(hl)     ;7
     ld hl,12      ;10
     add hl,de     ;11     ;Now HL points to the byte that we want to read, and DE is HL-12... the row above!
     and (hl)      ;7
     ld (de),a     ;7
     inc de        ;6
     djnz Loop     ;13|8
     dec c         ;4
     jr nz,Loop    ;12|7

;=================
;Update LCD
;This is a traditional-ish fastcopy routine
;Don't worry about an initial delay. 60000 t-states is more than enough.
     ld a,80h      ;7    set the LCD X coordinate (we would call it Y) to 0
     out (16),a    ;11
     ld hl,plotSScreen
     ld de,11      ;10   we will use this later to increment HL through plotSScreen line-by-line
     in a,(16) \ rlca \ jr c,$-3 ;this will wait until the LCD tells us it is ready for another write
     ld a,20h      ;7    this will set the LCD Y-coordinate (we would call it X)
col:
     out (16),a    ;11
     push af       ;11
     ld bc,4011h   ;10   B=64 (number of pixels tall the screen is), C=11h, corresponding to the LCD Data port
row:
     in a,(16) \ rlca \ jr c,$-3
     outi          ;16   writes the byte at (hl) to the port pointed to by C, increments HL, decrements B, if B = 0, set z flag
     add hl,de     ;11   HL points to the next line, this doesn't affect the z flag, either
     jr nz,row     ;12|7
     in a,(16) \ rlca \ jr c,$-3
     pop af        ;10
     inc a         ;4
;decrement HL by 768
     dec h         ;4
     dec h         ;4
     dec h         ;4
;increment HL, since L won't go past a 256-byte boundary
     inc l         ;4
     cp 2Ch        ;7    ;see if we are finished
     jr nz,col     ;12|7
;=================

     in a,(1)      ;11    ;check the keyboard
     and 40h       ;7     ;check if Clear was pressed
     jr nz,Main    ;12|7
     ret           ;10

Now if you run the code, you will probably notice a significant speed improvement (more than doubling the FPS to around 50), but the cost was 49 more bytes (doubling the size of the code).
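
To convince yourself that the pointer arithmetic in the copy routine really visits every byte of the buffer exactly once, here is a small Python model of just the HL updates. It assumes plotSScreen = 9340h (the value used in the final listings) and ignores the LCD busy-waits entirely; the function name is invented for this sketch.

```python
BASE = 0x9340  # assumption: plotSScreen, as defined in the final listings

def fastcopy_addresses():
    """Yield the buffer address sent to the LCD for each of the 768 writes."""
    hl = BASE
    for col in range(12):        # LCD column commands 20h..2Bh
        for row in range(64):    # B counts down 64 rows
            yield hl
            hl += 1              # outi increments HL
            hl += 11             # add hl,de with DE = 11: next row, same column
        hl -= 0x300              # three dec h: back up 768 bytes
        hl += 1                  # inc l: move on to the next column

addrs = list(fastcopy_addresses())
print(hex(addrs[0]), hex(addrs[1]), hex(addrs[64]))
```

The inc l shortcut only works because the low byte runs from 40h to 4Bh here and never carries into H.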


Attempt 7
Lowering the number of LCD writes can improve performance, so I will point out that if you set the LCD X-coordinate to 0 with 80h and update the LCD, the X-coordinate will wrap around, back to 0, after 64 increments. Interrupts are disabled, so they won't interfere with LCD settings or keyboard settings, so we can put that part of the code outside the main loop. We only save an imperceptible 18 t-states, but it will be useful later and is a useful trick to keep in mind for your own programs.
Code: [Select]

     di            ;4     ;disable interrupts
     ld a,80h      ;7     ;set the LCD X coordinate (we would call it Y) to 0
     out (16),a    ;11
     ld a,0FDh     ;7     ;This is the key group that includes [CLEAR] and [ENTER]
     out (1),a     ;11    ;tell the keyboard to poll that keygroup
;another optimisation-- since A has only 1 bit reset, why not reuse it?
     ld hl,8708h   ;10
       rlca        ;4    *8
       dec l       ;4    *8
       ld (hl),a   ;7    *8
       jr nz,$-3   ;12|7 *8 (12*8-5 in all)
     ld a,5        ;7
     out (10h),a   ;11    ;write A to the LCD Instruction port (port 16)
Main:
     ld bc,$F403       ;10
     ld de,plotSScreen ;10
Loop:
     ld h,87h      ;7
     ld a,r        ;9
     and 7         ;7
     ld l,a        ;4
     ld a,(hl)     ;7
     ld hl,12      ;10
     add hl,de     ;11     ;Now HL points to the byte that we want to read, and DE is HL-12... the row above!
     and (hl)      ;7
     ld (de),a     ;7
     inc de        ;6
     djnz Loop     ;13|8
     dec c         ;4
     jr nz,Loop    ;12|7

;=================
;Update LCD
;This is a traditional-ish fastcopy routine
;Don't worry about an initial delay. 60000 t-states is more than enough.
     ld hl,plotSScreen
     ld de,11      ;10   we will use this later to increment HL through plotSScreen line-by-line write
     ld a,20h      ;7    this will set the LCD Y-coordinate (we would call it X)
col:
     out (16),a    ;11
     push af       ;11
     ld bc,4011h   ;10   B=64 (number of pixels tall the screen is), C=11h, corresponding to the LCD Data port
row:
     in a,(16) \ rlca \ jr c,$-3
     outi          ;16   writes the byte at (hl) to the port pointed to by C, increments HL, decrements B, if B = 0, set z flag
     add hl,de     ;11   HL points to the next line, this doesn't affect the z flag, either
     jr nz,row     ;12|7
     in a,(16) \ rlca \ jr c,$-3
     pop af        ;10
     inc a         ;4
;decrement HL by 768
     dec h         ;4
     dec h         ;4
     dec h         ;4
;increment HL, since L won't go past a 256-byte boundary
     inc l         ;4
     cp 2Ch        ;7    ;see if we are finished
     jr nz,col     ;12|7
;=================

     in a,(1)      ;11    ;check the keyboard
     and 40h       ;7     ;check if Clear was pressed
     jr nz,Main    ;12|7
     ret           ;10

Since we don't need an initial delay on our LCD updating code, we actually got rid of even more cycles and 5 bytes. It is difficult to estimate the speed, now, because of the LCD's volatile timings, but we might have saved around 78 t-states on some models on average.



Attempt 8 : Expert Level
On the advanced levels, we boosted performance to around 50FPS with a flame animation. On the Expert level, we are going to boost it well beyond what a BASIC programmer would dream of, and make a fledgling (intermediate) programmer long for the day they achieve this level. The trick with full screen graphics like this, to boost performance, is to cut out about 50000 t-states, which may sound absurd-- that is close to half of our current program! This technique is not often used except in the most crucial situations where speed is of utmost importance, because it can get very complicated.
  • Interleaving an LCD Update
  • For Graphics, interleaving an LCD update means adding around 10 000 t-states to the part of the code that excludes LCD updating, and removing the LCD updating code. This can be the difference between a 50FPS animation and an 80FPS animation. It essentially involves writing to the LCD at key parts of your normal algorithm in such a way that it provides a substantial delay between writes. The fastest that should be expected from this is around 100FPS to account for particularly slow models.

To imagine the code, take your LCD updating routine and try to put our flame code in the main loop.

Also, just because it makes the code easier, we are finally going to put in code to randomly fill the bottom row of pixels.
Code: [Select]

     di            ;4     ;disable interrupts
     ld a,80h      ;7     ;set the LCD X coordinate (we would call it Y) to 0
     out (16),a    ;11
     ld a,0FDh     ;7     ;This is the key group that includes [CLEAR] and [ENTER]
     out (1),a     ;11    ;tell the keyboard to poll that keygroup
;another optimisation-- since A has only 1 bit reset, why not reuse it?
     ld hl,8708h   ;10
       rlca        ;4    *8
       dec l       ;4    *8
       ld (hl),a   ;7    *8
       jr nz,$-3   ;12|7 *8 (12*8-5 in all)
     ld a,5        ;7
     out (10h),a   ;11    ;write A to the LCD Instruction port (port 16)
     in a,(16) \ rlca \ jr c,$-3
     ld de,12
Main:
     ld ix,plotSScreen ;14
     ld a,20h      ;7
col:
     out (16),a    ;11
     push af       ;11
     ld b,3Fh      ;7
row:
     ld h,87h      ;7
     ld a,r        ;9
     and 7         ;7
     ld l,a        ;4
     ld a,(hl)     ;7
     and (ix+12)   ;19
     ld (ix),a     ;19
     add ix,de     ;15
     out (17),a    ;11     ;111 t-states between LCD writes is good for 6MHz
     djnz row      ;13|8
     ld a,r        ;9
     add a,c       ;4
     ld c,a        ;4
     ld a,r        ;9
     adc a,c       ;4
     ld (ix),a     ;19
     add ix,de     ;15
     out (17),a    ;11
     pop af        ;10
     dec ixh       ;8
     dec ixh       ;8
     dec ixh       ;8
     inc ixl       ;8
     inc a         ;4
     cp 2Ch        ;7
     jr nz,col     ;12|7  ;80 t-states *should* be good enough

     in a,(1)      ;11    ;check the keyboard
     and 40h       ;7     ;check if Clear was pressed
     jr nz,Main    ;12|7
     ret           ;10

So now the main loop takes 85930 t-states (remember that ld (ix),a costs 19 t-states, just like the other indexed accesses), which is about 19000 t-states more than just the fire routine from before, and it should run at about 70FPS at 6MHz. As well, the size is now 98 bytes in total.
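
If the interleaving is hard to picture from the assembly, this Python model of one frame may help. It is only a sketch under stated assumptions: the buffer is a 768-byte bytearray, the "LCD" is just a list that collects every byte written to the data port, and Python's random module stands in for ld a,r. The function name is invented for this example.

```python
import random

def interleaved_frame(buf, rng):
    """Update the buffer column by column, returning the bytes sent to the LCD."""
    lcd = []
    for col in range(12):
        i = col
        for _ in range(63):                        # rows 0..62
            mask = 0xFF ^ (1 << rng.randrange(8))
            buf[i] = buf[i + 12] & mask            # burn the row below into this row
            lcd.append(buf[i])                     # out (17),a
            i += 12
        buf[i] = rng.randrange(256)                # fresh random fuel on the bottom row
        lcd.append(buf[i])                         # the bottom row goes to the LCD too
    return lcd

buf = bytearray(768)
lcd = interleaved_frame(buf, random.Random(0))
```

Every byte is computed once and sent to the LCD straight away, so the fire code itself acts as the delay between writes.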


Modifications
One modification that you can do is to have white flames on a black background. The way this will work is you will use an LUT of bytes with only 1 bit set, and instead of using AND logic, you will use OR logic.
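
As a rough illustration of that variant, here is the earlier byte-wise idea with the logic flipped, written in Python. The buffer layout (768 bytes, 12 per row) and the names SET_MASKS and burn_step_inverted are assumptions made for this sketch.

```python
import random

SET_MASKS = [1 << bit for bit in range(8)]   # LUT entries with only 1 bit set

def burn_step_inverted(buf, rng):
    """Copy rows 1..63 up one row, turning one random pixel ON in every byte."""
    for i in range(12, 768):
        buf[i - 12] = buf[i] | SET_MASKS[rng.randrange(8)]
    return buf
```

Here an OFF pixel is the flame and an ON pixel is the background, so a flame pixel "dies" by being set rather than cleared.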

While having flames rise from the bottom of your screen is kind of cool, a burning image is often a more powerful animation. To do this, we need to somehow keep the image from disappearing while it burns and the simple method for doing this is to redraw the image back on the screen with OR logic. If your image is at AppBackUpScreen, you can include this at the beginning of your main loop:
Code: [Select]

     ld de,appBackUpScreen+768
     ld hl,plotSScreen+768
     ld bc,3
ORLoop:
        dec de
        dec hl
        ld a,(de)
        or (hl)
        ld (hl),a
        djnz ORLoop
        dec c
        jr nz,ORLoop
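
For reference, the loop above can be modelled in Python, together with the sum of its cycle annotations. Both buffers are assumed to be 768-byte bytearrays, and the function names are invented for this sketch.

```python
def or_redraw(screen, image):
    """OR every byte of image into screen, walking backwards like the loop above."""
    for i in range(767, -1, -1):
        screen[i] |= image[i]
    return screen

def or_loop_tstates():
    setup = 10 + 10 + 10                   # ld de / ld hl / ld bc
    body = 6 + 6 + 7 + 7 + 7               # dec de / dec hl / ld a,(de) / or (hl) / ld (hl),a
    inner = 768 * (body + 13) - 3 * 5      # djnz: 13 when taken, 8 on its 3 fall-throughs
    outer = 16 + 16 + 11                   # dec c / jr nz: taken twice, then falls through
    return setup + inner + outer

print(or_loop_tstates())
```

Since ld bc,3 leaves B = 0, each of the three passes runs the full 256 times, giving the 768 bytes we need.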

The cost is 35386 t-states, but HL = plotSScreen by the end of it, so you can use ex de,hl directly after it to set DE = plotSScreen instead of using ld de,plotSScreen (this is for some of the routines). For the interleaving routine, you can obfuscate your code more by including shadow registers (since interrupts are off):
Code: [Select]

     di            ;4     ;disable interrupts
     ld a,80h      ;7     ;set the LCD X coordinate (we would call it Y) to 0
     out (16),a    ;11
     ld a,0FDh     ;7     ;This is the key group that includes [CLEAR] and [ENTER]
     out (1),a     ;11    ;tell the keyboard to poll that keygroup
     ld hl,8708h   ;10
       rlca        ;4    *8
       dec l       ;4    *8
       ld (hl),a   ;7    *8
       jr nz,$-3   ;12|7 *8 (12*8-5 in all)
     ld a,5        ;7
     out (10h),a   ;11    ;write A to the LCD Instruction port (port 16)
     in a,(16) \ rlca \ jr c,$-3
     ld de,12      ;10
     exx           ;4
     ld bc,12      ;10
     exx           ;4
Main:
     exx           ;4
     ld de,plotSScreen ;10
     ld ix,appBackUpScreen ;14
     exx           ;4
     ld a,20h      ;7
col:
     out (16),a    ;11
     push af       ;11
     ld b,3Fh      ;7
row:
     ld h,87h      ;7
     ld a,r        ;9
     and 7         ;7
     ld l,a        ;4
     ld a,(hl)     ;7
     exx           ;4
     ld h,d        ;4
     ld l,e        ;4
     add hl,bc     ;11
     and (hl)      ;7
     or (ix)       ;19
     ld (de),a     ;7
     ex de,hl      ;4
     add ix,bc     ;15
     exx           ;4
     out (17),a    ;11
     djnz row      ;13|8
     exx           ;4
     ld a,(ix)     ;19
     ld h,d        ;4
     ld l,e        ;4
     add hl,bc     ;11
     ld (de),a     ;7     ;DE still points at the bottom row; HL is already one row past the buffer
     add ix,bc     ;15
     ex de,hl      ;4
     dec d         ;4
     out (17),a    ;11
     dec d         ;4
     dec d         ;4
     inc e         ;4
     exx           ;4
     pop af        ;10
     dec ixh       ;8
     dec ixh       ;8
     dec ixh       ;8
     inc ixl       ;8
     inc a         ;4
     cp 2Ch        ;7
     jr nz,col     ;12|7

     in a,(1)      ;11
     and 40h       ;7
     jr nz,Main    ;12|7
     ret           ;10

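The main fire code is now about 105900 t-states, so you can get about 56FPS at 6MHz. Running the interleaved routine from Attempt 8 (85930 t-states) followed by the separate OR loop (35386 t-states) would cost 121316 t-states, so folding the OR into the main loop saves roughly 15400 t-states.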



Final Routines
These are the final routines using actual PRNGs for better flames. Speed is decreased quite a bit in some cases, but the flames look nice and that was the goal. These two get about 40FPS, and here is the final code for regular flames:
Code: [Select]

.nolist
plotSScreen = 9340h
seed = 8008h
#define bcall(xx)  rst 28h \ .dw xx
.list
.org 9D93h
.db $BB,6Dh
     di            ;4     ;disable interrupts
     ld hl,8008h   ;10
     ld a,h        ;4     ;set the LCD X coordinate (we would call it Y) to 0
     out (16),a    ;11
     ld a,0FDh     ;7     ;This is the key group that includes [CLEAR] and [ENTER]
     out (1),a     ;11    ;tell the keyboard to poll that keygroup
       rlca        ;4    *8
       dec l       ;4    *8
       ld (hl),a   ;7    *8
       jr nz,$-3   ;12|7 *8 (12*8-5 in all)
     ld a,5        ;7
     out (10h),a   ;11    ;write A to the LCD Instruction port (port 16)
     in a,(16) \ rlca \ jr c,$-3
     ld de,12
Main:
     ld ix,plotSScreen ;14
     ld a,20h      ;7
col:
     out (16),a    ;11
     push af       ;11
     ld b,3Fh      ;7
row:
     ld hl,seed
     ld a,r        ;9     ;get a pseudo-random number in A
     add a,(hl)
     ld (seed),a

     and 7         ;7
     ld l,a        ;4
     ld a,(hl)     ;7
     and (ix+12)   ;19
     ld (ix),a     ;19
     add ix,de     ;15
     out (17),a    ;11     ;134 t-states between LCD writes is plenty for 6MHz
     djnz row      ;13|8

     ld hl,seed
     ld a,r        ;9     ;get a pseudo-random number in A
     add a,(hl)
     ld h,a
     adc a,(hl)
     ld l,a
     adc a,(hl)
     ld (seed),a


     ld (ix),a     ;19
     add ix,de     ;15
     out (17),a    ;11
     pop af        ;10
     dec ixh       ;8
     dec ixh       ;8
     dec ixh       ;8
     inc ixl       ;8
     inc a         ;4
     cp 2Ch        ;7
     jr nz,col     ;12|7  ;80 t-states *should* be good enough

     in a,(1)      ;11    ;check the keyboard
     and 40h       ;7     ;check if Clear was pressed
     jr nz,Main    ;12|7
     ret           ;10


And for using the graph screen as a background image:
(note that this code may be able to run at 15MHz, too, without LCD glitches)
Code: [Select]

.nolist
plotSScreen = 9340h
AppBackUpScreen = 9872h
seed = 8008h
.list
.org 9D93h
.db $BB,6Dh

     ld de,AppBackUpScreen
     ld hl,plotSScreen
     ld bc,768
     ldir

     di            ;4     ;disable interrupts
     ld hl,8008h   ;10
     ld a,h        ;4     ;A = 80h: set the LCD X coordinate (we would call it Y) to 0
     out (16),a    ;11
     ld a,0FDh     ;7     ;This is the key group that includes [CLEAR] and [ENTER]
     out (1),a     ;11    ;tell the keyboard to poll that keygroup
;another optimisation-- since A has only 1 bit reset, why not reuse it?
       rlca        ;4    *8
       dec l       ;4    *8
       ld (hl),a   ;7    *8
       jr nz,$-3   ;12|7 *8 (12*8-5 in all)
     ld a,5        ;7
     out (10h),a   ;11    ;write A to the LCD Instruction port (port 16)
     in a,(16) \ rlca \ jr c,$-3
     ld de,12      ;10
     exx           ;4
     ld bc,12      ;10
     exx           ;4
Main:
     exx           ;4
     ld de,plotSScreen ;10
     ld ix,AppBackUpScreen ;14
     exx           ;4
     ld a,20h      ;7
col:
     out (16),a    ;11
     push af       ;11
     ld b,3Fh      ;7
row:
     ld hl,seed
     ld a,r        ;9     ;get a pseudo-random number in A
     add a,(hl)
     ld h,a
     adc a,(hl)
     ld l,a
     adc a,(hl)
     ld (seed),a
     ld h,80h
     and 7         ;7
     ld l,a        ;4
     ld a,(hl)     ;7
     exx           ;4
     ld h,d        ;4
     ld l,e        ;4
     add hl,bc     ;11
     and (hl)      ;7
     or (ix)       ;19
     ld (de),a     ;7
     ex de,hl      ;4
     add ix,bc     ;15
     exx           ;4
     out (17),a    ;11
     djnz row      ;13|8
     exx           ;4
     ld a,(ix)     ;19
     ld h,d        ;4
     ld l,e        ;4
     add hl,bc     ;11
     ld (de),a     ;7     ;DE still points at the bottom row; HL is already one row past the buffer
     add ix,bc     ;15
     ex de,hl      ;4
     dec d         ;4
     out (17),a    ;11
     dec d         ;4
     dec d         ;4
     inc e         ;4
     exx           ;4
     pop af        ;10
     dec ixh       ;8
     dec ixh       ;8
     dec ixh       ;8
     inc ixl       ;8
     inc a         ;4
     cp 2Ch        ;7
     jr nz,col     ;12|7

     in a,(1)      ;11
     and 40h       ;7
     jr nz,Main    ;12|7
     ret           ;10

