4*4 Sprite Routine Optimization/Help

« on: May 04, 2011, 01:42:35 am »
Hi, I just made this 4*4 sprite routine, aligned to a 16*24 (half-byte) grid.
Can anyone help with optimizations or suggestions?

Code: [Select]
;XOR 4*4
;Inputs:
;ix = sprite address
;c = X Position
;b = Y Position
;Outputs:
;a = last line of sprite displayed
;b = 0
;c = X position (undestroyed ^^)
;de = 12
;hl = buffer location of sprite + 12 (next line)
;ix = sprite address + 2 (next sprite?)
;This was made to be a standalone program.
#ifdef bcall(xxxx)
#include ""
.org $9d93
.db t2ByteTok, tAsmCmp
;Just for testing purposes:
        ld ix, x4sprite ;ix holds sprite address
        ld c, 1 ;c holds X location
        ld b, 1 ;b holds Y location
        ld a, b       
        add a, a ;\A * 3
        add a, b ;/
        add a, a ;\A * 4
        add a, a ;/
        ld h, 0 ;A*4 would be too large to fit into 8 bits, so we must use a 16 bit register.
        ld l, a       
        add hl, hl ;\A * 4
        add hl, hl ;/Align Y pos to the X4 grid.
        ld d, 0
        ld e, c       
        srl e ;Now, we are only concerned on the X8 alignment.  We will fine tune it to X4 later.
        add hl, de ;add X pos to Y pos

        ld de, plotsscreen ;Finally, we add the Buffer location.
        add hl, de
        ld b, 2 ;B*2 Pixels tall sprite
        ld e, 12 ;Width of Screen in bytes
_FourDispByte:
        ld a, (ix) ;loads sprite at ix
        and %11110000 ;use left side of byte
        bit 0, c ;If X pos is an odd number, we have to shift to the right.
        jr z, _FourDispN1 ;if X is aligned to the byte, branch.  Else, shift sprite by 4.
        srl a ;shift sprite to the right 4 pixels.
        srl a
        srl a
        srl a
_FourDispN1:
        ld d, (hl)
        xor d ;xor sprite onto current buffer contents
        ld (hl), a
        ld d, 0 ;DE = 12
        add hl, de ;shift pen down one row
        ld a, (ix)
        and %00001111 ;It is on the right side of the byte.
        bit 0, c
        jr nz, _FourDispN2 ;if X is not aligned to the byte, branch.
        add a, a ;shift sprite to the left 4 pixels
        add a, a
        add a, a
        add a, a
_FourDispN2:
        ld d, (hl)
        xor d ;xor sprite onto current buffer contents
        ld (hl), a
        inc ix ;next sprite byte (two more rows)
        ld d, 0
        add hl, de
        djnz _FourDispByte
        ret
;More pseudo random stuff just for testing
x4sprite:
.db %10010110
.db %01101001

Also, Zeda's routines can be found here:

Runer112's routine:

Thanks.  :)
Re: 4*4 Sprite Routine Optimization/Help
« Reply #1 on: May 04, 2011, 10:46:34 pm »
Hmm... It's been 6 hours, right?  (This can also probably count as an update too.)

Here is the current routine:
EDIT: See above.  ;)
Re: 4*4 Sprite Routine Optimization/Help
« Reply #2 on: May 05, 2011, 08:44:03 am »
Okay, this is some code I put together... I was going to use the code from BatLib for displaying 4x6 sprites (for fonts), because it uses nibble data (so that each 4x6 sprite takes 3 bytes), but I messed up when I was trying to optimise it XD Anywho, here it is... I don't even know if it is more or less optimised than your routine, I just wanted to make one, too XD
Code: [Select]
;Inputs:
;     C is the y coordinate (0 to 15)
;     B is the column to draw to (0 to 23)
;     DE points to the font data
;Outputs:
;     A is 1
;     BC is 12
;     DE is incremented by 4 (pointing to next sprite?)
;     HL is incremented by 30h
    ld a,b           ;78
    ld b,0           ;0600
    ld h,b           ;60
    ld l,c           ;69
    add hl,hl        ;29
    add hl,bc        ;09
    add hl,hl        ;29
    add hl,hl        ;29
    add hl,hl        ;29
    add hl,hl        ;29
    rra              ;1F
    ld c,a           ;4F
    push af          ;F5
    add hl,bc        ;09
    ld bc,plotSScreen  ;014093
    add hl,bc        ;09
    ld b,4           ;0604
    pop af           ;F1
    ld a,$F0         ;3EF0
    jr c,RightMask   ;3801
      cpl            ;2F
RightMask:
    ld (asm_flags1),a  ;32118A
DrawTheSprite:
    ld a,(asm_flags1)  ;3A118A
    ld c,(hl)        ;4E
    and c            ;A1
    ld (hl),a        ;77
    ld a,(de)        ;1A
    and $F0          ;E6F0
    bit 0,(iy+asm_flags1)  ;FDCB2146
    jr nz,NoShift    ;2004
      rlca           ;07
      rlca           ;07
      rlca           ;07
      rlca           ;07
NoShift:
    or (hl)          ;B6
    ld (hl),a        ;77
    ld a,b           ;78
    ld bc,12         ;010C00
    add hl,bc        ;09
    ld b,a           ;47
    inc de           ;13
    djnz DrawTheSprite  ;10E2
    ret              ;C9

Re: 4*4 Sprite Routine Optimization/Help
« Reply #3 on: May 05, 2011, 08:24:23 pm »
Thanks!  I will try this later.  :D

EDIT: I found out that AND can take imm8 values!  Saved myself a couple of bytes.  :)
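
A minimal sketch of that saving, for anyone following along. The "before" half is an assumption about how a register-based mask would look, not a quote of the earlier code; the byte and T-state counts are the standard Z80 figures.

```z80
;Before (assumed): mask loaded into a spare register first
        ld d, %11110000 ;2 bytes, 7 T-states
        and d           ;1 byte, 4 T-states
;After: mask given directly as an 8-bit immediate
        and %11110000   ;2 bytes, 7 T-states, and d stays free
```

That works out to one byte and 4 T-states saved per mask.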
Re: 4*4 Sprite Routine Optimization/Help
« Reply #4 on: May 05, 2011, 08:50:31 pm »
Nice :) There are a lot of fun tricks like that >.> As a note, the "cpl" instruction inverts the bits in the "a" register. So if a was 01100100, cpl would change it to 10011011 :)
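
A quick sketch of that in code. The second half is only an illustration of why cpl is handy for nibble masks. cpl also leaves the carry flag untouched, so a carry test made earlier is still valid afterwards.

```z80
        ld a, %01100100
        cpl             ;a = %10011011, every bit inverted
;One constant can give both nibble masks:
        ld a, $F0       ;%11110000, keeps the left nibble when ANDed
        cpl             ;a = $0F = %00001111, keeps the right nibble
```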

Re: 4*4 Sprite Routine Optimization/Help
« Reply #5 on: May 05, 2011, 09:07:07 pm »
Zeda, I moved your routine to its own dedicated page.  ;)

I also updated my first post with your link and the updated code.
Re: 4*4 Sprite Routine Optimization/Help
« Reply #6 on: May 06, 2011, 01:17:34 am »
Sorry it took me so long to get together a routine, but I was busy with some other stuff. My routine is in the style of Xeda's routine. It reads from a 4-byte sprite and uses overwrite logic, and for the most part mimics her routine. Except it's improved. ;D

  • 7 bytes smaller
  • ~160 cycles faster
  • Doesn't use any RAM
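
To make the "overwrite logic" point concrete, here is a rough sketch of the two ways of putting one sprite row into the left nibble of a buffer byte. It assumes hl points at the buffer byte, de points at a sprite row stored in the left nibble, and the sprite's right nibble is zero. It is an illustration, not a copy of either routine.

```z80
;XOR logic: toggles buffer pixels wherever the sprite has a 1
        ld a, (de)      ;sprite row in the left nibble
        xor (hl)        ;combine with what is already in the buffer
        ld (hl), a
;Overwrite logic: erase the target nibble, then OR the sprite in
        ld a, (hl)
        and %00001111   ;clear the left nibble, keep the right one
        ld (hl), a
        ld a, (de)
        or (hl)         ;drop the sprite row into the cleared nibble
        ld (hl), a
```

Drawing the same sprite twice with XOR erases it again, while the overwrite version simply replaces whatever was in that nibble.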

It also allows for drawing to a buffer besides plotSScreen, although I didn't list this as an improvement because hers could easily be modified for this as well. Xeda or anyone else, feel free to grab this routine for your own projects. :)

Code: [Select]
;———————————————————54 bytes———————————————————;
;ENTRY POINT #1: PutSprite4x4
;—> Draws a 4x4 sprite to plotSScreen, aligned to a 24x16 grid
;INPUTS:    a=row (0-15)    c=column (0-23)    de=sprite
;OUTPUTS:   a=0    bc=12    de=sprite+4    hl=((row+4)*48)+(column/2)+plotSScreen
;FLAGS:     S=0  Z=1  H=0  V=0  N=1  C=column mod 2
;—————————————————~620 cycles——————————————————;
;ENTRY POINT #2: PutSprite4x4_AnyBuf
;—> Draws a 4x4 sprite to the specified buffer, aligned to a 24x16 grid
;INPUTS:    a=row (0-15)    c=column (0-23)    de=sprite    hl=buffer
;OUTPUTS:   a=0    bc=12    de=sprite+4    hl=((row+4)*48)+(column/2)+buffer
;FLAGS:     S=0  Z=1  H=0  V=0  N=1  C=column mod 2
;—————————————————~610 cycles——————————————————;
PutSprite4x4:
    ld hl,plotSScreen
PutSprite4x4_AnyBuf:
    push hl
    ld b,a
    add a,a
    add a,b
    add a,a
    add a,a
    ld h,0
    ld l,a
    ld b,h
    add hl,hl
    add hl,hl
    rr c
    add hl,bc
    pop bc
    add hl,bc
    ld bc,12
    ld a,4
__PutSprite4x4_Loop:
    push af
    ld a,(hl)
    jr c,__PutSprite4x4_Loop_AlignLeft
    and %00001111
    ld (hl),a
    ld a,(de)
    jr __PutSprite4x4_Loop_AlignEnd
__PutSprite4x4_Loop_AlignLeft:
    and %11110000
    ld (hl),a
    ld a,(de)
__PutSprite4x4_Loop_AlignEnd:
    or (hl)
    ld (hl),a
    add hl,bc
    inc de
    pop af
    dec a
    jr nz,__PutSprite4x4_Loop
    ret

EDIT: Found a way to save 12 more cycles.
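
The "doesn't use any RAM" point comes from keeping the loop counter and the flags together on the stack instead of in a memory variable. A bare-bones sketch of that pattern, with a hypothetical label name and the drawing code left out:

```z80
        ld a, 4         ;loop counter; ld does not change any flags
SketchLoop:
        push af         ;save counter (a) and flags (f) together
        ;...loop body is free to reuse a here; push did not alter
        ;   the flags, so jr c can still test the incoming carry...
        pop af          ;counter and the saved flags are restored
        dec a           ;sets Z when done, leaves carry alone
        jr nz, SketchLoop
```

Because dec a never touches carry, whatever carry was saved on the first pass is seen again on every later pass.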
Re: 4*4 Sprite Routine Optimization/Help
« Reply #7 on: May 06, 2011, 01:28:31 am »
Ah, looks well optimized!  :)

Xeda or anyone else, feel free to grab this routine for your own projects. :)

Thanks, I will. :)
Re: 4*4 Sprite Routine Optimization/Help
« Reply #8 on: May 06, 2011, 11:06:57 am »
Nice :D I am still working on optimising my routines for BatLib, but I have the one that uses 3 bytes for 4x6 sprites. If I find the time to convert it, I will post it, but I don't think I will have the time :/