• ASM Optimized routines 5 1
Currently:  

Author Topic: ASM Optimized routines  (Read 32892 times)

0 Members and 1 Guest are viewing this topic.

Offline Xeda112358

  • Xombie.
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4543
  • Rating: +715/-6
  • meow :3
    • View Profile
Re: ASM Optimized routines
« Reply #90 on: October 16, 2017, 11:52:22 pm »
I find that doing the sign check before and after is the best method. I did find a few optimizations and noticed that the remainder isn't properly restored to A in the case where the result is negative. Also keep in mind that the routine may fail when C>127.

Code: [Select]
; divide HL by C (HL is signed, C is not)
; output: HL = quotient, A = remainder
divHLbyCs:
    bit 7,h
    push    af
    jr      z,divHLbyCsStart
    xor a \ sub l \ ld l,a
    sbc a,a \ sub h \ ld h,a
divHLbyCsStart:
    xor     a
    ld      b,16
divHLbyCsLoop:
    add     hl,hl
    rla
    cp      c
    jp      c,divHLbyCsNext
    sub     c
    inc     l
divHLbyCsNext:
    djnz    divHLbyCLoop
    ld      b,a
    pop     af
    ld      a,b
    ret     z
    xor a \ sub l \ ld l,a
    sbc a,a \ sub h \ ld h,a
    ld a,c
    sub b
    ret
I changed the output remainder so that it returns the remainder modulo c. So -7/3 returns 2 instead of 1. This means that no bytes nor clock cycles were saved after all :P

Offline JamesV

  • LV5 Advanced (Next: 300)
  • *****
  • Posts: 266
  • Rating: +76/-0
    • View Profile
    • James V's TI-Calculator page
Re: ASM Optimized routines
« Reply #91 on: October 17, 2017, 04:49:18 pm »
Thanks for the optimisations/fixes! I guess I was lucky that I don't actually use the remainder in my program, and also C will only ever be between 1-64 inclusive. I like the negate HL optimisation too, that's neat :)

Offline Xeda112358

  • Xombie.
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4543
  • Rating: +715/-6
  • meow :3
    • View Profile
Re: ASM Optimized routines
« Reply #92 on: July 27, 2018, 07:40:53 pm »
I have two neat routines, pushpop and diRestore. If you want to preserve HL,DE,BC, and AF, then you can just put a call to pushpop at the start of your routine. If you want interrupts to be restored on exiting your routine, you can call diRestore at the top of your code.

They both mess with the stack to insert a return routine on the stack. For example, when your code calls diRestore, then an ret will first return to the code to restore the interrupt status, and then back to the calling routine.

Code: [Select]
pushpop:
;26 bytes, adds 229cc to the calling routine
  ex (sp),hl
  push de
  push bc
  push af
  push hl
  ld hl,pushpopret
  ex (sp),hl
  push hl
  push af
  ld hl,12
  add hl,sp
  ld a,(hl)
  inc hl
  ld h,(hl)
  ld l,a
  pop af
  ret
pushpopret:
  pop af
  pop bc
  pop de
  pop hl
  ret
Code: [Select]
diRestore:
    ex (sp),hl
    push hl
    push af
    ld hl,restoreei
    ld a,r
    jp pe,+_
    dec hl
    dec hl
_:
    di
    inc sp
    inc sp
    inc sp
    inc sp
    ex (sp),hl
    dec sp
    dec sp
    dec sp
    dec sp
    pop af
    ret
restoredi:
    di
    ret
restoreei:
    ei
    ret
Edit: calc84maniac brought up that those 'inc sp' instructions might cause some issues if interrupts fire in between them. The code is modified to disable interrupts first, then mess with the stack.

Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2896
  • Rating: +467/-17
    • View Profile
    • TI-Boy CE
Re: ASM Optimized routines
« Reply #93 on: August 04, 2018, 06:55:30 pm »
32-Bit Endian Swap (eZ80)

Swaps the byte order of a 32-bit value stored in EUHL. Can be adapted to work with other register combinations as well.

Code: [Select]
endianswap32:
    push hl
    ld h,e
    ld e,l
    inc sp
    push hl
    inc sp
    pop hl
    inc sp
    ret
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline Xeda112358

  • Xombie.
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4543
  • Rating: +715/-6
  • meow :3
    • View Profile
Re: ASM Optimized routines
« Reply #94 on: October 16, 2018, 06:33:14 pm »
Here is a decent arctangent routine that works on [0,1), so you'll have to do the work to extend it outside that range. It uses a small lookup table and linear interpolation.
Spoiler For Range Reduction:
atan(-x)=-atan(x) is useful to extend to negative numbers
atan(x)=pi/2-atan(1/x) is useful to extend to inputs with magnitude >=1
Code: [Select]
atan8:
;computes 256*atan(A/256)->A
;56 bytes including the LUT
;min: 246cc
;max: 271cc
;avg: 258.5cc
  rlca
  rlca
  rlca
  ld d,a
  and 7
  ld hl,atan8LUT
  add a,l
  ld l,a
#if (atan8LUT&255)>248    ;this section not included in size/speed totals
  jr nc,$+3               ;can add three bytes, 12cc to max, 11cc to min, and 11.5cc to avg
  inc h
#endif
  ld c,(hl)
  inc hl
  ld a,(hl)
  sub c
  ld e,0
  ex de,hl
  ld d,l
  ld e,a
  sla h \ jr nc,$+3 \ ld l,e
  add hl,hl \ jr nc,$+3 \ add hl,de
  add hl,hl \ jr nc,$+3 \ add hl,de
  add hl,hl \ jr nc,$+3 \ add hl,de
  add hl,hl \ jr nc,$+3 \ add hl,de
  add hl,hl
  add hl,hl
  add hl,hl
;  add hl,hl    ;used in rounding...
  ld a,h
;  rra          ;but doesn't seem to improve the error
  adc a,c
  ret
atan8LUT:
  .db 0,32,63,92,119,143,165,184,201

Offline Xeda112358

  • Xombie.
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4543
  • Rating: +715/-6
  • meow :3
    • View Profile
Re: ASM Optimized routines
« Reply #95 on: March 24, 2019, 11:58:29 am »
32-bit square root:
Code: [Select]
sqrtHLIX:
;Input: HLIX
;Output: DE is the sqrt, AHL is the remainder
;speed: 754+{0,1}+6{0,6}+{0,3+{0,18}}+{0,38}+sqrtHL
;min: 1130
;max: 1266
;avg: 1190.5
;167 bytes

  call sqrtHL
  add a,a
  ld e,a
  jr nc,+_
  inc d
_:

  ld a,ixh
  sll e \ rl d
  add a,a \ adc hl,hl
  add a,a \ adc hl,hl
  sbc hl,de
  jr nc,+_
  add hl,de
  dec e
  .db $FE     ;start of `cp *`
_:
  inc e

  sll e \ rl d
  add a,a \ adc hl,hl
  add a,a \ adc hl,hl
  sbc hl,de
  jr nc,+_
  add hl,de
  dec e
  .db $FE     ;start of `cp *`
_:
  inc e

  sll e \ rl d
  add a,a \ adc hl,hl
  add a,a \ adc hl,hl
  sbc hl,de
  jr nc,+_
  add hl,de
  dec e
  .db $FE     ;start of `cp *`
_:
  inc e

  sll e \ rl d
  add a,a \ adc hl,hl
  add a,a \ adc hl,hl
  sbc hl,de
  jr nc,+_
  add hl,de
  dec e
  .db $FE     ;start of `cp *`
_:
  inc e

;Now we have four more iterations
;The first two are no problem
  ld a,ixl
  sll e \ rl d
  add a,a \ adc hl,hl
  add a,a \ adc hl,hl
  sbc hl,de
  jr nc,+_
  add hl,de
  dec e
  .db $FE     ;start of `cp *`
_:
  inc e

  sll e \ rl d
  add a,a \ adc hl,hl
  add a,a \ adc hl,hl
  sbc hl,de
  jr nc,+_
  add hl,de
  dec e
  .db $FE     ;start of `cp *`
_:
  inc e

sqrt32_iter15:
;On the next iteration, HL might temporarily overflow by 1 bit
  sll e \ rl d      ;sla e \ rl d \ inc e
  add a,a
  adc hl,hl
  add a,a
  adc hl,hl       ;This might overflow!
  jr c,sqrt32_iter15_br0
;
  sbc hl,de
  jr nc,+_
  add hl,de
  dec e
  jr sqrt32_iter16
sqrt32_iter15_br0:
  or a
  sbc hl,de
_:
  inc e

;On the next iteration, HL is allowed to overflow, DE could overflow with our current routine, but it needs to be shifted right at the end, anyways
sqrt32_iter16:
  add a,a
  ld b,a        ;either 0x00 or 0x80
  adc hl,hl
  rla
  adc hl,hl
  rla
;AHL - (DE+DE+1)
  sbc hl,de \ sbc a,b
  inc e
  or a
  sbc hl,de \ sbc a,b
  ret p
  add hl,de
  adc a,b
  dec e
  add hl,de
  adc a,b
  ret
This uses this sqrtHL routine:
Code: [Select]
sqrtHL:
;returns A as the sqrt, HL as the remainder, D = 0
;min: 376cc
;max: 416cc
;avg: 393cc
  ld de,$5040
  ld a,h
  sub e
  jr nc,+_
  add a,e
  ld d,$10
_:
  sub d
  jr nc,+_
  add a,d
  .db $01   ;start of ld bc,** which is 10cc to skip the next two bytes.
_:
  set 5,d
  res 4,d
  srl d

  set 2,d
  sub d
  jr nc,+_
  add a,d
  .db $01   ;start of ld bc,** which is 10cc to skip the next two bytes.
_:
  set 3,d
  res 2,d
  srl d

  inc d
  sub d
  jr nc,+_
  add a,d
  dec d   ;this resets the low bit of D, so `srl d` resets carry.
  .db $06   ;start of ld b,* which is 7cc to skip the next byte.
_:
  inc d
  srl d
  ld h,a


  sbc hl,de
  ld a,e
  jr nc,+_
  add hl,de
_:
  ccf
  rra
  srl d
  rra
  ld e,a

  sbc hl,de
  jr nc,+_
  add hl,de
  .db $01   ;start of ld bc,** which is 10cc to skip the next two bytes.
_:
  or %00100000
  xor %00011000
  srl d
  rra
  ld e,a


  sbc hl,de
  jr nc,+_
  add hl,de
  .db $01   ;start of ld bc,** which is 10cc to skip the next two bytes.
_:
  or %00001000
  xor %00000110
  srl d
  rra
  ld e,a
  sbc hl,de
  jr nc,+_
  add hl,de
  srl d
  rra
  ret
_:
  inc a
  srl d
  rra
  ret

sqrtHL was from my work on the float routines, sqrtHLIX was inspired by this thread :)