Author Topic: ASM Optimized routines  (Read 24055 times)

0 Members and 2 Guests are viewing this topic.

Offline Xeda112358

  • Xombie. I am it.
  • Coder Of Tomorrow
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4429
  • Rating: +696/-6
  • meow :3
    • View Profile
Re: ASM Optimized routines
« Reply #90 on: October 16, 2017, 11:52:22 pm »
I find that doing the sign check before and after is the best method. I did find a few optimizations and noticed that the remainder isn't properly restored to A in the case where the result is negative. Also keep in mind that the routine may fail when C>127.

Code: [Select]
; divide HL by C (HL is signed, C is not)
; output: HL = quotient, A = remainder
divHLbyCs:
    bit 7,h
    push    af
    jr      z,divHLbyCsStart
    xor a \ sub l \ ld l,a
    sbc a,a \ sub h \ ld h,a
divHLbyCsStart:
    xor     a
    ld      b,16
divHLbyCsLoop:
    add     hl,hl
    rla
    cp      c
    jp      c,divHLbyCsNext
    sub     c
    inc     l
divHLbyCsNext:
    djnz    divHLbyCLoop
    ld      b,a
    pop     af
    ld      a,b
    ret     z
    xor a \ sub l \ ld l,a
    sbc a,a \ sub h \ ld h,a
    ld a,c
    sub b
    ret
I changed the output remainder so that it returns the remainder modulo c. So -7/3 returns 2 instead of 1. This means that no bytes nor clock cycles were saved after all :P

Offline JamesV

  • LV5 Advanced (Next: 300)
  • *****
  • Posts: 266
  • Rating: +76/-0
    • View Profile
    • James V's TI-Calculator page
Re: ASM Optimized routines
« Reply #91 on: October 17, 2017, 04:49:18 pm »
Thanks for the optimisations/fixes! I guess I was lucky that I don't actually use the remainder in my program, and also C will only ever be between 1-64 inclusive. I like the negate HL optimisation too, that's neat :)

Offline Xeda112358

  • Xombie. I am it.
  • Coder Of Tomorrow
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4429
  • Rating: +696/-6
  • meow :3
    • View Profile
Re: ASM Optimized routines
« Reply #92 on: July 27, 2018, 07:40:53 pm »
I have two neat routines, pushpop and diRestore. If you want to preserve HL,DE,BC, and AF, then you can just put a call to pushpop at the start of your routine. If you want interrupts to be restored on exiting your routine, you can call diRestore at the top of your code.

They both mess with the stack to insert a return routine on the stack. For example, when your code calls diRestore, then an ret will first return to the code to restore the interrupt status, and then back to the calling routine.

Code: [Select]
pushpop:
;26 bytes, adds 229cc to the calling routine
  ex (sp),hl
  push de
  push bc
  push af
  push hl
  ld hl,pushpopret
  ex (sp),hl
  push hl
  push af
  ld hl,12
  add hl,sp
  ld a,(hl)
  inc hl
  ld h,(hl)
  ld l,a
  pop af
  ret
pushpopret:
  pop af
  pop bc
  pop de
  pop hl
  ret
Code: [Select]
diRestore:
    ex (sp),hl
    push hl
    push af
    ld hl,restoreei
    ld a,r
    jp pe,+_
    dec hl
    dec hl
_:
    di
    inc sp
    inc sp
    inc sp
    inc sp
    ex (sp),hl
    dec sp
    dec sp
    dec sp
    dec sp
    pop af
    ret
restoredi:
    di
    ret
restoreei:
    ei
    ret
Edit: calc84maniac brought up that those 'inc sp' instructions might cause some issues if interrupts fire in between them. The code is modified to disable interrupts first, then mess with the stack.

Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2895
  • Rating: +466/-17
    • View Profile
    • TI-Boy CE
Re: ASM Optimized routines
« Reply #93 on: August 04, 2018, 06:55:30 pm »
32-Bit Endian Swap (eZ80)

Swaps the byte order of a 32-bit value stored in EUHL. Can be adapted to work with other register combinations as well.

Code: [Select]
endianswap32:
    push hl
    ld h,e
    ld e,l
    inc sp
    push hl
    inc sp
    pop hl
    inc sp
    ret
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline Xeda112358

  • Xombie. I am it.
  • Coder Of Tomorrow
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4429
  • Rating: +696/-6
  • meow :3
    • View Profile
Re: ASM Optimized routines
« Reply #94 on: October 16, 2018, 06:33:14 pm »
Here is a decent arctangent routine that works on [0,1), so you'll have to do the work to extend it outside that range. It uses a small lookup table and linear interpolation.
Spoiler For Range Reduction:
atan(-x)=-atan(x) is useful to extend to negative numbers
atan(x)=pi/2-atan(1/x) is useful to extend to inputs with magnitude >=1
Code: [Select]
atan8:
;computes 256*atan(A/256)->A
;56 bytes including the LUT
;min: 246cc
;max: 271cc
;avg: 258.5cc
  rlca
  rlca
  rlca
  ld d,a
  and 7
  ld hl,atan8LUT
  add a,l
  ld l,a
#if (atan8LUT&255)>248    ;this section not included in size/speed totals
  jr nc,$+3               ;can add three bytes, 12cc to max, 11cc to min, and 11.5cc to avg
  inc h
#endif
  ld c,(hl)
  inc hl
  ld a,(hl)
  sub c
  ld e,0
  ex de,hl
  ld d,l
  ld e,a
  sla h \ jr nc,$+3 \ ld l,e
  add hl,hl \ jr nc,$+3 \ add hl,de
  add hl,hl \ jr nc,$+3 \ add hl,de
  add hl,hl \ jr nc,$+3 \ add hl,de
  add hl,hl \ jr nc,$+3 \ add hl,de
  add hl,hl
  add hl,hl
  add hl,hl
;  add hl,hl    ;used in rounding...
  ld a,h
;  rra          ;but doesn't seem to improve the error
  adc a,c
  ret
atan8LUT:
  .db 0,32,63,92,119,143,165,184,201