Author Topic: Assembly Programmers - Help Axe Optimize!  (Read 46252 times)


Offline Quigibo

  • The Executioner
  • CoT Emeritus
  • LV11 Super Veteran (Next: 3000)
  • *
  • Posts: 2031
  • Rating: +1075/-24
  • I wish real life had a "Save" and "Load" button...
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #225 on: July 12, 2011, 02:02:39 am »
Awesome wow!  Yeah, forward djnz is about as rare as cpir.  Although I think calc84maniac's original 4 level routine used them as well but for a different purpose.

Also on the same subject, although you'll be the only one who knows what I'm talking about, all 12 DispGraph forms work perfectly now.
___Axe_Parser___
Today the calculator, tomorrow the world!

Offline Runer112

  • Project Author
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2289
  • Rating: +639/-31
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #226 on: July 12, 2011, 08:48:57 am »
Just checking, have you actually tested the routines out? Because I didn't actually test those routines I gave you, I just modeled them after some routines I knew worked and hoped these would still work as well. :P

Offline Quigibo

Re: Assembly Programmers - Help Axe Optimize!
« Reply #227 on: July 12, 2011, 03:33:09 pm »
Yeah, I tested everything.  One of them had the buffer ordering switched, but I fixed that.

Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2896
  • Rating: +467/-17
    • View Profile
    • TI-Boy CE
Re: Assembly Programmers - Help Axe Optimize!
« Reply #228 on: July 14, 2011, 11:31:27 pm »
Here's a peephole optimization suggestion: Keep track of whether the value in HL is a constant or not, and if so, what constant. For example, I have some code:
Code: [Select]
If condition
do stuff
Else
16->W
End
Obviously, after the Else, HL has to be 0. Thus the 16 can be reduced to a ld l,16 instead of ld hl,16. It might be possible to auto-optimize stuff like 1->A:2->B into 1->A+1->B, but you could always leave that to the user like usual.
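A rough way to picture the HL tracking, sketched in Python rather than in Axe's actual parser code. The helper name `load_hl` and its return format are made up for illustration; the only facts assumed are the standard Z80 sizes (`ld hl,nn` is 3 bytes, `ld l,n` / `ld h,n` are 2 bytes, `inc hl` / `dec hl` are 1 byte).

```python
def load_hl(known, value):
    """Pick the shortest way to get `value` into HL.

    `known` is the 16-bit constant currently known to be in HL,
    or None when HL's contents are unknown.
    Returns (instruction_text, new_known_value).
    """
    value &= 0xFFFF
    if known is None:
        return f"ld hl,{value}", value            # 3 bytes, nothing to reuse
    if value == known:
        return "", value                          # already there, emit nothing
    if value == ((known + 1) & 0xFFFF):
        return "inc hl", value                    # 1 byte
    if value == ((known - 1) & 0xFFFF):
        return "dec hl", value                    # 1 byte
    if (value >> 8) == (known >> 8):
        return f"ld l,{value & 0xFF}", value      # 2 bytes, high byte reused
    if (value & 0xFF) == (known & 0xFF):
        return f"ld h,{value >> 8}", value        # 2 bytes, low byte reused
    return f"ld hl,{value}", value                # 3 bytes
```

With HL known to be 0 after the Else, `load_hl(0, 16)` picks `ld l,16`, and the 1->A:2->B case falls out as `inc hl`.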

Also, I found it a bit annoying that when I did something like If E<(96*256), the part in the parentheses wasn't reduced to a constant before doing the less-than operation. Could the look-ahead parsing be able to detect constants in parentheses?
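To make the request concrete, here is a small Python sketch of folding constant sub-expressions before code generation. It is not how Axe's parser works internally; the function names are hypothetical, only `+`, `-`, `*` and `<` are handled, and it assumes strict left-to-right evaluation with 16-bit wraparound.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z]+|[-+*<()])")
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "<": lambda a, b: int(a < b),
}

def tokenize(src):
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_term(tokens, i):
    """Return (value, next_index); value is an int when fully constant."""
    tok = tokens[i]
    if tok == "(":
        value, i = parse_expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("missing )")
        if isinstance(value, int):
            return value, i + 1          # constant group: parentheses vanish
        return f"({value})", i + 1       # non-constant group: keep as text
    if tok.isdigit():
        return int(tok), i + 1
    return tok, i + 1                    # variable name

def parse_expr(tokens, i):
    left, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in OPS:
        op = tokens[i]
        right, i = parse_term(tokens, i + 1)
        if isinstance(left, int) and isinstance(right, int):
            left = OPS[op](left, right) & 0xFFFF   # fold, 16-bit wrap
        else:
            left = f"{left}{op}{right}"            # leave for runtime
    return left, i

def fold(src):
    value, _ = parse_expr(tokenize(src), 0)
    return str(value)
```

Under those assumptions `fold("E<(96*256)")` gives `E<24576`, so the comparison would see a plain constant on its right-hand side.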
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline Runer112

Re: Assembly Programmers - Help Axe Optimize!
« Reply #229 on: July 14, 2011, 11:34:17 pm »
Quote from: calc84maniac on July 14, 2011, 11:31:27 pm
Also, I found it a bit annoying that when I did something like If E<(96*256), the part in the parentheses wasn't reduced to a constant before doing the less-than operation. Could the look-ahead parsing be able to detect constants in parentheses?

This times a million.

Offline ztrumpet

  • The Rarely Active One
  • CoT Emeritus
  • LV13 Extreme Addict (Next: 9001)
  • *
  • Posts: 5712
  • Rating: +364/-4
  • If you see this, send me a PM. Just for fun.
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #230 on: July 14, 2011, 11:35:45 pm »
Quote from: Runer112 on July 14, 2011, 11:34:17 pm
Quote from: calc84maniac on July 14, 2011, 11:31:27 pm
Also, I found it a bit annoying that when I did something like If E<(96*256), the part in the parentheses wasn't reduced to a constant before doing the less-than operation. Could the look-ahead parsing be able to detect constants in parentheses?

This times a million.

This times a million and five.

Seriously, I thought Axe did this already.  Apparently not... so, please? :D
« Last Edit: July 14, 2011, 11:35:53 pm by ztrumpet »

Offline Runer112

Re: Assembly Programmers - Help Axe Optimize!
« Reply #231 on: July 25, 2011, 11:16:41 am »
Quigibo, you read my mind. I was about to make a post with code for commands that deal with archived variables to work with variables in RAM too, but you added that in Axe 1.0.2 before I could finish! However, I'll make a post anyways because my p_GetArc routine is smaller. :P I also have a few other things.




p_GetArc: 7 bytes smaller.

Code: (Old code: 76 bytes) [Select]
p_GetArc:
.db __GetArcEnd-1-$
push de
MOV9TOOP1()
B_CALL(_ChkFindSym)
jr c,__GetArcFail
dec b
inc b
jr z,__GetArcRam
B_CALL(_IsFixedName)
ld hl,9
jr z,__GetArcName
__GetArcStatic:
ld l,12
and %00011111
jr z,__GetArcDone
cp l
jr z,__GetArcDone
ld l,14
jr __GetArcDone
__GetArcName:
add hl,de
bit 7,h
jr z,$+7
res 7,h
set 6,h
inc b
B_CALL(_LoadDEIndPaged)
ld d,0
inc e
inc e
__GetArcDone:
add hl,de
ex de,hl
__GetArcStore:
pop hl
ld (hl),e
inc hl
ld (hl),d
inc hl
ld (hl),b
ex de,hl
ret
__GetArcRam:
and %00011111
jr z,__GetArcStore
cp CplxObj
jr z,__GetArcStore
inc de
inc de
jr __GetArcStore
__GetArcFail:
ld hl,0
pop de
ret
__GetArcEnd:
       
   
Code: (New code: 69 bytes) [Select]
p_GetArc:
.db __GetArcEnd-1-$
push de
MOV9TOOP1()
B_CALL(_ChkFindSym)
jr c,__GetArcFail
dec b
inc b
jr z,__GetArcRam
B_CALL(_IsFixedName)
ld hl,9
jr z,__GetArcName
ld l,12
__GetArcChkFloat:
and %00011111
jr z,__GetArcDone
cp CplxObj
jr z,__GetArcDone
inc l
inc l
jr __GetArcDone
__GetArcName:
add hl,de
bit 7,h
jr z,$+7
res 7,h
set 6,h
inc b
B_CALL(_LoadDEIndPaged)
ld d,0
inc e
inc e
__GetArcDone:
add hl,de
ex de,hl
pop hl
ld (hl),e
inc hl
ld (hl),d
inc hl
ld (hl),b
ex de,hl
ret
__GetArcRam:
ld h,b
ld l,b
jr __GetArcChkFloat
__GetArcFail:
ld hl,0
pop de
ret
__GetArcEnd:
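The 7 bytes come from letting the RAM path reuse the archive path's type check. A small Python model of that shared tail, using only values visible in the listings above (base 0 for RAM via `ld h,b` / `ld l,b` with B = 0, base 12 for the archived case, and CplxObj = 12 as implied by the old code's `cp l` with L = 12). The function name is made up, and the reading that the extra 2 bytes skip a size field is my assumption.

```python
REAL_OBJ = 0    # `and %00011111` leaves zero for this type
CPLX_OBJ = 12   # the old code compared against L = 12 here

def data_offset(base, type_byte):
    """Offset added to DE by the shared __GetArcChkFloat tail."""
    obj_type = type_byte & 0b00011111      # and %00011111
    if obj_type in (REAL_OBJ, CPLX_OBJ):   # jr z / cp CplxObj / jr z
        return base
    return base + 2                        # inc l / inc l
```

So the RAM case yields 0 or 2 and the archived case 12 or 14, matching what the two separate checks in the old routine produced.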
       




p_ReadArc: Bumping an old request for larger but drastically faster archive reading routines. The routines would need to be modified slightly to allow for reading from RAM as well, but that should be no problem. I would understand if you didn't want to add the app version, but the program version is immensely better in my opinion.

And on the topic of stuff that involves port 6, I think it would be nice if the archive byte reading routine avoided using a B_CALL for a massive speed boost, especially for code compiled as programs:

p_ReadArc: 18 bytes (2x) larger, but ~1400 cycles (!!!10x!!!) faster

Code: (36 bytes, ~142 cycles) [Select]
p_ReadArc:
.db __ReadArcEnd-1-$
ld c,a
in a,(6)
ld b,a
ld a,h
set 6,h
res 7,h
rlca
rlca
dec a
and %00000011
add a,c
out (6),a
ld c,(hl)
inc hl
bit 7,h
jr z,__ReadArcNoBoundary
set 6,h
res 7,h
inc a
out (6),a
__ReadArcNoBoundary:
ld l,(hl)
ld h,c
ld a,b
out (6),a
ret
__ReadArcEnd:
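For anyone following the bit twiddling at the top of the routine, here is a Python model of just the page/address arithmetic (no port access). The function names are invented for this sketch; the mapping is read directly off the instructions in the listing.

```python
def normalize(page, addr):
    """Model of the pointer set-up in p_ReadArc.

    `page` is the value passed in A, `addr` the 16-bit pointer in HL.
    Returns (page to write to port 6, address inside $4000-$7FFF).
    """
    bank = (addr >> 14) & 3                    # rlca / rlca: top two bits of H
    page = (page + ((bank - 1) & 3)) & 0xFF    # dec a / and %00000011 / add a,c
    addr = (addr & 0x3FFF) | 0x4000            # set 6,h / res 7,h
    return page, addr

def advance(page, addr):
    """Model of stepping to the second byte, including the page boundary."""
    addr = (addr + 1) & 0xFFFF                 # inc hl
    if addr & 0x8000:                          # bit 7,h: ran past $7FFF
        addr = (addr & 0x3FFF) | 0x4000        # set 6,h / res 7,h
        page = (page + 1) & 0xFF               # inc a
    return page, addr
```

For example, a pointer of $8005 with page $10 becomes page $11 at $4005, and stepping from $7FFF wraps to $4000 on the next page.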

p_ReadArcApp: 36 bytes (3x) larger, but ~1050 cycles (4x) faster

Code: (54 bytes, ~396 cycles) [Select]
p_ReadArcApp:
.db __ReadArcAppEnd-1-$
push hl
ld hl,$0000
ld de,ramCode
ld bc,__ReadArcAppRamCodeEnd-__ReadArcAppRamCode
ldir
pop hl
ld e,a
ld c,6
in b,(c)
ld a,h
set 6,h
res 7,h
rlca
rlca
dec a
and %00000011
add a,e
call ramCode
ld e,d
inc hl
bit 7,h
jr z,__ReadArcAppNoBoundary
set 6,h
res 7,h
inc a
__ReadArcAppNoBoundary:
call ramCode
ex de,hl
ret
__ReadArcAppEnd:
.db rp_Ans,__ReadArcAppEnd-p_ReadArcApp-3

__ReadArcAppRamCode:
out (6),a
ld d,(hl)
out (c),b
ret
__ReadArcAppRamCodeEnd:




p_CopyArc: Modified to allow for sources in RAM.

Code: (Old code: 22 bytes) [Select]
p_CopyArc:
.db __CopyArcEnd-1-$
pop ix
pop de
ex (sp),hl
ld b,a
ld a,h
rlca
rlca
dec a
and %00000011
add a,b
set 6,h
res 7,h
pop bc
B_CALL(_FlashToRAM)
jp (ix)
__CopyArcEnd:
       
   
Code: (New code: 28 bytes) [Select]
p_CopyArc:
.db __CopyArcEnd-1-$
ex (sp),hl
pop bc
pop de
ex (sp),hl
or a
jr z,__CopyArcRam
push bc
ld b,a
ld a,h
rlca
rlca
dec a
and %00000011
add a,b
set 6,h
res 7,h
pop bc
B_CALL(_FlashToRAM)
ret
__CopyArcRam:
ldir
ret
__CopyArcEnd:
       




Also, I'm not sure why I just realized this now, but why don't the 8-bit logic operations on variables just load the variable into a instead of de to save 2 bytes?



« Last Edit: July 25, 2011, 02:20:48 pm by Runer112 »

Offline Quigibo

Re: Assembly Programmers - Help Axe Optimize!
« Reply #232 on: July 26, 2011, 06:04:13 am »
Hmm, I'm still not sure if the extra speed is worth the size increase.  I guess a new argument for the speed is that it makes file reads more consistent (a program using a file from archive might run slower than one reading from RAM). But I will put this up in the poll since I'd like to know how many people this would benefit or hurt.

I don't do that optimization for the 8-bit logical operators because then I'd need duplicate commands and even more special casing.  This is something that can easily be peephole optimized in the future, however, so it might become a non-issue.

I was trying to recursively parse constants in parentheses in the last update, but it was extremely complicated so I gave up.  I will have to modify the core number reading system to get it to work (which I was planning to do eventually anyway) so I will get to it then.

Offline Quigibo

Re: Assembly Programmers - Help Axe Optimize!
« Reply #233 on: July 27, 2011, 02:47:08 pm »
Wait.... Runer!  What were you thinking!  The App code for file reading can be the same as the program code, but just use port 7 instead of port 6 and set the high bits of hl for the $8000-$BFFF range.  That's what the Axe app does. :)

EDIT: Also, another thing the routines would need is to disable interrupts and then restore them afterwards... which I can use the "Safety" code for, but it's going to be slower and even larger.
« Last Edit: July 27, 2011, 02:49:59 pm by Quigibo »

Offline Runer112

Re: Assembly Programmers - Help Axe Optimize!
« Reply #234 on: July 27, 2011, 10:47:25 pm »
The app version might need to disable interrupts, but why would the program version need to? Both Axe's and the OS's interrupt handlers back up the page in the $4000-$7FFF bank and restore it upon returning.

Offline Xeda112358

  • Xombie.
  • Moderator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 4543
  • Rating: +715/-6
  • meow :3
    • View Profile
Re: Assembly Programmers - Help Axe Optimize!
« Reply #235 on: August 11, 2011, 01:10:51 pm »
I was toying around with some math routines while I was away and I was curious about the square root algorithms. Are they designed to return the square root rounded down, up, or to the nearest integer? If it is rounded down and you want the nearest integer instead, here is some code I made a while ago (it isn't even close to what Axe needs, so take it only as an example):
Code: [Select]
;===============================================================
sqrtE:
;===============================================================
;Input:
;     E is the value to find the square root of
;Outputs:
;     A is E-D^2
;     B is 0
;     D is the rounded result
;     E is not changed
;     HL is not changed
;Destroys:
;     C
;
        xor a               ;1      4         4
        ld d,a              ;1      4         4
        ld c,a              ;1      4         4
        ld b,4              ;2      7         7
sqrtELoop:
        rlc d               ;2      8        32
        ld c,d              ;1      4        16
        scf                 ;1      4        16
        rl c                ;2      8        32

        rlc e               ;2      8        32
        rla                 ;1      4        16
        rlc e               ;2      8        32
        rla                 ;1      4        16

        cp c                ;1      4        16
        jr c,$+4            ;4    12|15      48+3x
          inc d             ;--    --        --
          sub c             ;--    --        --
        djnz sqrtELoop      ;2    13|8       47
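;Round up only when A > D (A = E-floor^2): test A >= D+1, so E=0 and
;E=D^2+D (2, 6, 12, ...) round down instead of up.
        inc d               ;1      4         4
        cp d                ;1      4         4
        jr nc,$+3           ;3    12|11     12|11
          dec d             ;--    --        --
        ret                 ;1     10        10
;===============================================================
;Size  : 30 bytes
;Speed : 351+3x cycles plus 1 if rounded up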
;   x is the number of set bits in the result.
;===============================================================

The only reason I mention this is that I know a lot of graphical algorithms would give better results if the square root were rounded to the nearest integer rather than always rounded up or down.
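The rounding rule itself is easy to check in a few lines of Python. This is only a model of the idea (a bit-pair integer square root followed by a remainder test), not a translation of any Axe routine, and the function name is made up. Since (r + 1/2)^2 = r^2 + r + 1/4, the floor root r should be bumped only when the remainder e - r^2 is strictly greater than r.

```python
def isqrt_rounded(e):
    """Square root of an 8-bit value, rounded to the nearest integer."""
    root, rem = 0, 0
    for shift in (6, 4, 2, 0):
        rem = (rem << 2) | ((e >> shift) & 3)   # bring down the next two bits
        trial = (root << 2) | 1                 # 4*root + 1
        root <<= 1
        if rem >= trial:
            rem -= trial
            root |= 1
    # Here root = floor(sqrt(e)) and rem = e - root*root.
    if rem > root:                              # strictly greater: round up
        root += 1
    return root
```

Note the comparison has to be strict: with `rem >= root`, inputs such as 0, 2 and 6 would come out one too high.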

Sorry if this was already covered and I missed it :/
Spoiler For Spoiler:
EDIT:
Wow, I just did something I didn't think was even possible. I found a good use for a forward djnz. *.*
>.> Hehe, I use forward djnz in many-- if not most-- of my programs... It is one of the most useful tricks I use and is kind of my signature touch :) I use it to save time and memory a lot, especially in instances like this:
Code: [Select]
     ld b,a
     or a \ jr nz,Next1
       ;code
Next1:
     djnz Next2
       ;code
Next2:
     djnz Next3
       ;code
;...et cetera
* Xeda112358 loves it
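For readers who haven't seen the trick, here is a Python model of what that djnz chain does with the selector in A. It assumes, as in the snippet, that each case body falls through to the next label and leaves B alone; `dispatch` and `n_cases` are names invented for this sketch.

```python
def dispatch(a, n_cases):
    """Return the case numbers whose bodies run for selector `a` (0-255)."""
    ran = []
    b = a & 0xFF                 # ld b,a
    if b == 0:                   # or a \ jr nz,Next1
        ran.append(0)            # case 0 body, then falls into Next1
    for case in range(1, n_cases):
        b = (b - 1) & 0xFF       # djnz: decrement B, jump ahead if nonzero
        if b == 0:               # fell through: this case's body runs
            ran.append(case)
    return ran
```

Each `djnz` costs 2 bytes and handles both the decrement and the test, and once one body has run B wraps to 255, so with fewer than 256 cases no later body can trigger.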

Offline Runer112

Re: Assembly Programmers - Help Axe Optimize!
« Reply #236 on: August 11, 2011, 04:29:36 pm »
All of Axe math simply truncates, so I think the current square root algorithm is pretty good. Anyways you have to remember that Axe uses 16-bit math and that's an 8-bit square root function. :P

Offline Xeda112358

Re: Assembly Programmers - Help Axe Optimize!
« Reply #237 on: August 16, 2011, 07:39:41 pm »
Yeah, I know, but I just wanted to give an example. It is really only the last few bytes that are important, though, and I wanted to give a simple, easy to follow example. Also, great job with the optimisations :D I wish I could help more, but most of the codes are a bit beyond my optimisation abilities.

Offline Runer112

Re: Assembly Programmers - Help Axe Optimize!
« Reply #238 on: August 30, 2011, 02:45:57 am »
I think I just performed the most ridiculous, impressive optimization I've ever performed on an Axe command. 27 bytes optimized down to about 63% of its size: 17 bytes! :w00t:

Code: (Old code: 27 bytes, ~220.5 cycles) [Select]
p_DKeyVar:
.db __DKeyVarEnd-1-$
dec l
ld a,l
rra
rra
rra
and %00000111
inc a
ld b,a
ld a,%01111111
rlca
djnz $-1
ld h,a
ld a,l
and %00000111
inc a
ld b,a
ld a,%10000000
rlca
djnz $-1
ld l,a
ret
__DKeyVarEnd:
       
   
Code: (New code: 17 bytes, ~225 cycles) [Select]
p_DKeyVar:
.db __DKeyVarEnd-1-$
ld a,l
ld hl,%0111111111110111
rlc h
adc a,l
jr c,$-3
ld l,%00000001
rrc l
inc a
jr nz,$-3
ret
__DKeyVarEnd:
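Since the new routine is hard to follow at a glance, here is a Python model of both versions that mirrors them instruction by instruction and lets you compare their outputs. It assumes key codes 1 through 64 and models only the registers involved; the function names are made up for this sketch.

```python
def rol8(x):
    return ((x << 1) | (x >> 7)) & 0xFF

def ror8(x):
    return ((x >> 1) | (x << 7)) & 0xFF

def dkey_old(key):
    low = (key - 1) & 0xFF                     # dec l
    group = 0x7F                               # ld a,%01111111
    for _ in range(((low >> 3) & 7) + 1):      # rlca / djnz $-1
        group = rol8(group)
    mask = 0x80                                # ld a,%10000000
    for _ in range((low & 7) + 1):             # rlca / djnz $-1
        mask = rol8(mask)
    return group, mask                         # (H, L)

def dkey_new(key):
    a, h, l = key & 0xFF, 0x7F, 0xF7           # ld a,l / ld hl,%0111111111110111
    while True:
        carry = h >> 7                         # rlc h: old bit 7 goes to carry
        h = rol8(h)
        total = a + l + carry                  # adc a,l: subtracts 9, then 8s
        a = total & 0xFF
        if total <= 0xFF:                      # jr c,$-3 loops while carry set
            break
    mask = 0x01                                # ld l,%00000001
    while True:
        mask = ror8(mask)                      # rrc l
        a = (a + 1) & 0xFF                     # inc a
        if a == 0:                             # jr nz,$-3
            break
    return h, mask                             # (H, L)
```

The first loop subtracts 9 once and then 8 per pass (the carry out of `rlc h` supplies the difference) while rotating the group mask, and the negative remainder left in A then counts the rotations of the key bit.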
       
« Last Edit: August 30, 2011, 02:53:45 am by Runer112 »

Offline Quigibo

Re: Assembly Programmers - Help Axe Optimize!
« Reply #239 on: August 30, 2011, 03:02:02 am »
O_O  I don't even understand what's going on here.  That's quite impressive!

EDIT: Also, a really obvious optimization I just noticed is that the return should be replaced by a jump to the direct key command so it doesn't have to return and re-call it.  :P
« Last Edit: August 30, 2011, 03:06:23 am by Quigibo »