Author Topic: Assembly Programmers - Help Axe Optimize! (Read 145160 times)

Runer112 · « **Reply #75 on:** September 22, 2010, 06:42:19 pm »

And how would the parser decide to do that?

Quigibo · « **Reply #76 on:** September 22, 2010, 06:48:29 pm »

The problem with conditional "short-circuit evaluation" is that it has to do a lot of non-linear "look-ahead" parsing to determine if it's okay to get out of the statement early or not. You might for example have If A≠5 and sub(EQL,B,C) which might need to evaluate the second expression even if the first one is false. The idea definitely sounds good though, but it seems like it would be really complicated for the compiler to tell whether or not it can actually use that optimization and be completely compatible with previous versions. And even when it can, I would have to write completely new block code and assembly templates for those conditionals.
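To illustrate the compatibility worry, here is a minimal sketch in Python rather than Axe. The `eql` function is a hypothetical stand-in for a subroutine like `sub(EQL,B,C)` that has a side effect. It only shows that skipping the second operand changes what the program does, not how Axe would implement it.

```python
# Hypothetical stand-in for an Axe subroutine call with a side effect.
calls = []

def eql(b, c):
    calls.append((b, c))   # side effect: records that the routine ran
    return b == c

def eager_and(x, y):
    # Both operands are already evaluated by the time this runs,
    # which models the "evaluate everything" behaviour.
    return bool(x) and bool(y)

a, b, c = 5, 1, 1

# Eager: eql runs even though a != 5 is already false.
calls.clear()
eager_result = eager_and(a != 5, eql(b, c))
eager_calls = len(calls)

# Short-circuit: Python's own `and` skips eql when the left side is false.
calls.clear()
short_result = (a != 5) and eql(b, c)
short_calls = len(calls)

print(eager_result, eager_calls)   # False 1
print(short_result, short_calls)   # False 0
```

Both forms give the same truth value, but only the eager one runs the subroutine. That difference is what a compiler would have to rule out before applying the optimization.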

Runer112 · « **Reply #77 on:** October 20, 2010, 08:56:30 am »

Oops, necropost, oh well

I don't know if this approach was purposely left out, as it's 15 bytes larger than the current routine and sometimes slower. I'm referring to the square root routine. Whereas the current routine (14 bytes) takes 37n+38 T-states (linear time), where n is the result+1 (1-256), the following routine (29 bytes) takes 5n+800 T-states (near constant time), where n is the number of set bits in the result (0-8). The existing routine is faster for values that would yield results of 0-19, but this routine would be faster for values that would yield results of 20-255, which is a much broader range of the 8-bit spectrum. Also, it would be much more reliable to run at a near constant speed in programs which rely on that to run smoothly themselves. The existing routine would take only a few hundred T-states for low inputs, but would take up to OVER NINE THOUSAND T-states to calculate the square roots for the highest inputs. So it's up to you if this is something you want to use.
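The crossover point can be checked with a few lines of arithmetic. This sketch just plugs the two cost formulas quoted above into Python. The function names are made up for illustration, and the formulas are taken as given rather than re-derived from the instruction timings.

```python
def old_cost(result):
    # Linear routine: 37n + 38 with n = result + 1 (figures from the post).
    return 37 * (result + 1) + 38

def new_cost(result):
    # Shift-and-subtract routine: 5n + 800 with n = set bits in the result.
    return 5 * bin(result).count("1") + 800

# First result for which the new routine is at least as fast.
crossover = next(r for r in range(256) if new_cost(r) <= old_cost(r))
# Results where the old routine is still strictly faster.
old_wins = [r for r in range(256) if old_cost(r) < new_cost(r)]

print(crossover)                      # 20
print(old_wins == list(range(20)))    # True
print(old_cost(255), new_cost(255))   # 9510 840
```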

Code: [Select]

p_Sqrt:
 .db __SqrtEnd-1-$
 ld a,l
 ld l,h
 ld de,$0040
 ld h,d
 ld b,8
 or a
__SqrtLoop:
 sbc hl,de
 jr nc,__SqrtSkip
 add hl,de
__SqrtSkip:
 ccf
 rl d
 rla
 adc hl,hl
 rla
 adc hl,hl
 djnz __SqrtLoop
 ld h,0
 ld l,d
 ret
__SqrtEnd:
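For anyone who wants to see the idea without reading Z80, the routine appears to be the classic restoring shift-and-subtract square root: bring down two bits of the input per step, try to subtract a trial value, and shift the outcome into the result. The sketch below is a plain Python model of that method under that assumption. It is not a register-exact translation, and `isqrt16` is a name made up here.

```python
import math

def isqrt16(x):
    """Integer square root of a 16-bit value, one result bit per step."""
    root = 0   # result built up bit by bit (8 bits for a 16-bit input)
    rem = 0    # running remainder
    for _ in range(8):
        # Bring down the next two most significant bits of the input.
        rem = (rem << 2) | ((x >> 14) & 3)
        x = (x << 2) & 0xFFFF
        trial = (root << 2) | 1   # 4*root + 1
        root <<= 1
        if rem >= trial:          # subtraction succeeds: result bit is 1
            rem -= trial
            root |= 1
    return root

print(isqrt16(9), isqrt16(10000), isqrt16(65535))   # 3 100 255
print(all(isqrt16(v) == math.isqrt(v) for v in range(65536)))   # True
```

The loop always runs eight times regardless of the input, which is where the near-constant running time comes from.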

Runer112 · « **Reply #78 on:** October 30, 2010, 08:11:33 pm »

I think it's been long enough that I can safely double post

Bit routine optimizations! Please tell me if any of these wouldn't work correctly, as I wrote them myself and I'm not a terribly experienced assembly programmer so that's a definite possibility.

Code: (Current code) [Select]

p_GetBit0:
 .db 5  ;5 bytes, 36 T-states
 add hl,hl
 ccf
 sbc hl,hl
 inc hl


p_GetBit1:
 .db 6  ;6 bytes, 47 T-states
 add hl,hl
 add hl,hl
 ccf
 sbc hl,hl
 inc hl


p_GetBit2:
 .db 7  ;7 bytes, 58 T-states
 add hl,hl
 add hl,hl
 add hl,hl
 ccf
 sbc hl,hl
 inc hl


p_GetBit6:
 .db 7  ;7 bytes, 37 T-states
 ld a,h
 rra
 rra
 ccf
 sbc hl,hl
 inc hl

p_GetBit7:
 .db 6  ;6 bytes, 33 T-states
 rr h
 ccf
 sbc hl,hl
 inc hl

p_GetBit8:
 .db 6  ;6 bytes, 33 T-states
 rl l
 ccf
 sbc hl,hl
 inc hl


p_GetBit9:
 .db 7  ;7 bytes, 37 T-states
 ld a,l
 rla
 rla
 ccf
 sbc hl,hl
 inc hl

p_GetBit10:
 .db 8  ;8 bytes, 30/29 T-states
 bit 5,l
 ld hl,0
 jr z,$+3
 inc l




p_GetBit14:
 .db 7  ;7 bytes, 37 T-states
 ld a,l
 rra
 rra
 ccf
 sbc hl,hl
 inc hl

p_GetBit15:
 .db 6  ;6 bytes, 33 T-states
 rr l
 ccf
 sbc hl,hl
 inc hl

Code: (Optimized code) [Select]

p_GetBit0:
 .db 5  ;5 bytes, 27 T-states
 xor a
 add hl,hl
 ld h,a
 rla
 ld l,a

p_GetBit1:
 .db 6  ;6 bytes, 38 T-states
 xor a
 add hl,hl
 add hl,hl
 ld h,a
 rla
 ld l,a

p_GetBit2:
 .db 7  ;7 bytes, 49 T-states
 xor a
 add hl,hl
 add hl,hl
 add hl,hl
 ld h,a
 rla
 ld l,a

p_GetBit6:
 .db 7  ;7 bytes, 26 T-states
 ld a,%00000010
 and h
 rrca
 ld h,0
 ld l,a


p_GetBit7:
 .db 6  ;6 bytes, 22 T-states
 ld a,%00000001
 and h
 ld h,0
 ld l,a

p_GetBit8:
 .db 5  ;5 bytes, 27 T-states
 xor a
 ld h,a
 add hl,hl
 ld l,h
 ld h,a

p_GetBit9:
 .db 6  ;6 bytes, 38 T-states
 xor a
 add hl,hl
 ld h,a
 add hl,hl
 ld l,h
 ld h,a

p_GetBit10:
 .db 7  ;7 bytes, 49 T-states
 xor a
 add hl,hl
 add hl,hl
 ld h,a
 add hl,hl
 ld l,h
 ld h,a

p_GetBit14:
 .db 7  ;7 bytes, 26 T-states
 ld a,%00000010
 and l
 rrca
 ld h,0
 ld l,a


p_GetBit15:
 .db 5  ;5 bytes, 20 T-states
 xor a
 ld h,a
 inc a
 and l
 ld l,a

Other optimizations:

The signed less than zero comparison (p_SLT0) can be optimized to the optimized p_GetBit0 above.
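A quick way to sanity-check routines like these is to model the register operations and compare against a reference over all 65536 inputs. The sketch below does that in Python for four of the optimized routines. It assumes the numbering the routines imply (bit 0 is the most significant bit), and the function names are made up for the model.

```python
def get_bit(hl, n):
    # Reference: bit 0 is the most significant bit of the 16-bit value.
    return (hl >> (15 - n)) & 1

def opt_getbit0(hl):
    a = 0                            # xor a
    carry = hl >> 15                 # add hl,hl: carry = old top bit
    h = a                            # ld h,a
    a = ((a << 1) | carry) & 0xFF    # rla
    return (h << 8) | a              # ld l,a

def opt_getbit6(hl):
    a = 0b00000010 & (hl >> 8)         # ld a,%00000010 / and h
    a = ((a >> 1) | (a << 7)) & 0xFF   # rrca
    return a                           # ld h,0 / ld l,a

def opt_getbit8(hl):
    a = 0                      # xor a
    hl &= 0x00FF               # ld h,a
    hl = (hl << 1) & 0xFFFF    # add hl,hl
    l = hl >> 8                # ld l,h
    return (a << 8) | l        # ld h,a

def opt_getbit15(hl):
    a = 0                # xor a
    h = a                # ld h,a
    a += 1               # inc a
    a &= hl & 0xFF       # and l
    return (h << 8) | a  # ld l,a

models = {0: opt_getbit0, 6: opt_getbit6, 8: opt_getbit8, 15: opt_getbit15}
all_ok = all(f(v) == get_bit(v, n) for n, f in models.items() for v in range(65536))
# The p_SLT0 claim: "signed less than zero" is just the top bit.
slt0_ok = all(opt_getbit0(v) == (1 if v >= 0x8000 else 0) for v in range(65536))

print(all_ok, slt0_ok)   # True True
```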

Runer112 · « **Reply #79 on:** November 10, 2010, 02:38:37 am »

Signed greater than comparison:

Code: (Current code) [Select]

p_SIntGt:
 .db 13  ;13 bytes, 58 T-states
 ex de,hl
 xor a
 ld b,h
 sbc hl,de
 ld h,a
 rra
 xor b
 xor d
 rlca
 and 1
 ld l,a

Code: (Optimized code) [Select]

p_SIntGt:
 .db 12  ;12 bytes, 67 T-states
 ld bc,$8000
 add hl,bc
 ex de,hl
 add hl,bc
 xor a
 sbc hl,de
 ld h,a
 rla
 ld l,a

You getting all this Quigibo?

DJ Omnimaga · « **Reply #80 on:** November 10, 2010, 02:40:58 am »

I think he is too busy, which might explain why he doesn't respond.

* DJ Omnimaga hopes his school schedule doesn't get so drastic that he gets forced to quit the community for good... I am not too worried about the future of Axe programming, though. I was worried that if he became less active, there would be less activity in his sub-forum since he replied to a lot of help topics, but then activity still continued. I guess a huge thanks to you and a bunch of other people is in order. Sadly, having quit programming a while ago, I did not really participate much, though X.x

Quigibo · « **Reply #81 on:** November 10, 2010, 07:55:16 pm »

Yeah, I'm still reading all of this. Even though I'm less active, I still visit just about every day.

I've even been able to do a little more progress with Axe even with my busy schedule.

Runer112, are you sure that comparison is correct? It seems like all it does is just change the high order bit before doing the subtraction. It needs to check if the parity changed in that bit before and after the subtraction. I actually already have plans to optimize this since I will be able to use the parity/overflow flag once I get relative jump replacement working with the axioms (so I can carry that feature over to the built-in commands).

calc84maniac · « **Reply #82 on:** November 10, 2010, 09:20:06 pm »

Changing the high order bit does work, actually. It changes a comparison in the -32768 to 32767 range to a comparison in the 0 to 65535 range (effectively changing from a signed comparison to an unsigned comparison).
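A small Python sketch of that range shift, assuming 16-bit two's complement values. The helper names are made up for illustration, and the sample values are arbitrary.

```python
def to_signed(x):
    # Interpret a 16-bit pattern as a two's complement integer.
    return x - 0x10000 if x & 0x8000 else x

def signed_gt_via_bias(hl, de):
    # Flip the top bit of both operands (add $8000), then compare as unsigned.
    return ((hl + 0x8000) & 0xFFFF) > ((de + 0x8000) & 0xFFFF)

samples = [0x0000, 0x0001, 0x2000, 0x6000, 0x7FFF,
           0x8000, 0xA000, 0xE000, 0xFFFF]
bias_ok = all(
    signed_gt_via_bias(a, b) == (to_signed(a) > to_signed(b))
    for a in samples for b in samples
)

print(bias_ok)   # True
```

After the shift, -32768 maps to 0 and 32767 maps to 65535, so the ordering is preserved and a plain carry check after the subtraction is enough.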

Runer112 · « **Reply #83 on:** November 10, 2010, 11:55:08 pm »

Quote from: calc84maniac on November 10, 2010, 09:20:06 pm

Changing the high order bit does work, actually. It changes a comparison in the -32768 to 32767 range to a comparison in the 0 to 65535 range (effectively changing from a signed comparison to an unsigned comparison).

Yup

This is the only signed comparison for which this method is better though.

Do all the bit optimizations look correct by the way?

EDIT: If you plan on optimizing the signed comparisons to use the parity/overflow flag, you might want to check into that a bit. I was playing around with signed comparisons and wabbitemu was telling me very strange things. It seemed to tell me that signed comparisons relied on an xor of the p/v and s flags. Which makes no sense, but that's what wabbitemu was telling me. See table below.

hl	de	sbc hl,de	c	p/v	s	hl>>de
2000	6000	C000	1	0	1	0
2000	A000	8000	1	1	1	1
2000	E000	4000	1	0	0	1
6000	2000	4000	0	0	0	1
6000	A000	C000	1	1	1	1
6000	E000	8000	1	1	1	1
A000	2000	8000	0	0	1	0
A000	6000	4000	0	1	0	0
A000	E000	C000	1	0	1	0
E000	2000	C000	0	0	1	0
E000	6000	8000	0	0	1	0
E000	A000	4000	0	0	0	1

DJ Omnimaga · « **Reply #84 on:** November 11, 2010, 12:32:17 am »

Quote from: Quigibo on November 10, 2010, 07:55:16 pm

Yeah, I'm still reading all of this, even though I'm less active, I still visit just about every day. I've even been able to do a little more progress with Axe even with my busy schedule.

Ah phew, good to hear. Still, I hope the schedule won't get even more hectic with time. X.x

calc84maniac · « **Reply #85 on:** November 11, 2010, 01:04:30 am »

Quote from: Runer112 on November 10, 2010, 11:55:08 pm

It seemed to tell me that signed comparisons relied on an xor of the p/v and s flags. Which makes no sense, but that's what wabbitemu was telling me.

It actually does make a bit of sense. Whether the mathematical (non-overflowed) result of the subtraction is positive or negative should give you the result of the comparison. However, if there was a signed overflow, it will give the wrong result. So the sign flag needs to be inverted if there was an overflow, and XOR achieves this perfectly.
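This can be checked against the table from the thread with a small Python model of a 16-bit subtraction. The flag formulas are the standard two's complement definitions, not something read out of wabbitemu, and `sbc16_flags` is a made-up helper name.

```python
def sbc16_flags(hl, de, carry_in=0):
    full = hl - de - carry_in
    result = full & 0xFFFF
    c = 1 if full < 0 else 0    # borrow out
    s = result >> 15            # sign of the 16-bit result
    # Signed overflow: operands differ in sign and the result's sign
    # differs from the sign of hl.
    pv = 1 if ((hl ^ de) & (hl ^ result) & 0x8000) else 0
    return result, c, pv, s

# Rows from the thread: hl, de, result, c, p/v, s, hl>>de
table = [
    (0x2000, 0x6000, 0xC000, 1, 0, 1, 0), (0x2000, 0xA000, 0x8000, 1, 1, 1, 1),
    (0x2000, 0xE000, 0x4000, 1, 0, 0, 1), (0x6000, 0x2000, 0x4000, 0, 0, 0, 1),
    (0x6000, 0xA000, 0xC000, 1, 1, 1, 1), (0x6000, 0xE000, 0x8000, 1, 1, 1, 1),
    (0xA000, 0x2000, 0x8000, 0, 0, 1, 0), (0xA000, 0x6000, 0x4000, 0, 1, 0, 0),
    (0xA000, 0xE000, 0xC000, 1, 0, 1, 0), (0xE000, 0x2000, 0xC000, 0, 0, 1, 0),
    (0xE000, 0x6000, 0x8000, 0, 0, 1, 0), (0xE000, 0xA000, 0x4000, 0, 0, 0, 1),
]

flags_match = all(sbc16_flags(hl, de) == (res, c, pv, s)
                  for hl, de, res, c, pv, s, _ in table)
# s xor p/v means "hl < de" (signed); no row has hl == de, so gt is its inverse.
xor_match = all(gt == 1 - (s ^ pv) for _, _, _, _, pv, s, gt in table)

print(flags_match, xor_match)   # True True
```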

Runer112 · « **Reply #86 on:** November 11, 2010, 01:43:50 am »

Yeah, my point is that Quigibo is probably just better off using the signed comparisons he already uses instead of bothering with the p/v flag, because it gets messy.

calc84maniac · « **Reply #87 on:** November 15, 2010, 08:27:44 am »

Actually, the main reason he didn't use the p/v flag is because his routines didn't support absolute jumps. They apparently do now, so some speed-up using these flags might be possible.

Runer112 · « **Reply #88 on:** November 28, 2010, 06:04:40 pm »

Cool, Quigibo added all my optimized auto-optimizations

But I think you missed p_GetBit15, which can be optimized to be the same as p_Mod2.

Munchor · « **Reply #89 on:** November 28, 2010, 06:08:28 pm »

Quote from: Runer112 on November 28, 2010, 06:04:40 pm

Cool, Quigibo added all my optimized auto-optimizations. But I think you missed p_GetBit15, which can be optimized to be the same as p_Mod2.

Great! So, it optimizes the Axe script or the Asm conversion?

Like, the following program:

Code: [Select]

Output(0,0,"Hello World")

Is optimized to:

Code: [Select]

Output(0,0,"Hello World

Or is it Assembly that is optimized?
