Assembly Coding Optimization

Calculator Community => TI Calculators => ASM => Topic started by: Halifax on March 23, 2007, 10:46:00 am

Title: Assembly Coding Optimization
Post by: Halifax on March 23, 2007, 10:46:00 am

Ok I have compiled this code from MaxCoderz and figured it would be useful for some ASM programmers over here. You all can post your own routines and optimized stuff here too.

Optimization 1 - Loading 0 into A
----------------------------------
Unoptimized
CODE

 ld a,0

to

CODE

 xor a

or

 sub a

ld a,0 is unoptimized because it takes 2 bytes and 7 T-states, while "xor a" and "sub a" each take 1 byte and 4 T-states. All three set A to 0, but "xor a" and "sub a" also change the flags.
------------------------------------

Optimization 2 - Comparing the value 0
------------------------------------
Unoptimized
CODE

 cp 0

to

CODE

 or a

Same as above: "or a" takes 1 byte and 4 T-states instead of the 2 bytes and 7 T-states of "cp 0", and it sets the Z flag the same way when testing whether A is 0.
-------------------------------------

Optimization 3 - Loading numbers into register pairs
-------------------------------------
Unoptimized
CODE

 ld b,$88
 ld c,$45

to

CODE

 ld bc,$8845

The first one takes 4 bytes and 14 T-states, while "ld bc,$8845" takes only 3 bytes and 10 T-states. The high byte of the immediate value ($88) goes into B and the low byte ($45) goes into C.
---------------------------------------

Optimization 4 - Loading a Pic onto the screen (SPEED OPTIMIZATION)
---------------------------------------
Unoptimized
CODE

 ld hl,pic   ;source: LDIR copies from (HL) to (DE)
 ld de,gbuf  ;destination: graph buffer
 ld bc,768   ;bytes to copy
 ldir

to

CODE

 ld hl,pic   ;source
 ld de,gbuf  ;destination
 ld bc,768   ;bytes to copy

copyfast:
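The optimized listing above breaks off right after the "copyfast:" label, so the original loop body is lost. A minimal sketch of one common way to finish it, assuming the usual unrolled-LDI approach (a reconstruction, not necessarily the MaxCoderz code); "pic" and "gbuf" are the labels used above:

```asm
; Copy a 768-byte picture into the graph buffer with an unrolled LDI loop.
 ld hl,pic               ; source: picture data
 ld de,gbuf              ; destination: graph buffer
 ld bc,768               ; byte count, must be a multiple of the unroll factor (16)
copyfast:
 ldi \ ldi \ ldi \ ldi   ; each LDI: (de) <- (hl), inc hl, inc de, dec bc
 ldi \ ldi \ ldi \ ldi   ; 16 T-states per byte instead of LDIR's 21
 ldi \ ldi \ ldi \ ldi
 ldi \ ldi \ ldi \ ldi
 jp pe,copyfast          ; LDI leaves P/V set while BC is not zero
```

Each pass costs 16 x 16 + 10 = 266 T-states for 16 bytes, so the whole copy takes 48 x 266 = 12,768 T-states versus 767 x 21 + 16 = 16,123 for LDIR, at the price of 33 extra bytes of code (35 bytes of loop instead of the 2-byte LDIR).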

Post by: Fallen Ghost on March 24, 2007, 06:13:00 am

For #5:

Why should interrupts be enabled?

If an interrupt triggers, it will return correctly (let's suppose) and leave the stack at the same value, so it won't change the bytes you are about to write when you push, and the result is the same.
But as we know, interrupts can trigger at any time. If the interrupt happens when b=1 and there are only a couple of pushes left to do, and the interrupt routine pushes more than the number remaining in your routine, then whatever data was just before the buffer will be erased or modified. So it is not a good idea; therefore, interrupts disabled.
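The #5 routine itself isn't included above, but the point is easiest to see in a push-based buffer routine. A minimal sketch, assuming a routine that clears the graph buffer by pointing SP at its end and pushing zeros; "gbuf" is the buffer label from Optimization 4, and "saveSP" and "clearLoop" are hypothetical names ("saveSP" being a 2-byte RAM variable):

```asm
 di                      ; an interrupt would push its return address into gbuf
 ld (saveSP),sp          ; remember the real stack pointer
 ld sp,gbuf+768          ; PUSH pre-decrements SP, so start one past the end
 ld hl,0                 ; value to store
 ld b,48                 ; 768 bytes / 16 bytes per pass
clearLoop:
 push hl \ push hl \ push hl \ push hl   ; 8 bytes
 push hl \ push hl \ push hl \ push hl   ; 16 bytes per pass
 djnz clearLoop
 ld sp,(saveSP)          ; restore the real stack before any CALL or RET
 ei                      ; only if interrupts were enabled on entry
```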

Post by: Halifax on March 24, 2007, 11:22:00 am

Fallen_Ghost, none of those routines belong to me. They were copied right from MaxCoderz. I just pasted them in here, as I stated at the top of my post up there.

QUOTE

Ok I have compiled this code from MaxCoderz and figured it would be useful for some ASM programmers over here. You all can post your own routines and optimized stuff here too.

Post by: calc84maniac on March 25, 2007, 08:48:00 am

If you don't want the flags affected use ld a,0 instead of xor a
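For example, a minimal sketch that turns the carry flag from a compare into 0 or 1 in A; "xor a" here would clear the carry before RLA could read it (the value 10 is arbitrary):

```asm
 cp 10                   ; carry set if A < 10 (unsigned)
 ld a,0                  ; A = 0, flags untouched
 rla                     ; shift carry into bit 0: A = 1 if the old A was < 10, else 0
```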

Post by: Jon on March 25, 2007, 04:44:00 pm

Here's a simple size optimization for direct input:
Instead of:
CODE

 out (1),a
 nop
 nop
 in a,(1)

Use:

CODE

 out (1),a
 ld a,(de)
 in a,(1)

LD A,(DE) creates the same delay as 2 NOPs, but it takes 1 byte instead of 2.
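A minimal sketch of a complete key-group read using this delay, assuming a TI-83 Plus, where port 1 is the keypad port, writing $FF resets the keypad, and group $FE holds the arrow keys (a pressed key reads as 0); "UpPressed" is a hypothetical label:

```asm
 ld a,$FF                ; reset the keypad
 out (1),a
 ld a,$FE                ; select the arrow-key group
 out (1),a
 ld a,(de)               ; 1-byte, 7 T-state delay; the value read is discarded
 in a,(1)                ; bit 0 = down, 1 = left, 2 = right, 3 = up (0 = pressed)
 bit 3,a
 jr z,UpPressed          ; up arrow is held
```

DE can point anywhere, since the byte loaded into A is overwritten by the IN right away.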

Post by: Halifax on March 26, 2007, 09:44:00 am

wow very nice

EDIT: I just gained (with this post here) post number 666!

Post by: Iambian on April 10, 2007, 10:16:00 am

Size and speed optimization:
If you make an unconditional CALL immediately followed by a RET, you can replace that CALL with a JP instruction, since the called routine's own RET will then return straight to your caller. This won't work if the called routine doesn't exit using a RET, but that's up to you to decide. If it works, that's one byte saved by not needing the RET.

If the called routine is close enough (within JR's -128 to +127 byte range), you can save another byte by using the JR instruction instead.
For example:

CODE

 CALL Someroutine
 RET
;more code
Someroutine:
 XOR A
 RET

can be condensed to this:

CODE

 JR Someroutine
;more code
Someroutine:
 XOR A
 RET

If you can rearrange the code to make "Someroutine" appear right after the calling routine, you can save two more bytes by omitting the JR altogether.

In this respect, it may do you some good to rearrange your ASM code to take advantage of this kind of optimization. Of course, you'll have to watch out for out-of-range errors when using JR.
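The byte and cycle counts can be sketched as below; the three fragments are alternatives, not one routine. The last one applies the same idea to a conditional CALL followed by RET ("Someroutine" is the label from the example above):

```asm
; before: 4 bytes, CALL (17 T) + RET (10 T)
 call Someroutine
 ret

; after: 3 bytes, 10 T; Someroutine's own RET returns to our caller
 jp Someroutine

; conditional: "call nz,Someroutine \ ret" can become
 jp nz,Someroutine       ; taken: Someroutine returns to our caller
 ret                     ; not taken: return as before
```

The conditional form keeps the same size but skips the extra CALL/RET round trip; using "jr nz" instead saves one more byte when the target is in range.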

Oh, and as a response to the previous post that made a size optimization using "LD A,(DE)": it's not exactly the same as using two NOPs. "LD A,(DE)" is faster by one clock cycle. I understand that this is negligible, but I just wanted to point that out.

Post by: Jon on April 12, 2007, 11:56:00 am

yeah good point :)

7<8 heh

Post by: Iambian on April 16, 2007, 09:21:00 am

It might also be worth it to mention, in case you're doing interrupts, that if you wanted to call the TI-OS's interrupt service routine (RST 38h) on a conditional jump, you could save some memory by using a trick of the Z80's instruction set.

Instead of, say, "CALL C,0038h" or "JP C,0038h", you could do "JR C,$FF". Any JR condition (NZ, Z, NC, C) works the same way.

This works because "$FF", read as a signed offset, is -1, so the relative jump lands one byte before the address following the instruction, which is the instruction's own operand byte (JR C,$+1). That operand, "$FF", is the opcode for RST 38h. In that sense, you're combining two instructions in one.

As for what you'd use this for, I have no idea. Perhaps someone who wanted to call the interrupt service routine while interrupts were off? Perhaps it might be a way to keep romcalls like _getCSC and _getKey working while interrupts are disabled.

Post by: Jon on April 16, 2007, 04:09:00 pm

That's brilliant man. And it's pure luck that the opcode for rst 38h is the signed 8-bit value for -1. That's friggin' awesome, kudos! :)

Although, wouldn't the command be jr c,$-1? I believe TASM will interpret jr c,$ff to mean jr c,$00ff, and hence give you a relative branch out of range error.

Post by: Iambian on April 17, 2007, 04:28:00 am

I actually got that trick off of some very old Z80 documentation. I just thought it curious.

In TASM, JR (condition),$+2 gives you the instruction after it, and JR (condition),$-1 gives you the instruction behind it. To make it continuously loop on itself (jump back to the beginning of the instruction), you'd do JR (condition),$+0, so it's natural to then believe that JR (condition),$+1 would give you one byte within the instruction.

Also, no errors will happen since TASM already recognizes the instruction to take a single byte argument. Believe me; I've tried a similar stunt before (though not as memory-efficient), especially if you read up on my unreadable Z80 source :)
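A minimal sketch of the trick in TASM syntax, where "$" is the address of the current instruction; the same two bytes are also written out with .db for comparison (the two lines are alternatives):

```asm
 jr c,$+1                ; assembles to $38,$FF: target = ($+2) - 1 = $+1,
                         ; which is the $FF operand byte itself, i.e. RST 38h
; carry set:   RST 38h pushes the address after the $FF byte, so when the
;              handler at $0038 returns, execution continues right here
; carry clear: execution simply falls through to here

 .db $38,$FF             ; the same two bytes written by hand
```

That is 2 bytes instead of the 3 that "CALL C,0038h" takes. For comparison, "jr c,$-1" would encode an offset of -3 ($FD) and land on the byte before the JR.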

Also, to keep on topic:

For a speed optimization when working with a list of two-byte values, you can abuse the stack to quickly address these values. That is, set SP to the start of the list. Then you can repeatedly POP values off the table. The values are not destroyed, only read. If you need to edit a value, say, for updating, you can just PUSH the value back in and use some instructions like INC SP (twice) to move the pointer back to where you were (although just POP-ing the value again will save two clock cycles compared to using INC SP twice, except if you're trying to POP to an index register). Accessing values in this fashion will save you many clock cycles, especially compared to the standard "LD E,(HL) \ INC HL \ LD D,(HL) \ INC HL" sequence that eats up a hefty 26 clock cycles. Simply POP-ing the value uses only ten clock cycles.

This is especially useful for bubble-sorting or perhaps grabbing the largest and smallest value on the table.

Remember to disable interrupts and to save SP prior to editing SP to read the table.
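A minimal sketch that sums a table of 16-bit words this way; "Table", "TABLE_LEN" (number of words, 1 to 255), "saveSP" (2 bytes of RAM) and "sumLoop" are hypothetical names:

```asm
 di                      ; an interrupt would push into the table
 ld (saveSP),sp          ; save the real stack pointer
 ld sp,Table             ; SP now walks forward through the table
 ld hl,0                 ; running total (wraps at 16 bits)
 ld b,TABLE_LEN
sumLoop:
 pop de                  ; DE = next word, SP += 2 (10 T-states)
 add hl,de
 djnz sumLoop
 ld sp,(saveSP)          ; restore the real stack before any CALL or RET
 ei
```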
------------------
Another optimization trick: when you want to, say, stop the program to output an error message, instead of loading the address of the string into HL, you could do the following:

CODE

 CALL ErrorCode \ .db "ERROR1",0

ErrorCode:
 POP HL
;process string address now in HL
 JR ProgramEnd

That works because when the CALL is made, the return address it pushes points to the string right after it, since that would have been the next address to execute. ErrorCode ends with a jump to a place that restores SP before exiting, so this only works if you have code to restore the stack. If all your error messages are a fixed size, you could save more space by cutting out the null terminator and changing your text output routine to cope with a fixed size.

For further optimization (in case you don't want code to jump over all that CALL and text): the uppercase ASCII letters ($41-$5A) all fall within the opcode block for 8-bit register loads ($40-$7F). Lowercase letters fall in that range too, but "v" is $76, which is HALT rather than a load. If you do not have any special characters, spaces, or numbers, you could actually make the CALL a conditional one and have your little Z80 run the text as if it were code. This is a dangerous practice, especially if you were going to change your text, and you'd have to look at an opcode table to determine which registers get destroyed by the various loads. The reason you cannot use the space character is that its opcode ($20) is a JR NZ instruction. If you were especially savvy about the placement of your code, you could use this feature to your advantage.

Take care with the use of that optimization. It certainly isn't a speed optimization, but it most certainly is a size optimization, especially if you're CALLing on a condition that JR doesn't take (like M or P, for instance).
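A minimal sketch of the conditional version; "ErrorCode" is the routine from the example above, and the condition M is arbitrary. When the sign flag is clear, the CPU runs the string bytes as instructions: "E" ($45) is LD B,L, "R" ($52) is LD D,D, "O" ($4F) is LD C,A, and the terminating 0 is NOP, so only B and C are destroyed:

```asm
 call m,ErrorCode        ; sign set: ErrorCode POPs the string address into HL
 .db "ERROR",0           ; sign clear: runs as ld b,l \ ld d,d \ ld d,d \ ld c,a \ ld d,d \ nop
; execution continues here with B and C clobbered
```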

Post by: Fallen Ghost on April 24, 2007, 02:29:00 pm

One other trick I found is a small speed optimization (but you lose 1 byte):
instead of doing this (22 T-states, 4 bytes)

CODE

 ld e,a    ;4,1
 ld d,0    ;7,2
 add hl,de ;11,1

One could do (19/20 T-states, 5 bytes)

CODE

 add a,l
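The second listing above is cut off after its first line. A minimal sketch of one common way to complete it that matches the quoted 19/20 T-state timing, assuming the goal is HL = HL + A with A treated as unsigned (a reconstruction, not necessarily Fallen Ghost's exact code):

```asm
 add a,l                 ; 4 T, 1 byte: A = L + A, carry if it overflowed
 ld l,a                  ; 4 T, 1 byte
 jr nc,$+3               ; 12 T taken / 7 T not taken, 2 bytes: skip the INC
 inc h                   ; 4 T, 1 byte: propagate the carry into H
```

Unlike the original, it leaves DE untouched, but it destroys A.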

Post by: calc84maniac on April 27, 2007, 02:27:00 am

And, rather than comparing you can repeatedly 'dec a'. And as a bonus you can assume a is 0 when the function gets called. :)

Edit: There is also no command "call (hl)". Not to mention addresses are two bytes, not one.

Post by: Fallen Ghost on April 27, 2007, 10:37:00 am

But actually, "cp reg8" takes 1 byte and 4T, while "cp imm8" takes 2 bytes and 7T.

For that effect, you could do this:

CODE

 ld hl,jump_table
 add a,a
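This listing is also cut off after its second line. A minimal sketch of how an indexed jump table is commonly finished, assuming A holds an index below 128 and each table entry is a 2-byte address; "Routine0" to "Routine2" are hypothetical labels:

```asm
 ld hl,jump_table
 add a,a                 ; index * 2, since each entry is a 2-byte address
 ld e,a
 ld d,0
 add hl,de               ; HL -> table entry
 ld a,(hl)               ; low byte of the target
 inc hl
 ld h,(hl)               ; high byte of the target
 ld l,a
 jp (hl)                 ; jump to the address in HL (no memory read despite the parentheses)

jump_table:
 .dw Routine0, Routine1, Routine2
```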

Post by: Halifax on April 27, 2007, 10:41:00 am

Oh yeah, you're right, Fallen_Ghost, heh, good catch. :thumb:

Isn't it kind of self explanatory that jp (ix) would work since jp (hl) works ;)

Oh yeah, and in case you didn't know or wanted an easier way to find out whether a command works, you could just look in TASM80.tab.

Post by: Fallen Ghost on April 27, 2007, 02:50:00 pm

Hey! Where did your posts go?

Well, is it kind of self-explanatory that jp (hl) works and jp (de) does not work while jp (ix) and jp (iy) both work?

And why not jr (a)...

Post by: Halifax on April 27, 2007, 02:56:00 pm

Well yeah, it should be, because IX is supposed to be a replacement wherever HL is, and DE and BC aren't. And there would be problems with jr (a), since someone could treat A as unsigned and make it 255 instead of something between -128 and 127.

Post by: Iambian on April 30, 2007, 05:45:00 am

JP (HL) works with JP (IX) and JP (IY) and not JP (DE) because of the way the Z80 instruction set works. Since there's an opcode for JP (HL), and the docs say that many instructions that work with HL can also work with IX and IY, that would seem self-explanatory. A little study of the instruction set would reveal that any IY or IX instruction is simply the corresponding HL instruction with a prefix byte attached to tell the processor to use the index register instead of HL. Because of the way the processor works, you also get these "undocumented" instructions that allow one to edit the LSB or the MSB of the index registers.
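The prefix relationship is easy to see by writing the bytes out. A minimal sketch in TASM syntax; the comments give the standard Z80 encodings, and the last line relies on the undocumented behavior just mentioned:

```asm
 jp (hl)                 ; E9
 jp (ix)                 ; DD E9: same opcode with the IX prefix
 jp (iy)                 ; FD E9: same opcode with the IY prefix
 ld a,h                  ; 7C
 .db $DD \ ld a,h        ; DD 7C: undocumented "ld a,ixh"
```

There is no prefix that turns HL into DE, which is why JP (DE) doesn't exist.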

Is JR (A) even a valid instruction? I can't find it anywhere in the documentation, so I dun think that's supported by the Z80. Correct me if I'm wrong by showing me the corresponding opcode and what the binary turns out to be.

But, if one wanted to use SMC relating to a JR instruction and a jump table, one could try something like this, provided that the table is page-aligned (aligned to $xx00)
CODE

 LD HL,$8000 ;assuming that the table is in the $8000-$80FF space
 LD L,A

Post by: Halifax on April 30, 2007, 09:57:00 am

Wow, I am amazed. Anyway, no, Fallen_Ghost was not saying that jr (a) is a valid instruction; he was simply trying to prove a point about why it was not self-explanatory. He was saying, hey, why wouldn't jr (a) work? That is all.

Post by: Iambian on April 30, 2007, 12:23:00 pm

Oh. I thought that anyone with a decent knowledge of the Z80 instruction set would see the connection between any instruction that includes HL and turning it to IX or IY.

The only thing that is *not* self-explanatory is the lack of an EX DE,IY or EX DE,IX instruction, or any other exchanges that play around with HL. (Believe me; I've tested the EX DE... thing.)

And on a previous topic, the assembler would actually take JP (DE) or JP (A). TASM would complain about a missing label, tho. If you have either DE or A defined as a label, TASM would evaluate (DE) as the location of the label itself and not the stuff at the label, as the indirection might've indicated.

Perhaps this is a good way to obfuscate your code?

Of course, if you have the assembler invoked with the right flags, TASM would emit a warning.

Post by: Jon on May 21, 2007, 01:34:00 pm

I'm wondering, does ex (sp),ix/ex (sp),iy work?

Post by: Halifax on May 21, 2007, 11:46:00 pm

Well, if ex (sp),hl works, then I would imagine so, because as he said, IX is just HL with an extra byte attached to say it's IX.

Post by: calc84maniac on May 22, 2007, 07:58:00 am

Well, the ex commands, unfortunately, are exceptions to that rule (last I checked).

Post by: Halifax on May 22, 2007, 12:46:00 pm

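EX (SP),HL   E3     1   NOP   1
EX (SP),IX   E3DD   2   NOP   1
EX (SP),IY   E3FD   2   NOP   1

As you can see, DD stands for "use IX" and FD stands for "use IY". (TASM80.tab stores each opcode as a number that is emitted low byte first, so the bytes actually assembled are DD E3 and FD E3.)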

Post by: Iambian on May 23, 2007, 04:01:00 am

EX (SP),IY and the like *do* work, according to the "documentation". They're slow, so I wouldn't recommend using 'em unless you ran out of registers or something (which is very likely in high-intensity situations).

It's just that EX DE,HL lacks that same kind of modifier. And don't we wish EXX had that kind of modifier? :P

But for some other assembly code optimizations...
( meh. I'm running out of things to say... )

To make use of the wonderful array of flags, you should make all loops that count decrement toward zero. In this way, you remove the needed CP instruction prior to the condition test and you cause the loop to run faster. If you're using register B for this purpose, then you should already be familiar with the DJNZ instruction. If you need something counting upward, however, you can use an extra register to keep track of the count. If you are running short on registers, you can do some CPL/NEG magic with the counter if the count upwards happens to be starting from 255 counting downward, or if the counting number happens to require that kind of value.
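A minimal sketch of the difference; the two loops are alternatives, the loop bodies are left empty, and the count of 10 and the labels are arbitrary:

```asm
; counting up: INC, LD, CP and JR NZ on every pass (and A is destroyed)
 ld c,0
upLoop:
 ; ... loop body ...
 inc c
 ld a,c
 cp 10                   ; 2 bytes, 7 T
 jr nz,upLoop

; counting down to zero: DJNZ decrements B and tests it in one 2-byte instruction
 ld b,10
downLoop:
 ; ... loop body ...
 djnz downLoop           ; 13 T while looping, 8 T on exit
```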

As with any ASM optimization, you ought to write your code first and then see if you can improve its form in any way: taking advantage of a conveniently placed flag, condensing the code so it uses fewer registers, replacing "slow" instructions with "faster" ones, limiting the scope of what your code is doing to *exactly* what it should do... and, if you change the input a subroutine expects, making sure every calling routine still provides input that suits the change.

Sometimes, you will want to change input values to certain subroutines because the calling routine will usually have its input in a certain format. For example, if you needed a routine that would extract size bytes out of a program var right after a _chkfindsym, you'd take DE as the address of the program. If you're doing this multiple times, you're saving yourself a few bytes of having to do "EX DE,HL" each time you called _chkfindsym.
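A minimal sketch of such a routine; "GetProgSize" is a hypothetical name, and it assumes DE points at the variable's two little-endian size bytes, as _ChkFindSym leaves it for a program in RAM:

```asm
; in:  DE -> size bytes (straight from _ChkFindSym)
; out: BC = size, DE -> first data byte; A destroyed
GetProgSize:
 ld a,(de)
 ld c,a                  ; low byte of size
 inc de
 ld a,(de)
 ld b,a                  ; high byte of size
 inc de                  ; DE now points at the program data
 ret
```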

Experiment with these optimizations, but before you do any kind of optimization, please, for the love of all that is good, make sure that the program works prior to potentially breaking it. You'll thank yourself later.

Post by: calc84maniac on June 12, 2007, 04:32:00 am

The Better CP HL,DE
compares hl to de, same flag outputs as 8-bit compare
CODE

 or a
 sbc hl,de
 add hl,de
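A minimal usage sketch; "IsEqual" and "IsLess" are hypothetical labels. ADD HL,DE leaves Z, S and P/V alone, and its carry out matches the borrow from SBC HL,DE (HL < DE wraps on the subtraction and carries out again on the addition), so both Z and C still describe HL compared with DE:

```asm
 or a                    ; clear carry so SBC is a plain subtraction
 sbc hl,de               ; Z if HL = DE, C if HL < DE (unsigned)
 add hl,de               ; restore HL; Z is untouched and C comes out the same
 jr z,IsEqual            ; HL = DE
 jr c,IsLess             ; HL < DE
; fall through: HL > DE
```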

Post by: Halifax on June 12, 2007, 05:09:00 am

hmm yes very nice calc84maniac. Simple yet straight to the point.

Post by: Iambian on June 12, 2007, 06:36:00 am


QUOTE (calc84maniac @ 12 Jun, 2007, 10:32)

The Better CP HL,DE
compares hl to de, same flag outputs as 8-bit compare
CODE

 or a
 sbc hl,de
 add hl,de

Are you sure that would work? Wouldn't the "ADD HL,DE" destroy the Z flag set by the SBC? If anything, shouldn't the "OR A \ SBC HL,DE" and the "ADD HL,DE" be switched?

Or is it something I'm not getting?

Post by: calc84maniac on June 12, 2007, 07:25:00 am

ADD HL,DE only modifies the carry flags, which doesn't matter in this case. ADC HL,DE, however, modifies the same flags as SBC HL,DE.

Post by: Jon on June 15, 2007, 02:14:00 pm

Calc84maniac is right, partly. If the sbc causes a sign change, so will the add, coming back. However, unless HL=0, the above set of commands will always yield nz, even if hl=de. You need to use the additional 21 cc's (push \ pop) to back up flags.

Post by: Fallen Ghost on June 15, 2007, 03:32:00 pm

That's why I answered him this on UTI and Cemetech (didn't dare posting it on DS)


QUOTE (myself)

Also, I just found out why I said the other is faster

CODE

 or a

Post by: calc84maniac on June 16, 2007, 02:51:00 am

@Jon: I said ADD HL,DE modified only the carry flags...

@Fallen Ghost: Fine, you win...

Post by: Fallen Ghost on June 16, 2007, 02:26:00 pm


QUOTE (calc84maniac @ 16 Jun, 2007, 8:51)

@Jon: I said ADD HL,DE modified only the carry flags...

@Fallen Ghost: Fine, you win...

2nd message: Yays! But I still venerate your asm madskillz! :king:

1st: ADD HL,DE does not mess up the Z, P/V, or S flags. It only affects C and H, and resets N.

Post by: Halifax on June 16, 2007, 03:20:00 pm

Sorry I just have to say this. I didn't even know it until I just read this article and I mean JUST read it.

But it resets the Z flag. The article says there is no such thing as the NZ flag because it is just an inversion of the Z flag.

*Halifax celebrates his small unknown victory!

Notice: This post was for comic relief.

Omnimaga
