Better LCD Delay Routines?

Sue Doenim:
The LCD delay is long, and sometimes there's nothing to do while it's delaying, so you have to use a routine that waits until the LCD is ready. In LCD-heavy programs, like a grayscale game, tons of time is wasted in such routines. The most common routine that I learned from @thepenguin77 is:

--- Code: ---label: ;T-states
IN A,($10) ;11
AND %10010000 ;7
JR NZ,label ;7/12
;6 bytes, destroys A

--- End code ---
This routine is okay, but it might be improvable. If you can test only bit 7 instead of both bits 7 and 4 (which is what I'm not sure about), these would work:

--- Code: ---label:
IN A,($10) ;11
RLA ;4
JR C,label ;7/12
;5 bytes, destroys A

--- End code ---
That one is a bit better, but if you want to optimize for speed:

--- Code: ---  LD C,$10 ;7
label:
IN A,(C) ;12
JP M,label ;10
;7 bytes, destroys A/C
;If C is already equal to $10, you can skip the
;load instruction and save 2 bytes/7 T-states.
;If you are okay with using undocumented
;instructions, IN (C) would preserve A

--- End code ---
These look like really helpful routines: they save a bit of space and a few T-states, which will really add up when you're writing to the screen 768+ times in a row. The second one in particular might work really well with a grayscale program. If you move around some instructions, then you only need to load $10 into C once, and then you can go through your whole screen-writing routine without having to do so for every delay.
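
As a rough sketch of that idea (not tested on hardware), a row-writing loop with C loaded once might look like the following. The names rowLoop and rowBuffer are made up for illustration, the count of 12 is just one 96-pixel row's worth of bytes, and it assumes the LCD has already been set to the right row, column, and auto-increment mode:

--- Code: ---  LD C,$10 ;7, done once for the whole routine
LD HL,rowBuffer ;10, hypothetical 12-byte source
LD D,12 ;7, bytes to send
rowLoop:
IN A,(C) ;12, S flag = bit 7 (busy)
JP M,rowLoop ;10
LD A,(HL) ;7
OUT ($11),A ;11, LCD data port
INC HL ;6
DEC D ;4
JR NZ,rowLoop ;7/12
;destroys A, C, D, HL

--- End code ---
While the LCD is busy, each pass through the wait costs 12 + 10 = 22 T-states, compared with 11 + 7 + 12 = 30 for the original AND-based loop, going by the timings listed above.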

thepenguin77:
That's a great idea, Sue! I've never thought to try to optimize that before. I originally got that code from somewhere on ticalc.org. (Maybe Phoenix?) In my code it looks like this:

--- Code: ---#define DWAIT IN A, ($10) \ AND %10010000 \ JR NZ, $-4

; Then where needed

ld      a, 07
out     ($10), a
DWAIT
ld      a, $20
out     ($10), a
DWAIT
ld      a, $BF
out     ($10), a

--- End code ---

I'm not sure how stable the LCD driver is, but if your goal is speed, maybe you could even figure out how long the delay needs to be between writes and then try to match it exactly with some SMC or a variable and a jump table. That's how many of the TI-84 music-playing programs account for the different CPU speeds of different calculators.

Xeda112358:
@Sue Doenim: your second routine should use "jr c,", not "jr nz,". I usually go with the second method unless I can get $10 in C, then I use the "in a,(c)" method. I also optionally use compiler directives so the user can use undocumented instructions.

For example, in Grammer, I define my LCDDelay routine as:

--- Code: ---in a,(16) \ rla \ jr c,$-3

--- End code ---
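
The compiler-directive idea could be sketched roughly like this, assuming a TASM/SPASM-style preprocessor (the same #define syntax as the DWAIT macro above) and that C already holds $10. The names USE_UNDOC and LCD_WAIT are made up here and are not taken from Grammer:

--- Code: ---#ifdef USE_UNDOC
; undocumented IN (C): opcode ED 70, sets flags, preserves A
#define LCD_WAIT .db $ED,$70 \ jp m,$-2
#else
; documented form, destroys A
#define LCD_WAIT in a,(c) \ jp m,$-2
#endif

--- End code ---
A user would opt in by defining USE_UNDOC before this block. Both variants are 2 bytes before the jump, so the $-2 offset holds either way.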

But one of my favorite tricks that many people don't use (and you'll see in many of my projects) is that if I am only doing full-screen LCD updates and I don't need interrupts, then at the beginning of my program I disable interrupts and write 80h to port 16 (or BFh to port 16 if you are doing it the weird way). Then I can skip that entire step in my LCD update routine, since I write column-by-column and that internal LCD counter is automatically reset to the desired initial value by the end of my routine.

It doesn't save much, but it does save space (you almost certainly don't need to worry about an LCD delay between initializing with 80h and the first time you update the LCD), and you save a non-zero number of clock cycles each update, so it really is a "free" optimization.

thepenguin77:
So, originally, I was going to point out that my "exact-timing" scheme could be rather easily accomplished by using fixed length gaps between your port $10 writes and then using port $29 - $2C to do the T-state level timing for you.

But then I saw this quote on the port $2A wiki:

--- Code: ---... by adding a delay to any instruction which reads from or writes to ports 10 or 11 ...

--- End code ---

This means that for all these LCD delay routines, in a, ($10) actually takes a lot longer than expected (an extra 11 T-states on my calculator). I guess this means that there's a huge speedup to be had simply by clearing port $2A (or $29 or $2C depending on your port $20 setting) at the start of your program and restoring it when you're done. (Although, it looks like my old programs do this, so maybe everyone already knows this ¯\_(ツ)_/¯)

Xeda112358:
Oh wow, I hadn't realized that!
EDIT: I saw this on that page:
--- Quote ---NOTE: The contents of this port should NOT be less than 0Ch or the LCD driver will no longer respond.
--- End quote ---