### Author Topic: optimizing asm code

#### ben_g

• Hey cool I can set a custom title now :)
• LV9 Veteran (Next: 1337)
• Posts: 1002
• Rating: +125/-4
• Asm noob
##### optimizing asm code
« on: April 17, 2012, 04:27:08 pm »
Hi,

I have to optimize this code, but I haven't really optimized anything before, and whatever I try, I can't get it to run faster. Can anyone give me some tips?

Also: I'm optimizing for speed, not for size. The code can be quite big.

Here's the code itself:
Spoiler For code:
Code: [Select]
DrawTriangle:
  ;IN: x1,y1,u1,v1,x2,y2,u2,v2,x3,y3,u3,v3
  ;screen = 96*64
;the following code was used to add 100 to the x coordinates, to see if the sign was the problem
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;  ld de, 100
;  ld hl, (x1)
;  add hl, de
;  ld (x1), hl
;  ld hl, (x2)
;  add hl, de
;  ld (x2), hl
;  ld hl, (x3)
;  add hl, de
;  ld (x3), hl
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;---------------------------------------------------------------
; This part sorts the points so that Y1 <= Y2 <= Y3 so
; we can just draw each scanline below the last one.
;---------------------------------------------------------------
  ld hl, (x1)
  call Signed16To8
  ld h, $FF
  ld l, a
  ld (x1), hl
  ld hl, (x2)
  call Signed16To8
  ld h, $FF
  ld l, a
  ld (x2), hl
  ld hl, (x3)
  call Signed16To8
  ld h, $FF
  ld l, a
  ld (x3), hl

  ld hl, (y1)
  ld de, (y2)
  cpHLDE
  jr c, Y1SmallerThanY2
  ld hl, x1
  ld de, dx1 ;temp location
  ld bc, 6 ;size
  ldir
  ld hl, x2
  ld de, x1
  ld bc, 6
  ldir
  ld hl, dx1
  ld de, x2
  ld bc, 6
  ldir
Y1SmallerThanY2:
  ld hl, (y1)
  ld de, (y3)
  cpHLDE
  jr c, Y1SmallerThanY3
  ld hl, x1
  ld de, dx1 ;temp location
  ld bc, 6 ;size
  ldir
  ld hl, x3
  ld de, x1
  ld bc, 6
  ldir
  ld hl, dx1
  ld de, x3
  ld bc, 6
  ldir
Y1SmallerThanY3:
  ld hl, (y2)
  ld de, (y3)
  cpHLDE
  jr c, Y2SmallerThanY3
  ld hl, x2
  ld de, dx1 ;temp location
  ld bc, 6 ;size
  ldir
  ld hl, x3
  ld de, x2
  ld bc, 6
  ldir
  ld hl, dx1
  ld de, x3
  ld bc, 6
  ldir
Y2SmallerThanY3:
; +++++ End of sorting code +++++

;----------------------------------------------------------
; Here, some variables are initialized. The delta
; variables (the variables which start with a 'd')
; contain the values that need to be added to
; the variables which start with a 't'. Variables
; with a 't' and a '1' are used for the start of
; the scanline. Those with a 't' and a '2' are used
; for the end of the scanline.
;----------------------------------------------------------
  res 0, (IY) ;if this bit is 0, the routine is drawing the top half of the triangle. if it's 1, it's drawing the bottom half.
  res 1, (IY) ;This bit is used to store if the deltas for the texture coordinates inside scanlines are already calculated. They are constants, so they only need to be calculated once per half.

  ld hl, (y2)
  ld de, (y1)
  subFP ;This routine is for subtracting fixed-point values, but here it's used to subtract integer values.
  ld h, l
  ld l, 0
  push hl
  ld hl, (x2)
  ld de, (x1)
  subFP
  ld h, l
  ld l, 0
  pop de
  call DivFP
  ld (dx1), hl

  ld hl, (y3)
  ld de, (y2)
  subFP
  ld h, l
  ld l, 0
  push hl
  ld hl, (x3)
  ld de, (x2)
  subFP
  ld h, l
  ld l, 0
  pop de
  call DivFP
  ld (dx2), hl

  ld hl, (y3)
  ld de, (y1)
  subFP
  ld h, l
  ld l, 0
  push hl
  ld hl, (x3)
  ld de, (x1)
  subFP
  ld h, l
  ld l, 0
  pop de
  call DivFP
  ld (dx3), hl

  ld hl, (y2)
  ld de, (y1)
  subFP
  ld h, l
  ld l, 0
  push hl
  ld a, (u2)
  ld h, a
  ld l, 0
  ld a, (u1)
  ld d, a
  ld e, 0
  subFP
  ;ld h, l
  ;ld l, 0
  pop de
  call DivFP
  ld (du1), hl

  ld hl, (y3)
  ld de, (y2)
  subFP
  ld h, l
  ld l, 0
  push hl
  ld a, (u3)
  ld h, a
  ld l, 0
  ld a, (u2)
  ld d, a
  ld e, 0
  subFP
  ;ld h, l
  ;ld l, 0
  pop de
  call DivFP
  ld (du2), hl

  ld hl, (y3)
  ld de, (y1)
  subFP
  ld h, l
  ld l, 0
  push hl
  ld a, (u3)
  ld h, a
  ld l, 0
  ld a, (u1)
  ld d, a
  ld e, 0
  subFP
  ;ld h, l
  ;ld l, 0
  pop de
  call DivFP
  ld (du3), hl

  ld hl, (y2)
  ld de, (y1)
  subFP
  ld h, l
  ld l, 0
  push hl
  ld a, (v2)
  ld h, a
  ld l, 0
  ld a, (v1)
  ld d, a
  ld e, 0 ;was "ld l, 0", which left E uninitialized
  subFP
  ;ld h, l
  ;ld l, 0
  pop de
  call DivFP
  ld (dv1), hl

  ld hl, (y3)
  ld de, (y2)
  subFP
  ld h, l
  ld l, 0
  push hl
  ld a, (v3)
  ld h, a
  ld l, 0
  ld a, (v2)
  ld d, a
  ld e, 0
  subFP
  ;ld h, l
  ;ld l, 0
  pop de
  call DivFP
  ld (dv2), hl

  ld hl, (y3)
  ld de, (y1)
  subFP
  ld h, l
  ld l, 0
  push hl
  ld a, (v3)
  ld h, a
  ld l, 0
  ld a, (v1)
  ld d, a
  ld e, 0
  subFP
  ;ld h, l
  ;ld l, 0
  pop de
  call DivFP
  ld (dv3), hl

  ld hl, (x1)
  bit 7, h
  jr z, TPos1
  ld (tx1+1),hl \ ld a,$FF \ ld (tx1),a
  ld (tx2+1),hl \ ld a, $FF \ ld (tx2),a
  jr TEnd1
TPos1:
  ld (tx1+1),hl \ xor a \ ld (tx1),a ;store the 16bit integer at hl into 16.8 fixed point number tx1
  ld (tx2+1),hl \ xor a \ ld (tx2),a
TEnd1:
  ld hl, (y1)
  ld (_ty), hl
  ld a, (u1)
  ld h, a
  ld l, 0
  ld (tu1), hl
  ld (tu2), hl
  ld a, (v1)
  ld h, a
  ld l, 0
  ld (tv1), hl
  ld (tv2), hl
;if Y1 == Y2, then we don't need to draw the first half.
  ld hl, (y1)
  ld de, (y2)
  cpHLDE
  jp z, __TEndLoop
; +++++ End of initializing code +++++

;------------------------------------------------------------
; This is the loop in which the triangle is drawn.
; In each iteration of the loop, a single scanline is
; drawn. When this loop has finished, one half of the
; triangle is drawn.
;------------------------------------------------------------
TDrawLoop:
  ld a, (_ty)
  ld d, a
;if the Y of the scanline is negative, then go to the next one.
  bit 7, a
  jp nz, Clip
  ld a, (_ty)
;If it reaches the bottom of the screen, then stop drawing the triangle.
  cp 64
  ret nc
;Initialize variables for the scanline
  ld hl, (tu1)
  ld (tmpu), hl
  ld hl, (tv1)
  ld (tmpv), hl
  ld hl, (tu2)
  ld (temp2), hl
  ld hl, (tv2)
  ld (temp3), hl
  ld a, (tx2+1)
  ld (temp+1), a
  add a, 128
  ld b, a
  ld a, (tx1+1)
  ld (temp), a
  add a, 128
  cp b
  jr c, TOrdered
  ;jp po, TOrdered
  ld hl, (tu2)
  ld (tmpu), hl
  ld hl, (tv2)
  ld (tmpv), hl
  ld hl, (tu1)
  ld (temp2), hl
  ld hl, (tv1)
  ld (temp3), hl
  ld a, (tx2+1)
  ld (temp), a
  ld a, (tx1+1)
  ld (temp+1), a
TOrdered:
  ld l, d
  ld a, (temp)
;following line was for the test to see if the sign was the problem
  ;sub 100 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
  bit 7, a
  jr z, TGetPixel
  xor a
TGetPixel:
  call GetPixel
  ld (mask), a
  ld (pointer), hl
;If the deltas for the texture coordinates inside a scanline are already
;calculated, then calculating them again is a waste of cycles.
  bit 1, (IY)
  jr nz, TPlotLoop
  ld hl, (tx1)
  ld de, (tx2)
  cpHLDE
  jr z, TPlotLoop
  ld a, (temp)
  ld h, a
  ld l, 0
  ld a, (temp+1)
  ld d, a
  ld e, 0
  subFP
  push hl
  ld hl, (tmpu)
  ld de, (temp2)
  subFP
  pop de
  call DivFP
  ld (tmpdu), hl
  ld a, (temp)
  ld h, a
  ld l, 0
  ld a, (temp+1)
  ld d, a
  ld e, 0
  subFP
  push hl
  ld hl, (tmpv)
  ld de, (temp3)
  subFP
  pop de
  call DivFP
  ld (tmpdv), hl
  set 1, (IY)

;---------------------------------------------------------------------
; In this loop, the scanline is drawn. One iteration here
; draws one pixel. When the loop ends, one scanline is drawn.
;---------------------------------------------------------------------
TPlotLoop:
;If the x coordinate of the pixel is negative, then go to the next pixel.
  ld a, (temp)
  bit 7, a
  jr nz, TNoCarry
;if the pixel goes off the right side of the screen, then go to the next scanline
  cp 96
  jp nc, Clip
;Everything with 4 ;'s behind it is for 16x16 textures. Remove those and the
;textures will be 8x8.
  ld a, (tmpv+1)
  add a, a ;;;;
  ld hl, texture
  add a, l
  ld l, a
  ld a, (tmpu+1)
  bit 3, a ;;;;
  jr z, TFirstByte ;;;;
  res 3, a ;;;;
  inc hl ;;;;
TFirstByte: ;;;;
  ld b, a
  inc b
  ld a, (hl)
TshiftLoop:
  rla
  djnz TshiftLoop
  ld a, (mask)
  ld hl, (pointer)
  jr c, TSetPixel
TResPixel:
  ;ld a, b
  cpl
  and (hl)
  ld (hl), a
  jr TEndPlot
TSetPixel:
  ;ld a, b
  or (hl)
  ld (hl), a
TEndPlot:
  ld hl, mask
  rrc (hl)
  jr nc, TNoCarry
  ld hl, (pointer)
  inc hl
  ld (pointer), hl
TNoCarry:
  ld hl, (tmpu)
  ld de, (tmpdu)
  add hl, de
  ld (tmpu), hl
  ld hl, (tmpv)
  ld de, (tmpdv)
  add hl, de
  ld (tmpv), hl
  ld a, (temp+1)
  ld b, a
  ld a, (temp)
  ld hl, temp
  inc (hl)
  cp b
  jp nz, TPlotLoop
; +++++ End of pixel plotting code +++++

;If it's drawing the second half, then make it recalculate the texture deltas
;for inside the scanlines. This was to solve a bug in the texture mapping.
  bit 0, (IY)
  jr nz, aaaa ;I suddenly ran out of inspiration for label names
; res 1, (IY)
aaaa:
Clip:
  ld hl,(tx1)
  ld de, (dx1)
  ld a, d
  rla
  sbc a, a
  ld b, a
  add hl, de
  ld (tx1), hl
  ld a, (tx1+2)
  adc a, b
  ld (tx1+2), a
  ld hl,(tx2)
  ld de, (dx3)
  ld a, d
  rla
  sbc a, a
  ld b, a
  add hl, de
  ld (tx2), hl
  ld a, (tx2+2)
  adc a, b
  ld (tx2+2), a
  ld hl, (tu1)
  ld de, (du1)
  add hl, de
  ld (tu1), hl
  ld hl, (tu2)
  ld de, (du3)
  add hl, de
  ld (tu2), hl
  ld hl, (tv1)
  ld de, (dv1)
  add hl, de
  ld (tv1), hl
  ld hl, (tv2)
  ld de, (dv3)
  add hl, de
  ld (tv2), hl
  ld hl, (_ty)
  inc hl
  ld (_ty), hl
  ld de, (y2)
  cpHLDE
  jp c, TDrawLoop
;This is the end of the drawing loop
;If the second half was drawn, then stop this routine.
  bit 0, (IY)
  jr nz, _TEnd
__TEndLoop:
  ;Here, some variables are initialized for drawing the second half.
  ld hl, (y2)
  ld (_ty), hl
  ld hl, (y3)
  ld (y2), hl
  ld hl, (dx2)
  ld (dx1), hl
  ld hl, (du2)
  ld (du1), hl
  ld hl, (dv2)
  ld (dv1), hl
  ld hl, (x2)
  bit 7, h
  jr nz, TPos4
  ld (tx1+1),hl \ ld a,$FF \ ld (tx1),a
  jr TEnd4
TPos4:
  ld (tx1+1),hl \ xor a \ ld (tx1),a
TEnd4:
  ld a, (u2)
  ld h, a
  ld l, 0
  ld (tu1), hl
  ld a, (v2)
  ld h, a
  ld l, 0
  ld (tv1), hl
  set 0, (IY)
  jp TDrawLoop
_TEnd:
  ret

GetPixel:
  bit 7, a
  ret nz
  bit 7, l
  ret nz
  ld h, 0
  ld d, h
  ld e, l

  add hl, hl
  add hl, de
  add hl, hl
  add hl, hl

  ld e, a
  srl e
  srl e
  srl e
  add hl, de

  ld de, PlotSScreen
  add hl, de

  and 7
  ld b, a
  ld a, $80
  ret z
  rrca
  djnz $-1
  ret

Note: I'm not asking you to optimize it for me, but to help me optimize it, so I can learn from this and optimize better in the future.
My projects
- The Lost Survivors (Unreal Engine) ACTIVE [GameCommandoSquad main project]
- Oxo, with single-calc multiplayer and AI (axe) RELEASED (screenshot) (topic)
- An android version of oxo (java)  ACTIVE
- A 3D collision detection library (axe) RELEASED! (topic)(screenshot)(more recent screenshot)(screenshot of it being used in a tilemapper)
Spoiler For inactive:
- A first person shooter with a polygon-based 3d engine. (z80, will probably be recoded in axe using GLib) ON HOLD (screenshot)
- A java MORPG. (pc) DEEP COMA(read more)(screenshot)
- a minecraft game in axe DEAD (source code available)
- a 3D racing game (axe) ON HOLD (outdated screenshot of asm version)

This signature was last updated on 20/04/2015 and may be outdated

#### thepenguin77

• z80 Assembly Master
• LV10 31337 u53r (Next: 2000)
• Posts: 1591
• Rating: +823/-5
• The game in my avatar is bit.ly/p0zPWu
##### Re: optimizing asm code
« Reply #1 on: April 20, 2012, 07:28:27 pm »
I'll start working on some stuff for you, but since this routine is so massive, could you post a guide to what it does? I understand it's mostly math, but a simple explanation of what each section does could greatly improve the way people can optimize it.

Edit:
I see the section headers. But how about an overarching explanation of it all?

I'm just going to keep adding information until I'm done, or you respond. Feel free to correct anything that's wrong.

First of all, I'd like to see DivFP and subFP.

;dx1 = dx/dy of 1 to 2
;dx2 = dx/dy of 2 to 3
;dx3 = dx/dy of 1 to 3
;du1 = du/dy of 1 to 2
;du2 = du/dy of 2 to 3
;du3 = du/dy of 1 to 3
;dv1 = dv/dy of 1 to 2
;dv2 = dv/dy of 2 to 3
;dv3 = dv/dy of 1 to 3

What are u and v?

What are the bounds on the input variables?

Done for now (got stuff to do)
« Last Edit: April 20, 2012, 07:46:13 pm by thepenguin77 »
zStart v1.3.013 9-20-2013
All of my utilities
TI-Connect Help
You can build a statue out of either 1'x1' blocks or 12'x12' blocks. The 1'x1' blocks will take a lot longer, but the final product is worth it.
-Runer112

#### ben_g

##### Re: optimizing asm code
« Reply #2 on: April 20, 2012, 07:52:43 pm »
Pretty much all of the code that's commented out is either failed tests or debug code. You can just ignore those parts.
The part after the first header sorts the points by their Y coordinate (from smallest to biggest).
The second part sets up flags and variables, and it calculates some numbers. For example, dx is the fixed-point number that needs to be added on every scanline to adjust the X so that the line ends up at the next point.
The part after that calculates some values for the scanline (TDrawLoop is the loop in which scanlines are drawn, 1 iteration = 1 scanline). The first part does vertical clipping, then it checks the start and the end and, when necessary, swaps them so the scanline can always be drawn from left to right. Then GetPixel is called and its result is saved, so it only has to be called once per scanline. The rest of that part is texture interpolation.
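
As a rough illustration of the start/end swap described above (this is not the code from the routine itself), here is a minimal sketch. It assumes both x values are already in registers as unsigned screen columns; the label `OrderEnds` and the register assignment are hypothetical. For signed columns, bit 7 of both values would have to be flipped before comparing, which is what the `add a, 128` in the routine does.

```z80
; Minimal sketch, not the routine's actual code.
; In:  B = start x, C = end x (unsigned columns, assumed already clipped)
; Out: B <= C, so the scanline can be drawn from left to right
OrderEnds:
  ld a, c
  cp b        ; computes C - B; carry is set when C < B
  ret nc      ; C >= B, the ends are already ordered
  ld c, b     ; otherwise swap the two values
  ld b, a     ; A still holds the old C
  ret
```

Keeping both ends in registers avoids the loads and stores to `temp` and `temp+1`, but whether that is practical depends on which registers are free at that point.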

TPlotLoop is the loop in which the pixels are plotted. 1 iteration = 1 pixel.
It basically just reads the pixel of the texture at the coordinates calculated in the part before this one, and plots that pixel. Then it steps the texture coordinates and adjusts the results of GetPixel (the mask and the pointer) to plot the next pixel.
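
The two per-pixel steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: an 8-pixel-wide, 1-bit texture row, and a pixel mask and screen pointer kept in registers instead of the `mask` and `pointer` variables. The labels `ReadTexel`, `ReadTexelLoop` and `NextPixel` are hypothetical.

```z80
; Minimal sketch, not the routine's actual code.
; In:  HL = address of the texture row byte, A = texel column 0..7 (0 = leftmost)
; Out: carry flag = value of that texel
ReadTexel:
  ld b, a
  inc b               ; column 0 needs one rotation to bring bit 7 into carry
  ld a, (hl)
ReadTexelLoop:
  rla                 ; rotate the next bit out into carry
  djnz ReadTexelLoop  ; djnz leaves the flags untouched
  ret

; In:  HL = address of the current screen byte, C = single-bit pixel mask
; Out: HL and C address the pixel one position to the right
NextPixel:
  rrc c               ; move the mask right; bit 0 wraps to bit 7 and sets carry
  ret nc              ; no wrap, still inside the same screen byte
  inc hl              ; the mask wrapped, so continue in the next byte
  ret
```

Inside a real inner loop these would normally be inlined rather than called, since `call` and `ret` cost cycles on every pixel.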

After that, the values for the scanlines are calculated again to update the x coordinates and texture coordinates.
After the "End of pixel plotting code" comment, the variables are updated to draw the bottom half of the triangle. After repeating the loop, the triangle is complete and the routine returns.
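
The per-scanline update mentioned above boils down to adding a signed fixed-point delta to an accumulator. A minimal sketch, assuming plain 8.8 fixed point (the routine itself uses a wider format for the x coordinates) and using the hypothetical labels `StepEdge`, `edge_x` and `edge_dx`, with `.dw` as accepted by assemblers such as spasm:

```z80
; Minimal sketch, not the routine's actual code.
; edge_x and edge_dx are hypothetical 8.8 fixed-point variables
; (high byte = integer part, low byte = fraction).
StepEdge:
  ld hl, (edge_x)     ; current x on this edge
  ld de, (edge_dx)    ; signed delta per scanline
  add hl, de          ; two's complement add also handles negative deltas
  ld (edge_x), hl     ; store the stepped value for the next scanline
  ld a, h             ; A = integer part, the column to use on this scanline
  ret

edge_x:  .dw 0
edge_dx: .dw 0
```

The same pattern applies to the u and v texture coordinates, only with different variables.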

The GetPixel routine after that is included just as documentation, so you can see which one I'm using and exactly what it does. I doubt that it can be optimized.

If anything still isn't clear, feel free to ask.

#### thepenguin77

##### Re: optimizing asm code
« Reply #3 on: April 21, 2012, 11:21:52 pm »
Ok, here's my interpretation.

I don't actually expect you to reply to the questions I asked; they are just there to get you to think about what you are doing.

Edit:
Things you should do to make it look nice:
1. Put your browser in full screen
2. View>Document View>Compact
3. View>Compact controls
« Last Edit: April 21, 2012, 11:24:04 pm by thepenguin77 »