Omnimaga
Calculator Community => Other Calc-Related Projects and Ideas => TI Z80 => Topic started by: Matrefeytontias on July 04, 2013, 04:49:13 am
-
<cross-posted from Cemetech>
Hallaw people,
I finally got something 3D-related working in ASM! :D <this is a bit like a dream coming true for me :P>
Well, nothing too unbelievable, but I'm happy with what I have so far :)
So, I wrote 3 main routines, plus a few other ones you can use, though the latter are not 3D-specific.
Main 3D routines:
- BuildMatrix: builds the XY composite rotation matrix given the rotation angles about the X and Y axes, passed in b and c respectively. The advantage is that you can rotate several vertices by the same angles without recalculating the rotation for each vertex.
- ApplyMatrix: applies the matrix stored from AX to CZ (some .db in the program) to the vertex whose coordinates are stored at inX, inY and inZ, and stores the result at outX, outY and outZ (more .dw). This doesn't have to be a rotation matrix, but BuildMatrix writes to this area.
- ProjectVertex: does the calculations needed to project a vertex onto the screen. Takes outX, outY and outZ as parameters and outputs to outX and outY. outZ remains unchanged.
Other routines provided (only because the 3 routines above use them):
- multHL_DE: performs HL*DE and stores the result in HL.
- divHL64: I'll let you guess (HL/64).
- mulHL64: same idea (HL*64).
- AextendSignDE: converts the 8-bit signed number in A to a 16-bit signed number in DE.
- getSin, getCos: retrieve the entry corresponding to the angle A in the trig look-up table (1-byte entries scaled to 64, so values range from -64 to 64).
- HLsignedDivDE: performs signed HL/DE, result in HL.
- HLdivDE: performs unsigned HL/DE, result in HL (by Xeda112358).
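To check what BuildMatrix, ApplyMatrix and the getSin/getCos table should produce, a small Python model of the fixed-point math can help. It is a sketch under assumptions the thread doesn't confirm: 256 angle units per turn, table entries of round(64*sin), rotation about X applied before rotation about Y, and an arithmetic shift (which floors) for the division by 64; the real divHL64 may round differently. All function names below are hypothetical.

```python
import math

# Assumption: 256 angle units per full turn (not stated in the thread).
SIN_LUT = [round(64 * math.sin(2 * math.pi * i / 256)) for i in range(256)]


def get_sin(a):
    """Table entry scaled to 64, so -64..64 stands for -1..1."""
    return SIN_LUT[a & 0xFF]


def get_cos(a):
    # cos(a) = sin(a + a quarter turn); 64 units is a quarter turn here.
    return SIN_LUT[(a + 64) & 0xFF]


def build_matrix(ax, ay):
    """Composite R = Ry(ay) * Rx(ax), every entry scaled by 64.
    The X-then-Y order is an assumption."""
    sx, cx = get_sin(ax), get_cos(ax)
    sy, cy = get_sin(ay), get_cos(ay)
    # A product of two scale-64 values is scale 4096, so shift right by 6.
    return [
        [cy, (sy * sx) >> 6, (sy * cx) >> 6],
        [0, cx, -sx],
        [-sy, (cy * sx) >> 6, (cy * cx) >> 6],
    ]


def apply_matrix(m, v):
    """Multiply the scale-64 matrix by vertex v = (x, y, z), then divide by 64."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) >> 6 for r in range(3))


# Usage: a quarter turn about Y sends the +X axis to -Z.
print(apply_matrix(build_matrix(0, 64), (10, 0, 0)))  # (0, 0, -10)
```

Building the matrix once per frame and reusing it for every vertex is what saves the per-vertex trig lookups mentioned in the BuildMatrix description.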
Screenshot? Yes, definitely :P
(http://img.omnimaga.org//dottedCube.gif)
If you have any comments on the routines described above, please tell me before I release the source :)
-
Ui, that's looking nice!
So much 3D dev for the Z80 recently ^^
/me wonders when there will be a 3D BASIC lib
-
Looks great! I'm glad you got it working :).
Do you have anything in mind for this project?
-
Not really, it's just something I really wanted to do, but I didn't have the knowledge to do it - until now :)
I hope I'll be able to reach solid drawing, but it's still something reaaally far away x)
-
Don't forget to add textures :P
-
Eeeeeeeeeeeeeeeeeeeer no :P
Well, if that ever happens, it'll be waaaaaaaaaaaaaaaaaaaay later. I'm not sure I'll still be alive by then ;D
-
Oh, the signed division wasn't working? It worked for me when I tested it and it was faster than regular division (sometimes a lot faster)
-
It wasn't working for me, so I went back to the first option (I optimized the abs part a bit, though).
-
Hmm, what were you passing and what was it returning? It was definitely working for me.
-
Looks really great! I wonder how fast it is with lines connecting each dot?
-
@Xeda112358: I used it in the ProjectVertex routine and the results were all weird; I can't say more precisely since it was 2 days ago. I'll test it again.
@DJ Omnimaga: I can't really say; I'm afraid it will be a bit slow, though :/ I'll run a test with MirageOS's fastline routine.
EDIT: btw, I already use MirageOS's fastcopys routine. IDK whether it has any effect on the speed (IIRC it does).
-
Okay, I ask because the only problem area I could find was the one Jacobly pointed out earlier (if HL=8000h, it returns the negation of the correct result). The fix is simple:
;===============================================================
HL_Div_BC_Signed:
;===============================================================
;Performs HL/BC
;Speed: 1368-55a-2b
; b is the number of set bits in the result
; a is the number of leading zeroes in the absolute value of HL, minus 1
; add 24 if HL is negative
; add 19 if BC is negative
; add 28 if the result is negative
;Size: 68 bytes
;Inputs:
; HL is the numerator
; BC is the denominator
;Outputs:
; DE is the quotient
; HL is the remainder (of the absolute values)
; BC is replaced by its absolute value
;Destroys:
; A
;===============================================================
ld a,h
xor b
push af
;absHL
xor b
jp p,$+9
xor a \ sub l \ ld l,a
sbc a,a \ sub h \ ld h,a
;absBC:
bit 7,b
jr z,$+8
xor a \ sub c \ ld c,a
sbc a,a \ sub b \ ld b,a
ld de,0
ld a,h \ or l ;HL=0 -> quotient 0 (tested without shifting HL, so any leftover carry is harmless)
jr z,EndSDiv
ld a,17 ;16 bits + 1: the scan loop below also performs the first shift
add hl,hl
dec a
jp nc,$-2
ex de,hl
jp jumpin
Loop1:
add hl,bc ;--
Loop2:
dec a ;4
jr z,EndSDiv ;12|23
sla e ;--
rl d ;--
jumpin: ;
adc hl,hl ;15
sbc hl,bc ;15
jr c,Loop1 ;23-2b ;b is the number of bits in the absolute value of the result.
inc e ;--
jp Loop2 ;--
EndSDiv:
pop af \ ret p
xor a \ sub e \ ld e,a
sbc a,a \ sub d \ ld d,a
ret
Remember that HL and BC are the inputs, DE is the output (HL is the remainder).
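A Python model of what HL_Div_BC_Signed computes can help when checking results on a PC: restoring division on the absolute values, with the quotient negated when the signs differ and wrapped to 16 bits. It models the arithmetic, not the Z80 instruction sequence, and the function names are made up for this sketch.

```python
def to_signed16(x):
    """Interpret a 16-bit word as two's complement."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def hl_div_bc_signed(hl, bc):
    """Model of HL_Div_BC_Signed: returns (DE, HL) as 16-bit words.
    DE = quotient truncated toward zero, HL = remainder of |HL| / |BC|."""
    negative = (hl ^ bc) & 0x8000       # ld a,h \ xor b: sign of the result
    n = abs(to_signed16(hl))            # |-32768| = 32768 still fits unsigned
    d = abs(to_signed16(bc))
    q = rem = 0
    for bit in range(15, -1, -1):       # restoring division, one bit per step
        rem = (rem << 1) | ((n >> bit) & 1)
        q <<= 1
        if rem >= d:                    # sbc hl,bc did not borrow
            rem -= d
            q |= 1                      # inc e
    if negative:
        q = -q                          # final negation of DE
    return q & 0xFFFF, rem


print(hl_div_bc_signed(0xFF00, 3))   # -256/3 -> (65451, 1), i.e. DE = FFABh = -85
print(hl_div_bc_signed(0x8000, 1))   # -32768/1 -> (32768, 0), i.e. DE = 8000h
```

Note that -32768/-1 has no 16-bit answer; the model returns 8000h for it.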
-
Okay, it was an error on my part, sorry :P
And... oh wow. I didn't think simply changing the division routine could speed up the whole thing that much O.O I think I gained 1 or 2 FPS...
-
Yay! Yeah, for some values, it is almost 3 times faster than the routine I gave you originally. What are your typical values for HL?
-
Hell if I know... try figuring that out when you apply a matrix that depends on varying angles to varying numbers ;D
-
Well, your matrix elements are between -64 and +64 (-1 to 1), aren't they? I have usually taken advantage of this for multiplication.
-
Yeah true, I can use a simple HLtimesA there, but optimizations will come later.
-
Well, I meant that you need fewer bits there. But you're right: working first, optimisations later :).
-
If you only need 8-bit multiplication, I recently wrote my new personal best for speed and size:
H_Times_E:
;Inputs:
; H,E
;Outputs:
; HL is the product
; D,B are 0
; A,E,C are preserved
;Size: 12 bytes
;Speed: 311+6b, b is the number of bits set in the input H
; average is 335 cycles
; max required is 359 cycles
ld d,0 ;1600 7 7
ld l,d ;6A 4 4
ld b,8 ;0608 7 7
;
add hl,hl ;29 11*8 88
jr nc,$+3 ;3001 12*8-5b 96-5b
add hl,de ;19 11*b 11b
djnz $-4 ;10FA 13*8-5 99
;
ret ;C9 10 10
And the unrolled code isn't too large, either, so you can get away with a ridiculously fast routine:
H_Times_E:
;Inputs:
; H,E
;Outputs:
; HL is the product
; D,B are 0
; A,E,C are preserved
;Size: 36 bytes
;Speed: 191+6b+9p, b is the number of bits set in the input H, p is if it is odd
; average is 229.5 cycles (105.5 cycles saved)
; max required is 258 cycles (101 cycles saved)
ld d,0 ;1600 7 7
ld l,d ;6A 4 4
;
sla h ;CB24 8
jr nc,$+3 ;3001 12-1b
ld l,e ;6B --
add hl,hl ;29 11
jr nc,$+3 ;3001 12+6b
add hl,de ;19 --
add hl,hl ;29 11
jr nc,$+3 ;3001 12+6b
add hl,de ;19 --
add hl,hl ;29 11
jr nc,$+3 ;3001 12+6b
add hl,de ;19 --
add hl,hl ;29 11
jr nc,$+3 ;3001 12+6b
add hl,de ;19 --
add hl,hl ;29 11
jr nc,$+3 ;3001 12+6b
add hl,de ;19 --
add hl,hl ;29 11
jr nc,$+3 ;3001 12+6b
add hl,de ;19 --
add hl,hl ;29 11
ret nc ;D0 11+15p
add hl,de ;19 --
ret ;C9 --
Also, it returns a 16-bit result that you can work with to do whatever.
EDIT: Simple optimisation in the unrolled loop >.>
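Both H_Times_E versions are plain shift-and-add multiplication: each step shifts the product left and pulls the next bit of H out of the top, adding E when that bit is set. A minimal Python model of the looped version (the function name is hypothetical) makes it easy to check all 65536 input pairs.

```python
def h_times_e(h, e):
    """Model of the looped H_Times_E: returns HL = H * E (both unsigned 8-bit)."""
    hl = (h & 0xFF) << 8          # ld d,0 \ ld l,d: L = 0, H holds the multiplier
    e &= 0xFF                     # DE = E because D = 0
    for _ in range(8):            # ld b,8 ... djnz
        carry = hl >> 15          # add hl,hl pushes the top bit of H into carry
        hl = (hl << 1) & 0xFFFF
        if carry:                 # jr nc skips the add when the bit is 0
            hl += e               # add hl,de; never overflows 16 bits
    return hl


print(h_times_e(13, 11))  # 143
```

The addition never spills into the remaining multiplier bits because after k steps the partial product fits in 8+k bits, which is why H can double as both multiplier and product high byte.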
-
@Xeda112358 actually, what I need is exactly an HLtimesA routine, one that performs HL=HL*signed(A) ;D
Anyway, randomly-proposing-things-that-came-to-my-mind time. Since I want the engine to be able to display actual objects (one can still dream), I need a way to correctly handle face declaration. I'll base a clipped triangle routine on the one I included in AxeJh3D and the one I just wrote in Axe for TheMachine02's GLib, so filling won't be a problem. However, I can't think of a practical way to declare surfaces...
Maybe I'll use some sort of stack onto which you push the objects to be rendered. By objects, I mean either a single vertex, which will be rendered as a dot, a pair of vertices, which will be rendered linked by... a line (duh), or three vertices, which will be rendered as a triangle (either black or solid white, I guess). As I write it, it seems like the best solution to me.
I'm thinking of something like this (heavily inspired by DCS7's GUI stack, but hey, it really works):
ld hl,verticesTable ; points to a table of 6-byte vertices (X, Y, Z)
call setCurrentVertices
ld a,OBJECT_DOT
ld hl,23 ; index of the vertex to use in the table
call pushObject
ld a,OBJECT_LINK
ld hl,vertices2 ; points to a string of two 2-byte vertex offsets into the table
call pushObject
ld a,OBJECT_TRI
ld b,RENDER_BLACK ; either black or solid white
ld hl,vertices3 ; same as above, with 3 offsets
call pushObject
call renderScene ; objects are drawn, then popped
I don't know yet how each function will work; for now I'm only thinking of a syntax that would be practical to use (read: "not so painful"). What do you guys think of that?
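A rough Python model of the proposed object stack, assuming each entry is 4 bytes: type, render mode, then a 16-bit operand that is either a vertex index (dots) or a pointer to an offset list (lines and triangles). The constant values, the class and the function names are all hypothetical. One thing the model makes visible: a plain stack pops the last pushed object first, which may matter if draw order is ever relied on.

```python
import struct

# Hypothetical values; the post only names these constants.
OBJECT_DOT, OBJECT_LINK, OBJECT_TRI = 0, 1, 2
RENDER_BLACK, RENDER_WHITE = 0, 1


class ObjectStack:
    """4-byte entries: type, render mode, 16-bit little-endian operand
    (a vertex index for dots, a pointer to an offset list otherwise)."""

    def __init__(self):
        self.buf = bytearray()

    def push(self, kind, operand, render=RENDER_BLACK):
        self.buf += struct.pack("<BBH", kind, render, operand)

    def pop(self):
        kind, render, operand = struct.unpack_from("<BBH", self.buf, len(self.buf) - 4)
        del self.buf[-4:]
        return kind, render, operand

    def __len__(self):
        return len(self.buf) // 4


def render_scene(stack, draw):
    """Draw, then pop, every entry; the last object pushed is drawn first."""
    while len(stack):
        draw(*stack.pop())


stack = ObjectStack()
stack.push(OBJECT_DOT, 23)                    # vertex 23, as in the example above
stack.push(OBJECT_TRI, 0x9D95, RENDER_WHITE)  # made-up address of an offset list
render_scene(stack, print)  # prints "2 1 40341", then "0 0 23"
```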
-
A lot of users would probably have pre-built images to use, so you should provide a way for users to pass a pointer to the image data (in the format of your stack) to have it rendered. Also, here is HL_Times_A:
HL_Times_A:
ex de,hl
DE_Times_A:
;Inputs:
; DE and A are factors
;Outputs:
; AHL is the product
; B is 0
; C is not changed
; DE is not changed
;Time:
; 342+13x
;
ld b,8 ;7 7
ld hl,0 ;10 10
aaa:
add hl,hl ;11*8 88
adc a,a ;4*8 32
jr nc,rrr ;(12|25)*8 96+13x
add hl,de ;-- --
adc a,0
rrr:
djnz aaa ;13*7+8 99
ret ;10 10
I feel like there is a much better way to do this... Also, it returns a 24-bit result. If you only need the lower 16 bits, you can remove 'adc a,0' and change 'adc a,a' to 'rlca' to preserve a.
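A Python model of DE_Times_A shows the trick it relies on: A is both the multiplier and the top byte of a 24-bit accumulator AHL, so each shift of AHL also moves the next multiplier bit into carry. The model is unsigned throughout, and the function name is hypothetical.

```python
def de_times_a(de, a):
    """Model of DE_Times_A: returns the 24-bit AHL = DE * A (both unsigned)."""
    de &= 0xFFFF
    ahl = (a & 0xFF) << 16        # ld hl,0; A is the multiplier AND the top byte
    for _ in range(8):            # ld b,8 ... djnz aaa
        carry = ahl >> 23         # add hl,hl \ adc a,a shifts all 24 bits left
        ahl = (ahl << 1) & 0xFFFFFF
        if carry:                 # the bit shifted out of A was set
            ahl += de             # add hl,de \ adc a,0
    return ahl


print(hex(de_times_a(0xFFFF, 0xFF)))  # 0xfeff01
```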
-
It doesn't work... I mean, it doesn't give the same result as what I did previously:
call AextendSignDE
call multHL_DE
And this time I copied the exact same code. No modifications :P
EDIT : maybe it's because it's unsigned only ...
-
Well, multiplication is always signed, regardless. Division is the only one that you need to do a specific routine for the sign. What inputs/outputs are you expecting, though, when you use it? (just some numbers, so I can figure out what you are looking for)
-
But I can't say <_< there are too many variable factors involved. I calculate a 3*3 rotation matrix from 2 angles, and then I apply it to a vertex. The values are nearly random. And on-screen, it really looks like the operation performed is unsigned.
-
I see, A basically works as a 16-bit integer where the upper 9 bits are all the same. I have this:
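HL_Times_A:
;Inputs:
; HL is an unsigned 16-bit factor, A is a signed 8-bit factor
;Outputs:
; HL is the low 16 bits of HL*signed(A)
; DE is the original HL, B is 0
ex de,hl
ld hl,0
or a ;sets the sign flag from A and clears carry
jp p,$+5 ;A >= 0: start the accumulator at 0
sbc hl,de ;A < 0: start at -DE, which supplies the -256*DE term
ld b,8
mulloop:
add hl,hl ;shift the product left...
rla ;...and the next multiplier bit of A into carry
jr nc,$+5
add hl,de
adc a,0
djnz mulloop
ret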
That treats A as a signed integer, HL as an unsigned integer. I hope that works!
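Why the first routine looked unsigned: A is an 8-bit factor but the result is 16 bits, so signed(A) = A - 256 when bit 7 is set, and HL*A differs from HL*signed(A) by 256*HL in the low 16 bits (10*255 gives 9F6h, while 10*(-1) gives FFF6h). Preloading -DE supplies exactly that -256*DE term, because the accumulator gets shifted left 8 times. A Python model of the low 16 bits (hypothetical function name):

```python
def times_signed_a(de, a):
    """Model of the signed-A routine: low 16 bits of DE * signed(A)."""
    de &= 0xFFFF
    a &= 0xFF
    acc = (-de) & 0xFFFF if a & 0x80 else 0   # or a \ jp p \ sbc hl,de
    for _ in range(8):                        # ld b,8 ... djnz mulloop
        bit = a >> 7                          # rla moves A's top bit into carry
        a = (a << 1) & 0xFF
        acc = (acc << 1) & 0xFFFF             # add hl,hl (bits pushed into A are dropped here)
        if bit:
            acc = (acc + de) & 0xFFFF         # add hl,de
    return acc


print(hex(times_signed_a(10, 0xFF)))  # 0xfff6, i.e. 10 * -1
print(hex((10 * 0xFF) & 0xFFFF))      # 0x9f6: what an unsigned A gives
```

Since the coefficients stay within -64..64, a signed 8-bit A is enough, which is the "lower bit requirement" mentioned earlier in the thread.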
-
It works :D and OH MY GOD I CAN'T SEE WHAT'S GOING ON ON-SCREEN IT DISPLAYS FASTER THAN THE LCD UPDATES O.O gonna test it
EDIT : 34 FPS !!!! *.* :w00t: :banghead: O.O :o :crazy:
-
WOW, and I guess it is still 6 MHz?
HOW IS THAT POSSIBLE O.O :crazy:
-
That is awesome! Is that 6MHz? Also, what other kinds of math routines do you have in there? ^_^
-
Yes, I keep repeating that I always work on projects at 6 MHz !
And for now I only have LUT-based sin/cos, your HLtimesA and your HLsdivDE.
-
Yes, I keep repeating that I always work on projects at 6 MHz !
Add it to your signature. :P
Anyway that's looking freakin' awesome. O.O
-
Yeah good idea ;D
And thanks :)
EDIT: I wrote a draft of what will most likely become the "graphical pipeline" part of the engine:
Graphical pipeline organisation
-------------------------------
0°) Fill (alreadyProcessed) with -32768
1°) Retrieve element from stack
a) object type : 1 byte
b) rendering mode : 1 byte
c) offset(s) : 2 bytes
2°) Test object type
2.1°) OBJECT_DOT :
a) Retrieve vertex from vertices table
b) Check whether it has already been processed during this rendering pass. If yes, retrieve coordinates and jump to f)
c) Rotate
d) Project
e) Save coordinates
f) Draw a clipped 2*2-pixel square at the calculated coordinates
2.2°) OBJECT_LINK :
a) Retrieve the first vertex from vertices table
b) Check whether it has already been processed during this rendering pass. If yes, retrieve coordinates and jump to f)
c) Rotate
d) Project
e) Save coordinates
f) Repeat a)b)c)d)e) for the second vertex
g) Retrieve coordinates
h) Draw a clipped line between the two calculated coordinates
2.3°) OBJECT_TRI :
a) Retrieve the first vertex from vertices table
b) Check whether it has already been processed during this rendering pass. If yes, retrieve coordinates and jump to f)
c) Rotate
d) Project
e) Save coordinates
f) Repeat a)b)c)d)e) for the two other vertices
g) Retrieve coordinates
h) Draw a clipped triangle with the three calculated coordinates
3°) If there is another object on the stack, jump to 1°)
What do you guys think of that? Can you think of anything I could add?
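The alreadyProcessed fill in 0°) and the b) steps amount to a per-frame vertex cache: a vertex shared by several objects is rotated and projected only once. A Python sketch, assuming the -32768 sentinel sits in the cached X slot and never occurs as a real screen X; rotate_project and draw are hypothetical callbacks standing in for steps c)-e) and f)/h).

```python
UNPROCESSED = -32768  # sentinel from step 0°); assumed never to be a real screen X


def render_pipeline(objects, vertices, rotate_project, draw):
    """objects: (kind, vertex indices) pairs; 1 index = dot, 2 = line, 3 = triangle.
    Returns how many vertices were actually rotated and projected."""
    cache = [(UNPROCESSED, 0)] * len(vertices)   # step 0°): fill alreadyProcessed
    transformed = 0
    for kind, indices in objects:                # steps 1°) and 3°)
        points = []
        for i in indices:
            if cache[i][0] == UNPROCESSED:       # b) not processed yet this frame
                cache[i] = rotate_project(vertices[i])  # c) d) e) rotate, project, save
                transformed += 1
            points.append(cache[i])              # retrieve the saved coordinates
        draw(kind, points)                       # f) / h): clipped dot, line or triangle
    return transformed


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
edges = [("LINK", [0, 1]), ("LINK", [1, 2]), ("LINK", [2, 3]), ("LINK", [3, 0])]
count = render_pipeline(edges, square, lambda v: (10 * v[0], 10 * v[1]), lambda k, p: None)
print(count)  # 4: eight vertex references, but each vertex is transformed once
```

For a wireframe cube this halves the work compared with transforming both endpoints of every edge, since each of the 8 corners is shared by 3 edges.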
-
This looks really fast, Matrefeytontias. Of course, at such a high speed, if you're gonna make a game you might actually have to artificially reduce the framerate by only updating the LCD every two or three frames, like the grayscale version of gbc4nspire does, but I guess that with more complex stuff you might not really need to drop the speed yourself anyway. :P