Omnimaga
Calculator Community => TI Calculators => Calculator C => Topic started by: Matrefeytontias on June 26, 2014, 06:25:50 am
-
Hi guys,
I've been working on nKaruga for quite some time now, and although it's reasonably fast on monochrome TI-Nspires, it's almost exactly twice as slow on color calcs, even though it's the same code running and the game configures the monochrome screen in 16-bit mode.
I have no idea why it's like this, I keep trying things and nothing changes. I even removed every drawing command except the screen update and the ship, and the speed stays the same. Since I'm really, really stuck, I thought of asking you guys.
The full source is here : https://github.com/matrefeytontias/nKaruga
It's also possible that it's n2DLib's fault, since it's used to interact with the screen : https://github.com/n2DLib/n2DLib
Feel free to fork it and submit a PR; I'd be very happy to see things fixed because I don't know what else to do.
Also, I have no color calc to test on, only a monochrome one. Several people ran tests for me.
-
Does nspire_emu show the same symptoms?
-
Nope, nspire_emu tells me the game is equally fast on both screen types.
-
I've been working on nKaruga for quite some time now, and although it's reasonably fast on monochrome TI-Nspires, it's almost exactly twice as slow on color calcs, even though it's the same code running and the game configures the monochrome screen in 16-bit mode.
So you don't handle those different LCDs differently? Using the monochrome LCD in 16bpp mode like a color one looks awful.
-
I've run a few tests and I'm fairly sure it's n2DLib's sprite drawing function. I already tested IkarugaX for you, and the time for a sprite to scroll across the screen is almost the same, namely 6.55 seconds. When I take away the fullscreen pic in the background, it's suddenly three times as fast: 1.91 seconds. It was mentioned on IRC that the first versions of n2DLib (nFastGraphX) were supposedly faster (Jetpack Impossible 2). This has been proven wrong by porting the source to the newest version of n2DLib; see the attached files. Hayleia told me that pictures are drawn by copying them pixel by pixel to the screen. I don't know exactly what the Nspire can and can't do, but I think you'll have to thoroughly rethink the way you draw sprites.
I really hope you can fix the lib though, I love how easy to use it is.
EDIT: Of course, this does not explain why it is almost twice as fast on the b/w series nspire, but because both methods for drawing (both with and without buffer) are about the same speed (should I say slow? that'd be a lame joke), maybe it's the display driver? O.O But no, that's ridiculous, gpsp can run F-Zero at much higher speeds <_<
-
Now that I think of it, which version of Jetpack Impossible did you use for your tests? Because there was one ridiculously slow version (using double buffering) that was attached to a post by Matref in that topic, but the version I said could possibly use fast routines was the one you can find in the TI Planet archives that this post (http://www.omnimaga.org/ti-nspire-projects/%28ndless%29-jetpack-impossible/) links to.
-
Oh. I used the slow version. I wish I never found the fast version D: j/k.
So disregard what I said above about "proven wrong", the version Hayleia linked here is quite a bit faster. Holy Cow.
-
aeTIos (on IRC, 16:22):
void set_display_buffer(void* buffer)
{
    *(volatile void**)0xC0000010 = buffer;
}
void update_at_vblank()
{
    while((*(volatile unsigned*)0xC0000020 & 4) == 0) { }
    *(volatile unsigned*)0xC0000028 = 4;
    set_display_buffer(nspire_displayed_screen);
}
He dug that out of the gpSP source; I think it might be useful. Basically buffer swapping.
-
Well, matref and pierrot already tried to do that, but for some reason it was slow too -.-
-
So you don't handle those different LCDs differently? Using the monochrome LCD in 16bpp mode like a color one looks awful.
<_< yeah well you could avoid these useless comments. I do that because it's faster since you're not required to test if the screens are different. And the problem right now is that it's 3 times slower on the color calcs than on the monochrome calcs, so using the monochrome LCD in 16 bpp mode is apparently a good idea.
I've run a few tests and I'm fairly sure it's n2DLib's sprite drawing function.
I already sent you a binary where no sprites were drawn, and you told me the speed was the same.
I already tested IkarugaX for you, and the time for a sprite to scroll across the screen is almost the same, namely 6.55 seconds. When I take away the fullscreen pic in the background, it's suddenly three times as fast: 1.91 seconds.
*nKaruga
Well that's kinda strange because that's supposed to be super-fast, as it's barely slower than just clearing the screen - except if accessing arrays is really that slow.
Well, matref and pierrot already tried to do that, but for some reason it was slow too -.-
Mh, I never tested bit 2 of the LCD driver. I'll try that.
-
Could you do something entirely without n2dlib, just directly memcpy one buffer into the other 500 times and compare that? Both LCDs in 16bpp mode of course.
-
So you mean copying a buffer to the screen 500 times, and comparing that with copying a buffer into another buffer 500 times?
-
No, run the program on CX and not-CX.
-
Also, I have no color calc to test, only monochrome.
-
Also, you have plenty of color calc owners to test it for you. Just give the thingy to us. :P
-
Alright, so here's what to test. It's simply memcpy-ing a buffer to the screen 1000 times.
-
This executed in about 6.5 seconds for me.
-
2.5 seconds on the GS CAS calc.
-
Holy cow that's quite the difference. O.O So it's got to do with writing to the LCD being slow I guess.
-
But I've never seen anything like that happen before... There must be something Nspire-related I forgot to set up... Could TCT_Local_Control_Interrupts(int) be the cause of this?
-
Protip : always disable OS crap before doing anything. :P
-
I had to disable all IRQs and FIQs to get rid of the clock on screen, but TCT_Local_Control_Interrupts(0) should do the same.
But normally ndless itself does that already.
And matref, are you copying directly to SCREEN_BASE_ADDRESS? On GS calcs it's in SRAM, and on the CX it's in SDRAM because of its size.
-
I'm indeed copying to *(void**)0xC0000010. So would allocating a buffer in RAM and using it even on color calcs fix the speed issue? (long shot)
-
You shouldn't be able to write to SRAM if you're in 16bpp mode and filling the screen.
SRAM is 0xA4000000 - 0xA4020000 = 128 KiB and 320*240*16b= 150KiB
-
On monochrome calcs, I malloc two 320*240*2 buffers instead of one, one being an actual buffer and one acting as the screen. On color calcs I only allocate one buffer and use the default screen buffer.
-
I'm doing the same thing in nGL, but with a third buffer, so the screen buffer is still usable after inversion.
So is the GS's SDRAM indeed that much slower?
-
Well, I don't know but I'll try that, and if that's it I'll call it another thing that we should know.
/me would be so happy if that was the actual problem
EDIT : wait a minute you misunderstood. The program is 2.5 times faster on GS calcs than on color calcs, not the other way around.
-
I decided to un-overclock my calc and re-run copyscreen1000times: 12.53 seconds.
-
But I've never seen anything like that happen before... There must be something Nspire-related I forgot to set up... Could TCT_Local_Control_Interrupts(int) be the cause of this?
Could it be because of the same logic as with how slow it is to draw TI-84 Plus C Silver Edition pixels compared to TI-84 Plus Silver Edition ones due to the pixel data being much larger? I know the Nspire calcs are much faster and that the color ones can be overclocked higher, but still.
-
As I said, I set up both GS and color screen so they use the same number of bits for a pixel, for instance 16.
-
Yeah but wouldn't it still be faster to paste the data on the grayscale screen since it only supports 4 bits, such as converting the data beforehand or stripping the extra data? Not sure how the grayscale Nspire screens work, though, so maybe I'm just misunderstanding something.
-
I have a theory:
The frame buffer contents mustn't be cached, as that may cause artifacts. So caching is disabled by the OS.
On GS calcs the screen buffer is in SRAM, so SRAM caching is disabled. But it's active on SDRAM writes and reads.
On CX calcs the screen buffer is somewhere in SDRAM, and the buffer set by the OS has caching disabled. So writes and reads will be slower.
What happens if you allocate two buffers for both calcs so the default one is never used?
-
DJ: what you don't understand is that the program is already 6 times faster (according to aeTIos's latest tests) on GS calcs than on color calcs at nearly the same frequency (120 MHz for GS vs 132 MHz for color).
Vogtinator : I was wondering that. Will test when I can (probably in some minutes).
-
DJ: what you don't understand is that the program is already 6 times faster (according to aeTIos's latest tests) on GS calcs than on color calcs at nearly the same frequency (120 MHz for GS vs 132 MHz for color).
In case you didn't see :
I decided to un-overclock my calc and re-run copyscreen1000times: 12.53 seconds.
(Or well, I'm not sure where you see this '6x' faster :P)
GL&HF anyway ^^
-
12 ≈ 6 * 2 if I still know how to count. aeTIos's tests were made on a color calc and give 12 seconds; pimath's tests were done on a GS calc and give 2 seconds.
-
I don't know if pimath's calc is OCed though. Even then, the max speed doesn't go much over 150 MHz. So yeah, something really strange is going on.
-
Well, it's actually closer to 5 times, but OK, as long as the tests are consistent it's good.
-
I have not OC'd. My calc came with a non-downgrade-able OS, so I updated to 3.6, then ndless'd
-
So I came up with this, which actually seems to give 167 FPS (I guess this is the maximum the color Nspire's LCD can give) at any frequency on a color calc ; giving a mere 6 seconds at 132 MHz as well as 262 MHz, vs 12.58 seconds at 132 MHz and a bit more than 6 seconds at 262 MHz previously (all are aeTIos's tests).
Only problem, it crashes on exit (that is, after 6 seconds), and I have no idea why.
unsigned short *BUFF_BASE_ADDRESS;
void *SCREEN_BACKUP;
void initBuffering()
{
    void *temp;
    temp = malloc(BUFF_BYTES_SIZE);
    if(!temp)
        exit(0);
    BUFF_BASE_ADDRESS = (unsigned short*)malloc(BUFF_BYTES_SIZE);
    if(!BUFF_BASE_ADDRESS)
    {
        free(temp);
        exit(0);
    }
    SCREEN_BACKUP = *(void**)0xC0000010;
    // Handle monochrome screens-specific shit
    if(is_classic)
        *(int32_t*)0xC000001C = (*(int32_t*)0xC000001C & ~0x0e) | 0x08;
    *(void**)0xC0000010 = temp;
}
void updateScreen()
{
    // Screen-access delays make this the fastest method apparently
    memcpy(*(void**)0xC0000010, BUFF_BASE_ADDRESS, BUFF_BYTES_SIZE);
}
void deinitBuffering()
{
    void *temp = *(void**)0xC0000010;
    // Handle monochrome screens-specific shit again
    if(is_classic)
        *(int32_t*)0xC000001C = (*(int32_t*)0xC000001C & ~0x0e) | 0x04;
    *(void**)0xC0000010 = SCREEN_BACKUP;
    free(temp);
    free(BUFF_BASE_ADDRESS);
}
int main(void)
{
    int i;
    initBuffering();
    clearBufferB();
    for(i = 0; i < 1000; i++)
    {
        memcpy(*(void**)0xC0000010, BUFF_BASE_ADDRESS, BUFF_BYTES_SIZE);
    }
    deinitBuffering();
    return 0;
}
Binaries attached, but it does crash your calc.
EDIT: did some tests with the current version of nKaruga (source accessible from GitHub). The game runs at 95 FPS on my 120 MHz grayscale TI-Nspire CAS with Ndless 3.1 r914, measured by timing the ship: it takes 3.4 seconds to cross 320 pixels when moving one pixel per frame.
-
I did one last experiment today. I was getting the same results as aeTIos. I tried with the CPU at 246 MHz and the AHB at 35 MHz; the thing crashes after 12 seconds, which makes it even clearer that memory is the bottleneck.
aeTIos tried different scaling modes in gpSP and unscaled is much faster, not only because of the scaling but also for the same reason.
-
I have not yet tested downclocking my AHB in gpSP-nspire, I'm doing that right now.
-
I just tried it on my CX CAS 3.1, the program doesn't crash for me and it takes ~6 seconds = ~167 fps.
I don't understand how to interpret this part:
giving a mere 6 seconds at 132 MHz as well as 262 MHz, vs 12.58 seconds at 132 MHz and a bit more than 6 seconds at 262 MHz previously
???
Edit: Could you post a program that writes 1000 times to something other than the screen? I don't think it would make a difference, but just in case...
-
I meant that with this version, the program executes in 6 seconds whether the calc has been clocked to 132 or 262 MHz, whereas with the previous version, the program ran in 12.58 seconds when the calc was clocked to 132 MHz and in 6 seconds when it was clocked at 262 MHz. So the newest version is better.
I'll do that in a minute.
-
I meant that with this version, the program executes in 6 seconds whether the calc has been clocked to 132 or 262 MHz, whereas with the previous version, the program ran in 12.58 seconds when the calc was clocked to 132 MHz and in 6 seconds when it was clocked at 262 MHz. So the newest version is better.
I'll do that in a minute.
Note that the 132 MHz program also ran at AHB = 33...
-
Well yeah, at this point CPU speed isn't important because it's huge compared to AHB speed (only when it comes to writing memory and doing only that of course).
-
What I mean to point out is that while you seem to think the newer one is better, I accidentally ran it at the wrong AHB and thus memory speed. This means the old version is no worse than the newest version. Tests that I ran at AHB = 66 MHz with the old version support this.
-
Hum yeah, well we'll see how it goes with writing to "normal" memory.
-
Bump,
so this copies a 320*240*2-byte buffer in RAM to another buffer of the same size in RAM, 1000 times. Start timing when the screen goes black, one second after you run the program.
http://www.mirari.fr/xKZq